A closer look at tables that contain LOB data type columns
Based on SAP HANA SPS 10
Since SPS06, SAP HANA offers a flexibility around how it manages data type LOB (Large Object Types) columns, namely Hybrid LOBs. Data types, CLOB, NCLOB and BLOB are used to store a large amount of data such as text documents and images. The current maximum size for an LOB on SAP HANA is 2GB.
In the following blog, we will go through some examples of how Hybrid LOBS are defined, specifically around the memory threshold value and the effect that has on keeping the LOB column data on disk versus allowing it to be loaded into memory.
In-memory databases like SAP HANA, ideally you would only want to load data into memory that is actually required for processing. As SAP HANA will not compress LOB columns regardless of whether it resides in disk or in-memory, it is essential that any possible compression algorithm logic (e.g. gzip) are applied at the application layer on writing/reading from the database.
As it is expected that LOB column data will be more than likely stored on disk and temporarily loaded into memory for processing (depending on memory threshold value, 1000 bytes by default), it is worth doing specific analysis and sizing on your tables that contain LOB data columns to see what works best in your environment.
Please check the CREATE TABLE or ALTER TABLE commands for the available options when creating LOB columns.
The following guideline is key for the memory threshold value.
- If <memory_threshold_value> is not provided, a hybrid lob with memory threshold 1000 is created as default.
- If <memory_threshold_value> is provided and its LOB size is bigger than memory threshold value, LOB data is stored on disk.
- If <memory_threshold_value> is provided and its LOB size is equal or less than memory threshold value, LOB data is stored in memory.
- If <memory_threshold_value> is NULL, all LOB data is stored in memory.
- If <memory_threshold_value> is 0, all LOB data is stored in disk.
Create Table with LOB column
From the reference guide on data types, Data Types – SAP HANA SQL and System Views Reference – SAP Library
Let’s try it out, create a simple table with 3 columns
CREATE COLUMN TABLE "HOLLAS"."SH_TEST_XML" ( "XML_ID" VARBINARY(16) CS_RAW NOT NULL , "LINE_NO" NVARCHAR(10) DEFAULT '0000000000' NOT NULL , "XML_STRING" NCLOB, PRIMARY KEY ( "XML_ID", "LINE_NO"))
As we left the memory_threshold_value blank on the create, we expect the default to have the 1000 bytes value. Lets confirm..
(Aside, it wasn’t altogether obvious what table/column I would find the value for memory_threshold, but it’s always good to check table_columns or view_columns for column_name (in this case like ‘%MEM%THRES%’), to see what tables/views may have a column with this name).
I had a similar table with xml string values to hand, so for this example, I inserted 10K rows into our test xml table. The original XML string value had been truncated into 1kb row segments, with the last row in the group holding the overflow value as below.
So for our example, I wanted to unload the data, alter the threshold value to 0, reload the table and confirm that the LOB column data does not reside in memory.
unload "HOLLAS"."SH_TEST_XML"; alter table "HOLLAS"."SH_TEST_XML" alter ("XML_STRING" NCLOB MEMORY THRESHOLD 0); load "HOLLAS"."SH_TEST_XML" all;
Looking at the definition of the table post Load, the memory consumption level is as expected very small compared to overall size of the table, we presume here that the first two non-LOB data columns are loaded into memory. As we had set the memory threshold value to 0, this forced the LOB column rows of the table to move out of memory and stored on disk.
Now, let’s set it back to the default of 1000bytes, our expectation now is that the memory consumption here will only marginally increase, based on the fact that only 37 rows out of the 10k rows had a length < 1000.
Statement ‘Select x.*, length(xml_string) from “HOLLAS”.”SH_TEST_XML” as x where length(xml_string) < 1000’
successfully executed in 485 ms 503 µs (server processing time: 272 ms 18 µs)
Fetched 37 row(s) in 111 ms 384 µs (server processing time: 0 ms 299 µs)
unload "HOLLAS"."SH_TEST_XML"; alter table "HOLLAS"."SH_TEST_XML" alter ("XML_STRING" NCLOB MEMORY THRESHOLD 1000); load "HOLLAS"."SH_TEST_XML" all;
Refreshing the table definition, the memory consumption increase is marginal as expected.
Aside: One thing we’ve noticed with tables with LOB columns, the disk size space shown above is well above the initial size, it appears to increment after every alter command, as if it’s keeping a copy of the table before & after, anyway not entirely sure why that is showing up like that, but not going to spend time on it now, near the end of this blog 🙂
One more test, let’s set the threshold above the max value for the LOB column size, in this case 1024bytes.
unload "HOLLAS"."SH_TEST_XML"; alter table "HOLLAS"."SH_TEST_XML" alter ("XML_STRING" NCLOB MEMORY THRESHOLD 1024); load "HOLLAS"."SH_TEST_XML" all;
As expected, the entire table is now loaded into memory, as the table definition is reflected below.
Just a note on the hybrid LOB container values, the idea with the container or group of LOB column values, I believe it is possible to have a small LOB table be part of the same container, and get loaded in/out of memory at the same time. I was not able to prove this out, but would welcome some documentation links on how to do same.
You can see the container value information for the tables in M_TABLE_LOB_FILES.
What I like about Hybrid LOBs memory threshold approach.
- It gives you a flexibility to control whether the LOB columns are to be loaded into memory or not. In the case above, this was definitely based on character size, I did not explore if this comparison value changes say for a blob column type (e.g. to binary size).
What I didn’t like
- Although the documentation is as always excellent, there was very little discussion or examples give on the table with LOB columns. This is the main reason I put this blog together.
- Limiting by just a threshold value isn’t very flexible, it would be good if it was a % based on LRU type approach.
- To change the threshold value requires an alter command that in turn will lock the table. For a transaction table, this may require downtime, if it’s a large table (.e.g 100s of millions of rows), then it may take hours to change the parameter value. In the case where you increase the memory threshold, It will automatically load the data into memory, which may increase the length of time the alter statement will take to execute.