Managing cold data in SAP HANA database memory
If you have already had a chance to check out the new SAP Press e-bite Data Aging for SAP Business Suite on SAP HANA, you may be wondering how cold data is managed in SAP HANA memory. After all, access to historical data is sometimes justifiable.
Once you have moved the cold data out of main memory, you will notice that it is re-loaded whenever needed. But how long will the cold data stay in memory so that aging still makes sense? This blog explains how cold data is managed in memory and describes different strategies for keeping it all under control.
The SAP HANA database loads and unloads data to and from memory automatically, with the aim of keeping all relevant information in memory. In most cases only the necessary columns (i.e. columns that are actually used) are loaded into memory on the first request and kept there for later use. For example, after a system restart only a few columns might be initially loaded into memory, and only for the hot partition, as shown in the figure below.
You can monitor the history of loads and unloads in the system views M_CS_LOADS and M_CS_UNLOADS, or in the Eclipse IDE under Administration • Performance • Load • Column Unloads.
SELECT * FROM "SYS"."M_CS_LOADS" WHERE table_name = 'BKPF';
SELECT * FROM "SYS"."M_CS_UNLOADS" WHERE table_name = 'BKPF';
1. Loading data into memory
The following reasons may cause a column table to be loaded into HANA memory:
- First-time access to a table column (for example, when executing an SQL statement)
- Explicit loads triggered with the SQL command LOAD
- Reload after startup for tables and/or columns defined with the PRELOAD option. You can check the status in "SYS"."TABLES" and "SYS"."TABLE_COLUMNS".
- Pre-warming – reload after startup based on columns loaded before the system shutdown (configurable option – see SAP OSS Note 2127458).
As of SPS 09, load and pre-load do not consider cold partitions.
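For illustration, an explicit load and the preload check could look as follows (a sketch only, using the BKPF table and the SAPERP schema from this scenario; verify against your HANA revision):

-- Explicitly load all columns of BKPF into memory
LOAD "SAPERP"."BKPF" ALL;
-- Check the preload flag for the table
SELECT table_name, preload FROM "SYS"."TABLES" WHERE table_name = 'BKPF';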
2. Paged partitions
Data is loaded into memory column-wise. However, if paged attributes are used with partitioning, loads happen in a more granular way, i.e. page-wise. This means that only the data in the area you are searching will be loaded.
If partitions were not created with the paged attribute (see the e-bite Data Aging for SAP Business Suite on SAP HANA), you can use report RDAAG_PARTITIONING_MIGRATION to change the loading behavior (see figure below). This can be done for a single table, a Data Aging object or a whole Data Aging group. For more information check SAP OSS Note 1996342.
3. Unloading data from memory
Unloads happen based on a "least recently used" (LRU) approach: in case of memory shortage, the columns that have not been used for the longest period of time are unloaded first. You can influence this behavior with the UNLOAD PRIORITY setting. However, the unload priority can only be set for a whole table, without the possibility to distinguish between cold and hot partitions. Values range from 0 to 9, where 0 means not unloadable and 9 means earliest unload.
SELECT UNLOAD_PRIORITY FROM "SYS"."TABLES" WHERE TABLE_NAME = 'BKPF';
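The priority itself is changed with ALTER TABLE; as noted above, it applies to the whole table including the hot partition. A sketch (the value 7 is just an example):

-- Make BKPF a more likely unload candidate than the default
ALTER TABLE "SAPERP"."BKPF" UNLOAD PRIORITY 7;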
In M_CS_UNLOADS (REASON column) you can see the detailed reason why an unload happened:
- LOW MEMORY – happens automatically when memory becomes scarce (see SAP OSS Note 1993128).
- EXPLICIT – unload triggered with the SQL command UNLOAD.
- UNUSED RESOURCE – automatic unload when a column exceeds the configured unused retention period.
Too many unloads may indicate that memory requirements are exceeded. This affects performance, since tables need to be fetched from disk again on the next access. The time spent loading table columns during SQL statement preparation is recorded in the TOTAL_TABLE_LOAD_TIME_DURING_PREPARATION column of the M_SQL_PLAN_CACHE system view. You can find more information on how to handle column store unloads in SAP OSS Note 1977207.
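For example, a query along these lines (an illustrative sketch) lists the cached statements that spent the most time loading columns during preparation:

SELECT statement_string, total_table_load_time_during_preparation
FROM "SYS"."M_SQL_PLAN_CACHE"
ORDER BY total_table_load_time_during_preparation DESC
LIMIT 10;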
Even with sufficient resources under normal circumstances, it may happen that complex queries create high-volume intermediate results and thus lead to memory shortage. Once the memory allocation limit has been reached on a HANA host, the memory manager starts unloading data (caches, buffers, columns) based on the "least recently used" approach. In such cases it would be desirable to have the cold partitions removed first, since by business definition they have the lowest priority. The following two sections present such mechanisms.
4. Unloading paged memory
The paged attribute for partitioning separates the data of hot and cold partitions in such a way that cold data is not loaded into memory unnecessarily. However, once data records (columns) from cold partitions have been loaded into memory, they reside there until a memory shortage occurs; unloading is then triggered according to the LRU approach. As already mentioned, it is not possible to set unload priorities per partition.
However, for cold partitions created with the paged attribute it is possible to get ahead of this mechanism by setting limits for the paged memory pool. There are two configuration parameters to consider (see also SAP Note 2111649):
- PAGE_LOADABLE_COLUMNS_MIN_SIZE – in case of low memory, paged resources are unloaded first, before any other resources, as long as the memory they occupy is bigger than the value of this parameter. Only once their size falls below this limit is the standard LRU mechanism triggered.
- PAGE_LOADABLE_COLUMNS_LIMIT – if paged resources exceed this limit, some of them are unloaded according to the LRU approach until the total size of paged memory is back at the minimum level.
Values should be specified in MB. The default value is 1047527424 MB, which equals roughly 999 TB. The current size of the paged memory can be found in the system view M_MEMORY_OBJECT_DISPOSITIONS as shown in the figure below.
In the example below the parameters were set to small values for demonstration purposes (in the Eclipse IDE go to ADMINISTRATION • CONFIGURATION) as shown in the figure below.
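The same demonstration values could also be set via SQL instead of the Eclipse UI. This is a sketch only: the ini file and section shown here are assumptions, so verify the correct location of both parameters against SAP Note 2111649 before using it:

-- Demonstration values only; confirm file/section for these parameters in SAP Note 2111649
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
  SET ('memoryobjects', 'page_loadable_columns_min_size') = '50',
      ('memoryobjects', 'page_loadable_columns_limit') = '100'
  WITH RECONFIGURE;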
At this point, executing the statement SELECT * FROM bkpf WHERE _dataaging <> '00000000' will cause the paged memory to grow above the allowed limits. Paged memory will then be unloaded until the minimum limit is reached, as shown in the figure below.
5. Data retention
To manage cold partitions in memory more selectively, you can use the auto-unload feature of SAP HANA. It automatically unloads tables or partitions from memory after a defined unused retention period. Configuring a retention period for unloads typically increases the risk of unnecessary unloads and loads. However, retention periods (unlike priorities) can be set at partition level, so their use for managing cold storage might be justifiable.
In the figure below, all partitions (including the cold ones) are partially loaded. Almost all columns from recently accessed partitions are loaded into memory; this includes the hot and the cold (2015) partitions. Only some key columns from the 2013 and 2014 cold partitions are loaded.
To see the current table setup and memory usage, run the following statement in the SQL console in the Eclipse IDE as shown in the figure below:
To set retention periods, execute the SQL ALTER TABLE statement with the UNUSED RETENTION PERIOD option. Retention periods are given in seconds, either one value for the whole table or one per partition, separated by commas. If multiple values are specified, the number of values must match the number of table partitions. The first partition, which represents the hot storage, has its retention period set to 0; this means the global default value from the system configuration is used for this partition (see below). For the cold partitions, let's use a short time frame in the example scenario (5 min.):
ALTER TABLE "SAPERP"."BKPF" WITH PARAMETERS('UNUSED_RETENTION_PERIOD'='0, 300, 300, 300');
The next step is to switch on the auto-unload function for the HANA database. In the Eclipse IDE go to ADMINISTRATION • CONFIGURATION • GLOBAL.INI • MEMORYOBJECTS and set the configuration parameters as follows (see figure below):
- UNUSED_RETENTION_PERIOD – the number of seconds after which an unused object can be unloaded. The default value is 0, which means auto-unload is switched off by default for all tables. Set the default to 31536000 seconds (1 year).
- UNUSED_RETENTION_PERIOD_CHECK_INTERVAL – the frequency of checking for objects (tables and partitions) exceeding the retention time. The default value is 7200 (every 2 hours). In the example, let's use a short time frame of 600 seconds.
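The two settings above can also be applied with SQL instead of the Eclipse UI; a sketch using the example values:

-- Set the global auto-unload defaults in global.ini, section memoryobjects
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('memoryobjects', 'unused_retention_period') = '31536000',
      ('memoryobjects', 'unused_retention_period_check_interval') = '600'
  WITH RECONFIGURE;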
After the check interval has passed, you can check the memory status again. The total memory used is now much lower for the fourth partition (year 2015), for which all the columns/pages were unloaded, as shown in the figure below.
In addition, in the column view you can see that only columns of the hot partition remain in memory (figure below).
In the M_CS_UNLOADS view you will notice that new events were recorded with the reason code UNUSED RESOURCE, and that only the columns from the cold partitions of BKPF were removed from memory (see figure below). The hot partition and other tables were not affected.
Auto-unload lets you manage cold areas of memory more actively; however, it also has a side effect. All columns/pages from the cold partitions are removed from memory. After that, even when accessing documents from the hot storage in SAP ERP, it may happen that the key columns/pages from cold partitions are reloaded from disk (see figure below). This can occur, for example, when displaying a "hot" document in FB03 without specifying the fiscal year: SAP ERP then performs an initial search for the full document key without restricting it to hot storage (see note 2053698). This may have a negative impact on such queries when they are executed for the first time after an auto-unload.
6. SAP HANA Scale-out and table distribution
If cold storage has grown significantly over the years, it might be advisable to scale out the current HANA system to more than one host. The main purposes of scaling out a HANA system are to reduce delta merge processing time, balance the load on the database and achieve a higher level of parallelization for query execution. However, one of the hosts could also be dedicated to cold storage only. When tables are partitioned over several hosts, the unloading mechanism is managed per host. This means that using cold data, to whatever extent, will not impact the performance of queries executed against hot data. Moreover, the cold storage node could be smaller in terms of allocated memory.
You can see the current table and partition distribution in the TABLE DISTRIBUTION editor. You can open it by right-clicking on CATALOG or SCHEMA in the SYSTEMS view in Eclipse IDE and choosing SHOW TABLE DISTRIBUTION – see figure below.
Before distributing partitions in the SAP HANA database, you first need to switch off the paged attributes property; the status needs to be reset to deactivated. You can do this with report RDAAG_PARTITIONING_MIGRATION – see section 2, Paged partitions, for more information. SAP HANA provides several automatic redistribution and optimization methods that can significantly improve performance in a multi-node HANA cluster. To manually move cold partitions to a dedicated host, run the following SQL statement:
ALTER TABLE "SAPERP"."BKPF" MOVE PARTITION <part_id> TO '<host:port>' [PHYSICAL];
Where <part_id> is the partition number and <host:port> is the location where the partition is to be moved. The partition will be moved if the target host has sufficient memory.
When moving tables without the PHYSICAL addition, only a link is moved to the other host, not the whole table; the physical table is moved when the merge process is triggered. You can see details on the current logical and physical locations in the following system views:
- M_TABLE_LOCATIONS – shows the logical location of a table/partition (1st figure below).
- M_TABLE_PERSISTENCE_LOCATIONS – shows the physical data locations (persistence parts). This includes the items from M_TABLE_LOCATIONS, but also nodes that still contain some persistence of the table. This happens when a table has been moved but not yet merged (2nd figure below).
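To compare the logical and physical locations for the example table, you can query both views directly (a simple sketch):

SELECT * FROM "SYS"."M_TABLE_LOCATIONS" WHERE table_name = 'BKPF';
SELECT * FROM "SYS"."M_TABLE_PERSISTENCE_LOCATIONS" WHERE table_name = 'BKPF';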