How and why to do Compression of Column Store Tables in SAP HANA
Note: In This document I am only taking example of Dictionary Compression.
The following figure shows where the data in Name column is stored when it is in uncompressed format:
In uncompressed format the data is stored without sorting. Also the data is repetitive is nature for duplicate entries. For Big tables the number of entries in column is also large and is in millions. In most cases the number in columns are not unique and in that case with the increase in the number of entries the repetitive store of same data also getting increased which result in High space utilization. In case of HANA since all the data is in RAM and also on Disk so this increase in size has dual impact on the resource required to store the data.
This is the reason SAP always recommend to use compression for all column store tables which are large in size.
The following figure shows how the data is stored in case of Dictionary Compression:
The actual Data is first sorted out and a value ID has been mapped against each unique value in the column. This information is saved in Dictionary and in actual entry we have only Value ID of the corresponding entry is stored (as shown by ‘Value ID array’ in the middle of the above screen). By doing this we can manage to protect our storage from getting extra utilized for the same data.
In most of the cases only Dictionary Compression is used.
The Compression are as follows which is self-explanatory if you see the figure below:
In this document I am not discussing any other compression.
To identify the Column store which needs compression we have different cases which are as follows:
Case 1: When The Distinct Value of a Table with Size more than 500MB is less than 5%. We need to use the SQL command given in the attached “RECORDS_DISTINCT_Percent_Object_Detail.txt” and execute on HANA Studio SQL Prompt whose output sample would be as follows:
Please see above the highlighted part where the Column store has been selected with option “MIN_RECORD_COUNT = 10000000, MIN_MEM_SIZE_MB = 500, MAX_DISTINCT_PCT = 5”.
Please look into the Distinct percent and Row count. Here we are selecting Column store with distinct value not more than 5% of the total data and also for which row count is quite high. This is make sure that when we compress such Column Stores then we have significant impact on data store required for such objects.
To minimize storage and memory requirement we need to execute the following command:
UPDATE “SAPP1B_AUDIT”.”ADS_EVENT_DETAIL” WITH PARAMETERS (‘OPTIMIZE_COMPRESSION’ = ‘FORCE’);
UPDATE “SAPP1W”.”/BIC/B0001116000″ WITH PARAMETERS (‘OPTIMIZE_COMPRESSION’ = ‘FORCE’);
UPDATE “SAPP1W”.”/BIC/B0001685000″ WITH PARAMETERS (‘OPTIMIZE_COMPRESSION’ = ‘FORCE’);
UPDATE “SAPP1W”.”/BIC/FZAD_C01″ WITH PARAMETERS (‘OPTIMIZE_COMPRESSION’ = ‘FORCE’);
Note: The ‘FORCE’ variant makes sure that a compression optimization is executed in any case.
Case 2: Large UDIV list where RECORDS_UDIV much greater than RECORDS
This case is valid where lots of record deletion is taking place. In uncompressed form after deletions invisible records may remain in tables. A merge can only remove invisible records at the end of the table, because it doesn’t change the record order. This will result in increased number of UDIV and thus causing high memory utilization for storing data.
Only with a compression optimization it is possible to change the table record order and remove deleted records within the table. A high amount of invisible records can be identified by checking for a UDIV record value being much larger than the normal record number. We consider compression of tables with RECORDS_UDIV more than twice that of RECORD and whose size is more than 500MB with minimum record count of 1000000000.
We can sue the SQL command given in the attached “RECORDS_UDIV_Value_Detail.txt” file to identify such objects with high RECORDS_UDIV compared to it’s corresponding value of RECORDS. Please see the output of the SQL command below:
If large tables with RECORDS_UDIV much greater than RECORDS exist, you can manually force a compression optimization through the following SQL command:
UPDATE “”.”” WITH PARAMETERS (‘OPTIMIZE_COMPRESSION’ = ‘FORCE’)