One of the new features in DB2 version 9 which got a lot of attention is certainly ‘Row Compression’. While compression algorithms are in place already in DB2 before version 9 (for instance backup compression, value compression), row compression is unique for several reasons. But first let’s distinguish between value compression and row compression. Both methods are used to compress table data and therefore save space in the database. Value compression, available since version 8.1 of DB2, does so by introducing a new row format which saves space for certain data types if they are empty or SQL null. For more details about value compression and it’s applicability in SAP systems, have a look at OSS note 886231. Row compression on the other hand uses standard data compression algorithms to compress a row. This includes the creation of a ‘dictionary’ which defines symbols for particular groups of bytes which appear frequently in the table data.
To be more precise, row compression in DB2 uses a lossless compression technique with a static dictionary and works on a ‘per table’ basis. Static here means that the dictionary, once build, is fixed and will not change upon INSERTs or UPDATEs. The dictionary must exists before the table can be compressed. Creation of the dictionary and compressing the table is done in one step with an offline table REORG. To create the dictionary, the algorithm looks for repeating entries (or patterns) in rows. These entries may span over several adjacent columns. They are replaced by a symbol and the symbol is stored together with the repeating entries in the dictionary. During decompression (which is completely transparent to the database user) the DBMS replaces the symbols found in a row with the entries which are stored in the dictionary for this symbol. This way, a row gets shorter on disk and more rows will fit into one page. The dictionary itself is stored together with the table data and is not accessible from the outside. Upon access to a compressed table the dictionary is loaded into (DBHEAP-)memory for faster decompression. Dictionaries can not exceed the size of 150 KByte, they are usually about 75 KByte in size.
Here are a few other facts about row compression:
- As indexes are used for fast access to records index data will not be compressed. Also, long and LOB table data is not compressed.
- Compressed data is represented as it is in the buffer pool. In other words, the content of the buffer pool is also compressed if it is related to a compressed table. This allows for a more effective memory usage.
- As already mentioned, row compression works on a per table basis (it can be switched on for single tables). In case of a multi partition environment, it works on a ‘per table/per partition’ basis in the sense that every compressed table has their own dictionary on every partition.
How it works
Assuming you are an admin running a SAP system on DB2 for Linux, UNIX and Windows, let’s see what you have to do to use row compression. I’m using again a SAP system ‘Q1I’ here to demonstrate the steps. The first step is to identify tables which are suitable for row compression. Naturally, the largest tables of the system should be considered first. In my system Q1I I identified table ‘/SAPAPO/MATLOC’ as a large table:
db2 => SELECT data_object_l_size, data_object_p_size FROM TABLE(sysproc.admin_get_tab_info('SAPQ1I','/SAPAPO/MATLOC')) AS X DATA_OBJECT_L_SIZE DATA_OBJECT_P_SIZE -------------------- -------------------- 2075296 2075296
Like described in my first New Features in DB2 UDB V9 – Part 1 about new features in DB2 version 9 I used the table function ADMIN_GET_TAB_INFO to determine the size in KBytes of this table (there are no long- or LOB-columns in this table). 2075297 KBytes are 2026 MByte which means that some of you will laugh if I tell you that a 2 GByte table is ‘large’. Nevertheless let’s go on with this table in this example. To estimate the space savings which can be achieved by row compression, the DB2 command INSPECT can be used with the new option ROWCOMPESTIMATE:
db2 => INSPECT ROWCOMPESTIMATE TABLE NAME "/SAPAPO/MATLOC" SCHEMA sapq1i RESULTS KEEP matlocestimate DB20000I The INSPECT command completed successfully.
Because of the ‘RESULTS KEEP matlocestimate’ option I can find the output of this INSPECT run in the file matlocestimate in the db2dump directory. As my system Q1I has two partitions, the partition number (000) is added to the file name. To read the output it must be converted first with db2inspf:
db2inspf matlocestimate.000 matlocestimate.000.txt
Here is the result:
The two lines ‘Percentage of pages/bytes saved from compression’ tell us here that a space reduction by 87% is expected from row compression which is certainly a lot. (I can now admit that I selected this table as it constitutes a very good showcase for row compression :). Also, the complete dictionary size for the table (on this partition) is estimated with 56832 + 32768 = 89600 Bytes = 87,5 KByte. This table is identified as an rewarding object for row compression.
The next step is to switch on row compression. This is done with the new addition ‘COMPRESS YES’ for the ALTER TABLE statement. Note that this addition is also available for the CREATE TABLE statement. In both cases the table will not be compressed with this statement! Instead it is more that the specified table is now ‘eligible’ from compression. The field ‘COMPRESSION’ in the SYSCAT.TABLES view tells us if compression is switched on for a table or not.
db2 => SELECT compression FROM syscat.tables WHERE tabname = '/SAPAPO/MATLOC' COMPRESSION ----------- N 1 record(s) selected. db2 => ALTER TABLE "SAPQ1I"."/SAPAPO/MATLOC" COMPRESS YES DB20000I The SQL command completed successfully. db2 => SELECT compression FROM syscat.tables WHERE tabname = '/SAPAPO/MATLOC' COMPRESSION ----------- R 1 record(s) selected.
The COMPRESSION field in SYSCAT.TABLES can also contain a ‘V’ for value compression or an ‘B’ for both (value and row compression). To actually compress the table data and create the dictionary an offline table REORG is needed:
db2 => REORG TABLE "SAPQ1I"."/SAPAPO/MATLOC" USE PSAPTEMP16 RESETDICTIONARY DB20000I The REORG command completed successfully.
REORG has the two new options RESETDICTIONARY and KEEPDICTIONARY. So it is possible to do a reorganization without running the compress algorithm again by retaining an existing dictionary. For the exact meaning of these options please lookup REORG in the DB2 Command Reference.
What to expect
How successful was the compression? Let’s check:
db2 => SELECT dictionary_size, data_object_l_size, data_object_p_size FROM TABLE(sysproc.admin_get_tab_info('SAPQ1I','/SAPAPO/MATLOC')) AS X DICTIONARY_SIZE DATA_OBJECT_L_SIZE DATA_OBJECT_P_SIZE -------------------- -------------------- -------------------- 89600 273184 273184
With ADMIN_GET_TAB_INFO I can check the size of the dictionary. Exactly like estimated by INSPECT it’s size is 87.5 KByte. The table now occupies only (273184*100)/2075296 = 13.1% of the original space on disk so that indeed 100% – 13% = 87% of the disk space are saved.
Now you may think that every good thing has a downside. Do we have to pay for these space savings? It is very difficult to find a general answer for the costs associated with row compression. Obviously, we need more CPU time for decompression. If on the other hand your database is more I/O bound the savings from less I/O may (and in most cases will) outperform these additional CPU cycles. So there is no universal answer.
Otherwise we can examine how the various SQL statements are affected by row compression. Remember that the algorithm works with a static dictionary which is not changed by DML statements issued against a compressed table. From measurements conducted by our IBM colleagues here at SAP (thanks!) we found out that there is nearly no overhead for SELECTs and only a small overhead for INSERTs. UPDATEs to existing rows need a bit more attention. If the update changes the table data in a way that a compression symbol must be expanded to the original entries the row will to be larger after the update than before. As a result the row will not fit in its original location. If there is still free space left on the data page DB2 will do a page reorg to assemble the free space and fit the row back in. If there is not enough free space DB2 will need to create an overflow record (that means that only a pointer will be left on the page and the remainder of the row will be stored on another page). These additional operations will bring along some additional cost if they occur. You therefore may see a slight decrease in update performance compared to an uncompressed table. It very much depends on your actual data and your update frequency if you see a decrease in update performance.
Regarding the compression ratio, 87% like in this example is certainly a very good showcase. For the most tables which are suitable for row compression a compression rate around 70% would be more realistic.
Planned support for row compression in SAP systems
Row compression is a very useful feature and it will be supported by various SAP products.
- Starting today you can transparently enable row compression for any table in your SAP system through a manual DBA action. To find suitable candidates for row compression in your SAP system the new UDF ‘INSPECT_TABLE_ROWCOMPESTIMATE’ is available. Check SAP note 980067 to find out how this UDF can be used. The note also describes how to create DB2 scripts to enable row compression on multiple tables.
- SAP BI will fully support DB2 row compression starting with it’s next version. Row compression will become the default for BI fact tables. Support for existing versions of SAP BI/BW is also available soon, see SAP note 926919 for details. There will be more information about row compression in SAP BI posted on SDN soon.
- R3load has the new option ‘-loadprocedure fast COMPRESS’. Started with this option it loads part of the data, does a REORG to create the compression dictionary and then loads the rest of the data. This way the table does not grow up to it’s full size before it is compressed. For more details see OSS note 886231.
- The SAP DBA Cockpit (transaction ‘DBACOCKPIT’) will fully support row compression starting with the next release of SAP Netweaver. See below a sneak preview from one of our development systems. Note that the development is still ongoing here and there might be changes!
The ‘Single Table Analysis’ shows if a row and/or value compression is switched on for the table. The picture shows the state of our table after row compression took place.
On the new compression tab you can find all information regarding row compression. All the things I did above manually (including the INSPECT run) can be scheduled very easy via the DBA cockpit.