DB2 Row Compression Revisited

johannes_heinrich · ‎04-03-2008

Based on all the feedback which I received about DB2 9 I would say that row compression (now sometimes also called 'deep compression') is the feature which probably got the most attention. To compress table rows, DB2 replaces repeating patterns with shorter symbols and stores these symbols together with the patterns in a so called compression dictionary which is part of the table object. This results in shorter rows and considerable space savings especially for BI-like tables. See New Features in DB2 UDB V9 - Part 4 for more details on the basic mechanisms behind row compression.

Automatic Dictionary Creation (ADC) in DB2 9.5

Compressing a table with existing data is a two-step process in DB2 9. First, the compression flag of the table must be set to YES via ALTER TABLE...COMPRESS YES. Second, a compression dictionary needs to be created. The usual method to accomplish this is to perform an offline REORG of the table. Afterwards, all existing data is compressed and all future data will be compressed. The only problem is that during the offline REORG the table cannot be accessed by applications. As an alternative, the ROWCOMPESTIMATE option of the DB2 inspect utility can be used to create a compression dictionary. Using this method the table remains 'mostly' accessible (during insertion of the dictionary into the table object locks are required for a short time), but only rows which are going to be inserted after dictionary creation will be compressed. All existing rows remain uncompressed.

To simplify the process of compressing a table DB2 9.5 introduces 'Automatic Dictionary Creation' (ADC). The idea here is it to automate the second step in the scenario mentioned above so that an explicit dictionary creation (usually triggered by an administrator) is no longer necessary. ADC considers all tables where the compression flag is set to YES. If the size of the table reaches a specific threshold (usually around 1 to 2 MBytes) DB2 checks if there is enough data available to create a suitable compression dictionary. If not, the check is repeated again after the next threshold is reached. If enough data was found, a compression dictionary is created as a synchronous action to the operation which inserted the data. Usually automatic dictionary creation will be a result of an SQL INSERT statement, but ADC considers also LOAD and IMPORT operations and the redistribution of data across partitions in a DPF environment. Again, all data which is inserted into the table after the dictionary creation will be compressed. So ADC leaves a relatively small amount of data uncompressed and avoids offline table REORGs. To track details about the dictionary creation and about the compression of the table in general DB2 9.5 provides the new UDF ADMIN_GET_TAB_COMPRESS_INFO.

As I was curious about this new feature and about the time it takes to automatically create the compression dictionary (as I mentioned, this happens as a synchronous action!) I created a small test scenario and filled a typical table with more or less meaningful data which allowed for compression. Before running the INSERTs I set the table compression flag to YES and I also tracked the response time of each INSERT statement. What I found out is that in my scenario the automatic dictionary creation happened after the table reached 2 MBytes in size (as reported via UDF ADMIN_GET_TAB_INFO) and the dictionary creation itself took about 3.2 seconds (while a normal INSERT took about 2 ms). I repeated the test several times by dropping the table and recreating it, the dictionary creation always happened at the same point and took nearly the same time. An interruption of 3 seconds is certainly something the application (and the end user) can notice. On the other hand, this happens just once for every table with the compression flag set to YES.

Some of the useful data which is reported by ADMIN_GET_TAB_COMPRESS_INFO after the test finished looks like this:

The value 'TABLE GROWTH' in the 'DICT_BUILDER'-column informs us here that ADC took place as part of a SQL INSERT statement. For other valid values and a description of the other columns of ADMIN_GET_TAB_COMPRESS_INFO have a look at the documentation.

The result is of course not representative, the time it takes to create the dictionary depends certainly on a lot of facts like e.g. the data in the table, the underlying hardware and so on. But I guess it should give a hint on what to expect.

Another interesting question is if the time it takes to create the compression dictionary depends on the table size. If you migrate from an older release to DB2 9.5 you likely will have some tables which already contain a lot of records. Once you set the compression flag of these tables to YES a compression dictionary will be automatically created during the next INSERT operations. So I repeated the test above by creating a table where row compression was not enabled and filling this table in the first run with 100 MByte, in the second run with 500 MByte and finally with 1 GByte. Afterwards I set the compression flag to YES and did some more INSERTs. The test result showed that the creation of the compression dictionary always took nearly the same time (around 3.4 seconds in my scenario). This is because ADC uses only a few records at the beginning of the table to construct the compression dictionary. Therefore, the time it takes to create the dictionary does not depend on the table size, which is certainly desirable.

Furthermore, note that ADC is not a switch-on/switch-off feature. In DB2 9.5 all tables which have the compression flag set to YES are always eligible for automatic dictionary creation.

Compressing Tables without an Offline Table REORG

Chances are that you will have large tables in your systems which are not compressed yet. This will especially happen after migrating to DB2 9 or DB2 9.5. An offline table REORG - so far the only option to compress existing table content - is probably something which you will not or cannot do. First, applications cannot access the table during the REORG, which means basically that you have to bring down SAP before you start the operation. Second, a REORG of a large table also needs a lot of space, at least in the temporary tablespace.

Here is another idea how to compress existing data with the help of the 'Online Table Move'. The online table move is in fact based on a quite smart algorithm, and I hope it will go into the DB2 standard product soon. The original intention of this procedure is it to move a table to another tablespace in a way so that it remains accessible during the move operation. To accomplish this the online table move procedure does the following (very roughly):

A copy of the table which is going to be moved (the original table) is created under a different name in the target tablespace. I will refer to this copy of the original table as target table.
Triggers are created on the original table. They will record all changes to the original table (DELETE, INSERT and UPDATE operations) to so called staging tables. The staging tables are created before trigger creation.
The records of the original table are copied to the target table. This usually takes the longest time, but the original table remains accessible to applications. Changes are recorded to the staging tables.
If all records are copied to the target table the changes in the staging tables are replayed to the target table.
The triggers, the original table and the staging tables are dropped and the target table is renamed to the original table.

Note that this is a very high-level and simplified description of what the online table move actually does, the details look much more complicated. Also notice that the online table move needs for a short time (usually only a few seconds) an exclusive lock on the original table - this is especially needed for the rename and drop operations at the end. But this should not harm applications which access the original table.

The complete process is based on the stored procedure SAPTOOLS.ONLINE_TABLE_MOVE. To make it usable within a SAP system, the DB6CONV report (which is used since many years for an 'offline table move') has been enhanced and serves now also as a GUI for ONLINE_TABLE_MOVE.

For more information about the online table move and a guideline how to install the stored procedure refer to SAP note 1039544. See SAP note 362325 for instructions regarding DB6CONV.

So far I described a scenario where a table is moved from one tablespace to another. Now the point is that you can use the online table move also to 'move' a table within a single tablespace - assuming there is enough free space. In the GUI shown above, just enter the tablespaces where the table is currently located as target tablespaces. Acting like this you can compress the contents of an existing table without moving it to another tablespace. The online table move is aware of the table compression flag and will perform in such an operation the following additional steps:

If the original table has the compression flag set to YES the target table will be created first with the compression flag set to NO.
A representative sample of data of the original table will be loaded into the target table.
The compression flag of the target table will be set to YES and a compression dictionary is built.
The entries in the target table will be deleted. The compression dictionary will remain.
Now all records from the original table will be copied to the target table. As a result, all entries in the target table are compressed and records inserted in the future will be compressed.

The advantage of this procedure certainly is that it requires no downtime and that the table remains accessible (nearly) all the time. As a tradeoff, the procedure will take longer (compared to an offline REORG) and you need enough free space (at least as much as the original table size) in the tablespace.

You can use the online table move in a similar way to enable larger record identifiers (LRIDs).

More SAP Support for DB2 Row Compression

In the mean time several SAP tools and programs are enhanced to make the activation of row compression easier.

The R3load utility offers new options for loading data so that the table will be compressed afterwards. See SAP notes 905614 and 1058437 for details.
Newer SAP installations (since SAP NetWeaver 2004 SR3) offer an option so that all tables (with the exception of some volatile tables) will be compressed during data load.
SAP note 980067 contains a transport for an ABAP based compression tool. It is intended for older SAP releases and assists in identifying suitable candidates for row compression and in performing the compression. Also, the stored procedure INSPECT_TABLE_ROWCOMPESTIMATE and some SQL scripts which can be used on the DB2 command line to activate compression are available via this SAP note.
SAP transaction DBACOCKPIT is enhanced since SAP NetWeaver SP12 and offers now also support for finding suitable tables and activating row compression.