Based on all the feedback I received about DB2 9, I would say that row compression (now sometimes also called ‘deep compression’) is the feature that got the most attention. To compress table rows, DB2 replaces repeating patterns with shorter symbols and stores these symbols together with the patterns in a so-called compression dictionary, which is part of the table object. This results in shorter rows and considerable space savings, especially for BI-like tables. See New Features in DB2 UDB V9 – Part 4 for more details on the basic mechanisms behind row compression.
Automatic Dictionary Creation (ADC) in DB2 9.5
Compressing a table with existing data is a two-step process in DB2 9. First, the compression flag of the table must be set to YES via ALTER TABLE…COMPRESS YES. Second, a compression dictionary needs to be created. The usual way to accomplish this is an offline REORG of the table; afterwards, all existing data is compressed and all future data will be compressed as well. The only problem is that during the offline REORG the table cannot be accessed by applications. As an alternative, the ROWCOMPESTIMATE option of the DB2 INSPECT utility can be used to create a compression dictionary. With this method the table remains ‘mostly’ accessible (locks are required for a short time while the dictionary is inserted into the table object), but only rows inserted after the dictionary creation will be compressed; all existing rows remain uncompressed.
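On the DB2 command line, the two methods could look roughly like this (schema, table and result file names are placeholders; exact options may vary between DB2 versions):

```sql
-- Step 1: mark the table for compression
ALTER TABLE myschema.mytab COMPRESS YES;

-- Step 2a: build the dictionary and compress all existing rows
-- (offline REORG; the table is not accessible while it runs)
REORG TABLE myschema.mytab RESETDICTIONARY;

-- Step 2b (alternative): build a dictionary while the table stays
-- mostly accessible; existing rows remain uncompressed
INSPECT ROWCOMPESTIMATE TABLE NAME mytab SCHEMA myschema
  RESULTS KEEP mytab.insp;
```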
To simplify the process of compressing a table, DB2 9.5 introduces ‘Automatic Dictionary Creation’ (ADC). The idea is to automate the second step of the scenario above so that an explicit dictionary creation (usually triggered by an administrator) is no longer necessary. ADC considers all tables whose compression flag is set to YES. When the size of such a table reaches a certain threshold (usually around 1 to 2 MB), DB2 checks whether there is enough data available to create a suitable compression dictionary. If not, the check is repeated when the next threshold is reached. If enough data is found, a compression dictionary is created as a synchronous action of the operation that inserted the data. Usually automatic dictionary creation will be triggered by an SQL INSERT statement, but ADC also considers LOAD and IMPORT operations as well as the redistribution of data across partitions in a DPF environment. Again, all data inserted into the table after the dictionary creation will be compressed. ADC therefore leaves only a relatively small amount of data uncompressed and avoids offline table REORGs. To track details about the dictionary creation and about the compression of the table in general, DB2 9.5 provides the new UDF ADMIN_GET_TAB_COMPRESS_INFO.
As I was curious about this new feature and about the time it takes to automatically create the compression dictionary (as mentioned, this happens as a synchronous action!), I created a small test scenario and filled a typical table with more or less meaningful, compressible data. Before running the INSERTs I set the table’s compression flag to YES, and I tracked the response time of each INSERT statement. In my scenario, the automatic dictionary creation happened after the table reached 2 MB in size (as reported via the UDF ADMIN_GET_TAB_INFO), and the dictionary creation itself took about 3.2 seconds (while a normal INSERT took about 2 ms). I repeated the test several times by dropping and recreating the table; the dictionary creation always happened at the same point and took nearly the same time. An interruption of 3 seconds is certainly something the application (and the end user) can notice. On the other hand, it happens just once for every table with the compression flag set to YES.
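A minimal version of such a test setup could look like the following sketch (the table layout and all names are assumptions of mine, not the exact tables I used; the size columns are those I would expect ADMIN_GET_TAB_INFO to report in DB2 9.5):

```sql
-- table created with the compression flag already set to YES
CREATE TABLE myschema.cmptest (
  id     INTEGER      NOT NULL,
  descr  VARCHAR(200),
  status CHAR(10)
) COMPRESS YES;

-- ... insert repetitive, compressible rows in a loop,
-- timing each INSERT on the client side ...

-- check the current physical size and whether a dictionary exists
SELECT DATA_OBJECT_P_SIZE, DICTIONARY_SIZE
  FROM TABLE(SYSPROC.ADMIN_GET_TAB_INFO('MYSCHEMA', 'CMPTEST')) AS T;
```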
Some of the useful data reported by ADMIN_GET_TAB_COMPRESS_INFO after the test finished looks like this:
The value ‘TABLE GROWTH’ in the DICT_BUILDER column tells us that ADC took place as part of an SQL INSERT statement. For other valid values and a description of the other columns of ADMIN_GET_TAB_COMPRESS_INFO, have a look at the documentation.
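To retrieve this information yourself, a query along the following lines should work (schema and table names are placeholders; check the documentation for the full column list of the UDF):

```sql
-- 'REPORT' returns the compression status of the existing dictionary;
-- 'ESTIMATE' would sample the data and estimate potential savings
SELECT SUBSTR(TABNAME, 1, 18) AS TABNAME,
       DICT_BUILDER, DICT_BUILD_TIMESTAMP,
       COMPRESS_DICT_SIZE, PAGES_SAVED_PERCENT
  FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO(
         'MYSCHEMA', 'CMPTEST', 'REPORT')) AS T;
```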
The result is of course not representative; the time it takes to create the dictionary certainly depends on many factors, e.g. the data in the table, the underlying hardware and so on. But it should give a hint of what to expect.
Another interesting question is whether the time it takes to create the compression dictionary depends on the table size. If you migrate from an older release to DB2 9.5, you will likely have tables which already contain a lot of records. Once you set the compression flag of these tables to YES, a compression dictionary will be created automatically during one of the next INSERT operations. So I repeated the test above with a table where row compression was not enabled, filling it in the first run with 100 MB of data, in the second run with 500 MB, and finally with 1 GB. Afterwards I set the compression flag to YES and performed some more INSERTs. The test showed that the creation of the compression dictionary always took nearly the same time (around 3.4 seconds in my scenario). This is because ADC uses only a few records at the beginning of the table to construct the compression dictionary. The time it takes to create the dictionary therefore does not depend on the table size, which is certainly desirable.
Furthermore, note that ADC is not a feature you can switch on or off. In DB2 9.5, all tables with the compression flag set to YES are always eligible for automatic dictionary creation.
Compressing Tables without an Offline Table REORG
Chances are that your systems contain large tables which are not compressed yet, especially after a migration to DB2 9 or DB2 9.5. An offline table REORG – so far the only option to compress existing table content – is probably something you will not or cannot do. First, applications cannot access the table during the REORG, which basically means you have to bring down SAP before starting the operation. Second, a REORG of a large table also needs a lot of space, at least in the temporary tablespace.
Here is another way to compress existing data: the ‘Online Table Move’. The online table move is based on a quite smart algorithm, and I hope it will make it into the standard DB2 product soon. The original intention of this procedure is to move a table to another tablespace in such a way that it remains accessible during the move operation. To accomplish this, the online table move procedure does the following (very roughly):
- A copy of the table which is going to be moved (the original table) is created under a different name in the target tablespace. I will refer to this copy of the original table as the target table.
- Triggers are created on the original table. They record all changes to the original table (DELETE, INSERT and UPDATE operations) in so-called staging tables, which are created before the triggers.
- The records of the original table are copied to the target table. This usually takes the longest time, but the original table remains accessible to applications. Changes are recorded to the staging tables.
- Once all records have been copied to the target table, the changes collected in the staging tables are replayed against the target table.
- The triggers, the original table and the staging tables are dropped and the target table is renamed to the original table.
Note that this is a very high-level and simplified description of what the online table move actually does; the details are much more complicated. Also note that the online table move needs an exclusive lock on the original table for a short time (usually only a few seconds) – this is required in particular for the rename and drop operations at the end. But this should not harm applications accessing the original table.
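The steps above can be sketched in SQL like this. To be clear: this is my own illustration of the trigger-based principle, not the actual SAPTOOLS code, and all table, trigger and tablespace names as well as the column list are hypothetical:

```sql
-- 1. create the target table in the target tablespace
CREATE TABLE myschema.mytab_tgt LIKE myschema.mytab IN target_ts;

-- 2. create a staging table and triggers that record changes
CREATE TABLE myschema.mytab_stg LIKE myschema.mytab IN target_ts;

CREATE TRIGGER myschema.mytab_tr_ins AFTER INSERT ON myschema.mytab
  REFERENCING NEW AS n FOR EACH ROW
  INSERT INTO myschema.mytab_stg VALUES (n.id, n.descr, n.status);
-- (similar triggers record UPDATE and DELETE operations)

-- 3. copy the existing records; the original stays accessible,
--    concurrent changes end up in the staging table
INSERT INTO myschema.mytab_tgt SELECT * FROM myschema.mytab;

-- 4. replay the changes collected in the staging table ...

-- 5. finally, under a short exclusive lock, swap the tables
DROP TABLE myschema.mytab;
RENAME TABLE myschema.mytab_tgt TO mytab;
```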
The complete process is based on the stored procedure SAPTOOLS.ONLINE_TABLE_MOVE. To make it usable within an SAP system, the DB6CONV report (which has been used for many years for an ‘offline table move’) has been enhanced and now also serves as a GUI for ONLINE_TABLE_MOVE.
So far I have described a scenario where a table is moved from one tablespace to another. The point is that you can also use the online table move to ‘move’ a table within a single tablespace – assuming there is enough free space. In the DB6CONV GUI, just enter the tablespace where the table is currently located as the target tablespace. This way you can compress the contents of an existing table without moving it to another tablespace. The online table move is aware of the table’s compression flag and performs the following additional steps in such an operation:
- If the original table has the compression flag set to YES the target table will be created first with the compression flag set to NO.
- A representative sample of data of the original table will be loaded into the target table.
- The compression flag of the target table will be set to YES and a compression dictionary is built.
- The entries in the target table will be deleted. The compression dictionary will remain.
- Now all records from the original table are copied to the target table. As a result, all entries in the target table are compressed, and records inserted in the future will be compressed as well.
The advantage of this procedure is certainly that it requires no downtime and that the table remains accessible (nearly) all the time. As a tradeoff, the procedure takes longer than an offline REORG, and you need enough free space (at least the size of the original table) in the tablespace.
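The compression-specific steps could be sketched as follows. Again, this is only my illustration of the principle – the actual procedure is far more elaborate, and the names, the sampling rate and the use of INSPECT for the dictionary build are assumptions on my part:

```sql
-- target table starts with compression switched off
CREATE TABLE myschema.mytab_tgt LIKE myschema.mytab IN my_ts;

-- load a representative sample of the original data
INSERT INTO myschema.mytab_tgt
  SELECT * FROM myschema.mytab TABLESAMPLE BERNOULLI (10);

-- switch compression on and build a dictionary from the sample
ALTER TABLE myschema.mytab_tgt COMPRESS YES;
INSPECT ROWCOMPESTIMATE TABLE NAME mytab_tgt SCHEMA myschema
  RESULTS KEEP mytab_tgt.insp;

-- empty the table again; the dictionary remains in the table object
DELETE FROM myschema.mytab_tgt;

-- copy everything; all rows are now stored compressed
INSERT INTO myschema.mytab_tgt SELECT * FROM myschema.mytab;
```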
You can use the online table move in a similar way to enable larger record identifiers (LRIDs).
More SAP Support for DB2 Row Compression
In the meantime, several SAP tools and programs have been enhanced to make the activation of row compression easier.
- The R3load utility offers new options for loading data so that the table will be compressed afterwards. See SAP notes 905614 and 1058437 for details.
- Newer SAP installations (since SAP NetWeaver 2004 SR3) offer an option so that all tables (with the exception of some volatile tables) will be compressed during data load.
- SAP note 980067 contains a transport for an ABAP-based compression tool. It is intended for older SAP releases and assists in identifying suitable candidates for row compression and in performing the compression. The stored procedure INSPECT_TABLE_ROWCOMPESTIMATE and some SQL scripts which can be used on the DB2 command line to activate compression are also available via this SAP note.
- SAP transaction DBACOCKPIT has been enhanced as of SAP NetWeaver SP12 and now also offers support for finding suitable tables and activating row compression.