What’s the benefit of using transparent tables over cluster tables on Db2 for Linux, UNIX, and Windows?
The conversation starter for me on declustering was an ABAP statement (see below) in a reported incident where the customer observed bad system performance. Later I found very similar examples in SAP standard code of ERP 8 systems.
SELECT GJAHR BELNR, BUZEI DMBTR SHKZG INTO TABLE lt_bseg FROM BSEG FOR ALL ENTRIES IN lt_fae_bseg WHERE BUKRS = ls_bukrs AND BELNR = lt_fae_bseg-belnr AND GJAHR = lt_fae_bseg-gjahr AND ANLN1 = ls_anln1 AND ANLN2 = ls_anln2 AND HKONT = ls_hcont.
The first thing I noticed in the customer system was that the table BSEG was a cluster table. So I would like to recap what cluster tables are and what their strengths and weaknesses are.
Cluster tables are a special table type in the SAP data dictionary. The data for these tables is not stored in the tables directly, but in the corresponding table cluster. In these “table cluster” tables, the data is stored in a compressed format. Compression and decompression take place in the SAP application server.
Cluster tables are often large SAP application tables like BSEG, BSES, BSET, CDPOS. In transaction SE11 on the tab page Delivery and Maintenance, you can look up to which table cluster a cluster table is assigned. For example, data for cluster table BSEG is stored in table cluster RFBLG, and data for cluster table CDPOS is stored in table cluster CDCLS.
The cluster tables may be defined with a large number of columns. For example, table BSEG contains more than 300 columns in an ERP 8 system. The following screenshot shows only the first 10 columns:
The corresponding table cluster RFBLG has a much smaller number of columns. The following screenshot shows all the columns of table RFBLG:
You can see that both tables have similar key columns. Instead of column BUZEI in table BSEG, there‘s a column PAGENO in table RFBLG. The compressed data for the non-key columns of table BSEG is stored into column VARDATA in table RFBLG.
The idea of a cluster table is that the application often selects all functionally dependent data from a cluster table in one sweep. For example, the SAP kernel may translate an OPEN SQL SELECT * FROM BSEG statement where the key columns BUKRS, BELNR, and GJAHR are specified into an SQL statement on the table cluster RFBLG that selects all compressed data from table RFBLG. The typical form of these SQL statements on Db2 for Linux, UNIX, and Windows (DB6) is:
SELECT "MANDT", "BUKRS", "BELNR", "GJAHR", "PAGENO", "TIMESTMP", "PAGELG", "VARDATA" FROM "RFBLG" WHERE "MANDT" = ? AND "BUKRS" = ? AND "BELNR" = ? AND "GJAHR" = ? ORDER BY "MANDT", "BUKRS", "BELNR", "GJAHR", "PAGENO" WITH UR
Applications that follow this programming paradigm usually show good performance. The advantages of cluster tables in this case are:
- The SAP application server compresses the table data, and therefore the table cluster is much smaller compared to a transparent BSEG table that doesn’t use compression on database level. The compression rate using SAP compression in cluster tables can be very large, for example, factor 10 or more.
This was very important in the days when databases didn’t support data compression. And even today, the declustering of SAP tables on database platforms that don’t support compression can be a bad idea.
- If the application always selects all functionally dependent data in one sweep, all data can be transferred between the application server and database server in compressed format, thereby reducing the amount of transferred data.
Disadvantages of Cluster Tables
Apart from the advantages above, cluster tables have severe disadvantages when ABAP applications require more flexibility.
- Cluster table data can only be read using the SAP application server that decompresses the table cluster data. Reads with other SQL tools are not possible.
- Only transparent tables can be used in JOINS, Views, and CDS objects.
- Only on transparent tables, other predicates in the WHERE clause can be used for filtering on the database server.
- Only on transparent tables, additional indexes on non-key columns can be defined if needed.
- Even if not all columns are selected from a cluster table, all data from the table cluster needs to be transferred to the application server.
- It’s not possible to aggregate columns on a cluster table on database level.
Coming back to the customer code example above, the FOR ALL ENTRIES statement is a typical example where a cluster table cannot provide good performance:
- The SAP kernel is able to translate the FOR ALL ENTRIES statement on cluster table BSEG into a series of SELECT statements on RFBLG. The number of SELECT statements on RFBLG is determined by the number of rows in the FOR ALL ENTRIES table. Executing this series of SELECT statements requires a lot of network roundtrips between application server and database server.
- The FOR ALL ENTRIES statement requests only 5 columns from table BSEG. However, since BSEG is a cluster table, all cluster data needs to be transferred to the application server. This causes much more network traffic than needed.
- The WHERE clause of the FOR ALL ENTRIES statement contains columns that are not contained in the primary key of table BSEG. These filtering predicates cannot be applied to table RFBLG. Therefore, many records from table RFBLG need to be read from the database, but don’t contribute to the result set in the end. This would not be necessary on a transparent table BSEG.
This FOR ALL ENTRIES statement on cluster table BSEG is much slower compared to a transparent table BSEG. In tests with a FOR ALL ENTRIES table with only 1000 rows, the performance on a transparent table BSEG is already faster by a factor of 100 compared to a cluster table BSEG. For the test, we didn’t even create an additional secondary index on the transparent table BSEG. On a transparent table BSEG, the ABAP statement can be translated into SQL much more efficiently. For example using the FDA transformation, only one SQL statement is necessary. All predicates in the WHERE clause of the ABAP statement can be evaluated on database level. No unnecessary data needs to be sent from the database to the application server.
Declustering of Cluster Tables
Starting with SAP BASIS 7.40, SAP applications have been more and more optimized for SAP HANA. To get a performance benefit of column-oriented tables in SAP HANA, it’s very important to select only the required columns instead of using a SELECT *. It’s also important to push down all aggregations to the database. This is only possible on transparent tables. On SAP HANA, the missing compression in table clusters is no problem since SAP HANA compresses data well in transparent tables. Therefore, no cluster tables exist after a migration to SAP HANA. They are automatically declustered during the migration process. Several SAP applications have been optimized to use the advantages of transparent tables. Some of them implicitly benefit from transparent tables with the same application code and some of them may use optimized ABAP code if they find a transparent table instead of a cluster table.
SAP systems on other databases like Db2 for Linux, UNIX, and Windows can also take advantage of declustering as of SAP Netweaver 7.4 SP03 and higher. The process and prerequisites are described in detail in SAP Note 2227432 . If you meet the prerequisites and perform a migration of your SAP system to Db2 for LUW, or if you use a homogeneous system copy of a Db2 for LUW system using R3load, by default all cluster tables are declustered similar to SAP HANA systems. It’s also possible to decluster tables in an existing Db2 for LUW system that still uses cluster tables.
The SAP cluster table technology has been very important in times when databases did not support compression of table data. The database size will not grow during table declustering if the database effectively supports data compression. Db2 for Linux, UNIX, and Windows is a good example where declustering works well, and Db2 adaptive compression can compensate for the lost SAP compression in the table clusters. Tests showed that after declustering, the overall database size doesn’t change significantly.
All in all, I’m a strong advocate of declustering tables in SAP systems running on Db2 for LUW with SAP BASIS release 7.4 SP03 and higher. If you’re still using cluster tables in such a system, you should make the effort of declustering. The customer example showed that SAP application developers are sometimes not aware of the cluster table performance restrictions and they might use ABAP statements that cannot provide good performance on cluster tables.
Have you declustered tables in a Db2 for LUW system? Please feel free to share your experience in the comments section.