Part 1: Competitors, analysts and customers frequently confront SAP with a statement that NW BI (a.k.a. BW) could provide better performance if it simply employed some of the performance-enhancing features that have been added to the mainstream RDBMS software over the last few release cycles. They often point to materialized views (or summary tables), new index types (e.g. bitmaps or domain vectors), aggregation operators (like grouping sets, rollup), loaders, etc. In a series of 3 weblogs I’d like to briefly tackle the argument cited above and prove it wrong. The weblogs will discuss the following 3 topics:
- What are some of the DB-specific performance-enhancing features used in NW BI? (part 1)
- Materialized views vs. NW BI aggregates (part 2)
- Grouping sets and loaders (part 3)
So let’s dive into the techy part and look at some of the advanced DB features on which NW BI relies. Probably the most prominent one is the usage of Oracle’s bitmap indexing technology since BW 1.2 (based on Oracle 8.0). A typical infocube query processing plan employs 2 to 3 bitmap indexes that are merged in order to propagate the selectivity given by several filters in various dimension branches. Infocube queries that do not use bitmap indexes are likely to show poor performance, and it has to be analysed why Oracle’s optimizer has not chosen to use them. For real-time infocubes using standard B-tree indexes on the F fact table (to avoid deadlocks due to the non-existing row-level locks for bitmap indexes), NW BI relies on Oracle’s B-tree-to-bitmap conversion, i.e. the conversion of standard B-tree indexes to bitmaps at query runtime.
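To make the bitmap merging idea concrete, here is a minimal Python sketch. It is purely illustrative (not Oracle’s actual implementation, and the table data and function names are made up): each distinct dimension value maps to a bitmap with one bit per fact-table row, and filters on several dimensions are combined by ANDing their bitmaps.

```python
def build_bitmap_index(column):
    """Map each distinct column value to an integer used as a row bitmap."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(*bitmaps):
    """AND the bitmaps of all filters and list the qualifying row ids."""
    combined = bitmaps[0]
    for b in bitmaps[1:]:
        combined &= b
    return [i for i in range(combined.bit_length()) if combined >> i & 1]

# Two hypothetical dimension columns of a tiny fact table.
region = ["EMEA", "APJ", "EMEA", "AMER", "EMEA"]
product = ["X", "X", "Y", "X", "X"]

region_idx = build_bitmap_index(region)
product_idx = build_bitmap_index(product)

# WHERE region = 'EMEA' AND product = 'X'  ->  rows 0 and 4
print(matching_rows(region_idx["EMEA"], product_idx["X"]))  # → [0, 4]
```

The AND of two bitmaps is a cheap bitwise operation, which is why combining the selectivity of several dimension filters this way scales well even on large fact tables.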
Another good example is partitioning. NW BI uses partitioning for several purposes:
- partition pruning: E fact tables can be range partitioned along a time characteristic. The underlying assumption is that almost every analytic query has a filter on time. Typically, it does not make sense to analyse data without defining (i.e. restricting) the time scope. In query processing terms this means that a query optimizer can focus on the partitions that hold the data and indexes relevant for the query and discard (“prune”) all the others. By the way: there are plans to extend the same concept to data store objects.
- parallelization: Whenever table scans have to be performed on a partitioned table – independent of the type of partitioning, i.e. hash, range, list, round-robin etc. – the optimizer can use the partitioning setup to parallelize the table scans. This is, for example, useful when building up new aggregates or rolling up recently loaded requests into aggregates. Infocube compression can also benefit as well as query processing in case that partition pruning is not an option, e.g. due to a hash partitioned fact table.
- bulk deletion: If partitions are organized so that they align with a ‘unit of deletion’, i.e. data that is likely to be deleted together, then an expensive DELETE command can be translated into dropping one or more partitions. Many hand-crafted data warehouses are built on that notion, e.g. the frequently cited data warehouse at Walmart.
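The partition pruning case can be sketched as follows in Python. This is a toy model, not how an RDBMS does it internally, and all names and figures are invented: an E fact table is range-partitioned by calendar month, and a query with a time filter only ever touches the partitions whose range overlaps the filter.

```python
# Hypothetical E fact table, range partitioned by month; each row is
# (calendar day, key figure).
partitions = {
    "2006-01": [("2006-01-15", 100), ("2006-01-20", 50)],
    "2006-02": [("2006-02-03", 70)],
    "2006-03": [("2006-03-11", 30), ("2006-03-28", 90)],
}

def query(from_month, to_month):
    """Sum the key figure, scanning only partitions inside the time scope."""
    relevant = [p for p in partitions if from_month <= p <= to_month]
    total = sum(amount
                for p in relevant
                for _, amount in partitions[p])
    return relevant, total

scanned, total = query("2006-02", "2006-03")
print(scanned, total)  # → ['2006-02', '2006-03'] 190
```

The January partition is never read: exactly the effect pruning has on I/O when nearly every analytic query restricts the time scope.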
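The parallelization point can also be sketched briefly. Again this is only an illustration with invented data, not the database engine’s scheduler: since partitions are independent, a full scan (e.g. while building a new aggregate) can run one worker per partition and combine the partial results at the end.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitioned fact table; values stand in for key figures.
partitions = {
    "P1": [4, 8, 15],
    "P2": [16, 23],
    "P3": [42],
}

def scan(name):
    """Stand-in for a per-partition table scan that aggregates its rows."""
    return sum(partitions[name])

# One worker per partition; the per-partition sums are combined afterwards.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    partial_sums = list(pool.map(scan, partitions))

print(sum(partial_sums))  # → 108
```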
In NW BI, a typical ‘unit of deletion’ is the data request. It constitutes data that has been loaded together, that originated in the same system or business process and that might be corrupted due to a failure in the communication to that source system, wrong conversions, faulty tracking or whatever else can go wrong. In such a situation, the data is removed from the NW BI system and reloaded. Removal should be fast even if the data volume is huge, which makes the fast deletion of a data request an important operation. This is supported through the request-wise partitioning of the F fact table of an infocube.
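As a rough illustration (invented request ids and data, and a dict standing in for the partitioned table), request-wise partitioning turns the deletion of a faulty request from a row-by-row DELETE into a cheap partition drop:

```python
# Hypothetical F fact table, partitioned by data request id; each row is
# (calendar day, key figure).
f_fact_table = {
    "REQU_001": [("2006-01-15", 100), ("2006-01-16", 50)],
    "REQU_002": [("2006-01-17", 70)],   # assume this load was faulty
}

def delete_request(request_id):
    """Drop the whole request partition instead of deleting row by row."""
    f_fact_table.pop(request_id, None)

delete_request("REQU_002")
print(sorted(f_fact_table))  # → ['REQU_001']
```

The cost of the drop does not depend on how many rows the request contains, which is the whole point when reloading large faulty requests.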
Similarly, bundling data along the time dimension in order to move it into an archive requires fast removal of huge amounts of data. This is a second reason (besides partition pruning) to partition data in the E fact table of an infocube along a time characteristic.
There is a bunch of other non-standard DB features used in NW BI. It would be too much to describe them all in the same detail as the ones listed above. Others worth mentioning include DB2’s multi-dimensional clustering (MDC), which is in many ways similar to range partitioning, Oracle’s MERGE instruction for improved performance within infocube compression, bulk INSERTs during data loads, common table expressions to reduce catalog congestion, avoiding unnecessary logging, and many other things.
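The MERGE-based infocube compression can be sketched in a few lines of Python. This mimics only the upsert semantics that MERGE provides in SQL, nothing more, and all keys and figures are invented: rows from the F fact table are folded into the E fact table, summing key figures for identical dimension keys.

```python
# Hypothetical compressed E fact table: dimension key -> key figure.
e_fact = {("EMEA", "X"): 100}

# New F fact table rows to compress (request id already dropped).
f_fact = [
    (("EMEA", "X"), 40),
    (("APJ", "Y"), 25),
]

for dim_key, amount in f_fact:
    if dim_key in e_fact:               # MERGE ... WHEN MATCHED: update
        e_fact[dim_key] += amount
    else:                               # WHEN NOT MATCHED: insert
        e_fact[dim_key] = amount

print(e_fact)  # → {('EMEA', 'X'): 140, ('APJ', 'Y'): 25}
```

Doing this as a single MERGE statement lets the database handle the matched/not-matched branching in one pass instead of issuing separate UPDATE and INSERT statements per row.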
Hopefully this weblog has provided some insight. The next two will discuss some features that look attractive initially but that have proved not to be viable … at least up to now.