Part 3: In this final part of the weblog serie, I will discuss the possible use of grouping sets and loaders in the context of NW BI. As in the case of materialized views (see part 2), both constitute attractive features that could be beneficial for NW BI in theory. However, performance benefits are either not existent or so minimal that it is currently not worthwhile to seriously consider them. But let’s look at some more details.
Grouping sets are attractive in the sense that multiple GROUP BYs can be done within one processing step. Data has to be accessed only once and the result of more detailed GROUP BY clauses can be used as input for the less detailed ones thereby saving considerable efforts, e.g. expensive sort operations. This is attractive when building up aggregates: essentially, they constitute materialized GROUP BYs. The aggregate hierarchy in the aggregate maintenance screen already shows how one aggregate can be built up from another aggregate and this is what happens in reality. However, it is done without using grouping sets but by using the order imposed by the aggregates’ hierarchy. The difference is that results are materialized (as an aggregate) before the next aggregate is processed. In the case of grouping sets all aggregates would be written in the end. So there is no real saving here.
Our experiments have shown exactly that effect: there have been no performance benefits. In some situations, grouping sets performed even worse as more data is handled in the processing context in comparison our approach (where results are materialized in between).
Still, grouping sets are interesting features. The successor release to NW 2004s will probably use grouping sets in a different context that I cannot disclose for now.
Every major RDBMS product provides a tool that allows bulk loads of data that bypass certain transactional properties (i.e. ACID). Performance benefits originate in bulk operations, direct access to physical storage and avoiding transactional overhead. So far so good …
The typical situation even for straightforward upload is that data is extracted, then transformed (i.e. normalized, e.g. via the ALPHA exit routine that asserts consistent alphanumerical keys in the master data) and finally loaded. Commonly this is summarized by the 3-letter abbreviation ETL. Loaders can be applied in the L-phase of ETL.
In practice however, the E- and the T-phases make the major share of the ETL process, even for simple and – from a consistency point-of-view – compulsory normalizations. We have seen approx. 20% shares of the L-phase in straight uploads. In other words: loaders could improve a marginal part in the upload. Naturally, one can argue that loaders can be combined with user-defined functions so normalizations like the ALPHA could be integrated. Well, experiments have shown that the loader’s performance then comes approx. down to same level as it is available today without the loaders. Another frequently cited argument is that many times the data is already normalized implying to skip that phase. This is true but then the responsibility for consistency is shifted to the person who makes that statement; the system cannot assert it anymore. One could now easily jump into discussing legal requirements, SOX, accounting etc. but this would go beyond the purpose of this blog. So in summary, it is fair to say that loaders are interesting but need to be able to incorporate at least simple transformations thereby maintaining a performance benefit over the standard procedure. Once that happens, we should go for it.
Hopefully the 3 weblogs have provided some insight. It is important to state that all the DB features that have been discussed are interesting without doubt. We have analyzed and evaluated them and will continue to do so. But the oversimplifying statement by some competitors, analysts or customers that they are ignored by SAP is clearly wrong.
Many thanks to my colleagues Stefan Dipper, Arndt Effern, Klaus Majenz and Uwe Knorr for their feedback. They have done all the analyses that I have mentioned in the 3 weblogs. Thanks also to Anja Heisrath for the results of her recent thesis on loaders and related features.