Part 2: Competitors, analysts and customers frequently confront SAP with a statement that NW BI (a.k.a. BW) could provide better performance if it simply employed some of the performance-enhancing features that have been added to the mainstream RDBMS software over the last few release cycles. They often point to materialized views (or summary tables), new index types (e.g. bitmaps or domain vectors), aggregation operators (like grouping sets, rollup), loaders, etc. In this 2nd part of a serie of 3 weblogs I’d like to briefly discuss the frequently raised questions whether materialized view (I’ll use this popular term; this does not mean that what I say refers to Oracle only; it also applies to DB2’s automatic summary tables or SQL-server’s view indexes) would not be a good alternative or substitute to NW BI aggregates.
Well, I can tell you that we have evaluated materialized views with almost every new RDBMS release in order to check out the new promises with respect to improved materialized view maintenance or query rewrite technology. Unfortunately, experiments have shown mostly inferior, sometimes similar performance in comparison to NW BI’s aggregates. The fundamental reason behind this result is that aggregates are maintained and queried in a well known and organized context while materialized views have to be so generic that they can handle all theoretically possible situation. This in turn imposes overhead which can be avoided in the case of aggregates. I will try to give you a few examples.
Maintenance: Firstly, NW BI aggregates only materialize key figures values that are aggregated using functions SUM, MIN and MAX. All of them are associative which allows for a nice and simple delta management of aggregates. In other words: facts that have been recently loaded can be easily incorporated into the already materialized data set as maxnew = max(maxold, new value) or sumnew = sumold + new value applies. Just imagine that materialized views need to cater for averages, standard deviations, counts of distinct values and the like!
Secondly, NW BI provides a very sophisticated infrastructure to handle or identify deltas. In the context of infocubes (and their associated aggregates) deltas are handled through data requests (see part 1). They translate into the amenity that there are only INSERT operations, i.e. no UPDATEs. This in turn makes life in the delta management of aggregates so much easier: essentially it consists of a bulk INSERT of a data set provided by a SELECT statement. That’s all in most situations. This stands in contrast to materialized views whose underlying tables face all kinds of changes to their data, some of them triggering a complete recomputation of the materialized view.
Thirdly, typical mechanisms to maintain materialized views are based on change logs. Every table that is part of a materialized view suffers from a slower INSERT performance. We have observed deterioration factors of 2 and beyond, depending on the number of materialized views.
Querying: The part of a NW BI query that reaches the DB layer comprises basically a set of characteristics, a set of key figures and a set of filters. Using this kind of description it is fairly straightforward to pick the best existing aggregate to support the query. In contrast, query optimizers need to resolve and carefully analyze SQL statements to find candidates of materialized views that might be usable. In a second step, they have to decide which one is best and if that is better than simply using the basic tables. The latter decision depends on the usual bunch of parameters like existing indexes or statistics. Overall it is again a fairly complex context that is susceptible to fail every now and then. And even if the optimizer makes a great job: all our query performance measurements have shown at least the same performance with BW aggregates as compared to materialized views.
Looking back one can certainly argue that our aggregate technology was introduced when there were no materialized views available. If they had been there we might not have created our own stuff. But now that it is there, settled, stable and still at least as good – if not better – as the alternative it is not worth to change that. But maybe our RDBMS colleagues come up with something very savvy at one stage …