This blog is about a functionality that I consider as one of the crowne jewels of BW-on-HANA. The approach has evolved over many years; early discussions started around the time when the BW accelerator (BWA) got initiated (around 2003, project Euclid) and were ignited by the fact that BWA (and its sequel HANA) provided a layer for processing multiple and sequenced calculations close to the data layer. This has not been possible before as we did not have control over the data processing layer of the standard RDBMS underneath the classic BW. The latter is restrained by SQL as the standard API. This blog – as a side effect – will show in what way SQL limits analytic processing, especially fast analytic processing. Also, it will become apparent that BW's OLAP has converted into a kind of sophisticated compiler for HANA's calculation engine. That combination will be hard to beat when you go beyond demo, "hello world style", single table group-by and other simple analytic examples.
Let's look at an example which looks as if it was of "hello world style" but which quickly reveals some challenges. In figure 1, a standard OLAP query result is displayed.
Figure 1: Example of a result of an OLAP query
Now when you carefully look at the example of figure 1 then you see some of the challenges:
Figure 2: Challenges in the example
In order to calculate the result of figure 1, classic BW (or SQL-based OLAP tools in general) would issue a SQL statement that retrieves a rowset (!) similar to the one shown in figure 3. That rowset constitutes the base set of data from which the result of figure 1 can be calculated. It is not the final result yet but from that data it is possible to calculate the final result as indicated in figure 1. Don't get fooled by the small amount of data shown in figure 3! As you can see, it is necessary to get the details on the customers in order to calculate the distinct customers per group-by level. In real world scenarios – just imagine retailers, mobile phone or utility companies – the customer dimension can be huge, i.e. holding millions of entries. Consequently and caused by real-world combinatorics, the result of the SQL query in figure 3 is likely to be huge in such cases. That "sub-result" needs to be further processed, traditionally in an application server, e.g. BW's ABAP server or the WebIntelligence server. This implies that huge amounts of data have to be transported from the DB server to such an application server or a client tool.
Figure 3: Rowset retrieved by a SQL query to calculate result of figure 1
By the way: BWA 7.0 accelerated exactly this part of OLAP query processing, i.e. the basic data query. Subsequent processing on top has still been executed in the application server. This is not a big issue as long as the OLAP query is dominated by the SQL processing. However, it comes short – as in this example – when the result of the basic SQL query gets huge and requires significant data transport from the DB to the application server and then significant data traversals to calculate the final result.
Now based on the result shown in figure 3 there is a natural sequence of how to calculate the various formulas (behind the calculated key figures) and the various group aggregations (i.e. the subtotals and totals). Many of those subsequent calculations can be considered as SQL queries on top of figure 3's result. Figures 4a and 4b show the resulting dependency graph: LQ is the label for the query of figure 3; L1, L2, ..., L6 are "queries" or calculations on top. BW's OLAP compiler basically derives that graph and sends it down to HANA's optimizer (using a proprietary interface), HANA optimizes and processes that graph and sends back the result. Please beware that the result is not a rowset, at least not in the normal sense. It is heterogeneous sets of rows that are returned and that need to go into the appropriate result line in figure 1. In short: the compiler creates a graph to be sent to HANA and then there is a postprocessing step that receives the result and converts it to the desired result set of the OLAP query (i.e. a cellset as in fig. 1).
Figure 4a: Graph derived for processing the final result (as in fig. 1) from the data in fig. 3
Figure 4b: Structure of the OLAP Calculation Graph
I think there is a few fundamental things that become apparent even by looking at the simple example discussed in this blog:
Let me emphasize that I'm the mere author of this blog. The real credits go to the bright minds behind this approach who sit in BW's OLAP and data manager teams.
This blog is cross-published here. You can follow me on Twitter via @tfxz.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
17 | |
14 | |
12 | |
10 | |
9 | |
8 | |
7 | |
7 | |
6 | |
6 |