The OLAP Compiler in #BWonHANA
This blog is about a functionality that I consider one of the crown jewels of BW-on-HANA. The approach has evolved over many years; early discussions started around the time the BW accelerator (BWA) was initiated (around 2003, project Euclid) and were ignited by the fact that BWA (and its successor HANA) provided a layer for processing multiple, sequenced calculations close to the data layer. This had not been possible before, as we had no control over the data processing layer of the standard RDBMS underneath the classic BW, which is restrained by SQL as the standard API. As a side effect, this blog will show in what way SQL limits analytic processing, especially fast analytic processing. It will also become apparent that BW’s OLAP has evolved into a kind of sophisticated compiler for HANA’s calculation engine. That combination will be hard to beat once you go beyond demo, “hello world style”, single-table group-by and other simple analytic examples.
Let’s look at an example which looks as if it was of “hello world style” but which quickly reveals some challenges. In figure 1, a standard OLAP query result is displayed.
- It shows the quantities of sold items per product and country.
- In addition, the number of distinct customers who bought those products can be seen.
- Finally, the quantity relative to the overall number of sold products in a country is presented as a percentage.
Figure 1: Example of a result of an OLAP query
Some Challenges in the Example
Now when you carefully look at the example of figure 1 then you see some of the challenges:
- The numbers of distinct customers do not sum up. There are 5 distinct customers buying pencils and 3 buying paper, both in Germany (DE), but only 6 – and not 5+3=8 – buying products in DE. There must be 2 customers that have bought both pencils and paper. In processing terms this means that the subtotal (e.g. by country) cannot be calculated from the preceding aggregation level (e.g. by country and product) but needs to be calculated from the lowest granularity (i.e. by country, product, customer).
- The calculated key figure quantity per country refers to the key figure quantity and sets the latter’s values in relation to its subtotals. This means that the quantity key figure and its subtotals have to be calculated prior to calculating the key figure quantity per country. In other words, there is an order of processing imposed by the mathematics.
Figure 2: Challenges in the example
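The first challenge can be sketched in a few lines of Python. This is a hypothetical miniature of figure 1’s DE data (customer IDs c1–c6 are made up); it shows why the country-level distinct count must be computed from the lowest granularity rather than from the per-product counts:

```python
# Rows at the lowest granularity: (country, product, customer).
# c1 and c2 buy both products, so per-product counts overlap.
rows = [("DE", "Pencil", c) for c in ("c1", "c2", "c3", "c4", "c5")] \
     + [("DE", "Paper", c) for c in ("c1", "c2", "c6")]

def distinct_customers(rows, keyfunc):
    # Collect the customer *sets* per group, then count them.
    groups = {}
    for country, product, customer in rows:
        groups.setdefault(keyfunc(country, product), set()).add(customer)
    return {k: len(v) for k, v in groups.items()}

by_product = distinct_customers(rows, lambda c, p: (c, p))
by_country = distinct_customers(rows, lambda c, p: c)
print(by_product[("DE", "Pencil")], by_product[("DE", "Paper")])  # 5 3
print(by_country["DE"])  # 6, not 5 + 3 = 8
```

The country subtotal needs the customer sets, not the per-product counts – exactly why the aggregation cannot be built from the preceding level.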
What you can do with SQL
In order to calculate the result of figure 1, classic BW (or SQL-based OLAP tools in general) would issue a SQL statement that retrieves a rowset (!) similar to the one shown in figure 3. That rowset constitutes the base set of data from which the result of figure 1 can be calculated. It is not the final result yet, but from that data it is possible to calculate the final result as indicated in figure 1. Don’t be fooled by the small amount of data shown in figure 3! As you can see, it is necessary to get the details on the customers in order to calculate the distinct customers per group-by level. In real-world scenarios – just imagine retailers, mobile phone or utility companies – the customer dimension can be huge, i.e. holding millions of entries. Consequently, and owing to real-world combinatorics, the result of the SQL query in figure 3 is likely to be huge in such cases. That “sub-result” needs to be further processed, traditionally in an application server, e.g. BW’s ABAP server or the WebIntelligence server. This implies that huge amounts of data have to be transported from the DB server to such an application server or a client tool.
Figure 3: Rowset retrieved by a SQL query to calculate result of figure 1
By the way: BWA 7.0 accelerated exactly this part of OLAP query processing, i.e. the basic data query. Subsequent processing on top was still executed in the application server. This is not a big issue as long as the OLAP query is dominated by the SQL processing. However, it falls short – as in this example – when the result of the basic SQL query gets huge, requiring significant data transport from the DB to the application server and then significant data traversals to calculate the final result.
The “OLAP Calculation Graph”
Now, based on the result shown in figure 3, there is a natural sequence of how to calculate the various formulas (behind the calculated key figures) and the various group aggregations (i.e. the subtotals and totals). Many of those subsequent calculations can be considered SQL queries on top of figure 3’s result. Figures 4a and 4b show the resulting dependency graph: LQ is the label for the query of figure 3; L1, L2, …, L6 are “queries” or calculations on top. BW’s OLAP compiler basically derives that graph and sends it down to HANA’s optimizer (using a proprietary interface); HANA optimizes and processes that graph and sends back the result. Please be aware that the result is not a rowset, at least not in the normal sense. Heterogeneous sets of rows are returned that need to go into the appropriate result lines of figure 1. In short: the compiler creates a graph to be sent to HANA, and then there is a postprocessing step that receives the result and converts it to the desired result set of the OLAP query (i.e. a cellset as in fig. 1).
Figure 4a: Graph derived for processing the final result (as in fig. 1) from the data in fig. 3
Figure 4b: Structure of the OLAP Calculation Graph
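The essence of such a dependency graph can be sketched with Python’s standard-library topological sorter. The node names below mirror the blog’s labels (LQ, L1, …) but the concrete dependencies are illustrative assumptions, not the actual graph of figure 4a:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Each node maps to the set of nodes it depends on. The "quantity per
# country" formula (L3 here) needs the subtotals (L1) to exist first.
graph = {
    "L1_country_subtotals":   {"LQ_base_query"},
    "L2_distinct_customers":  {"LQ_base_query"},
    "L3_quantity_per_country": {"LQ_base_query", "L1_country_subtotals"},
}

# A valid processing order: every step runs after all of its inputs.
order = list(TopologicalSorter(graph).static_order())
print(order)  # LQ_base_query first; L3_quantity_per_country after L1
```

An optimizer that receives the whole graph (instead of one SQL statement per node) is free to pipeline steps and avoid materializing intermediate results – the core advantage described above.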
I think there are a few fundamental things that become apparent even from the simple example discussed in this blog:
- Even though individual processing steps can be expressed via SQL, it is in the end a well-defined sequence of processing steps that yields the result.
- Accelerating the individual steps helps but falls short. For example, an OLAP client tool can derive an OLAP graph like the one in fig. 4a. One alternative is to issue a SQL or SQL-like query for each node in that graph; to avoid the data transport, it can materialize intermediate results, but this constitutes an overhead. A second alternative (the one frequently met in practice) is to issue only the basic query – labeled “LQ” in fig. 4 – and transport a potentially huge result set over the network to the client, which then calculates the rest. This is the traditional approach, which obviously also suffers from the transportation overhead.
- BW-on-HANA resolves those issues by:
- providing a powerful option to define an OLAP query – i.e. the BEx query – which is the precondition for all of this in the first place,
- sending down the entire “processing graph” to HANA, allowing HANA to optimize and pipeline the individual processing steps, and
- having the capability to assemble the partial results of the processing steps into the final (OLAP) result.
Let me emphasize that I’m the mere author of this blog. The real credits go to the bright minds behind this approach who sit in BW’s OLAP and data manager teams.
This blog is cross-published here. You can follow me on Twitter via @tfxz.
Good stuff. Thanks for putting this together - Rama
Fantastic explanation of how OLAP processing works. Excellent work!
what's wrong with "single table group-by"? also, not sure why i can't see the figures you have been referring to (in IE 8). otherwise, i can't imagine reporting without some kind of OLAP/GROUPBY processing. the faster the better, esp when a spreadsheet tool can be the client.
ad pictures: they work fine for me on IE9, Firefox, Safari, at SAP and at home. Have a try here to check whether you can see them.
Ad "single table group-by": here, I refer to the demos that show simple SQL statements doing a GROUP BY on one huge table (i.e. no joins). HANA is very good at this and from a pure technical perspective such a query still shows its strengths. However, this is more of an artificial example; in real OLAP, this is not a frequent case. The emphasis is on the single table rather than questioning the use of GROUP BY. Of course, OLAP naturally aggregates data and thus uses GROUP BY. But the naked truth is that the GROUP BY cannot always be pushed down. Other calculations need to happen beforehand (e.g. currency conversion).
I hope this helps.
Finally some tangible stuff. Refreshing considering all the marketing around HANA.
I find myself referencing this blog over and over again to showcase one of the hidden advantages of BW on HANA vs. BW on [insert any other RDBMS here].
I do have one (three) question(s): how mature is this in the current releases (BW 7.3 SP8/7.31 SP6 & HANA SP5)? How much more can we expect in HANA SP6? And what about BW 7.4, will it improve even more in this regard?
so this is already in BW 7.30 SP5 (on HANA; or even earlier in BW with BWA). It gradually evolves over the SPs and will receive a further boost with BW 7.4 and HANA SP6. Fundamentally, BW's Analytic Manager works along the OLAP calc graph as outlined above. As HANA's calc engine evolves, more and more of the nodes in that graph can be processed by HANA's calc engine rather than in the application server.
Actually, there is a great and simple demo on this. I'll check if that's available as a video and post the URL.
I'd love that demo. 😉
Correction: I love that demo. 😀
A very interesting article. I have a question. Let's say we model the data in both BW on HANA and native HANA. We then have BO (eg WEBI or AOLAP) calling a Bex Query on BW on HANA, and we have a similar BO report directly on say a calculation view in Native HANA.
1) If BO pulls from A Bex query (BICS connection) - I presume we still benefit from BEx generating the OLAP Graph?
2) For the native HANA scenario - I presume the calculation view would generate a similar graph?
3) Would you expect similar performance from BO on native HANA to the Bex BW on HANA?
So it's not so much comparing BW on HANA vs Native HANA. It more about comparing Bex vs eg BO MDX pulling from BW on HANA.
I'm writing a paper to position BW on HANA and native HANA, I just want to be clear if you are saying BW on HANA for complex queries is better performing than native HANA modelled in Studio? I don't think you are, my understanding is that you are saying Bex is better integrated with HANA than say an MDX connection, but presumably the views in Studio will also give the same integration and benefit in a native Hana scenario?
1) Yes, if you built a BO Universe on a BEx Query then you go BO -> BW OLAP -> HANA which gives you the same benefit. There will naturally be some overhead in transferring from BW -> BO, depending on the data volume.
2) That's hard to say. From what I understand of the BW connector, they probably turn into the same calculation plan operators inside the HANA DB engine (Tom will confirm this I'm sure). I doubt though that the exact same calc plan operator set is generated for BW and for a Calc View - they will probably be different but give the same result. Because BW auto-generates the "graph", I would expect a custom-built and well-designed Calc View to be faster than the equivalent BW function.
3) Similar performance yes, but native HANA should in general perform better because it has fewer layers. In other cases, there may be features that the BW OLAP Engine efficiently creates HANA graphs for that you don't know how to model efficiently in HANA. Or indeed where a calc plan operator exists that BW exploits where no equivalent Calc View functionality exists. In those cases, BW could be faster.
The main benefit of BW on HANA over HANA Enterprise is that you get BW's loading and modeling capabilities, and data extractors, plus integrated master data, hierarchies etc. etc. Depending on your use case, this can add significant value but if you are building an EDW around SAP data or have an existing BW system, BWoH is a good choice.
In general, you can assume that HANA Enterprise will be faster than BW on HANA, and it provides significant functionality around Text, Search, Predictive, Spatial and (soon) Graphing, only some of which is exposed in BW. Moreover, if you are using non-SAP datasets and want real-time streaming data, BW on HANA is often not the best choice.
One additional point - what I read to be the purpose of this blog - is that BWoH provides significant processing ability above what BWA can offer, other RDBMS's can offer currently, and above what the next generation of in-memory databases like DB2 BLU can offer for the BW data warehouse.
Specifically, I believe that the BLU support has been written as an extension to the DB2 OpenSQL connector and therefore it can only accelerate SQL queries. Therefore in the example below, it will materialize the dataset above into the BW OLAP Engine and then traverse that to get the result. This will mean that BW on HANA will be 300-500x faster than BW on DB2 BLU, for those scenarios.
ad 1) Yes, every tool that leverages a BEx query automatically leverages the OLAP graph generation. The benefit of doing that depends on how much logic is put into the BEx query definition. A simple (i.e. "SQL-like", not calc-rich) BEx query will yield little advantage.
ad 2) To a certain extent, you can hard-code that into some kind of view or calc scenario. But this is expert work. Which end user is going to do that? Which end user codes even SQL?
ad 3) This question is too generic: Analysis Office is "BO" and will likely perform better on BW for normal (= productive) queries while more SQL-oriented tools like Lumira might work better on native HANA. In the end, it's the nature of the tool and what kind of (analytic) functionality it exposes.
I had a question of my own, on reflection. I haven't tried to model this, but couldn't you model the example above using the ANSI-SQL-2006 GROUP BY GROUPING SETS?
This allows you to define multiple levels of aggregation, to generate multiple result-sets to efficiently process these sorts of questions.
This would be compatible with all modern RDBMS's.
just saw the question and thought I'd reply although you specifically asked for Thomas.
So, if you don't like what I write, just ignore it 🙂
With grouping sets you can ask for multiple different groupings at the same time. That's correct and many DBMS provide this feature already.
However, it's not possible to reference results from a group on a "higher" level in "lower" groups.
So, for the key figure calculation (L3 - L5) we wouldn't know the corresponding L1 values.
The provided example could be implemented with common table expressions (CTE) in standard SQL - but I'd say this would be rather complex already, while the example in itself is of low complexity for SAP BW reports.
The way the OLAP processor can leverage SAP HANA's calc engine (which had been built for exactly this use case) is much more flexible here. We can reuse the same models, even when more/fewer attributes are requested, and SAP HANA will always optimize on that and only work on the absolutely necessary data.
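The restriction Lars describes can be made concrete with a small sketch: lower-level rows cannot reference higher-level results within one grouping pass, so the country subtotal (higher level) has to be computed first and then joined back in. The per-product numbers below are taken from figure 1; the two-pass structure is illustrative:

```python
# Step 0: product-level quantities as in figure 1 (country, material) -> qty.
by_product = {("DE", "Pencil"): 10, ("DE", "Paper"): 5,
              ("US", "Glue"): 11, ("US", "Pencil"): 7}

# Step 1: the higher aggregation level (country subtotals) comes first.
totals = {}
for (country, _), qty in by_product.items():
    totals[country] = totals.get(country, 0) + qty

# Step 2: the lower-level formula references the higher-level result
# (integer truncation matches the percentages shown in figure 1).
pct = {k: int(100 * q / totals[k[0]]) for k, q in by_product.items()}
print(totals["DE"], pct[("DE", "Pencil")])  # 15 66
```

Grouping sets alone deliver step 0 and step 1 in one statement, but not the cross-level reference in step 2 – that is what window functions (or the OLAP graph) add.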
Hmm... in my opinion the information of blogs like this actually should be in a concepts-chapter of our product documentation. Unfortunately we (SAP) are really good in not putting information about our products where it should be (documentation). Instead it's always somewhere in blogs, wikis or SAP notes... 🙁
Hi John & Lars,
yes, Lars has already made an important point. In addition, there is one significant aspect that we have seen is tricky for customers modeling views in native HANA and then accessing the latter via SQL: the modeled view constitutes a specific drill-down. Typically, this is only the starting point for the analysis, as there is subsequent slicing & dicing. If there are only simple / "nicely behaving" aggregations and formulas involved (like SUM, MIN, MAX), subsequent drill-downs show consistent results. However, if there are formulas with non-commutative or non-associative (mathematical) operators involved, you get all sorts of funny effects. Just remember that there are typically layers of calculations: first currency conversion, then aggregation, then a formula, potentially more aggregation or reuse in another key figure's formula. BW OLAP takes care of that. If you use a client tool that generates and issues SQL or MDX, then you need to rely on that tool to handle such effects properly.
I hope this helps.
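The currency conversion case above is easy to demonstrate with made-up numbers (amounts and rates below are purely illustrative): converting per row and then aggregating differs from aggregating and then converting, so the order of the layers is part of the query's semantics, not an optimizer's choice.

```python
# Two document rows, each with its own booking-date FX rate (made up).
amounts = [(100.0, 1.10), (200.0, 1.30)]  # (amount, row-specific rate)
avg_rate = 1.20                           # a single "average" rate

convert_then_sum = sum(a * r for a, r in amounts)         # convert per row, then aggregate
sum_then_convert = sum(a for a, _ in amounts) * avg_rate  # aggregate, then convert

print(round(convert_then_sum), round(sum_then_convert))   # 370 360 -- not interchangeable
```

Because the two orders disagree, a tool that pushes the GROUP BY below the conversion silently changes the result – which is exactly the kind of effect BW's OLAP sequencing guards against.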
Actually the SQL standard was extended many years ago to support the sort of OLAP scenario you describe. So for example the following query would do what you want, efficiently in the database without exporting large rowsets for post-processing:
SELECT
  t.country, t.material, t.quantity, t.distinct_customers,
  COALESCE(100 * t.base_quantity / SUM(t.base_quantity) OVER (PARTITION BY t.country), 100) AS quantity_per_country
FROM (
  SELECT
    country, material,
    SUM(quantity) AS quantity,
    COUNT(DISTINCT customer_id) AS distinct_customers,
    CASE WHEN GROUPING(material) = 0 THEN SUM(quantity) ELSE NULL END AS base_quantity
  FROM sales -- source table name assumed; it was omitted in the original comment
  GROUP BY ROLLUP (country, material)
) AS t
ORDER BY t.country NULLS LAST, t.material NULLS LAST;
COUNTRY MATERIAL QUANTITY DISTINCT_CUSTOMERS QUANTITY_PER_COUNTRY
------- -------------------- ----------- ------------------ --------------------
DE Paper 5 3 33
DE Pencil 10 5 66
DE - 15 6 100
US Glue 11 5 61
US Pencil 7 3 38
US - 18 7 100
- - 33 11 100
Both DB2 and (I believe) Oracle have long supported these parts of the SQL standard.
SAP could easily have extended HANA's SQL support, and then used standard SQL to improve the performance for customers using DB2, Oracle or HANA. Of course the proprietary route benefits only HANA customers.
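For readers who want to try the idea without a DB2 or Oracle instance: SQLite (via Python's sqlite3) supports window functions but not GROUP BY ROLLUP, so this sketch emulates the two aggregation levels of the query above with a UNION ALL, restricted to the DE rows. Table name, customer IDs and per-row quantities are hypothetical, chosen to reproduce figure 1's DE numbers:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (country TEXT, material TEXT, customer_id TEXT, quantity INT)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    # 5 pencil buyers (2 each = 10) and 3 paper buyers (2+2+1 = 5);
    # c1 and c2 buy both products, so DE has 6 distinct customers.
    [("DE", "Pencil", c, 2) for c in ("c1", "c2", "c3", "c4", "c5")]
    + [("DE", "Paper", "c1", 2), ("DE", "Paper", "c2", 2), ("DE", "Paper", "c6", 1)],
)
rows = con.execute("""
    SELECT country, material, SUM(quantity), COUNT(DISTINCT customer_id)
      FROM sales GROUP BY country, material
    UNION ALL
    SELECT country, NULL, SUM(quantity), COUNT(DISTINCT customer_id)
      FROM sales GROUP BY country
""").fetchall()
print(rows)  # Paper: 3 customers, Pencil: 5, DE subtotal: 6 (not 8)
```

The UNION ALL makes the point of the whole discussion visible in miniature: each grouping level is its own pass over the base data, because the distinct count cannot be derived from the level above.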
Jeremy, you're right that grouping sets have been part of SQL since SQL:1999 (but the window function you used was introduced only with SQL:2003...)
However, the actual adoption of it to a point where it was actually usable on a large installation basis took much longer (as usual). Meanwhile TREX/BWA had been developed for SAP BW and eventually this technology was included in SAP HANA.
So, for a long time, the standard was there, but intangible to any user.
Having said that, if I remember correctly, grouping sets had been considered and tested for use with other supported DBMS for SAP BW, and the effect on performance at that point in time didn't justify the additional implementation effort.
I guess it boils down to that every development does have a history and a long chain of decisions made on the way.
Also: nowadays (SPS 7) you can easily take your standard SQL and run it on SAP HANA as well, including the window function, grouping sets and automatic OLAP query optimization. 🙂
SAP BW ports are and always had been very platform specific (think indexing scheme, think fact table request compression, think parallel query processing... ). SAP HANA is not different here.
thanks for elaborating on this in SQL. I've done a similar exercise and written an equivalent MDX statement.
However, the point is not expressing it in a different language. The point is the way it is processed. Yes, if the (SQL) optimizer in the end comes up with a processing plan that is similar to the OLAP graph (fig. 4a, 4b), then we are fine. However, my/our experience is that SQL is so generic that it is hard for those optimizers to recognise the "intent" behind the query. Probably most readers - incl. myself - have a hard time understanding the SQL statement that you have created. Remember that this is a simple example.
So, yes, in theory, the OLAP compiler could go via an intermediate step and generate SQL or MDX or another suitable statement to then expect an OLAP graph being generated from that. I think that makes the process more complex and error-prone in reality.
Hi Lars and Thomas, thank you for your replies.
Envisage an OLAP tool that maximizes the use of standard SQL and which translates user queries into single, standard OLAP SQL statements whenever the underlying DBMS technology supports it. Such a tool would of course eliminate the need for Thomas or any other user to write SQL directly. Only in two cases, either (1) where the DBMS technology does not support the standard SQL or (2) where the DBMS does not optimize for acceptable performance would the tool need to compensate by breaking the user query into multiple, simpler SQL statements.
This approach not only would have the benefit of working in a similar fashion for all supported DBMS products, but would also encourage all DBMS vendors to support the SQL enhancements for OLAP that came in about a decade ago (assuming those vendors want their products to do well in benchmarks).
Regarding Thomas's point about the DBMS not understanding the intent (which relates to case 2 above), I don't understand why this should necessitate the current proprietary approach. Once it has parsed the SQL to a query graph, the DBMS will anyway perform a similar set of inherently ordered operations as the OLAP compiler described.
look, as I said, there is theory and then there is reality. In reality, I've seen DB optimisers struggle over a number of releases to recognise a star schema (and thus the underlying optimisation plan schemes) inside SQL queries. It has taken many years and many dev cycles for the optimisers to do a good job here. And that has been a fairly simple task in comparison to what you suggest - which clearly is a valid approach!
In contrast, the HANA calc scenarios generated by BW are much lower-level artifacts which are less error-prone from an optimisation perspective. This is my major point. Beyond that, it would also be necessary to consider examples beyond the simple one discussed in this blog.
Great work, thanks for sharing.
This is one of my favorite blogs and one that I often share with customers and consultants to explain one of the key features of BW on HANA.
I have a question. How do we know whether a function in BW on HANA will be accelerated using the OLAP Compiler? For instance, I have seen some customers where the data for certain types of aggregation is still materialized in the ABAP OLAP Engine, causing performance problems with BW on HANA.
Is there a set of best practices or guidelines for common reasons for this, and ways to remodel to ensure that the calculations are pushed down into the HANA DB?
Very good blog...appreciate the effort in putting this together in a simple & succinct manner
Thanks for sharing!
Enjoyed reading this interesting blog!