BW on HANA - Performance Comparison of Different E...

kevin_small · ‎08-04-2014

This article compares the performance of three different ways of doing a simple exception aggregation in a BW on HANA scenario. The goal is to see what design will give best performance for a BEx query that uses exception aggregation.

Introduction

A performance problem can be experienced in BW when a large amount of exception aggregation has to be done at query run-time. Before BW 7.3, exception aggregation happened on the application server during the OLAP part of query execution. It was not done on the database layer. This meant that a potentially large volume of data had to make the journey from the database to the application server. With BW 7.3 (and BWA 7.2), or with BW on HANA, it became possible to "push down" some of these exception aggregations to the database layer.

The performance benefit of pushing down these exception aggregations can be considerable. This push down is well documented (see chapter 4 of this document) and takes only a switch in RSRT to implement. In RSRT you make this setting:

By making this setting, the system will attempt to perform the exception aggregation in the database/BWA layer, but depending on query complexity this may not always be possible.

But could performance be improved further?

If we consider a mixed scenario in a BW on HANA landscape, then we have more options available to us, so perhaps the answer is yes. With HANA as the database there is the possibility to build custom HANA models and so push down the entire query calculation, leaving the query itself as more or less a "pass through". This is only possible if the query and the exception aggregation are fairly simple. As queries get more complex, BEx may be the better option, and in general you don't want to mix complex BEx with complex HANA models in one report. For the sake of this experiment imagine performance is the only driver and the query has only simple exception aggregation.

Data Flow for Test Queries

So let's consider three queries that can be used to give the same exception aggregation output and see which performs best as the number of records increases:

The different queries/test cases are named like this:

1) "Vanilla"

A standard BW cube with a BEx query on top. Exception aggregation is defined in the query as normal and executed in the OLAP layer at runtime. I've called this vanilla as it is the default situation.

2) "Exc.Aggr."

This is the same BW cube as case 1, with a different query on top. The query has it's RSRT setting for "Operations in BWA/HANA" = "6 Exception Aggregation". In this case, exception aggregation is still defined in the BEx query, but it is executed in the database layer (or on BWA if that were the scenario).

3) "Custom"

This uses a custom HANA model. The same BW cube is used as per cases 1 and 2, but here we use a system-generated HANA Analytic View on top of the cube. Above that, a custom Calculation View is used to do the equivalent of an exception aggregation, in fact all the required report calculation work is done here, not in the query. A Virtual Cube above that lets BW see the custom HANA model, and lastly a BEx query that does pretty much nothing sits on the very top.

Test Scenario

Let's consider a very simple case where the exception aggregation is used to count occurrences. Consider the scenario where you have measurements of man hours of effort associated with line items on a document. Perhaps these are measurements of effort on help-desk tickets, or efforts to process financial documents. Exception aggregation can be used to count documents and so you can get averages at a higher level, for example average man hours of effort per document across different company codes. Here is some sample data from the cube YYCUB02 (see data flow diagram above):

The above sample data is used to give the following report output, where exception aggregation is used to count the number of documents:

Generating Test Data

To generate the test data, a small data set was flat-filed into the cube, then a self-load was used to generate more random records based on the initial set. This self-load was then repeated to double up the volumes with each load, with measurements running from 1m to 100m rows.

Gathering Test Results

To gather the test results, each query was run with an increasing numbers of records in the cube. Performance measurements were taken from RSRT (RSRT -> Execute + Debug -> Display Statistics):

These raw performance measurements from RSRT were then grouped into DB and OLAP time using the groupings defined in help page Aggregation of Query Runtime Statistics. Since RSRT was used in all cases the Frontend time can be regarded as constant and was ignored.

Test Results

Comparing the 3 scenarios with increasing records produced these charts:

The Exc.Aggr. and Custom scenarios both perform much better than the Vanilla scenario, giving a 95% drop in runtime. We can zoom in to see how these two scenarios are made up, by separating their OLAP time and DB time:

The above shows that the OLAP time for both these scenarios is very low, as we'd expect since the query work is not really being done in the database layer. The difference lies in the DB time, and the Custom model outperforms the Exc.Aggr model.

Excursion into Big O Notation

If you've not come across it before, Big O Notation is a way to describe how a process or algorithmbehaves when one of it's inputs change. You'd expect a report to get slower as the number of records it has to process increases, but how much slower could it be? Big O Notation can be used to describe the broad categories of how much slower something gets.

This chart shows the main Big O curves (the below chart stolen from http://apelbaum.wordpress.com/2011/05/05/big-o/):

In the chart above, as the number of records increases on the x-axis, the runtime on the y-axis also changes. The names of each curve are the Big O Notations. Looking back at our test results the query execution times can be seen to form a straight line so we can say that are all O(n). It is true that the Vanilla scenario is a worse O(n) than the other scenarios, in that it's slope is steeper, but they are still all categorised as O(n).

Real World Examples of Big O

O(log n) - is what you'd see using an ABAP BINARY SEARCH, for example in a transformation. O(log n) is a good place to be if volumes get high.

O(n) - is what we see in our query runtime results.

O(n2) and worse - is what you'd see if there is nested loops in a transformation. You may be familiar with performance being fine in a development box with limited data, and suddenly in a test environment performance becomes very bad. A nested loop in a transformation can cause this O(n2) pattern.

When I first carried out these tests, my results looked like this:

This looked like the Vanilla case was showing O(log n) but that didn't make any sense! How could increasing records cause performance to stabalise? On further investigation this turned out to be an error in my design of the test. The random number generator was only generating values up to 7 digits, or 10 milion values. As the number of records ramped up to 100 million, the random generator was tending to repeat the document numbers rather than create new ones. The amount of processing done on the OLAP layer was then becoming more constant. Lesson learned - always sanity check the results!

Conclusion

Pushing down exception aggregation using the RSRT setting gives a huge improvement for virtually no effort. In this simple test case, a hand crafted custom HANA model did perform a little better, but that would need weighed against the additional effort to build and maintain.