Real-World Load Performance Results On BW-on-HANA
During the BW-on-HANA ramp-up, many customers have compared a copy of an existing BW (migrated to run on HANA) with the original BW (running on a conventional RDBMS which, in the remainder, is referred to as XDB). While many of them were already convinced of the query performance, frequently because they had already been using BWA, they mostly focused on the advantages that they could gain when loading data into BW-on-HANA. As a result, many of them compared the runtimes of process chains on BW-on-XDB with those on BW-on-HANA. I’ve scanned through many emails, PPTs, DOCs etc. that document those efforts. The result is an anonymised bar chart that shows what benefits have been achieved in those real-world scenarios, and by the customers on their own.
To state it right at the beginning: this is NOT a performance benchmark in the traditional sense, with a clearly defined scenario, clearly defined tasks, software, hardware, costs etc., so that the results can theoretically be recreated by using all the information provided. We do not have all the information around those measurements. It is even likely that the comparisons are not perfect: for example, the BW-on-XDB and BW-on-HANA systems might not have run on perfectly comparable hardware. On the other hand, it is possible that process chains in the original systems have undergone years of tuning (on the respective XDB) while hardly any tuning was done in the BW-on-HANA context. In summary, this is not perfect, but it reflects reality: this is what happens in real customer projects and what the product achieves in the real world, and that makes those numbers appealing to me.
The bar chart below shows the results of 40 process chains. Each bar shows the runtime of the respective process chain running in BW-on-HANA relative to the same process chain running on BW-on-XDB which is assumed to be 100. For example: process chain #6 shows a value of approx. 20. This means that the runtime of that process chain was only 20% of the original runtime in BW-on-XDB. Or in other words: it ran 100 / 20 = 5 times faster in the BW-on-HANA system. For better visualisation, the process chains were sorted so that the process chains with the most benefit are on the left and the ones with the least or no benefit to the right.
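The conversion from a bar's value to a speedup factor can be written down as a small sketch. The function name `speedup_factor` is made up for illustration; the sample values 20, 40 and 3 are the ones quoted in this post for process chain #6, the median and process chain #1.

```python
def speedup_factor(relative_runtime: float) -> float:
    """Convert a relative runtime (BW-on-XDB baseline = 100) into a speedup factor.

    A value below 100 means the chain ran faster on BW-on-HANA;
    a value above 100 means it ran slower (factor below 1).
    """
    if relative_runtime <= 0:
        raise ValueError("relative runtime must be positive")
    return 100.0 / relative_runtime


if __name__ == "__main__":
    # Values taken from the text: chain #6, the median, and chain #1.
    for rel in (20, 40, 3):
        print(f"{rel}% of original runtime -> {speedup_factor(rel):.1f}x faster")
```

A bar above the 100 line gives a factor below 1, i.e. a slowdown, which is how the few slower chains on the right of the chart would show up.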
Process chains are load processes that consist of many intermediate steps like reading data, changing or checking data (traditionally in the ABAP server), deriving the delta load (i.e. the data activation in a data store object), writing that delta into one or more targets etc. A process chain constitutes an entity that is on a customer’s mind, that he manages and around which he defines SLAs. It describes the entire path from loading the data from a source system to making that data available (visible) to an end user in a cleansed, compliant and consistent way. As such, a process chain is much more tangible in the real world than academic analyses of fast-running SQL statements like mass INSERT, UPDATE or DELETE operations.
What can be derived from those numbers?
- Most process chains run faster (blue bars below the red dashed line).
- There are instances that run slower (the three on the right hand side). The reasons here are sometimes non-optimal access of data in HANA, e.g. many individual fetches rather than a few mass SELECTs in (ABAP-based) customer coding.
- The median (process chain #21) is at approx. 40, i.e. 40% of the original runtime or 100 / 40 = 2.5 times faster.
- One comment on process chain #1: by chance, I know that this one has undergone some tuning by rewriting (customer) coding to be optimised for HANA. The result is tremendous as the new runtime is only 3% of the original one, i.e. about 33 times faster! This is the other extreme of what has been described in the second point above.
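The access pattern mentioned in the second point, many individual fetches versus a few mass SELECTs, can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in; the table `material`, its columns and both function names are hypothetical, and this is neither ABAP nor HANA. It only shows the shape of the two approaches, not their relative cost on a real system.

```python
import sqlite3

# Hypothetical lookup table, held in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE material (id INTEGER PRIMARY KEY, grp TEXT)")
conn.executemany(
    "INSERT INTO material VALUES (?, ?)",
    [(i, f"G{i % 3}") for i in range(1000)],
)


def lookup_row_by_row(conn, ids):
    """One SELECT per key: many small round trips to the database."""
    result = {}
    for i in ids:
        row = conn.execute(
            "SELECT grp FROM material WHERE id = ?", (i,)
        ).fetchone()
        if row is not None:
            result[i] = row[0]
    return result


def lookup_in_bulk(conn, ids):
    """One SELECT for all keys: a single set-based request."""
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, grp FROM material WHERE id IN ({placeholders})",
        list(ids),
    )
    return dict(rows.fetchall())


ids = list(range(0, 200, 2))  # 100 keys, kept small for the IN list
assert lookup_row_by_row(conn, ids) == lookup_in_bulk(conn, ids)
```

Both functions return the same data; the difference is only in how many statements reach the database. In customer coding, the first shape typically appears as a single-record read inside a loop, and rewriting it into the second shape is the kind of tuning described for process chain #1.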
In total, the sheer number of loading scenarios that benefit from the migration to HANA is encouraging. Most importantly, these benefits were achieved with standard tooling and by the customers themselves.
In my view, loading data from an ECC environment into HANA can still be a bottleneck. We know HANA will speed up query times as well as BWA does, and modelling will be easier by reporting directly on HANA-optimised DSOs (or cubes), thereby reducing complexity and TCO. But will HANA really improve load times by up to 10 times? Pumping data around in the EDW itself will be sped up for sure, but extraction still takes place on a source system as well, right? If the source system is not scaled properly (or just holds an enormous amount of data without proper delta management), or is used extensively during the day when the extraction takes place, or... well, you name it... it will not speed things up tremendously in HANA.
I can imagine SLT scenarios might help, but selecting the correct tables and rebuilding the transformations could be a massive job. I assume we can only truly take full advantage once the full ECC system is running on HANA as well. Anxiously awaiting Q4..
I'm with you. That's why I've emphasized that these are measurements of process chains. The latter constitute complete loading scenarios rather than individual steps of the load process like extraction. You are right: getting the data out of a source system has nothing to do with HANA or not HANA; there won't be any difference. But having that data 1:1 in BW is not the end of the story either. That's why SLT can help, to a certain extent, to manage the extraction more easily and also to spread the load over time. However, if you need to harmonize that data with data from another source, and/or enrich it by adding other fields, and/or manage the data lifecycle via a DSO because there is no delta, then you do this inside BW. And those (logical, not performance) problems don't go away even with SLT.