HANA and BW 7.30 - Part 2

tfxz · ‎06-23-2011

Over the past few months, I've presented on this topic to many customers and colleagues. As there seems to be such a high demand, I've decided to convert the underlying slide presentation into two blogs, with HANA and BW 7.30 - Part 1 focusing on the motivation, scenarios and use cases while the second looks at the combination of HANA and BW from a technical angle. Before I continue with the second blog please note that the usual disclaimer applies.

Overview HANA and BW 7.30 - Part 1

Overview Part 2

Evolving the In-Memory Footprint in SAP BW
RDBMS + BWA Becomes One HANA Box
In-Memory Planning
In-Memory DSO
Conclusion

Evolving the In-Memory Footprint in SAP BW

Let's first look back to 2003, when the in-memory efforts started within BW that led to BWA (aka HPBI, HPA, BIA); see Pushing BI To A New Frontier. Understanding this evolution will also help you understand the rationale behind future developments.

Figure 1 shows a matrix listing various layers of BW in the rows and the recent releases in the columns. You can see that the initial investments went into the areas that were natural first targets, namely the SQL processing behind InfoCube-based queries. With BWA 7.2, this was extended to MultiProviders, and there is even de facto support of DSOs via the HybridProvider. See this document for more details on the BW 7.3 / BWA 7.2 combination. Looking beyond the latter, I will show some examples of the in-memory impact on the planning engine and the data warehousing layer below.

Figure 1: Evolving In-Memory Footprint in SAP BW

RDBMS + BWA Becomes One HANA Box

There are certain advantages in moving from a BW setup based on a classic RDBMS server complemented by BWA to a BW system sitting on top of a single HANA server that combines the abilities of both. From a technical perspective, the single HANA instance removes the need to manage consistency across two servers (as with an RDBMS and a BWA). This is particularly interesting for planning (write-back) scenarios and simplifies matters a lot. From an admin perspective, the picture is more ambiguous: some BW customers like the separation in which the RDBMS handles the warehousing load while the BWA handles the querying load. On the other hand, two servers, two hardware installations and two licenses need to be maintained.

In-Memory Planning

Let's turn to a category of features that goes beyond the well-known query performance advantages of HANA or BWA, namely the evolution of BW-IP towards in-memory processing.

Figure 2 shows a "Hello World" example for a planning operation: a set of cells is displayed to an end user who decides to increase one of the values from 250 to 300. What happens when this change is submitted to the server? Remember that planning is typically done on an aggregated granularity (here: countries and years) while the data is much more detailed. In the example of figure 2, countries break down to branches and years to weeks. This means that changing a single value in the UI frequently translates into a large number of changes on the data level. This is called disaggregation and constitutes a frequent operation in a planning context.

In the traditional approach - meaning that processing is mostly done in the application server - the delta of the change (here: an increase by 50) is calculated, then that delta is broken down to the actual data granularity (here: 52 weeks in 2011 and 500 branches in Germany), resulting in a potentially large number of values (here: 52 * 500 = 26000) that are then sent over the network to the RDBMS to be saved.

In the in-memory approach, processing is pushed down to HANA, i.e. close to the data, by swapping steps 2 and 3: only a single value (here: 50) is sent to the DB engine, accompanied by the disaggregation instruction and its associated parameters. The performance gains of that approach originate from the following:

- reduced network traffic (1 value being sent to the DB rather than 26000),
- operations in the DB engine are implemented directly on the data structures existing in that layer, i.e. it is not necessary to convert or cast,
- disaggregation can be highly parallelized.

This example follows a very generic pattern that can be applied in many other areas too. Some colleagues use terms like data shipping (traditional approach) vs function shipping (in-memory approach).

Figure 2: Comparing the traditional vs the in-memory approach of a "Hello World" planning example.
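To make the data shipping vs function shipping comparison concrete, here is a minimal Python sketch of the "Hello World" example. It is only an illustration: the function names, the request dictionary and the equal distribution of the delta across all cells are my assumptions and do not reflect the actual BW-IP or HANA interfaces.

```python
WEEKS = 52      # weeks in 2011, as in the example
BRANCHES = 500  # branches in Germany, as in the example


def disaggregate_equally(delta, n_cells):
    """Split one aggregated delta evenly over n_cells detail cells.

    Equal distribution is an assumption made for this sketch.
    """
    return [delta / n_cells] * n_cells


def data_shipping(delta):
    """Traditional approach: the application server disaggregates,
    then every detail value is sent to the database."""
    detail_values = disaggregate_equally(delta, WEEKS * BRANCHES)
    return len(detail_values)  # number of values crossing the network


def function_shipping(delta):
    """In-memory approach: only the delta and the instruction are sent;
    the disaggregation runs inside the DB engine."""
    # Hypothetical request format, for illustration only.
    request = {"operation": "disaggregate", "delta": delta,
               "weeks": WEEKS, "branches": BRANCHES}
    # Stand-in for the work done inside the DB engine.
    detail_values = disaggregate_equally(
        request["delta"], request["weeks"] * request["branches"])
    return 1, len(detail_values)  # (values sent, cells written)


delta = 300 - 250
print(data_shipping(delta))      # 26000
print(function_shipping(delta))  # (1, 26000)
```

In both cases 26000 cells end up being changed; the difference is where the breakdown happens and how many values have to travel over the network.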

In-Memory DSO

The pattern shown in the in-memory planning example, namely (a) to avoid sending huge amounts of data between application and DB servers, and (b) to implement performance-critical operations directly on the engine-based data structures, can be applied to the BW data store object (DSO) too. To that end, it makes sense to recall how a DSO works (conceptually) and where performance becomes critical. Figure 3 shows how a DSO works:

There are 3 data containers:
- one for the current data - current image,
- one for the recently uploaded data - future image, and
- one for the result when the differences between the current and the future images are calculated - delta image, the result of data activation.
In a traditional RDBMS, each of these 3 containers is represented as a table.
There are 4 fundamental operations on a DSO:
- upload new data,
- query the current data,
- read the delta data,
- activate - see above.

In a traditional RDBMS environment, querying and data activation are performance critical. As we know, querying is a fundamental strength of HANA and is not a concern anymore. However, data activation needs to be carefully considered: reconciling the current and the future images typically translates into moving lots of data from the DB to an application server, where the data is matched and the delta is calculated. Data activation is therefore a clear candidate to be pushed down into the HANA engine.

Figure 3: Traditional DSO
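The activation step described above can be sketched conceptually in Python. This is a simplified model under my own assumptions: records are plain key/value pairs, uploaded values simply overwrite current ones, and the delta image is a list of (key, old value, new value) tuples. The function and variable names are hypothetical and do not reflect how BW or HANA implement activation.

```python
def activate(current, future):
    """Merge the future image into the current image.

    current: dict mapping record key -> value (current image)
    future:  dict mapping record key -> value (recently uploaded data)
    Returns (new_current, delta), where delta lists (key, old, new) for
    every record whose value changed; old is None for new keys.
    """
    new_current = dict(current)
    delta = []
    for key, new_value in future.items():
        old_value = current.get(key)
        if old_value != new_value:
            delta.append((key, old_value, new_value))
            new_current[key] = new_value
    return new_current, delta


current = {"DE": 250, "FR": 100}
future = {"DE": 300, "FR": 100, "IT": 80}
new_current, delta = activate(current, future)
print(new_current)  # {'DE': 300, 'FR': 100, 'IT': 80}
print(delta)        # [('DE', 250, 300), ('IT', None, 80)]
```

In the traditional setup, a matching loop of this kind runs on the application server, so both images have to be read from the DB first. That data movement is what makes activation expensive.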

The DSO has been natively implemented in HANA. Figure 4 indicates that it has been implemented as a black box that behaves like the traditional DSO shown in figure 3 and described above. The querying and delta read operations become logical views on top of the black box. The upload operation is straightforward. The data activation has been natively implemented and shows significant performance gains. As indicated, the sources of those gains are reduced data traffic and the native implementation in the engine.

Figure 4: In-Memory DSO

Conclusion

This concludes the second part. Hopefully, we will be able to write about and discuss more details over the coming months. However, the examples above should provide a good flavour of what is possible and what can be expected.