Over the past few months, I’ve presented on this topic to many customers and colleagues in and outside Walldorf. As there seems to be such a high demand, I’ve decided to convert the underlying slide presentation into two blogs, with the first focusing on the motivation, scenarios and use cases while HANA and BW 7.30 – Part 2 looks at the combination of HANA and BW from a technical angle. Before I start with the first blog please note that the usual disclaimer applies. Everything here has been announced at some SAP event – see The SAP Run Better Tour – BW Roadmap, for example. So I’m focusing on bringing pieces into context rather than revealing something that has not been known before.
Overview Part 1
Overview HANA and BW 7.30 – Part 2
For a start, let’s review the fundamentals behind in-memory computing. To that end, let’s have a look at the table in figure 1 that I’ve gratefully borrowed from Andy Bechtolsheim’s presentation at HPTS 2009. It shows what the semiconductor industry predicts on how the listed components will evolve – see the ITRS.
It is sufficient to look at the first two lines, the clock rate and the cores. Two things can be concluded from that:
- Moore’s law will continue to apply.
- However, it will be based on scaling the number of CPU cores rather than the CPU clock rate – with power efficiency being the main reason for this change.
The “however” part (the second point) is fundamental and carries a big mandate for the software industry, namely that parallelism will be key on those future CPU architectures. SAP’s response to this is what has been labeled in-memory computing. However, this term over-emphasizes the aspect of main memory and falls a bit short on some other aspects that are at the heart of the performance benefits achieved in this context. The logic goes along the following lines:
- parallelism: as seen in figure 1, supporting the multi-core architectures via software parallelism is key
- in-memory: a prerequisite for parallelism is to have the related data located close to the cores in local memory
- columnar data structures: this, in turn, is a prerequisite to fit data into main memory; the columnar approach is extremely I/O efficient and is an enabler for the next bullet
- compression: columnar data can be more efficiently compressed than row-based data due to a higher repetition of values and thus a higher potential to compress
- application-awareness: this is separate from the previous four technology arguments and comes down to building an engine tailored towards the SAP applications; the second blog will provide examples in the context of BW for this.
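To make the columnar and compression arguments above a bit more tangible, here is a minimal sketch in Python. It is not how IMCE or BWA store data internally; the toy table, the column names and the helper functions are all assumptions made up for illustration. It pivots a few rows into columns, dictionary-encodes one column and then run-length-encodes the resulting codes, which shows why repeated values within a column compress well.

```python
from itertools import groupby

# Hypothetical toy table in row format: (order_id, country, status)
rows = [
    (1, "DE", "open"),
    (2, "DE", "open"),
    (3, "DE", "closed"),
    (4, "US", "closed"),
    (5, "US", "closed"),
    (6, "US", "open"),
]


def to_columns(rows):
    """Pivot row tuples into one list per column."""
    return [list(col) for col in zip(*rows)]


def dictionary_encode(column):
    """Replace each value by a small integer ID pointing into a sorted dictionary."""
    dictionary = sorted(set(column))
    ids = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in column]


def run_length_encode(codes):
    """Collapse runs of equal codes into (code, run_length) pairs."""
    return [(code, len(list(run))) for code, run in groupby(codes)]


order_id, country, status = to_columns(rows)
country_dict, country_codes = dictionary_encode(country)
country_rle = run_length_encode(country_codes)

print(country_dict)   # ['DE', 'US']
print(country_codes)  # [0, 0, 0, 1, 1, 1]
print(country_rle)    # [(0, 3), (1, 3)]

# A scan such as "count orders from the US" only touches the compressed column.
us_code = country_dict.index("US")
us_orders = sum(length for code, length in country_rle if code == us_code)
print(us_orders)      # 3
```

In a row layout, the values of one attribute are interleaved with all the others, so such runs of equal values are much harder to exploit.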
In my opinion, the last item is one of the most overlooked and undervalued in the current debate. Actually, it is something that many other companies already do successfully, namely exploiting inherent properties of the underlying applications to relieve some of the traditional RDBMS constraints in order to build innovative data processing clusters, e.g. based on MySQL nodes or Hadoop. The CAP theorem is an instance of that; see here for a few examples implemented by eBay. SAP’s BWA is another good example as it is tailored towards the BW schema.
SAP’s response to the imperative for a new software architecture is its In-Memory Computing Engine (IMCE; aka NewDB). I don’t want to engage in a deep essay on IMCE and think that – for simplicity – you can look at IMCE as
- an evolution of BWA, albeit not tied to BW alone anymore,
- SAP’s implementation of an in-memory DB, tailored towards SAP applications,
- a full, stand-alone SQL database,
- an OLAP processor for MDX queries.
Now, HANA is the acronym for High-Performance Analytic Appliance. Also, in a simplified (albeit not 100% technically correct) way, you can look at HANA as
- roughly: IMCE as an appliance
- however, it comprises more than just IMCE
- HANA is the term you likely hear in public
- for the remainder of this blog: IMCE ≈ HANA (to avoid too much confusion)
The following pseudo equations originate from some jokey internal discussions, but they have proven to be helpful:
- Today: EDW = RDBMS + X
This means that an enterprise data warehouse (EDW) is not equal to a database system but requires a complement (here: X). Under X you can imagine code that is manually written or generated by tools, e.g.
- extraction programs
- DDL code (like CREATE TABLE statements)
- constraints, validation rules
- data transformations and harmonization
- process definitions, schedules and monitoring, failure handling (especially consistent restart)
- KPI definitions
- business semantics like rules on how to convert currencies or fiscal year definitions
- management of shared and private dimensions, including hierarchies
- defining and interpreting semantics on top of tables and columns, e.g.
- column X is the parent column of a parent-child hierarchy H associated to dimension D
- column Y is a unit key figure with the associated unit stored in column U
- column Z is an attribute of dimension members whose key is compound in columns A and B
- table T holds natural language descriptions for dimension member keys, whereby column L indicates the language and column C the description
- column P in table Q is a foreign key of members of dimension D; referential integrity is guaranteed (yes/no)
- time and calendar semantics, e.g. based on hierarchies like day – month – quarter – year, week – year
- table and data management like defining standards on how to store a dimension (tables and their respective layouts), how to index and/or partition those tables
- (meta data) lifecycle of models and tables, like versioning, changes including impact analysis and propagation, development / test / production setup
- (data) lifecycle: archiving and the underlying management of archives (what has been archived and what not, avoid overlapping data containers, etc.)
- security, especially modeling and management based on higher conceptual levels like dimensions, members, hierarchies
- logging, auditing and other compliance-related features
- etc etc etc
- Now: RDBMS ⇒ HANA
This indicates that traditional RDBMS technology gets overhauled by in-memory computing as implemented in HANA.
- Thus: (new) EDW = HANA + Y
Now, the first two pseudo equations get combined into the third. As HANA is not an exact 1:1 replacement of an RDBMS and as the constraints and “physical rules” of in-memory computing change – especially the performance cost model – the software that sits on top (i.e. previously the X) needs to be adjusted to accommodate those new constraints and rules. This is indicated by moving from X in the first equation to Y in the third. Still, Y needs to address the same requirements as X but in a different way. Beyond that, the new constraints and rules open up new opportunities, meaning that many more options are possible in Y in comparison to X. It is a paradigm shift similar to moving from analog to digital photography. Simply think of all the additional things that are possible with digital photography today!
BW will follow this transformation from X to Y by tailoring it towards HANA. First steps will be visible with the BW 7.3 enablement of HANA planned for end of 2011.
In summary, X addresses those requirements. It can be a bundle of generated code, meta data definitions, manually written programs etc. BW is an off-the-shelf instance of X.
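As a small illustration of the kind of semantics that X carries on top of plain tables and columns, here is a minimal sketch in Python for the “unit key figure” case from the list above. This is not BW code; the class `UnitKeyFigure`, the function `sum_by_unit` and the sample rows are hypothetical names and data chosen only for this example. The point is that a plain SUM over the value column would mix currencies, whereas the metadata tells the engine to aggregate per unit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitKeyFigure:
    """Metadata: `value_column` holds amounts whose unit is stored in `unit_column`."""
    value_column: str
    unit_column: str


# Hypothetical fact rows; a real EDW would read these from a fact table.
facts = [
    {"customer": "C1", "revenue": 100.0, "currency": "EUR"},
    {"customer": "C1", "revenue": 50.0, "currency": "USD"},
    {"customer": "C2", "revenue": 25.0, "currency": "EUR"},
]

# The semantic annotation that a plain RDBMS column definition does not carry.
revenue = UnitKeyFigure(value_column="revenue", unit_column="currency")


def sum_by_unit(rows, key_figure):
    """Sum a key figure separately per unit instead of mixing units."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_figure.unit_column]] += row[key_figure.value_column]
    return dict(totals)


print(sum_by_unit(facts, revenue))  # {'EUR': 125.0, 'USD': 50.0}
```

Whether such metadata is hand-written, generated or delivered off the shelf is exactly the choice behind X.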
From my experience, the slides shown in figures 2, 3 and 4 are extremely helpful as they trigger fruitful discussions with customers. Essentially, I’ve discussed figures 2 and 3 in my blog on The BW – HANA Relationship. Please note that there is no “best” scenario but that each of the scenarios in figure 2 over-emphasizes a certain property at the expense of another one. So, there are trade-off decisions behind those scenarios. This confuses many people who would like SAP to give a simple answer. But, I guess, it’s like when you buy a car: you need to trade off various aspects when choosing the right model for your specific purposes.
Figure 2: HANA scenarios.
There have been many questions on the BWA-HANA relationship, e.g. whether there will be new releases of BWA, whether investments into BWA would be safe etc. The basic plan is to enable HANA to play the role of a BWA in the future. In other words: in 2012 (plan!), it should be possible to buy a HANA box that can be set up and configured to run as an accelerator next to a BW system, as BWA did before.
This offers two options to bring HANA into an existing BW 7.3 landscape – note that release 7.3 is a prerequisite for running BW with HANA:
- “conservative approach” (the two small arrows in figure 4): you bring in HANA as an accelerator for your existing BW. That way, you gain confidence with HANA, learn to operate HANA and already see many of the benefits. For example, HANA has a calculation engine that has been improved in comparison to the one in BWA.
- “progressive approach” (the long arrow in figure 4): this translates into migrating the DBMS server underlying your BW system to HANA. BWA as accelerator becomes obsolete as HANA already incorporates the BWA calculation capabilities.
This concludes the first part. Hopefully, it has clarified what role HANA will play in a BW context. It should have become obvious that BW remains a significant complement to HANA, even though, on a technical level, performance-critical operators that are implemented in the BW application stack today are moved into the HANA engine. BW will eventually become pure management software implementing a best-practice approach that orchestrates the heavy data lifting inside HANA. HANA and BW 7.30 – Part 2 will describe some examples of what is possible.