Many people think about those questions in terms of black/white, good/bad, legacy/new. As in many situations in software and in real life, it is usually a trade-off decision. This blog attempts to clarify the differences between the two approaches. Ideally, this leads to a less ideological, less biased and more factual discussion of the topic. Furthermore, it should become apparent that the HANA EDW approach allows you to work along both approaches within the same (HANA) system. So there is no need to have, for example, one data warehouse system based on BW (managed approach) and a second, purely SQL-based (freestyle approach) data warehouse system on an RDBMS. Fig. 1 pictures the two different approaches.
On the left-hand side, jigsaw pieces represent various tools that are harmonised, meaning that their concepts / objects “know” each other and have the same lifecycle. For instance, when a data source (as an example of such an object) gets changed, then the related data transformations (further examples of such objects), cubes, views, queries, … (i.e. other objects) can be automatically identified and, in the best case, even automatically adjusted, or at least brought to the attention of an administrator so that the necessary adjustments can be manually triggered.
Another example is BW’s request concept which manages consistency not only within the data warehousing (data management) layers but is also used in the analysis layer for consistent reporting. It’s a concept that spans many tools and processing units within BW.
The individual tools to build and maintain such a data warehouse need to understand and be aware of each other. They work off the same repository, which allows one consistent view of the metadata that constitutes the organisation of the data warehouse. When one object is changed, all related objects can be easily identified and adjusted, and those changes can be consistently bundled, e.g. to apply them in a production system.
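To make the idea of a shared repository more tangible, here is a minimal sketch in Python. It is not how BW implements its repository; the class `Repository`, its methods and the object names are purely hypothetical and only illustrate how related objects can be found once all tools record their dependencies in one place.

```python
from collections import defaultdict, deque


class Repository:
    """Toy shared metadata repository (hypothetical, not BW's actual design)."""

    def __init__(self):
        # Maps an object to the set of objects that are built directly on top of it.
        self._dependents = defaultdict(set)

    def add_dependency(self, source, target):
        """Record that `target` is built on top of `source`."""
        self._dependents[source].add(target)

    def impacted_by(self, changed):
        """Return all direct and indirect dependents of `changed`, breadth-first."""
        seen = {changed}
        order = []
        queue = deque([changed])
        while queue:
            current = queue.popleft()
            # Sort so that the result is deterministic.
            for dependent in sorted(self._dependents.get(current, ())):
                if dependent not in seen:
                    seen.add(dependent)
                    order.append(dependent)
                    queue.append(dependent)
        return order


# Illustrative object names only.
repo = Repository()
repo.add_dependency("datasource:sales", "transformation:sales_to_dso")
repo.add_dependency("transformation:sales_to_dso", "cube:sales")
repo.add_dependency("cube:sales", "query:sales_by_region")
repo.add_dependency("cube:sales", "view:sales_summary")

impacted = repo.impacted_by("datasource:sales")
print(impacted)
```

Because every tool writes into the same dependency graph, a change to the data source immediately yields the list of transformations, cubes, queries and views that need attention, which is the basis for adjusting and bundling them consistently.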
Due to the dependencies, the tools are integrated within one toolset. Therefore, they cannot be replaced individually by best-of-breed tools. That removes some of the flexibility, but for the benefit of integration. SAP BW is an example of such an integrated toolset.
On the right-hand side, you see similar jigsaw pieces. On purpose, they are more individually shaped and do not fit into each other’s slots from the very beginning. This represents the situation when a data warehouse is built on a naked RDBMS using best-of-breed tools, potentially from various vendors. Each tool only assumes the presence of the RDBMS and its capability to process (more or less standard) SQL. Each tool typically comes with its own world of concepts and objects that are then stored in a repository that is managed by that tool. Obviously, various tools are needed to build up a data warehouse, like an ETL tool for tapping into source systems, a programming environment (e.g. for stored procedures that are used to manage data flows and transformations), a tool that monitors data movements, a data modeling tool to build up analytic scenarios etc. Technically, many of those tools simply generate SQL or related code. Frequently, that generated code can be manually adjusted and optimized, which provides a lot of freedom. Per se, the tools are not aware of each other. Thus their underlying objects are independent, meaning that they have independent lifecycles. Changes in one tool need to make it “somehow” to the related objects in the other tools. This “somehow” can be managed by writing code that connects the individual tools or by using a tool that spans the repositories of the individual tools. SAP’s Information Steward is an instance of that. This is pictured as “glue” in fig. 1.
The freedom to more easily pick a tool of your own choice and the options to manually intercept and manipulate SQL provide a lot of flexibility and room to optimise. On the other hand, this pushes a lot more responsibility to the designers or administrators of the data warehouse. It also adds the task of integrating the tools “somehow”. Be aware that this is an additional task that adds revenue for an implementation partner.
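As a rough illustration of such “glue” code, consider the following Python sketch. Everything in it is an assumption made for the example: the two repositories, their structure and all job, view and table names are hypothetical, and it does not represent how Information Steward or any particular tool works. It merely shows that, with independent repositories, the link between objects has to be reconstructed, here via the table names that both tools share in the RDBMS.

```python
# Hypothetical repository of an ETL tool: jobs and the tables they write.
etl_tool_repo = {
    "job_load_sales": {"writes": ["SALES_RAW"]},
    "job_load_customers": {"writes": ["CUSTOMER_RAW"]},
}

# Hypothetical repository of a separate modeling tool: views and the tables they read.
modeling_tool_repo = {
    "v_sales_by_region": {"reads": ["SALES_RAW", "CUSTOMER_RAW"]},
    "v_customer_list": {"reads": ["CUSTOMER_RAW"]},
}


def views_affected_by_job(job_name, etl_repo, model_repo):
    """Glue: find the views that read any table written by the given ETL job."""
    written = set(etl_repo[job_name]["writes"])
    return sorted(
        view
        for view, meta in model_repo.items()
        if written & set(meta["reads"])
    )


affected = views_affected_by_job("job_load_sales", etl_tool_repo, modeling_tool_repo)
print(affected)
```

Neither tool knows about the other; the glue function is the only place where the relationship exists, so it has to be written and maintained by whoever builds the data warehouse.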
SAP’s Product Portfolio and the two Approaches
As can be seen from this discussion, each approach has its merits; there is no superior approach, but each one emphasises certain aspects. This is why SAP offers tooling for both approaches. This is perceived as a redundancy in SAP’s portfolio if the existence and the merits of the two approaches are not understood.
I frequently get the question whether BW will be rebuilt on SAP HANA. Actually, the question is philosophical to a certain extent, as BW keeps evolving: e.g. today’s BW 7.4 on HANA has less ABAP code than BW 7.3 on HANA. This can be perceived as BW being rebuilt on HANA if you wish. However, what does not make sense [for SAP] is to kill the approach on the right-hand side of fig. 1 by integrating the “freestyle tools” into a second, highly integrated toolset that mimics the BW approach, because that would simply remove the flexibility and freedom that the right-hand approach offers. Fig. 2 pictures this.
What is true, however, is that HANA will see a number of Data Warehousing Services arising over time. They already exist to some extent, as they emerged when BW was brought onto HANA. They can be generalised to be usable in a generic, non-BW case. Nearline storage (NLS), extended storage, HANA-based data store objects, a data distribution tool etc. are all excellent examples of such services that can be used by BW but also by a freestyle approach.
Finally, I would like to stress, again, the advantages of the HANA EDW approach, which allows you to arbitrarily combine the two approaches pictured in fig. 1. You can watch examples (demos) of this here or here.