HANA Data Warehousing: The #HANADW
With this blog, I would like to shed some light on the direction that SAP is taking towards a unified offering for building a data warehouse on top of HANA. The unofficial working title is the HANA DW. I've divided the blog into 3 sections, each addressing the most pressing questions I've received from customers who have already seen flavours of this.
The Vision for Data Warehousing on HANA: the HANA DW.
As outlined in my blog Data Warehousing on HANA: Managed or Freestyle? BW or Native?, there are two approaches (preferences) for building a DW, not only on HANA but in general:
- SQL-based: DW architects use SQL as the main building paradigm. This gives them a lot of freedom but also bears the risk that too much diversity jeopardises the lifecycle of the DW, as it becomes increasingly complex to manage dependencies (e.g. the impact of changes) and integration (e.g. the same entities, like products or customers, represented in different ways or with different data types).
- Managed via best practices: Here, high-value building blocks (like data flows, transformations, hierarchies, BW’s DSOs, BW’s data requests but also naming conventions) are used to construct and manage the DW. This is a faster way as long as the building blocks serve the need. It gets cumbersome whenever there is a scenario that requires deviating from the standard path offered via the building blocks.
In recent years, BW-on-HANA has extended approach #2 and combined it with #1 in the so-called mixed case scenario. A tangible example is described here. Many customers have adopted such a mixed approach; in fact, it has become the mainstream for BW-on-HANA. The HANA DW takes a similar direction but starts with #1 and complements it with #2, which in the end yields the same result. It follows this line of thought:
- Start with a naked HANA DB that offers all the SQL capabilities you need. In principle, you can write your SQL code in Notepad, Emacs, vi etc., store that code in files and execute it in HANA either manually or via generic tools like cron.
- Now, writing SQL code from scratch in a text editor is cumbersome, even with syntax highlighting or automatic syntax completion. Most people therefore acquire tools that let them graphically model, design or create objects and generate the underlying SQL statements.
- Whichever method you use to get to the SQL statements, you will need to maintain them. Scenarios get extended or adjusted, which translates into changes at the SQL level. For purposes like auditing, or simply to have the option of returning to an earlier setup, it is good practice to track the evolution (i.e. the changes) and to keep the versions of those (SQL or higher-level) artifacts. This is no different from any other programming environment, and one can borrow infrastructure from there, like Git. Git and related services are (or will be) offered by the HANA platform; together they constitute a repository.
There are two more tasks that the repository should support:
- managing the dependencies between the objects (e.g. a transformation using certain stored procedures which, in turn, use certain tables), and
- the release management of those (SQL or higher-level) artifacts, e.g. so that they can be developed and tested in one system without jeopardising the production system.
- Finally, there are certain recurring SQL patterns: things that you need to do over and over again. Examples are tracking incoming data (e.g. via something like the data request in BW), deriving data changes (like in a DSO) or storing hierarchies. Such "patterns" basically translate into higher-level (abstract) artifacts that are created and maintained at that abstraction level and then translated into a series of SQL statements.
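To make the idea of a pattern more tangible, here is a minimal sketch of a "derive data changes" pattern, assuming SQLite as a stand-in for HANA. The generator function `generate_delta_sql` and the table names are hypothetical illustrations, not an SAP API and not how BW actually activates DSO data: a small declarative description (inbound table, active table, key columns, value columns) is translated into one SQL statement that lists new and changed rows.

```python
import sqlite3

# Hypothetical pattern generator (not an SAP API): turns a small, declarative
# description of a "derive data changes" artifact into a SQL statement.
def generate_delta_sql(inbound, active, keys, values):
    """Return SQL listing inbound rows that are new or changed vs. the active table."""
    join = " AND ".join(f"a.{k} = i.{k}" for k in keys)
    # SQLite's IS NOT is a NULL-safe inequality; other dialects may need another form.
    changed = " OR ".join(f"a.{v} IS NOT i.{v}" for v in values)
    select_cols = ", ".join(f"i.{c}" for c in keys + values)
    order_by = ", ".join(f"i.{k}" for k in keys)
    return (
        f"SELECT {select_cols}, "
        f"CASE WHEN a.{keys[0]} IS NULL THEN 'NEW' ELSE 'CHANGED' END AS change_type "
        f"FROM {inbound} AS i LEFT JOIN {active} AS a ON {join} "
        f"WHERE a.{keys[0]} IS NULL OR ({changed}) "
        f"ORDER BY {order_by}"
    )


# Demo with an in-memory SQLite database standing in for HANA.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE active_products  (product_id TEXT PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE inbound_products (product_id TEXT PRIMARY KEY, name TEXT, price REAL);
    INSERT INTO active_products  VALUES ('P1', 'Bike', 500.0), ('P2', 'Helmet', 50.0);
    INSERT INTO inbound_products VALUES ('P1', 'Bike', 450.0), ('P2', 'Helmet', 50.0),
                                        ('P3', 'Lock', 20.0);
""")

sql = generate_delta_sql("inbound_products", "active_products",
                         keys=["product_id"], values=["name", "price"])
rows = con.execute(sql).fetchall()
print(rows)  # P1 changed its price, P3 is new, P2 is unchanged and filtered out
```

The point of the design is that only the short description is maintained (and versioned in a repository); the SQL is regenerated from it whenever the pattern or the target dialect changes.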
The HANA DW will support this process as follows (figure 1 below visualises this):
- The HANA DB provides all the SQL functionality you need.
- The HANA platform will provide the development infrastructure, especially to support a repository and related services.
- Tooling on top will create either direct HANA SQL* or higher-level artifacts that translate into HANA SQL*.
- Those tools will keep their artifacts in the HANA repository, thereby supporting the complete lifecycle, including auditing, versioning and dependency analysis (especially between artifacts maintained by different tools).
- Tools constitute optional added value: you can use them, but you don't have to. Consider BW-on-HANA as one such tool.
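The dependency analysis mentioned above, including across artifacts owned by different tools, boils down to a graph over repository artifacts. The following is a minimal sketch, assuming a hypothetical in-memory dictionary in place of the HANA repository; the artifact names and the `bw:` prefix for a BW-managed object are invented for illustration. It uses Python's standard `graphlib` module (Python 3.9+) to derive a deployment order and a simple reverse traversal for impact analysis.

```python
from graphlib import TopologicalSorter

# Hypothetical repository content (illustrative names only): each artifact
# maps to the set of artifacts it uses, regardless of which tool owns it.
DEPENDS_ON = {
    "table:SALES_RAW": set(),
    "table:PRODUCTS": set(),
    "procedure:CLEANSE_SALES": {"table:SALES_RAW"},
    "view:SALES_BY_PRODUCT": {"procedure:CLEANSE_SALES", "table:PRODUCTS"},
    "bw:QUERY_SALES": {"view:SALES_BY_PRODUCT"},  # object maintained by another tool
}


def deployment_order(graph):
    """Order in which artifacts can be deployed: every dependency comes first.

    Raises graphlib.CycleError if the artifacts depend on each other circularly.
    """
    return list(TopologicalSorter(graph).static_order())


def impacted_by(graph, changed):
    """All artifacts that directly or transitively use the `changed` artifact."""
    # Invert the graph: artifact -> artifacts that use it.
    users = {}
    for artifact, deps in graph.items():
        for dep in deps:
            users.setdefault(dep, set()).add(artifact)
    impacted, stack = set(), [changed]
    while stack:
        for user in users.get(stack.pop(), ()):
            if user not in impacted:
                impacted.add(user)
                stack.append(user)
    return impacted


print(deployment_order(DEPENDS_ON))
print(sorted(impacted_by(DEPENDS_ON, "table:SALES_RAW")))
```

A change to `table:SALES_RAW` would then flag the cleansing procedure, the view and the BW query as impacted, regardless of which tool maintains each of them.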
The plan is to bring SAP's existing data warehousing products into this HANA DW setup. This will allow SQL-based data warehousing (approach #1) enriched with higher-level, higher-purpose artifacts (approach #2). The second pillar in figure 2 describes that evolution. The third pillar indicates that tooling will evolve, potentially into a series of apps or services that can also manage a cloud-based DW.
Figure 1: The vision for the HANA DW.
Figure 2: Short-, mid- and long-term evolution of the HANA DW.
The Role of BW-on-HANA.
From the above, it should be clear that BW-on-HANA will form an important but optional part of the HANA DW. If it is convenient for the purpose of the DW, then it should be used or added to a HANA DW. Another potential scenario is that an existing BW-on-HANA system gradually evolves into a HANA DW as it is complemented with other tooling in the fashion described above. The boundary will be blurry. In any case, BW-on-HANA will extend and enhance its existing functionality, enabling more and more direct SQL access and options, and leveraging and interacting with the HANA repository. A stand-alone BW-on-HANA system, as it exists today, can be considered a special instance of a HANA DW. It will continue to exist, evolve and excel. Anyone investing in BW-on-HANA today is on a safe track.
The Role of HANA Vora and Hadoop: the HANA Big DW.
Many customers are looking at ways to complement existing data warehouses with Hadoop. HANA Vora will play a pivotal role in combining the HANA and Hadoop platforms. Therefore, HANA Vora will make it possible to extend the HANA DW into a HANA Big DW (current working title). We will elaborate on that at a later stage.
* Please consider HANA SQL here as a placeholder comprising all sorts of more specialised languages and extensions like MDX, SQLScript, calc engine expressions etc.
Thank you for the insights.
Thank you Zurek for providing the direction from SAP.
Your insightful info raises a few questions:
As is apparent from your details, the HANA DW is positioned to offer all the data warehousing services and to support all aspects, from modeling to monitoring and more.
Therefore, investing in further innovations in a separate standalone BW on HANA seems to be driven only by the fact that SAP needs to support existing BW customers, which is a noble cause and perfectly justified.
However, doesn't it make sense to direct the efforts towards developing automated tools to migrate from standalone BW (HANA/non-HANA) to the HANA DW? This way:
1. SAP doesn't have to ruffle the feathers of existing BW customers, since it offers them a migration, and
2. SAP can focus on one DW offering instead of two.
I am not questioning SAP's direction here, just offering some food for thought.
I recommend that you have a look at BW 7.5 (edition for HANA), the release that has just been launched. Good sources are
If you look at the features, deliveries and roadmap, you will understand that BW is neither dead nor being replaced by the HANA DW; it is an important part of the HANA DW. That fact, its underlying motivation and the reasoning behind it have been explicitly stated and described in my blog. However, the HANA DW is more than BW. A HANA DW can be built using BW, but it can also be built without BW. Whether BW is used or not is a matter of convenience. There are bits and pieces of BW that you could build on your own, but they are hard to build and/or hard to maintain on your own. But, of course, you are not forced to use BW.
Thank you Thomas - I always enjoy and appreciate your blogs.
One element that always seems to linger in the shadows is the physical instance on which the HANA EDW (whether it is BW or not) resides.
It is extremely difficult to recommend sharing the same HANA instance with either SoH or S/4, given concerns about HANA release-level requirements and compatibility.
etc. I could go on and on, but I won't.
I know these items have been discussed in the past to some degree, but I was wondering whether you had any specific guidance or direction you could share on what SAP will support on this topic going forward. Again, this is not strictly about operational reports, since I think we have clear direction on that subject; it is about everything else in the DW that combines operational data with other internal or external data and requires more complex processing.
I realize that the answer may not be one-size-fits-all and may depend on the situation; however, understanding what will and will not be supported technically from an SAP perspective is critical to a sound path forward.
This is all in the spirit of helping SAP customers rationalize how to simplify, reduce or remove redundancy, minimize costs and ultimately increase their analytic capabilities.
Look forward to your reply.
You are raising valid issues. I can only refer you to a number of initiatives set up in the general HANA context to make it more manageable to host a variety of applications on top of one instance: MDC, workload management etc. Some of them ship with HANA SPS11 (see Mike's blog), others are on the roadmap. And yes, having multiple apps on one instance and being able to share the data (rather than copying it) moves us closer to a logical DW. At the moment, it is still a trade-off between the benefits of sharing the data via one HANA instance and the drawbacks and constraints imposed on the applications that share that instance.
Relational (pure SQL-based) DWHs have been around for a few decades now, as have tools that allow the intelligent, dynamic generation of SQL based on a user's choice of 'business objects' when querying those DWHs.
So no one could nowadays seriously suggest *hand-written* SQL as a means of retrieving data from such DWHs. The tools for intelligent SQL generation exist, even at SAP, and my question is: are there also plans to integrate those 'universe'-like querying mechanisms into HANA? This would be by far the best option: modeling BO universes directly in HANA and using the patented 'to-be-built-into-HANA' query technology from BusinessObjects on top... 🙂
Thanks for the great article. I have a question though. Are there any courses/trainings provided by SAP that would cover the creation of SQL based DWH on top of HANA? There are plenty of possibilities to learn BW (and get certified), but I can't find any learning material related to HANA DWH.
Or maybe you (or someone else) could outline the main skills and technologies that a person has to possess and understand in order to prepare for the future of data warehousing with HANA?