HANA Data Warehousing: The #HANADW
With this blog, I like to shed some light into the direction that SAP is taking towards a unified offering for building a data warehouse on top of HANA. The unofficial working title is the HANA DW. I’ve divided the blog into 3 sections, each addressing the most pressing questions that I’ve received from customers who have already seen flavours of this.
The Vision for Data Warehousing on HANA: the HANA DW.
As outlined in my blog Data Warehousing on HANA: Managed or Freestyle? BW or Native?, there are two approaches (preferences) for building a DW, not only on HANA but in general:
- SQL-based: Meaning that the DW architects use SQL as the main building paradigm which gives them a lot of freedom but also bears the risk that too much diversity jeopardises the lifecycle of the DW as it becomes increasingly complex to manage dependencies (e.g. impact of changes) and integration (e.g. same entities – like products, customers – represented in different ways, using different data types etc).
- Managed via best practices: Here, high-value building blocks (like data flows, transformations, hierarchies, BW’s DSOs, BW’s data requests but also naming conventions) are used to construct and manage the DW. This is a faster way as long as the building blocks serve the need. It gets cumbersome whenever there is a scenario that requires deviating from the standard path offered via the building blocks.
In recent years, BW-on-HANA has offered approach #2 being extended and combined with #1, the so called mixed case scenario. A tangible example is described here. Many customers have adopted such a mixed approach; in fact, it has become the mainstream for BW-on-HANA. The HANA DWtakes a similar direction but starts with #1 and complements with #2 which, in the end, yields the same result. It goes along the following notion:
- Start with a naked HANA DB that offers all sorts of SQL capabilities that you need. Fundamentally, you can now write your SQL code in Notepad, Emacs, VI etc, store that SQL code in files and execute them in HANA either manually or via generic tools like cron.
- Now, writing SQL code from scratch in a text editor is cumbersome, even if there is some syntax highlighting or automatic syntax completion. Most people acquire tools that allow them to graphically model / design / create stuff to generate the underlying SQL statements.
- Whichever method you use to get to the SQL statements, there will be the need to maintain them. Scenarios get extended or adjusted. This translates into changes on the SQL level. For purposes like auditing or simply for having the option to return to an earlier setup it is good practice to track the evolution (i.e. the changes) and to keep the versions of those (SQL or higher-level) artifacts. This is nothing else than in all kinds of programming environments and one can lend infrastructure from there like GIT. The latter and services related to it are (or will be) offered by the HANA platform. They constitute a repository.
There are two more tasks that the repository should support:
- managing the dependencies between the objects (e.g. a transformation using certain stored procedures who, in turn, use certain tables), and
- the release management of those (SQL or higher-level) artifacts, e.g. to allow them being developed and tested in one system w/o jeopardising the production system.
- Finally, there are certain recurring patterns of SQL: things that you need to do over and over again. Examples are tracking incoming data (e.g. via something like the data request in BW), how to derive data changes (like in a DSO), how to store hierarchies etc. Such “patterns” basically translate into higher-level (abstract) artifacts that are created and maintained at the abstraction level to then be translated into a series of SQL statements.
The HANA DW will support this process in the following way; figure 1 below visualises this:
- The HANA DB provides all the SQL functionality you need.
- The HANA platform will provide the development infrastructure, especially to support a repository and related services.
- Tooling on top will create either direct HANA SQL* or higher-level artifacts that translate into HANA SQL*.
- Those tools will keep their artifacts in the HANA repository, allowing to support the complete lifecycle incl. auditing, versioning, dependency analysis (especially also between artifacts maintained by different tools).
- Tools constitute optional added value that you can use but that you don’t have to use. Consider BW-on-HANA as such a tool too.
It is planned to bring the currently existing SAP products related to data warehousing into this HANA DW setup. This will allow SQL-based data warehousing (1.) enriched via higher-level / higher-purpose artifacts (2.). The second pillar in figure 2 describes that evolution. The third pillar indicates that tooling will evolve, potentially into a series of apps or services that can also manage a cloud-based DW.
The Role of BW-on-HANA.
From the above, it should have become obvious that BW-on-HANA will form an important, but optional part of the HANA DW. If it is convenient for the purpose of the DW, then it should be used or added to a HANA DW. Another potential scenario is that an existing BW-on-HANA will gradually evolve into a HANA DW as it is complemented with other tooling in the fashion described above. The border line will be blurry. In any case, BW-on-HANA will extend and enhance its existing functionality enabling more and more direct SQL access + options and leveraging / interacting with the HANA repository. A stand-alone BW-on-HANA system, as it exists today, can be considered as a special instance of a HANA DW. It will continue to exist, evolve, excel. Anyone investing into BW-on-HANA today is on a safe track.
The Role of HANA Vora and Hadoop: the HANA Big DW.
Many customers are looking at ways to complement existing data warehouses with Hadoop. HANA Vora will play a pivotal role in combining the HANA and Hadoop platforms. Therefore, HANA Vora will allow to extend the HANA DW into a HANA Big DW (current working title). We will elaborate on that at a later stage.
* Please consider HANA SQL here as a placeholder comprising all sorts of more specialised languages and extensions like MDX, SQLscript, calc engine expressions etc.