SAP HANA & Data Warehousing for non-experts
First published on March 27, 2019.
A quick blog, triggered by two questions from a customer last week:
- What exactly is SAP HANA?
- What does SAP mean by “modern data warehousing”?
The answers to these questions may interest a broader audience of non-SAP-experts, so let’s briefly discuss these topics, illustrated with slides from openSAP courses. In the top right corner of each image you will see something like “(bw4h2, w2u1)” in purple letters. This means the slide is taken from week 2, unit 1 of a training you can find at the following URL: https://open.sap.com/courses/bw4h2. You are smart enough to derive the other references.
What is SAP HANA?
I can think of 3 answers to this question:
- it is a database;
- it is an environment for data modeling;
- it is an enabler of SAP’s view on the “Intelligent Enterprise”.
HANA as a database
Databases used to have issues with reading from and writing to the same tables simultaneously. That is why transaction systems for writing data – On-line Transaction Processing or OLTP – and reporting systems for reading data – On-line Analytical Processing or OLAP – are traditionally separated. It was also believed that the database technology for the two types of processing had to be very different.
Research at the Hasso Plattner Institute, connected with SAP via its founder, indicated that workloads for OLTP versus OLAP systems are in fact not that different (see image above). One can use the same database technology for both processing types. The HANA database is such a “dual purpose” database. It is “in-memory” and “column-based”, delivering great processing speed and making it possible to read and write simultaneously. One can argue whether SAP was the first to come up with this technology (probably not) or whether it is technically the best (maybe not), but SAP has been quite decisive in bringing it to market and integrating it with all its software products. Better processing speed is great to have, as all PC users will confirm. But it also makes it possible to do things differently. More “virtual” processing of data. Or processing in “full mode” instead of “delta mode”. I will address this further on.
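For readers who wonder what “column-based” means in practice, here is a toy sketch in Python. It is not how HANA is implemented; plain lists stand in for database storage, and the table, column and function names are invented for this illustration. It only shows the basic idea: an aggregate over one field reads a single column instead of every full record, while a write still works by appending one value per column.

```python
# Toy illustration only: plain Python structures stand in for database storage.
# All table, column and function names are made up for this example.

# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "customer": "A", "amount": 100},
    {"order_id": 2, "customer": "B", "amount": 250},
    {"order_id": 3, "customer": "A", "amount": 50},
]

# Column-oriented layout: each column is stored together.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["A", "B", "A"],
    "amount": [100, 250, 50],
}


def total_amount_row_store(table):
    # Touches every full record just to read one field.
    return sum(record["amount"] for record in table)


def total_amount_column_store(table):
    # Reads only the one column the aggregate needs.
    return sum(table["amount"])


def insert_column_store(table, record):
    # A write appends one value to each column.
    for name, value in record.items():
        table[name].append(value)


new_order = {"order_id": 4, "customer": "C", "amount": 75}
rows.append(new_order)
insert_column_store(columns, new_order)

print(total_amount_row_store(rows))        # 475
print(total_amount_column_store(columns))  # 475
```

Both layouts give the same answer; the difference lies in how much data has to be touched to get it, which is what matters for reporting on large tables.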
HANA as an environment for data modeling
Having access to a HANA database with a development tool like Eclipse + HANA Studio add-ons gives you the opportunity to do powerful data modeling. The most powerful objects are probably “Calculation Views”. An example is shown below.
Standard data warehouse transformations, like various types of joins, unions and aggregations, can be developed in a graphical environment. Mostly just drawing lines and arrows and clicking on little icons. A child could do it. Well, almost. More complex transformations are of course possible, especially if the option “Script” is chosen for the calculation view. In that case, SQL expertise is required. This type of development is often called “Native HANA (Modeling)”.
And the good news: it is all virtual! The data remains in the tables where the calculation view starts from.
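To make “virtual” a bit more tangible, here is a small analogy using an ordinary SQL view in SQLite, driven from Python. This is not HANA syntax and not a real calculation view; the tables, columns and the view name are invented for the example. The point is only that the view stores a query (a join plus an aggregation), not a copy of the data, so it always reflects what is currently in the underlying tables.

```python
import sqlite3

# In-memory SQLite database as a stand-in; all names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APJ');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 200.0);

    -- The view stores only the query definition, not a copy of the data.
    CREATE VIEW revenue_by_region AS
        SELECT c.region AS region, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region;
""")


def revenue_by_region(connection):
    # Every call re-runs the join and aggregation on the base tables.
    query = "SELECT region, revenue FROM revenue_by_region ORDER BY region"
    return connection.execute(query).fetchall()


before = revenue_by_region(conn)
conn.execute("INSERT INTO orders VALUES (13, 2, 25.0)")  # new source data arrives
after = revenue_by_region(conn)

print(before)  # [('APJ', 200.0), ('EMEA', 150.0)]
print(after)   # [('APJ', 225.0), ('EMEA', 150.0)]
```

No reload or copy step was needed between the two queries: the new order shows up in the result because the result is computed from the source tables at query time.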
HANA as enabler of the “Intelligent Enterprise”
In recent presentations, SAP emphasizes its view on the “Intelligent Enterprise”, always using the slide shown below.
The “Intelligent Suite” consists of all of SAP’s transaction systems, like S/4HANA, C/4HANA (a cannibal that ate C4C, Hybris, Gigya and Callidus Cloud), Ariba and SuccessFactors. These systems are either Software-as-a-Service (SaaS) only, or SAP is pushing customers towards the SaaS version, as with the ERP system S/4HANA.
The “Intelligent Technologies” are loosely coupled offerings grouped under the marketing term “Leonardo”. These offerings include functionality in the following domains: Internet of Things (IoT), Artificial Intelligence (AI) / Machine Learning (ML), Data Science / Predictive Analytics and Blockchain. The analytics system “SAP Analytics Cloud” (SAC) with the Digital Boardroom on top and the Big Data solution “Data Hub” (see further on) are also sometimes covered by the Leonardo umbrella. And then there is unit 7 of the openSAP course “SAP Leonardo – Enabling the Intelligent Enterprise” (leo1) with the title “Data Intelligence”. I just finished rewatching the video for this unit looking for recognizable products or solutions … but still have no clue what it is about. As usual, some of these tools will stay, some will disappear. For the latter my money is on “Data Intelligence”. And probably Blockchain. And Digital Boardroom. Let’s stop here for the moment.
The “workhorse” of the Intelligent Enterprise is the “Digital Platform”, consisting of “Data Management” and the “Cloud Platform”. Data Management means … HANA. The “SAP Cloud Platform” (SCP), as it is called nowadays, used to be called the “HANA Cloud Platform” (HCP), as it relies heavily on HANA database(s) underneath. It is a powerful Platform-as-a-Service (PaaS) offering for data storage, data modeling and app development. This is good stuff!
My next two blogs will be more technical ones describing some of the work we did at Ciber on the SCP.
What does SAP mean by “modern data warehousing”?
Let’s start with what SAP definitely does not mean by it, and that is what I became particularly good at over the past 18 years: the old-school “Business Warehouse” system, also known as BW, pronounced “Bee Double-You”. At least not in its still widely used form, in which it does not run on a HANA database. Somewhat disrespectfully, SAP calls this configuration “BW on anyDB”. For years now, no development effort has been put into this configuration.
A first improvement is “BW on HANA” or “BW powered by HANA”. Replacing “anyDB” with a HANA database immediately brings improved data loading and query performance. In addition, higher software versions introduce new data warehousing objects that simplify and speed up development and promote the use of virtual data layers. In BW on HANA, old and new data warehousing objects coexist, leaving customers the option not to migrate to the new objects and thus to miss out on the benefits these objects bring.
The next step for customers is towards “BW/4HANA”, pronounced “Bee Double-You For HANA”. In this version, only the new data warehousing objects are available. As SAP states it, BW/4HANA “… leaves behind the legacy of SAP BW on anyDB”. Moving to this version will usually require a re-implementation instead of an upgrade type of migration. This version was first introduced in 2016, and not many customers have taken this step yet. And what is next? Rumor has it that SAP is working on a SaaS offering for data warehousing under the name “Project Blueberry”. Interesting, but not for now.
Evolution in a nutshell:
- “BW on anyDB” → “BW on HANA” → “BW/4HANA” → SaaS data warehouse
Leave the data where it is!
What SAP does mean by “modern data warehousing” can be summarized in one sentence: “leave the data where it is!”. At least, as much as possible. I am a firm supporter of this view. Moving data around and storing multiple versions of the same data not only raises data storage costs, but also – and in my view more importantly – introduces discrepancies between the data used for reporting and the data in the source system. Data transformation and integration are of course required, but should be done virtually as much as possible.
Mixed data warehousing BW + SQL
For data warehousing, SAP has lately been promoting a “mixed architecture with SAP BW/4HANA and SAP HANA”. See image below. “Native HANA modeling” described earlier on the left, more conventional BW on the right.
The idea is to use the best of both worlds by developing “mixed scenarios” that are partially Native HANA and partially BW. The great strength lies in the ability to “store data once – use multiple times”. Actually storing data in the warehouse needs to be kept to a minimum. A new word is now used for this thing to avoid: “data persistency”. In practice, the improved processing speed of the HANA database has its limitations, and on occasion intermediate data persistency is required to deliver sufficient performance in the analytics tools put on top.
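The trade-off between virtual processing and “data persistency” can be sketched with a small SQLite example in Python. Again, this is an analogy and not SAP tooling; the table, the view and the persisted copy are all invented names. One result is defined as a view (virtual), the other is copied into its own table (persisted). When the source changes, only the virtual one follows.

```python
import sqlite3

# In-memory SQLite database as a stand-in; all names are made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product TEXT, amount INTEGER);
    INSERT INTO sales VALUES (1, 'bike', 500), (2, 'bike', 300), (3, 'helmet', 40);

    -- Virtual: only the query is stored, the data stays in "sales".
    CREATE VIEW sales_by_product_virtual AS
        SELECT product, SUM(amount) AS total FROM sales GROUP BY product;

    -- Persisted: the result is copied into a second table.
    CREATE TABLE sales_by_product_persisted AS
        SELECT product, SUM(amount) AS total FROM sales GROUP BY product;
""")


def totals(connection, relation):
    # "relation" is one of the two fixed names above, never user input.
    query = f"SELECT product, total FROM {relation} ORDER BY product"
    return dict(connection.execute(query).fetchall())


# The source system changes after the persisted copy was made.
conn.execute("UPDATE sales SET amount = 350 WHERE sale_id = 2")

virtual = totals(conn, "sales_by_product_virtual")
persisted = totals(conn, "sales_by_product_persisted")

print(virtual)    # {'bike': 850, 'helmet': 40}
print(persisted)  # {'bike': 800, 'helmet': 40}
```

The persisted copy can be read without recomputing anything, which is why it is sometimes needed for performance, but it stays out of date until it is reloaded. That is the kind of difference between reporting data and source data that the “leave the data where it is” principle tries to avoid.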
Data warehouse + data lake
SAP is struggling with how to enter the world of “Big Data”. Unstructured data, e.g. from social media or IoT, is stored as a “data lake” in cheap storage systems like Hadoop and processed with Open Source tools. How to compete with stuff that is free? Well, basically by embracing this stuff. See image below.
SAP is investing heavily in connectivity between Hadoop and other data lakes on the one hand and SAP’s data warehouse, in which structured business data resides, on the other. If you can’t beat them, join them! Good thinking.
Orchestration by Data Hub
More embracing: SAP’s new product “Data Hub”. Quote: “SAP Data Hub provides data orchestration and metadata management across heterogeneous data sources”. See image below.
Data Hub uses SAP Vora technology, is well integrated with other SAP systems, and is particularly good at leaving the data where it is. It also cleverly integrates or orchestrates darlings of the Open Source community like Hadoop, S3, Kubernetes, Docker, Kafka, Spark, Python, Grafana, Kibana and more. Again, good thinking! A colleague of mine who has spent notably more years in the world of Big Data than I have sees potential in this product, even though he is far from an SAP fan. But it will be hard to convince him and other representatives of the Open Source community to actually pay for software.
I hope I gave my customer and others some insight into SAP’s views on these topics. Please feel free to comment, or contact me by e-mail: firstname.lastname@example.org.