werner_daehn

The desire to enable Business Intelligence on current data has always been present, and multiple approaches have been suggested. One thing they had in common: they failed miserably because they never met expectations.

With Hana, we finally have all the technical building blocks to implement that vision.

Requirements

  1. Speed: A Business Intelligence solution with response times greater than a second will not be used by customers.
  2. External data: Reporting within a single application is not Business Intelligence, it is dumb reporting. BI means comparing data, and the more there is to compare with, the more intelligent the findings will be.
  3. Historically correct data: If past data is reported on, the result should stay the same, even if the master data has changed since. For example, last year's revenue per customer region should remain the same although a customer has moved to a different location.
  4. Data consistency: When data is filtered or grouped by a column, this column should have well-defined values, not duplicates with different spellings. Consistency between tables matters as well, e.g. a sales line item without a matching sales order row would be a bad thing.

The goal should obviously be all green in each of the categories:

[Scorecard: Speed | External Data | Historically Correct Data | Data Consistency]

What has been suggested in the past

To accomplish realtime Business Intelligence, two major approaches have been suggested: near-realtime loads and Enterprise Information Integration (EII).

The idea of a near-realtime data warehouse is simple: instead of loading the Data Warehouse once every night, load it every hour. Not very "near" realtime? Then load it every 5 minutes, every minute, every second even.

This approach is feasible down to a certain frequency, but how long does a Data Warehouse delta run take? Part of the time is driven by the data volume, for sure. But assuming the data is loaded so frequently that most of the time there are no changes in the source system at all, this factor can be reduced to zero. Most of the time is usually spent finding out what has changed: one table has a timestamp-based delta, hence a query reading all rows with a newer timestamp is executed; for another table a change log/transaction log is read; and the majority of tables have no change indicator at all, hence they are read entirely and compared with the target.
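As a minimal sketch, assuming a hypothetical source table SALES_ITEMS with a CHANGE_TS column and a warehouse copy DWH_SALES_ITEMS, the two delta strategies look roughly like this:

```sql
-- Timestamp-based delta: cheap, but only possible if the source maintains CHANGE_TS.
SELECT *
  FROM SALES_ITEMS
 WHERE CHANGE_TS > :last_load_timestamp;

-- Full compare: required when there is no change indicator at all.
-- Every source row is read and compared against the warehouse copy.
SELECT s.*
  FROM SALES_ITEMS s
  LEFT OUTER JOIN DWH_SALES_ITEMS t
    ON t.ITEM_ID = s.ITEM_ID
 WHERE t.ITEM_ID IS NULL      -- new row
    OR t.AMOUNT <> s.AMOUNT;  -- changed row (one column shown for brevity)
```

The full compare is exactly what makes frequent delta runs so expensive: its cost is driven by the table size, not by the number of changes.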

The above logic does not only take time, it costs resources as well. The source is constantly polled, "Is there a change?" "Is there a change?" "Is there a change?", for every single table.

While this approach has all the advantages of the Data Warehouse (fast query response times, no issue with adding external data, no issue preserving historical data), it is simply not feasible to build, aside from exceptional cases.

[Scorecard: Speed | External Data | Historically Correct Data | Data Consistency]

Another idea, which became popular in the mid-2000s, was to create a virtual data warehouse: a simple-to-understand data model is built via views, but no data is loaded into that model; instead, the data is queried from the various sources on request. Hence the name Enterprise Information Integration (EII). All the complexity of the transformations is handled inside the database views instead of in an ETL tool. As the source data is queried directly, the result is current per definition, and the entire delta logic can be spared.

This works as long as the queries against the source systems are highly independent, e.g. System1: select quarter, sum(revenue); System2: select quarter, business_year, and as long as each source system can produce its result quickly enough.
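As a sketch, assuming two hypothetical remote tables, such an EII view could look like the following; it performs well only while each branch can be answered independently and quickly:

```sql
-- Hypothetical EII view: each branch is answered by a different source system.
CREATE VIEW V_REVENUE_PER_YEAR AS
SELECT c.business_year, SUM(f.revenue) AS revenue
  FROM SYSTEM1_SALES f       -- remote table residing in System1
  JOIN SYSTEM2_CALENDAR c    -- remote table residing in System2
    ON c.quarter = f.quarter
 GROUP BY c.business_year;
```

The moment large intermediate results have to be shipped between the systems to execute that join, response times collapse.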

For typical Business Intelligence queries, neither condition is usually fulfilled.

Also, the amount of transformation often has to be cut down, otherwise query speed would suffer even more. A common example would be standardizing values or finding duplicates in the master data. These things are either done during data entry, slowing down the person entering the data, or not done at all, with a negative impact on the decisions being made due to wrong assumptions.
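As a minimal sketch, assuming a hypothetical mapping table CITY_NAME_MAPPING, standardizing a column inside the view adds yet another join that every single query has to pay for:

```sql
-- Hypothetical standardization done at query time instead of at load time.
SELECT COALESCE(m.STANDARD_NAME, c.CITY_NAME) AS CITY_NAME,
       SUM(o.REVENUE) AS REVENUE
  FROM ORDERS o
  JOIN CUSTOMERS c
    ON c.CUSTOMER_ID = o.CUSTOMER_ID
  LEFT OUTER JOIN CITY_NAME_MAPPING m   -- maps raw spellings to one standard value
    ON m.RAW_NAME = c.CITY_NAME
 GROUP BY COALESCE(m.STANDARD_NAME, c.CITY_NAME);
```

In an ETL process the same lookup runs once per changed row at load time; in EII it runs for every report execution.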

Hence, although the idea as such has its merits, it died quickly due to the bad query performance.

[Scorecard: Speed | External Data | Historically Correct Data | Data Consistency]

The situation with Hana

Data Federation - EII

From a technology point of view, Hana supports EII; there it is called Smart Data Access (Data Federation). The pros and cons of EII remain the same, however. When reading from a Hana virtual table, the required data is requested from the remote database, hence the overall query performance depends on the amount of data to be transferred, the time the remote database needs to produce the data, and the time to build the final query result in Hana.
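For illustration, a Smart Data Access setup consists of a remote source plus virtual tables; the adapter, connection details, and object names below are placeholders:

```sql
-- Register the remote database as a source (connection details are placeholders).
CREATE REMOTE SOURCE "ERP_DB" ADAPTER "odbc"
  CONFIGURATION 'DSN=ERPDSN'
  WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=REPORT;password=secret';

-- Expose a remote table as a Hana virtual table. Queries against it are
-- forwarded to the remote database at execution time.
CREATE VIRTUAL TABLE "BI"."VT_SALES_ORDERS"
  AT "ERP_DB"."<NULL>"."ERPSCHEMA"."SALES_ORDERS";
```

From then on, "BI"."VT_SALES_ORDERS" can be used in views like any local table, with the performance caveats described above.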

And as only data that is still present can be queried, and changes in an ERP system are usually just that, in-place changes, there is no history available way too often.

[Scorecard: Speed | External Data | Historically Correct Data | Data Consistency]

Sidecar - S/4Hana

As a temporary workaround, until the ERP system itself runs on Hana and therefore benefits from the Hana query performance, the Side-by-Side scenario is used. The idea is to copy the source database to Hana and keep it updated in realtime; all the queries that would take too long on the other database are executed within that Hana box instead. And once the entire ERP system runs on Hana, those queries can be kept unchanged but run against the ERP system tables directly.

So basically this is reporting on the ERP tables directly. Due to the raw computing power of Hana the speed is much better and this became feasible again, but it is not as fast as a data model optimized for queries. The reasons for this are listed in the blog post: Comparing the Data Warehouse approach with CalcViews - Overview.

Another issue is, again, the history of changes. If a sales order entered last month got updated and the amount was reduced from 400USD to 300USD, the sum of revenue for last month is now different from what it was yesterday. In BW you would instead see the old amount of 400USD in last month plus another row with the amount of -100USD for today; hence the data is historically correct, as sketched below.
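A minimal sketch of such an additive delta, assuming a hypothetical fact table SALES_FACTS; the correction is stored as a new row rather than an update:

```sql
-- Hypothetical content of SALES_FACTS after the correction:
-- ORDER_ID | POSTING_DATE | AMOUNT
-- 4711     | 2015-05-10   |  400     -- original entry, last month
-- 4711     | 2015-06-18   | -100     -- correction row, posted today

-- Last month's revenue stays historically correct: 400
SELECT SUM(AMOUNT) FROM SALES_FACTS
 WHERE POSTING_DATE BETWEEN '2015-05-01' AND '2015-05-31';

-- The overall total matches the source system's current truth: 300
SELECT SUM(AMOUNT) FROM SALES_FACTS;
```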

[Scorecard: Speed | External Data | Historically Correct Data | Data Consistency]

Realtime Data Warehouse

One feature Hana got with the Smart Data Integration option is the ability to combine realtime feeds with transformations. Previously this was not possible with any other tool because of the complexity; realtime had been used as a synonym for replication, meaning the source data is copied 1:1 into the target, just like in the above sidecar approach, and thus the downsides of EII are inherited. But with Hana, a realtime subscription can push the data into a task instead of a table; inside the task the data is transformed and loaded into the query-optimized Data Warehouse data model.
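A rough sketch of those mechanics, loosely following the Smart Data Integration SQL syntax; the subscription, task, and table names are invented for illustration:

```sql
-- Push every change of the virtual table into a transformation task
-- instead of replicating it 1:1 into a copy of the source table.
CREATE REMOTE SUBSCRIPTION "SUB_SALES_ORDERS"
  ON "BI"."VT_SALES_ORDERS"
  TARGET TASK "BI"."TASK_TRANSFORM_SALES";

-- Start capturing changes, then apply the backlog and switch to realtime.
ALTER REMOTE SUBSCRIPTION "SUB_SALES_ORDERS" QUEUE;
ALTER REMOTE SUBSCRIPTION "SUB_SALES_ORDERS" DISTRIBUTE;
```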

Therefore the advantages of realtime and of a Data Warehouse are combined without introducing more complexity:

  1. The query speed is that of Hana, and all complex transformations are executed whenever the data is changed, not every single time somebody queries the data.
  2. External data is no problem; new sources can be added and harmonized with the other data easily.
  3. Historically correct data is possible as well: either a change triggers an update in Hana, or the change information is added as a new row. In other words, a task might either load the target table directly or use a History Preserving transform in front of the target table (see the sketch after this list).
  4. Data consistency is no problem either. A realtime push of the source data preserves the transactions of the source, so if a sales line item is added and hence the sales order's total amount updated, both changes happen in one transaction in the source and in Hana; the Smart Data Integration feature takes care of that. Also, all transforms needed to standardize the data are available. Their execution takes a while, but that does not matter, as they process the changed data only, not all of it, and only once, not every time somebody queries the data.
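For illustration, a history-preserving target table typically carries validity columns, so point-in-time queries stay simple; the structure below is hypothetical:

```sql
-- Hypothetical target maintained by a History Preserving transform:
-- CUSTOMER_ID | REGION | VALID_FROM | VALID_TO
-- 1001        | EMEA   | 2014-01-01 | 2015-03-31   -- before the relocation
-- 1001        | APAC   | 2015-04-01 | 9999-12-31   -- current record

-- Revenue per region as it was at posting time, unaffected by later moves:
SELECT c.REGION, SUM(f.AMOUNT) AS REVENUE
  FROM SALES_FACTS f
  JOIN CUSTOMER_HISTORY c
    ON c.CUSTOMER_ID = f.CUSTOMER_ID
   AND f.POSTING_DATE BETWEEN c.VALID_FROM AND c.VALID_TO
 GROUP BY c.REGION;
```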
[Scorecard: Speed | External Data | Historically Correct Data | Data Consistency]

S/4Hana with external data

Using the above technologies, Federation and Realtime Transformations, external data can be added to the S/4Hana database in a separate schema. This allows picking the proper technology for each case. For example, it was said that Federation only works when the amount of data returned is small; very often the remote dataset is tiny anyhow, hence Federation is perfect. And if it is not, the data can be brought into Hana in realtime, either by simply copying the data, and hence having to do all the harmonization with the ERP data at query time, or, even better, by pushing the realtime changes into a task object which does all the harmonization already. Then the resulting view is as simple as a union-all of two identical table structures, both being in Hana already (see the sketch below).
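As a sketch, once the realtime task has shaped the external data into the same structure as the ERP data, the consumption view stays trivial; all names are hypothetical:

```sql
-- Both branches live in the same Hana database, so the union is cheap.
CREATE VIEW "BI"."V_ALL_SALES" AS
SELECT ORDER_ID, REGION, AMOUNT, 'ERP' AS SOURCE
  FROM "ERP"."SALES_ORDERS"            -- local S/4Hana tables
UNION ALL
SELECT ORDER_ID, REGION, AMOUNT, 'EXTERNAL' AS SOURCE
  FROM "BI"."EXTERNAL_SALES_ORDERS";   -- maintained by the realtime task
```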

While this approach allows for all the flexibility on the external data, the local ERP data has the same issues as before: missing history, data consistency concerns, and suboptimal speed due to the number of transformations done in the view.

Theoretically, realtime replication from the ERP system into another schema of the very same Hana database could be enabled to preserve the history. But that will not be liked a lot.

[Scorecard: Speed | External Data | Historically Correct Data | Data Consistency]