Getting SAP ERP Data Into HANA The Smart Way
“Data is the fuel of the digital economy”. I like that quote as it points out the importance of data in its various aspects, good quality fuel versus a swamp, the more fuel the more power etc. Yet there is the question on how to get ERP data into HANA easily, preferably as easy as putting a hose into the car tank. The answer is the Smart Data Integration feature of HANA.
One obvious answer is to migrate the entire ERP system to HANA – neat and simple. Customers not running their ERP system on HANA can still benefit from the unique advantages by putting HANA next to the ERP system and loading the required data into it. And even if S/4HANA is used already, the ERP system’s main goal is to support the daily operation, not some complex and long running analysis, potentially taking away significant portions of the resources. A simply way to ensure a strict separation of these two workloads is by having a second dedicated server, isn’t it?
One large server running the ERP system with high availability, enough power for the daily operations, mission critical.
A smaller one for the analytic use cases and a lower SLA. This “smaller one” could even be a HANA Cloud Instance.
To get data into HANA, all that needs to be done is installing the SAP Smart Data Integration Agent from the Service Market Place download site. This agents hosts a set of adapters, some of them can be used to connect to an ERP system (and many other adapters for various other sources), and allow the HANA instance to reach into the source system. Basically these adapters act as a translator from the HANA syntax to the source database language and vice versa.
1:1 Realtime Replication
If the goal is just to copy ERP data 1:1 into HANA, this can be accomplished using the RepTask UI quickly. It shows all objects of the source, the user picks the ones containing the needed data and decides if the copy should be a one-time copy (initial load only) or a realtime replication of later changes.
This UI creates the target tables, does setup the realtime replication, performs the initial load, deals with changes that happened during the initial load,… it does everything.
(see the post Realtime Table Replication for details)
More than simple data replication
Once the data is made available, the next step would be to define the KPIs and the logic how to calculate those based on the available data. For example the VBAP.NETWR value is of little help although it contains the net value of each order line as there are more things to consider for an order. This order line could have been cancelled, the entire order might be cancelled, the order might be a goods return or a service order. To help the user and to make sure everybody talks about the same numbers when it comes to the sales order amount, this logic is added into calculation views of HANA and therefore executed whenever a user queries that CalcView.
It might be more clever to calculate the KPIs directly when loading the data, then they are readily available and are calculated once only. The same calculation, but performed in the data provisioning layer instead of the CalcView layer. This approach has some more advantages, technical details can be found here and actual performance measurements here.
To achieve that, the transformation is built in the FlowGraph UI, which is similar to the RepTask, but allows for transformations as well. There, the user has full control, can even define if the data should be loaded historically correct and the such.
The Virtual Data Model
At one point in time the question will be asked of “Why have the data twice actually?”. We answered that part already above, to protect the ERP system from spending too much resource for analytic queries and slowing down the operational users. And in the second part, to optimize the data for fast query by transforming it prior to loading it into the final table.
But there will be cases where there is nothing to transform – The list of Company Codes is a list, period. And reading those few lines from the ERP system will not slow it down at all. On top of that, how frequently does the data change?
The nice thing of above solution is that it provides all the options, for 1:1 replication, for realtime transformations but also reading the data directly from the source system. The adapter makes the remote data available in HANA in form of a so called Virtual Table, a HANA table with structure that can be read like any other table, but its data is coming directly from the source system, no data being copied. In fact the RepTask UI can be used to create the virtual tables as well.
Too good to be true?
I have made a few statements above that are worth looking at one level deeper. For example there was the idea of using a HANA Cloud instance and putting the data from the on-premise ERP system into that. How much work is it to enable that? How can a remote HANA system reach into the on-prem system even and if, in a secure manner?
When installing the Agent, the process hosting all adapters, near the source system, it has to be registered in the HANA system so the database knows about it. What is the difference between reaching a HANA system in the same network or a cloud instance? Only the protocol. A local database can be reached via TCP/IP directly, a cloud instance is using the https protocol to pass the corporate firewall. This is also a first level of defense. Since the agent is instantiating the connection to HANA, only this HANA database has a connection to the agent and its adapters.
(see the Architecture post of the solution)
Another question would be about the available adapters and how they capture the changes in realtime. Especially since I have experience with ERP data and ETL processes, this sounds almost impossible to do. So what is the truth here?
The answer is quite simple, it is all a matter of the adapters. They are responsible for the translation between HANA and the source system and how to get the data in realtime. And what kind of adapters exist?
- ECC Logreader Adapters for SQL Server, DB2, Oracle: The ERP tables themselves usually have no change indicator at all. Therefore these adapters read the data one level deeper, from the database transaction log, and return it in an ABAP datatype format. For ABAP Pool and Cluster tables, the database table gets decoded in addition.
- ABAP Adapter, the Extractors: This allows to use the ERP Extractors in order to get the data. None of these Extractors are realtime in the sense of pushing change information themselves, they just have a flag how often they can be queried. The so called Realtime Extractors are lightweight and can be queried frequently, other Extractors once or multiple times a day at least. The Adapter is using that information to invoke the change reading in this frequency.
- ABAP Adapter, the ABAP Tables: This does not support Realtime. It is used to query the ERP data and, in case the table has a change indicator, a batch delta can be implemented manually.
- SLT Adapter: This does not exist yet, an Adapter connecting to SLT and its trigger based change data capture. We have a working prototype but it is not released yet. In case there is an interest for this, please let us know in SAP ideaplace.
Using these HANA features everybody can take advantage of HANA now. And making data available in HANA, be it virtually, having a realtime copy or already transformed for the consumer’s convenience, has never been easier than today.
Hi Werner -
Is there a way to filter or search the table in the RepTask UI? When I tried to "Add Objects" and selected an ECC source - the list of tables are so long that it cut off at certain point without displaying everything that's available.