Near real-time SAP HANA data replication with SAP Data Intelligence
Out of the box, SAP Data Intelligence comes with a trigger-based near real-time data replication mechanism for SAP HANA, similar to the SAP Landscape Transformation Replication Server.
The magic lies within the SAP HANA Monitor operator. It watches an SAP HANA table and outputs newly inserted rows as messages. To do so, it creates a trigger on the table that copies each inserted row into a temporary table. That temporary table is periodically queried and cleared, so it always contains exactly the rows inserted since the last poll. If there were no new rows since the last query, no output is produced.
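The mechanism can be illustrated with a small, self-contained simulation. This is not the operator's actual implementation; it uses Python's built-in sqlite3 in place of SAP HANA, and all table, trigger, and column names here are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE iot_readings (device TEXT, temperature REAL)")
# Shadow table that the monitor polls; the trigger copies every insert here
con.execute("CREATE TABLE iot_readings_temp (device TEXT, temperature REAL)")
con.execute("""
    CREATE TRIGGER iot_readings_monitor AFTER INSERT ON iot_readings
    BEGIN
        INSERT INTO iot_readings_temp VALUES (NEW.device, NEW.temperature);
    END
""")

def poll():
    """One polling cycle: read the rows inserted since the last poll,
    then clear the shadow table. An empty result means no message."""
    rows = con.execute(
        "SELECT device, temperature FROM iot_readings_temp"
    ).fetchall()
    con.execute("DELETE FROM iot_readings_temp")
    return rows

con.execute("INSERT INTO iot_readings VALUES ('sensor-1', 21.5)")
print(poll())  # [('sensor-1', 21.5)]
print(poll())  # [] because nothing was inserted between the two polls
```

Note that the source table itself is never scanned by the poller; only the (usually tiny) shadow table is read, which is what keeps the per-poll cost low.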
As an example, I use my IoT scenario from Restore your NEO Internet of Things Service scenarios with Apache Kafka and the SAP Data Hub, now of course leveraging my SAP Cloud Platform Trial Account on Cloud Foundry.
When I look at the table definition, I see that a trigger statement has been added:
CREATE COLUMN TABLE "DBADMIN"."T_IOT_985EAC8DCBD198C302F5" (
    "G_DEVICE"      NVARCHAR(255),
    "G_CREATED"     LONGDATE,
    "C_TEMPERATURE" DOUBLE
) UNLOAD PRIORITY 5 AUTO MERGE;

CREATE TRIGGER "DBADMIN"."T_IOT_985EAC8DCBD198C302F5_314021906"
AFTER INSERT ON "DBADMIN"."T_IOT_985EAC8DCBD198C302F5"
REFERENCING NEW ROW NR FOR EACH ROW
BEGIN
    INSERT INTO "T_IOT_985EAC8DCBD198C302F5_TEMP_284535295" (G_CREATED, C_TEMPERATURE)
    VALUES (:nr.G_CREATED, :nr.C_TEMPERATURE);
END;
And subsequently, with each reading inserted into my IoT table, I get a corresponding message:
So this is in fact no magic, but it works remarkably well without any manual trigger creation or temporary table management.
Question #1: What is the performance impact on the source system when, for each insert/update/delete, the data of all n columns is inserted into yet another table? Instinctively I would argue it requires twice the resources: before, one insert wrote one row; now it writes two.
Question #2: Suppose I'd like a latency of one second and I am monitoring 5,000 tables. Then, just to learn that there were no changes at all, 5,000 individual queries have to be executed against the source system every second.
Question #3: How does this solution maintain transactional consistency? A new sales order header plus its line items are written in one ERP transaction, but the queries read the shadow tables individually and post the data individually. Hence the transactional relationship is lost.
Am I right with these concerns, meaning this solution is only for the most trivial use cases, or are there features that have not been discussed yet?
Thanks in advance for any insight!
Thank you for your very valid questions:
#1: Correct, but only for the tables that you monitor, which should be a fraction of all the tables with frequent inserts, updates or deletes.
#2: For me, this works for replicating a few dozen tables into Microsoft Azure for analytical purposes, where the latency does not really matter.
#3: The transactional relationship might be broken for the few most recently replicated records, but it is restored once the dependent records have been replicated too. For my analytical requirements this is sufficient.
These are all valid points, but note that they are also points of concern when using SLT.
I’d argue they’re points of concern whenever trigger-based replication is used, no matter the source.
So just think of the HANA Monitor operator as the way to enable trigger-based replication on native SAP HANA, with all the pros and cons that brings.
My main point of concern is not the technology but rather licensing. This type of direct DB access requires a full-use HANA license (aka Enterprise HANA). And if the data being pushed is SAP application data, then because this is a push method it would break the Indirect Static Read rule of no real-time streaming, and hence it would (in theory) require the user accessing this data on the target side (e.g. a data lake) to be an SAP named user. It sounds paradoxical, but that is what the SAP Software Usage Rights state. I am afraid that, the way SAP licensing is written as of now, there won’t be much usage for this type of approach.
Good point, Henrique.
Of course the business value has to outweigh the investment, at least in the long run. This is not always easy to articulate, but as an example I described The business case for IT architecture. I hope this might help.
Does it also take care of updates and deletes?