Werner Daehn

Hana Smart Data Integration – Transactional consistency and delta handling

When it comes to comparing SDI with other solutions, one major difference is the support for transactional consistency. Some examples where this comes in handy:

1. The source has an order table and a reserved goods table. Whenever something is ordered, the goods are reserved and taken out of stock, and both changes are committed together. So a user in the source system will always see the orders and the reserved goods in sync.

If both tables are copied independently, a user querying the sum of orders and the sum of reserved goods will constantly see those two numbers differ, as new orders come in all the time.

Transactional consistency across tables.
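A minimal sketch of this scenario, assuming two simplified tables named orders and reserved_goods with a quantity column (all names are hypothetical), could look like this:

insert into orders values ('O100', 'MAT1', 5);
insert into reserved_goods values ('O100', 'MAT1', 5);
commit;

-- in the source this difference is always zero because both inserts are committed together;
-- if the two tables are replicated independently, it fluctuates in the target
select (select sum(quantity) from orders) - (select sum(quantity) from reserved_goods) from dummy;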

2. In the financial booking table every debit booking has a matching credit booking, and both are committed together. Hence the sum of all bookings is always zero; money is only moved from one account to another. It would be outright wrong if the data were replicated and committed differently in the target.

Transactional consistency within tables.
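A minimal sketch, assuming a single bookings table with an amount column (hypothetical names):

insert into bookings values ('B1', 'account_a', -100); -- debit
insert into bookings values ('B2', 'account_b', 100);  -- matching credit
commit;

-- this only returns zero at all times if both rows are also committed together in the target
select sum(amount) from bookings;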

3. A new customer called and ordered something. So first the customer master data was entered and a minute later a sales order was placed. If both tables are replicated independently, it might happen that the target contains a sales order for a customer that does not exist yet.

Transactional order.
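A minimal sketch, assuming simplified customer and sales_order tables (hypothetical names):

insert into customer values ('C1', 'New Customer Inc.');
commit;
-- a minute later
insert into sales_order values ('O1', 'C1', 'some material');
commit;

-- with independent replication of the two tables, this query can temporarily return
-- orphaned orders in the target, something a source user would never see
select o.order_id from sales_order o left outer join customer c on o.customer_id = c.customer_id where c.customer_id is null;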

4. In the source system a new material was entered, hence inserted into the database but not saved/committed. The user went to lunch and saved it 30 minutes later. The record in the database has a create date that lies 30 minutes in the past, but whoever queries the table does not find the record until it is committed. This is a major problem if, for example, a delta process relies on this timestamp: it reads all changes with a recent create date, does not see this record for 30 minutes, and by then the record is no longer recent.

Long running transactions.
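A minimal sketch of why this breaks a timestamp-based delta, assuming a material table with a create_date column and a delta job that remembers its last run time (hypothetical names):

-- 12:00: the record is entered, create_date is set, but the transaction is left open
insert into material values ('M1', 'new material', current_timestamp);
-- 12:30: the user returns from lunch and commits
commit;

-- a delta run at 12:10 cannot see the row yet, and the run at 12:40 skips it as well,
-- because its create_date (12:00) is already older than the 12:10 watermark
select * from material where create_date > :last_delta_run;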

All of the above examples are huge problems for any ETL tool on the market. But because those tools ignore the problems above, they can be very fast during the initial load and partly for the delta loads.

With SDI the initial load reads in parallel for maximum speed and, depending on the settings, works just like any other ETL tool. But during the initial load all changes in the source tables are already being recorded and are later merged into the target to reach a consistent state. The subsequent realtime push of the changes happens in a transactional fashion, honoring cases 1-3 above.

This is possible because the adapter sends all change rows of all tables in the correct order, and a commit is a change row of its own. Hence the Hana Data Provisioning server can replay the changes within the same transactional brackets and in the same order.

This can be seen in the following example, where insert-loader semantics are used:

The execution order in the source was:

(first row, loaded via the initial load)

insert into address_input

values ('1', 'Innsbrucker Straße 34', 'Salzburg', '5070', 'AT');

commit;

(change 1)

insert into address_input

values ('2', 'North 1st Street', 'San Jose', '????', 'US');

commit;

(change 2)

insert into address_input

values ('3', 'Hill Rd Drive 532', 'LA', '90003', 'US');

commit;


(changes 3 and 4: the update generates a before image row with the old values and an after image row with the new values)

update address_input

set "postalcode" = '95121' where "customer_id" = '2';

commit;


(change 5)

delete from address_input

where "customer_id" = '1';

commit;


After all these changes, the records in the target look like this:

customer_id = 1: Had been inserted into Hana at 03:26:55 and deleted at 03:28:56

customer_id = 2: Inserted at 03:28:02 and updated 03:28:44

customer_id = 3: Inserted at 03:28:22

(Screenshot: target table contents after the changes above)

Preserving the order is key here, e.g. customer_id=2 has to be inserted before being updated, otherwise the result would not make sense.

Looking at the CHANGE_TIME more closely reveals more details.

First, all rows within the same source transaction have the same value. In the above example each change was committed individually, hence all changes have a strictly ascending timestamp, except for the update, where a before image and an after image row were generated. These two rows do not merely have similar values because they followed each other so quickly; they have the exact same value, down to the nanosecond, because they belong to the same transaction.

This is more obvious when executing a few more changes in the source database.

(change 6)

insert into address_input

values ('4', 'Shoreline Bld 1', 'Los Angeles', '90002', 'US');


(change 7)

insert into address_input

values ('5', 'Page Mill Dr', 'Mountain View', '95111', 'US');


(change 8 & 9)

update address_input

set "street" = 'Page Mill Dr 435' where "customer_id" = '5';


(change 10; as this is executed on the Hana system against the virtual table, the insert statement uses the Hana(!!) current timestamp)

insert into address_input

values ('control', now(), now(), null, null);


commit;

(Screenshot: all rows of this source transaction share the same CHANGE_TIME in the target)

All these changes have the same CHANGE_TIME although they were entered over a period of 5 minutes.

And the last row proves the value does not reflect the change time in the source: it was inserted at 03:36:49 in the source but received a CHANGE_TIME of 03:37:42 in the target.

Hence the CHANGE_TIME is not only strictly ascending, it also has no issues with system times being slightly off between different databases, different time zones, or the like.

The reason is simple: this value comes neither from the source nor from the adapter but from the Data Provisioning server process of Hana. The DP server receives all change rows from the adapter and queues them up until a commit row is received. That is the point in time at which the CHANGE_TIME column gets its value, the current timestamp (down to the nanosecond) of the Hana(!) instance, and the applier process starts merging the data into the target tables.

Hence it is the perfect column for a timestamp-based delta in subsequent processes.
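For example, a downstream delta process could read only the new rows like this (a sketch; the target table name and the bookmark variable are assumptions, only the CHANGE_TIME column is taken from the text above):

-- remember the highest CHANGE_TIME processed so far and read everything newer on the next run
select * from address_target where CHANGE_TIME > :last_processed_change_time order by CHANGE_TIME;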

Note: The above works identically when multiple tables are involved. The examples show the behavior with a single table for better readability only.

Comments
      Marc Clijsters

      Good morning,

      I have a question about the moment at which consistency is reached and how this impacts the other subscriptions.

      The initial load of a table consists of multiple phases:

      • queue changes
      • copy content of the table
      • distribute changes that were made to the table during the copy of the content (state = auto_correct)
      • normal replication (state = apply_change_data)

      Let me illustrate this with an example.

      We have been replicating the vbap table for quite a while and now want to add the vbak table. Clearly these tables are logically linked.

      At which moment during the process is consistency reached?

      Does the auto_correct phase replay the entire queue or only the vbap changes?

      Are the other subscriptions halted during the auto_correct phase?

      Regards,

      Marc

      Werner Dähn

      First, a qualifier: I am not 100% sure, just 90%. But when testing you can find out easily by looking at the various states.

       

      alter subscription s_vbak queue;

      This command tells the adapter to start capturing changes in the source and send those to the Hana SDI queue. I believe it also tells the SDI applier, the component that reads the queue and writes into VBAP, to pause. So this is the point in time from which the VBAP table would no longer be updated.

       

      Then you can do whatever you like, e.g. perform the initial load of the VBAK table.

       

      alter subscription s_vbak distribute;

      Now the SDI queue is read by the applier, and VBAK and VBAP get updated with the changes that happened since. In other words, the two tables will be loaded with the same transactional order and commit points as in the source, creating a perfect replica. As the applier finds the markers for auto_correct start and end only in the VBAK table, only that table is upserted.
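      Putting the steps together, a minimal sketch of the sequence (using the short command form from above; vbak_target and the virtual table v_vbak are hypothetical names for the initial load):

      alter subscription s_vbak queue;              -- adapter starts capturing VBAK changes, the applier pauses
      insert into vbak_target select * from v_vbak; -- initial load of VBAK via the virtual table
      alter subscription s_vbak distribute;         -- applier resumes and replays the queued changes in source order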

       

      Can you validate that with a simple example?