Werner Daehn

Hana Smart Data Integration – Inside Realtime streams

Most Hana Adapters support realtime push of changes, and not only for databases as sources but for everything, e.g. Twitter or adapters you wrote yourself. While the documentation contains all the information needed from an Adapter developer and user perspective, I'd like to show some internals that might be helpful.

As shown in the blog entry about adapters and their architecture (see Hana Smart Data Integration – Adapters) and in the Adapter SDK manual (SAP HANA Data Provisioning Adapter SDK – SAP Library), Adapters provide an interface to interact with the Hana database that revolves around remote sources (= the connection to the remote source system) and virtual tables (= the structure of the remote information).

For realtime, the remote subscription is the central Hana object.

Hana remote subscriptions

The syntax for this command is quite self-explanatory:

create remote subscription <subscriptionname> using (select * from <virtual_table_name> where …) target [table | task | procedure] <target_name>;

Hana sends the SQL select of that command to the Smart Data Access layer and, depending on the capabilities of the adapter, passes as much of it as possible to the Adapter. The Adapter can do whatever it takes to capture changes and send them to Hana, where the change rows are put into a target table, a target task (= transformation) or a target stored procedure.
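To make the capability-dependent pushdown more concrete, here is a minimal Python sketch of how a WHERE clause made of AND-ed predicates could be split between the adapter and Hana. It assumes, purely for illustration, that an adapter's capabilities can be reduced to the set of comparison operators it can evaluate; `Predicate`, `split_where` and `hana_side_filter` are hypothetical names, not part of Hana or the Adapter SDK.

```python
import operator
from dataclasses import dataclass

# Hypothetical operator table; real adapters declare their capabilities via the Adapter SDK.
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


@dataclass(frozen=True)
class Predicate:
    column: str
    op: str
    value: object

    def matches(self, row: dict) -> bool:
        v = row.get(self.column)
        # SQL semantics: a comparison with NULL is not true, so such a row is filtered out.
        return v is not None and OPS[self.op](v, self.value)


def split_where(predicates, adapter_ops):
    """Split an AND-ed WHERE clause into the part pushed to the adapter
    and the residual part Hana evaluates on the rows it receives."""
    pushed = [p for p in predicates if p.op in adapter_ops]
    residual = [p for p in predicates if p.op not in adapter_ops]
    return pushed, residual


def hana_side_filter(rows, residual):
    """Apply the residual predicates in Hana to the rows the adapter sent."""
    return [r for r in rows if all(p.matches(r) for p in residual)]
```

With an adapter that cannot filter at all (`adapter_ops=set()`), the whole condition stays in Hana, which is exactly the situation of the Twitter example further down. A row whose filter column is NULL does not pass, mirroring SQL's three-valued logic.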

The remote subscription object contains all of this information, as a query on the catalog object shows:

select * from remote_subscriptions;

[Screenshot: result of select * from remote_subscriptions]

As seen from the Adapter, the create remote subscription command above does nothing. Hana only performs basic logical validations: whether the virtual table exists, whether the SQL is simple enough to be pushed to the adapter, whether the target object exists and whether the selected columns match the target structure. All of these checks are performed on metadata Hana already has.
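The checks listed above work purely on catalog metadata. The following sketch, assuming a hypothetical catalog that maps object names to column lists, shows the kind of validation meant. `validate_subscription` and the `pushable` flag are illustrative only, and comparing column names position by position is a simplification of the real structure check.

```python
def validate_subscription(catalog, virtual_table, selected_columns, target, pushable):
    """Metadata-only checks for a new remote subscription; nothing here talks to the adapter.

    catalog maps object names to their column lists (hypothetical structure).
    pushable stands in for Hana's check that the select is simple enough to push down.
    """
    errors = []
    source_columns = catalog.get(virtual_table)
    if source_columns is None:
        errors.append(f"virtual table {virtual_table} does not exist")
    else:
        unknown = [c for c in selected_columns if c not in source_columns]
        if unknown:
            errors.append(f"unknown columns {unknown} in {virtual_table}")
    if not pushable:
        errors.append("select is too complex to push to the adapter")
    target_columns = catalog.get(target)
    if target_columns is None:
        errors.append(f"target {target} does not exist")
    elif list(selected_columns) != list(target_columns):
        # Simplification: only names and order are compared, not data types.
        errors.append(f"selected columns do not match the structure of {target}")
    return errors
```

An empty list means the subscription can be created; at this point the adapter has not been contacted at all.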

Activating a remote subscription

Replicating a table consists of two parts: the initial load, which gets all current data into the target table, and then applying the changes from that point onward. But in what order?

The usual answer is to set the source system to read-only, perform the initial load, activate change data processing and only then allow users to modify data again. As the initial load can take hours, such downtime is not appreciated. Therefore we designed the remote subscription activation to work in two phases.

The first phase is initiated with the command

alter remote subscription <subscriptionname> queue;

With this command the Adapter is notified to start capturing changes in the source. The adapter gets all the information required for that: a Hana connection to use and the SQL select, so it knows the table, the columns and potential filters. The only additional thing the adapter is required to do is add a BeginMarker row to the stream of rows being sent. This is a single line of code in the Adapter, and it tells the Hana receiver that from this point on the Adapter produces changes for this table.
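On the adapter side, the queue command therefore adds very little work. The class below is a hypothetical model, not the Adapter SDK API: `SketchAdapter`, `queue`, `distribute` and `on_source_change` are invented names. It only shows the ordering contract, i.e. the BeginMarker goes into the stream before the first change row of the newly subscribed table, plus the EndMarker that the distribute command described below adds.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRow:
    table: str
    change_type: str
    values: dict = field(default_factory=dict)


class SketchAdapter:
    """Hypothetical adapter model, not the Adapter SDK API."""

    def __init__(self):
        self.subscribed = set()
        self.stream = []  # rows sent to Hana, in order

    def queue(self, table):
        # Start capturing for this table; the only extra duty is the BeginMarker.
        self.subscribed.add(table)
        self.stream.append(ChangeRow(table, "BeginMarker"))

    def distribute(self, table):
        # Issued once the initial load is finished: marks the end of the queued range.
        self.stream.append(ChangeRow(table, "EndMarker"))

    def on_source_change(self, table, change_type, values):
        # Changes of tables without a subscription are not sent at all.
        if table in self.subscribed:
            self.stream.append(ChangeRow(table, change_type, dict(values)))
```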

Example:

The adapter already replicates CUSTOMER, and now the remote subscription queue command above is issued for a subscription using the remote table REGION. Such a stream of changes might look like this:

Transaction    | Table    | Change Type | Row
13:01:55.0000  | CUSTOMER | insert      | insert into customer(key, name) values (100, 'John');
13:01:55.0000  |          | commit      |
13:02:56.0000  | CUSTOMER | update      | update customer set name = 'Franck' where key = 7;
13:02:56.0000  | CUSTOMER | update      | update customer set name = 'Frank' where key = 7;
13:02:56.0000  |          | commit      |
13:47:33.0000  | REGION   | BeginMarker | BeginMarker for table REGION
13:55:10.0000  | REGION   | insert      | insert into region(region, isocode) values ('US', 'US');
13:55:10.0000  |          | commit      |

The Hana server takes all incoming change rows and processes them normally. Only rows for subscriptions in queue mode, that is, where a BeginMarker was found in the stream but no EndMarker yet, are queued on the Hana server. In the example above, the CUSTOMER rows end up in their target table as usual, while the target table for REGION remains empty for now.

Therefore the initial load can be started without worrying about changes happening in the meantime. From the initial load's point of view, the target table is empty and not a single change is loaded into it while the load runs.
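The receiver behaviour described above can be modeled in a few lines. This is a hypothetical sketch of the queue phase only, not Hana's implementation: change rows are plain dicts, each target table is a dict keyed by primary key, commit rows are ignored and EndMarker processing is not modeled. Feeding it the example stream reproduces the outcome: CUSTOMER is loaded, REGION stays empty and its insert sits in the queue.

```python
from collections import defaultdict


class Receiver:
    """Hypothetical model of the Hana-side receiver (queue phase only)."""

    def __init__(self, key_columns):
        self.key_columns = key_columns    # table -> primary key column
        self.targets = defaultdict(dict)  # table -> {key: row}
        self.queues = defaultdict(list)   # table -> queued (change_type, row)
        self.queue_mode = set()           # BeginMarker seen, no EndMarker yet

    def receive(self, table, change_type, row=None):
        if change_type == "commit":
            return  # transaction boundaries are not modeled here
        if change_type == "BeginMarker":
            self.queue_mode.add(table)
        elif table in self.queue_mode:
            self.queues[table].append((change_type, row))
        else:
            self._apply(table, change_type, row)

    def _apply(self, table, change_type, row):
        target = self.targets[table]
        key = row[self.key_columns[table]]
        if change_type == "insert":
            if key in target:
                raise KeyError(f"primary key violation in {table}: {key}")
            target[key] = dict(row)
        elif change_type == "update" and key in target:
            target[key].update(row)  # an update of an unknown key affects no rows
        elif change_type == "delete":
            target.pop(key, None)
```

Usage with the example stream: `r = Receiver({"CUSTOMER": "key", "REGION": "region"})`, then one `r.receive(...)` call per row of the table above.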

Once the initial load is finished, the command

alter remote subscription <subscriptionname> distribute;

should be executed. This tells the Adapter to add an EndMarker to the stream of data.

When the Hana server finds such an EndMarker row, it starts to empty the queue and apply the changes to the target table. All rows between the BeginMarker and the EndMarker for the given table are loaded carefully, as it is unknown whether they have already been covered by the initial load or not. Technically that is quite simple: the insert and update rows are loaded with an upsert command, hence inserted if the initial load did not find them or updated if they are already present. Rows with the Change Type Delete are, of course, deleted.

All rows after the EndMarker are processed normally.
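The EndMarker step boils down to applying the queued rows with upsert semantics. A minimal sketch, assuming each target is a dict keyed by primary key and that change rows carry all target columns; `drain_queue` is a hypothetical name.

```python
def drain_queue(target, queue, key_column):
    """Apply rows queued between BeginMarker and EndMarker to a target that the
    initial load has already filled. target maps primary key -> row."""
    for change_type, row in queue:
        key = row[key_column]
        if change_type in ("insert", "update"):
            # Upsert: insert if the initial load missed the row, otherwise overwrite it.
            # Assumes change rows carry all columns of the target.
            target[key] = dict(row)
        elif change_type == "delete":
            target.pop(key, None)  # deleting a row the initial load never saw is harmless
        else:
            raise ValueError(f"unexpected change type {change_type!r}")
    queue.clear()  # from here on, rows are processed normally
    return target
```

Because every queued insert or update is treated as an upsert and a delete of a missing row does no harm, it does not matter whether the initial load already saw a given row.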

Error handling

During operation various errors can happen: the Adapter has a problem with the source and raises an exception, the Adapter or the Agent itself dies, the network connection between Hana and the Agent is interrupted, Hana is shut down, and so on.

In all these cases the issue is logged as an exception in a Hana catalog table.

select * from remote_subscription_exceptions;

[Screenshot: result of select * from remote_subscription_exceptions]

In the instance above, the connection between the Agent and Hana was interrupted. Therefore the adapter got a close() call and should have cleaned up everything, i.e. brought itself into a state where nothing is active anymore. On the Hana side, a remote subscription exception is shown with an EXCEPTION_OID. Using this unique identifier, the exception can be cleared and the connection re-established with the command

process remote subscription exception 42 retry;

This command re-establishes the connection with the Agent and its Adapter and sends all remote subscription definitions to the adapter again, plus the information about where the adapter should restart. The adapter then has to read the changes from this point onward.
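A retry only works if the adapter can resume from the position Hana hands back. The sketch below models that with a hypothetical integer log position per change. `LogEntry`, `ResumableCapture` and the assumption that reading resumes strictly after the given position are illustrative, not taken from the Adapter SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    position: int  # hypothetical: a monotonically increasing log sequence number
    table: str
    change_type: str


class ResumableCapture:
    """Hypothetical adapter-side capture that can be closed and restarted."""

    def __init__(self, source_log):
        self.source_log = source_log
        self.subscriptions = set()
        self.active = False

    def close(self):
        # After close() nothing may remain active.
        self.subscriptions.clear()
        self.active = False

    def start(self, subscriptions, restart_position):
        # On retry Hana resends all subscriptions plus the restart point.
        # Assumption: the position is the last change Hana has processed,
        # so reading resumes strictly after it.
        self.subscriptions = set(subscriptions)
        self.active = True
        return [e for e in self.source_log
                if e.position > restart_position and e.table in self.subscriptions]
```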

Pausing realtime propagation

Another situation might require pausing either the capture of changes in the source or the application of changes to the target objects. This cannot be done at the remote subscription level, only for the entire source system, using the commands

alter remote source <name> suspend capture;

alter remote source <name> suspend distribution;

and the reverse operation

alter remote source <name> resume capture;

alter remote source <name> resume distribution;
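The difference between the two suspend variants can be shown with a small state model. It is a hypothetical sketch, not Hana's implementation: it assumes that suspended capture means the adapter simply stops reading the source log and catches up on resume, while suspended distribution means captured changes are held on the Hana side until distribution resumes. `RemoteSourceModel` and its methods are invented names.

```python
class RemoteSourceModel:
    """Hypothetical model of one remote source and its suspend/resume switches."""

    def __init__(self):
        self.source_log = []    # changes made in the source system
        self.read_position = 0  # how far capture has read the source log
        self.capturing = True
        self.distributing = True
        self.held = []          # captured but not yet applied (kept in Hana)
        self.applied = []       # changes applied to the target objects

    def source_change(self, change):
        self.source_log.append(change)
        self._pump()

    def _pump(self):
        if self.capturing:
            self.held.extend(self.source_log[self.read_position:])
            self.read_position = len(self.source_log)
        if self.distributing:
            self.applied.extend(self.held)
            self.held.clear()

    def suspend_capture(self):
        self.capturing = False

    def resume_capture(self):
        self.capturing = True
        self._pump()  # catch up on everything that happened while suspended

    def suspend_distribution(self):
        self.distributing = False

    def resume_distribution(self):
        self.distributing = True
        self._pump()  # apply what was held while suspended
```

In both cases no change is lost in this model; the two switches only decide where changes wait: in the source log or in Hana.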

The magic of Change Types

Whenever an adapter creates realtime changes, these CDC rows have a RowType, called Change Type in the UI, which is insert, update, delete or something else. This Change Type information is used when loading a target table or inside a task to process the data correctly.

For simple 1:1 replications the handling of the Change Type is quite straightforward: the Change Type is used as the loading command, so an insert row is inserted, a delete row is deleted, etc.
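For the 1:1 case, the mapping from Change Type to load command can be sketched as a small statement generator. The function `load_statement` is hypothetical and only covers insert, update and delete; the real applier also handles the additional Change Types discussed below and binds the `?` parameters from the change row.

```python
def load_statement(table, change_type, columns, key_columns):
    """Return the parameterized load command for a 1:1 replication row."""
    where = " and ".join(f"{k} = ?" for k in key_columns)
    if change_type == "insert":
        cols = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        return f"insert into {table} ({cols}) values ({marks})"
    if change_type == "update":
        sets = ", ".join(f"{c} = ?" for c in columns if c not in key_columns)
        return f"update {table} set {sets} where {where}"
    if change_type == "delete":
        return f"delete from {table} where {where}"
    raise ValueError(f"change type {change_type!r} is not covered by this sketch")
```

For the CUSTOMER table of the earlier example, `load_statement("customer", "update", ["key", "name"], ["key"])` yields `update customer set name = ? where key = ?`.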

Therefore it is important that the adapter sends useful Change Types. Take the RSS Adapter with its virtual table RSSFEED. The adapter polls the URL, gets the latest news headlines, and they should be loaded. The primary key of the virtual table is the URI of the news headline, and the replicated target table has the same primary key.

If the adapter sent all rows with Change Type = Insert, the first realtime transaction would insert the headlines and the second iteration would fail with a primary key violation. An RSS feed simply does not know what was changed or what has been received already. Not even the Adapter knows that for sure: it might have been restarted, and from its perspective this is the first read, so it has no idea what happened before it was stopped.

One solution would be to send two rows, a Delete row plus an Insert row. That would certainly work, but it causes a huge overhead in the database: twice as many rows are sent, and deleting and re-inserting rows, even unchanged ones, is expensive as well.

The solution was to add more Change Types to simplify adapter development. In the case of the RSS Adapter above, the RowType Upsert was used.
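Why Upsert fixes the RSS case can be demonstrated by polling the same feed twice. In this hypothetical sketch the target is a dict keyed by the headline URI; `apply_row` and `poll` are invented names, and the feed content is made-up sample data.

```python
def apply_row(target, change_type, row, key="uri"):
    """Apply one RSS change row to a target dict keyed by the headline URI."""
    uri = row[key]
    if change_type == "insert":
        if uri in target:
            raise KeyError(f"primary key violation: {uri}")
        target[uri] = dict(row)
    elif change_type == "upsert":
        target[uri] = dict(row)  # insert if new, overwrite if already loaded
    else:
        raise ValueError(f"unsupported change type {change_type!r}")


def poll(target, headlines, change_type):
    """One polling iteration: the feed returns its current headlines again."""
    for row in headlines:
        apply_row(target, change_type, row)
```

Polling twice with `"upsert"` leaves each headline exactly once in the target; polling twice with `"insert"` raises the primary key violation on the second iteration, just as described above.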

Another special Change Type is the eXterminate value. Imagine a subscription using the SQL "select * from twitter where region = 'US'", and let's assume this filter cannot be passed to the Adapter but is executed in the Hana federation layer.

So Twitter sends rows from all regions to Hana, Hana applies the filter region = 'US', and only the matching rows are loaded. No problem, except for Delete messages from Twitter: Twitter does not provide all values, only the TweetID of the tweet to be removed. So the adapter sends a row with only the TweetID column filled; all other columns, in particular the region column, are null. Hence this delete row does not pass the filter region = 'US' and never makes it to the target table. Therefore, instead of sending such a row as Delete, the Twitter adapter sends it as an eXterminate row.

This tells the applier process that only the primary key is filled and that the filter condition must not be applied to those rows.
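The effect of eXterminate on a filtered subscription can be sketched like this. It assumes a target dict keyed by TweetID and a Hana-side filter function; `apply_filtered`, `us_only` and the sample rows are hypothetical. A plain Delete carrying a NULL region is dropped by the filter, while the eXterminate row bypasses it and removes the tweet by primary key.

```python
def apply_filtered(target, change_type, row, passes_filter, key="tweet_id"):
    """Apply a Twitter change row to a target dict keyed by tweet_id.

    passes_filter is the residual WHERE condition evaluated in Hana.
    """
    if change_type == "exterminate":
        # Only the primary key is filled: skip the filter and delete by key.
        target.pop(row[key], None)
        return
    if not passes_filter(row):
        return  # rows failing the filter never reach the target
    if change_type == "insert":
        target[row[key]] = dict(row)
    elif change_type == "delete":
        target.pop(row[key], None)
    else:
        raise ValueError(f"unsupported change type {change_type!r}")


def us_only(row):
    """Hypothetical residual filter: where region = 'US'."""
    return row.get("region") == "US"
```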

Another Change Type is Truncate. With it, the Adapter can delete many rows at once. An obvious example: in a database source somebody emptied the source table using a truncate table command. Instead of deleting every single row, the adapter can send a truncate row with all columns being NULL. But the Truncate Change Type can delete subsets of data as well; all the Adapter has to do is send a truncate row with some columns having a value. For example, an Adapter might send a truncate row with region = 'US' to delete all rows where region = 'US'. That might sound like a weird example, but imagine a source database command like "alter table drop partition".
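A Truncate row can be read as a delete whose WHERE clause consists of the non-NULL columns of that row, which covers both the full-table and the partial case. The sketch below, with the hypothetical helper `apply_truncate` and target rows as a list of dicts, assumes exactly that interpretation, including that several non-NULL columns are combined with AND.

```python
def apply_truncate(target_rows, truncate_row):
    """Delete every row matching all non-NULL columns of the truncate row.

    An all-NULL truncate row has no condition and therefore removes everything.
    Returns the remaining rows; the input list is left unchanged.
    """
    condition = {c: v for c, v in truncate_row.items() if v is not None}
    return [
        row for row in target_rows
        if not all(row.get(c) == v for c, v in condition.items())
    ]
```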

Another use case of the Truncate Change Type goes together with Replace rows. Imagine an Adapter that does not know which row has been changed, only that something changed within a dataset. Let's say it is a file adapter, and whenever a file appears in a directory, the entire file is sent into the target table. It might happen that a file contained wrong data and a corrected version is put into the directory under the same name as before. None of the above Change Types can deal with that situation. Insert would result in a primary key violation; Upsert would work, but what if the corrected file contains fewer rows because one was deleted?

The solution is to first send a Truncate row with the file name column set, so the command "delete from target where filename = ?" is executed, and then insert all rows of the file again, but using the Change Type Replace instead of Insert. Internally it does the same thing, all Replace rows are inserted, but it makes clear that these Replace rows belong to the preceding Truncate row, so additional optimizations and validations can be done.
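The Truncate plus Replace pattern for a re-delivered file can be sketched as follows. `file_changes` models the rows a hypothetical file adapter would send, `apply_changes` a simplified applier; the column names `filename` and `id` are made up for the example. Re-delivering a corrected file with fewer rows leaves exactly the new rows, and rows from other files are untouched.

```python
def file_changes(file_name, rows):
    """Change rows a hypothetical file adapter sends for one (re)delivered file."""
    yield "truncate", {"filename": file_name}
    for row in rows:
        yield "replace", {"filename": file_name, **row}


def apply_changes(target, changes):
    """Simplified applier for Truncate and Replace rows; target is a list of dicts."""
    for change_type, row in changes:
        if change_type == "truncate":
            # delete from target where <all non-NULL columns of the row match>
            condition = {c: v for c, v in row.items() if v is not None}
            target[:] = [t for t in target
                         if not all(t.get(c) == v for c, v in condition.items())]
        elif change_type == "replace":
            target.append(dict(row))  # inserted, exactly like an Insert row
        else:
            raise ValueError(f"unsupported change type {change_type!r}")
    return target
```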

All of the above Change Types work with Tasks as targets as well. Working out what each transform has to do for each row was hard, very hard in fact. But the advantage we get is that complete dataflows are not limited to batch; they can transform realtime streams of data as well. No delta loads are needed: the initial load dataflow can be turned into a realtime task receiving the changes. As of SPS09 this works for single tables only, but dealing with joins in realtime is the next big thing.


      5 Comments
      James Rapp

      Hi Werner,

      Thanks for these details. When I try to set up real-time CDC to the Twitter adapter using the syntax in the EIM administration guide:

      http://help.sap.com/download/multimedia/hana_options_eim/SAP_HANA_EIM_Administration_Guide_en.pdf

      6.11.7.3 Set Up Real Time CDC

      I receive an error message in my remote_subscriptions table:

      com.sap.hana.dp.adapter.sdk.AdapterException: Server does not support row type UPSERT

      I can access Twitter data for the initial load but not for successive loads. Any clue what might be happening here for me?

      Thanks,

      Jim

      Werner Daehn
      Blog Post Author

      Hi Jim, you seem to have a mismatch between the Hana Server (older revision) and the Adapter SDK/Agent (more recent version).

      We have added these checks recently for the SP10 build.

      Richard LeBlanc

      Hi Jim,

      As Werner mentioned, you have the SPS10 version of the DP Agent but are using it with an SPS9 HANA database. You need the SPS9 version of the DP Agent, which contains the adapters.

      SPS9 File Name - IMDB_DPAGENT100_00_2-70000175.SAR

      SPS10 File Name - IMDB_DPAGENT100_01_0-70000175.SAR

      Best Regards,

      Richard

      Former Member

      Hi

      While developing an SDI flow for remote source tables, I am getting the error below. It seems to be related to the log path mapping of the remote source. Below are the DP Agent framework logs of the remote source:

      Could not execute 'CALL "XXXADMIN"."XXXADMIN.XXXADMIN_TEST::REALTIME_TEST_SP"' in 3.232 seconds . SAP DBTech JDBC: [256]: sql processing error: "XXXADMIN"."XXXADMIN.XXXADMIN_TEST::REALTIME_TEST_SP": line 6 col 1 (at pos 217): [256] (range 3) sql processing error exception: sql processing error: QUEUE: XXXADMIN.XXXADMIN_TEST::REALTIME_TEST_RS: Failed to add subscription for remote subscription XXXADMIN.XXXADMIN_TEST::REALTIME_TEST_RS.Error: exception 151050: CDC add subscription failed: No original/mirror paths mapping found for databasein file. : line 1 col 1 (at pos 0)

      My remote source is an MSSQL 2012 DB and the DP Agent is installed on the same server, so I am not able to figure out the reason for the above error.

      Does anyone have an idea about this?

      Thread https://scn.sap.com/thread/3747659 is also related to my error, but it is not solved yet.

      Regards

      Shontu

      Richard LeBlanc

      Hi Shontu,

      Have you configured your SQL Server for real time replication?

      The steps are described in the HANA EIM Administration guide

      SAP HANA Enterprise Information Management – SAP Help Portal Page

      Best Regards,

      Richard