Seungjoon Lee

A new approach for replicating tables across different SAP HANA systems

Update on June 29, 2022:

  • Adjustment to make the supported scenarios clearer
  • Replacement of ‘Other Limitations’ by the official help documentation link

Update on January 21, 2021:

  • Hybrid scenario with Cloud Connector

Starting with SAP HANA Cloud, SAP HANA supports a new remote table replication feature for natively replicating tables across different SAP HANA systems. With this new feature, users can easily replicate remote tables, as it sends DDL and DML logs directly from a source system to a target system through an SAP HANA smart data access (SDA) ODBC connection. In a nutshell, the new remote table replication was developed with the following five key characteristics:

  • Transient (in-memory) log-based replication: Because this feature is log-based at the transactional layer, it provides better performance with less overhead on the source system than trigger-based replication
  • ODBC connection: It ensures optimized performance between SAP HANA systems because it uses an SDA ODBC connection with the hanaodbc adapter for its log/data transfer
  • Transactional consistency: If multiple source tables are updated within one transaction, the same behavior is applied to the target tables to ensure database-level transactional consistency (this is only guaranteed for transactions handled at the database level; transactional consistency at the application level, e.g. in ABAP, is not guaranteed)
  • Near real-time replication: Replication continues as changes happen (change data capture)
  • Auto reactivation after failures: When an indexserver goes down for any reason, such as a crash or an upgrade, all replications are disabled; when the indexserver restarts, they are reactivated and resume replicating

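To illustrate the transactional-consistency guarantee: a multi-table source transaction such as the following sketch (table and column names are hypothetical) is applied to the target replicas as one unit, in the original commit order.

```sql
-- Source system: both changes commit in a single transaction.
-- Remote table replication applies them to the target replicas in the same
-- commit order, so a reader on the target never observes the ORDERS insert
-- without the matching STOCK update.
UPDATE SALES.STOCK
   SET QUANTITY = QUANTITY - 10
 WHERE PRODUCT_ID = 42;

INSERT INTO SALES.ORDERS (ORDER_ID, PRODUCT_ID, QUANTITY)
VALUES (1001, 42, 10);

COMMIT;
```
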
Getting started with the new remote table replication

First of all, this new feature is supported between SAP HANA Cloud systems. It is also supported when replicating tables from SAP HANA Platform 2.0 SPS05 or later (source) to SAP HANA Cloud (target) as a hybrid scenario. For this hybrid scenario, Cloud Connector is needed to create an SDA remote source to an SAP HANA Platform 2.0 SPS05 or later system that is protected by a firewall.

The supported scenarios are summarized below.

  • Replication between SAP HANA Cloud systems: Yes
  • Replication from SAP HANA Platform 2.0 SPS05 or later (source) to SAP HANA Cloud (target): Yes
  • Replication from SAP HANA Platform 2.0 SPS04 or earlier (source) to SAP HANA Cloud (target): No
  • Replication from SAP HANA Cloud (source) to SAP HANA Platform 2.0 (target): No
  • Replication between SAP HANA Platform 2.0 systems: No

In other words, this remote table replication feature is only supported when the target is SAP HANA Cloud, as the feature is available exclusively in SAP HANA Cloud. It cannot be used for replicating tables between SAP HANA Platform 2.0 (on-premise) systems; for that scenario, SAP HANA smart data integration (SDI) is the right solution.

Configuration and activation

To use this new remote table replication, the first step is creating a remote source with SDA in the target SAP HANA Cloud system. SDA supports three credential modes for accessing a remote source from SAP HANA Cloud: technical user, secondary credentials, and single sign-on (SSO) with JSON Web Tokens (JWT). Currently, this new feature only supports the technical user mode, and the remote source property ‘DML Mode’ must be configured as ‘readwrite’.

‘readwrite’ is the default value for the remote source property ‘DML Mode’ in SAP HANA Database Explorer, but please verify that the created remote source is configured correctly. In addition, the remote source must be named in UPPERCASE for the remote table replication to initialize.
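As a rough sketch, creating such a remote source in SQL could look like the following. The host, port, and credentials are placeholders, and the way the DML mode is expressed in the configuration string (here `dml_mode=readwrite`) is an assumption; please verify the exact property names against the official documentation.

```sql
-- Sketch only: placeholders in angle brackets, property names to be verified.
-- The remote source name must be UPPERCASE for remote table replication.
CREATE REMOTE SOURCE "REMOTE_REP" ADAPTER "hanaodbc"
  CONFIGURATION 'Driver=libodbcHDB.so;ServerNode=<source-host>:<source-port>;dml_mode=readwrite'
  WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=<technical_user>;password=<password>';
```
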

For more details on creating an SAP HANA Cloud remote source, please see the official documentation below.

SAP HANA Cloud Data Access Guide: Import Certificates for SSL Connections to Remote Sources
SAP HANA Cloud Data Access Guide: Create an SAP HANA Cloud Remote Source

For creating an SAP HANA on-premise remote source for the increasingly important hybrid scenario, please see the official documentation below.

SAP HANA Cloud Data Access Guide: Set Up the Cloud Connector
SAP HANA Cloud Data Access Guide: Create an SAP HANA On-Premise Remote Source

The next step is creating a virtual table in the target system. The following statement creates a virtual table V_CUSTOMER in the MARKETING schema of the target system, based on the CUSTOMER table in the SALES schema of the source system.

CREATE VIRTUAL TABLE MARKETING.V_CUSTOMER AT "REMOTE_REP"."<NULL>"."SALES"."CUSTOMER";

Next, create a replica table and a remote subscription.

-- create a replica table
CREATE COLUMN TABLE MARKETING.R_CUSTOMER LIKE MARKETING.V_CUSTOMER;

-- create a remote subscription
CREATE REMOTE SUBSCRIPTION SUB ON MARKETING.V_CUSTOMER TARGET TABLE MARKETING.R_CUSTOMER;

Because the replica table is created as COLUMN LOADABLE by default, all of its columns are loaded into memory, which ensures performance. However, if needed, the user can switch it to PAGE LOADABLE, meaning column data is loaded page-wise into the buffer cache, which can reduce memory usage. The following statements switch between the two modes.

-- switch to page loadable
ALTER TABLE MARKETING.R_CUSTOMER PAGE LOADABLE;

-- switch to column loadable
ALTER TABLE MARKETING.R_CUSTOMER COLUMN LOADABLE;

That completes the configuration for the new remote table replication. The following statements can be used to activate, deactivate, or drop the replication.

-- activate the replication
ALTER REMOTE SUBSCRIPTION SUB DISTRIBUTE;

-- deactivate the replication
ALTER REMOTE SUBSCRIPTION SUB RESET;

-- drop the replication
DROP REMOTE SUBSCRIPTION SUB;

If users drop the replication and want to reactivate it, they need to truncate or recreate the replica table first. Further details can be found in the links below.
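For instance, re-establishing a dropped replication could be sketched as follows, reusing the object names from the examples above:

```sql
-- the old subscription was dropped; clear the replica before re-subscribing
TRUNCATE TABLE MARKETING.R_CUSTOMER;

-- recreate the subscription and activate it again
CREATE REMOTE SUBSCRIPTION SUB ON MARKETING.V_CUSTOMER TARGET TABLE MARKETING.R_CUSTOMER;
ALTER REMOTE SUBSCRIPTION SUB DISTRIBUTE;
```
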

SAP HANA Cloud Data Access Guide: Configure Remote Table Replication with the SDA HANA Adapter
SAP HANA Cloud Data Access Guide: System Views for Monitoring Remote Table Replication
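The monitoring system views referenced above can be queried directly. As a sketch (the view names below come from the documentation titles and the discussion in the comments; please verify the exact view and column names in the guide):

```sql
-- target side: state of all remote subscriptions
SELECT * FROM M_REMOTE_SUBSCRIPTIONS;

-- replica status per source/target table pair
SELECT * FROM M_REMOTE_TABLE_REPLICAS;
```
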

Auto reactivation

As mentioned, this new remote table replication feature supports automatic reactivation after failures. During reactivation, each pair of source and target tables is synchronized first, and then their replication status changes to ENABLED. Users can also trigger a global reactivation manually with the following statement.

ALTER SYSTEM ENABLE ALL TABLE REPLICAS USING REMOTE SOURCE ALL;

 

Differences from the replication with SDI or SLT

SAP HANA smart data integration (SDI) with the HanaAdapter can be used for replicating tables between SAP HANA Cloud systems, but this new feature with the SDA hanaodbc adapter provides better performance through log-based replication and a simplified approach that does not require deploying the Data Provisioning (DP) Agent.
However, even though it supports near real-time change data capture as SDI does, there are some limitations. For example, the schemas of the source and target tables must be identical, with no additional filters or differing partitions. For further details on the limitations and the comparison between SDI and SDA, please refer to the link below.

SAP HANA Cloud Data Access Guide: Replicating Tables from Remote Sources

SAP Landscape Transformation (SLT) Replication Server can also be used for replicating tables between SAP HANA systems, but it is trigger-based and requires an additional installation. For these reasons, this new remote table replication provides the benefits of log-based replication and a simpler setup compared to SLT.

Furthermore, the key differentiator compared to SDI and SLT is, as explained above, that when multiple source tables are updated within one transaction, the same behavior is applied to the target tables to ensure database-level transactional consistency. This does not mean that SDI lacks transactional consistency; the difference is that this new remote table replication ensures transactional commit order based on the original transaction commit ID.

Other limitations

For other limitations, please refer to the link below.

SAP HANA Cloud Data Access Guide: Unsupported Data Types and Other Limitations

Hybrid scenario and futures

As briefly mentioned above, the hybrid scenario is growing in importance, and this new remote table replication will play a crucial role as one of its key enablers. In parallel, the feature will keep evolving and will be augmented with more capabilities and better supportability. So, please stay tuned for our next blog series.

      21 Comments
      Werner Dähn

      Nice!

      A couple of thoughts when reading. Appreciate your input.

      1. One other odd limitation is that you cannot create subscriptions on Hana local tables. You always have to create a virtual table that reads your own database via the adapter. With this feature, it should be possible to resolve this. Granted, it's not the most important use case, especially in the light of no-data-duplication, but it is required sometimes.
      2. You are saying the HanaAdapter of SDI does not support transactional consistency. Are you sure about that?
      3. Are all subscription writer types supported, insert-only, upsert, with change_type and change_time?
      4. Why the limitation of "source and target have the same partitioning"? I assume because of truncate statements? If that is the only reason, truncate table on production data happens less often than partition changes.
      5. What happens with the replication when the source does change the partitioning of a table?
      6. Is the adaptive schema supported, meaning when the source gets an additional column via alter-table-add does the target as well?
      7. The reason SDI does use triggers is because there is no API to read the Hana transaction log, not even an SAP internal one. It seems there is one now. Can this API be used for other consumers as well, e.g. SAP Data Services reading the change data or SAP BW Extractors having an easy way to identify changes or 3rd party?
      8. Will this solution make the SDI Adapter obsolete some time in the future as it can do everything the SDI Adapter can? For the supported Hana versions and deployments at least? Or asking the other way around, other than dpagent and no support for on-prem Hana yet, what are other limitations that might prevent using this new adapter?
      Tae Suk Son

      Hi Werner,

       

      1. If the replication is about replicating within the same instance, it should be handled by OSTR (for scale-out). So, no need to use virtual tables for replication. I think the limitation you’re mentioning is about DP server trying to replicate local tables, isn’t it?
      2. In your blog (https://blogs.sap.com/2016/03/26/hana-smart-data-integration-transactional-consistency-and-delta-handling/), you’ve explained that SDI supports this as well. But the difference is that it is based on change times: there can be multiple transactions committed in the same millisecond, whereas this respects the original commit transactions.
      3. The underlying technology is different from SDI, so different writer types are not supported.
      4. As the replication is done at the transactional layer and not the session layer, internal dependencies exist.
      5. The DDL changes are forwarded to the target
      6. Yes, any DDL changes are also applied to target
      7. This is limited to HANA to HANA only and cannot be consumed by other tools.
      8. There are still differences between SDI such as the writer types you mentioned above and support for flowgraphs

      @Lee, Seungjoon – Please correct if needed

      Seungjoon Lee (Blog Post Author)

      Thank you Tae-Suk!

      Hello Werner,

      Thank you for your feedback and questions.

      As most questions are already covered by Tae-Suk, let me add a few points below.

      2. My explanation might somehow mislead readers. I didn’t say that SDI does not support transactional consistency, and that was not my intention. As Tae-Suk commented, it is about the difference. This new remote table replication ensures transaction commit order by leveraging original transaction commit ID.

      I added the below sentence to the blog to avoid any further misleading.

      “However, it doesn’t mean that SDI does not support transactional consistency. The difference is that this new remote table replication ensures transactional commit order based on original transaction commit ID.”

      3. Even though it shares the SQL interface of SDI, its underlying technology is different as Tae-Suk mentioned. It only supports real-time replication with schema change, and users will see ‘feature not supported’ for other writer types.

      5. More precisely, it supports adding empty partitions or dropping existing partitions, but it doesn’t support splitting or shuffling partitions.

      8. No, there is no plan like making the SDI adapter obsolete. However, many customers expect that replicating tables between SAP HANA Cloud systems should be possible natively without any further installation or customer managed application. And it is also natural to extend its scope such as hybrid as long as it is technically feasible and it can provide benefits. So, between SAP HANA systems, this is the recommended option as it gives better optimizations but customers can still prefer SDI as it has richer features for data movement.

      Mohan Kalidindi

      Like smart phone data transfer ...

      Lars Breddemann

      Thanks for this outlook on (yet another) data replication technology for SAP HANA.

      Just as with the already existing products (SDI, SDA, SLT, Sybase Replication, RDSync, ...) it is not quite clear, when to use what and what the implications of each of the different options are.

      Notably, those products focus on a single major data "federation" scenario: data is brought into HANA. While I am a fan of HANA, I can tell that this is not the single best scenario for many customers. This is especially true nowadays, where sufficiently "smart" data storage is available from the diverse cloud providers for a fraction of what HANA costs.

      So, why is there the perception that the SAP user base needs/wants yet another "bring-it-into-HANA" product? Surely the technology could have just been implemented within the existing products if it was simply about improving on the shortcomings of previous developments.

      Is there a specific use-case where this scenario really shines? What is it?

      Another question that I can't quite figure out myself is: why package the replication of HANA internal data structures in ODBC? I understand that in order to operate "in the cloud" one cannot use arbitrary low-level constructs (i.e. using TCP/IP | UDP is the common denominator for network communication, implementing your own wire-protocol on top of ethernet or building your very own network interconnect is simply not an option on public IaaS infrastructure). But ODBC and its demands for type casting and memory-copying don't seem to lend themselves to replicating data structures between two instances of the same software.

      Can you provide some insight into this choice?

      Final question (for now): how does the "re-sync" work in case the replication had been interrupted? Where is the "last-known-good" sync point kept? As the virtual tables support read-/write access there must be at least two change-sequence-histories that need to be merged and kept in sync.

      How does that work and how can one monitor this?

      Cheers,

      Lars

      Seungjoon Lee (Blog Post Author)

      Hello Lars,

      Regarding when to use what and a specific use case, please consider this new approach as a preferred option for replicating tables between SAP HANA Cloud instances.

      As you know, SAP HANA Cloud doesn’t support MDC. So, customers cannot create multiple tenants within the same SAP HANA Cloud instance. In on-premise world, MDC is being used for overcoming hardware limitations or for some business reasons. However, in cloud, as we don’t have such hardware limitations and we can leverage cloud elasticity, having separate instances can provide more benefits.

      In that sense, if customers want to have multiple tenants with SAP HANA Cloud as they did in on-premise, provisioning multiple SAP HANA Cloud instances is needed.

      And if they want to replicate tables between those SAP HANA Cloud instances, this is the recommended option as it is optimized for this use case and there is no need to deploy or operate additional application.

      Regarding your second question, ODBC is just used for transferring internal logs and replication requests. And SAP has been investing in SDA for further optimizations and performance improvements.

      Regarding the re-sync, this new approach is not based on a sync-point mechanism. And please refer to the above link for monitoring (SAP HANA Cloud Administration Guide: System Views for Monitoring Remote Table Replication).

      Priyanka Patankar

      Nice blog! Thanks for sharing! 

      Priyanka Patankar

      Hi Seungjoon Lee and Tae Suk Son

      Nice blog, Seungjoon Lee! For the past few months I have been working on SDI, connecting an on-premise HDB to a HANA DB on Cloud Foundry. I was able to connect successfully through the HanaAdapter, and the virtual tables have live data,
      Created Replicated tables Like the Virtual tables.
      Created Remote Subscription for the replication from Virtual table to Replicated table.
      Inserted the initial load of records from the Virtual table to the replicated table
      then Altered the Subscription for QUEUE and then DISTRIBUTE

      The replicated tables showed live data for a couple of days, then suddenly the replication stopped working. Have you faced this scenario before? Am I missing any step? Could you please guide me on any troubleshooting steps I need to carry out?
      Also, there are no exceptions recorded in either of the systems, hence I am not sure what to check.
      I have also tried a few troubleshooting steps following this blog:

      https://blogs.sap.com/2017/12/09/hana-sdi-smart-data-integration-2.0-h2h-real-time-replication-lessons-learned/

      I was not able to change the value for 'DDL scan interval in minutes' while the remote source is running, and, other than the queries on _SYS_TASK (insufficient privileges), I did not find any anomalies!

      Hope to hear from you soon!

      Thanks and Regards,

      Priyanka

      Seungjoon Lee (Blog Post Author)

      Hi Priyanka,

      Thank you! By the way, as I'm a Product Manager for SDA only, unfortunately, I cannot answer your question. I would recommend asking SDI experts.

      Best,
      Seungjoon

      Priyanka Patankar

      Hi Seungjoon Lee

      No worries! Thank you for your prompt response!

      Regards,

      Priyanka

      David BIZIEN

      Hi,

      I'm using HANA Services on BTP.
      I'm already using SDI to replicate tables into HANA.

      Can I use this solution to replicate data in real time across various tables with small transformations?

      Regards,

       

      Seungjoon Lee (Blog Post Author)

      Hi David,

      No, unfortunately, this is 'SAP HANA Cloud, SAP HANA database' (container-based) only feature.

      In 'SAP Cloud Platform, SAP HANA Service' (VM-based), SDI is the right solution for replicating tables.

      Please also refer to the blog from Denys, SAP (HANA) Cheat Sheet for the differences.

      Best,
      Seungjoon

      Michael Cocquerel

      Are there options to improve the duration of the initial load for a large table?
      In my case, I'm trying to replicate a table that has 201 columns, 356 million rows, and a size of 20 GB.

      The initial load is estimated at 12 hours, and my issue is that the tunnel connection between SAP Cloud Connector and HANA Cloud always gets interrupted for a few milliseconds before the end of the initial load. This issue is described in note 2682913 - SAP Cloud Connector - Tunnel connection is broken - Connection reset by peer. My network team investigated to find the root cause of those intermittent connection cuts but did not find anything. That's why I'm looking at how to improve the initial load duration, to minimize the risk of it being interrupted.
      Looking at "SYS"."M_REMOTE_STATEMENTS", I have observed that the initial load is split into several small selects of 200,000 records max, and each single statement runs for around 30 seconds. Those statements run in sequence, one after the other. Is there an option to run those statements in parallel? Are there options to increase the packet size?

      Seungjoon Lee (Blog Post Author)

      Hello Michael,

      I saw the separate ongoing email discussions, but sorry for the delayed response here.

      Regarding your questions, in general, if you can partition the source table (e.g., 8 partitions, and the target table should be identical to the source table), parallelization can be achieved to some extent, and we can expect improved performance with this approach. However, please kindly understand that the performance will not be improved in proportion to the number of partitions. And as there are many other factors, the performance improvement via partitioning cannot always be guaranteed.
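      (For reference, partitioning the source table as suggested above could be sketched as follows; the partitioning column is hypothetical, and the target table must be partitioned identically:)

```sql
-- hash-partition the source table into 8 parts (illustrative column name)
ALTER TABLE SALES.CUSTOMER PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 8;
```
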

      Apart from that, since we're aware of the difficulties caused by the initial loading of a huge table, we're continuously investing in this area. So, I expect to come back with some progress in the future.

      Michael Cocquerel

      Thanks, I will try this method

      Michael Cocquerel

      Hello,

      I have made 2 tests: one with 10 partitions and one with 20 partitions. In both cases, I observed in "SYS"."M_REMOTE_STATEMENTS" a maximum of 8 statements running in parallel. Is this a limit that I can change?

      Seungjoon Lee (Blog Post Author)

      No, the degree of parallelization is not configurable. This is one of the reasons why I mentioned 8 partitions as an example, and also mentioned that the performance will not improve in proportion to the number of partitions.

      Michael Cocquerel

      Is the new connection property allowReconnectOnSelect, coming with HANA Client 2.16 (see https://help.sap.com/docs/SAP_HANA_CLIENT/f1b440ded6144a54ada97ff95dac7adf/197dc47daa1d43efa5a77d3148717842.html?locale=en-US), also available for an SDA remote source from HANA Cloud to HANA on-premise? Would it help with my issue that the tunnel communication from SAP Cloud Connector to HANA Cloud is regularly interrupted by a proxy, and would it avoid having to restart the table initial load from scratch when the issue occurs?

      Seungjoon Lee (Blog Post Author)

      No, I don't think allowReconnectOnSelect helps, as it applies to SQLDBC and JDBC. As you know, SDA-based remote table replication is based on ODBC.

      Michael Cocquerel

      I have dropped a table replication subscription but, for an unknown reason, on the on-prem source system the related entry in view "SYS"."M_REMOTE_TABLE_REPLICAS" has not been dropped.
      Now, when I try to create a new subscription for the same table, it fails with the error: "Error: (dberror) [403]: internal error: Could not execute a SELECT statement on a source table (detail: [SAP AG][LIBODBCHDB SO][HDBODBC] General error;129 transaction rolled back by an internal error: Could not add replica due to duplicate replication status entry)". The same error also appears in the on-prem source system's indexserver trace.
      How can I clean up the "SYS"."M_REMOTE_TABLE_REPLICAS" entries?

      Seungjoon Lee (Blog Post Author)

      I'm not sure what happened here, but for some unknown reasons, it appears that my previous reply was removed. Here's my response again.

      This is a known issue, and it was improved from the below combinations.

      • SAP HANA Platform 2.0 SPS06 Rev63 or later (source)
      • SAP HANA Cloud QRC 02/2022 or later (target)

      That is, from the above combinations, executing DROP REMOTE SUBSCRIPTION and then CREATE REMOTE SUBSCRIPTION will automatically detect those inconsistencies and clean up the garbage information. So, you will not see this error anymore.

      However, unfortunately, if the versions of SAP HANA Platform and SAP HANA Cloud are lower than above, one possible workaround is cleaning up all of the remote table replication related tables by dropping remote sources on both source and target systems, then creating a new remote source with a different name and establishing the remote table replication again. If you don't want this workaround, and what you want is just cleaning up the duplicate entries without dropping and creating a new remote source, then I have to say that the only option is raising an incident via SAP Support.