Technical Articles
A new approach for replicating tables across different SAP HANA systems
Update on June 29, 2022:
Update on January 21, 2021:
Starting with SAP HANA Cloud, SAP HANA natively supports a new remote table replication feature for replicating tables across different SAP HANA systems. With this new feature, users can easily replicate remote tables, as it sends DDL and DML logs directly from a source system to a target system through an SAP HANA smart data access (SDA) ODBC connection. In a nutshell, the new remote table replication was developed with the following five key characteristics.
- Transient (in-memory) log-based replication: As this new feature is log-based in the transactional layer, it can provide better performance with less overhead on the source system compared to trigger-based replication
- ODBC connection: It ensures optimized performance between HANA systems because it uses an SDA ODBC connection with the hanaodbc adapter for its log/data transfer
- Transactional consistency: If multiple source tables are updated within one transaction, it applies the same behavior to the target tables to ensure database-level transactional consistency (transactional consistency is only guaranteed if the transaction is handled on the database level; it does not guarantee transactional consistency on the application level, such as ABAP)
- Near real-time replication: It keeps replicating as changes happen (change data capture)
- Auto reactivation after failures: When an indexserver goes down for any reason, such as a crash or upgrade, all replications are disabled; when the indexserver restarts, they are reactivated and resume replicating
Getting started with the new remote table replication
First of all, this new feature is supported between SAP HANA Cloud systems. It is also supported when replicating tables from SAP HANA Platform 2.0 SPS05 or later (source) to SAP HANA Cloud (target) as a hybrid scenario. For this hybrid scenario, the Cloud Connector is needed to create an SDA remote source to an SAP HANA Platform 2.0 SPS05 or higher system that is protected by a firewall.
Please also see the table below for the supported scenarios.
Scenario | Supported? |
Replication between SAP HANA Cloud systems | Yes |
Replication from SAP HANA Platform 2.0 SPS05 or later (source) to SAP HANA Cloud (target) | Yes |
Replication from SAP HANA Platform 2.0 SPS04 or earlier (source) to SAP HANA Cloud (target) | No |
Replication from SAP HANA Cloud (source) to SAP HANA Platform 2.0 (target) | No |
Replication between SAP HANA Platform 2.0 systems | No |
That is, this remote table replication feature is only supported when the target is SAP HANA Cloud, as it is available exclusively in SAP HANA Cloud. So, it cannot be used for replicating tables between SAP HANA Platform 2.0 (on-premise) systems; for that scenario, SAP HANA smart data integration (SDI) is the right solution.
Configuration and activation
In order to use this new remote table replication, the first step is creating a remote source with SDA in the target SAP HANA Cloud system. SDA supports three types of credential modes for accessing a remote source with SAP HANA Cloud: technical user, secondary credentials, and single sign-on (SSO) with JSON Web Tokens (JWT). Currently, this new feature only supports the technical user mode, and users need to set the remote source property ‘DML Mode’ to ‘readwrite’.
‘readwrite’ is the default value for the remote source property ‘DML Mode’ in SAP HANA Database Explorer, but please make sure that the created remote source is configured correctly. Also, the remote source must be named in UPPERCASE to initialize the remote table replication.
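For illustration, such a remote source could be created with a statement along the lines of the following sketch. The host, port, and credentials are placeholders, and the exact CONFIGURATION keys (including the dml_mode property) should be verified against the official documentation.
-- sketch: SDA remote source with the hanaodbc adapter (placeholder values)
CREATE REMOTE SOURCE "REMOTE_REP" ADAPTER "hanaodbc"
CONFIGURATION 'Driver=libodbcHDB.so;ServerNode=<host>:<port>;dml_mode=readwrite'
WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=<technical_user>;password=<password>';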
For more details on creating an SAP HANA Cloud remote source, please see the official documentation below.
SAP HANA Cloud Data Access Guide: Import Certificates for SSL Connections to Remote Sources
SAP HANA Cloud Data Access Guide: Create an SAP HANA Cloud Remote Source
For creating an SAP HANA on-premise remote source for the hybrid scenario, which is becoming ever more important, please see the official documentation below.
SAP HANA Cloud Data Access Guide: Set Up the Cloud Connector
SAP HANA Cloud Data Access Guide: Create an SAP HANA On-Premise Remote Source
The next step is creating a virtual table in the target system. The following statement creates a virtual table V_CUSTOMER in the MARKETING schema of the target system, based on the CUSTOMER table in the SALES schema of the source system.
CREATE VIRTUAL TABLE MARKETING.V_CUSTOMER AT "REMOTE_REP"."<NULL>"."SALES"."CUSTOMER";
Then, a replica table and a remote subscription need to be created.
-- create a replica table
CREATE COLUMN TABLE MARKETING.R_CUSTOMER LIKE MARKETING.V_CUSTOMER;
-- create a remote subscription
CREATE REMOTE SUBSCRIPTION SUB ON MARKETING.V_CUSTOMER TARGET TABLE MARKETING.R_CUSTOMER;
As the replica table is created as COLUMN LOADABLE by default, all of its columns are fully loaded into memory, which ensures performance. However, if needed, the user can switch it to PAGE LOADABLE, which means that column data is loaded page by page into the buffer cache, reducing memory usage. For switching, the user can use the following statements.
-- switch to page loadable
ALTER TABLE MARKETING.R_CUSTOMER PAGE LOADABLE;
-- switch to column loadable
ALTER TABLE MARKETING.R_CUSTOMER COLUMN LOADABLE;
That is all the configuration needed for the new remote table replication. The following statements can be used for activating, deactivating, or dropping the replication.
-- activate the replication
ALTER REMOTE SUBSCRIPTION SUB DISTRIBUTE;
-- deactivate the replication
ALTER REMOTE SUBSCRIPTION SUB RESET;
-- drop the replication
DROP REMOTE SUBSCRIPTION SUB;
If users drop the replication and want to reactivate it, they need to truncate or recreate the replica table first. Please find further details in the links below.
SAP HANA Cloud Data Access Guide: Configure Remote Table Replication with the SDA HANA Adapter
SAP HANA Cloud Data Access Guide: System Views for Monitoring Remote Table Replication
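For example, reestablishing a dropped replication for the tables used above might look like the following sketch (assuming the existing replica table is truncated rather than recreated).
-- truncate the replica before reestablishing a dropped replication
TRUNCATE TABLE MARKETING.R_CUSTOMER;
-- recreate the subscription and activate it again
CREATE REMOTE SUBSCRIPTION SUB ON MARKETING.V_CUSTOMER TARGET TABLE MARKETING.R_CUSTOMER;
ALTER REMOTE SUBSCRIPTION SUB DISTRIBUTE;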
Auto reactivation
As mentioned, this new remote table replication feature supports auto reactivation after failures. During the reactivation, it first synchronizes each pair of source and target tables and then changes their replication status to ENABLED. Users can also manually trigger a global reactivation with the following statement.
ALTER SYSTEM ENABLE ALL TABLE REPLICAS USING REMOTE SOURCE ALL;
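To verify that reactivation has completed, the replication status can be checked in the monitoring views. The following sketch assumes the M_REMOTE_SUBSCRIPTIONS view; please see the monitoring documentation linked above for the exact views and columns.
-- sketch: check replication status in the target system
SELECT SCHEMA_NAME, SUBSCRIPTION_NAME, STATE FROM M_REMOTE_SUBSCRIPTIONS;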
Differences from the replication with SDI or SLT
SAP HANA smart data integration (SDI) with the HanaAdapter can be used for replicating tables between SAP HANA Cloud systems, but this new feature with the SDA hanaodbc adapter provides better performance through log-based replication and a simplified approach that does not require deploying the Data Provisioning (DP) Agent.
However, even though it supports near real-time change data capture as SDI does, there are some limitations. For example, the schemas of the source and target tables must be identical; additional filters or different partitioning are not supported. If you would like to know further details about the limitations, or a comparison between SDI and SDA, please also refer to the link below.
SAP HANA Cloud Data Access Guide: Replicating Tables from Remote Sources
SAP Landscape Transformation (SLT) Replication Server can also be used for replicating tables between SAP HANA systems, but it is trigger-based and requires an additional installation. For these reasons, this new remote table replication provides the benefits of log-based replication with a simplified approach compared to SLT.
Furthermore, the key differentiator compared to SDI and SLT is, as explained, that when multiple source tables are updated within one transaction, it applies the same behavior to the target tables to ensure database-level transactional consistency. However, this doesn't mean that SDI does not support transactional consistency; the difference is that this new remote table replication ensures the transactional commit order based on the original transaction commit ID.
Other limitations
For other limitations, please refer to the link below.
SAP HANA Cloud Data Access Guide: Unsupported Data Types and Other Limitations
Hybrid scenario and futures
As briefly mentioned above, the hybrid scenario is growing in importance, and this new remote table replication will play a crucial role as one of the key enablers for it. In parallel, it will keep evolving and be augmented with more capabilities and better supportability. So, please stay tuned for our next blog series.
Nice!
A couple of thoughts when reading. Appreciate your input.
Hi Werner,
@Lee, Seungjoon – Please correct if needed
Thank you Tae-Suk!
Hello Werner,
Thank you for your feedback and questions.
As almost all questions are already covered by Tae-Suk, let me add a few points below.
2. My explanation might somehow mislead readers. I didn’t say that SDI does not support transactional consistency, and that was not my intention. As Tae-Suk commented, it is about the difference. This new remote table replication ensures transaction commit order by leveraging original transaction commit ID.
I added the below sentence to the blog to avoid any further misleading.
“However, it doesn’t mean that SDI does not support transactional consistency. The difference is that this new remote table replication ensures transactional commit order based on original transaction commit ID.”
3. Even though it shares the SQL interface of SDI, its underlying technology is different as Tae-Suk mentioned. It only supports real-time replication with schema change, and users will see ‘feature not supported’ for other writer types.
5. More precisely, it supports adding empty partitions or dropping existing partitions, but it doesn’t support splitting or shuffling partitions.
8. No, there is no plan like making the SDI adapter obsolete. However, many customers expect that replicating tables between SAP HANA Cloud systems should be possible natively without any further installation or customer managed application. And it is also natural to extend its scope such as hybrid as long as it is technically feasible and it can provide benefits. So, between SAP HANA systems, this is the recommended option as it gives better optimizations but customers can still prefer SDI as it has richer features for data movement.
Like smart phone data transfer ...
Thanks for this outlook on (yet another) data replication technology for SAP HANA.
Just as with the already existing products (SDI, SDA, SLT, Sybase Replication, RDSync, ...) it is not quite clear, when to use what and what the implications of each of the different options are.
Notably, those products focus on a single major data "federation" scenario: data is brought into HANA. While I am a fan of HANA, I can tell that this is not the single best scenario for many customers. This is especially true nowadays, where sufficiently "smart" data storage is available from the diverse cloud providers for a fraction of what HANA costs.
So, why is there the perception that the SAP user base needs/wants yet another "bring-it-into-HANA" product? Surely the technology could have just been implemented within the existing products if it was simply about improving on the shortcomings of previous developments.
Is there a specific use-case where this scenario really shines? What is it?
Another question that I can't quite figure out myself is: why package the replication of HANA internal data structures in ODBC? I understand that in order to operate "in the cloud" one cannot use arbitrary low-level constructs (i.e. TCP/IP | UDP is the common denominator for network communication; implementing your own wire protocol on top of Ethernet or building your very own network interconnect is simply not an option on public IaaS infrastructure). But ODBC, with its demands for type casting and memory copying, doesn't seem to lend itself to replicating data structures between two instances of the same software.
Can you provide some insight into this choice?
Final question (for now): how does the "re-sync" work in case the replication had been interrupted? Where is the "last-known-good" sync point kept? As the virtual tables support read-/write access there must be at least two change-sequence-histories that need to be merged and kept in sync.
How does that work and how can one monitor this?
Cheers,
Lars
Hello Lars,
Regarding when to use what and a specific use case, please consider this new approach as a preferred option for replicating tables between SAP HANA Cloud instances.
As you know, SAP HANA Cloud doesn’t support MDC. So, customers cannot create multiple tenants within the same SAP HANA Cloud instance. In the on-premise world, MDC is used for overcoming hardware limitations or for some business reasons. However, in the cloud, as we don’t have such hardware limitations and we can leverage cloud elasticity, having separate instances can provide more benefits.
In that sense, if customers want to have multiple tenants with SAP HANA Cloud as they did in on-premise, provisioning multiple SAP HANA Cloud instances is needed.
And if they want to replicate tables between those SAP HANA Cloud instances, this is the recommended option, as it is optimized for this use case and there is no need to deploy or operate an additional application.
Regarding your second question, ODBC is just used for transferring internal logs and replication requests. And SAP has been investing in SDA for further optimizations and performance improvements.
Regarding the re-sync, this new approach is not based on a sync-point mechanism. And please refer to the above link for monitoring (SAP HANA Cloud Administration Guide: System Views for Monitoring Remote Table Replication).
Nice blog! Thanks for sharing!
Hi Seungjoon Lee and Tae Suk Son
Nice blog Seungjoon Lee! For the past few months I have been working on SDI, connecting an on-premise HDB to a HANA DB on Cloud Foundry. I was able to connect successfully through the HANAAdapter, and the virtual tables have the live data.
Created Replicated tables Like the Virtual tables.
Created Remote Subscription for the replication from Virtual table to Replicated table.
Inserted the initial load of records from the Virtual table to the replicated table
then Altered the Subscription for QUEUE and then DISTRIBUTE
The replicated tables showed live data for a couple of days, then suddenly it stopped working. Have you faced this scenario before? Am I missing any step? Could you please guide me on any troubleshooting steps I need to carry out?
Also, there are no exceptions recorded in either of the systems, hence I am not sure what I can check.
I have also tried a few troubleshooting steps following this blog:
https://blogs.sap.com/2017/12/09/hana-sdi-smart-data-integration-2.0-h2h-real-time-replication-lessons-learned/
was not able to change the value for DDL scan interval in minutes as the Remote source is running other than the queries on _SYS_TASK due to insufficient privileges, did not find any anomalies!
Hope to hear from you soon!
Thanks and Regards,
Priyanka
Hi Priyanka,
Thank you! By the way, as I'm a Product Manager for SDA only, unfortunately, I cannot answer your question. I would recommend asking SDI experts.
Best,
Seungjoon
Hi Seungjoon Lee
No worries! Thank you for your prompt response!
Regards,
Priyanka
Hi,
I'm using HANA Services on BTP.
I'm already using SDI to replicate tables into HANA.
Can I use this solution to replicate data in real time across various tables with small transformations?
Regards,
Hi David,
No, unfortunately, this is a 'SAP HANA Cloud, SAP HANA database' (container-based) only feature.
In 'SAP Cloud Platform, SAP HANA Service' (VM-based), SDI is the right solution for replicating tables.
Please also refer to the blog from Denys, SAP (HANA) Cheat Sheet for the differences.
Best,
Seungjoon
Are there any options to improve the duration of the initial load for a large table?
In my case, I'm trying to replicate a table that has 201 columns, 356 million rows, and a size of 20 GB.
The initial load is estimated at 12 hours, and my issue is that the tunnel connection between SAP Cloud Connector and HANA Cloud always gets interrupted for a few milliseconds before the end of the initial load. This issue is described in note 2682913 - SAP Cloud Connector - Tunnel connection is broken - Connection reset by peer. My network team investigated the root cause of those intermittent connection cuts but did not find anything. That's why I'm looking at how to improve the initial load duration, to minimize the risk of it getting interrupted.
Looking at "SYS"."M_REMOTE_STATEMENTS", I have observed that the initial load is split into several small selects of 200,000 records max, and each single statement runs for around 30 seconds. Those statements run in sequence, one after the other. Is there an option to run those statements in parallel? Are there options to increase the packet size?
Hello Michael,
I saw the separate ongoing email discussions, but sorry for the delayed response here.
Regarding your questions, in general, if you can partition the source table (e.g., 8 partitions, and the target table should be identical to the source table), parallelization can be achieved to some extent, and we can expect improved performance with this approach. However, please kindly understand that the performance will not be improved in proportion to the number of partitions. And as there are many other factors, the performance improvement via partitioning cannot always be guaranteed.
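For illustration, such a partitioning could be applied with statements along the lines of the following sketch; CUSTOMER_ID and the table names are placeholders taken from the blog's earlier examples.
-- sketch: hash-partition the source table into 8 partitions
ALTER TABLE SALES.CUSTOMER PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 8;
-- the target (replica) table must be partitioned identically
ALTER TABLE MARKETING.R_CUSTOMER PARTITION BY HASH (CUSTOMER_ID) PARTITIONS 8;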
Apart from that, since we're aware of the difficulties caused by the initial loading of a huge table, we're continuously investing in this area. So, I expect to come back with some progress in the future.
Thanks, I will try this method
Hello,
I have made two tests: one with 10 partitions, one with 20 partitions. In both cases, I observed in "SYS"."M_REMOTE_STATEMENTS" a maximum of 8 statements running in parallel. Is this a limit that I can change?
No, the degree of parallelization is not configurable. And this is one of the reasons why I mentioned 8 partitions as an example, and also mentioned that the performance will not improve in proportion to the number of partitions.
Is the new connection property allowReconnectOnSelect coming with HANA Client 2.16 (see https://help.sap.com/docs/SAP_HANA_CLIENT/f1b440ded6144a54ada97ff95dac7adf/197dc47daa1d43efa5a77d3148717842.html?locale=en-US ) also available for an SDA remote source from HANA Cloud to HANA on-premise? Would it help with my issue that the tunnel communication from SAP Cloud Connector to HANA Cloud is regularly interrupted by a proxy, and avoid having to restart the table's initial load from scratch when the issue occurs?
No, I don't think that allowReconnectOnSelect helps, as it is about SQLDBC and JDBC. As you know, SDA-based remote table replication is based on ODBC.
I have dropped a table replication subscription but, for an unknown reason, on the on-prem source system, the related entry in view "SYS"."M_REMOTE_TABLE_REPLICAS" has not been dropped.
Now, when I try to create a new subscription for the same table, it fails with the error: "Error: (dberror) [403]: internal error: Could not execute a SELECT statement on a source table (detail: [SAP AG][LIBODBCHDB SO][HDBODBC] General error;129 transaction rolled back by an internal error: Could not add replica due to duplicate replication status entry)". The same error also appears in the on-prem source system's indexserver trace.
How can I clean up the "SYS"."M_REMOTE_TABLE_REPLICAS" entries?
I'm not sure what happened here, but for some unknown reasons, it appears that my previous reply was removed. Here's my response again.
This is a known issue, and it was improved as of the below combinations.
That is, from the above combinations onward, executing DROP REMOTE SUBSCRIPTION and then CREATE REMOTE SUBSCRIPTION will automatically detect those inconsistencies and clean up the garbage information, so you will not see this error anymore.
However, unfortunately, if the versions of SAP HANA Platform and SAP HANA Cloud are lower than the above, one possible workaround is cleaning up all of the remote table replication related tables by dropping the remote sources on both the source and target systems, then creating a new remote source with a different name and establishing the remote table replication again. If you don't want this workaround, and what you want is just cleaning up the duplicate entries without dropping and creating a new remote source, then I have to say that the only option is raising an incident via SAP Support.