Technical Articles
Set up real-time trigger-based replication from MSSQL to SAP HANA Cloud (or on-prem) using SAP Smart Data Integration (SDI)
Recently I got a few questions on the setup of real-time replication to SAP HANA Cloud and SAP Data Warehouse Cloud. I had set up such replication before, and had bits and pieces documented, but it turned out it would have been really handy if I would have had the full instructions at hand to repeat the process. There’s several SAP Help pages that, taken together, describe the process, but then you also need to know how to set up the source system, which might not even be an SAP owned database.
Therefore, in this blog I describe how to set up real-time replication, and in this case for a Microsoft SQL, or MSSQL, database. The tool of choice is SAP Smart Data Integration (SDI), which comes with SAP HANA Cloud or the on-prem version.
The whole process is also documented in the readme file of this Github repo. That repo also contains the Web IDE project to built on top of the replication setup. How you can use the Web IDE to create real-time replication tasks, I’ll describe as a follow-up blog.
Trigger or log-based replication?
When you set up real-time replication with SDI, you have the choice between trigger and log-based replication. I won’t go into all the pro’s and con’s of the two techniques, but in my case the choice is simple. My source database is a cloud-based MSSQL database, which does not allow access to the log files. So, trigger-based replication it is.
Pre-requisites
- A MSSQL database (this tutorial uses an AWS Cloud DB);
- An SAP HANA database (this tutorial uses SAP HANA Cloud);
- The SAP SDI Data Provisioning Agent (this tutorial uses version 2.5.1.2);
- Connection set up between HANA and the SDI Data Provisioning Agent;
- A driver for MSSQL, for SDI to connect (this tutorial uses mssql-jdbc-8.4.1.jre8.jar) which should be stored in the ./lib directory of the data provisioning agent;
- Registration of the MSSQL adapter using the DP Agent client;
- A database client for MSSQL, I used DBeaver.
Prep MSSQL database
For this tutorial, a Microsoft SQL Server Express Edition database was created using the AWS Relational Database Service. Please check the SAP Product Availability Matrix which databases are officially supported, as at time of writing, the “Express” edition is not, even though it works for this blog post content. The database version is the latest available 14.* version and is sized as a db.t3.small. Anything smaller would even make the creation of a simple table a slow process.
Below, a rundown of how you CAN configure your MSSQL database for trigger-based replication with SDI. I’m saying “CAN”, not “MUST” as the way that I’m creating a user and a schema might not be an MSSQL best practise. I do refer to the SAP Help pages with the requirements for the SDI technical user, so at the least follow those instructions.
Configure MSSQL database
Using the admin user that is provided by AWS, a “technical user” is created which will later be used for SDI to logon with to this source database.
USE master;
CREATE LOGIN HC_TECHNICAL_USER
WITH PASSWORD = '<password>',
CHECK_POLICY = OFF,
CHECK_EXPIRATION = OFF;
A database is created specifically for the replication tests
USE master;
CREATE DATABASE hc;
The technical user is granted the required privileges according to the SAP Help. The instructions below differ a bit from those, and are not all mandatory. There’s a trade-ff between security concern and usability. For example, I grant access to the entire schema and not individual tables.
Also, *disclaimer*, this is one way you can configure your source database, but it might not be the best way for you. I’m assuming you know what you’re doing.
In the new database, a new schema is created and for the technical user a user is created to login to this database
USE hc;
CREATE SCHEMA rep;
create user HC_TECHNICAL_USER for login HC_TECHNICAL_USER;
The user should also be allowed to create tables, to run our tests later
--Allow the user to create tables
USE hc;
GRANT CREATE TABLE TO HC_TECHNICAL_USER;
It’s up to you if you want to replicate DDL changes as well, I chose both DML/DDL.
--Creating a DML trigger requires ALTER permission on the table or schema on which the trigger is being created.
USE hc;
GRANT ALTER ON SCHEMA::rep TO HC_TECHNICAL_USER;
--Creating a DDL trigger with database scopes (ON DATABASE) requires ALTER ANY DATABASE DDL TRIGGER permission in the current database.
--Not needed if you don't want to replicate DDL changes
USE hc;
GRANT ALTER ANY DATABASE DDL TRIGGER TO HC_TECHNICAL_USER;
--GRANT CREATE PROCEDURE needed for SDI to create a generic procedure needed for replication
USE hc;
GRANT CREATE PROCEDURE TO HC_TECHNICAL_USER;
--GRANT VIEW SERVER STATE permission to view data processing state, such as transaction ID. This must be granted on the master database.
USE master;
GRANT VIEW SERVER STATE TO HC_TECHNICAL_USER;
Select access is needed to see data in the table, for example when viewing the content of the virtual table.
USE hc;
GRANT SELECT ON SCHEMA::rep TO HC_TECHNICAL_USER;
Besides, we want to also use the technical user to insert, update or delete data so we can run some DML tests with that user
USE hc;
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::rep TO HC_TECHNICAL_USER;
The following is not in the SDI documentation, but was needed to allow replication (I’ll save you the error solving process)
--Create schema for SDI data to be stored
USE hc;
CREATE SCHEMA HC_TECHNICAL_USER;
--Grant privileges to technical user on the created schema
USE hc;
ALTER AUTHORIZATION ON SCHEMA::HC_TECHNICAL_USER TO HC_TECHNICAL_USER;
Create source table and insert a few records of initial data
DROP TABLE REP.SALES;
CREATE TABLE REP.SALES (ID INTEGER, CREATION_DATE DATE, CUSTOMER_NAME NVARCHAR(100), PRODUCT_NAME NVARCHAR (100), QUANTITY INTEGER, PRICE DECIMAL, POS_COUNTRY NVARCHAR(100), PRIMARY KEY (ID));
INSERT INTO REP.SALES VALUES (1,'20200908','Cas Jensen','Toothbrush 747','6','261.54','United States of America');
INSERT INTO REP.SALES VALUES (2,'20201018','Barry French','Shampoo F100','2','199.99','Germany');
Create remote source on HANA
Now it’s time to connect HANA with the MSSQL database, and that’s done using a so-called remote source. The SDI Data Provisioning agent is already setup, and that setup falls outside the scope of this blog.
Below, we set up using the remote source in the Database Explorer, with a graphical UI. The remote source can also be created using a SQL create statement, which you can find in the Github repo readme. I’m using the DBADMIN user for the setup.
Figure 1: Create remote source
When creating the remote source, I set the hostname, port, database (hc in this case), and don’t forget: the “capture mode”, Trigger. And of course, the credentials of the technical user. There are a lot more settings that can be made, but we’re going with the defaults.
Figure 2: Remote source capture mode
Test replication (with UI and SQL statements)
In a follow-up blog, the replication is set up with a Web IDE project with replication tasks. But here we only go until the SQL level (which, by the way, might be enough). So let’s check if the replication works, starting from the DB Explorer.
From the DB Explorer menu, the earlier created remote source is opened. Now you should see the catalog index of the MSSQL database, and should be able to find the SALES table created earlier. Then, choose “Create Virtual Object”, and choose an object name and a schema of choice.
Figure 3: The index of the remote source
Figure 4: Choosing the virtual table schema and name
Instead of using the UI, you can also use SQL to create the virtual table:
CREATE SCHEMA REP_MSSQL;
CREATE VIRTUAL TABLE "REP_MSSQL"."V_SALES" at "RS_MSSQL"."<NULL>"."rep"."SALES";
Now, querying the virtual table shows the data sitting in the MSSQL source system
Figure 5: a data preview on the virtual table
Now, let’s set up the real-time replication. For that, we need to create a subscription, and a persistent target table to write records to. The graphical UI for this is part of the Web IDE, but we save that for the next blog. For now, we use SQL statements to create a subscription, do an initial load, and set the subscription in active replication mode by setting it to “DISTRIBUTE”.
-- create persistent target table
CREATE TABLE "REP_MSSQL"."T_SALES" LIKE "REP_MSSQL"."V_SALES";
--create subscription
CREATE REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES" ON "REP_MSSQL"."V_SALES" TARGET TABLE "REP_MSSQL"."T_SALES" ;
ALTER REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES" QUEUE;
-- do initial load before distribute remote subscription
INSERT INTO "REP_MSSQL"."T_SALES" (SELECT * FROM "REP_MSSQL"."V_SALES");
ALTER REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES" DISTRIBUTE;
-- the following should return the two records inserted earlier into the source table
SELECT * FROM "REP_MSSQL"."T_SALES";
-- execute the following on the source table in mssql
INSERT INTO REP.SALES VALUES (3,'20201020','Zeph Skater','Helmet C172','30','300.00','Spain');
-- check if the record is replicated to the target table
SELECT * FROM "REP_MSSQL"."T_SALES";
After executing the above statements, the data preview should now show all three records in the persistent table in HANA.
Figure 6: Tadaaa! All records arrived.
If the SQL tests are successful, the created objects can again be removed with the following statements.
-- stop the replication and remove the tables
ALTER REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES" RESET;
DROP REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES";
DROP TABLE "REP_MSSQL"."V_SALES";
DROP TABLE "REP_MSSQL"."T_SALES";
Conclusion
That was a rundown of setting up trigger-based replication from MSSQL to HANA Cloud. Hope that was helpful! In the next blog I go further with the Web IDE in creating replication tasks on top of this setup.
End-to-end, great blog, Sefan.
For those interested, Bob recently recorded a video tutorial series on this topic showing how to replication from SAP HANA on-premises (sort-of, actually HXE hosted on AWS) to SAP HANA Cloud.
Thanks for the post. In my HANA studio remote source MssqlLogReaderAdapter why I am not seeing capture mode in CDC properties? Is it visible only in Web IDE for HANA?
It seems you are only seeing the log settings. The trigger functionality was added in the SDI DP Agent SP04 Patch2. Are you running that version or a higher version? Otherwise, you can also check the Git repo remote source example and create your remote source using SQL instead of the Studio GUI.
Thanks for the response. That screen-print is from HANA Studio remote source. The DP Agent version is 2.4.2.4 not sure if this version supports is there any way I can find the capture mode under CDC properties via HANA studio? Currently I can see only Log Reader under CDC properties.
Also in the post you mentioned about trigger or Log based replication. Can you share any link which describes the pros and cons of trigger or Log based replication?
Your DP Agent has the required minimal version. Maybe your remote source adapter does not have the newest capabilities of the DP agent. With the following statement, you can refresh the adapter capabilities:
ALTER ADAPTER MssqlLogReaderAdapter REFRESH at location AGENT <agent name>.
For a comparison of log and trigger replication, I don't have anything at hand.
Hi Sefan,
Is it possible to data replicate on real time from SAP Hana Cloud(or OnPrem) to External Customer Data Lake using SDI?
Thanks & Regards,
Suneeth
Hello Community,
We tried setting up Real-Time Replication from a Microsoft SQL Server to our SAP HANA System (both on-prem).
We tried setting up Trigger-Based Replication as per this Blog, but in our Version of the MssqlLogReaderAdapter the option Capture Mode is not available. We tried setting up the Log Based Replication beforehand and as that did not work out in the end, we wanted to switch to Trigger-Based Mode.
But we can not seem to change the Capture Mode as we now see the same Error no matter the Settings of the Remote Source Configuration in HANA.
If possible, kindly direct me towards a documentation, that mentions the Capture Mode Setting in the Remote Source Configuration and what to do if it is not there / visible. Any Information on how to activate / unlock the Capture Mode Option is appreciated.
Best Regards,
Adrian Wummel
Hello Community
there is no official SAP Documentation for this and I hope summarizing my Learnings here will help someone in the future.
Once again, I am taking the Usecase of on-prem Systems, the Set-Up for Cloud Systems could be different.
After sucessfully setting up the Trigger-based Replication in the Source System, that Source System has to be defined as a Remote Source in HANA. The Wizard for creating a Remote Source may not provide sufficient options for the Configuration of the Remote Source.
Therefore I recommend using the SQL Console to create the Remote Source.
Reference to the SQL Command is provided by Sefan in his github Repository: GitHub - slinders/WebIDE_MSSQL_replication
Using the Remote Source provided by this SQL Command enabled my Replication Tasks to correctly use the Trigger-based approach to replicate Data from MS SQL Server to SAP HANA in Real-Time.
Best Regards,
Adrian Wummel