Set up real-time trigger-based replication from MSSQL to SAP HANA Cloud (or on-prem) using SAP Smart Data Integration (SDI)
Recently I got a few questions on the setup of real-time replication to SAP HANA Cloud and SAP Data Warehouse Cloud. I had set up such replication before, and had bits and pieces documented, but it turned out it would have been really handy if I would have had the full instructions at hand to repeat the process. There’s several SAP Help pages that, taken together, describe the process, but then you also need to know how to set up the source system, which might not even be an SAP owned database.
Therefore, in this blog I describe how to set up real-time replication, and in this case for a Microsoft SQL, or MSSQL, database. The tool of choice is SAP Smart Data Integration (SDI), which comes with SAP HANA Cloud or the on-prem version.
The whole process is also documented in the readme file of this Github repo. That repo also contains the Web IDE project to built on top of the replication setup. How you can use the Web IDE to create real-time replication tasks, I’ll describe as a follow-up blog.
Trigger or log-based replication?
When you set up real-time replication with SDI, you have the choice between trigger and log-based replication. I won’t go into all the pro’s and con’s of the two techniques, but in my case the choice is simple. My source database is a cloud-based MSSQL database, which does not allow access to the log files. So, trigger-based replication it is.
- A MSSQL database (this tutorial uses an AWS Cloud DB);
- An SAP HANA database (this tutorial uses SAP HANA Cloud);
- The SAP SDI Data Provisioning Agent (this tutorial uses version 22.214.171.124);
- Connection set up between HANA and the SDI Data Provisioning Agent;
- A driver for MSSQL, for SDI to connect (this tutorial uses mssql-jdbc-8.4.1.jre8.jar) which should be stored in the ./lib directory of the data provisioning agent;
- Registration of the MSSQL adapter using the DP Agent client;
- A database client for MSSQL, I used DBeaver.
Prep MSSQL database
For this tutorial, a Microsoft SQL Server Express Edition database was created using the AWS Relational Database Service. The database version is the latest available 14.* version and is sized as a db.t3.small. Anything smaller would even make the creation of a simple table a slow process.
Below, a rundown of how you CAN configure your MSSQL database for trigger-based replication with SDI. I’m saying “CAN”, not “MUST” as the way that I’m creating a user and a schema might not be an MSSQL best practise. I do refer to the SAP Help pages with the requirements for the SDI technical user, so at the least follow those instructions.
Configure MSSQL database
Using the admin user that is provided by AWS, a “technical user” is created which will later be used for SDI to logon with to this source database.
USE master; CREATE LOGIN HC_TECHNICAL_USER WITH PASSWORD = '<password>', CHECK_POLICY = OFF, CHECK_EXPIRATION = OFF;
A database is created specifically for the replication tests
USE master; CREATE DATABASE hc;
The technical user is granted the required privileges according to the SAP Help. The instructions below differ a bit from those, and are not all mandatory. There’s a trade-ff between security concern and usability. For example, I grant access to the entire schema and not individual tables.
Also, *disclaimer*, this is one way you can configure your source database, but it might not be the best way for you. I’m assuming you know what you’re doing.
In the new database, a new schema is created and for the technical user a user is created to login to this database
USE hc; CREATE SCHEMA rep; create user HC_TECHNICAL_USER for login HC_TECHNICAL_USER;
The user should also be allowed to create tables, to run our tests later
--Allow the user to create tables USE hc; GRANT CREATE TABLE TO HC_TECHNICAL_USER;
It’s up to you if you want to replicate DDL changes as well, I chose both DML/DDL.
--Creating a DML trigger requires ALTER permission on the table or schema on which the trigger is being created. USE hc; GRANT ALTER ON SCHEMA::rep TO HC_TECHNICAL_USER; --Creating a DDL trigger with database scopes (ON DATABASE) requires ALTER ANY DATABASE DDL TRIGGER permission in the current database. --Not needed if you don't want to replicate DDL changes USE hc; GRANT ALTER ANY DATABASE DDL TRIGGER TO HC_TECHNICAL_USER; --GRANT CREATE PROCEDURE needed for SDI to create a generic procedure needed for replication USE hc; GRANT CREATE PROCEDURE TO HC_TECHNICAL_USER; --GRANT VIEW SERVER STATE permission to view data processing state, such as transaction ID. This must be granted on the master database. USE master; GRANT VIEW SERVER STATE TO HC_TECHNICAL_USER;
Select access is needed to see data in the table, for example when viewing the content of the virtual table.
USE hc; GRANT SELECT ON SCHEMA::rep TO HC_TECHNICAL_USER;
Besides, we want to also use the technical user to insert, update or delete data so we can run some DML tests with that user
USE hc; GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::rep TO HC_TECHNICAL_USER;
The following is not in the SDI documentation, but was needed to allow replication (I’ll save you the error solving process)
--Create schema for SDI data to be stored USE hc; CREATE SCHEMA HC_TECHNICAL_USER; --Grant privileges to technical user on the created schema USE hc; ALTER AUTHORIZATION ON SCHEMA::HC_TECHNICAL_USER TO HC_TECHNICAL_USER;
Create source table and insert a few records of initial data
DROP TABLE REP.SALES; CREATE TABLE REP.SALES (ID INTEGER, CREATION_DATE DATE, CUSTOMER_NAME NVARCHAR(100), PRODUCT_NAME NVARCHAR (100), QUANTITY INTEGER, PRICE DECIMAL, POS_COUNTRY NVARCHAR(100), PRIMARY KEY (ID)); INSERT INTO REP.SALES VALUES (1,'20200908','Cas Jensen','Toothbrush 747','6','261.54','United States of America'); INSERT INTO REP.SALES VALUES (2,'20201018','Barry French','Shampoo F100','2','199.99','Germany');
Create remote source on HANA
Now it’s time to connect HANA with the MSSQL database, and that’s done using a so-called remote source. The SDI Data Provisioning agent is already setup, and that setup falls outside the scope of this blog.
Below, we set up using the remote source in the Database Explorer, with a graphical UI. The remote source can also be created using a SQL create statement, which you can find in the Github repo readme. I’m using the DBADMIN user for the setup.
When creating the remote source, I set the hostname, port, database (hc in this case), and don’t forget: the “capture mode”, Trigger. And of course, the credentials of the technical user. There are a lot more settings that can be made, but we’re going with the defaults.
Test replication (with UI and SQL statements)
In a follow-up blog, the replication is set up with a Web IDE project with replication tasks. But here we only go until the SQL level (which, by the way, might be enough). So let’s check if the replication works, starting from the DB Explorer.
From the DB Explorer menu, the earlier created remote source is opened. Now you should see the catalog index of the MSSQL database, and should be able to find the SALES table created earlier. Then, choose “Create Virtual Object”, and choose an object name and a schema of choice.
Instead of using the UI, you can also use SQL to create the virtual table:
CREATE SCHEMA REP_MSSQL; CREATE VIRTUAL TABLE "REP_MSSQL"."V_SALES" at "RS_MSSQL"."<NULL>"."rep"."SALES";
Now, querying the virtual table shows the data sitting in the MSSQL source system
Now, let’s set up the real-time replication. For that, we need to create a subscription, and a persistent target table to write records to. The graphical UI for this is part of the Web IDE, but we save that for the next blog. For now, we use SQL statements to create a subscription, do an initial load, and set the subscription in active replication mode by setting it to “DISTRIBUTE”.
-- create persistent target table CREATE TABLE "REP_MSSQL"."T_SALES" LIKE "REP_MSSQL"."V_SALES"; --create subscription CREATE REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES" ON "REP_MSSQL"."V_SALES" TARGET TABLE "REP_MSSQL"."T_SALES" ; ALTER REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES" QUEUE; -- do initial load before distribute remote subscription INSERT INTO "REP_MSSQL"."T_SALES" (SELECT * FROM "REP_MSSQL"."V_SALES"); ALTER REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES" DISTRIBUTE; -- the following should return the two records inserted earlier into the source table SELECT * FROM "REP_MSSQL"."T_SALES"; -- execute the following on the source table in mssql INSERT INTO REP.SALES VALUES (3,'20201020','Zeph Skater','Helmet C172','30','300.00','Spain'); -- check if the record is replicated to the target table SELECT * FROM "REP_MSSQL"."T_SALES";
After executing the above statements, the data preview should now show all three records in the persistent table in HANA.
If the SQL tests are successful, the created objects can again be removed with the following statements.
-- stop the replication and remove the tables ALTER REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES" RESET; DROP REMOTE SUBSCRIPTION "REP_MSSQL"."SUB_SALES"; DROP TABLE "REP_MSSQL"."V_SALES"; DROP TABLE "REP_MSSQL"."T_SALES";
That was a rundown of setting up trigger-based replication from MSSQL to HANA Cloud. Hope that was helpful! In the next blog I go further with the Web IDE in creating replication tasks on top of this setup.