Set up real-time replication tasks with the SAP Web IDE and SAP Smart Data Integration
The SAP Web IDE provides so-called replication tasks to set up real-time replication. It provides a graphical UI to browse tables in your connected source systems, create virtual tables, create subscriptions for real-time replication, create data load partitioning strategies, and more. The replication tasks in Web IDE are part of the SAP Smart Data Integration (SDI) toolset, which comes integrated with the SAP HANA platform, both on-premise and in the cloud.
However, getting started with replication tasks in the Web IDE can be a bit of a hassle, as you have to work your way through configuring a Web IDE project with service connections to HANA, hdbgrants files, and YAML files. Therefore, I'd like to share the little setup that I recently made. Once you have finished the steps in this blog post, the rest becomes a lot easier.
In a previous blog post I described how to set up real-time replication using SAP Smart Data Integration (SDI), up to the database or SQL level. At that level you can leverage basically all SDI functionality purely with SQL commands. But there are benefits to using the Web IDE on top of that: an easy-to-use UI, code generation, and everything else the Web IDE has to offer. In this blog post, I build on that SQL example using the Web IDE. The data source could have been any other database that SDI supports for real-time replication.
Pre-requisites
- An SAP HANA database (this tutorial uses HANA Cloud);
- The Full-Stack Web IDE if you're running in the cloud, or the HANA Web IDE on top of XS Advanced if you're running on-premise;
- The SAP SDI Data Provisioning Agent (this tutorial uses version 2.5.1.2);
- Connection set up between HANA and the SDI Data Provisioning Agent;
- A source system connected to HANA through SDI. I've used MSSQL as configured in the previous blog post, but SDI supports many other sources for real-time replication. If it's just for experimenting, you can also use the target HANA system itself as a source and set up HANA trigger-based replication.
Create a user in HANA for the User Provided Service
A User Provided Service will be created in the next step, but first we need to create a user on the HANA database for this service to use. The user that we create must be able to grant the privileges to create virtual tables, create remote subscriptions, and process remote subscription exceptions, which is why the statements below include WITH GRANT OPTION.
DROP USER cups_mssql_remote_source; --remove the user first in case it already exists
CREATE USER cups_mssql_remote_source PASSWORD "<password>" NO FORCE_FIRST_PASSWORD_CHANGE;
ALTER USER cups_mssql_remote_source DISABLE PASSWORD LIFETIME; --technical user, so the password should not expire
GRANT CREATE VIRTUAL TABLE ON REMOTE SOURCE "RS_MSSQL" TO cups_mssql_remote_source WITH GRANT OPTION; --needed for object owner and reptask editor to browse remote source
GRANT ALTER ON REMOTE SOURCE "RS_MSSQL" TO cups_mssql_remote_source WITH GRANT OPTION; --needed for reptask editor to browse remote source
GRANT CREATE REMOTE SUBSCRIPTION ON REMOTE SOURCE "RS_MSSQL" TO cups_mssql_remote_source WITH GRANT OPTION;
GRANT PROCESS REMOTE SUBSCRIPTION EXCEPTION ON REMOTE SOURCE "RS_MSSQL" TO cups_mssql_remote_source WITH GRANT OPTION;
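To double-check the setup, you can optionally verify what was granted to the new user; GRANTED_PRIVILEGES is a standard HANA system view:
SELECT GRANTEE, OBJECT_NAME, PRIVILEGE, IS_GRANTABLE
FROM "SYS"."GRANTED_PRIVILEGES"
WHERE GRANTEE = 'CUPS_MSSQL_REMOTE_SOURCE'; --user names created without quotes are stored in upper case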
Now that this preparation is complete, it’s time to create the Web IDE project.
Create a Web IDE project for a HANA Database
No need to explain in detail: from the Web IDE, create a project using the SAP HANA Database template.
Create a User Provided Service
The User Provided Service is needed to provide privileges to the Web IDE project, or rather, to the HANA Deployment Infrastructure (HDI) container that we will create.
Creating a User Provided Service can be done from the Web IDE. When creating the service, make sure to include the certificate as described in the SAP Help, and to enter the password and the correct host and API endpoint of the HANA database. If you are configuring an on-prem HANA database, you can leave out the endpoint, encrypt, and certificate tags.
Figure 1: Add HANA service connection to Web IDE project
Figure 2: Configure the service connection
Below, the JSON definition of the service is displayed. It can also be used to create the service from the XSA or CF command line, or from the SCP Admin UI, instead of from the Web IDE.
{
"host": "<database-id>.hana.prod-eu20.hanacloud.ondemand.com",
"port": "443",
"user": "cups_mssql_remote_source",
"password": "<password>",
"driver": "com.sap.db.jdbc.Driver",
"tags": [
"hana"
],
"endpoint": "https://api.cf.eu20.hana.ondemand.com",
"encrypt": true,
"certificate": "-----BEGIN CERTIFICATE-----\nMIIDrzCCApegAwIBAgIQCDvgVpBCRrGhdWrJWZHHSjANBgkqhkiG9w0BAQUFADBh\nMQswCQYDVQQGEwJVUzEVMBMGA1UEChMMRGlnaUNlcnQgSW5jMRkwFwYDVQQLExB3\nd3cuZGlnaWNlcnQuY29tMSAwHgYDVQQDExdEaWdpQ2VydCBHbG9iYWwgUm9vdCBD\nQTAeFw0wNjExMTAwMDAwMDBaFw0zMTExMTAwMDAwMDBaMGExCzAJBgNVBAYTAlVT\nMRUwEwYDVQQKEwxEaWdpQ2VydCBJbmMxGTAXBgNVBAsTEHd3dy5kaWdpY2VydC5j\nb20xIDAeBgNVBAMTF0RpZ2lDZXJ0IEdsb2JhbCBSb290IENBMIIBIjANBgkqhkiG\n9w0BAQEFAAOCAQ8AMIIBCgKCAQEA4jvhEXLeqKTTo1eqUKKPC3eQyaKl7hLOllsB\nCSDMAZOnTjC3U/dDxGkAV53ijSLdhwZAAIEJzs4bg7/fzTtxRuLWZscFs3YnFo97\nnh6Vfe63SKMI2tavegw5BmV/Sl0fvBf4q77uKNd0f3p4mVmFaG5cIzJLv07A6Fpt\n43C/dxC//AH2hdmoRBBYMql1GNXRor5H4idq9Joz+EkIYIvUX7Q6hL+hqkpMfT7P\nT19sdl6gSzeRntwi5m3OFBqOasv+zbMUZBfHWymeMr/y7vrTC0LUq7dBMtoM1O/4\ngdW7jVg/tRvoSSiicNoxBN33shbyTApOB6jtSj1etX+jkMOvJwIDAQABo2MwYTAO\nBgNVHQ8BAf8EBAMCAYYwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUA95QNVbR\nTLtm8KPiGxvDl7I90VUwHwYDVR0jBBgwFoAUA95QNVbRTLtm8KPiGxvDl7I90VUw\nDQYJKoZIhvcNAQEFBQADggEBAMucN6pIExIK+t1EnE9SsPTfrgT1eXkIoyQY/Esr\nhMAtudXH/vTBH1jLuG2cenTnmCmrEbXjcKChzUyImZOMkXDiqw8cvpOp/2PV5Adg\n06O/nVsJ8dWO41P0jmP6P6fbtGbfYmbW0W5BjfIttep3Sp+dWOIrWcBAI+0tKIJF\nPnlUkiaY4IBIqDfv8NZ5YBberOgOzW6sRBc4L0na4UU+Krk2U886UAb3LujEV0ls\nYSEY1QSteDwsOoBrp+uvFRTp2InBuThs4pFsiv9kuXclVzDAGySj4dzp30d8tbQk\nCAUw7C29C79Fv1C5qfPrmAESrciIxpg0X40KPMbp1ZWVbd4=\n-----END CERTIFICATE-----"
}
Below you see the YAML file, where the green rectangles mark the changes that were automatically made by the Web IDE as part of the service creation. If you have created the service some other way, you will have to make these adjustments to the YAML file yourself. Of course, you are free to choose more suitable names than the ones generated by the Web IDE.
Figure 3: automatic YAML file adjustments for created service connection
Hdbgrants file
The HDI container owner and runtime users need privileges on the remote source to be able to browse remote tables, create virtual tables, and create remote subscriptions. These are provided by the grantor service, but the privileges also need to be requested in the hdbgrants file.
The object owner needs the following privileges:
- CREATE VIRTUAL TABLE, CREATE REMOTE SUBSCRIPTION on the remote source. That’s easily explained as the HDI container owner simply needs to be able to create those two object types for that remote source.
The application user needs the following privileges:
- CREATE VIRTUAL TABLE: needed for the replication task editor to be able to browse the remote source;
- PROCESS REMOTE SUBSCRIPTION EXCEPTION: needed for the runtime user to process remote subscription exceptions if they occur (e.g. if a truncate operation is performed in the source and is not supported by the SDI adapter, you need to clean things up and ignore that exception).
Figure 4: hdbgrants file for requesting privileges from HANA
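For reference, a minimal hdbgrants sketch along these lines could look as shown below. The top-level key is the name of the grantor (User Provided) service, the remote source name is the one used in this example, and remote source privileges go into the global_object_privileges section; verify the exact structure against your own file (compare Figure 4).
{
  "cups_mssql_remote_source": {
    "object_owner": {
      "global_object_privileges": [
        {
          "name": "RS_MSSQL",
          "type": "REMOTE SOURCE",
          "privileges": ["CREATE VIRTUAL TABLE", "CREATE REMOTE SUBSCRIPTION"]
        }
      ]
    },
    "application_user": {
      "global_object_privileges": [
        {
          "name": "RS_MSSQL",
          "type": "REMOTE SOURCE",
          "privileges": ["CREATE VIRTUAL TABLE", "PROCESS REMOTE SUBSCRIPTION EXCEPTION"]
        }
      ]
    }
  }
}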
After that’s done, build the db folder, and then a replication task can be created.
Figure 5: create replication task
We navigate to the remote tables that need to be replicated and select "initial + realtime", so that an initial load is done, after which replication is turned on. As you can see, you could also choose to skip the initial load, only create a virtual table, or only do an initial load.
Figure 6: configure replication task
In the database explorer you can see that two tables have been created: a virtual table and a persistent table. The latter is still empty, as the initial load has not run yet.
Figure 7: virtual table query result
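As a quick sanity check you can also query both tables from the SQL console. The table names below are hypothetical; the actual names are generated from your project namespace and replication task definition:
SELECT COUNT(*) FROM "MY_HDI_SCHEMA"."com.example::Orders_VT"; --virtual table: the query is federated to the source system
SELECT COUNT(*) FROM "MY_HDI_SCHEMA"."com.example::Orders"; --persistent target table: stays empty until the initial load has run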
To execute the initial load and start the subscription, a procedure is generated that simply needs to be called. If you had created multiple subscriptions for multiple source tables, this single procedure would load and start them all.
Figure 8: procedure to kickstart initial load and replication
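The generated procedure follows a pattern like "<HDI container schema>"."<namespace>::<replication task name>.START_REPLICATION". With hypothetical names, the call looks like this:
CALL "MY_HDI_SCHEMA"."com.example::Orders.START_REPLICATION"; --runs the initial load and activates the subscriptions of this reptask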
After running the procedure, you can see that the persistent table received the initial load.
Figure 9: data preview of the persistent table
So now let's see whether a record that is newly inserted into the source database is replicated to HANA.
Figure 10: add record to source database
And yes, there it is.
Figure 11: result
The replication task provides a lot more options than the simple one-to-one replication that was set up in this blog post. For example, you can turn on full history tracking of all source record changes simply by choosing the load behaviour "Preserve all".
Figure 12: preserve all
With that turned on, let's see what happens when we update a record in the source database.
Figure 13: preserve all
Now you can see that the entire change history is recorded, including change type (insert, update, delete) and when this happened.
Figure 14: preserve all
Conclusion
The Web IDE editor for SDI replication tasks provides a rich toolset to manage all your real-time replications. It is an add-on to the capabilities you already have at the HANA level. This blog post provides an example of how to set up such a replication and should get you started fast.
What I have not covered is the monitoring of the subscriptions, which can be done using the DB explorer tooling, or by simply querying the SQL views that provide SDI monitoring data from any client database tool that can connect to HANA. Maybe that's for next time.
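For reference, these are a few of the SDI-related system views and commands; the names below are the standard HANA/SDI ones, so check their availability on your HANA version:
SELECT * FROM "SYS"."M_AGENTS"; --status of the Data Provisioning Agents
SELECT * FROM "SYS"."REMOTE_SUBSCRIPTIONS"; --the defined remote subscriptions
SELECT * FROM "SYS"."M_REMOTE_SUBSCRIPTIONS"; --runtime state of each subscription
SELECT * FROM "SYS"."REMOTE_SUBSCRIPTION_EXCEPTIONS"; --exceptions that still need to be processed
--An exception can then be retried or ignored by its ID, for example:
--PROCESS REMOTE SUBSCRIPTION EXCEPTION <exception_id> IGNORE;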
Nice article, Thank you for sharing this. 🙂
Thanks Sefan, nice blog on data integration in HANA Cloud. Is there an SAP tool or service which can be used to extract data out of HANA Cloud and send it to an SFTP location? Considering the note that BODS is not available for us.
Thanks
Sreekanth
Hello everyone,
currently I face an issue which I can't solve myself. Maybe I will find a solution by posting my challenge here.
Scenario:
Note: The scenario described above worked fine for about two hours. We captured every single data change coming from the DB2 source system. At one point we truncated the table we used for testing directly in DB2. Since then I am no longer able to get a replication task in mode "initial + realtime" running, although "initial only" still works as fine as before.
So the moment I execute the replication task in "initial + realtime", I receive the following error message:
Could not execute 'CALL "DELTA_XXX"."DataWareHouse.Database.Replication ...'
Error: (dberror) [686]: start task error: "DELTA_XXX"."DataWareHouse.Database.Replication Tasks::XXX_DELTA.START_REPLICATION": line 27 col 6 (at pos 1110): [140038] Error executing SQL command in task.;Error executing ALTER_REMOTE_SUBSCRIPTION_QUEUE with command: 'ALTER REMOTE SUBSCRIPTION "DELTA_XXX"."SUB_XXX" QUEUE'. ,exception 71000129: SQLException
exception 71000256: QUEUE: SUB_XXX: Failed to add subscription for remote subscription SUB_XXX[id = 16579466] in remote source XXX_LogReader[id = 14918173]. Error: exception 151050: CDC add subscription failed: RS[XXX_LogReader]: Failed to add the first subscription. Error: Failed to subscribe table ["SCHEMA"."XXX"]. Error: Initialization for replication of database <XXX> has not been done.
Maybe you guys have a working solution for me. Thanks in advance.
There could be several causes for this, and there might be more details in the framework.trc trace of your dpagent. It could also be that you are now using a different remote user than before and haven't cleaned up the previous one. Best to check out the agent log files, and otherwise log an incident with SAP support.
Hi Lars,
after rebuilding the RepTask (e.g. adding a field, changing data types, etc.) we lose all the subscriptions that were started before. Is it expected that we need to re-initialize the RepTask after every new build, or is there a workaround to avoid the re-init after rebuilds?
Losing all our subscriptions after a new build can lead to big issues. I am thinking of a scenario where we once loaded 600 million data records initially and then have to do this all over again just because we made minor changes to the RepTask and rebuilt it.
Regards
How do you stop the execution of a replication task with real-time replication after it has started, while keeping the data in the table?
You can find how to manage subscriptions here and the SQL commands for it here.
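For quick reference, the relevant SQL commands look roughly as follows. The subscription name is hypothetical, and to my understanding RESET stops the change capture while leaving the already replicated data in the target table:
ALTER REMOTE SUBSCRIPTION "MY_HDI_SCHEMA"."SUB_Orders" QUEUE; --start capturing changes from the source into the queue
ALTER REMOTE SUBSCRIPTION "MY_HDI_SCHEMA"."SUB_Orders" DISTRIBUTE; --apply the queued changes and switch to real-time replication
ALTER REMOTE SUBSCRIPTION "MY_HDI_SCHEMA"."SUB_Orders" RESET; --stop replication and reset the subscription; the target table keeps its data
DROP REMOTE SUBSCRIPTION "MY_HDI_SCHEMA"."SUB_Orders"; --remove the subscription entirely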