Skip to Content
Technical Articles

Replicating Data from Oracle to HANA using SAP HANA Web IDE SDI

Overview

SAP Smart Data Integration provides real time, high speed data provisioning, bulk data movement, and federation. Replication task or flowgraph lets you set up batch or real-time data replication and transformation scenarios in an easy-to-use web application. Data Provisioning Agent hosts data provisioning adapters, enables data federation, replication, and transformation scenarios for on-premise or in-cloud deployments. Monitors for Data Provisioning Agents, remote subscriptions, and data loads are accessible from the SAP HANA cockpit.

Architecture

In all deployments, the basic components are the same. However, the connections between the components may differ depending on whether SAP HANA is deployed on premise, in the cloud, or behind a firewall. Based on our current SAP HANA landscape we use the following deployment options for SDI.

  • System 1: Data Provisioning Server (internal component of SAP HANA Server)
  • System 2: Data Provisioning Agent
  • System 3: Source system

Yo may notice Data Provisioning Server within SAP HANA connects to the agent using the TCP/IP protocol and default port is 5050

Enable the Data Provisioning Server

You need to enable the data provisioning server in SAP HANA server first and if you have a multi-database container scenario,  you need to execute the ALTER DATABASE statement in the system database 

ALTER DATABASE <database_name> ADD 'dpserver' AT LOCATION '<hostname>[:<port_number>]'

Open SAP HANA Administration for SYSTEMDB and switch to Configuration tab. Expand the daemon.ini and you can find dpserver.HD2 (your tenant database). The data provisioning server has been successfully enabled for your tenant database

If you want to remove the data provisioning server in a multi-database container scenario you also need to execute the ALTER DATABASE statement in the system database

ALTER DATABASE <database_name> REMOVE 'dpserver' AT LOCATION '<hostname>[:<port_number>]'

Download the DP Agent & Delivery Unit

For POC we use SAP HANA Server 2.00.037 and need to download corresponding version of DP Agent and Delivery Unit. For details, see the SAP HANA smart data integration Product Availability Matrix (PAM). On the SAP Software Download Center, you can find the installation packages in the following locations:

DP Agent

SAP Software Download Center > Software DownloadsSupport Packages & PatchesBy Alphabetical Index (A-Z)HSAP HANA SDISAP HANA SDI <version_number>Comprised Software Component VersionsHANA DP AGENT <version_number>

DP Delivery Unit

SAP Software Download CenterSoftware DownloadsInstallations & UpgradesBy Alphabetical Index (A-Z)  H  SAP HANA SDI  SAP HANA SDI 2.0 > COMPRISED SOFTWARE COMPONENT VERSIONS > HANA DP 2.0 > Click the ZIP file that you need, and save it to your preferred location > In the HANAIMDP<version number>.ZIP file, find and extract the HANA_IM_DP.tgz file

Import DP Delivery Unit

For multi-tenant database container configuration, you must import the delivery unit into the tenant database that you have enabled DP Server in previous step

For simplicity of security I will use SYSTEM user to import from SAP HANA Studio

In the upper left corner, click File Import. On the Import dialog, type delivery into the search box for Select an import source

Click Delivery Unit on the resulting navigation tree and click Next. Then select your target system, and click Next

On the Import Through Delivery Unit dialog, select Client radio and click Browse and navigate to the location where you downloaded the delivery unit (please extract first), select HANAIMDP.tgz, and click Finish.

Install the DP Agent

System Requirements

  • Windows (64bit) 2016 Service Pack 03
  • Instance type: AWS m5.2xlarge (32GB)
  • AWS Instance name: xxxxxx
  • Access privilege: service account with local administrators group
  • Firewall port: all open (for simplicity)

Software Requirements

  • Find the required supporting libraries inside Oracle client folder
    • ojdbc7.jar:  C:\Oracle\Client12102\jdbc\lib
    • xdb6.jar: C:\Oracle\Client12102\RDBMS\jlib
    • xmlparserv2.jar: C:\Oracle\Client12102\LIB

Note: You must use the Administrator user or a user in the administrators group to install and configure DP Agent.

Installation Steps:

  1. Create a temp folder C:\Install and copy the agent installation file into and extract there. Go inside and find the hdbsetup.exe and right click to run as administrator.
  2. Choose install new SAP HANA Data Provisioning Agent and specify the installation path. I left installation with default setting.
  3. The user that runs the agent service must have read/write access to the installation directory so that configuration files can be updated.

Default Installation Paths as below

Configure DP Agent

You must configure the Data Provisioning Agent before you can use adapters to connect to data sources, create remote sources, and so on.

Start the configuration tool with run as administrator

C:\usr\sap\dataprovagent\configTool\dpagentconfigtool.exe

Connect to the SAP HANA serverSpecify the host name, port, and HANA Agent Admin User credentials for the SAP HANA server

Register DP Agent to HANA server

Start DP Agent by clicking Start Agent Button.Register OracleLogReaderAdapter to HANA server

Data Provisioning Agent Tuning: For large initial loads, open the DP Agent configuration file  C:\usr\sap\dataprovagent\dpagent.ini, configure for 24GB of memory at -Xmx24576m

Disable SSL for Agent to HANA communication on TCP if you don’t have SSL in place.  Beyond POC environment, We recommend to install SSL certificate to encrypt the communication

Lastly we need to copy over those supporting libraries into DP Agent library folder C:\usr\sap\dataprovagent\lib

Create a Remote Source

Oracle Database:

  • User Name – xxxxxx
  • Password – xxxxxx
  • Database Service Name – SCUAT
  • Host name – xxxxxx

To create a remote source we need to provide the following

  1. Remote Source Name: ORCL_UAT
  2. Adapter Name: OracleLogReaderAdapter
  3. Source Location: agent(dpagentorcl)

Also provide the following properties for remote source and some of them are for performance optimization

Lastly, provide Oracle Connection Credential (Technical user)

Test connection to see if it is OK

Oracle Database Permissions

Oracle database users need to have certain permissions granted to them in order to carry out real-time change data capture or batch or initial load transactions. Please go through Oracle Database Permission page to find out all details

SDI supports both database-level supplemental logging and table-level supplemental logging. We decide to go table-level supplemental logging.

Set your logging level in the Adapter Preferences window of the Data Provisioning Agent configuration tool for the Oracle Log Reader adapter. Then, run the necessary scripts found in the oracle_init_example.sql file, located in <DPAgent_root>\LogReader\Scripts.

Here is our table-level supplemental logging

-- Database Level

ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (UNIQUE) COLUMNS;Database User Level

-- Create database user

CREATE USER <user_id> IDENTIFIED BY <password> DEFAULT TABLESPACE USERS TEMPORARY TABLESPACE TEMP PROFILE DEFAULT ACCOUNT UNLOCK;

-- Grant two roles for database user

GRANT EXECUTE_CATALOG_ROLE TO <user_id>;
GRANT SELECT_CATALOG_ROLE TO <user_id>;
ALTER USER <user_id> DEFAULT ROLE ALL;

-- Grant 11 system privileges for database user

GRANT ALTER ANY TABLE TO <user_id>;
GRANT CREATE PROCEDURE TO <user_id>;
GRANT CREATE SEQUENCE TO <user_id>;
GRANT CREATE SESSION TO <user_id>;
GRANT CREATE TABLE TO <user_id>;
GRANT CREATE TRIGGER TO <user_id>;
GRANT CREATE VIEW TO <user_id>;
GRANT LOGMINING TO <user_id>;
GRANT SELECT ANY TABLE TO <user_id>;
GRANT SELECT ANY TRANSACTION TO <user_id>;
GRANT UNLIMITED TABLESPACE TO <user_id>;

-- Grant 53 object privileges for database user

GRANT SELECT ON SYS.ATTRIBUTE$ TO <user_id>;
GRANT SELECT ON SYS.CCOL$ TO <user_id>;
GRANT SELECT ON SYS.CDEF$ TO <user_id>;
GRANT SELECT ON SYS.COL$ TO <user_id>;
GRANT SELECT ON SYS.COLLECTION$ TO <user_id>;
GRANT SELECT ON SYS.COLTYPE$ TO <user_id>;
GRANT SELECT ON SYS.CON$ TO <user_id>;
GRANT SELECT ON SYS.DBA_ERRORS TO <user_id>;
GRANT SELECT ON SYS.DBA_LIBRARIES TO <user_id>;
GRANT SELECT ON SYS.DBA_LOG_GROUPS TO <user_id>;
GRANT SELECT ON SYS.DBA_OBJECTS TO <user_id>;
GRANT SELECT ON SYS.DBA_SYNONYMS TO <user_id>;
GRANT SELECT ON SYS.DBA_TABLES TO <user_id>;
GRANT SELECT ON SYS.DBA_TRIGGERS TO <user_id>;
GRANT EXECUTE ON SYS.DBMS_LOGMNR TO <user_id>;
GRANT EXECUTE ON SYS.DBMS_LOGMNR_D TO <user_id>;
GRANT SELECT ON SYS.DEFERRED_STG$ TO <user_id>;
GRANT SELECT ON SYS.GV_$INSTANCE TO <user_id>;
GRANT SELECT ON SYS.GV_$SESSION TO <user_id>;
GRANT SELECT ON SYS.ICOL$ TO <user_id>;
GRANT SELECT ON SYS.IND$ TO <user_id>;
GRANT SELECT ON SYS.INDCOMPART$ TO <user_id>;
GRANT SELECT ON SYS.INDPART$ TO <user_id>;
GRANT SELECT ON SYS.INDSUBPART$ TO <user_id>;
GRANT SELECT ON SYS.LOB$ TO <user_id>;
GRANT SELECT ON SYS.LOBCOMPPART$ TO <user_id>;
GRANT SELECT ON SYS.LOBFRAG$ TO <user_id>;
GRANT SELECT ON SYS.MLOG$ TO <user_id>;
GRANT SELECT ON SYS.NTAB$ TO <user_id>;
GRANT SELECT ON SYS.OBJ$ TO <user_id>;
GRANT SELECT ON SYS.OPQTYPE$ TO <user_id>;
GRANT SELECT ON SYS.PARTOBJ$ TO <user_id>;
GRANT SELECT ON SYS.SEG$ TO <user_id>;
GRANT SELECT ON SYS.SEQ$ TO <user_id>;
GRANT SELECT ON SYS.SNAP$ TO <user_id>;
GRANT SELECT ON SYS.TAB$ TO <user_id>;
GRANT SELECT ON SYS.TABCOMPART$ TO <user_id>;
GRANT SELECT ON SYS.TABPART$ TO <user_id>;
GRANT SELECT ON SYS.TABSUBPART$ TO <user_id>;
GRANT SELECT ON SYS.TS$ TO <user_id>;
GRANT SELECT ON SYS.TYPE$ TO <user_id>;
GRANT SELECT ON SYS.USER$ TO <user_id>;
GRANT SELECT ON SYS.V_$ARCHIVED_LOG TO <user_id>;
GRANT SELECT ON SYS.V_$ARCHIVE_DEST TO <user_id>;
GRANT SELECT ON SYS.V_$DATABASE TO <user_id>;
GRANT SELECT ON SYS.V_$DATABASE_INCARNATION TO <user_id>;
GRANT SELECT ON SYS.V_$INSTANCE TO <user_id>;
GRANT SELECT ON SYS.V_$LOG TO <user_id>;
GRANT SELECT ON SYS.V_$LOGFILE TO <user_id>;
GRANT SELECT ON SYS.V_$LOGMNR_CONTENTS TO <user_id>;
GRANT SELECT ON SYS.V_$LOGMNR_LOGS TO <user_id>;
GRANT SELECT ON SYS.V_$PARAMETER TO <user_id>;
GRANT SELECT ON SYS.V_$TRANSACTION TO <user_id>;

Oracle Table Level

We also enable supplemental  logging for table level with primary key and unique index and please run the scripts located in the oracle_init_example.sql file on your Oracle client to set this up.

Data Provisioning Monitoring

We need to grant sap.hana.im.dp.monitor.roles::Monitoring to the user who will perform monitoring tasks.

Subscription Monitor

<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPSubscriptionMonitor

Replication Task Monitor

<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=IMTaskMonitor

DP Agent Monitor

<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPAgentMonitor

Access External Resource

User Provided Service

In order to access the remote source created in previous step inside container, we have to create a user provided service to grant access to the container. Basically anything your container need to access has to be granted explicitly in advance. A hdbgrantor file will also be created in Web IDE so object owner and application user will have permission to access the granted resource. They DON’T HAVE access at default. After this the remote source behaves like local object to container.

Inside your MTA project, you can add reference directly at database module. Right click db module and choose New -> Add SAP HANA Service Connection

Select Non HDI Container since we are adding reference to external system resource not HDI container

For Host information, it is optional to provide since you can add reference to project space mapping tenant database only. You can leave it empty or just localhost. For simplicity I used tenant system administrator to provision Oracle remote source but in real implementation you should create a dedicated users for development and operation support.

Alternatively you can create a user provided service using the either XS CLI or XS Advanced Cockpit

xs cups grant-service -p "{\"host\":\"<hostname>\",\"port\":\"3<Instance Number><15|13>\",\"user\":\"<user>\",\"password\":\"<Password>\",\"driver\":\"com.sap.db.jdbc.Driver\",\"tags\":[\"hana\"] , \"schema\" : \"PLAIN\" }"

In case you create user provided service not from SAP HANA Service Connection, you have to open the project mta.yaml file to add resource manually , then attach it as dependency to db module.

Add grant-service resource

Go to module tab and attach grant-service resource as dependency to db module

You also need to create a hdbgrants file which grant access privileges for the remote source to the object owner and application user of container schema.

Create a Replication Task

To replicate data from objects in a remote source into tables in SAP HANA, you must configure the replication process by creating an .hdbreptask file. After the .hdbreptask has been configured, activate it to generate a stored procedure, a remote subscription, one or more virtual tables for objects that you want to replicate, and target tables. Initial Load Only doesn’t need remote subscription otherwise the remote subscription will always be created. When the stored procedure is called, an initial load is run. When realtime is enabled, then subsequent changes are automatically distributed.

Any change to source data during the initial load will be captured in initial load and after initial load the changes will be replicated in realtime.

DDL changes to source tables that are associated with a replication task will be propagated to SAP HANA if you choose replication with structure so that the same changes will be applied to the SAP HANA target tables.

Note: The Replication Editor is available only starting from SAP Web IDE for SAP HANA with XS Advanced Feature Revision 1.

Enable SAP Web IDE Extensions for SDI

Open the Preferences perspective, in the left sidebar, choose (Preferences).Under Workspace Preferences, choose Extensions.In the Extensions pane to the right, select the toggle button for the extension you want to enable.Choose Save.Refresh your browser.

Add Database Module and Replication Task

It is pretty straightforward to add database module into MTA project

Develop Replication Task

Navigate to the src folder inside your added database module. Choose File New  Replication TaskFor best practice we create a folder with name repl to hold all replication tasks, and then add a replication task inside this folder          

Enter a unique name for the task, and then click CreateClick Connect to a remote source to choose a remote source object.      

Select one of the available remote sources, and then click OK.                   

Now we need to click Add Object to add objects from source database      

Click Dictionary Search. Dictionaries can be created on some remote sources so that you can more easily find the objects you want to replicate. You can execute the following statement by building HANA dictionary tables that contain remote source objects.

ALTER REMOVE SOURCE  REMOTE_SOURCE_NAME REFRESH OBJECTS

Navigate to the object level and select the remote source objects that you want to replicate, and then click OK.

Option
Description
Initial load only Performs a one-time data load without any real-time replication. Always available.
Initial + Realtime Performs the initial data load and enables real-time replication. Available when CDC is supported, for tables and virtual tables.
Realtime Enables real-time replication without performing an initial data load. Available when CDC is supported, for tables and virtual tables.
No data transfer Replicates only the object structure without transferring any data. Always available.
Initial + realtime with structure Performs the initial data load, enables real-time replication, and tracks object-level changes. Available when CDC is supported and for tables.
Realtime only with structure Enables real-time replication and tracks object-level changes without performing an initial data load. Available when CDC is supported and for tables.

Click OK to close the Select Remote Objects dialog. The following information is included in the table.

You can sort, filter, or choose to show or hide the column names by clicking any column heading.      

he replication task is ready to run at this point. Before running the task, you can edit one or more replication objects by selecting the objects and making changes at the bottom of the screen.

You can also partition, create filters, define target options such as truncating or dropping the target table or changing the load behavior. See more topics in this section for more information.

Target Properties for replicate table  

Note: Check on Drop Target Table and Truncate Table option will drop or truncate target table.

Partition Data in a Replication Task

Partitioning data can be helpful when you are initially loading a large data set, because it can            improve performance and assist in managing memory usage. Please check further details at            Partition Data in a Replication Task

Create an Expression

Use the Expression Editor to create an expression to enhance or filter the data during replication.

Select a remote object.

Click Edit Details.

Click Projection.To create a column mapping expression, choose Target Table Columns to edit or add a column.

Click in the Mapping column to create the expression.To filter the replication data, click Filter Target Table Rows.

Select the columns that you want to use in your expression. You can drag and drop the column names from the list and place it in the Enter filter expression box.Click or type any operators to complete the expression.Save the replication task.


Edit, Add, or Delete Target Columns

Select a remote object. Click Edit Details.

Click Projection  Target Table Columns.

Click Add Column .

Enter the column name, select the data type, choose whether you want this column to be the primary key or to allow it to contain a null value. Then create an expression for the mapped value.

Click Validate Syntax to check for errors in the expression, and then click Apply.

Save the replication task

Validate your expression before save

Build Database Module

Choose Build / Build Selected Files.

Virtual table(s): Generated in the specified virtual table schema. You can display the contents of the virtual table in SAP HANA studio.

Remote subscription(s): Generated in the schema selected for the virtual table. This is only generated when the Initial load only option is not selected.

Task(s): Generated in the same schema as the target table.

Target table(s): Populated with the content after execution.

Procedure: Generated in the schema of the target table, the procedure performs three functions.       

 Execute a Replication Task

Sets the remote subscription to the Queue status. Calls Start Task to perform the initial load of            the data. Sets the remote subscription to the Distribute status. Any changes,additions or                      deletions made to the source data during the initial load are updated in the target system. Any            changes to the source data thereafter are updated real time to the target.To begin replicating              data, you need to open the container first and find those replication procedures

Click Procedures to display all available replication procedures

Right click the procedure and select Generate Call Statement

Click Run to execute replication task

The replication task begins. To monitor the status, select the replication task name in the workspace list and choose Run  Launch Tasks Overview. The Database Explorer shows the tasks that have been processed in the container.

Measuring Latency

You can measure trends by issuing the following statements at HANA server

ALTER REMOTE SOURCE <remote_source_name> START LATENCY MONITORING <latency_ticket_name> [INTERVAL <interval_in_seconds>]

Starts the collection of latency statistics one time or at regular intervals. The user specifies a target latency ticket in the monitoring view.

Let’s take a quick look at result

SELECT * From “SYS”.”M_REMOTE_SOURCE_LATENCY_HISTORY”

The sequence of components from end to end is

SDB (source database) > LRI (LogReader input) > LRO (LogReader output) > SNDR (sender) > FRAMEWORK > RECEIVER > DISTRIBUTOR > APPLIER.

So you can take a look at statistics and find the where is bottleneck for your replication performance

Data Recovery

After a replication task has failed, you can use the lost data tracker in the Data Provisioning Agent command-line configuration tool to identify and correct data inconsistencies that may have occurred. For details please check SAP document.

 

Thanks,

 

Daniel Liu

8 Comments
You must be Logged on to comment or reply to a post.
  • Few comments from my side.

    1. Indeed very complicated setup comparing to other solutions on the market.
    2. Performance is not production ready. It is not fast and scalable solution. We had significant performance issues with this setup.
    • The SDI development guideline said:

      • “No setting stored in the source because you might have two agents, all settings are stored in Hana and provided to the adapter at start”. All Transaction Log based adapters store their settings in the source.
      • “One agent serves multiple Hana instances”. If S/4Hana, BW and the Data Lake all want to read from one Oracle system, you need to install the agent on that system three times. Architecture isolates the agent on session level. Every new session, regardless from where, is independent. Hence no problem to support that. (Latest JDBC proxy addition violates that agreement)
      • “Users are business users, hence make it simple”. Look at the settings in the datastores. Tons of options.
      • “Simple UIs”. Migration to XSA made all UIs worse. Lots of above complexity stems from XSA.
      • RepTask is not simple either. You want to replicate 1000 objects, hence select them and go. Instead you have to make settings on each and every replicated object. Could have been an advanced setting instead. And if you forgot one table, the entire reptask start from scratch with an initial load of all – there goes your weekend.

      The list goes on and on and on…

      I am using the SQL commands most of the time, if I use SDI.

      However what I prefer is Kafka: https://blogs.sap.com/2019/12/10/modern-sap-erp-data-integration-using-apache-kafka/

       

    • Hi Yuriy,

      It is bit complicated to configure the solution compared to other vendors. But SDI is HANA centric so performance is not bad after we evaluated top vendors on the market also it includes transformation and SDQ.

      Thanks,

      Daniel