Replicating data from Oracle DB 12c to SAP HANA 2.0SPS04 using SDI
This tutorial’s goal is to share how to extract data from an Oracle database and replicate it to SAP HANA. We use of SAP Smart Data Integration (SDI) to replicate data in real-time from the oracle database log files.
The content is written for informative purpose on how to use SAP HANA features, and should not be considered as a guide on using Oracle products. All usage of Oracle products has to follow license guidelines of aforementioned products.
SDI provides real-time data provisioning, bulk data movement, and federation capabilities. It includes two components: Data Provisioning Server (hereafter DP server) on the SAP HANA database server, and Data Provisioning Agent (hereafter DP agent) as an intermediary between SAP HANA and the external sources.
Data Provisioning Agent hosts data provisioning adapters. It enables data federation, replication, and transformation scenarios for on-premise or in-cloud deployments. As it can be installed on a separate host from the data source, it can be used in secure enterprise network scenarios.
Replication tasks and flowgraphs let you set up batch or real-time data replication and transformation scenarios in an easy-to-use web application.
In this tutorial, we will use SAP Web-based development workbench to set up SAP Smart Data Integration and create a replication task from an Oracle 12c database.
SAP HANA cockpit allows users to monitorr Data Provisioning Agents, remote subscriptions, and data loads.
This tutorial is based on the blog of Feng liu : Replicating Data from Oracle to HANA using SAP HANA Web IDE SDI
However we do not use the SAP Web IDE, and go further in details over the oracle database installation.
In all deployments, the basic components are the same. However, the connections between the components may differ depending on whether SAP HANA is deployed on premise, in the cloud, or behind a firewall. Based on our current SAP HANA landscape we use the following deployment options for SDI.
- System 1: Data Provisioning Server (internal component of SAP HANA Server)
- System 2: Data Provisioning Agent
- System 3: Source system
You may notice Data Provisioning Server within SAP HANA connects to the agent using the TCP/IP protocol and default port is 5050
1) Install Oracle database 12c
2) Oracle database permissions
3) Activate DP server on the SAP HANA server
4) Download DP Delivery unit from SAP Service Marketplace as a ZIP file
5) Import DP Delivery Unit
6) Download DP agent from SAP Service Marketplace as a ZIP file
7) Copy necessary Oracle drivers in lib folder of DP agent
8) Configure DP agent
9) Enable the web dispatcher so that you can access all tenants through a web browser
10) Create a remote source
11) Create a replication task
12) Execute the replication task
13) Data Provisioning Monitoring
Install Oracle database 12c
For this PoC, we installed and oracle Database 12c on the windows host where we will install DP agent later. These two machines can also be different.
Find oracle database 12c here : https://www.oracle.com/database/technologies/database12c-win64-downloads.html
Download and extract the installer, then install the database by clicking on setup.exe in the extracted folder.
Follow the wizard for the installation.
Select the first option, to create a new database.
Choose desktop class.
Use an embedded user in Windows.
Define your oracle base. In our case, we use C:\Oracle
Verify your database edition, your character set, your global database name and your admin user password.
Then, choose if you want to create your database as single tenant, or multi-tenant. We choose to create the database as a multi-tenant database, with a pluggable database (PDB) called pdborcl
Follow the wizard and complete the installation.
Oracle database permissions
Oracle database users need to have certain permissions granted to them in order to carry out real-time change data capture or batch or initial load transactions. Please go through Oracle Database Permission page to find out all details.
SDI supports both database-level supplemental logging and table-level supplemental logging. We decide to go with the default, table-level supplemental logging.
You can set your logging level later in the Adapter Preferences window of the Data Provisioning Agent configuration tool for the Oracle Log Reader adapter.
You can find the necessary permissions in scripts located in <DPAgent_root>\LogReader\Scripts
If you created your Oracle DB as a single-tenant database, run oracle_init_example.sql to create a communication user and grant it the necessary roles/privileges. (See below for multi-tenant)
Open Oracle SQL Developer(or SQL Plus) and create a new connection to your database with the user SYS as a SYSDBA.
Open an SQL Console with the user SYS as a SYSDBA and execute the contents of oracle_init_example.sql.
These SQL statements create a new user LR_USER in the oracle database and grant it the roles to access logs from all database tables.
You will need to un-comment the parts which concern your system, and possibly replace the user provided, LR_USER, by any user you want to use. I used the SYSTEM user.
The roles given here are essential for SDI to access the logs from your Oracle database.
If you created your Oracle DB as a multi-tenant database, create a first connection to your container database using SYS user as SYSDBA.
Run oracle_multitenant_init_example_for_container_database.sql to create a communication user and grant it the necessary permissions in the container database.
Then, create a second connection to your pluggable database using SYS user as SYSDBA.
Run oracle_multitenant_init_example_for_pluggable_database.sql to create a communication user and grant it the necessary permissions in the pluggable database.
Activate DP server on the SAP HANA server
In order to run Smart Data Integration, you need to enable the data provisioning server in SAP HANA server. If your SAP HANA system is set up as a multi-tenant DB, you need to execute the ALTER DATABASE statement in the system database.
Connect to your SAP HANA system database as an admin user with HANA Studio.
Right click on your SYSTEMDB, select administration panel and go to the configuration tab.
Within daemon.ini, verify that there is a service dpserver.<tenant_name> active.
If not, open an SQL console on your SYSTEMDB, and execute the following statement :
ALTER DATABASE <database_name> ADD 'dpserver' AT LOCATION '<hostname>[:<port_number>]'
If you want to remove the data provisioning server in a multi-database container scenario you also need to execute the ALTER DATABASE statement in the system database
ALTER DATABASE <database_name> REMOVE 'dpserver' AT LOCATION '<hostname>[:<port_number>]'
We use SAP HANA Server 2.00.045 and need to download corresponding version of DP Agent and Delivery Unit.
For details, see the SAP HANA smart data integration Product Availability Matrix (PAM). On the SAP Software Download Center, you can find the installation packages in the following locations:
Download DP Delivery unit from SAP Service Marketplace as a ZIP file
SAP Software Download Center > Software Downloads > Installations & Upgrades > By Alphabetical Index (A-Z) H SAP HANA SDI SAP HANA SDI 2.0 > COMPRISED SOFTWARE COMPONENT VERSIONS > HANA DP 2.0 > Click the ZIP file that you need, and save it to your preferred location > In the HANAIMDP<version number>.ZIP file, find and extract the HANA_IM_DP.tgz file
Import DP Delivery Unit
For multi-tenant database container configuration, you must import the delivery unit into the tenant database that you have enabled DP Server in previous step.
For simplicity of security I will use SYSTEM user to import from SAP HANA Studio. In the upper left corner, click File Import. On the Import dialog, type delivery into the search box for Select an import source
Click Delivery Unit on the resulting navigation tree and click Next. Then select your target system, and click Next
On the Import Through Delivery Unit dialog, select Client radio button and click Browse and navigate to the location where you downloaded the delivery unit (please extract first), select HANAIMDP.tgz, and click Finish.
Download DP agent from SAP Service Marketplace as a ZIP file
The next step is to install DP agent on the server which will be used to replicate data from Oracle database to your SAP HANA database. It can be installed on the Oracle database server, on a cloud server, or on your own local computer. For this tutorial, we installed it on the same server as the oracle database.
SAP Software Download Center > Software Downloads > Support Packages & Patches > By Alphabetical Index (A-Z) > H > SAP HANA SDI > SAP HANA SDI <version_number> > Comprised Software Component Versions > HANA DP AGENT <version_number>
If you download it as a SAR file, you need to extract the SAR archive with SAPCAR, using the following command:
SAPCAR -xvf IMDB_DPAGENT200_03P_20-70002517.SAR
Note: DP Agent version must match the Hana version. Which version to use is best found out by looking at the PAM for Hana SDI.
You must use the Administrator user to install and configure DP Agent.
Find hdbsetup.exe and right click to run as administrator.
Choose install new SAP HANA Data Provisioning Agent and specify the installation path. I left installation with default setting. -> C:\usr\sap\dataprovagent
Copy necessary Oracle drivers in lib folder of DP agent
DP agent requires libraries from Oracle to import data, depending on the version of your Oracle database. Find which libraries are required on the Product Availability Matrix (PAM) .
Find the required supporting libraries inside Oracle client folder
- ojdbc7.jar: C:\Oracle\product\12.1.0\dbhome_1\jdbc\lib
- xdb6.jar: C:\Oracle\product\12.1.0\dbhome_1\RDBMS\jlib
- xmlparserv2.jar: C:\Oracle\product\12.1.0\dbhome_1\LIB
Paste the supporting libraries into the DP agent lib folder.
Configure DP agent
Configuring the Agent in Command-Line Interactive Mode
Open a terminal in windows.
The command-line agent configuration tool requires the DPA_INSTANCE environment variable to be set to the installation root location (<DPAgent_root>).
On windows, open a command prompt.
Set the DPA_INSTANCE
Navigate to <DPAgent_root>\bin
Start the configuration tool with the –configAgent parameter.
Make sure the agent is started with option 1. If needed, start the agent with option 2.
Connect the DP agent to SAP HANA with option 6.
You will need an Agent Admin HANA User used for communication between DP agent and dpserver. The Agent Admin HANA User must have the System privileges : AGENT ADMIN and ADAPTER ADMIN
To know which port you should reach on the SAP HANA tenant, execute this SQL query in HANA Studio, with an SQL console active on HANA SYSTEMDB
SELECT * FROM SYS_DATABASES.M_SERVICES
Connect to the SQL Port of the tenant database where you installed dpserver. 30041 in this case.
Come back to the main menu, then register the agent to dpserver active in your tenant of SAP HANA, by using option 7. You must provide a name for your DP Agent and the IP address of your Windows host.
My agent is registered by the name dataprovagent.
Go back to the main menu, then register the adapter that you will use with SAP HANA by using option 8. You can display all available adapters and register the one you want to use. Just input its name and it’s done!
The OracleLogReaderAdapter gets registered in dpserver.
Enable the webdispatcher so that you can access all tenants through a web browser
In HANA Studio, open the administration panel by right-clicking on your SYSTEMDB.
Go to configuration tab. Filter for webdispatcher.ini > profile > wdisp/system_auto_configuration –> set it to true.
In xsengine.ini, set up public urls for each tenant as described here.
Double click on http_url to set-up the http URL. Then, repeat the same steps for the https URL if needed.
Give each tenant which you want to access through a web browser its own URL. Then save your changes by clicking “Save”.
Edit your Windows client’s “hosts file” to add the IP address of the HANA systemdb and tenants.
You can find the hosts file in C:\Windows\System32\drivers\etc
Always open your hosts file as an administrator.
Add the HANA system’s IP address and then its host name as described below :
xxx.xxx.xxx.xxx <host name for SYSTEMDB>
xxx.xxx.xxx.xxx <host name for tenant DB>
Now you are able to access SAP HANA XS apps, including the web-based development workbench for each tenant. We will use the web-based development workbench to create remote sources, and replicate data from Oracle database to SAP HANA.
Create a Remote source
We are going to create remote sources, virtual tables and replication tasks with the SAP HANA Web-based development workbench in this tutorial. You could also work with the SAP Web IDE.
In order to access the workbench, you need a user with the role sap.hana.ide.roles::Developer.
You can grant roles to users with HANA Studio, or with the HANA Cockpit. In this blog I use the HANA Cockpit –> It is available on the SAP HANA XS Web server at the following URL: https://<system db xs webserver host name>:51027/cockpit#Shell-home
Log in to HANA Cockpit, then access the tenant DB in which you want to replicate data.
Create the user which will be used for development and assign the role sap.hana.ide.roles::Developer to that user.
Use this user to log in to the SAP HANA Web-based Development Workbench. It is available on the SAP HANA XS Web server at the following URL:
http://<tenant db xs webserver host name>:80<SAPHANAinstance>/sap/hana/ide
Select Catalog, go into Provisioning and right Click “Remote Sources” to create a new remote source.
Create a Remote source to your Oracle database.
You can specify the Database host and port number directly in the remote source definition, or you can based your remote source on the TNSNAMES files by specifying its path on your DP agent host.
( C:\Oracle\product\12.1.0\dbhome_1\NETWORK\ADMIN\tnsnames.ora )
If you installed your Oracle database as a Multitenant database, you will need the name of the container database service (CDB), and the pluggable database service (PDB) which you want to access.
You will also have to provide the username and password of the users you prepared at step 2) Oracle database permissions
The Oracle LogReader adapter supports two capture modes : LogMiner and Trigger.
We use the log-based replication as it is generally beneficial over the trigger-based replication
In order to create virtual tables based on that remote source, you need to grant access to the sys_repo user.
Execute the following SQL from SAP HANA Studio as the user who created the remote source.
GRANT CREATE VIRTUAL TABLE, CREATE REMOTE SUBSCRIPTION ON REMOTE SOURCE "<Remote_source_name>" to _SYS_REPO; GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE ON SCHEMA "<schema_name>" TO _SYS_REPO WITH GRANT OPTION;
Create a replication task
To replicate data from objects in a remote source into tables in SAP HANA, you must configure the replication process by creating an .hdbreptask file. After the .hdbreptask has been configured, activate it to generate a stored procedure, a remote subscription, one or more virtual tables for objects that you want to replicate, and target tables. Initial Load Only doesn’t need remote subscription otherwise the remote subscription will always be created. When the stored procedure is called, an initial load is run. When realtime is enabled, then subsequent changes are automatically distributed.
Any change to source data during the initial load will be captured in initial load and after initial load the changes will be replicated in realtime.
DDL changes to source tables that are associated with a replication task will be propagated to SAP HANA if you choose replication with structure so that the same changes will be applied to the SAP HANA target tables.
Through the SAP HANA Web-based Development Workbench, access the Editor
Right click on the Content folder and create a new package
Right click the newly created package and create a new Replication Task.
Select the remote source which you just created, then define the schema in which you want to create a virtual table(which does not store data physically) and the schema in which you want to replicate data.
You can select the remote objects which you want to replicate, and the replication behavior.
|Initial load only||Performs a one-time data load without any real-time replication. Always available.|
|Initial + Realtime||Performs the initial data load and enables real-time replication. Available when CDC is supported, for tables and virtual tables.|
|Realtime||Enables real-time replication without performing an initial data load. Available when CDC is supported, for tables and virtual tables.|
|No data transfer||Replicates only the object structure without transferring any data. Always available.|
|Initial + realtime with structure||Performs the initial data load, enables real-time replication, and tracks object-level changes. Available when CDC is supported and for tables.|
|Realtime only with structure||Enables real-time replication and tracks object-level changes without performing an initial data load. Available when CDC is supported and for tables.|
After you define all the objects which you want to replicate, save and activate your replication task.
Execute the replication task
In order to start the replication task, right click it and select Execute.
Executing the replication task sets the remote subscription to the Queue status. It performs the initial load of the data, then sets the remote subscription to the Distribute status.
Any changes, additions or deletions made to the source data during the initial load are updated in the target system. Any changes to the source data thereafter are updated real time to the target.
You can verify the result directly in the table in the Catalog or use the monitoring tools below.
Data Provisioning Monitoring
We need to grant sap.hana.im.dp.monitor.roles::Monitoring to the user who will perform monitoring tasks.
<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPSubscriptionMonitor
Replication Task Monitor
<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=IMTaskMonitor
DP Agent Monitor
<host name>:80<2 digit instance number>/sap/hana/im/dp/monitor/?view=DPAgentMonitor
Special thanks to Werner Dähn and Feng Liu for all the content they posted on-line. Go check their articles if you need further information on Smart Data Integration.
great blog with useful information and howto!
one point "We use the log-based replication as it is generally beneficial over the trigger-based replication" , as you may know log-based replication is not supported for some oracle version
Thanks for sharing article.
I have been setup ODBC and replication using the procedure. But data transfer from Oracle to HANA is very slow to do initial load. Do you have any hint ?