Hana Smart Data Integration – Realtime Table Replication
This post is part of an entire series
Hana Smart Data Integration – Overview
- Hana Smart Data Integration – Adapters
- Hana Smart Data Integration – Batch Dataflows
- Hana Smart Data Integration – Realtime Table Replication
- Hana Smart Data Integration – Realtime Sources with Transformations
- Hana Smart Data Integration – Realtime Sources with History Preserving
- Hana Smart Data Integration – Architecture
- Hana Smart Data Integration – Fun with Transformation Services
One of the more basic use cases is to copy data from a source database into Hana. Either just once to try out Hana or doing an initial load and setup realtime replication for all changes in order to have Hana side by side with the source system, e.g. for reporting.
Using the Smart Data Integration technology that can be done, you have to create virtual tables, write the SQL or dataflow for the initial load – one per source table – and create the realtime subscriptions. Or in short, use the Hana WebIDE to create a hdbreptask object that does all of that for you.
But let’s do something different instead of replicating a source database into Hana this time. Just for fun, a database would work as well of course. We want to replicate the Twitter!
For that we need an Adapter to connect to Twitter – no problem, shipped together with the Agent – and create a remote source with the Twitter login credentials. Everything else is straight forward:
Open the hdbreptask editor in the WebIDE by creating a new file of that ending
Select the remote source, the target schema and all remote tables – well in Twitter there are just two but in reality you can add hundreds of tables at once
Optionally specify filters
Execute the task
Actually, there is a nice video of all of this
Hi Werner Daehn
is there any video or turtorial for using EIM to replicate data from oracle db?
we tried and it seems the task only runs once when does initial load, then it marked as not "real time" task so it doesn't catch deltas.
do we need to do extra configure for that?
Which adapter did you use? The SDA Oracle Adapter (ODBC) maybe? That one does not support realtime. It has to be the Oracle Adapter located in the Data Provisioning Agent.
i am using the "OracleLogReaderAdapter".
According to the EIM admin guide (3.7 Activate and Execute a Replication Task):
Remote subscription(s): Generated in the schema selected for the virtual table. This is only generated when the Initial load only option is not selected.
This actually brings the dilemma: I do need data initial load and after that leaving the remote subscription running in “distribute status” (playing redo log for delta). But if I select the initial load option when creating the task, the remote subscription would not be created.
Fine, I went back to the task again and modified it to have “initial load option” unchecked. This time after I saved the task the remote subscription was generated however not in “distribute status” (in Data Provisioning Remote Subscription Monitor, status is “successful”).
And the replication for delta was not happening. So I tried to re-run the stored procedure it generated. It throws error out like this:
Could not execute 'call "ZABBIX"."EIM::zabbix_rep.START_REPLICATION"()' in 1.870 seconds .
: transaction rolled back by an internal error:  "ZABBIX"."EIM::zabbix_rep.START_REPLICATION": line 3 col 1 (at pos 113):  (range 3): transaction rolled back by an internal error: sql processing error: QUEUE: EIM::zabbix_rep.SUB_VT_ZABBIX.HISTORY: Failed to add subscription for remote subscription EIM::zabbix_rep.SUB_VT_ZABBIX.HISTORY.Error: exception 151050: CDC add subscription failed: Failed to initialize RepAgent.
Ok, I took it as the stored procedure was trying to recreate/reinitialize the subscription which is already existed. So I pull the “alter” sql DDL directly out of stored procedure and run the following to kick off the delta replay:
ALTER REMOTE SUBSCRIPTION "ZABBIX"."EIM::zabbix_rep.SUB_VT_ZABBIX.JEFFTEST" DISTRIBUTE
And it doesn’t like this too, error like this:
Could not execute 'ALTER REMOTE SUBSCRIPTION "ZABBIX"."EIM::zabbix_rep.SUB_VT_ZABBIX.JEFFTEST" DISTRIBUTE --call ...' in 52 ms 743 µs .
SAP DBTech JDBC: : sql processing error: DISTRIBUTE: EIM::zabbix_rep.SUB_VT_ZABBIX.JEFFTEST: Cannot execute distribute on remote subscription EIM::zabbix_rep.SUB_VT_ZABBIX.JEFFTEST .It is not in MAT_START_BEG_MARKER/MAT_START_END_MARKER state.
So what are the essential steps to actually kick off the sync after initial load (never mentioned in any docs or demo)?
Yea, that's something I am after myself. I have filed a bug saying "When you run the initial load again, it should reset everything".
The way I read the error messages is, you have an active subscription already and when the initial load wants to prepare the change capture, it complains that this subscription is active already.
So I would start fresh:
drop remote subscription "ZABBIX"."EIM::zabbix_rep.SUB_VT_ZABBIX.JEFFTEST";
truncate table <target table>;
activate the dataflow again so the remote subscription is generated
Alternatively you could execute the command
alter remote subscription "ZABBIX"."EIM::zabbix_rep.SUB_VT_ZABBIX.JEFFTEST" reset;
But sometimes this did not work for me in my development build. Worth a try though.
To explain the subscription states and the messages around begin/end marker a bit more...
A subscription has multiple states:
1. freshly created or in reset state: The subscription exists but does not do anything. No information has been sent to the adapter yet.
2. alter remote subscription queue: This command signals the adapter to start collecting changes. They are sent to Hana and queued there. The target table is not being loaded! Now you can run the initial load and get all the data into the target table. This will not fail with a primary key violation because the realtime part does not insert any rows yet. This adds a begin marker row in the realtime data stream.
3. alter remote subscription distribute: This adds an end marker in the data stream and starts loading data from the queue into the target table. Everything between begin marker and end marker is inserted into the target table carefully (upsert), everything after the end marker is loaded with the proper opcode (insert/update/delete).
I think this topic deserves blog entry....
Thanks for your reply. i tried your 2nd way to reset the subscription and then re-run the stored procedure, however still get the same error:
Error: (dberror) 129 - transaction rolled back by an internal error:  "ZABBIX"."EIM::test0420.START_REPLICATION": line 3 col 0 (at pos 111):  (range 3): transaction rolled back by an internal error: sql processing error: QUEUE: EIM::test0420.SUB_VT_ZABBIX.JEFFTEST1: Failed to add subscription for remote subscription EIM::test0420.SUB_VT_ZABBIX.JEFFTEST1.Error: exception 151050: CDC add subscription failed: Failed to initialize RepAgent.
you mentioned "activate the dataflow" again, is this something i am missing here? i thought i don't need to do that if the target table and source table have the same structure.
Also, where to find the log for the DPA retrieving data from target database? i checked DPA "logreader" folder, didn't find any logs generated.
or maybe i should ask in this way:if i want to replicate table A in oracle to HANA in real time, what should be the right steps?
Well, this part of the error message "Failed to initialize RepAgent" seems to point into the root cause. What did you do one the Oracle side?
i followed EIM administration guide. On the Oracle database side: i ran the script (oracle_init_example.sql) for the initial setup.
i am digging more in log files and found this oracle error:
I. 2015/04/22 13:55:38.168 INFORMATION com.sybase.ds.oracle.lobcache.OracleInstanceL Could not find Resource Bundle containing index: LOB_CACHE_THREAD_INITED and args=3
E. 2015/04/22 13:55:38.261 ERROR com.sybase.ds.oracle.logmnr.LogMinerScanner LogMiner failed to start with start SCN <36694782107>, end SCN <N/A> because: ORA-01292: no log file has been specified for the current LogMiner session
E. 2015/04/22 13:55:38.261 ERROR com.sybase.ds.oracle.logmnr.LogMinerScanner ORA-06512: at "SYS.DBMS_LOGMNR", line 58
E. 2015/04/22 13:55:38.261 ERROR com.sybase.ds.oracle.logmnr.LogMinerScanner ORA-06512: at line 1
E. 2015/04/22 13:55:38.263 ERROR com.sybase.ds.oracle.logmnr.LogMinerScanner Could not find Resource Bundle containing index: LOGMNR_RETRY_START_ERROR and args=36694782107,N/A
it seems like something wrong with the DPA calling oracle logminer APIs. Actually i am surprised the script for initial setup is so simple that i doubt it has enough configuration for utilizing oracle LogMiner.
Thanks for the internal document you sent me, all steps we've done match the step in the doc. Yet still no success.
Jeff, seems I overlooked your reply. Did you make any progress?
Thanks for your help again. we managed to get it work on a single oracle instance.
But still having issue on the Oracle RAC environment. i was pulled off from the topic for other projects.
Once i fixed replication issue on Oracle RAC i will share the updates here.
Scheduling the SDI and SDA jobs in HANA XSA WebIDE
In HANA 1.0 XSC we had very user-friendly interface to schedule and monitor the SDI jobs. But I am not able to see the XSA flowgraphs in earlier Data provisioning design time object monitor. Do we have separate similar dashboards to schedule and monitor XSA SDI jobs now? If it is; please could you share some documents on how to activate them in HANA XSA.
I am trying to extract SAC Planning model to SHC(SAP HANA Cloud) using Cloud Data integration Adapter. I am able to do full load but realtime replication is not working. Please suggest me if i can use Cloud data integration adapter to get SAC planning model to SHC or do i need to use any other specific adapter. And aslo please suggest me how to schedule a reptask/Flowgraph in SHC. I checked SAP Help portal but couldnt find step by step explanation.
I've created a virtual table with Oracle data source and I was able to read data. But I would like to write into the virtual tables (like simple inserts, updates, deletes). Is this supported? I don't see any option available. Oracle is configured thru TNS using Oracle Adapter (ODBC). It is SPS09.
I read some article that says SDA supports writing in remote sources with some limitations.
Smart Data Access- A new feature by HANA
Can you please provide some details on it?
Correct. When you use the ODBC adapters of Smart Data Access (adapter location = indexserver) then SP09 does support inserts into virtual tables. Updates and deletes as well but with performance limitations.
For the Smart Data Integration adapters SP10 supports above as well, depends on the adapter as well. Oracle SDI Adapter does. Twitter does not.
From a SQL point of view, all you do is creating a virtual table and execute a insert statement. I have played with the SDI SP10 file adapter and a
insert into virtual_table select * from source_table
did create a flat file with many rows. And a insert... values (); command appended a single row.
Thanks Werner for the quick response.
My actual requirement is to add the virtual table as a target node(datasync) in flowgraphs and move data into it. Do you see any issues with this approach and is there some best practices to follow?
I see. Virtual tables are not supported as targets of flowgraphs in SP9. In SP10 they are supported but still lots of limitations. We inherit the limitations from the SDA layer because it supports only update/delete statements with constants. But in the task we need more complex SQLs, e.g. update/delete based on the data of another table and things like that.
So I would be very careful.
SDI is really targeted to get data into Hana. What you are trying to do we only got started with. Will take a while until it is rock solid.
What other options do we have? If we use SDI flowgraphs for transformations, using BODS just to move data out from Hana to Oracle doesn't seem to be a good thing. It is like using 2 ETL tools together. We may be better off to use BODS even for the transformation in that case. Other options I've been thinking are using the java adapter to pull data from Hana or simply using flat file import/export.Can you please share your thoughts on this?
As said, as long as it is inserts only - and your file approach would be the same - we are good.
All I wanted to say is to not expect the full concept, e.g. build a dataflow once and the realtime delta happens automatically. This requires more than the virtual table target supports, it requires a Hana target table.
We are trying to replicate a few tables from HIVE to HANA. We have set up the remote source, the DP Agent is installed, up and running, connected to HANA and has the HiveAdapter (which came by default with the SP10) registered too.
But the moment we try to call the prosedure that got created after setting up anm hdbrep task, then it throws the following error-
2:31:05 PM (SQL Editor) (dberror) 129 - transaction rolled back by an internal error: statement snapshot set failed: access failure to 127.0.0.1:30003
We have rechecked the steps preceding this and they seem to be correct.
Would someone be able to guide me from here on, as to how to resolve or where to look for a solution?
we are using EIM/SDI replication tasks to bring Oracle database tables onto SAP HANA. We are currently on SPS10 rev. 102.00.
I am surprised about the rules that govern the generation of the target table name. We face a serious problem because the name of the source database schema becomes a part of the target table name, too. In our shop, the schema name depends on the environment; development and production environment access the same tables but in different schemas.
Let me give an example. The Oracle schema UWP_COM3 contains the database table TUC_GK_DEPARTMENT that I want to replicate. In the Web IDE, the suggested Target Name "UWP_COM3"."TUC_GK_DEPARTMENT" is shown. Eventually, the effective target table also gets a prefix: the repository package path where the replication task document is stored. This is the outcome:
I know that I can manually delete the schema name in Web IDE but turns into a nightmare since we plan to replicate hundreds of tables. Is there a way to control or customize the generation of the target table name?
Let me check what development has to say about this.
Hope you are doing good. We got the bug fix from SAP (note 2271762) applied for multi-tenant (pluggable) database oracle log reader applied this week.
We tested the real time functionality successfully using Flow graph. (it worked with insert/delete/update ..,) all is fine.
But when we tried to do the same with replication task it gives some strange missing privilege issue. (remote subscriptions are created and no exceptions in other system table) log in oracle doesn't give any hints. we can see the replication is set for the table in oracle side.
Later we gave grant select on developer user schema to the back ground user which executes the replication task and it worked successful.
Any ideas? I am trying to search in help documents and admin guide for prerequisites and necessary privileges but I wonder why the flow graph real time works without this extra select .
One thing I noticed strange is the replication is created by developer profile user and we executed it with technical background user in HANA which does the replication. But in error message I see like "NY31349"."_SYS_DE__popid_6...." what is it doing with the developer ID. Please see below trace file screenshot
I writing here to know if you guys can help us to understand the behavior and also is it normal for replication task or is it a bug? Do you suggest to open an incident?
error message :
We are using EIM to replicate data from oracle DB.
The adapter we are using is the "OracleLogReaderAdapter".
The initial load is working but the replications doesn't work.
The error we are getting is: Error: exception 151050: CDC add subscription failed: RepAgent is not started
The traces show;
[ERROR] DPFramework - Exception while processing requrest "CDC Request for OracleLogReaderAdapter for request type CDC_ADD_SUBSCRIPTION"
[ERROR] DPFramework - RepAgent is not started. Context: null
com.sap.hana.dp.adapter.sdk.AdapterException: RepAgent is not started.
[INFO ] com.sap.hana.dp.oraclelogreaderadapter.OracleLogReaderAdapter.getCDCConfiguration - 'pdb_archive_path' property is not configured. Query it from V$ARCHIVE_DEST
[WARN ] com.sap.hana.dp.oraclelogreaderadapter.OracleLogReaderAdapter.queryArchivePath - Failed to query archive path.
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
What am I missing?
I suspect that you did not configure the Oracle Log Reader Adapter completely. You need to tell it the location of the redo logs on the database server. You can query the file path as mentioned in the error message above (SELECT DESTINATION FROM V$ARCHIVE_DEST WHERE STATUS = 'VALID' AND SCHEDULE = 'ACTIVE') but I recommend asking your database administrator.
Once you got the file path, you start the SAP HANA Data Provisioning Agent Configuration tool and insert it there. The picture below shows where I put in our environment.
Hope this helps!
Our Oracle DB is using ASM thus meaning that I've used the following value for directory: USE_DB_RECOVERY_FILE_DEST
I have the feeling that the RedoLogAdapter is not happy with the above mentioned value.
Another thing I've tried was to unregister and re-register the DPAgent (basically delete and recreate the setup) and now the error message looks slightly different:
2016-02-25 15:26:46,961 [ERROR] DPFramework - NullPointerException while processing request "CDC Request for OracleLogReaderAdapter for request type MDS_OPEN" Context: java.lang.NullPointerException
What else could I try?
I cannot recommend actions related to the Automated Storage Management (ASM). However, I can refer to a document that may help you. It is the "Replication Agent for Oracle 15.7.1 ESD #2 Primary Database Guide". This is part of the SAP Replication Server documentation which supplies the core of EIM/SDI.
Go to this site:
Now navigate in the left area down to "Replication Server Option for Oracle 15.7.1 ESD #2". In this folder you find the above mentioned document (and much more).
How did you get the ase adapter for DP agent?
Is there a way to create Virtual table as design time objects ? So that i can move them across the systems like Dev , Quality and Production . I am using Virtual table in my flowgraph as data source pointing to remote source tables residing in another HANA box .
Currently in SPS 12 we are manually creating Virtual table on top of remote source tables.
very nice Blog actually We want to use HANA SDI for Realtime Replication of SAP ECC Data. The Best way to do that is in our opinion the ORACLE ECC Adapter. Although we have there a little bit of a security and compliance issue you might help us. For realtime replication the Oracle user seems to must have alter any table and drop table rights. As you might figure these right's concerns Data base administration especially we are on the SAP Schema on the oracle DB. Is there a guideline or an assurance on SAP Side that such rights is OK for HANA SDI User in combination with SAP ECC connection? Or is there another way?
Hello Florian,It would be great if you could provide more inputs on integration between SDI and SAP HANA.It would be great if you could share more details on how to implement real time SDI between SAP HANA and SAP ECC like adapter details, technical details .Data is stored in SAP ECC and SAP HANA would want to access these tables for say one particular key each time with some logic involved (say when some calculation view or a procedure/function is called ).Entire data would be only stored in SAP ECC.
I am trying to configure real time replication from MS SQL database to SCP through SDI(Smart Data Integration).
But i am not able to run my replication task and i get the below error message. Any input would be appreciated.
I am using MSSQLLogReader Adapter to achieve this. Below is the Replication agent error log message. If anyone of you have come across such issues, Request you to please spare few minutes of your time on this. Thanks
Failed to synchroniz log device with message while trying to invoke the method java.util.SortedMap.keySet() of a null object loaded from field com.sybase.ds.mssql.log.device.LogDevice._replicaPathMap of an object loaded from local variable ‘this’
It would be great if you could share more details on how to implement real time SDI between SAP HANA and SAP ECC like adapter details, technical details .Data is stored in SAP ECC and SAP HANA would want to access these tables for say one particular key each time with some logic involved (say when some calculation view or a procedure/function is called ).Entire data would be only stored in SAP ECC.