
Hi HANA EIM and SDI Community,

In this blog entry I would like to share some of the experiences we gained throughout an SDI HANA to HANA (H2H) implementation project. To set the context, we will start off with the scenario description and solution architecture.

These are the items that will be covered throughout this blog:

  1. Implementation Scope
  2. Solution Architecture
  3. Best Practices
  4. Challenges
  5. Reengineering of Replication Tasks
  6. Monitoring
  7. Real-time Replication & Source System Archiving
  8. References

You can expect practical insights into the implementation of a HANA to HANA replication scenario. Selected aspects, e.g. task partitioning, replication task design and monitoring, are described in detail. You may be able to adapt the approaches described in this blog in your own SDI implementation project.


1.    Implementation Scope

From an SDI perspective, this brief overview will describe some facts and requirements we had to deal with:

  • Replicate data in real-time from 3 different HANA source systems into a (consolidated) target schema using SDI RTs (with the SDI HANAAdapter)
  • Replication scope of approx. 550 tables per source (times 3 = more than 1,600 tables)
  • Replicate tables with high record counts (6 tables of > 2 billion records in production)
  • SDI task partitioning for large tables (> 200 million records)
  • Target table partitioning for large tables (> 200 million records)
  • SDI infrastructure/configuration – e.g. DP-Agent + Agent groups
  • Follow SDI best practice guidelines (naming convention, implementation guidelines, tuning)
  • SDI development artifacts maintenance + transport across landscape to PRD
  • Dpserver + dpagent monitoring
  • Out of scope: Load and replication of IBM DB2 based source systems (compare with architectural diagram)


2.    Solution Architecture

The end-to-end solution architecture employs several SAP and non-SAP components:

  • DP-Agents
    • Virtual host on Linux, 64 GB RAM
    • Version 2.1.1
  • HANA 2 SP02
    • 4 TB
  • HANA EIM SDI (XSC runtime)
  • DLM
  • HANA Vora 1.4
  • Hadoop Cluster with Spark enabled
  • Microstrategy

The following illustration shows the architecture in a simplified way. From an SDI point of view there are multiple real-time and batch input streams: Suite on HANA systems, files, and legacy data from IBM DB2 DBs (not shown).

In the productive environment (as shown) each Suite on HANA system (shown as HDB1/2/3) is connected via a dedicated DP-Agent group with a HANAAdapter instance. Thus, the risk of stalling the whole replication when remote source or RepTask exceptions occur on source system level can be mitigated. The Hadoop and Vora parts, shown on the right-hand side, will not be elaborated further and are not part of this blog entry.


3.    SDI Best Practices

Initially, most of the aspects (for users + authorizations) considered in the official SDI Best Practices Guide were implemented (refer to the references section for the web link to the best practices).

SDI users were organized the following way:

  • SDI_ADMIN – monitoring privileges, user creation, ***
  • SDI_DEV – Web-based development workbench, repository privileges, schema privileges
  • SDI_EXEC – execute replication tasks
  • SDI_TRANSPORT – transport SDI artifacts

Using this pattern, you can easily follow a segregation-of-duties approach and avoid unnecessary and unwanted situations in development or deployment. In return, you have to stick with the approach and align your development and administration processes accordingly.


4.    SDI Real-time Replication Design – Challenges

The following describes the major challenges we faced:

  1. Multiple sources into one target
  2. Replication Task Count
  3. Duration of Initial Load and Cut-over Time
  4. Disable the Capture of DDL Changes
  • 1. Multiple sources into one target
    • The goal is to consolidate all source data (SAP ECCs) into one target schema. In this sense, source tables of the same type are replicated into one consolidated target table.
    • Translated into a specific example:

The replication of e.g. table COEP is set up for all three source systems. The target for all sources is one table COEP. This target table must comply with all structural properties that exist in the sources (COEP tables across n source systems do not necessarily have the exact same structure, at least in the given case; you might have system-specific appends), meaning different Z-/Y-fields across all COEP source tables. As RTs do not allow for flexible column mapping like FlowGraphs do, this is how we resolved the requirement:


  • 2. Replication Task Count
    • In scope were about 550 tables that had to be replicated, with different volumes and different change/delta frequencies. For the largest tables, dedicated remote sources are planned to be introduced. This would translate into a remote source count of 7 for the largest SoH source.
      Since each remote source gets its own receiver/applier pair assigned (in the dpserver), this makes sense from a parallelization and performance perspective. On the downside, you have to create dedicated source system users, and of course you need to maintain and organize your solution considering the set of different remote sources that are in place. In the following illustration each red arrow represents a separate remote source.


  • 3. Duration of Initial Load and Cut-over Time
    • We observed quite a few tables whose initial load consumed much time (> 24 hours per table). By introducing task partitioning within RepTasks, combined with a proper loading sequence, we could achieve major performance improvements. Here, the challenge is to find appropriate columns and value ranges. Imagine a table such as MARC (SoH material master plant data) with 1.9 billion records for which you should define proper ranges for your range partitioning approach. How do you do that? The solution is to profile your data with generic SQL procedures or by using other tools. Potentially you have experts at hand who have knowledge about the value distribution in those tables. This task can be tricky.
    • Check value distribution of columns that are candidates for partitioning:
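A generic profiling statement of this kind can be sketched as follows (schema and table names are placeholders; STUNR is the partitioning candidate used later in this example):

```sql
-- Value distribution of a candidate partitioning column:
-- a handful of dominant values would make range partitioning lopsided,
-- an even spread makes it effective.
SELECT "STUNR" AS "VALUE", COUNT(*) AS "CNT"
  FROM "SOURCE_SCHEMA"."SOURCE_TABLE"
 GROUP BY "STUNR"
 ORDER BY "CNT" DESC;
```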


    • Apply partitioning settings in RT; here: Range Partitioning on field STUNR with two parallel partitions for the initial load:


  • 4. Disable the Capture of DDL Changes
    • The HANAAdapter is able to capture DDL changes (drop/add column). Obviously, if you load in real-time from three sources into one target table, this influences the behavior for DDL changes considerably – i.e. you can’t capture DDL changes anymore, as the (dynamic) structural differences between the source tables would cause inserts on the SDI applier side to fail.
    • A good option is therefore to set the DDL scan interval to 0, which means “disabled”. The default value is 10, as you can see in the picture below:


5.    Reengineering Replication Tasks

Throughout the implementation of the project, several changes in requirements occurred, e.g. regarding RT structural details such as conversion expressions (you might know these change requests from your own projects 😉 ).

Therefore, some PowerShell and Python scripts were implemented in order to apply mass changes to the SDI objects more efficiently. When you have around 1.600 RepTasks, you will be grateful not to have to touch each of them one by one. Doing this, you need to take precautions wherever possible; of course you take backups of the exports you make from the web-based development workbench. After you export your SDI artifacts from the workbench, you can apply changes to the XML structure. You can also do this by opening the replication task from the workbench via right click -> Open in text editor.

Whatever you do in terms of editing the XML directly, you need to be careful with every find/replace operation you undertake. Make sure that you only edit the XML properties you really need to! Otherwise your replication task structure is likely to get corrupted. After applying your changes locally, you can re-import the replication task in the web-based development workbench.

Here are a couple of examples we had to deal with. These were resolved by applying respective scripts or using common tools such as Notepad++/Sublime.

  1. Replace “Project Expression” in all RepTasks for a defined scope of columns (specific for each table) and input proper conversion expressions, such as ABAP date NVARCHAR(8) → HANA DATE or ABAP time NVARCHAR(6) → HANA TIME.


As all RepTasks were initially designed to conduct a 1:1 replication (only one table per RepTask and no further conversion or project expressions), a Python script executed the following:

  • Iterate directory where RepTasks are stored, element by element
  • Match RepTask file with respective line in provided csv (proper identifiers required)
  • Find XML attribute for affected columns and replace element value “Project Expression”

In a Python script, the class xml.etree.ElementTree can help to parse and edit RepTask XML structures.
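A minimal sketch of such a replace operation is shown below. Note that the element and attribute names (TargetColumn, ColumnName, ProjectExpression) are illustrative only, not the actual RepTask XML schema – inspect your own exported RepTask files to find the real ones:

```python
import xml.etree.ElementTree as ET


def replace_project_expression(reptask_file, column_expressions):
    """Replace the project expression of selected target columns in a RepTask XML.

    column_expressions: dict mapping column name -> new expression, e.g.
    {"BLDAT": 'TO_DATE("BLDAT")'}. Element/attribute names are placeholders.
    """
    tree = ET.parse(reptask_file)
    root = tree.getroot()
    # Walk all target-column elements and patch only the columns in scope
    for col in root.iter("TargetColumn"):
        name = col.get("ColumnName")
        if name in column_expressions:
            col.set("ProjectExpression", column_expressions[name])
    # Write the file back in place (keep a backup of the export beforehand!)
    tree.write(reptask_file, encoding="utf-8", xml_declaration=True)
```

Driven from a small loop over the export directory and a csv with table-specific column lists, this covers the iterate/match/replace steps described above.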

  2. Check consistency and correctness of the values identifying the source system, which is defined on table level with an additional column
    1. Source system 1 = ‘001’
    2. Source system 2 = ‘002’
    3. Source system 3 = ‘003’

Moreover, check if all RepTasks apply the correct filter criteria. This can also be done using e.g. Notepad++ to “scan” entire directories for certain patterns/values.
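Such a directory “scan” can also be scripted. A minimal sketch, assuming the RepTask exports carry a .reptask extension (both the extension and the pattern are placeholders):

```python
import re
from pathlib import Path


def scan_reptasks(directory, pattern):
    """Return the RepTask files in a directory that do NOT contain the
    expected pattern, e.g. the source-system filter value."""
    regex = re.compile(pattern)
    missing = []
    for f in sorted(Path(directory).glob("*.reptask")):
        if not regex.search(f.read_text(encoding="utf-8")):
            missing.append(f.name)
    return missing
```

Any file reported by the scan is a candidate for a missing or wrong filter criterion and should be reviewed by hand.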


6.    SDI-Monitoring

Monitoring of the SDI solution comprises both the target HANA system and the DP-Agent(s). Apart from these “core” components, many sub-components come into play. It would be wrong to state that monitoring the DP-Agent and the DP framework on the HANA side is enough to keep your SDI solution in safe hands.



As the framework is deeply embedded into the HANA database, you (optimally) need to understand all inner mechanisms that come into play when data processing happens. However, in this blog we only want to dedicate our attention to SDI framework related monitoring options. Please do not expect to be able to fully monitor or troubleshoot your SDI solution after reading this. The following provides only some basic input and approaches we applied and experienced in our project.

To begin with, and most likely well known, are the SDI monitoring views. These views unveil quite a lot of details about SDI design-time objects, ongoing loads and general statuses. They also provide some control mechanisms to handle replication and load. However, when it comes to more detailed root-cause analysis, one needs to be familiar with the framework’s tables, how traces are set, where log files are stored etc. A good reference here is the official SDI Administration Guide (refer to the references section where links are provided).

In addition to the monitoring views, a solid set of prepared SQL statements or some custom-built stored procedures can of course help facilitate your monitoring or troubleshooting activities. E.g. during the initial load of your RepTasks you typically query some important views repeatedly. Especially the REMOTE_SUBSCRIPTIONS view is helpful when it comes to monitoring a particular initial load or CDC states. The state column describes one of the following replication states, which should be well understood (you also find this information in the SDI Administration Guide):

State Name – Description

  • CREATED – Remote subscription is created by the replication task.
  • MAT_START_BEG_MARKER – The receiver is waiting for the begin marker that indicates the first changed data to queue while the initial load is running.
  • MAT_START_END_MARKER – The receiver queues the rows and is waiting for the end marker that indicates the last row of the initial load.
  • MAT_COMP_BEG_MARKER – The receiver is waiting for the begin marker that indicates the first row to queue after the initial load has completed.
  • MAT_COMP_END_MARKER – The receiver queues the changed rows and is waiting for the end marker that indicates the last row of the initial load. The initial load has completed and the end marker is sent to the adapter. If the state does not change to AUTO_CORRECT_CHANGE_DATA, the adapter or source system is slow in capturing the changes.
  • AUTO_CORRECT_CHANGE_DATA – When the end marker is received for the initial load, the applier loads the changed data captured (and queued during the initial load) to the target. If a lot of changes occurred after the initial load started, this state might take a long time to change to APPLY_CHANGE_DATA.
  • APPLY_CHANGE_DATA – All of the changes captured while the initial load was running have completed and are now loaded to the target.


We personally perceived the usage of the monitoring views as too cumbersome, as well as the SDA monitor in HANA Studio. This is subjective and does not apply to all cases, of course. Subsequently, I present a couple of helpful statements:

--Check for the current source whether all subscriptions are in state APPLY_CHANGE_DATA – show differing subscriptions

--Check how many RepTasks/remote subscriptions are in real-time replication state per remote source

--Query for exceptions on remote source/subscription level

--Check applied remote statements during the initial load (SDA request) for a given table. You need to sort out the correct statement, as multiple statements might match your search pattern

--Find running tasks. The initial load in the SDI context is wrapped into a HANA runtime task. Therefore you can keep track of its execution by querying the appropriate tables
--In case of partitioned tasks, query the subsequent table
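As a rough illustration, the first and last of these statements can be sketched as follows. The view and table names are taken from the SDI Administration Guide ("SYS"."M_REMOTE_SUBSCRIPTIONS" and the "_SYS_TASK" schema); the remote source name is a placeholder, and exact column names may differ between revisions, so verify against your own system:

```sql
-- Subscriptions of one remote source NOT yet in real-time replication state
SELECT SUBSCRIPTION_NAME, STATE
  FROM "SYS"."M_REMOTE_SUBSCRIPTIONS"
 WHERE REMOTE_SOURCE_NAME = 'RS_HDB1'          -- placeholder remote source
   AND STATE <> 'APPLY_CHANGE_DATA';

-- Count of subscriptions in real-time replication state per remote source
SELECT REMOTE_SOURCE_NAME, COUNT(*)
  FROM "SYS"."M_REMOTE_SUBSCRIPTIONS"
 WHERE STATE = 'APPLY_CHANGE_DATA'
 GROUP BY REMOTE_SOURCE_NAME;

-- Initial loads are wrapped into HANA runtime tasks
SELECT TASK_NAME, STATUS, START_TIME, END_TIME
  FROM "_SYS_TASK"."TASK_EXECUTIONS"
 ORDER BY START_TIME DESC;

-- In case of partitioned tasks, query the partition-level executions
SELECT TASK_NAME, STATUS
  FROM "_SYS_TASK"."TASK_PARTITION_EXECUTIONS";
```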


 Monitoring the DP-Agent

To keep track of what happens on the DP-Agent side, it is advisable to have a real-time log monitoring tool at hand, such as Array or BareTail, or, when running on Linux, a native tool such as tail -f. It is your choice. Set the trace and log levels in the DP-Agent’s ini files, make yourself familiar with the paths where traces and logs are stored, and access them via your tool of preference.

Here is a brief overview of the dpagentconfig.ini file properties for logging:

  • log.level = ALL/INFO/DEBUG – sets the trace level
  • log.maxBackupIndex = 10 – max number of trace files
  • log.maxFileSize = 10MB – max size of a trace file
  • trace.body = TRUE – log the entire body part of an operation

Correspondingly, you should see the respective trace/log files in the log path:

  • *.trc – trace files for the agent
  • framework_alert – alert file for the framework
  • *.log – Log Reader rep agent log files
  • *.log – service log files


Data Assurance

In order to assure that source and target record counts are the same, you can use some prepared SQL statements to query the record counts of source and target tables.

Possibly, you want to provide a SQL procedure to automatically match source and target tables in terms of record counts across remote sources, e.g. taking into account the source system identifier column.
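A minimal sketch of such a count comparison, per source system, might look as follows (schema, virtual table and identifier column names are placeholders):

```sql
-- Compare the record count of one source (via its SDA virtual table) with
-- the matching slice of the consolidated target table
SELECT
  (SELECT COUNT(*) FROM "SDI_VT"."VT_HDB1_COEP") AS SOURCE_CNT,
  (SELECT COUNT(*) FROM "TARGET_SCHEMA"."COEP"
    WHERE "SOURCE_SYSTEM" = '001')               AS TARGET_CNT
FROM DUMMY;
```

Repeating this per remote source (identifier values '001'/'002'/'003', compare section 5) gives a simple consistency check after initial load or during replication.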


7.    Real-time Replication & Source System Archiving

Replication using the HANAAdapter is trigger-based.

That is, database triggers on table level exist and react upon DML operations (INSERT/UPDATE/DELETE). If a table is being replicated, and a record in the table is archived, the database trigger interprets the archiving operation as a DELETE statement, and SDI deletes the corresponding record from the target table.

The following option can be leveraged to ignore DELETE operations originating from an archiving run: introduce a dedicated user that executes the archiving run. This must always be the same user.



We would have an ECC source system user, e.g. “ARCHIVE_USER”, that executes the archiving run. All operations conducted by that user will be filtered out by the HANAAdapter, so no deletes will happen in the target.


8.    References

Official SAP HANA SDI Documentation:

  • Installation and Configuration Guide
  • Administration Guide
  • Modeling Guide

SDI Best Practices:



Abbreviations:

  • RS = Remote Subscription
  • RT = Replication Task
  • CDC = Change Data Capture


I hope you can learn from the experiences we gained and eventually adopt some of the approaches. I am open to questions or anything you want to challenge.

Best regards,





  1. Nicholas Chang

    Hi Stefan,

    Thanks for the excellent blog! Somethings cool about “SDI”! Always want to get my hands dirty on advance SDI implementation like this but hardly get any chance.

    One question, can this same scenario use on HANA to non-hana target, for e.g. HANA to Hadoop?


    Nicholas Chang


  2. Stefan Geiselhart Post author

    Hi Nicholas,

    thanks for your comment.

    Am I understanding correctly:

    You want to distribute data from your HDB to e.g. a Hadoop target using SDI data flows (RepTask/FlowGraph)? You want to do that batch-based/scheduled or in real-time?

    Best regards,


  3. Arthur Chen

    Hi Stefan

    Thanks! Excellent blog and very informative.

    We’re also implementing a SDI project to extract data from ECC and it’s good to know I’m not the only one using notepad++ to edit RTs.

    We had experienced an issue where partitions can’t be applied along with filters. Let’s say there’s a huge table containing data since 2000 and we’re only interested to load past 3 years, and also partition is preferred. But we noticed that partition will ignore the filters in the remote statement as well as in the log file. Not sure if you’ve seen the same issue?



  4. Stefan Geiselhart Post author

    Hi Arthur,

    thank you.

    Can you tell which DP-Agent version, HANA revision and IM_DP version you are using? Are you using ECCAdapter or HANAAdapter? The wrong behavior you’re describing that filters are not pushed down to the where clause sounds a bit familiar…

    Cheers Stefan

  5. Stefan Geiselhart Post author

    Hi Arthur,

    can you x-check if the following SAP Note is applicable in your case?

    2560754 – Projection filter is not pushed down to partitions when task partitioning is used


    Best regards,


    1. Arthur Chen

      Thanks Stephan, sorry for the late reply.

      The note 2560754 was the right one and SAP also advised the issue will remain on HANA 1.X and won’t be fixed till 2.X.



  6. Sudha Rao

    Hi Stefan,

    Thanks for the detailed explanation.

    I have few outstanding questions based on our project scenario : Would be great if you can answer those :

    1. Can we schedule the SDI replication, which replicates tables into the Catalog physical schema directly ( like how SLT does)
    2. Do we always have to create a target table for the SDI replication, or does it create a default table using the same name as source with the same data type structures?
    3. Can we schedule the SLT replication as real time or non real time ( again like how it is in SLT).

    Kindly confirm.



  7. Stefan Geiselhart Post author

    Hi Sudha,

    here are my answers:

    1.) In SDI you can also schedule data flows

    2.) Using replication tasks in SDI will allow you to auto-create the target table (…also defining additional columns etc. when need)

    3.) Both real-time/non real-time is possible but that depends on the source you are pulling data from (compare with either SDI Config/Admin Guide or refer to SDI PAM)

    Hope this helps,

    Best regards,


  8. Atul A. Maslekar

    Hi Stefan,


    Thanks It’s really useful.


    Can you please give details about S 4 HANA ABAP CDS view as source with SDI.

    Thanks in advance.

    Best Regards,

    1. Stefan Geiselhart Post author

      Hi Atul,

      to my knowledge you should be able to leverage CDS views using the ABAP adapter.

      Please go ahead and check.

      I haven’t checked myself but as far as I can remember some colleagues of mine did this.

      Best regards,


  9. Kanagarasu Saminathan

    Hello Stefan,

    Thanks for the excellent blog about SDI

    I have an question. In our scenario, we have replicate from HANA to HANA

    We have Replicate from ECC HANA DB (Host-XYZ) to BW HANA DB (Host-ABC) – On top of this we use BO for real time reporting.

    Now, where do we deploy DP agent? on Host ABC? or On HOST XYZ? or do we need separate host for SP agent that is near to ECC HANA DB(Host-XYZ)?

    Please clarify

    Thanks in advance

    Best Regards,


    1. attaphon predaboon

      per best practice, it is suggested to have DP agent as a seperate server near the source. In this case should be your ECC.


      “We generally recommend that you install the Data Provisioning Agent close to the data sources. If you have data sources scattered across multiple geographical locations separated by distance and network latency, you can deploy multiple Data Provisioning Agents. Install at least one in each geographical location.”

    2. Stefan Geiselhart Post author

      Hi Kanagarasu,

      as the SDI documentation is outlining, the DP agent should be deployed “as close” as possible to  the data source (from a network topology point of view).

      You can either install it directly on the machine where your source DB sits or you have a dedicated (can be virtual) server. For production purposes a dedicated machine is the preferred way, for pilot scenarios you might install it directly on the source.

      Best regards,


  10. Pierre Combe

    Hello Stefan,

    Thanks for the excellent blog about SDI

    I have an question. In our scenario, we will replicate data from SAP ERP (Oracle DB) to HANA. We  plan to replicate around 500 tables, and of course, we would like to avoid to create reptask one by one.

    We want to create a script, with as input, the list of table, and then generate the XML for all tables. However, how do we (how did you) deal with the tag <TargetColumns><TargetColumn>?

    We try this:

    <TargetObjects SchemaName=”TEST”>
    <TargetObject Type=”TABLE” ObjectName=”MARA”>
    <Mapping ObjectName=”MARA” RemoteObjectUniqueName=”MARA” FilterExpression=”” ReplicationBehavior=”InitLoadOnly” VTObjectName=”MARA”>

    It didn’t work. error =>  column store error: task framework: [140004] scenario -> QUERY – Operation output requires at least one attribute

    Do we have to specify, for each tables, the columns? If yes, I’m not sure to understand what your python script do.

    Thanks in advance

    Best Regards,


    1. Stefan Geiselhart Post author

      Hi Pierre,

      I think there was a misunderstanding in the section where I describe the Python script usage.

      The script we have used did not create the replication task files up from scratch.

      What we did using pythons are replace operations in the XML structure or optionally small extensions when new columns had to be introduced. The columns + properties + exact syntax we prepared upfront and then ingested into the script as variables/parameters.

      Potentially it is thinkable to automate the creation of replication task objects. For this purpose however some more scripting would be required, also leveraging python HANA drivers to access the DB, get metadata etc.

      Best regards,

