Werner Daehn

HANA Smart Data Integration – The one stop solution for data handling

In the data handling market, thinking is purely use-case driven. Does a customer need data transformations? Provide an ETL tool. Does the customer need the same data in another system? Provide a replication tool. Does the customer need access to remote data? Provide data federation options in the database. Cloud integration? Another product. Big Data? Yet another product. Data Quality? Data Preparation? The list goes on and on.

This use-case driven approach is far too limited. Just because the target is in the cloud, does that mean fewer sources or fewer transformation options are required? Certainly not. And the same can be said about any other combination.

The reason for this situation is partly historical and partly that no vendor has all the building blocks to provide a one-stop solution for everything – except SAP with Hana.


Playing Lego with Hana

Actually, thinking about it, all of the above use cases can be broken down into three types of components: Data Access – Data Transformation – Data Output. Each of the three types comes in different flavors…


  • Data Access
    • Pull data on request: When a batch job starts or the user executes a query, pull the data.
    • Push changed data: Whenever there is a change in the source, push the change.
    • Pass the data: Provide the data as an input parameter.


  • Transformations
    • Transform the data that was read
    • Transform the incoming change data


  • Data Output
    • Load the target
    • Show the data
    • Return the data as an output parameter


All use cases are now just combinations of these building blocks, with the main advantages that only one product – Hana – is used, only one connection has to be set up to each source, there is one central place for monitoring and administration, and one UI to design the transformations. And as a consequence, it is possible to move from one method to another with the flip of a switch.


To provide better insight, imagine a user makes the following journey with Hana…


Setting up the connectivity to the source

For fun, the remote system should be something different for a change: the headlines provided via the RSS Adapter.

In the adapter layer the user created a remote source using the RSS Adapter, pointing to the feed URL.


This RSS Adapter provides a single remote table only, the RSSFEED table, containing all recent headlines. In order to select data, this remote table was imported into the Hana catalog as a virtual table, called V_RSSFEED.
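The setup above can be sketched in SQL. This is a simplified illustration: the adapter name, agent name and configuration string are assumptions (the real configuration format is adapter-specific), only the general shape of the statements follows Hana's remote source and virtual table DDL.

```sql
-- Register the remote source on a data provisioning agent.
-- "RSSAdapter", "rss_agent" and the configuration key are invented for
-- illustration; the actual values depend on the installed adapter.
CREATE REMOTE SOURCE "RSS_SOURCE" ADAPTER "RSSAdapter"
  AT LOCATION AGENT "rss_agent"
  CONFIGURATION 'URL=http://rss.cnn.com/rss/edition.rss';

-- Import the adapter's single remote table into the Hana catalog
-- as the virtual table V_RSSFEED ("<NULL>" marks unused name levels).
CREATE VIRTUAL TABLE "V_RSSFEED"
  AT "RSS_SOURCE"."<NULL>"."<NULL>"."RSSFEED";
```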


Now Hana has all the information required to use the feed as a data source, and the Lego bricks above can be combined.


Virtual table access (Smart Data Access)

(Bricks: pull data on request + show the data.) The data to be accessed is in a remote system and should be presented to the user. How these queries are executed does not matter: it could be the SQL console, a Business Intelligence tool, another application, BW, … In the end it is all the same: a query like “select * from cnn_headlines” is executed in order to retrieve the data. (Note: The object cnn_headlines is a database view pointing to the virtual table, as an additional layer of abstraction.)
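The abstraction layer mentioned in the note could look like this – a plain view on top of the virtual table, so consumers never see where the data physically lives. The schema-less names are a simplification; only cnn_headlines and V_RSSFEED come from the article.

```sql
-- Decouple consumers from the storage method: everyone queries the view,
-- not the virtual table behind it.
CREATE VIEW "cnn_headlines" AS
  SELECT * FROM "V_RSSFEED";

-- Every consumer, from the SQL console to a BI tool, runs the same query:
SELECT * FROM "cnn_headlines";
```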


This functionality by itself is already unique in the database market. Other database vendors allow data federation for database sources only and certainly do not allow adding adapters created by somebody else.

According to marketing, this Smart Data Access functionality is the preferred method for Hana, as the remote data is realtime by definition – the data is retrieved whenever the query is executed – and there is no data duplication. Fine.

But a few problems with Data Federation become obvious here already.

  1. The query speed is the speed of the source system. That might be sufficient, or it might not.
  2. The source system deletes old data. Granted, the RSS protocol provides only the 100 most recent headlines – an extreme case of keeping little history – but operational systems often behave the same way.
  3. The source system is pounded by queries constantly.

If any of the above reasons suggests it would be better to copy the data into Hana instead, that calls for a batch data movement use case.


Batch data movement

(Bricks: pull data on request + transform the data + load the target.)

Here the task is quite simple: pull the data from the source, transform it into the target table schema, and load all rows into the target table. Using the FlowGraph UI it is a matter of seconds to drag in the virtual table as the source and connect it to a new Hana column table as the target.


For the user nothing changed. He still executes the same query “select * from cnn_headlines” as before, but now this database view points to the loaded T_CNN_HEADLINES table in Hana. Hence the query runs with Hana performance, has all the history, and the source system is relieved of the additional resource consumption. Except that this dataflow has to be executed again tomorrow to pick up the changes.
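Switching the user from federation to the local copy is a one-statement change: repoint the view from the virtual table to the loaded Hana table. A sketch, assuming the view was created as a plain SQL view; the table name T_CNN_HEADLINES is from the article.

```sql
-- The consumer-facing view stays; only its definition changes from the
-- virtual table to the batch-loaded column table.
DROP VIEW "cnn_headlines";
CREATE VIEW "cnn_headlines" AS
  SELECT * FROM "T_CNN_HEADLINES";
```

Consumers keep running “select * from cnn_headlines” unchanged, which is the whole point of the abstraction layer.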

In today’s world that would need to be executed more frequently. Every hour? Every 10 minutes? Every second? The desire is to bring the latency down to a sub-second level, which would mean executing the dataflow that often. At some frequency this stops making sense: one reason for copying the data was to reduce the load on the source system by redirecting user queries to a Hana table, yet now the source system is queried for changes more frequently than the users ever would have queried it. That most definitely makes no sense.

The key problem is the pull mechanism. The batch job has to run just because there might(!) be new data. It would be much better if the source sent the changed data by itself – a push mode.


Realtime Data Movement



(Bricks: push changed data + transform the change data + load the target.)

In push mode the adapter does whatever is required to be informed about changes in the source. For database adapters that is done by following the transaction log; internet sources often provide a streaming protocol; …whatever makes sense. The RSS protocol is too trivial and provides neither of these options, but at least the HTTP protocol headers can be used to figure out quickly whether there was a change or not.

The other consequence of the push method is that the source can tell the type of change for each row, for example when a row has been deleted in the source. The dataflow therefore has to deal with those deleted records somehow. Implementing that as batch pull logic can get very difficult, as each source is different: the user has to design a flow that handles deleted rows, likely in addition to the one dealing with inserts and updates, another one if a second source table changed, another… It gets ugly quickly. In Hana all these differences are handled by the adapter, and hence the FlowGraph editor got the logic to turn (almost) every dataflow into a realtime push transformation.

As a result, turning the above batch dataflow into a realtime push version is as simple as opening the properties of the source table and selecting “realtime”.


Internally, such a realtime-enabled FlowGraph is actually turned into two runtime objects: first, a batch dataflow performing the initial load of the target table, and second, a dataflow with an input parameter of datatype TableType, which is called with all changed rows.
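The second runtime object can be pictured as a SQLScript procedure taking the changed rows as a table-typed parameter. All names, columns and the change flag below are invented for illustration; the code a FlowGraph actually generates is more elaborate (it also applies updates and deletes).

```sql
-- Table type describing one batch of changed rows as delivered by the
-- adapter; CHANGE_TYPE is an assumed I/U/D flag.
CREATE TYPE "TT_RSSFEED_CHANGES" AS TABLE (
  "TITLE"       NVARCHAR(500),
  "PUBDATE"     TIMESTAMP,
  "CHANGE_TYPE" VARCHAR(1)
);

-- Simplified change-apply logic: only inserts are handled here.
CREATE PROCEDURE "P_APPLY_RSSFEED_CHANGES" (IN changes "TT_RSSFEED_CHANGES")
LANGUAGE SQLSCRIPT AS
BEGIN
  INSERT INTO "T_CNN_HEADLINES" ("TITLE", "PUBDATE")
    SELECT "TITLE", "PUBDATE"
    FROM :changes
    WHERE "CHANGE_TYPE" = 'I';
END;
```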


CalcView like transformations

(Bricks: pull data on request + transform the data + show the data.)

The idea of the CalcView in Hana is to read the data, transform it and show the results whenever the user queries such a view. Isn’t that just another combination? Pulling the data, performing the transformations and presenting the data to the end user – with the main difference that in this editor all Hana transformations can be used: Data Quality, Predictive, Statistical, …


Service call

(Bricks: pass the data as a parameter + transform the data + return the result.)

As a FlowGraph supports passing parameters, this can be used to design data transformation services: something that looks and feels like a procedure with input and output parameters and performs complex transformations in between. With that it is possible to utilize all the transformations Hana provides in other applications. Thanks to the simple-to-use FlowGraph editor, even end users can make modifications to these dataflows.
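A minimal sketch of what such a service looks like to a caller – a SQLScript procedure with table-typed input and output parameters. Everything here (type, procedure name, the trivial uppercase "transformation") is hypothetical; a real FlowGraph would generate the procedure body from the designed transformations.

```sql
-- Illustrative row type for the service's input and output.
CREATE TYPE "TT_HEADLINES" AS TABLE (
  "TITLE"   NVARCHAR(500),
  "PUBDATE" TIMESTAMP
);

-- A transformation service: rows in, transformed rows out.
CREATE PROCEDURE "P_CLEANSE_HEADLINES" (
  IN  raw_rows   "TT_HEADLINES",
  OUT clean_rows "TT_HEADLINES")
LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
  -- placeholder transformation; a real service might run Data Quality
  -- or predictive functions here
  clean_rows = SELECT UPPER("TITLE") AS "TITLE", "PUBDATE"
               FROM :raw_rows;
END;
```

Other applications would then simply CALL the procedure, passing their data in and receiving the transformed result as the output parameter.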


A step-by-step guide for this example can be found on SCN.


Plus more

By mixing and matching these 3 × 2 × 3 components the user gets much more than a few supported use cases: all use cases are supported and, best of all, the user does not have to decide upfront. For some cases the virtual table method is the perfect solution, for others the realtime data movement. In some cases the transformation has to be flexible, in others it is better to convert the data model of the source into something better suited for querying. The end user does not care, he does not even see the difference – all he does is “select * from cnn_headlines”.





Comments
      Niels Weigel

      Hi Werner,
      thanks for posting this introductory article. I agree that, at present, the decision for "the right" data provisioning or integration technique is often driven more by knowledge, expertise or experience with classical tools like ETL or replication servers. Your article describes the flexibility of SAP HANA smart data integration, where in the end the user experience is always 'select * from cnn_headlines', independent of the data provisioning process.

      chandan praharaj

      Really nice explanation, Werner! Can you please point me towards some material on how the delta mechanism works in realtime FlowGraphs? And one more query: is it going to come with the standalone HANA license, or a separate license like SLT or BODS? And if it is part of the native HANA package, what would be the implication on system performance if replication has to happen in the same HANA system?

      Kiran Shenvi

      Hi All,

      I am new to FlowGraphs and need help on the same.
      We have a requirement where we have 2 columns in the data source and want to concatenate them and map them to a single column in the target table. Which is the correct type of node to be used?
      I couldn't find any relevant example online. Kindly guide me on the required steps if it is possible to achieve.


      Ashok Kumar M

      Hi Werner,

      Is it possible to call an OData/REST service using SDI?

      E.g: Based on a change to table in source system, replicate the same into destination system and also trigger a service maintained in SAP Cloud Platform?

      Best Regards,

      Werner Daehn (Blog Post Author)

      Let me answer it by saying: yes, there is an SDI adapter for OData and for REST, so you can assemble such logic. It will not be straightforward though, I think.

      Ashok Kumar M

      Hi Werner,

      Thanks. I don't see any OData or REST adapters in the system now. Am I missing something?
      Also, could you please point me to some additional info on SDI adapter for OData and for REST?
      Best Regards,

      Werner Daehn (Blog Post Author)

      Ashok Kumar M

      Thanks Werner

      Ashok Kumar M

      Hi Werner,

      I have a db in two SAP Cloud Platform accounts. I configured the data provisioning agent with one and tried to create a remote source with the other db. What should be the host and port?

      I created a remote source with as host and 30015 as port. I get below error when I open the remote data source node.

      5:29:10 PM (Catalog) Error reading Remote Object: InternalError: dberror(CallableStatement.execute): 403 - internal error: Cannot get remote source objects: Cannot establish JDBC connection. Error: SAP DBTech JDBC: Cannot connect to jdbc:sap:// [Cannot connect to host [Connection timed out: connect (local port 60283 to address 0:0:0:0:0:0:0:0, remote port 30015 to address (], -813.].

      Anyone experienced this error before? Any idea what I could be missing?


      Best Regards,


      Prashant Kumar Patnaik

      Hi Werner,
      This is a wonderful article teaching a lot of things about smart data integration. I have one doubt and hope to receive help from you in this regard. I have two schemas, namely SAPHANADB and TEST_DEEPDIVE. Now I want to compare the data from both schemas and store it in a data sink. While trying to achieve this I am getting an error: "first argument has to be string representing the schema to which the procedure belongs.","errorResultSet":"Error: $.hdb.Connection.executeQuery: Parameter at position 1 is undefined, which is not allowed"}"
      Please let me know if you have any solution to the issue.