Smart Data Integration available for the SAP Cloud Platform
HANA smart data integration (SDI) is a native part of your HANA database that handles all styles of data integration. It can do data federation (aka smart data access) and real-time data replication, and it can apply complex transformations to your data. SDI has been part of every HANA database since HANA SPS09, and it is also available on the SAP Cloud Platform (SCP, previously known as HANA Cloud Platform, HCP). This blog series focuses on using SDI for SCP; in the “Useful links” section below you will find links to other generic SDI resources.
*New* [Nov 2016] Video recording and demo on SDI for SCP *New*
- Data Federation: with data federation, you can expose on-premise sources like databases or even Hadoop to your HANA database in the cloud. The data is not physically moved to the cloud but remains in its original source, and it becomes available in HANA queries via virtual tables. Because the data stays in the source, latency can be an issue, so federation is mainly useful for infrequent queries on small amounts of data. It can, however, be a first step before moving to replication.
- Data Replication: physically move data from the original source into the HANA database on SCP, either through a scheduled process or in real time. Real-time replication is possible for selected sources like databases (Oracle, DB2, SQL Server, …) and Twitter. A change in these sources is replicated in (near) real time to the HANA database in the cloud.
- Data Transformation: apply complex transformations to your data before storing it in HANA, or to data that is already available in HANA and needs further transformation. These transformations include SQL-like operations such as join, filter and aggregate, as well as more complex operations such as pivoting your data (transforming rows to columns and vice versa) or building in conditional logic (CASE transform). Another common data transformation scenario is history preserving, which builds up a history of changes (e.g. a delete in your source would be transformed to an update that sets a flag to inactive but keeps the record for history).
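To make the history preserving idea from the last bullet concrete, here is a minimal Python sketch. It is not SDI code and does not use any SDI API: the `Row` type, the `apply_changes` function and the `active` flag are hypothetical names chosen for illustration. It only shows the rule described above, where a delete in the source becomes an update in the target that flags the row as inactive instead of removing it.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Row:
    """Hypothetical target row; `active` is the history flag."""
    key: int
    value: str
    active: bool = True


def apply_changes(target: Dict[int, Row],
                  changes: List[Tuple[str, int, str]]) -> Dict[int, Row]:
    """Apply (operation, key, value) change events without losing deleted rows."""
    result = dict(target)
    for op, key, value in changes:
        if op in ("insert", "update"):
            result[key] = Row(key, value, True)
        elif op == "delete":
            if key in result:
                # A source delete becomes a target update: keep the row, flag it inactive.
                result[key] = replace(result[key], active=False)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


if __name__ == "__main__":
    events = [("insert", 1, "a"), ("insert", 2, "b"), ("delete", 1, "")]
    for row in apply_changes({}, events).values():
        print(row)
```

In a real flowgraph this logic is configured rather than hand-written; the sketch is only meant to show what ends up in the target table.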
SDI has two main components: the data provisioning server (dpserver) and the data provisioning agent.
The data provisioning server is a native server in your HANA database. All you need to do is activate the dpserver in your HANA configuration.
The data provisioning agent is a small component you need to install on-premise. The adapters are deployed on this agent, and they take care of the communication with the source systems. The agent gets the data from the source, compresses it and sends it over HTTPS (encrypted) to SCP. To establish the communication between agent and server, a third small component is needed: a proxy server packaged as a delivery unit to be imported into your HANA database. The same delivery unit also provides monitoring capabilities.
Note that bi-directional data transfer is possible: you can load data into SCP and also write back to external targets. In both cases it is the agent that initiates the communication, so from a network point of view the agent always makes outbound HTTPS calls. The response to such a call can be the data to be written back to the on-premise system. This means the agent can communicate with SCP without VPN tunnels or a reverse proxy setup.
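The agent-initiated pattern described above can be sketched as a simple polling loop. This is an illustration only, not SDI code: `CloudServer`, `run_agent` and the in-process queue are hypothetical stand-ins for the dpserver, the agent and the HTTPS channel. The point is that work reaches the agent only as the response to a request the agent itself opened.

```python
import queue
from typing import Callable, List, Optional


class CloudServer:
    """Hypothetical stand-in for the cloud side: queues work until an agent asks for it."""

    def __init__(self) -> None:
        self._work = queue.Queue()
        self.results: List[str] = []

    def submit(self, task: str) -> None:
        """Queue work for the agent; the server never contacts the agent itself."""
        self._work.put(task)

    def poll(self, timeout: float) -> Optional[str]:
        """Hold the agent's request open for up to `timeout` seconds."""
        try:
            return self._work.get(timeout=timeout)
        except queue.Empty:
            return None  # nothing to do; the agent simply polls again

    def report(self, result: str) -> None:
        self.results.append(result)


def run_agent(server: CloudServer, execute: Callable[[str], str],
              max_polls: int, timeout: float = 0.01) -> int:
    """Agent loop: every exchange starts with an outbound call from the agent."""
    handled = 0
    for _ in range(max_polls):
        task = server.poll(timeout)
        if task is None:
            continue
        server.report(execute(task))
        handled += 1
    return handled


if __name__ == "__main__":
    server = CloudServer()
    server.submit("read metadata")
    server.submit("write back rows")
    print(run_agent(server, str.upper, max_polls=3), server.results)
```

A real agent would make an HTTPS request where this sketch calls `server.poll`, which is why no inbound firewall port has to be opened.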
The picture below visualizes this architecture.
The SDI agent includes many adapters out-of-the-box. Some of these are real-time enabled, others can only do batch extraction. In addition to the out-of-the-box adapters, there’s also a Java-based adapter SDK available, which can be used to create custom adapters. Below is a list of available adapters, based on HANA SPS11 (latest HANA release available on SCP). This list is not exhaustive and will keep growing with every release, both with built-in adapters and with adapters delivered through the partner ecosystem.
Real-time enabled adapters:
- SAP ECC based on Oracle, DB2, SQL Server or ASE
- SAP (Sybase) ASE
- SAP HANA
- IBM DB2
- Microsoft SQL Server
- ABAP adapter
- Microsoft Excel
- File (delimited and fixed width)
- Hadoop Hive
As stated earlier, SDI is fully integrated in HANA, so the user interfaces are the standard HANA design tools. This is primarily the HANA Web IDE, where you can create SDI objects like remote sources, virtual tables, flowgraphs, replication tasks etc. HANA Studio (with the SCP plugin) can also be used to create SDI objects.
The licensing model is always subject to change, so please check with your SAP account team for the latest status. In general, however, we can say that SDI licenses are included in the “HCP, integration service premium edition” licenses. You will always need HANA on SCP as a prerequisite; the license enables you to download the data provisioning agent and the delivery unit.
Trial access for SDI is also available today on hanatrial.ondemand.com. At the end of this blog there’s a link to the step-by-step instructions to start using this trial.
Useful links:
- Official help portal on Smart Data Integration on http://help.sap.com/hana_options_eim
- Tutorial videos on HANA Academy: SAP HANA Academy – Smart Data Integration/Quality: An Overview Demo [SPS09] – YouTube
- Blog series on SDI : Hana Smart Data Integration – Overview
Where to go next:
- Step-by-step: Setup SDI for your SCP account (non-trial) : Step-by-step: Setup SDI for your HCP account (part 1)
- Step-by-step: Setup SDI for your SCP trial account : Step-by-step: Setup SDI for your HCP trial account
I see this as the next step for application enhancement and integration of S/4 and SuccessFactors and HCP enhancements.
It may be years and years until it happens, but SDI allowing fast, powerful read-only queries of employee and enterprise data could be very powerful!
A great tool !!
A lot more powerful than HCI.
We even developed our own adapter to answer specific needs ...
Sounds really interesting, can it de-duplicate and cleanse data?
De-duplication and data cleansing are part of the SDQ (smart data quality) functionality, which is not enabled on HCP in the same way as it is for an on-premise HANA system. On HCP you can only use the SDI (smart data integration) functionality.
For cloud we have come up with a different model where data quality is enabled as a central service that can be called from any HCP application. This is currently in beta; you can find more info here: New HCP Service (Beta) - SAP Data Quality Management, microservices for location data
The technology used is still SDQ behind the scenes, but you interact with the microservices via RESTful web service calls.
Can I read a text or CSV file that is stored in HCP mobile documents?
Unless you are asking for clarification/correction of some part of the Document, please create a new Discussion marked as a Question. The Comments section of a Blog (or Document) is not the right vehicle for asking questions as the results are not easily searchable. Once your issue is solved, a Discussion with the solution (and marked with Correct Answer) makes the results visible to others experiencing a similar problem. If a blog or document is related, put in a link. Read the Getting Started documents (link at the top right) including the Rules of Engagement.
NOTE: Getting the link is easy enough for both the author and Blog. Simply MouseOver the item, Right Click, and select Copy Shortcut. Paste it into your Discussion. You can also click on the url after pasting. Click on the A to expand the options and select T (on the right) to Auto-Title the url.
Thanks, Mike (Moderator)
SAP Technology RIG
No, SDI cannot read files from HCP Mobile documents. SDI requires a local on-premise agent, and this agent can access local files (or files on a shared drive) that are accessible through the local operating system (Windows/Linux).
Note, regarding Mike Appleby's comment: it's more efficient to ask questions via the discussion forum -> SAP HANA Cloud Platform Developer Center
If you prefix your question's subject with "SDI", my colleagues and I will be able to find it more easily and respond.
Does SDI essentially do the same thing as Smart Data Access?
Does it allow you to virtually federate data from a 3rd party source such as an Oracle and SQL D/B to the HCP DBaaS instance?
It's usually better to start a new discussion for questions about tech stuff, rather than asking questions in the comments of a blog post, as here the Q&A can often get lost.
If you haven't seen it, I'd suggest looking through the main SDI overview post here Hana Smart Data Integration - Overview
Gary, SDI and SDA go hand-in-hand. SDI basically provides more adapters for SDA, plus, SDI makes SDA available on the HCP (through the agent which communicates over HTTPS).
Using SDI on cloud,
Is it possible to establish an SDA connection between two SAP Hana Cloud Platforms?
very cool blog series. Managed to access the sample file from within my MDC on trial in less than 2 hours. Cheers
PS: the "Licensing" section still claims that "SDI is not available (June 2016)" in the trial landscape
Thanks Thorsten. I have changed the text in the licensing section... SDI is obviously available now in trial (since early July).
Good day Ben,
Great article on the introduction to the SDI.
I am interested in exploring more about the data sync or push from Hana DB on HCP to the DB on premise.
I was wondering if you could clarify how the DP server on HCP would contact or trigger the DP agent installed on premise. In my mind it is virtually inconceivable for the DP server to contact the DP agent if the latter is installed behind a firewall, unless the HTTP(S) connection remains open or persists.
I notice on Hana Web IDE that you can browse the on premise table contents which makes me wonder how the DP server connects or initiates to the DP agent. I always thought the agent was the one who initiated.
I look forward to hearing from you.
Good question! It does indeed look as if the DP server calls the DP agent to collect metadata or to push data back to on-premise systems. However, this would obviously not be possible, because firewalls prevent incoming traffic into any customer's network.
So this is done through "long polling". Basically the DP agent is constantly polling the server in the cloud to see if there's any work to do. If there's work for the agent in the queue, the DP agent gets the definition of the work as the response to the poll request, so the agent can execute it and report the result back to the server. With this technique we can load data back to on-premise systems and also get metadata from on-premise systems into the Web IDE during modeling.
This means that from a network point of view, the communication request is always initiated by the DP agent, via outbound HTTPS. So there will be no need to open any firewalls for inbound traffic.
Cheers Ben for the quick reply. It is all clear now.
Is there a way to decouple SDI from HANA as the DB as of today? If not, is this planned for the future?
Elena, SDI or more in particular the DPserver, is a native HANA component, so SDI will always need HANA.
We are looking into ways to provide SDI as a service, so that it will still run on a HANA server, but customers do not need to own the HANA server. This is just in exploration phase, so not something that will come short term (1 year +).
Awesome Blog Ben !!!
Awesome Blog @ben and its very useful.
I have a question regarding Data Federation (SDA).
We have a use case of getting data from one HANA cloud platform to another HANA cloud platform. Do we need any additional steps for this use case?
We are setting up SDI for our IBP system. I found a mention of a tool called "Data Provisioning Agent Configuration tool"; is it part of the DP Agent installation or something different? I have downloaded the SDI setup but couldn't find the "Data Provisioning Agent Configuration tool". Please help.
It's an additional step post installation. You can find the SDI documentation here: https://help.sap.com/viewer/7952ef28a6914997abc01745fef1b607/2.0_SPS01/en-US/c467c18dc3f14bef87e72018544a2b87.html
Step 4 is the installation of the DP agent, step 5 the configuration. (Note: for IBP steps 1-3 are done for you by the SAP cloud operations team). You will find the configuration tool in a location like this: C:\usr\sap\dataprovagent\configTool\dpagentconfigtool.exe
Can we access data from multiple on-premise systems simultaneously?
Yes, that's definitely possible !
Great Blog Ben Thanks a lot.
We have read that the SAP ABAP adapter supports calling BAPI functions as virtual procedures. Is this limited to extracting data from an on-premise SAP system, or can we also use SDI to call BAPIs that create / commit data in the backend system?
In case someone has the same question: Yes you can also call BAPIs to write data back to ERP. You can even implement a proper error handling for your OData / BAPI Service.
I have the same question. Any idea if it is possible to have data moved into HANA on the SAP Cloud by calling an ABAP BAPI via SDA?
Is it possible to call an OData/REST service using SDI?
E.g.: based on a change to a table in the source system, replicate the same change into the destination system and also trigger a service maintained in SAP Cloud Platform?
Yes, it is possible to call those services using SDI.
1. For OData you have a standard adapter, here you have the link to use it:
2. For REST you have to create a custom adapter (by extending a class in Java). Here you have the link to do it:
3. For SOAP (web services), you didn't ask but maybe you could be interested, there is a standard adapter, here you have the link:
Thanks a bunch Fernando
Is it possible for me to setup SDI between two systems in HCP? Read from one HCP system into another HCP System ?
Tried to search for any information regarding setting up Agent on HCP, but did not find any suitable results.