SAP’s Data Integration Solutions in the Cloud
Lately I have seen quite some confusion about our data integration offering on the SAP Cloud Platform. In this blog I’ll try to clarify the offering and recommend which tool to use for which use case.
To start with, there’s some confusion about naming and bundling that I want to get out of the way before we look at the actual solutions and their positioning. First, the name of the solution has changed a couple of times over the last few years. Originally the overall solution was called “SAP HANA Cloud Integration” (HCI); it was later renamed to “SAP HANA Cloud Platform, integration service” and earlier this year finally (?) renamed to “SAP Cloud Platform Integration”. These were pure name changes; there have been no related changes to the underlying software solutions.
A second point of confusion is that SAP Cloud Platform Integration is not one product, but a suite of products bundled under one umbrella. Depending on the exact license, a customer might get one or more of the components included. For example, there’s a “process edition” for process integration use cases and a “data edition” for data integration use cases, but also a full edition which includes everything, and some specific bundles with SAP cloud applications. The products in these bundles include “Cloud Platform Integration for process integration” (fka HCI-PI), “Cloud Platform Integration for data services” (fka HCI-DS) and “HANA smart data integration”.
With these 2 topics out of the way, let’s have a closer look at the individual solutions to better understand what to use when.
Process vs data integration
Process and data integration have been different disciplines in the traditional on-premise integration world. In a simplistic black and white view, we could make this distinction:
- Process integration is about routing small (but many) individual “messages” from one system to another, making sure that delivery is guaranteed and that response messages are sent back as part of an end-to-end business process. Usually the source application is in control: it sends the message to the process integration middleware, which in turn takes care of the further processing of the message and routes it to the desired destination.
- Data integration is about extracting, transforming and loading (ETL) large amounts of data from one system to another. The data is processed in bulk and transformations can happen on the full data set, like aggregation, sorting or finding duplicates. Traditionally these ETL tools interact at the database level and pull the data – so the data integration middleware is in control, not the source application.
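To make the distinction concrete, here is a minimal Python sketch of the two styles. It is only an illustration of the patterns described above, not of how any SAP product is implemented; the function names, the message and row layouts, and the `routes` table are all assumptions made up for this example.

```python
from collections import defaultdict


def route_message(message, routes):
    """Process integration style: the source pushes ONE small message,
    the middleware routes it and returns an acknowledgement."""
    handler = routes[message["type"]]          # pick the destination by message type
    result = handler(message)                  # deliver to the target system
    return {"id": message["id"], "status": "delivered", "result": result}


def run_batch(extract, load):
    """Data integration (ETL) style: the middleware pulls the FULL data set
    and transforms it in bulk before loading it."""
    rows = extract()                                           # extract: pull everything
    unique = {row["order_id"]: row for row in rows}.values()   # transform: drop duplicates
    totals = defaultdict(float)
    for row in unique:                                         # transform: aggregate per customer
        totals[row["customer"]] += row["amount"]
    result = sorted(totals.items())                            # transform: sort
    load(result)                                               # load into the target
    return result
```

Note who is in control: `route_message` is called by the source for each message, while `run_batch` decides by itself when to call `extract`. De-duplication, aggregation and sorting only make sense in the second case, because they need the whole data set at once.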
However, the (integration) world is not black and white, and there’s a large gray zone. This is especially true in the cloud, where direct database access is often not possible and all interactions need to go through APIs, so data integration loses one of its key benefits. Also, with the demand for real-time data, data integration tools have been enhanced to get data in micro-batches (near real-time), or even to have data pushed from the source into the integration layer. But today we still have separate tools for data and process integration…
For a deeper dive into the differences between integration patterns, I recommend SAP’s ISA-M (Integration Solution Advisory Methodology). This blog, based on a popular TechEd session, is a good introduction to ISA-M.
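As an illustration of what “micro-batches” means in practice, here is a small Python sketch: instead of pulling the full data set on a schedule, the tool repeatedly pulls only the rows that changed since the last pull and remembers how far it got (a “watermark”). This is a generic sketch of the pattern, not the way CPI-DS or SDI implement it; `make_source`, `pull_micro_batch` and the `changed_at` field are hypothetical names, and the sketch assumes change timestamps are unique.

```python
def make_source(table):
    """Hypothetical source API: returns rows changed after `since`, oldest first."""
    def fetch_changes(since, limit):
        changed = sorted(
            (row for row in table if row["changed_at"] > since),
            key=lambda row: row["changed_at"],
        )
        return changed[:limit]
    return fetch_changes


def pull_micro_batch(fetch_changes, watermark, batch_size=100):
    """Pull at most `batch_size` changed rows and advance the watermark."""
    rows = fetch_changes(since=watermark, limit=batch_size)
    if rows:
        # Remember the newest change we have seen, so the next pull starts there.
        watermark = rows[-1]["changed_at"]
    return rows, watermark
```

Calling `pull_micro_batch` every few seconds gives near real-time behavior while the integration layer stays in control. In the push variant the source would call into the integration layer instead.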
In the remainder of this blog we focus on data integration only.
For more details on the process integration capabilities that are part of Cloud Platform Integration, check out this landing page.
Data Integration use cases
For data integration, we have 2 products in the Cloud Platform Integration portfolio: “Cloud Platform Integration for data services” (CPI-DS, fka HCI-DS) and “HANA smart data integration” (SDI). Today, these two solutions have a big overlap, and in the future we plan to bring them closer together and ideally make them look like one solution. In the interim, however, these are two separate solutions, each with its own strengths and its own targeted use cases. From a licensing point of view they are always sold together, so a customer can pick the tool that best fits the use case without additional costs or the risk of buying the wrong tool.
A quick comparison between the two products:
- Cloud Platform Integration for data services is a cloud-based data integration tool for batch/scheduled data integration between on-premise applications and cloud applications. The main use case for CPI-DS is loading data to SAP IBP (Integrated Business Planning), but there’s also connectivity to other cloud applications through standards like SOAP, REST or OData. CPI-DS does not provide real-time data integration, and has a limited set of transformation capabilities.
- HANA smart data integration is a native technology inside a HANA database to load data into (and out of) this HANA database. In addition to batch/scheduled data loads, it can also do real-time replication and even data federation. It has a rich set of adapters for connectivity and provides many transformation capabilities, leveraging the HANA in-memory database as its engine. SDI cannot be used stand-alone and is only available if a customer has also purchased a HANA database (DBaaS, database-as-a-service) where the SDI data provisioning server can be enabled.
What both have in common is their high-level architecture: an on-premise agent provides native connectivity to all on-premise systems, and this agent communicates with the cloud via simple, outbound HTTPS. All configuration, modelling and monitoring is handled through web-based user interfaces.
Let’s now look at what this means for some important use cases.
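The practical consequence of “outbound HTTPS only” is that the cloud never opens a connection into the customer network; the agent always initiates the call. The Python sketch below simulates that idea in memory. It is an assumption-laden illustration, not a description of how the CPI-DS or SDI agents actually work: the polling approach, the class names `CloudService` and `OnPremAgent`, and the task format are all invented for this example, and the method calls stand in for what would be HTTPS requests.

```python
class CloudService:
    """Stand-in for the cloud side. In reality these methods would be
    HTTPS endpoints that the agent calls; nothing here calls the agent."""

    def __init__(self):
        self.pending = []   # tasks modelled in the web UI, waiting for an agent
        self.results = {}   # data delivered by the agent, keyed by task id

    def next_task(self):
        return self.pending.pop(0) if self.pending else None

    def post_result(self, task_id, rows):
        self.results[task_id] = rows


class OnPremAgent:
    """Runs inside the customer network, next to the source systems."""

    def __init__(self, cloud, local_systems):
        self.cloud = cloud
        self.local_systems = local_systems   # name -> callable with native access

    def poll_once(self):
        task = self.cloud.next_task()                 # outbound call: agent -> cloud
        if task is None:
            return False
        read = self.local_systems[task["system"]]     # native, local connectivity
        rows = read(task["query"])
        self.cloud.post_result(task["id"], rows)      # outbound call again
        return True
```

Because every arrow points from the agent to the cloud, no inbound firewall port has to be opened on-premise, which is what makes this architecture easy to deploy.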
SAP IBP – Integrated Business Planning
For SAP IBP, the required integration tool, Cloud Platform Integration for data services (CPI-DS), is bundled with the application license. There are “templates” available, which are pre-defined dataflows with mappings from SAP ERP or APO to a data model in IBP. These templates can be used as-is, or as a starting point for further customization. The write-back (from IBP to ERP/APO) can also be done via CPI-DS.
There’s one exception in the IBP suite: IBP Response, which leverages the SDI engine with the goal of enabling real-time replication in the future. There is no SDI web user interface available and no customizations to mappings are possible; the configuration is done through the IBP Fiori-based user interface. So the only SDI component visible to customers is the agent, which they need to install.
HANA on the SAP Cloud Platform
Technically, both CPI-DS and SDI can be used to load data into a HANA database on the SAP Cloud Platform (fka HANA Cloud Platform). But there is a strong recommendation to use SDI for this use case. SDI is natively part of this HANA database, so it does not require switching to a different user interface, and it also provides richer functionality: more adapters, more transformation capabilities and the option to use real-time replication and federation.
CPI-DS is also not available in all of the rapidly expanding set of data centers where SAP Cloud Platform runs, while SDI is available wherever HANA is.
Other use cases
Other use cases are integration with SAP or non-SAP applications where there’s no HANA database involved, or where the HANA database is not exposed to customers. In these cases all integration needs to go through APIs (SOAP, REST, OData, …) and we are again in the gray zone where process and data integration start to blend. In many cases Cloud Platform Integration for process integration can be used, and this is also where all out-of-the-box content is available (see: integration content catalog).
Where CPI-DS can bring value is in integration between these cloud applications and an on-premise data warehouse, whether on SAP BW, HANA or a 3rd party database.
Hopefully this blog gives you a better understanding of how the cloud-based data integration tools from SAP relate and which one to use when.
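The recommendations from the use cases above can be summarized as a small decision helper. This Python sketch is deliberately simplistic and only encodes the rules of thumb from this blog; the function name, the `target` values and the `onprem_warehouse` flag are made up for illustration, and real projects will have more criteria (for example data center availability).

```python
def recommend_tool(target, onprem_warehouse=False):
    """Return the tool this blog recommends for a given integration target.

    target: "ibp", "ibp_response", "hana_cloud_platform" or "other"
            (hypothetical labels, chosen for this sketch)
    onprem_warehouse: True if the other side is an on-premise data warehouse
    """
    if target == "ibp_response":
        # Uses the SDI engine; only the agent is visible, configured in IBP itself.
        return "SDI (agent only)"
    if target == "ibp":
        # Bundled with the IBP license, templates available.
        return "CPI-DS"
    if target == "hana_cloud_platform":
        # Both work technically, but SDI is native and richer.
        return "SDI"
    if onprem_warehouse:
        # Cloud application <-> on-premise data warehouse.
        return "CPI-DS"
    # API-based gray zone: start with process integration and its content.
    return "CPI for process integration"
```

For example, `recommend_tool("other", onprem_warehouse=True)` returns `"CPI-DS"`, while `recommend_tool("hana_cloud_platform")` returns `"SDI"`.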