SAP Data Services and SAP Data Intelligence: Optimized Data Management with Hybrid Solution
The Hybrid context
Hybrid data management gives data integration users the best of the on-premise and cloud worlds: a fully integrated solution that meets the needs of classic ETL, data quality, and intelligent data management, while still allowing each product's native technologies to be used independently. SAP’s hybrid data management solution enables the existing SAP Data Management user base with the following:
- Continue to use the existing SAP Data Services contents and assets in SAP Data Intelligence Cloud.
- Significant added value from bridging the unique capabilities of SAP Data Services and SAP Data Intelligence Cloud, and from using common functionality in a distributed processing architecture.
- Improved user experience for the existing SAP Data Services users with the usability extended into SAP Data Intelligence Cloud.
Scope of the SAP Data Services and SAP Data Intelligence Cloud hybrid solution
- Orchestration of SAP Data Services jobs as a part of the Data Intelligence Cloud data pipelines (currently available).
- Move enriched and transformed legacy and enterprise table-type data directly from SAP Data Services datastores (the SAP Data Services connection object) and consume it in SAP Data Intelligence Cloud. (Planned for 2021. Supports unique targets found only in SAP Data Intelligence Cloud.)
- Bi-directional data movement between SAP Data Intelligence Cloud and SAP Data Services to utilize the unique data transformation and data enrichment capabilities of SAP Data Services. We plan to fully utilize other unique transforms from both SAP Data Intelligence Cloud and SAP Data Services. (Planned for 2022.)
Use Case 1 – Scenario
- Move data that SAP Data Services has enriched and transformed from its supported RDBMS-based legacy and enterprise analytical data systems into a cloud-based connection supported by SAP Data Intelligence Cloud.
- Modelling and configuration are performed in SAP Data Intelligence Cloud, which provides a new operator, Data Services Transform. This operator accepts user input, generates executable SAP Data Services code, pushes it to the existing on-premise SAP Data Services infrastructure, and retrieves the resulting data for consumption in SAP Data Intelligence Cloud. Note: No configuration is required in the existing SAP Data Services infrastructure.
- Source and Target:
- Source is on premise. In this case we are using an SAP Data Services datastore that is already configured and used by existing SAP Data Services batch dataflows to populate enriched and transformed data. This datastore is the source connection, and it in turn points to an Oracle database. Note: From the SAP Data Intelligence perspective, the SAP Data Services datastore is the source, not the underlying database connection. Please refer to Fig 2.0.
- Target is in the cloud. In this case we will be using an SAP Data Intelligence Cloud connection pointing to Kafka. Kafka connectivity is unique to SAP Data Intelligence Cloud. Note: The conversion from relational-type data to message-type data occurs seamlessly in the background, based on the SAP Data Intelligence Cloud smart design.
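To make the relational-to-message conversion mentioned in the note more concrete, here is a minimal Python sketch of the general idea: each table row becomes one keyed message. This is an illustration only and not how SAP Data Intelligence Cloud implements the conversion internally. The function name `rows_to_messages`, the sample columns, and the choice of JSON as the message format are all assumptions made for the example.

```python
import json
from typing import Iterator, Sequence, Tuple


def rows_to_messages(
    columns: Sequence[str],
    rows: Sequence[Sequence[object]],
    key_column: str,
) -> Iterator[Tuple[bytes, bytes]]:
    """Turn relational rows into (key, value) byte pairs, one per row.

    Hypothetical helper: the key is the value of `key_column`, and the
    message value is the whole row serialized as a JSON object.
    """
    key_index = columns.index(key_column)
    for row in rows:
        record = dict(zip(columns, row))
        key = str(row[key_index]).encode("utf-8")
        value = json.dumps(record, sort_keys=True).encode("utf-8")
        yield key, value


# Sample data standing in for a table read through a datastore (made up).
columns = ["customer_id", "name", "country"]
rows = [(101, "Acme", "DE"), (102, "Globex", "US")]

messages = list(rows_to_messages(columns, rows, "customer_id"))
for key, value in messages:
    print(key, value)
```

A message broker such as Kafka would typically receive pairs like these, with the key used for partitioning. The actual format and keying used by SAP Data Intelligence Cloud are not described in this blog.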
Functional architecture and requirements
- SAP Data Services 4.2 SP 14 Patch 13 or above
- SAP Data Intelligence Cloud 2107
- Configure and identify the SAP Data Services connection in SAP Data Intelligence Cloud
- Identify the SAP Data Services repository and required datastore to be used as source
- Identify the SAP Data Intelligence Cloud connection to be used as target
- As shown in Fig 3.0 below, an SAP Cloud Connector installation is required in the SAP Data Services environment
Graph design and execution:
Step 1: In the SAP Data Intelligence Cloud modelling tool, build a graph using the Data Services Transform operator
Step 2: Execute graph. The system will generate the required code, push it down to SAP Data Services and execute the code within the SAP Data Services engine
Step 3: SAP Data Services will produce the output dataset and move it to SAP Data Intelligence Cloud
Step 4: SAP Data Intelligence Cloud will execute its processing sub engine to write the data into the SAP Data Intelligence Cloud connection
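The four steps above can be sketched as a small, purely in-memory Python simulation. This is a conceptual illustration under stated assumptions, not SAP's API: the names `GraphSpec`, `generate_plan`, `run_in_data_services`, and `write_to_target` are hypothetical, the "generated code" is reduced to a plain dictionary, and the datastore and target are ordinary Python structures.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GraphSpec:
    """Step 1: what the user models in the graph (hypothetical shape)."""
    datastore: str
    table: str
    columns: List[str]


def generate_plan(spec: GraphSpec) -> dict:
    """Step 2 (first half): derive an executable description from the graph."""
    return {"datastore": spec.datastore, "table": spec.table,
            "columns": list(spec.columns)}


def run_in_data_services(plan: dict, datastores: Dict[str, dict]) -> List[dict]:
    """Steps 2-3: stand-in for the on-premise engine. It reads the table
    through the datastore and returns only the requested columns."""
    table = datastores[plan["datastore"]][plan["table"]]
    return [{col: row[col] for col in plan["columns"]} for row in table]


def write_to_target(dataset: List[dict], target: List[dict]) -> int:
    """Step 4: stand-in for writing into the cloud-side connection."""
    target.extend(dataset)
    return len(dataset)


# Made-up on-premise datastore content and an empty cloud target.
datastores = {"DS_ORACLE": {"SALES": [
    {"id": 1, "amount": 10.0, "region": "EMEA"},
    {"id": 2, "amount": 25.5, "region": "APJ"},
]}}
target: List[dict] = []

spec = GraphSpec(datastore="DS_ORACLE", table="SALES", columns=["id", "amount"])
written = write_to_target(run_in_data_services(generate_plan(spec), datastores), target)
print(written, target)
```

The point of the sketch is the division of labor: the design lives on the cloud side, the read and transform run next to the source, and only the resulting dataset crosses over to be written to the target.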
Benefits of this hybrid data management solution
- The assets built on and around SAP Data Services can be efficiently reused by the cloud, so that users do not have to rebuild any existing assets.
- This solution bridges the gap between the unique capabilities of the on-premises and cloud solutions.
- The cloud-based user experience extends the tested and trusted on-premises capabilities.
- SAP Data Intelligence Cloud supports orchestration of SAP Data Services jobs, which helps in remotely invoking an existing job and producing the output. This hybrid approach reduces the need to execute SAP Data Services jobs in full each time and helps consume only the required data slice.
- The paradigm shift is in the information consumption method. In the on-premises solution, users go to the information repository to get the slice they require for decision making. With this hybrid use case, the required (subscribed) slice of information can potentially go to the user.
- The SAP Data Services on-premises implementation requires no configuration change, so the existing SAP Data Services assets continue to run as is. This means zero disruption to running the business while expanding the data outreach.
- Enables expanded and distributed processing power and opens the scope of handling a wide range of data formats and data transformation between on-premises and cloud applications.
- Users with existing skills face a very low learning curve. This also supports gradual skill development as SAP Data Intelligence Cloud rapidly evolves.
- With the adoption of SAP Data Intelligence Cloud, the hybrid data management solution allows users to future-proof their projects by relying on SAP’s strategic solution for cloud data management.
The evolution roadmap of the SAP data management platform and SAP Data Intelligence Cloud
Please refer to the corresponding blog here: https://blogs.sap.com/2021/08/03/sap-data-services-sap-information-steward-and-sap-data-intelligence-strategy-maintenance-and-future-vision-of-sap-data-management-products/
Is there a user story or video or both available?
Please follow this link for a demonstration video of the SAP Data Services and SAP Data Intelligence Hybrid Data Management
Data Services and DI interplay is quite interesting and I am looking forward to your session on 30th Sep.
Hi Shibajee, it has been a while!
I have a few questions on the text above.
You did not list Orchestration as a capability on the DataServices (DS) side with its Workflows, Jobs, execute-only-once semantic, error handling. Does that mean all orchestration has to be done on the Data Intelligence (DI) side, the integration is purely on a DS DataFlow level and all workflow logic must be rebuilt in DI? The notion of a Data Services Transform in DI seems to indicate that, too.
How does DS hand over the data to DI? I see webservices and WebHDFS. Does that mean the exchange happens via files? So for a large source system I first need to wait for an hour for the file being created and DI is idle, then DI starts processing the file for an hour and DS is idle? Or is it streaming the data continuously from end-to-end, so that the overall execution time is just one hour instead of two?
From an execution point of view, what would happen if I want to read onPrem data, apply a Tensorflow logic, compare the results with an onPrem Hana and feed the changed data into Kafka? Per my understanding the reading is done by DS, Tensorflow exists in the DI engine only, hence all data must be sent over the Internet to DI, then back to the onPrem system for the table comparison and again to DI as Kafka loading is supported only there. This would take forever to execute, wouldn't it?
The DI DataServices Transform editor, will it have 100% of the DS functionality? Every transform, nested data, all functions, all properties of each DS dataflow object? That is a tall order. On the one hand, all these options exist for a reason, because they are needed by customers, otherwise they would have been removed from DS a long time ago.
When executing data flows a lot is controlled via parameters, global variables, substitution variables in DS. Will it be possible to manage these in DI and pass the values to DS? And vice versa in case needed, e.g. check if the financial period was closed in the onPrem system and, if so, a different DI task should be started.
Thanks in advance,
It's all good on my side. Hope all is well with you. Indeed it has been a long time. I was on vacation and then was sick for a week after I returned from vacation. Thus could not address your thoughts earlier.
Please see my notes in the same sequence of the paragraphs of your post. Really appreciate your questions and thoughts.
The capability to orchestrate Data Services job execution from Data Intelligence has been there from the very beginning, when Data Hub started, and it is currently available. It is already covered in a demo on YouTube. This blog is focused on the new Hybrid use case. If the use case is orchestrating DS jobs, then yes, it will be done from the DI interface.
Currently it is a secured data caching mechanism which is volatile in nature. Replacing caching with streaming, with DI consuming from a stream endpoint, is on the roadmap.
The Hybrid scope delivered so far is moving data from a DS datastore into a DI connection. In your example the Tensorflow data is produced in DI. The option would be to leverage the common capabilities, which is the intersection in the Venn diagram. DI can place the data in a location supported by both DS and DI and apply the Table Compare from DS. That way data need not be pushed back and forth between on-premise and DI. In our 2022 scope we plan to build the use of a common location into the solution.
The current Data Services Transform in DI is pretty much equivalent to the Query Transform in DS. For the 2022 scope, we are analyzing the most highly used transforms in DS and, within those, the most highly used functionality, to make them part of the transform editor. Since there are some converters already in DI, we plan not to re-invent the wheel as part of the hybrid solution.
Yes, parameter passing from DI into DS is planned for 2022. Substitution parameters are a tricky one; we may not need them, as they can be better managed via DS and achieved through orchestration.
I hope this helps. It will be good to catch up with you over the phone. I will set up a call.
Excellent - thanks, Shiba!