Shibajee Dutta Gupta

SAP Data Services and SAP Data Intelligence: Optimized Data Management with Hybrid Solution

 

The Hybrid context

Hybrid data management gives data integration users the best of both the on-premise and cloud worlds: a fully integrated solution that meets the needs of classic ETL, data quality, and intelligent data management, while also allowing each technology to be used independently. SAP’s hybrid data management solution enables the existing SAP Data Management user base to:

  • Continue to use existing SAP Data Services content and assets in SAP Data Intelligence Cloud.
  • Gain significant value by bridging the unique capabilities of both SAP Data Services and SAP Data Intelligence Cloud, and take advantage of common functionality in a distributed processing architecture.
  • Benefit from an improved user experience, with SAP Data Services usability extended into SAP Data Intelligence Cloud.

Fig 1.0

 

Scope of the SAP Data Services and SAP Data Intelligence Cloud hybrid solution

  1. Orchestration of SAP Data Services jobs as part of SAP Data Intelligence Cloud data pipelines (currently available).
  2. Move enriched and transformed legacy and enterprise table-type data directly from SAP Data Services datastores (SAP Data Services connection objects) and consume it in SAP Data Intelligence Cloud. (Planned for 2021; supports targets unique to SAP Data Intelligence Cloud only.)
  3. Bi-directional data movement between SAP Data Intelligence Cloud and SAP Data Services to utilize the unique data transformation and data enrichment capabilities of SAP Data Services. The plan is to fully utilize the other unique transforms of both SAP Data Intelligence Cloud and SAP Data Services. (Planned for 2022.)
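To make the orchestration idea in item 1 concrete, here is a minimal Python sketch of the two pieces any orchestrator needs: assembling a job-launch request and polling for completion. The field names and status values are illustrative assumptions, not the actual SAP Data Intelligence or SAP Data Services API.

```python
import time

def build_job_request(repository, job_name, global_variables=None):
    """Assemble a launch request for a Data Services batch job.

    The field names here are illustrative only; the real SAP payload
    may differ.
    """
    return {
        "repository": repository,
        "jobName": job_name,
        "globalVariables": global_variables or {},
    }

def poll_until_done(get_status, interval_s=1.0, max_polls=10):
    """Poll a job-status callable until it reports a terminal state.

    `get_status` stands in for whatever status call the orchestrator
    exposes; here it is assumed to return "RUNNING", "SUCCEEDED",
    or "FAILED".
    """
    for _ in range(max_polls):
        status = get_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(interval_s)
    return "TIMEOUT"
```

In a real pipeline, `get_status` would wrap a call to the on-premise SAP Data Services infrastructure; injecting it as a callable keeps the orchestration logic testable on its own.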

 

Use Case 1 – Scenario

  1. Move data enriched and transformed by SAP Data Services from a supported RDBMS-based legacy or enterprise analytical data system into a cloud-based connection supported by SAP Data Intelligence Cloud.
  2. Modelling and configuration are performed in SAP Data Intelligence Cloud, which provides a new operator, Data Services Transform. This operator accepts user input, generates SAP Data Services executable code, pushes it down to the existing on-premise SAP Data Services infrastructure, and retrieves the data for consumption in SAP Data Intelligence Cloud. Note: zero configuration is required in the existing SAP Data Services infrastructure.
  3. Source and Target:
  • The source is on premise. In this case we use an SAP Data Services datastore that is pre-configured and already used by existing SAP Data Services batch dataflows to populate enriched and transformed data. This datastore, which in turn points to an Oracle database, is the source connection. Note: from the SAP Data Intelligence perspective, the SAP Data Services datastore is the source, not the underlying database connection. Please refer to Fig 2.0.
  • The target is in the cloud. In this case we use an SAP Data Intelligence Cloud connection pointing to Kafka; Kafka connectivity is unique to SAP Data Intelligence Cloud. Note: conversion from relational-type data to message-type data occurs seamlessly in the background, thanks to the SAP Data Intelligence Cloud design.
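To illustrate the relational-to-message conversion mentioned above, here is a minimal Python sketch that turns one database row into a Kafka-style key/value message pair. This only mirrors the conversion in spirit; the actual serialization SAP Data Intelligence Cloud performs in the background may differ, and the column names are made up.

```python
import json

def row_to_message(columns, row, key_column):
    """Convert one relational row into a key/value message pair.

    The key is taken from one column (e.g. the primary key) and the
    value is the full row serialized as JSON, both UTF-8 encoded as
    Kafka expects raw bytes.
    """
    record = dict(zip(columns, row))
    key = str(record[key_column]).encode("utf-8")
    value = json.dumps(record, default=str).encode("utf-8")
    return key, value

# Example: a hypothetical row from an Oracle-backed datastore,
# headed for a Kafka topic.
columns = ["order_id", "customer", "amount"]
row = (1001, "ACME Corp", 250.75)
key, value = row_to_message(columns, row, key_column="order_id")
```

Keying by a stable column such as `order_id` ensures that all messages for the same row land in the same Kafka partition, preserving per-key ordering.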

 

Functional architecture and requirements

 

Fig 2.0

 

Requirements:

  • SAP Data Services 4.2 SP 14 Patch 13 or above
  • SAP Data Intelligence Cloud 2107
  • Configure and identify the SAP Data Services connection in SAP Data Intelligence Cloud
  • Identify the SAP Data Services repository and required datastore to be used as source
  • Identify the SAP Data Intelligence Cloud connection to be used as target
  • As per Fig 3.0 below, SAP Cloud Connector must be installed in the SAP Data Services environment

Graph design and execution:

Step 1: In the SAP Data Intelligence Cloud modelling tool, build a graph using the SAP Data Services Transform operator

Step 2: Execute the graph. The system generates the required code, pushes it down to SAP Data Services, and executes it within the SAP Data Services engine

Step 3: SAP Data Services produces the output dataset and moves it to SAP Data Intelligence Cloud

Step 4: SAP Data Intelligence Cloud executes its processing sub-engine to write the data into the SAP Data Intelligence Cloud target connection
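The four steps above can be sketched as a single driver function. Each system interaction is injected as a callable; the stubs below are placeholders, standing in for the real SAP Data Intelligence and SAP Data Services calls.

```python
def run_hybrid_graph(generate_code, push_and_execute, fetch_output, write_target):
    """Sketch of the execution flow described in Steps 1-4.

    Each parameter is a callable standing in for one system boundary,
    so the flow can be exercised without any live SAP landscape.
    """
    code = generate_code()            # Step 2: DI generates DS executable code
    dataset = push_and_execute(code)  # Step 2: DS engine runs it on premise
    payload = fetch_output(dataset)   # Step 3: output moves to DI Cloud
    return write_target(payload)      # Step 4: DI writes to its target connection

# Exercise the flow with stubs in place of the real systems.
result = run_hybrid_graph(
    generate_code=lambda: "GENERATED_DS_CODE",
    push_and_execute=lambda code: ["row1", "row2"],
    fetch_output=lambda dataset: {"rows": dataset},
    write_target=lambda payload: f"wrote {len(payload['rows'])} rows",
)
```

Structuring the flow this way makes the hand-off points between the cloud and on-premise engines explicit, which is exactly where Fig 3.0's SAP Cloud Connector sits in the real deployment.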

 

Technical architecture

Fig 3.0

 

Benefits of this hybrid data management solution

  1. Assets built on and around SAP Data Services can be efficiently reused in the cloud, so users do not have to rebuild any existing assets.
  2. The solution bridges the gap between the unique capabilities of the on-premises and cloud solutions.
  3. The cloud-based user experience extends the tested and trusted on-premises capabilities.
  4. SAP Data Intelligence Cloud supports orchestration of SAP Data Services jobs, which allows an existing job to be invoked remotely to produce its output. This hybrid approach avoids complete execution of SAP Data Services jobs each time and helps consume only the required data slice.
  5. The paradigm shift is in the information consumption method. In the on-premises solution, the user goes to the information repository to get the slice they need for decision making. With this hybrid use case, the required (subscribed) slice of information can instead come to the user.
  6. The SAP Data Services on-premises implementation requires no configuration change, allowing existing SAP Data Services assets to run as is. This means zero disruption to running the business while expanding the data outreach.
  7. It enables expanded and distributed processing power and opens the scope for handling a wide range of data formats and data transformations between on-premises and cloud applications.
  8. Existing skill sets face a very low learning curve. This also allows gradual skill sharpening alongside the rapid evolution of SAP Data Intelligence Cloud.
  9. With the adoption of SAP Data Intelligence Cloud, the hybrid data management solution lets users future-proof their projects, relying on SAP’s strategic solution for cloud data management.

 

The evolution roadmap of the SAP data management platform and SAP Data Intelligence Cloud

Please refer to the corresponding blog here: https://blogs.sap.com/2021/08/03/sap-data-services-sap-information-steward-and-sap-data-intelligence-strategy-maintenance-and-future-vision-of-sap-data-management-products/

 

Is there a user story or video or both available?

We will release a demonstration video of the SAP Data Services and SAP Data Intelligence hybrid solution and upload it to YouTube as well. Please stay tuned for an extension to this blog that will include the demonstration video.

Please join me in a live session on September 30th, 2021. Kindly register here.


      4 Comments
      Devraj Bardhan

      Data Services and DI interplay is quite interesting and I am looking forward to your session on 30th Sep.

      Werner Dähn

      Hi Shibajee, it has been a while!

      I would have a few questions on above text.

      You did not list Orchestration as a capability on the DataServices (DS) side with its Workflows, Jobs, execute-only-once semantic, error handling. Does that mean all orchestration has to be done on the Data Intelligence (DI) side, the integration is purely on a DS DataFlow level and all workflow logic must be rebuilt in DI? The notion of a Data Services Transform in DI seems to indicate that, too.

      How does DS hand over the data to DI? I see webservices and WebHDFS. Does that mean the exchange happens via files? So for a large source system I first need to wait for an hour for the file being created and DI is idle, then DI starts processing the file for an hour and DS is idle? Or is it streaming the data continuously from end-to-end, so that the overall execution time is just one hour instead of two?

      From an execution point of view, what would happen if I want to read onPrem data, apply a Tensorflow logic, compare the results with a onPrem Hana and feed the changed data into Kafka? Per my understanding the reading is done by DS, Tensorflow exists in the DI engine only, hence all data must be sent over the Internet to DI, then back to the onPrem system for the table comparison and again to DI as Kafka loading is supported only there. This would take forever to execute, wouldn't it?

      The DI DataServices Transform editor, will it have 100% of the DS functionality? Every transform, nested data, all functions, all properties of each DS dataflow object? That is a tall order. On the one hand, all these options exist for a reason, because they are needed by customers, otherwise it would have been removed from DS long time ago.

      When executing data flows a lot is controlled via parameters, global variables, substitution variables in DS. Will it be possible to manage these in DI and pass the values to DS? And vice versa in case needed, e.g. check if the financial period was closed in the onPrem system and if, a different DI task should be started.

       

      Thanks in advance,

       

      Werner

      Ginger Gatling

      Excellent - thanks, Shiba!
