Since the introduction of SAP Data Hub at SAP TechEd 2017, there has been tremendous buzz around its use cases. Like everyone else, our first impression was "Data Hub looks like another group of EIM tools offered by SAP". However, after a deep dive with the SAP product team, we realized that SAP Data Hub is not just a group of EIM tools but a next-generation data management and orchestration platform designed for the needs of the digital enterprise (Big Data, ML, IoT, external data, etc.).
In this blog, we cover the business value proposition of SAP Data Hub; a subsequent technical blog will cover its architecture and technical details.
To fully understand the value proposition of Data Hub, we first look at an IoT business scenario implemented with traditional methods and then compare how it can be done better with SAP Data Hub.
Scenario: Predicting Machine downtime based on IoT sensor data
Integrate IoT data from a third-party system into SAP HANA, then build predictive models on that data to predict machine downtime.
Traditional approach without SAP Data Hub:
- Load data from machine sensors into AWS S3 using Kafka.
- Load data from the AWS S3 data lake into HANA using an ETL tool (SAP Data Services). Schedule and monitor the data load jobs.
- Integrate the machine sensor data with data from SAP. SAP SLT is used to replicate data from SAP ECC tables. Schedule and monitor the SLT loads into HANA.
- Build predictive models in R/Python using data from HANA. Schedule and monitor the models so that they run whenever new data is loaded into the tables.
- An interactive UI fetches data from the HANA tables and displays machine downtime information.
As the steps above show, multiple components need to be monitored for the final results to be presented accurately in the UI. Please see the diagram below for a visual representation of the components and their connections.
Approach with SAP Data Hub:
Data Hub has built-in pipelines that can orchestrate the end-to-end data flow described in the scenario above. Each type of component connection can be modeled as a task within Data Hub, and each task can be executed individually or together with other tasks in a pipeline. The end-to-end pipeline is monitored from the Data Hub workbench. Although each tool is still connected individually through its own system connection, Data Hub manages the load monitoring and the end-to-end data flow. If a task fails at any point, Data Hub can send alerts so that IT teams can dig into the individual systems to troubleshoot the errors.
As the diagram above shows, all the disparate components are managed within one system, which gives application developers a great advantage when building complex systems with many interlocking components. One can envision a plethora of scenarios where multiple system processes need to be executed in tandem and managed under one umbrella. The natural question is how Data Hub differs from other data load management or process orchestration tools. Data Hub is definitely more than just a process orchestration tool; for more details, please see the SAP Data Hub technical blog, which explains the architecture and its components.
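To make the orchestration idea more concrete, here is a minimal, tool-agnostic sketch in Python of a pipeline that runs tasks in order, records which ones succeeded, and raises an alert at the first failure. It is not the SAP Data Hub API: the `Task`, `PipelineResult` and `run_pipeline` names, the task names, and the `alert` callback are all hypothetical stand-ins for the Data Hub tasks, workbench monitoring and alerting described above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    """One step in the end-to-end flow (hypothetical, e.g. an S3-to-HANA load)."""
    name: str
    run: Callable[[], None]


@dataclass
class PipelineResult:
    """Monitoring view of a single pipeline run."""
    succeeded: List[str] = field(default_factory=list)
    failed: Optional[str] = None
    alerts: List[str] = field(default_factory=list)


def run_pipeline(tasks: List[Task], alert: Callable[[str], None]) -> PipelineResult:
    """Run tasks in order; stop and alert on the first failure so IT can troubleshoot that system."""
    result = PipelineResult()
    for task in tasks:
        try:
            task.run()
        except Exception as exc:  # any task error halts the downstream steps
            result.failed = task.name
            message = f"Task '{task.name}' failed: {exc}"
            result.alerts.append(message)
            alert(message)  # hypothetical hook, e.g. e-mail or ticket
            break
        result.succeeded.append(task.name)
    return result


def failing_s3_load() -> None:
    """Simulated failure in the S3-to-HANA load step."""
    raise ConnectionError("S3 bucket unreachable")


if __name__ == "__main__":
    steps = [
        Task("ingest_sensor_data", lambda: None),    # Kafka -> S3
        Task("load_s3_to_hana", failing_s3_load),    # ETL into HANA
        Task("replicate_ecc_tables", lambda: None),  # SLT replication
        Task("score_downtime_model", lambda: None),  # R/Python model
    ]
    outcome = run_pipeline(steps, alert=print)
    print("Succeeded:", outcome.succeeded)
```

Running the script prints the alert for `load_s3_to_hana` and reports only `ingest_sensor_data` as succeeded; the later steps are skipped because they depend on the failed load. Stopping at the first failure is a deliberate simplification for illustration; a real orchestrator may also retry tasks or keep running independent branches.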
SAP Data Hub – Use Cases
Let us formulate similar use cases that strengthen the digital enterprise by using SAP Data Hub as a next-generation data integration and data management/orchestration platform:
- Automated return order processing with product image identification: In retail, the customer returns process is one of the most time- and cost-consuming processes. The entire process can be streamlined if the customer can send an image of the part to be ordered or replaced, and a machine learning model identifies the part and places the order automatically, reducing manual intervention. The first step is to build a corpus of all product part images and train the model. Once trained, the model can be fed images from customers, which it classifies to identify the part and process the return order. Data Hub is extremely helpful here for connecting and monitoring the many components involved: the image processing DB, the image feature extraction API, the image classification model, and SAP or other ERP systems to place the orders.
- Hadoop for social media and email data in fraud detection: Financial institutions can build fraud detection models by using Data Hub to integrate social media and email data stored in Hadoop with transaction data stored in SAP ERP systems. Traditionally, this type of integration took months to complete at significant cost; with Data Hub, the integration is quick and seamless, and data scientists can build directly on the models built on SAP Vora within Data Hub. The end-to-end data pipeline is managed entirely in Data Hub, giving complete control over all the complicated integration aspects of the deployment.
- Predictive maintenance for automobiles using sensor data in Hadoop: Companies can track automotive performance through continuous monitoring of sensor data. With Data Hub, companies can integrate real-time streaming data from devices with customer master and transaction data stored in HANA/ERP to help improve vehicle safety. The ability to enrich enterprise data with up-to-the-moment data from external sensors enables context-aware decisions and processes, opening up a new world of opportunities.
- Tracking the lost baggage metric using RFID on airline baggage: With RFID tracking enabled for all airline baggage, the data stored in Hadoop can be queried within minutes using the boosted SQL performance of SAP Vora in Data Hub. This helps the airline improve its lost baggage metric and reduce costs. The ability of Data Hub developers to quickly build queries on Hadoop tables gives the business an edge in keeping up with current operational scenarios.