I have SAP HANA, when would I need SAP Data Hub?
If you have SAP HANA, when would you need to use SAP Data Hub? In this blog we’ll go through several specific examples and discuss the best match for SAP Data Hub. Feel free to add your example in the comments section of this blog and we will add it to our list, avoiding ‘it depends’ as much as we can!
SAP Data Hub: Data orchestration hub, not data storage hub
First things first, let’s discuss what SAP Data Hub is and what it isn’t. SAP Data Hub is a hub for data orchestration and next-generation data management, not a hub to store data. Some people use data hub to mean a data warehouse or a single storage for all data. In this case it is a data orchestration hub to manage and orchestrate data.
SAP Data Hub is part of SAP’s digital platform including capabilities from data management and intelligent technologies (Leonardo offerings).
Notice on the graphic below a couple of important points.
Point 1: SAP Data Hub extracts value from distributed data assets. The important words here are ‘extract value’ and ‘distributed data’. SAP Data Hub does not just move data from point A to point B. If you need data movement or data integration that simply takes data from A to B, you could technically do that with SAP Data Hub, but it’s not the best use case. The best usage of SAP Data Hub is when you are extracting value: you are getting something from the data that doesn’t exist without some type of refinement, curation, or intelligence applied to the data.
The distributed data is also important. We don’t necessarily have to lift and shift the data before using it. For example, we could apply R or Python code to data in a data lake without necessarily moving it first. Does this mean that we never move data or store data in SAP Data Hub? No. SAP Data Hub has sophisticated data storage that can be used for processing and querying the data. But it’s not a data repository where the data will live long term.
Point 2: SAP Data Hub discovers, refines, governs, and orchestrates any type, variety, and volume of data across your distributed landscape. Let’s look at each of these briefly.
- Discover means you can get a clear view of your data landscape and its interconnections, no matter where the data lives. Important here is that SAP Data Hub is not just about SAP data. It leverages open source technologies, has containerized execution, and is agnostic to the infrastructure. This discovery spans structured, unstructured (video, image, speech, etc), and streaming data. You can profile, understand, track, and prepare the data.
- Refine includes everything from cleansing and data quality to using machine learning to find hidden patterns, to enriching the data, to applying advanced analytics on the data. Refinement includes many, many operators as well as your own custom operators that enable you to extract value.
- Govern data assets with metadata information, creating a data catalog using metadata crawlers. You can view, understand, share, and analyze data lineage to understand the impact of the data. You can anonymize, tag, and provide the security, access, and compliance required.
- Orchestrate is a core pillar of SAP Data Hub. SAP Data Hub orchestrates using modular data pipelines, meaning the pipelines can scale up and down as the volume and variety of data require. Imagine an IoT stream of data that is always on and whose volume changes moment by moment. Modular data pipelines adjust to the volume and variety changes on-the-fly. SAP Data Hub uses diverse processing engines across distributed infrastructures. So, let’s say as part of a pipeline you need to use the SAP HANA Text Analytics service, then execute another service using Apache Spark or R, then execute an SAP Data Services job, and then call a Google Pub/Sub service. All of these are possible.
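To make the idea of a modular pipeline a bit more concrete, here is a minimal sketch in plain Python. It is not the SAP Data Hub API. The operator names (`cleanse`, `enrich`, `score`), the `run_pipeline` helper, and the sample readings are all hypothetical, and each function merely stands in for a step that a real pipeline might hand off to a different engine.

```python
from functools import reduce
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Operator = Callable[[Iterable[Record]], Iterable[Record]]


def cleanse(records):
    """Drop records that have no temperature reading."""
    return (r for r in records if r.get("temp_c") is not None)


def enrich(records):
    """Add a Fahrenheit value; a stand-in for an enrichment step."""
    for r in records:
        yield {**r, "temp_f": r["temp_c"] * 9 / 5 + 32}


def score(records):
    """Flag hot readings; a stand-in for a call to an ML or analytics service."""
    for r in records:
        yield {**r, "alert": r["temp_c"] > 80}


def run_pipeline(operators: List[Operator], source: Iterable[Record]) -> List[Record]:
    """Chain the operators so each one consumes the previous one's output."""
    return list(reduce(lambda stream, op: op(stream), operators, source))


# Hypothetical sensor readings (assumed data, for illustration only).
readings = [
    {"device": "a1", "temp_c": 85},
    {"device": "a2", "temp_c": None},
    {"device": "a3", "temp_c": 20},
]
result = run_pipeline([cleanse, enrich, score], readings)
```

Because every operator has the same shape (records in, records out), steps can be added, removed, or reordered without touching the others, which is the property the word ‘modular’ is pointing at.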
When you think about SAP HANA and SAP Data Hub, SAP Data Hub can leverage any of your SAP HANA services.
The following graphic shows a few of the services (referred to as engines) that SAP Data Hub can use as part of the pipeline processing.
To learn more about SAP Data Hub capabilities you can check out the openSAP course, the product trial or developer edition, and the roadmap. There are also many blogs on the SAP Community.
Now, let’s move on to some specific examples and discuss if this is a good situation to use SAP Data Hub.
Scenario 1: Large migration to S/4HANA
When you are migrating to S/4HANA, you have a lot going on: preparing the data, cleansing the data, and hopefully implementing SAP Master Data Governance so that you have ongoing data governance. For migrations, SAP Data Hub is normally not required. SAP has specific solutions and partner solutions that focus on migration. There is so much to offer for migration to S/4HANA that there is an openSAP course just on the topic!
Scenario 2: Moving 6 ERPs to 1 ERP and 16 company codes to 3 company codes
Transitioning from multiple systems to one and restructuring the company are always interesting challenges and keep you busy! For this customer question we recommended using other solutions such as SLT.
Scenario 3: Have data from SAP, non-SAP, we need to correlate the data and write it back to multiple systems
In this use case the customer has multiple data sources and needs to understand and correlate certain relationships between the data and take action based on various conditions. This is a perfect use case for SAP Data Hub. It involves multiple data sources, and SAP Data Hub can correlate, curate, apply machine learning to discover the relationships, and ensure the appropriate action is taken.
Scenario 4: I have data scientists in a separate team where I need to extract data from SAP, get it to them, then incorporate the results back
This is also a perfect use case for SAP Data Hub. With SAP Data Hub we can directly call the machine learning models the data scientists are creating, and we can provide access to the data the data scientists need. They can access R, Python, and other ML tools directly within SAP Data Hub. One of the major use cases for SAP Data Hub is around data science and machine learning.
SAP recently announced SAP Data Intelligence, which is a cloud version of SAP Data Hub with some additional ML capabilities, including Jupyter notebooks embedded inside of SAP Data Intelligence. In the future the additional ML capabilities in SAP Data Intelligence will also be available in SAP Data Hub. (SAP Data Intelligence will be released later this year.)
Scenario 5: IoT sensors on products, need to sift through and understand the sensor data to know how our products are being used in the field
This is another great use case for SAP Data Hub. The example of this is a customer who has IoT sensors on washing machines and is storing all the IoT data in a data lake. In the data lake they collected the raw sensor data, which had data on the machines themselves as well as user behavior when using the product. With over 6 million devices, there was over 16 TB of data collected daily. The challenges were the quality of data, understanding the data, and linking it to enterprise data. SAP Data Hub is a great match for this scenario because it can apply understanding, cleansing, and intelligence to the sensor data, correlating it to enterprise data, so that data analysts could start to understand things like:
- Which customers were using which features
- How the dishwashers were performing against benchmarks
This enabled a cycle to improve the overall quality and usability of the dishwashers.
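As a rough illustration of the ‘cleanse, then link to enterprise data’ idea, here is a small Python sketch. Everything in it is assumed for the example: the event fields, the `device_owner` lookup, and the validity rule are hypothetical and are not taken from the customer’s actual data or from any SAP Data Hub operator.

```python
from collections import Counter

# Hypothetical raw events as they might land in a data lake.
sensor_events = [
    {"device_id": "D1", "feature": "eco", "duration_min": 120},
    {"device_id": "D1", "feature": "quick", "duration_min": 30},
    {"device_id": "D2", "feature": "eco", "duration_min": -5},   # invalid duration
    {"device_id": "D3", "feature": "quick", "duration_min": 28},
    {"device_id": None, "feature": "eco", "duration_min": 110},  # missing device id
]

# Hypothetical enterprise master data: which customer owns which device.
device_owner = {"D1": "C100", "D2": "C200", "D3": "C300"}


def is_valid(event):
    """Basic data quality rule: a device id and a positive duration are required."""
    return event["device_id"] is not None and event["duration_min"] > 0


def feature_usage_by_customer(events, owners):
    """Count how often each customer used each feature, skipping bad events."""
    usage = Counter()
    for event in filter(is_valid, events):
        customer = owners.get(event["device_id"])
        if customer is not None:
            usage[(customer, event["feature"])] += 1
    return usage


usage = feature_usage_by_customer(sensor_events, device_owner)
```

The resulting counts answer the first question in the list above (which customers use which features) for the toy data; at real volumes the same two steps would run inside a scalable pipeline rather than in a single Python process.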
Just the other day I had to make a service call on my refrigerator freezer because it was always icing over. When I called I asked them if they already knew of my problem from their sensor data and the lady on the phone laughed at me! #TimeToChangeDishWashers
Scenario 6: Use smart data integration to load data into SAP HANA
In this scenario the customer is using SAP Smart Data Integration within SAP HANA to load data into HANA. They asked if they need to move from this to SAP Data Hub. The answer is no. If you are doing a pure ETL process such as the one they have, SAP Smart Data Integration is a good choice.
Scenario 7: I have a lot of custom code that I need to pull stuff together across different systems
Anytime you have code sprawl and tool sprawl, meaning you are writing code in multiple different languages and using various tools to transform data for various purposes, SAP Data Hub is a good fit. Assuming that some of these must remain, SAP Data Hub can be used to orchestrate the work, and you can leverage the metadata capabilities to have a central place to understand and profile the data and catalog the metadata.
Scenario 8: Trying to figure out something we don’t know, like how can we reduce product returns
Anytime you’re exploring a new area or trying to transform a scenario, SAP Data Hub can help. In this scenario the customer was looking at different factors that influence why an item is returned. Examples could be the number of items in the basket, a higher price, or multiple sizes of the same item in one order. These are all indications that there will be a return on the order. In this case SAP Data Hub was used to help determine if an order is being placed by a ‘serial returner’. If so, different incentives are used to discourage the return.
In the picture below, you can see that return data was stored in MySQL, sales data saved in SAP HANA. Product and customer data were saved in S3. Python was used to develop a decision tree to predict the propensity of a return. The result can be used to analyze trends of serial returners and also to change incentives on-the-fly in the shopping cart.
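To give a feel for what ‘a decision tree to predict the propensity of a return’ means, here is a deliberately tiny sketch in plain Python. It is not the customer’s model: it trains only a one-level tree (a ‘decision stump’) using Gini impurity, and the order data, feature choice, and function names are all made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or impurity < best[0]:
                best = (impurity, f, threshold, left, right)
    return best


def train_stump(rows, labels):
    """Train a one-level tree; each leaf stores the observed return rate."""
    split = best_split(rows, labels)
    if split is None:
        raise ValueError("no valid split found")
    _, f, threshold, left, right = split
    return {
        "feature": f,
        "threshold": threshold,
        "p_left": sum(left) / len(left),
        "p_right": sum(right) / len(right),
    }


def return_propensity(stump, row):
    """Estimated probability that the order described by `row` is returned."""
    side = "p_left" if row[stump["feature"]] <= stump["threshold"] else "p_right"
    return stump[side]


# Made-up orders: (items in basket, order value, same item in multiple sizes: 0/1)
orders = [
    (1, 20, 0), (2, 35, 0), (1, 50, 0), (3, 60, 0),
    (4, 120, 1), (5, 200, 1), (2, 90, 1), (6, 150, 0),
]
returned = [0, 0, 0, 0, 1, 1, 1, 0]

stump = train_stump(orders, returned)
```

On this toy data the stump ends up splitting on the ‘multiple sizes’ flag. A real model would be a deeper tree trained on far more data, typically with a library such as scikit-learn, and its score could then feed the on-the-fly incentive logic described above.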
Scenario 9: Two specific use cases you might want to check out
If you haven’t seen our Data Bits and Bites program, these are short webinars on Tuesdays and Thursdays. The Data Bits are 15-minute sessions and the Data Bites are 30-minute sessions. There are a few on SAP Data Hub, including two we recorded on use cases.
Customer risk assessment with SAP S/4HANA and SAP Data Hub covers a scenario where the customer wanted to ascertain the risk of a new business partner using multiple sources of data and assign a risk score in S/4HANA.
Improve Quality Management in Manufacturing describes a scenario where the company wanted to use SAP Data Hub to apply machine learning to determine potential manufacturing deficiencies.
So, check these out, let us know if a blog like this with scenarios helps, and tell us about your scenario if you are wondering whether SAP Data Hub is a good fit!
P.S. You also might want to check out these other Data Bits and Bites:
- I just heard about SAP Data Intelligence, what is it and how does it relate to SAP Data Hub
- How do you know if you need a data orchestration solution like Data Hub?
- Data orchestration versus ETL, what’s the difference?
- Improve Quality Management in Manufacturing with Applied Intelligence on Big Data
- Customer Risk Intelligence with S/4HANA and SAP Data Hub
- SAP Data Hub on Cisco Container Platform
- SAP Data Hub Github examples
- I have SAP HANA, when do I need SAP Data Hub