In this article I would like to introduce SAP Data Hub, a modern data platform that combines raw data with data from enterprise landscapes to address a common problem. The advent of the internet and the increasing digitization of business generate massive amounts of information, and the new volumes and types of data being collected represent a tremendous opportunity. But this growth has also dramatically increased the complexity of the enterprise data landscape, which now spans multiple data lakes, data warehouses, operational applications, eCommerce systems, online interactions, and so on.
IT is under tremendous pressure to respond to business needs: a growing number of internal customers want new analytics, new applications, and better data sharing with partners, and they want them quickly.
SAP Data Hub is a data operations (DataOps) management solution that enables agile management of data in a diverse landscape across the organization. This enterprise-ready solution provides governance and orchestration for data refinement and enrichment by pipelining many complex data processing operations, such as machine learning (ML).
Today, enterprise customers are finding it too slow, expensive, and challenging to move data to where it needs to go. This is due to:
Rapidly evolving and expanding data landscapes, with more data silos than ever.
- Increasing number of ways to create data – more applications, digital interactions, sensors, social media, web-based sources
- Quick rise in data volume growth and data diversity – driving the data silo proliferation
- Increasing number of data consumption endpoints – analytics, enterprise apps, mobile apps, cloud apps
Data silos are reinforced by organizational silos.
- For example, the Big Data teams managing the Hadoop data lakes are not the same people who manage the enterprise data warehouse (EDW). They use different tools and don’t interact often. Business departments also often have their own data and their own people managing it. At the overall landscape level, it’s hard to look across all of the systems and information.
To better meet the needs of the business and the fast pace of today’s demands, the landscape needs to overcome three challenges:
1. Governance challenge: Lack of visibility. Who changed the data? What was changed? Who is accessing it?
- Example: Where did this strange result come from? Who produced it? A financial manager reviews a table, finds an unusual result, and traces it back to discover that someone mistakenly averaged two averages from different systems.
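To make the averaging mistake concrete, here is a small worked sketch in Python. The figures and the `system_a`/`system_b` names are hypothetical; the point is only that an average of averages ignores how many records sit behind each average, whereas a pooled average weights each one by its record count.

```python
# Hypothetical figures: each source system reports an average value
# together with the number of records that average was computed over.
system_a = {"avg": 10.0, "count": 2}
system_b = {"avg": 20.0, "count": 8}

def naive_average_of_averages(*sources):
    """The mistake from the example: treat each system's average as a single data point."""
    return sum(s["avg"] for s in sources) / len(sources)

def pooled_average(*sources):
    """Correct combination: weight each average by the number of records behind it."""
    total = sum(s["avg"] * s["count"] for s in sources)
    n = sum(s["count"] for s in sources)
    return total / n

print(naive_average_of_averages(system_a, system_b))  # 15.0
print(pooled_average(system_a, system_b))             # 18.0
```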
2. Data Pipeline challenge: Too hard to refine and enrich data across multiple systems.
- Refine: running computations to move from raw data to candidate data.
- Enrich: appending data from different sources to create a more robust compilation of information.
- Example of enrichment: connect sensor data with the asset ID and the asset profile information held in a different system.
- Example of refinement: use the temperature readings from the asset sensors to determine how many times each asset has gone above its recommended maximum temperature. This takes data from a high-volume data store, processes it, and passes the structured result on, ultimately to an executive’s analytical dashboard or even to a mobile field rep’s smartphone app, so that the rep can investigate quickly.
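The refine and enrich steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SAP Data Hub code: the asset IDs, the `asset_master` table and its `max_temp` field are hypothetical, and in practice the readings and the master data would come from separate systems rather than in-memory lists.

```python
from collections import Counter

# Hypothetical raw sensor readings from a high-volume store: (asset_id, temperature in C).
sensor_readings = [
    ("PUMP-1", 71.5), ("PUMP-1", 83.0), ("PUMP-2", 64.0),
    ("PUMP-1", 86.2), ("PUMP-2", 91.4), ("PUMP-3", 55.0),
]

# Hypothetical asset master data held in a different system.
asset_master = {
    "PUMP-1": {"plant": "Berlin", "max_temp": 80.0},
    "PUMP-2": {"plant": "Pune", "max_temp": 90.0},
    "PUMP-3": {"plant": "Austin", "max_temp": 60.0},
}

def enrich(readings, master):
    """Enrich: attach the asset profile to every raw reading via the asset ID."""
    for asset_id, temp in readings:
        profile = master.get(asset_id)
        if profile is None:
            continue  # unknown asset; a real pipeline would log or route these elsewhere
        yield {"asset_id": asset_id, "temp": temp, **profile}

def refine(enriched):
    """Refine: count how often each asset exceeded its recommended maximum."""
    exceeded = Counter()
    for row in enriched:
        if row["temp"] > row["max_temp"]:
            exceeded[row["asset_id"]] += 1
    return dict(exceeded)

print(refine(enrich(sensor_readings, asset_master)))  # {'PUMP-1': 2, 'PUMP-2': 1}
```

Only the small per-asset result, not the raw readings, needs to travel on to the dashboard or the field rep's app.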
Social product feedback example:
- Refine: You’ve launched a new, colorful line of Instagram-friendly products. From raw social media feeds, count the number of positive versus negative comments.
- Enrich: Harmonize the comments with a product ID so that you can line up the positive and negative totals with the products people are praising or complaining about.
- Results can be passed along to product managers, who can use them to evolve future products; to the services team, to address any negative reviews; or to marketing, to capitalize on a positive trend.
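A toy version of the social feedback example, again in plain Python. The keyword lists, the `PRODUCT_IDS` lookup and the sample posts are all hypothetical, and the keyword-based sentiment check is a deliberately naive stand-in for a real sentiment model. The sketch only shows the order of operations: classify each post, tag it with a product ID, then count positives and negatives per product.

```python
from collections import defaultdict

# Hypothetical keyword lists; a production pipeline would use a trained
# sentiment model instead of this naive lexicon lookup.
POSITIVE = {"love", "great", "gorgeous"}
NEGATIVE = {"broke", "faded", "disappointed"}

# Hypothetical lookup that harmonizes product names in posts with product IDs.
PRODUCT_IDS = {"sunset mug": "P-100", "neon tote": "P-200"}

def sentiment(text):
    """Classify a post by counting positive and negative keywords."""
    words = set(text.lower().replace("!", " ").replace(".", " ").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def feedback_by_product(posts):
    totals = defaultdict(lambda: {"positive": 0, "negative": 0})
    for post in posts:
        label = sentiment(post)
        if label == "neutral":
            continue
        for name, product_id in PRODUCT_IDS.items():
            if name in post.lower():            # enrich: tag the post with a product ID
                totals[product_id][label] += 1  # refine: count positive vs negative
    return dict(totals)

posts = [
    "I love my Sunset Mug!",
    "The neon tote faded after one wash. Disappointed.",
    "Neon tote looks great on the beach.",
]
print(feedback_by_product(posts))
```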
3. Data sharing challenge: Integration is manual, point-to-point, painful, and slow.
- If you want to change an integration point or add more points to an integration path, good luck. Get in the IT line and wait six months.
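One way to see why point-to-point integration scales badly is to count connections. Assuming the worst case, where every system must exchange data with every other system, n systems need n(n-1)/2 direct links, while connecting each system once to a single orchestration layer needs only n. The sketch below is just that arithmetic; the function names are hypothetical.

```python
def point_to_point_links(n_systems):
    """Links needed if every system integrates directly with every other system."""
    return n_systems * (n_systems - 1) // 2

def hub_links(n_systems):
    """Links needed if every system connects once to a central orchestration layer."""
    return n_systems

for n in (5, 10, 20):
    print(n, point_to_point_links(n), hub_links(n))
```

Under this assumption, adding an eleventh system to a fully point-to-point landscape of ten means ten new integrations, whereas with a central orchestration layer it means one.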
New challenges require new technologies: Distributed systems in a distributed landscape
Unify your data to achieve scalable visibility and control
- Single system view – for data pipelining, orchestration, monitoring, and governance
- No centralization of data – no mass data movement to a single data store
- Distributed native processing – executes pipeline activities quickly, where the data resides
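The idea behind distributed native processing, moving the computation to the data instead of the data to a central store, can be illustrated with partial aggregation. In this hypothetical sketch each system computes a small summary (count, total, maximum) next to its own data, and only those summaries travel to the orchestrating layer. The partition names are made up, and this is a conceptual illustration rather than a description of how SAP Data Hub executes pipelines internally.

```python
# Hypothetical partitions of the same sensor table living in three systems.
partitions = {
    "hadoop_lake": [71.5, 83.0, 86.2],
    "hana": [64.0, 91.4],
    "object_store": [55.0],
}

def local_summary(values):
    """Runs next to the data: returns a tiny summary instead of the raw rows."""
    return {"count": len(values), "total": sum(values), "max": max(values)}

def combine(summaries):
    """Runs in the orchestrating layer: merges summaries without touching raw rows."""
    count = sum(s["count"] for s in summaries)
    total = sum(s["total"] for s in summaries)
    return {"count": count, "mean": total / count, "max": max(s["max"] for s in summaries)}

summaries = [local_summary(rows) for rows in partitions.values()]
print(combine(summaries))
```

Carrying the count alongside the total is also what prevents the averaging-two-averages mistake described under the governance challenge.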
SAP Data Hub UI:
The vision for SAP Data Hub is to provide the ability to understand, connect, and drive processes across the many data sources and endpoints the enterprise struggles with today. By providing visibility into the landscape of data opportunities, along with an easy way to connect data sources and build powerful data pipelines that span the landscape, it helps businesses achieve the data agility and business value they seek.
SAP Data Hub has an open architecture, which means that it manages data wherever it resides: in the cloud or on premises, in an SAP system such as SAP HANA, or in a non-SAP solution such as Hadoop or cloud object storage.
Control hybrid landscapes and connections:
Pipelining with Connectivity, Integration, and Machine Learning operators, along with the flexibility to write your own custom code in Python and R.
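Conceptually, a pipeline is a chain of operators where each operator consumes messages and emits new ones, and a custom Python step is just another operator in the chain. The following sketch models that idea in plain Python. It does not use SAP Data Hub's operator API, and all operator names (`read_source`, `parse_temperature`, `flag_hot`) are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# An "operator" here is simply a function from a stream of messages to a stream of messages.
Operator = Callable[[Iterable[dict]], Iterator[dict]]

def read_source(rows: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for a connectivity operator that reads from a source system."""
    yield from rows

def parse_temperature(messages: Iterable[dict]) -> Iterator[dict]:
    """Custom Python step: convert raw string readings to floats, dropping bad rows."""
    for msg in messages:
        try:
            yield {**msg, "temp": float(msg["temp"])}
        except (KeyError, ValueError):
            continue

def flag_hot(threshold: float) -> Operator:
    """Parameterized operator: mark readings above a threshold."""
    def op(messages: Iterable[dict]) -> Iterator[dict]:
        for msg in messages:
            yield {**msg, "hot": msg["temp"] > threshold}
    return op

def pipeline(*operators: Operator) -> Operator:
    """Chain operators so that each one's output feeds the next one's input."""
    def run(messages: Iterable[dict]) -> Iterator[dict]:
        for op in operators:
            messages = op(messages)
        return iter(messages)
    return run

raw = [{"asset_id": "PUMP-1", "temp": "83.0"}, {"asset_id": "PUMP-2", "temp": "n/a"}]
graph = pipeline(read_source, parse_temperature, flag_hot(80.0))
print(list(graph(raw)))  # [{'asset_id': 'PUMP-1', 'temp': 83.0, 'hot': True}]
```

Using generators keeps each step streaming, so messages flow through the chain one at a time instead of being materialized between operators.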
Example use case:
In simple terms, Raw data + Enterprise data = Intelligent Insights & Decision Making
This example pipeline derives meaningful insights from raw data coming from smart devices. The raw data is combined with enterprise data and then fed into a continuous machine learning clustering model that identifies groups of runners based on patterns in their running behavior. The pipeline also produces application usage statistics, which can be used to introduce new features into those smart devices.
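The clustering step of this pipeline can be approximated with k-means. The sketch below is a plain batch k-means in Python over two hypothetical per-runner features (average pace and weekly distance). The article does not say which algorithm the continuous ML model uses, so treat k-means as a stand-in, not the actual model.

```python
import math

# Hypothetical per-runner features derived from smart-device data:
# (average pace in min/km, weekly distance in km).
runners = [(5.0, 40.0), (5.2, 42.0), (4.9, 38.0), (7.5, 10.0), (7.8, 12.0), (7.6, 9.0)]

def kmeans(points, k, max_iter=100):
    """Plain batch k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

labels, centroids = kmeans(runners, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the features should be standardized first, since weekly distance otherwise dominates the Euclidean distance, and a continuous setup would re-run or incrementally update the model as new device data arrives.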
SAP Data Hub – machine learning (ML) and predictive analytics use case
SAP Data Hub Capabilities:
- Apply machine learning and predictive algorithms to any data set
- Operationalize ML processes rather than serializing individual algorithms manually
- Insert ML and predictive processing into any scenario within use cases such as Big Data warehousing, IoT, and enterprise information management
Example applications include risk profiling in the insurance industry, credit analysis with automated scoring models, and machine failure prediction leading to automated preventative maintenance.
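For the machine failure case, automated scoring can be as simple as turning a model's risk score into an action. The sketch below uses a logistic function with hand-picked, hypothetical coefficients and a hypothetical 0.7 threshold. A real model would be trained on historical failure data, and the returned dictionaries merely stand in for maintenance work orders.

```python
import math

# Hypothetical, hand-picked coefficients for illustration only; a real model
# would be trained on historical failure data.
WEIGHTS = {"temp_over_max": 0.8, "vibration_mm_s": 0.5}
BIAS = -4.0
MAINTENANCE_THRESHOLD = 0.7

def failure_probability(features):
    """Logistic score: sigmoid of a weighted sum of the asset's features."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def maintenance_actions(assets):
    """Yield a (hypothetical) work-order request for every high-risk asset."""
    for asset_id, features in assets.items():
        p = failure_probability(features)
        if p >= MAINTENANCE_THRESHOLD:
            yield {"asset_id": asset_id, "failure_probability": round(p, 3)}

assets = {
    "PUMP-1": {"temp_over_max": 2, "vibration_mm_s": 8.0},
    "PUMP-3": {"temp_over_max": 0, "vibration_mm_s": 1.5},
}
print(list(maintenance_actions(assets)))
```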
Overall, SAP Data Hub is winning hearts and gaining confidence. I will try to cover more industry use cases in my next articles, so stay tuned for more updates.
SAP Data Hub : https://www.sap.com/india/products/data-hub.html