In recent years, many companies have extended the scope of the data sources they analyze far beyond classic enterprise systems. Big data sources such as web logs, social media channels, sensors, and images have proven to be important data assets from which valuable insights can be extracted. For various reasons, including massive data volumes and complex structures, such data is typically handled outside the relational data warehouse, in data lakes and cloud object storage. A very common use case is to derive structured information from these big data sources through filtering, aggregation, or other processing steps in a Hadoop environment. This information then needs to be put in context with enterprise data stored in the classic data warehouse.
In such scenarios, SAP Data Hub greatly simplifies organizing, scheduling, and monitoring the complete process flow, from data ingestion into Hadoop to the point where the data is available for reporting in SAP BW/4HANA.
Introducing SAP Data Hub
SAP Data Hub is a data landscape management solution that enables agile data operations across the enterprise. It supports data sharing, pipelining, and governance of all data in the connected landscape. SAP Data Hub is an open-data architecture that works across Hadoop, data lakes, cloud object storage, relational databases, enterprise applications, and more. With SAP Data Hub, you can:
- Experience a simpler, more scalable approach to data operations and landscape management
- Accelerate and expand data projects
- Build agile, data-driven applications
- Achieve centralized data governance and visibility into data lineage
- Orchestrate processes across the data landscape, for example executing data pipelines or triggering SAP BW process chains and SAP Data Services jobs
Introducing SAP BW/4HANA
SAP BW/4HANA is SAP’s next-generation data warehouse product. It builds on many concepts of SAP BW but, in combination with the SAP HANA database, takes them to the next level. Its main characteristics are:
- Simplification – SAP BW/4HANA offers a drastically simplified approach to building data warehouses, allowing you to build leaner solutions in less time.
- Openness – SAP BW/4HANA offers connectivity to virtually all data types, from classic SAP ERP systems to relational databases and data lakes.
- Modern User Interfaces – SAP BW/4HANA comes with a new look and feel for business users, data warehouse modelers and administrators.
- High performance – SAP BW/4HANA leverages the power of the SAP HANA database for all data-intensive processing, be it OLAP queries, data transformations, or predictive analytics processes.
Real-Life Customer Example
Let’s look at a concrete example from a customer. The scenario is about social media analytics: understanding how effective marketing campaigns are in certain regions and through which channels customers are reached. The analysis draws on more than 30 social media sources, whose results are transformed into a uniform, consumable format. The transformed data is then combined with corporate sales and master data in a dashboard running on SAP BW/4HANA.
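The core analytical step of this scenario, filtering and aggregating social media data and then putting it in context with enterprise sales data, can be illustrated in a few lines. The following is a minimal sketch in plain Python, not how SAP Data Hub or SAP BW/4HANA implement it (in the customer scenario this work happens in Hadoop and SAP BW/4HANA). The `Post` record, the functions `reach_by_region_and_channel` and `combine_with_sales`, and all sample values are hypothetical, chosen only to show the filter, aggregate, and join pattern.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """One social media record in a hypothetical uniform format."""
    channel: str   # social media source the post came from
    region: str
    campaign: str


def reach_by_region_and_channel(posts, campaign):
    """Filter posts to one campaign, then count them per (region, channel)."""
    return Counter((p.region, p.channel) for p in posts if p.campaign == campaign)


def combine_with_sales(reach, sales_by_region):
    """Join the derived counts with enterprise sales figures per region.

    Regions without sales data get None, similar to a left outer join.
    """
    return [
        {"region": region, "channel": channel, "posts": count,
         "sales": sales_by_region.get(region)}
        for (region, channel), count in sorted(reach.items())
    ]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    posts = [
        Post("twitter", "EMEA", "spring_launch"),
        Post("twitter", "EMEA", "spring_launch"),
        Post("instagram", "APJ", "spring_launch"),
        Post("twitter", "APJ", "winter_sale"),
    ]
    sales = {"EMEA": 120000.0, "APJ": 80000.0}
    reach = reach_by_region_and_channel(posts, "spring_launch")
    for row in combine_with_sales(reach, sales):
        print(row)
```

In the real scenario, the aggregation would run on the big data side, and only the much smaller result set would be loaded into SAP BW/4HANA for the join with sales and master data.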
Technically, such a scenario is represented as an SAP Data Hub data workflow. Data workflows orchestrate processes across the data landscape, for example executing SAP Data Hub data pipelines or triggering SAP BW process chains and SAP Data Services jobs.
Such an SAP Data Hub data workflow can consist of several tasks. Tasks are automatic operations that you can execute, control, and monitor based on certain user-defined conditions. SAP Data Hub supports different task types, for example:
|Task Type|Description|
|---|---|
|SAP BW Process Chain|Executes an SAP BW process chain in an SAP BW system.|
|Data Pipeline|Executes a data pipeline in an SAP Vora system.|
|File Operation Task|Performs file operations, such as copy and delete, on data sets.|
|Flowgraph|Executes a flowgraph in a Hadoop system or in an SAP Vora system.|
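To make the idea of a workflow built from tasks more concrete, here is a minimal, hypothetical sketch in Python. It is not the SAP Data Hub API; data workflows are modeled in SAP Data Hub itself. The `Task` and `Workflow` classes are invented for illustration, and the sketch assumes the simplest possible user-defined condition: each task starts only if its predecessor completed successfully.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    """Hypothetical workflow task: a name plus an action that reports success."""
    name: str
    action: Callable[[], bool]


@dataclass
class Workflow:
    """Runs tasks in order; each task starts only if its predecessor succeeded."""
    tasks: List[Task]
    log: List[str] = field(default_factory=list)

    def run(self) -> bool:
        for task in self.tasks:
            ok = task.action()
            self.log.append(f"{task.name}: {'completed' if ok else 'failed'}")
            if not ok:
                # Condition not met: skip all downstream tasks.
                return False
        return True


if __name__ == "__main__":
    # Placeholder actions standing in for the systems in the customer scenario.
    wf = Workflow([
        Task("Copy files from AWS S3 to Hadoop", lambda: True),
        Task("Derive information in Hadoop", lambda: True),
        Task("Store results in SAP Vora tables", lambda: True),
        Task("Load data into SAP BW/4HANA via process chain", lambda: True),
    ])
    print(wf.run())
    print("\n".join(wf.log))
```

Stopping at the first failure mirrors why orchestration matters here: an SAP BW process chain should not load stale or incomplete data if an upstream step, such as the file copy, did not finish.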
The illustration below shows an SAP Data Hub data workflow like the one in the customer scenario above: first, data files are copied from an AWS S3 bucket into a Hadoop cluster, where the relevant information is derived from the data. This information is then stored in SAP Vora tables, from which an SAP BW process chain loads it into SAP BW/4HANA:
The following video from the SAP Data Hub YouTube channel shows how such a scenario can be implemented:
In the second part of this blog, we will cover two additional scenarios:
- How to transfer data from SAP BW to SAP Vora using an SAP BW data transfer task
- How to trigger an SAP Data Hub data workflow from an SAP BW process chain