Once you have read my colleague Marc Hartz’s blog on SAP Data Hub 2.3, you may want to explore the options for learning SAP Data Hub 2.3 in depth. This blog summarizes the ways you can learn SAP Data Hub 2.3 as a customer, a partner, or a developer. The options are:
- Watching videos at SAP Data Hub Channel
- Attending SAP Data Hub hands-on sessions at TechEd
- Arranging SAP Data Hub Technical Academy session(s) by contacting your account team
- Starting a fully functional Data Hub 2.3 trial instance on Google Cloud Platform
- Using Data Hub Developer Edition to build your pipeline
- Registering for SAP Data Hub Webinars
1. SAP Data Hub Channel
We publish videos showing the capabilities of SAP Data Hub, including SAP Data Hub 2.3, at SAP Data Hub Channel. We intend to publish additional videos showing new capabilities and end-to-end customer stories. SAP Data Hub 2.3 functionalities are published here.
In the first video of this series, Metadata Explorer in SAP Data Hub 2.3 is demonstrated. It highlights metadata extraction, indexing, publishing, searching, labeling and monitoring within Metadata Explorer.
SAP Data Hub 2.3 connects natively to various enterprise, big data, and cloud sources, including:
- Relational databases (Oracle, etc.) and enterprise applications (e.g. SAP S/4HANA, SAP BW/4HANA)
- Popular cloud storage platforms such as WASB, S3, and GCS
- Open protocols such as OData and OpenAPI
- Cleansing and enrichment services via integration of SAP Data Quality Management microservices (DQMm) for location data
- Machine learning services such as SAP Machine Learning Foundation Services
- Third-party services and technologies such as Spark, Livy, and Google Pub/Sub
The following video shows a sample connection to HDFS via Connection Manager in SAP Data Hub 2.3. It also describes how to connect to a supported source in SAP Data Hub 2.3. A connection created once by an admin user may be reused by multiple users for Metadata Explorer, modeling, and so on.
Once you connect a source to SAP Data Hub, you may wish to browse and discover its datasets, or to learn more about the quality of a dataset, which profiling in SAP Data Hub provides. The following video demonstrates browsing, profiling, and discovery in SAP Data Hub 2.3.
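To make "profiling" a little more concrete, here is a minimal sketch in plain Python of the kind of per-column statistics a profiler typically reports. It is not SAP Data Hub code and does not use any SAP Data Hub API; the function name `profile_column` and the sample data are hypothetical and only illustrate the idea.

```python
from collections import Counter

def profile_column(values):
    """Return simple quality statistics for one column of a dataset."""
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }

# Hypothetical sample column with one missing value and one duplicate.
cities = ["Berlin", "Paris", None, "Berlin", "Madrid"]
stats = profile_column(cities)
print(stats)
```

Statistics like the null count and the number of distinct values are what let you judge at a glance whether a dataset is complete enough to use.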
As a data steward or owner of dataset(s), you may need to extract metadata, review it periodically, and publish it to a catalog that end users (data scientists, business analysts, BI developers) can use to search, discover, and explore datasets. The following video shows how indexing and publishing are performed in SAP Data Hub.
After you discover your dataset and gain insight into its quality, you may need to enhance your discovered dataset(s). If you are a business analyst, you will most likely prefer a spreadsheet-like UI to enhance or prepare your dataset(s). SAP Agile Data Preparation provides this functionality. It is used here to create data preparation steps in an easy-to-use UI, and these steps can eventually be applied to a larger dataset that may reside in your Hadoop data lake. The following video shows the steps to prepare or enhance your dataset and process it natively in your data lake.
However, if you are a technical user familiar with ETL tools, you may prefer to use SAP Data Hub Modeler to create a pipeline (or graph) to enhance your dataset(s). The following video demonstrates how to achieve the same data enhancement with structured transforms using SAP Data Hub Modeler.
Once you have a pipeline, you can execute it with the additional Workflow Trigger and Workflow Terminator operators, as shown below.
Now that you have some understanding of the end-to-end flow, from connection management and Metadata Explorer through self-service data preparation with SAP Agile Data Preparation to building a pipeline with Modeler, you may be ready to explore how SAP Data Hub Modeler can create advanced pipelines to process unstructured data and perform various advanced processing tasks.
The following four videos show how to model data ingestion pipelines with SAP Data Hub 2.3. The first video shows how to read HDFS files with the Read File operator and how to use the Wiretap operator in a pipeline.
SAP Data Hub Modeler supports complex data processing with many out-of-the-box operators, including the Python 2 operator. The following example shows how readily available Python libraries can be consumed with the Python 2 operator to create a pipeline in SAP Data Hub 2.3.
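As a rough illustration of what such an operator script can look like, here is a minimal sketch that keeps the processing logic in an ordinary function using only the standard `json` library. The function name `summarize_text` is hypothetical. The wiring shown in the trailing comments (an `api` object with `set_port_callback` and `send`, and ports named `input` and `output`) is an assumption about the operator runtime and should be checked against the operator documentation. The function itself runs under both Python 2 and Python 3.

```python
import json

def summarize_text(text):
    """Return word and character counts for a text as a JSON string."""
    words = text.split()
    return json.dumps({"words": len(words), "chars": len(text)}, sort_keys=True)

# Local check outside any pipeline.
result = summarize_text("SAP Data Hub")
print(result)  # prints {"chars": 12, "words": 3}

# Assumed wiring inside a Python operator (names are assumptions, not verified):
#
# def on_input(data):
#     api.send("output", summarize_text(data))
#
# api.set_port_callback("input", on_input)
```

Keeping the logic in a plain function makes it easy to test on your laptop before pasting it into an operator.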
Once data has been processed by the Python 2 operator, it can be loaded into Vora for further analysis. The following video shows how data processed by the Python 2 operator is loaded into Vora with the out-of-the-box Vora Avro operator.
Finally, the last video shows how an unstructured dataset can be combined in a pipeline and enhanced further for text analysis.
You can continue with the additional videos in the SAP Data Hub 2.x feature playlists, particularly those on creating new connections (including SAP HANA and SAP BW), scheduling, monitoring, and troubleshooting in SAP Data Hub.
2. SAP Data Hub Hands-on Sessions at SAP TechEd
The hands-on sessions offered at TechEd Las Vegas are also being offered in Barcelona and Bangalore. Please check out this blog for additional information. You can meet SAP Data Hub experts at these hands-on sessions.
3. Arranging an SAP Data Hub Technical Academy session by contacting your account team
SAP offers a one-day SAP Data Hub Technical Academy workshop at various SAP locations, inviting local customers and partners. SAP often offers similar sessions at customer locations. Please contact your account team to arrange such a session, where your team can learn SAP Data Hub 2.3 from SAP experts.
4. Starting a fully functional Data Hub 2.3 trial instance on Google Cloud Platform
If you are a customer or a partner, you can launch a fully functional SAP Data Hub 2.3 trial instance on Google Cloud Platform. SAP Cloud Appliance Library offers SAP Data Hub 2.3 on Google Cloud Platform. You may download the Getting Started document.
Please follow the step-by-step instructions in this tutorial to set up your trial instance on Google Cloud Platform. Once your instance is up and running, you may continue with the following tutorials:
- Discovery Part1 & Part2
- Workflow Part1, Part2, Part3 and Part4
- Pipeline Part1, Part2, Part3, Part4 and Part5
5. Using Data Hub Developer Edition to build your pipeline
SAP Data Hub 2.3 Developer Edition can be downloaded to your laptop to develop pipelines. Please check out this blog to start your SAP Data Hub Developer Edition journey.
6. Registering for SAP Data Hub Webinars
Please check out this blog for additional information on upcoming webinars and how to register.
Thank you for reading this far, and happy learning!