
 📃 Introduction 

Today, developers and data experts can choose from an impressive number of public datasets for a wide range of purposes.

These datasets are, of course, also a perfect starting point for building demos and prototypes!

In this post I will describe an analytics demo, based on a public dataset, which is built with the following components:

  • SAP Data Hub
    • Metadata Explorer for data exploration
    • Pipelines for data ingestion
    • Vora for persistency and SQL access
  • SAP HANA
    • Virtualization layer for integration with business data
  • SAP Analytics Cloud
    • Visualizations

The public dataset used for this demo is the Deutsche Börse Public Dataset, provided via AWS S3 (Link).

From an architecture-pattern perspective, this public dataset can represent a generic data lake based on cloud object storage.

 

📼 Demo video

 

📐 Demo Description

The storyline of the demo consists of these main building blocks:

Explore

The SAP Data Hub Metadata Explorer will be used to graphically browse the content of the S3-bucket.

Manage & Persist

To read the data from the S3 bucket and persist it in SAP Data Hub Vora, a basic Data Hub pipeline is used.

Combine with Business Data

In the demo a HANA system contains business data, or more precisely, additional master data for analysis purposes.

In addition to combining the trade data from S3 with business data, the calculation view in this scenario is used to virtually access the data stored in Vora.

From an architecture perspective, HANA can be used as a virtualization layer to combine multiple distributed big-data engines with in-memory processing capabilities.

One reason for considering SAP HANA as the virtualization layer is the potential reuse of existing SAP data models and authorizations by generating them as HANA views (Link).

In this scenario, SAP BW master data (InfoObjects) and the corresponding analysis authorizations could be exported from a BW/4HANA system and virtually combined with data from S3 using HANA views.

Visualize

Last but not least, it is always a pleasure to build some beautiful visualizations with SAP Analytics Cloud.

The SAC HANA live data connection (Link) is also an important building block of this demo.

“Some benefits of live data connection are:

  • No data replication and prevents transfer of large datasets from source systems
  • Automatically updated with current data – “live” data
  • Create complex models and calculation in source systems and leverage them within SAC
  • Sensitive data can stay in local network, behind your firewall”

🛠  Implementation

1. SAP Data Hub Metadata Explorer

Maintain the S3-Bucket connection in “Connection Management”

First, the S3 bucket connection needs to be maintained in the Data Hub Connection Management.

Relevant parameters are:

  1. Custom endpoint = “s3.eu-central-1.amazonaws.com”
  2. Region = “eu-central-1”
  3. Root Path = “/deutsche-boerse-xetra-pds”

The AWS access key and secret key have to be maintained; leaving them empty does not work for this demo.
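For reference, the connection parameters above can be captured as a small configuration object. This is only an illustrative sketch; the field names do not claim to match the exact Connection Management schema, and the key placeholders are intentionally left unfilled:

```javascript
// Illustrative connection settings for the Deutsche Börse public dataset.
// Field names are NOT the exact Data Hub Connection Management schema.
function s3Endpoint(region) {
  // AWS regional S3 endpoints follow the pattern s3.<region>.amazonaws.com
  return "s3." + region + ".amazonaws.com";
}

const connection = {
  region: "eu-central-1",
  endpoint: s3Endpoint("eu-central-1"),
  rootPath: "/deutsche-boerse-xetra-pds",
  // Credentials must be supplied; empty keys do not work for this demo.
  accessKey: "<AWS_ACCESS_KEY>",
  secretKey: "<AWS_SECRET_KEY>",
};
```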

Picture: Connection to S3-Bucket

 

Browse content of S3 bucket

After maintaining the S3 connection, the data is available for exploration. The first findings:

One file folder per day

Picture: Folder structure S3 bucket

Each folder contains CSV files:

Picture: CSV files in folder of S3 bucket

 

The data types and content of the CSV files:

Picture: Columns and data type description of CSV file with trading data

Picture: Content and data distribution in the CSV file.
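To get a feeling for the file format, one row of such a CSV file can be parsed into a typed record. A minimal sketch, assuming column names as they appear in the Metadata Explorer (the header is shortened and the sample row is hypothetical):

```javascript
// Minimal parser for one line of a Xetra trade CSV file.
// Price and volume columns are converted to numbers; everything else stays a string.
const NUMERIC = ["StartPrice", "MaxPrice", "MinPrice", "EndPrice", "TradedVolume", "NumberOfTrades"];

function parseTradeRow(headerLine, dataLine) {
  const header = headerLine.split(",");
  const values = dataLine.split(",");
  const row = {};
  header.forEach((name, i) => {
    row[name] = NUMERIC.includes(name) ? Number(values[i]) : values[i];
  });
  return row;
}

// Shortened header and a hypothetical sample row for illustration:
const header = "ISIN,Mnemonic,SecurityDesc,Date,Time,StartPrice,EndPrice,TradedVolume";
const row = parseTradeRow(header, "DE0007164600,SAP,SAP SE,2018-11-15,08:00,92.10,92.35,12500");
```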

 

After exploring the content of the S3 bucket, building the DH pipeline is the next step.

2. SAP Data Hub Pipeline

The pipeline consists of these main elements:

S3-Consumer 

This operator is used to access an S3 instance to read a file or periodically poll a directory for its contents.

Java-Script Operator

The JavaScript Operator allows for execution of JavaScript snippets within a graph.

In this example the JavaScript snippet iterates over the content of the folder, i.e. it passes the file names one by one to the next S3-Consumer.
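The core of that snippet — turning a folder listing into individual file names for the next S3-Consumer — can be sketched as plain JavaScript. The Data Hub operator wiring (port callbacks, message handling) is omitted, and the listing below is a hypothetical example of one day's folder:

```javascript
// Extract the CSV file names from a newline-separated folder listing,
// so each name can be sent downstream to the next S3-Consumer.
function extractFileNames(listing) {
  return listing
    .split("\n")
    .map((entry) => entry.trim())
    .filter((entry) => entry.endsWith(".csv"));
}

// Hypothetical listing of one day's folder in the bucket:
const files = extractFileNames(
  "2018-11-15/\n2018-11-15/2018-11-15_BINS_XETR08.csv\n2018-11-15/2018-11-15_BINS_XETR09.csv"
);
// files now holds one entry per CSV file, ready to be passed on
```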

Vora Avro Ingestor 

This operator allows you to dynamically ingest data into SAP Vora based on incoming Avro messages or other text messages in CSV and JSON format.

Picture: DH Pipeline S3 to Vora Avro Ingestor

 

3. SAP Data Hub Vora

After running the pipeline, the trading data for the selected day is saved in a Vora disk-based streaming table.

SAP HANA Academy – Vora 2.0: Overview

Streaming tables support SQL statements like INSERT, UPDATE, or DELETE.
They also persist their content in the distributed log (DLog).

This enables the cluster to recover the data after restart or failure.

Picture: Trading data persisted in Vora streaming table

In addition to persisting the data in an updateable disk-based table, a Vora table can also be created directly on an S3 bucket:

SAP HANA Academy – Vora 2.0: Connecting to S3 Buckets

4. SAP HANA

As described, in this demo the HANA in-memory engine provides two relevant features:

  • Virtual access to the Vora tables without data replication

    Picture: Vora table connect to SAP HANA as remote source

  • Graphical modeling to combine the trading data with business data or master data.

In this demo additional information about the listed companies will be added for later SAC–visualizations.

Picture: Calculation View based on remote data combined with company information.

 

5. SAP Analytics Cloud (SAC)

For this demo the folder and trading data files of November 15th were loaded and visualized.

The first visualization is a basic time series or line chart per minute:

Picture: Trade volume line chart

 

 

The next chart compares the start and end price per minute of the SAP share:

 

And a basic analysis to identify outliers based on the trade volume and price change:

In the morning of November 15th, an international retail holding company had a major share of the overall trade volume:
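The underlying outlier logic — flag rows whose trade volume and relative price change both exceed some threshold — can be sketched like this. The thresholds and sample rows are purely illustrative, not the values used in the SAC story:

```javascript
// Flag rows whose traded volume AND relative price change exceed
// illustrative thresholds; mirrors the outlier analysis shown in SAC.
function findOutliers(rows, minVolume, minChange) {
  return rows.filter((r) => {
    const change = Math.abs(r.EndPrice - r.StartPrice) / r.StartPrice;
    return r.TradedVolume >= minVolume && change >= minChange;
  });
}

// Hypothetical per-minute rows:
const rows = [
  { Mnemonic: "SAP", StartPrice: 92.1, EndPrice: 92.35, TradedVolume: 12500 },
  { Mnemonic: "XYZ", StartPrice: 10.0, EndPrice: 11.5, TradedVolume: 900000 },
];
const outliers = findOutliers(rows, 100000, 0.05);
```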

 

💡 Conclusion

The intention of this blog and demo is to show how plug-and-play-like integration between several SAP analytics architecture components works in practice, based on a public dataset.

What is important to emphasize from my perspective:

  • No downloads or installations are required; this demo was built using only the web browser as IDE.
  • Except for a few lines of JavaScript, the demo is implemented graphically.
  • The DH pipelines are a great tool to visually model dataflows that combine technologies like Kafka, SAP systems, or cloud-based object storages.
  • Vora, in combination with SAP HANA, then makes the data accessible for users with a background in relational databases or SQL.
  • Finally, the relevant information is made available to a larger audience using SAP Analytics Cloud (SAC).

Many thanks for reading this blog till here!  👍
