
SAP Data Hub: The Strategic Foundation for Real-Time Big Data Architectures

SAP Data Hub is an all-in-one data orchestration solution that discovers, refines, enriches, and governs any type, variety, and volume of data across your entire distributed data landscape. It supports your intelligent enterprise by rapidly delivering trustworthy data to the right users with the right context at the right time.

The latest release of SAP Data Hub brings some great new features. One of the newest is the Streaming Analytics Operator (Beta). The streaming analytics operator represents a “Streaming Project”: within the operator, input and output adapters can be added with Continuous Computation Language (CCL). In standard Data Hub fashion, you can also define input and output ports on the Streaming Analytics Operator.
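As a rough illustration of what lives inside such a streaming project, a minimal CCL fragment might declare an input stream and a filtered output stream. This is a sketch only; the stream and column names are invented for this example:

```
// Sketch of CCL inside a streaming project; names are invented.
CREATE INPUT STREAM SensorIn
    SCHEMA (SensorId string, EventName string, EventValue string);

CREATE OUTPUT STREAM DoorEvents
    AS SELECT s.SensorId, s.EventName
    FROM SensorIn s
    WHERE s.EventName = 'DOOR_OPEN' OR s.EventName = 'DOOR_CLOSE';
```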

In this blog I’ll focus on utilizing the Streaming Analytics operator to support the “speed” layer in a big data architecture such as Lambda or Kappa. We will also have a look at the openness of SAP Data Hub and its ability to distribute data across the entire data landscape via custom-built operators.
Lambda Architecture:
Figure 3, page 10, of the SAP CIO Guide to Using SAP Technology for Big Data (SAP)
What does the Streaming Analytics built-in operator look like?
The operator has several configuration settings:

In addition to the operator, there are several Streaming Analytics related graphs:
The graphs are useful for getting a better understanding of how to utilize Streaming Analytics projects within Data Hub graphs. Details on each of the supplied graphs can be found in the official documentation.
As a baseline for this blog we will work with the “Freezer Monitor” graph. This graph simulates various events generated by sensors in a freezer. The events captured are:
  • Door Open
  • Door Close
  • Power On
  • Power Off
  • Inside Temperature

Some of these events are extremely important to act on in real time, such as prolonged power outages or temperatures that rise above a set threshold. Other events need to be evaluated and acted on over time, such as continuous fluctuations in temperature, whether to understand buying patterns or to support predictive maintenance.

Let’s take a look at the predefined Freezer Monitor Graph:

  1. Data Generator:
    • Simulates different device events.  In a real-world scenario this could potentially be connected to a streaming queue like Kafka.
  2. Sensor Input:
    • Create a continuous input stream connected to the data generator.
    • Connect to SAP HANA to pull reference data that can help bring context to the sensors, this includes things like Machine type and location.
    • Join the incoming event data with the reference data.
  3. 1:4 Multiplexer:
    • Split the incoming stream into four outputs for unique processing, monitoring and alerting.
  4. Power Alarms:
    • Monitor the power on and off events. If the power has been off for more than 20 seconds, raise an alarm.
  5. Power Outages:
    • Monitor the power off events. Track how long the device has been off. Save the results to a HANA table for analysis, action and monitoring.
  6. Dashboard:
    • Record the Door open and close events to a HANA table for analysis, action and monitoring.
    • Maintain the current temperature, power state and max temperature of each device in a window for real-time consumption by a dashboard.
  7. Temperature Moving Average:
    • Send the moving average of each device’s temperature to a window at 30-second intervals.
  8. JS string Operator:
    • Apply string conversion functions to support the dashboard output.
  9. Temperature Alarms:
    • Send an alert if the moving temperature average is greater than the max temperature for the device. The max temperature was retrieved as part of the reference data in step 2.
  10. Histogram Plotter:
    • Plot freezer events on a histogram in real time as they stream in.
    • Histogram Plotter example output:
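To make the speed-layer logic concrete, here is a small standalone Python sketch of the per-device processing in steps 7 and 9: a sliding 30-second moving average of the temperature plus a threshold alarm. All names are illustrative; inside Data Hub this logic lives in the streaming project itself.

```python
from collections import deque

class DeviceMonitor:
    """Tracks one device's temperature readings over a sliding time window."""

    def __init__(self, max_temp, window_sec=30):
        self.max_temp = max_temp      # per-device max from the HANA reference data (step 2)
        self.window_sec = window_sec  # 30-second interval, as in step 7
        self.readings = deque()       # (timestamp, temperature) pairs

    def add_reading(self, timestamp, temp):
        self.readings.append((timestamp, temp))
        # Evict readings that have fallen out of the sliding window.
        while self.readings and timestamp - self.readings[0][0] > self.window_sec:
            self.readings.popleft()

    def moving_average(self):
        if not self.readings:
            return None
        return sum(t for _, t in self.readings) / len(self.readings)

    def alarm(self):
        # Step 9: alert when the moving average exceeds the device's max temperature.
        avg = self.moving_average()
        return avg is not None and avg > self.max_temp
```

For example, a freezer with a max temperature of -10 would stay quiet while readings hover around -20, and raise an alarm once warm readings pull the 30-second average above -10.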

At this point we have:

  • Enabled real-time ingestion of sensor events coming from various devices.
  • Contextualized the raw sensor data with reference data.
  • Built alerts to monitor the usage and performance of the devices.

From a business perspective, tracking cold storage enables many benefits, including but not limited to:

  • Helping both private and public entities, like schools, restaurants and grocery chains, ensure high-quality food management in the areas of food safety and quality.
  • Ensuring quality transportation of perishable foods by monitoring the entire supply chain from raw product to shelves.
  • Creating a high-quality care environment in the healthcare industry by preventing spoilage of everything from blood and diagnostic samples to medications and vaccines.

So far we have recorded summary-level data and key KPIs in HANA for future reference and reporting. What if we want to store every single event in its raw format for future analytics and analysis? We might not know exactly what we want to do with the data, but we don’t want to lose it. In some cases it might make sense to store it in a data lake.

One of the key features of SAP Data Hub is its ability to work with all data across a complex distributed landscape involving both SAP and non-SAP data and solutions. As part of supporting a Lambda or Kappa architecture, a batch layer or data lake may be required. Data Hub provides many predefined connection types, and some of them, like Azure Data Lake, Google Cloud Storage, HDFS, S3 and Vora, could satisfy the batch layer or raw data storage layers of a big data architecture. Depending on the use case, some might be more suitable than others (we won’t get into that in this blog). In some cases, however, you might need to work with an environment that does not yet have a predefined Data Hub connection type or operator. Thanks to the openness of Data Hub, this should not be a problem.

We will extend our Freezer Monitor demo to utilize Snowflake for storing raw sensor data. Since we don’t have a predefined connection type, we will create a new Docker image that installs the Snowflake Python connector. We will then create a custom operator that can accept the raw sensor data. Finally, we will add a 1:2 multiplexer to the Freezer Monitor graph to send the raw sensor data to Snowflake.

Step 1: Create a Docker image that installs the Snowflake Python connector:
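The original Dockerfile isn’t reproduced here, but a minimal sketch might look like the following. The base image and versions are assumptions; any image with Python 3 and pip available will do:

```dockerfile
# Minimal sketch; base image and versions are assumptions.
FROM python:3.6-slim

# Install the Snowflake Python connector so custom operators can use it.
RUN pip install snowflake-connector-python
```

Remember to add a tag (for example, `snowflake`) to the Docker image so that operators can be associated with it.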

Step 2: Create a new custom operator. Custom operators have five main areas:

  • Ports: Define the required input and output ports. In this case the input port will accept the raw freezer events.
  • Tags: Associate the required tags for the custom operator (reference to tags defined in the custom docker file).
  • Configuration: Create custom configuration parameters that can be referenced in the script.  This will help ensure your operator is reusable and generic.  For this basic example I have created a few basic parameters to pass to the Snowflake connector.
  • Script: The script defines the processing logic of the operator when it is invoked. This example will connect to Snowflake and insert data into the table that was defined in the configuration parameters.
  • Documentation:  Provide documentation that can support other developers in understanding how to use your operator.
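Here is an illustrative sketch of what the Script area could contain. The configuration parameter names (`api.config.*`) and the port name `input1` are assumptions specific to this example; the `api` object is injected by the Data Hub runtime:

```python
# Illustrative Script body for the custom Snowflake operator.
# Parameter and port names are assumptions; `api` is injected by Data Hub.

def build_insert(table, row):
    """Build a parameterized INSERT statement for one raw sensor event."""
    placeholders = ", ".join(["%s"] * len(row))
    return "INSERT INTO {} VALUES ({})".format(table, placeholders)

def on_input(data):
    # Imported here because the connector is only present at runtime,
    # inside the custom Docker image from step 1.
    import snowflake.connector

    row = data.split(",")  # each raw freezer event arrives as one CSV line
    conn = snowflake.connector.connect(
        account=api.config.account,
        user=api.config.user,
        password=api.config.password,
        database=api.config.database,
        schema=api.config.dbschema,
    )
    try:
        conn.cursor().execute(build_insert(api.config.table, row), row)
    finally:
        conn.close()

# `api` only exists inside the Data Hub runtime, so guard for local testing.
if "api" in globals():
    api.set_port_callback("input1", on_input)
```

Keeping the connection details in configuration parameters, rather than hard-coding them in the script, is what makes the operator reusable across graphs.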

Step 3: Add the 1:2 Multiplexer operator and our new custom Snowflake operator to the graph:
  1. 1:2 Multiplexer
    • Split the incoming raw sensor data into two streams: one to service the speed layer and one to service the batch or data lake layer.
  2. Snowflake:
    • Use our custom Snowflake operator to insert the raw data.

After executing our updated graph, we can now see the raw data inserted into Snowflake:
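If you want to double-check from the Snowflake side, a simple query in a Snowflake worksheet does the trick (`RAW_EVENTS` is an assumed table name; use whatever table your operator configuration points at):

```
-- RAW_EVENTS is an assumed table name.
SELECT COUNT(*) FROM RAW_EVENTS;
SELECT * FROM RAW_EVENTS LIMIT 10;
```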

Next Blog: We will look at the “serving” layer and how to bring the real-time and batch data together.

 
