SAP Data Hub: The Strategic Foundation for Real-Time Big Data Architectures
Page 10 Figure 3 in SAP CIO Guide to Using SAP Technology for Big Data by SAP
- Door Open
- Door Close
- Power On
- Power Off
- Inside Temperature
Some of these events, such as prolonged power outages or temperatures that fall below a set threshold, must be acted on in real time. Others, such as continuous fluctuations in temperature, need to be evaluated and acted on over time, for example to understand buying patterns or to support predictive maintenance.
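To make the real-time case concrete, here is a minimal Python sketch of a prolonged power outage check. It is not how Data Hub implements it; the `Event` fields and the `power_alarms` function are hypothetical names, and the 20-second threshold is borrowed from the Power Alarms step of the demo graph. The sketch uses the timestamp of the latest event as its clock, so an alarm is raised once per outage as soon as any later event shows that the device has been off for too long.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

POWER_OFF_ALARM_SECONDS = 20  # threshold used by the demo's Power Alarms step


@dataclass
class Event:
    # Hypothetical field names; the real event layout is not shown in this post.
    device_id: str
    kind: str         # e.g. "Power On", "Power Off", "Door Open"
    timestamp: float  # seconds


def power_alarms(events: Iterable[Event]) -> List[Tuple[str, float]]:
    """Return (device_id, seconds_off) once per outage that exceeds the threshold."""
    off_since: Dict[str, float] = {}  # device -> time of its Power Off event
    alarmed: Set[str] = set()         # devices already alarmed for this outage
    alarms: List[Tuple[str, float]] = []
    for ev in events:
        if ev.kind == "Power Off":
            off_since.setdefault(ev.device_id, ev.timestamp)
        elif ev.kind == "Power On":
            off_since.pop(ev.device_id, None)
            alarmed.discard(ev.device_id)
        # Treat the latest event time as "now" and check every device still off.
        for device, since in off_since.items():
            seconds_off = ev.timestamp - since
            if device not in alarmed and seconds_off > POWER_OFF_ALARM_SECONDS:
                alarmed.add(device)
                alarms.append((device, seconds_off))
    return alarms
```

A real streaming engine would normally fire a timer instead of waiting for the next event; the event-time clock is used here only to keep the example self-contained.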
Let’s take a look at the predefined Freezer Monitor Graph:
- Data Generator:
- Simulates different device events. In a real-world scenario this could potentially be connected to a streaming queue like Kafka.
- Sensor Input:
- Create a continuous input stream connected to the data generator.
- Connect to SAP HANA to pull reference data that brings context to the sensor readings, such as machine type and location.
- Join the incoming event data with the reference data.
- 1:4 Multiplexer:
- Split the incoming stream into four outputs for unique processing, monitoring and alerting.
- Power Alarms:
- Monitor the Power on and off events. If the power has been off for more than 20 seconds, raise an alarm.
- Power Outages:
- Monitor the Power off events. Track how long the device has been off for. Save the results to a HANA table for analysis, action and monitoring.
- Record the Door open and close events to a HANA table for analysis, action and monitoring.
- Maintain the current temperature, power state and max temperature of each device in a window for consumption by a dashboard in real-time.
- Temperature Moving Average:
- Send the moving average of the temperature of each device to a window at 30-second intervals.
- JS string Operator:
- Apply string conversion functions to support the dashboard output.
- Temperature Alarms:
- Send an alert if the moving temperature average exceeds the max temperature for the device. The max temperature was retrieved as part of the reference data in step 2.
- Histogram Plotter:
- Plot Freezer events in real-time as they stream in on a histogram.
- Histogram Plotter example output:
At this point we have:
- Enabled real-time ingestion of sensor events coming from various devices.
- Contextualized the raw sensor data with reference data.
- Built alerts to monitor the usage and performance of the devices.
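The contextualization and temperature alert steps of the graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `REFERENCE` dictionary stands in for the reference data pulled from SAP HANA, its field names and values are made up, and the moving average is assumed to be a 30-second sliding window per device, which may differ from how the demo graph computes it.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

WINDOW_SECONDS = 30  # assumed sliding-window length for the moving average

# Hypothetical reference data; in the demo this is read from SAP HANA.
REFERENCE = {
    "freezer-1": {"machine_type": "upright", "location": "Store 12", "max_temp": -15.0},
}

Reading = Tuple[str, float, float]  # (device_id, timestamp_seconds, temperature)


def temperature_alarms(
    readings: Iterable[Reading],
    reference: Dict[str, dict],
    window_seconds: float = WINDOW_SECONDS,
) -> Iterator[dict]:
    """Yield an enriched alert whenever a device's moving average exceeds its max_temp."""
    windows: Dict[str, Deque[Tuple[float, float]]] = defaultdict(deque)
    for device_id, ts, temp in readings:
        ref = reference[device_id]  # join the event with its reference data
        win = windows[device_id]
        win.append((ts, temp))
        # Drop readings that have fallen out of the sliding window.
        while win[0][0] <= ts - window_seconds:
            win.popleft()
        moving_avg = sum(t for _, t in win) / len(win)
        if moving_avg > ref["max_temp"]:
            yield {
                "device_id": device_id,
                "location": ref["location"],
                "timestamp": ts,
                "moving_avg": moving_avg,
                "max_temp": ref["max_temp"],
            }
```

Keeping one small window per device mirrors the idea of maintaining per-device state in the stream, so a single warm reading does not trigger an alert but a sustained rise does.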
From a business perspective tracking cold storage enables many business benefits including but not limited to:
- Helping both private and public entities like schools, restaurants and grocery chains ensure food safety and quality.
- Ensuring quality transportation of perishable foods by monitoring the entire supply chain from raw product to shelves.
- Creating a high-quality care environment in the healthcare industry by preventing spoilage of everything from blood and diagnostic samples to medications and vaccines.
So far we have recorded summary-level data and key KPIs in HANA for future reference and reporting. What if we want to store every single event in its raw format for future analytics and analysis? We might not know exactly what we want to do with the data but don’t want to lose it. In some cases, it might make sense to store this in a data lake.
One of the key features of SAP Data Hub is its ability to work with all data across a complex distributed landscape involving both SAP and non-SAP data and solutions. A Lambda or Kappa architecture may require a batch layer or data lake, and Data Hub provides many predefined connection types for this. Some of them, such as Azure Data Lake, Google Cloud Storage, HDFS, S3 and Vora, could serve as the batch layer or raw data storage layer of a big data architecture. Depending on the use case, some might be more suitable than others (I won’t get into that in this blog).
In some cases, however, you might need to work with an environment that does not yet have a predefined Data Hub connection type or operator. Thanks to the openness of Data Hub, this should not be a problem. We will extend our Freezer Monitor demo to use Snowflake for storing raw sensor data. Since there is no predefined connection type, we will create a new Docker image that uses the Snowflake Python connector. We will then create a custom operator that accepts the raw sensor data. Finally, we will add a 1:2 multiplexer to the Freezer Monitor graph to send the raw sensor data to Snowflake.
Step 1: Create a Docker image that installs the Snowflake Python connector:
Step 2: Create a new custom operator. Custom operators have 5 main areas:
- Ports: Define the required input and output ports. In this case the input port will accept the raw freezer events.
- Tags: Associate the required tags with the custom operator (these reference the tags defined for the custom Dockerfile).
- Configuration: Create custom configuration parameters that can be referenced in the script. This will help ensure your operator is reusable and generic. For this basic example I have created a few basic parameters to pass to the Snowflake connector.
- Script: The script defines the processing logic of the operator when it is invoked. This example connects to Snowflake and inserts data into the table defined in the configuration parameters.
- Documentation: Provide documentation that can support other developers in understanding how to use your operator.
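As a rough idea of what the Script area could contain, here is a minimal Python sketch. It is not the script used in this demo. It assumes the raw events arrive as comma-separated lines, and the input port name `input` and the configuration parameter names (`user`, `password`, `account`, `database`, `schema`, `table`) are hypothetical; use whatever you defined under Ports and Configuration. The insert logic is kept in a plain function that works with any Python DB-API connection so it can be tried outside Data Hub.

```python
from typing import Tuple


def parse_event(line: str) -> Tuple[str, ...]:
    """Split one raw comma-separated event line into column values (assumed format)."""
    return tuple(field.strip() for field in line.strip().split(","))


def insert_events(conn, table: str, raw: str, placeholder: str = "%s") -> int:
    """Insert every non-empty line of `raw` into `table`; return the row count.

    `conn` is any DB-API connection. "%s" matches the Snowflake connector's
    default parameter style. `table` comes from trusted operator configuration,
    never from the event payload, because it is interpolated into the SQL text.
    """
    rows = [parse_event(line) for line in raw.splitlines() if line.strip()]
    if not rows:
        return 0
    marks = ", ".join([placeholder] * len(rows[0]))
    cur = conn.cursor()
    try:
        cur.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    finally:
        cur.close()
    return len(rows)


def register(api) -> None:
    """Wire the helper into a Data Hub Python operator; `api` is injected by Data Hub."""
    import snowflake.connector  # installed by the custom Docker image from Step 1

    conn = snowflake.connector.connect(
        user=api.config.user,          # hypothetical configuration parameters
        password=api.config.password,
        account=api.config.account,
        database=api.config.database,
        schema=api.config.schema,
    )

    def on_input(data):
        insert_events(conn, api.config.table, data)

    api.set_port_callback("input", on_input)  # "input" is an assumed port name


# Inside the operator script, finish with:
# register(api)
```

Separating `insert_events` from the Data Hub wiring keeps the operator generic: the same function can be pointed at a different table or account purely through configuration.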
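Step 3: Add a 1:2 multiplexer to the Freezer Monitor graph to send the raw sensor data to Snowflake:
- 1:2 Multiplexer: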
- Split the incoming raw sensor data into two streams: one to service the speed layer and one to service the batch or data lake layer.
- Use our custom Snowflake operator to insert the raw data.
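Conceptually, the 1:2 multiplexer just hands every incoming message to both downstream branches. The following Python sketch illustrates that fan-out with plain lists standing in for the speed layer and the Snowflake branch; the `multiplex` function and the sample events are hypothetical and are not part of the Data Hub operator.

```python
from typing import Callable, Iterable, List

Sink = Callable[[str], None]


def multiplex(events: Iterable[str], sinks: List[Sink]) -> int:
    """Forward each event unchanged to every sink; return the number of events seen."""
    count = 0
    for event in events:
        for sink in sinks:
            sink(event)
        count += 1
    return count


# Two lists stand in for the speed layer and the batch (Snowflake) layer.
speed_layer: List[str] = []
batch_layer: List[str] = []
multiplex(["f1,Power Off", "f1,Power On"], [speed_layer.append, batch_layer.append])
```

Because both branches receive the same unmodified events, the real-time alerts keep working exactly as before while the raw history accumulates separately.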
After executing our updated graph, we can now see the raw data inserted into Snowflake: