New in SAP HANA, express edition: Streaming Analytics
For everyone using SAP HANA, express edition (HXE), the great news with the latest update (HXE 2.0 SPS01) is that you can now include real-time streaming analytics in your HANA applications.
SAP HANA smart data streaming (SDS) is HANA’s high-speed, real-time streaming analytics engine. It lets you easily build and deploy streaming data models (a.k.a. projects) that process and analyze incoming messages as fast as they arrive, allowing you to react in real time to what’s going on. Use it to generate alerts or initiate an immediate response. With the Internet of Things (IoT), common uses include analyzing incoming sensor data from smart devices, particularly when there is a need to react in real time to a new opportunity or in anticipation of a problem.
Use Cases
Streaming Analytics can be applied in a wide variety of use cases – wherever there is fast moving data and value from understanding and acting on it as soon as things happen. Common use cases include:
- Predictive maintenance, predictive quality: detect indications of impending failure in time to take preventative action
- Marketing: customized offers in real-time, reacting to customer activity
- Fraud/threat detection/prevention: detect and flag patterns of events that indicate possible fraud or an active threat
- Location monitoring: detect when equipment/assets are not where they are supposed to be
Streaming data models
Streaming data models define the operations to apply to incoming messages and are contained in streaming projects that run on the SDS server. These models are defined in a SQL-like language that we call CCL (continuous computation language) – it’s really just SQL with some extensions for processing live streams. The big difference though, is that this SQL doesn’t execute against the HANA database, but gets compiled into a set of “continuous queries” that run in the SDS dataflow engine.
Here’s a simple example of a streaming data model that smooths out some sensor data by computing a five minute moving average:
CREATE INPUT STREAM DeviceIn
SCHEMA (Id string, Value integer);

CREATE OUTPUT WINDOW MvAvg
PRIMARY KEY DEDUCED
AS SELECT
    DeviceIn.Id AS Id,
    avg(DeviceIn.Value) AS AvgValue
FROM DeviceIn KEEP 5 MINUTES
GROUP BY DeviceIn.Id;
You can see that it looks pretty much like standard SQL, except that instead of creating tables we are creating streams and windows. With windows, we can define a retention policy – in this example, KEEP 5 MINUTES. And with a moving average we’re just getting started. Filtering events is as simple as a WHERE clause. You can join event streams to HANA tables to combine live events with reference data or historical data. You can also join events to each other. You can match/correlate events, and watch for patterns or trends. Anyway – you get the idea.
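If it helps to think procedurally, here's a rough Python sketch of what that moving-average window computes. It's only an illustration, not how SDS works internally: the class and method names are made up for this example, timestamps are assumed to be in seconds and to arrive in order, and old readings are only expired when a new reading arrives for the same device. In SDS, the CCL is compiled into a continuous query and the engine manages retention for you.

```python
from collections import defaultdict, deque


class MovingAverage:
    """Per-device average over a sliding time window (default: 5 minutes)."""

    def __init__(self, keep_seconds=300):
        self.keep_seconds = keep_seconds
        # device id -> deque of (timestamp, value), oldest first
        self.windows = defaultdict(deque)

    def on_event(self, device_id, value, timestamp):
        """Add one reading and return the updated average for that device."""
        window = self.windows[device_id]
        window.append((timestamp, value))
        # Drop readings that have aged out of the retention period.
        while window and window[0][0] <= timestamp - self.keep_seconds:
            window.popleft()
        return sum(v for _, v in window) / len(window)
```

Each call to on_event plays the role of one message arriving on DeviceIn, and the return value corresponds to the AvgValue column of the row for that Id in MvAvg.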
Capturing streaming data in the HANA database
Any of the data can be captured in the HANA database, and by capturing derived data rather than raw data, you can reduce the amount of data being stored. You can also sample the data or store it only when it changes.
If I wanted to store my moving average from the example above in a HANA table called MV_AVG, I would simply attach a HANA output adapter to the window above by adding this statement to my project:
ATTACH OUTPUT ADAPTER HANA_Output1 TYPE hana_out TO MvAvg
PROPERTIES
    service = 'hdb1',
    sourceSchema = 'MY_SCHEMA',
    table = 'MV_AVG';
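To make the two data-reduction ideas mentioned above concrete (sampling, and storing only on change), here is a minimal Python sketch. The function names and the (device_id, value) event shape are assumptions made for this illustration. In a real project you would express this logic in CCL upstream of the output adapter rather than in Python.

```python
def sample_every(events, n):
    """Yield every n-th event, starting with the first one."""
    for i, event in enumerate(events):
        if i % n == 0:
            yield event


def changed_only(events):
    """Yield (device_id, value) only when the value differs from the
    last value that was kept for that device."""
    last_kept = {}
    for device_id, value in events:
        if device_id not in last_kept or last_kept[device_id] != value:
            last_kept[device_id] = value
            yield device_id, value
```

Either generator can wrap any iterable of events, so a noisy sensor that mostly repeats the same reading produces far fewer rows to store.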
Connecting to data sources
SDS includes an integrated web service that can expose a REST interface for all input streams. High frequency publishers can use WebSockets for greater efficiency.
SDS also includes a range of pre-built adapters including Kafka, JMS, file loaders and others. An adapter toolkit (Java) makes it easy to build custom adapters.
Using Real-time output
In addition to the ability to capture output from streaming projects in HANA database tables, real-time output can also be streamed to applications and dashboards, published to a Kafka or JMS message queue, sent as email, or stored in Hadoop.
Machine Learning for Predictive Analytics on Streams
SDS includes two machine learning algorithms – a decision tree algorithm and a clustering algorithm – as well as the ability to import decision tree models built using the PAL algorithms in HANA. These are particularly useful for predictive use cases, enabling you to take action based on leading indicators or detecting unusual situations.
High speed, Scalable
The SDS dataflow engine is designed to be highly scalable with support for both scale-up and scale-out, providing the ability to process millions of messages per second (with sufficient CPU capacity) and delivering results within milliseconds of message arrival.
Design time tools
Design time tools for building and testing streaming projects are available as a plugin for Eclipse and are also available in SAP Web IDE for SAP HANA. Both include a syntax aware CCL editor plus testing tools including a stream viewer, record/playback and manual input tools. The Eclipse plugin also includes a visual drag-and-drop style model builder.
Try it Out
If you’re interested in taking it for a test spin, the easiest way to get started is to follow this hands-on tutorial that takes you through the steps of building a simple IoT project to monitor sensor data from freezer units.
More Information
There’s lots of material available to help you put streaming analytics to use:
- Visit the SAP HANA, express edition developer page
- Getting started tutorial: install SDS on your HXE server and build your first streaming project
- Taking it to the next level: a second tutorial series that builds from the getting started tutorial, adds more streaming analytics to your project and shows you how to publish input events via the REST interface
- Visit the SDS Developer Center for more tutorials and other resources
- The SDS Playlist in the SAP HANA Academy
- The full SDS documentation including a developers guide, reference guide and guide to adapters
Hi Jeff ,
Thanks for the article. Streaming as part of HANA Express works very well and opens up so many different solutions to try out. Great effort from all those involved to make this happen.
Isn't SDS Lite part of the HANA Express package too? May be asking for more, but without that SDS cannot be connected to a sensor real-time using SAP Solutions...
Cheers...Elango
It's not correct that without Streaming Lite, SDS can't be connected to a sensor in real-time. In fact Streaming Lite doesn't change or expand the connectivity options. It's really just a deployment option. As such, you really don't need it for development - though it would be useful for prototyping/demonstrating a distributed deployment. Streaming Lite isn't included simply because we wanted to keep the HANA Express package as simple as possible. We could consider adding it if we thought the benefit would outweigh the additional package complexity.
The key thing is that you connect to Streaming Lite the same way you would connect to SDS. In fact, there are more ways to connect to SDS directly. So to connect a sensor to SDS, you have several choices:
To post the sensor reading to Streaming Lite, you would use #2 above.
Hi Jeff
Thanks for shedding light on Streaming Lite. I was also wondering the same thing as Elangovan. I would like to connect a sensor on a Raspberry Pi 3 Model B to SDS on HXE. I am a bit confused about the sensor part. Can you also shed some light on the kind of adapter I should use with SDS on HXE, and on the configuration needed on the hardware side?
2 parts to this answer:
First off, our Streaming Lite tutorials are listed in the Streaming Lite section on this page: https://blogs.sap.com/2016/03/11/table-of-contents/ Migrating these tutorials to the new Github based format is still a work in progress.
Now, in your case, since you are working with HANA Express, you don't have Streaming Lite available, so you would need to post the events from the Pi directly back to the HANA Streaming Analytics server running on your HXE instance. You have several options for that, but one option would be to post the events via either REST or WebSockets. It is fairly common for sensors on a Pi to be read using a Python app, and Python in turn provides APIs for both REST and WebSocket connections. For instructions on how to publish into a streaming project using REST you can refer to these 2 tutorials:
Hi Robert
Thanks for the insight. I shall check the same.