For everyone using SAP HANA, express edition (HXE), the great news with the latest update (HXE 2.0 SPS01) is that you can now include real-time streaming analytics in your HANA applications.
SAP HANA smart data streaming (SDS) is HANA’s high-speed real-time streaming analytics engine. It lets you easily build and deploy streaming data models (a.k.a. projects) that process and analyze incoming messages as fast as they arrive, allowing you to react in real time to what’s going on. Use it to generate alerts or initiate an immediate response. With the Internet of Things (IoT), common uses include analyzing incoming sensor data from smart devices, particularly when there is a need to react in real time to a new opportunity or to act in anticipation of a problem.
Streaming analytics can be applied in a wide variety of use cases: wherever there is fast-moving data and value in understanding and acting on it as soon as things happen. Common use cases include:
- Predictive maintenance, predictive quality: detect indications of impending failure in time to take preventive action
- Marketing: customized offers in real-time, reacting to customer activity
- Fraud/threat detection/prevention: detect and flag patterns of events that indicate possible fraud or an active threat
- Location monitoring: detect when equipment/assets are not where they are supposed to be
Streaming data models
Streaming data models define the operations to apply to incoming messages and are contained in streaming projects that run on the SDS server. These models are defined in a SQL-like language that we call CCL (continuous computation language); it’s really just SQL with some extensions for processing live streams. The big difference, though, is that this SQL doesn’t execute against the HANA database. Instead, it gets compiled into a set of “continuous queries” that run in the SDS dataflow engine.
Here’s a simple example of a streaming data model that smooths out some sensor data by computing a five-minute moving average:
CREATE INPUT STREAM DeviceIn
SCHEMA (Id string, Value integer);

CREATE OUTPUT WINDOW MvAvg
PRIMARY KEY DEDUCED
AS SELECT
    DeviceIn.Id AS Id,
    avg(DeviceIn.Value) AS AvgValue
FROM DeviceIn KEEP 5 MINUTES
GROUP BY DeviceIn.Id;
You can see that it looks pretty much like standard SQL, except that instead of creating tables we are creating streams and windows. With windows, we can define a retention policy, in this example KEEP 5 MINUTES. And with a moving average we’re just getting started. Filtering events is as simple as a WHERE clause. You can join event streams to HANA tables to combine live events with reference data or historical data. You can also join events to each other. You can match and correlate events, or watch for patterns or trends. Anyway, you get the idea.
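To make the window semantics concrete, here is a rough Python sketch of what a window like MvAvg computes: a per-device average over only the readings that arrived within the retention period. It is neither CCL nor a description of how SDS is implemented. The class name MovingAverage, the insert method and the explicit ts argument are invented for illustration, and the sketch assumes timestamps arrive in non-decreasing order and only expires old rows when a new reading comes in.

```python
from collections import defaultdict, deque


class MovingAverage:
    """Per-device average over a sliding time window (illustrative only)."""

    def __init__(self, keep_seconds=300):
        self.keep_seconds = keep_seconds
        # device id -> deque of (timestamp, value), oldest first
        self.rows = defaultdict(deque)

    def insert(self, device_id, value, ts):
        """Add one reading and return the device's current average."""
        window = self.rows[device_id]
        window.append((ts, value))
        # Drop readings that have aged out of the retention period.
        while window and window[0][0] <= ts - self.keep_seconds:
            window.popleft()
        return sum(v for _, v in window) / len(window)


mv = MovingAverage(keep_seconds=300)
print(mv.insert("freezer-1", 10, ts=0))    # 10.0
print(mv.insert("freezer-1", 20, ts=120))  # 15.0
print(mv.insert("freezer-1", 30, ts=400))  # 25.0 (the reading at ts=0 has expired)
```

The grouping by device mirrors GROUP BY DeviceIn.Id, and keep_seconds=300 plays the role of KEEP 5 MINUTES.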
Capturing streaming data in the HANA database
Any of this data can be captured in the HANA database, and by capturing derived data rather than raw data you can reduce the amount of data being stored. You can also sample the data or store it only when it changes.
If I wanted to store my moving average from the example above in a HANA table called MV_AVG, I would simply attach a HANA output adapter to the window above by adding this statement to my project:
ATTACH OUTPUT ADAPTER HANA_Output1 TYPE hana_out TO MvAvg
PROPERTIES
    service = 'hdb1',
    sourceSchema = 'MY_SCHEMA',
    table = 'MV_AVG';
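The idea of storing data only when it changes, mentioned earlier in this section, can be sketched in a few lines of plain Python. This is only an illustration of the logic, not CCL and not an SDS feature name: the function changed_only and the sample readings are made up for the example.

```python
def changed_only(rows):
    """Yield (key, value) rows whose value differs from the last one seen for that key."""
    last = {}
    for key, value in rows:
        if key not in last or last[key] != value:
            last[key] = value
            yield key, value


readings = [("d1", 5), ("d1", 5), ("d2", 7), ("d1", 6), ("d2", 7)]
stored = list(changed_only(readings))
print(stored)  # [('d1', 5), ('d2', 7), ('d1', 6)]
```

Of the five incoming readings, only three would reach the database, which is the kind of reduction that capturing derived or changed data is meant to deliver.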
Connecting to data sources
SDS includes an integrated web service that can expose a REST interface for all input streams. High-frequency publishers can use WebSockets for greater efficiency.
SDS also includes a range of pre-built adapters including Kafka, JMS, file loaders and others. An adapter toolkit (Java) makes it easy to build custom adapters.
Using real-time output
In addition to being captured in HANA database tables, real-time output from streaming projects can also be streamed to applications and dashboards, published to a Kafka or JMS message queue, sent as email, or stored in Hadoop.
Machine Learning for Predictive Analytics on Streams
SDS includes two machine learning algorithms – a decision tree algorithm and a clustering algorithm – as well as the ability to import decision tree models built using the PAL algorithms in HANA. These are particularly useful for predictive use cases, enabling you to take action based on leading indicators or detecting unusual situations.
High speed, Scalable
The SDS dataflow engine is designed to be highly scalable, with support for both scale-up and scale-out, providing the ability to process millions of messages per second (with sufficient CPU capacity) and deliver results within milliseconds of message arrival.
Design time tools
Design time tools for building and testing streaming projects are available as a plugin for Eclipse and in SAP Web IDE for SAP HANA. Both include a syntax-aware CCL editor plus testing tools, including a stream viewer, record/playback and manual input tools. The Eclipse plugin also includes a visual drag-and-drop style model builder.
Try it Out
If you’re interested in taking it for a test spin, the easiest way to get started is to follow this hands-on tutorial that takes you through the steps of building a simple IoT project to monitor sensor data from freezer units.
There’s lots of material available to help you put streaming analytics to use:
- Visit the SAP HANA, express edition developer page
- Getting started tutorial: install SDS on your HXE server and build your first streaming project
- Taking it to the next level: a second tutorial series that builds from the getting started tutorial, adds more streaming analytics to your project and shows you how to publish input events via the REST interface
- Visit the SDS Developer Center for more tutorials and other resources
- The SDS Playlist in the SAP HANA Academy
- The full SDS documentation, including a developer's guide, reference guide and guide to adapters