For everyone using SAP HANA, express edition (HXE), the great news with the latest update to HXE (2.0 SPS01) is that you can now include real-time streaming analytics in your HANA applications.

SAP HANA smart data streaming (SDS) is HANA’s high-speed, real-time streaming analytics engine. It lets you easily build and deploy streaming data models (a.k.a. projects) that process and analyze incoming messages as fast as they arrive, allowing you to react in real time to what’s going on. Use it to generate alerts or initiate an immediate response. With the Internet of Things (IoT), common uses include analyzing incoming sensor data from smart devices, particularly when there is a need to react in real time to a new opportunity or to head off an anticipated problem.

Use Cases

Streaming analytics can be applied in a wide variety of use cases: wherever data is fast-moving and there is value in understanding and acting on it as soon as things happen. Common use cases include:

  • Predictive maintenance, predictive quality: detect indications of impending failure in time to take preventive action
  • Marketing: customized offers in real-time, reacting to customer activity
  • Fraud/threat detection/prevention: detect and flag patterns of events that indicate possible fraud or an active threat
  • Location monitoring: detect when equipment/assets are not where they are supposed to be

Streaming data models

Streaming data models define the operations to apply to incoming messages and are contained in streaming projects that run on the SDS server. These models are defined in a SQL-like language that we call CCL (continuous computation language). It’s really just SQL with some extensions for processing live streams. The big difference, though, is that this SQL doesn’t execute against the HANA database; instead, it gets compiled into a set of “continuous queries” that run in the SDS dataflow engine.

Here’s a simple example of a streaming data model that smooths out some sensor data by computing a five-minute moving average:

CREATE INPUT STREAM DeviceIn
SCHEMA (Id string, Value integer);

CREATE OUTPUT WINDOW MvAvg
PRIMARY KEY DEDUCED
AS SELECT
   DeviceIn.Id AS Id ,
   avg(DeviceIn.Value) AS AvgValue
FROM DeviceIn KEEP 5 MINUTES
GROUP BY DeviceIn.Id ;

You can see that it looks pretty much like standard SQL, except that instead of creating tables we are creating streams and windows. With windows, we can define a retention policy, in this example KEEP 5 MINUTES. And a moving average is just the start. Filtering events is as simple as a WHERE clause. You can join event streams to HANA tables to combine live events with reference data or historical data. You can also join events to each other, match and correlate events, and watch for patterns or trends. You get the idea.
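To make the semantics of KEEP 5 MINUTES and GROUP BY concrete, here is a minimal Python sketch of the result the MvAvg window maintains. It is not how the SDS engine works internally: it assumes each event carries a timestamp in seconds and that events arrive in time order, and the class MovingAverage and its methods are hypothetical names used only for illustration.

```python
from collections import defaultdict, deque


class MovingAverage:
    """Per-Id average over a sliding time window (illustrative sketch only)."""

    def __init__(self, window_seconds=300):  # 300 s mirrors KEEP 5 MINUTES
        self.window_seconds = window_seconds
        # One deque of (timestamp, value) pairs per device Id, oldest first
        self.events = defaultdict(deque)

    def _expire(self, device_id, now):
        # Drop events that have aged out of the retention period.
        # Assumption: an event exactly window_seconds old counts as expired.
        window = self.events[device_id]
        while window and window[0][0] <= now - self.window_seconds:
            window.popleft()

    def insert(self, ts, device_id, value):
        """Add one DeviceIn-style event (assumes events arrive in time order)."""
        self.events[device_id].append((ts, value))
        self._expire(device_id, ts)

    def snapshot(self, now):
        """Return {Id: AvgValue}, one row per Id, like the GROUP BY result."""
        result = {}
        for device_id in list(self.events):
            self._expire(device_id, now)
            window = self.events[device_id]
            if window:
                result[device_id] = sum(v for _, v in window) / len(window)
        return result


mv = MovingAverage()
mv.insert(0, "A", 10)
mv.insert(60, "A", 20)
mv.insert(120, "B", 5)
print(mv.snapshot(now=200))  # {'A': 15.0, 'B': 5.0}
print(mv.snapshot(now=330))  # {'A': 20.0, 'B': 5.0}  (the t=0 reading has expired)
```

One design note: this sketch only expires old events when a snapshot is requested or a new event for the same Id arrives, whereas a streaming engine keeps the window current on its own as time passes.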

Capturing streaming data in the HANA database

Any of the data can be captured in the HANA database, and by capturing derived data rather than raw data, you can reduce the amount of data stored. You can also sample the data or store it only when it changes.
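A rough sketch of the two data-reduction ideas above, in plain Python rather than CCL: changes_only passes a reading through only when that device's value differs from the last one it passed, and sample_every keeps every n-th reading. Both function names are hypothetical; the point is that this kind of reduction happens in the streaming model, before the data reaches a HANA table.

```python
from itertools import islice


def changes_only(readings):
    """Yield (device_id, value) pairs only when a device's value changes."""
    last_seen = {}  # last value passed through, per device
    for device_id, value in readings:
        if device_id not in last_seen or last_seen[device_id] != value:
            last_seen[device_id] = value
            yield device_id, value


def sample_every(readings, n):
    """Keep every n-th reading (a simple count-based sample)."""
    return islice(readings, 0, None, n)


readings = [("A", 1), ("A", 1), ("B", 7), ("A", 2), ("B", 7), ("A", 2)]
print(list(changes_only(readings)))     # [('A', 1), ('B', 7), ('A', 2)]
print(list(sample_every(readings, 2)))  # [('A', 1), ('B', 7), ('B', 7)]
```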

If I wanted to store my moving average from the example above in a HANA table called MV_AVG, I would simply attach a HANA output adapter to the window above by adding this statement to my project:

ATTACH OUTPUT ADAPTER HANA_Output1
TYPE hana_out TO MvAvg
PROPERTIES
   service = 'hdb1',
   sourceSchema = 'MY_SCHEMA',
   table = 'MV_AVG';


Connecting to data sources

SDS includes an integrated web service that can expose a REST interface for all input streams. High-frequency publishers can use WebSockets for greater efficiency.
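As an illustration of a REST publisher, the Python sketch below builds an HTTP POST that sends one DeviceIn row as JSON, using only the standard library. The URL and the JSON payload shape are assumptions, not the documented SDS web service format, so check the SDS web service documentation for the real endpoint and message layout. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# HYPOTHETICAL endpoint: the real SDS web service URL layout will differ.
BASE_URL = "http://sds-host:9091/example/streams"


def build_publish_request(stream, row):
    """Build (but do not send) an HTTP POST that publishes one row as JSON."""
    body = json.dumps(row).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{stream}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_publish_request("DeviceIn", {"Id": "freezer-1", "Value": 42})
print(req.get_method(), req.full_url)
# To actually send it against a real server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```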

SDS also includes a range of pre-built adapters including Kafka, JMS, file loaders and others. An adapter toolkit (Java) makes it easy to build custom adapters.

Using Real-time output

In addition to being captured in HANA database tables, real-time output from streaming projects can be streamed to applications and dashboards, published to a Kafka or JMS message queue, sent as email or stored in Hadoop.

Machine Learning for Predictive Analytics on Streams

SDS includes two machine learning algorithms, a decision tree algorithm and a clustering algorithm, as well as the ability to import decision tree models built using the PAL algorithms in HANA. These are particularly useful for predictive use cases, enabling you to act on leading indicators or to detect unusual situations.
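The sketch below does not reproduce the SDS algorithms. As a rough illustration of how clustering can flag unusual situations on a stream, it uses plain online (sequential) k-means, a technique chosen here for brevity: each reading is assigned to its nearest cluster center, which then moves toward it, and a reading that is farther than a threshold from every center is flagged instead. The class name, the seed centers (freezer-like and fridge-like temperatures) and the threshold are all hypothetical.

```python
import math


class OnlineKMeans:
    """Sequential k-means with a distance threshold (illustrative sketch)."""

    def __init__(self, centers, threshold):
        self.centers = [list(c) for c in centers]
        self.counts = [1] * len(centers)  # treat each seed center as one observation
        self.threshold = threshold

    def update(self, point):
        """Assign point to the nearest center; return (cluster index, is_unusual)."""
        distances = [math.dist(point, c) for c in self.centers]
        idx = min(range(len(distances)), key=distances.__getitem__)
        if distances[idx] > self.threshold:
            # Too far from every center: flag it and leave the model unchanged
            return idx, True
        # Move the winning center toward the point (running mean of its members)
        self.counts[idx] += 1
        eta = 1.0 / self.counts[idx]
        self.centers[idx] = [c + eta * (p - c) for c, p in zip(self.centers[idx], point)]
        return idx, False


model = OnlineKMeans(centers=[(-18.0,), (4.0,)], threshold=5.0)
print(model.update((-17.0,)))  # (0, False)
print(model.update((12.0,)))   # (1, True): 8 degrees from the nearest center
```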

High speed, Scalable

The SDS dataflow engine is designed to be highly scalable, with support for both scale-up and scale-out, providing the ability to process millions of messages per second (with sufficient CPU capacity) and delivering results within milliseconds of message arrival.

Design time tools

Design-time tools for building and testing streaming projects are available as a plugin for Eclipse and in SAP Web IDE for SAP HANA. Both include a syntax-aware CCL editor plus testing tools, including a stream viewer, record/playback and manual input tools. The Eclipse plugin also includes a visual drag-and-drop model builder.

Try it Out

If you’re interested in taking it for a test spin, the easiest way to get started is to follow this hands-on tutorial that takes you through the steps of building a simple IoT project to monitor sensor data from freezer units.

More Information

There’s lots of material available to help you put streaming analytics to use:


2 Comments


  1. Elangovan Subbiah

    Hi Jeff,

    Thanks for the article. Streaming as part of HANA Express works very well and opens up so many different solutions to try out. Great effort from all those involved to make this happen.

    Isn’t SDS Lite part of the HANA Express package too? Maybe I’m asking for more, but without it, SDS cannot be connected to a sensor in real time using SAP solutions…

    Cheers…Elango

    1. Jeff Wootton (post author)

      It’s not correct that without Streaming Lite, SDS can’t be connected to a sensor in real time. In fact, Streaming Lite doesn’t change or expand the connectivity options; it’s really just a deployment option. As such, you don’t need it for development, though it would be useful for prototyping or demonstrating a distributed deployment. Streaming Lite isn’t included simply because we wanted to keep the HANA Express package as simple as possible. We could consider adding it if we thought the benefit would outweigh the additional package complexity.

      The key thing is that you connect to Streaming Lite the same way you would connect to SDS. In fact, there are more ways to connect to SDS directly. So to connect a sensor to SDS, you have several choices:

      1. post the sensor reading to SDS via HTTP/REST (SDS SWS component)
      2. write the sensor reading to SDS via the SDS SDK (Java, C++ or .NET)
      3. post the sensor reading on a Kafka topic and use the SDS Kafka adapter to receive it

      To post the sensor reading to Streaming Lite, you would use #2 above.


