Applying Machine Learning to Real Time Streaming Analytics – SAP TechEd Lecture of the Week
Combining machine learning with streaming analytics gives you a rich set of capabilities, not only for generating predictions but, even more importantly, for acting on them.
Machine learning is about letting the software figure things out on its own. For example, the DenStream clustering algorithm lets you feed in a stream of data and find out *if* there are any related clusters, without having to know ahead of time. More importantly, it identifies the outliers for you. To put it another way, the clustering algorithm figures out groups of “normal” behaviors and flags the “weird” ones for you to react to. Even more importantly, it adapts over time by aging out older values and giving more weight to recent events, so the algorithm recognizes the “new normal” long before we humans ever could. That kind of automated learning and adaptability opens up the potential to make predictions based not just on historical data but on the “right here, right now!” data.
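To make the aging idea concrete, here is a minimal Python sketch. It is not the DenStream implementation in smart data streaming, just a simplified one-dimensional stand-in that borrows the same two ideas: cluster weights fade with every new event, and a reading counts as an outlier until its cluster has gathered enough weight. The class names and the radius, decay, min_weight and prune_below parameters are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MicroCluster:
    center: float
    weight: float


class FadingClusterer:
    """Simplified 1-D fading clusterer (illustrative only, not DenStream itself)."""

    def __init__(self, radius: float = 1.0, decay: float = 0.9,
                 min_weight: float = 3.0, prune_below: float = 0.1) -> None:
        self.radius = radius            # how close a reading must be to join a cluster
        self.decay = decay              # per-event fading factor applied to every weight
        self.min_weight = min_weight    # weight a cluster needs to count as "normal"
        self.prune_below = prune_below  # clusters lighter than this are dropped
        self.clusters: List[MicroCluster] = []

    def update(self, x: float) -> bool:
        """Absorb one reading; return True if it currently looks like an outlier."""
        # Age out older values: every existing cluster loses some weight.
        for c in self.clusters:
            c.weight *= self.decay
        self.clusters = [c for c in self.clusters if c.weight >= self.prune_below]

        nearest = min(self.clusters, key=lambda c: abs(c.center - x), default=None)
        if nearest is not None and abs(nearest.center - x) <= self.radius:
            # Merge the reading into the nearest cluster (weighted mean of the center).
            nearest.center = (nearest.center * nearest.weight + x) / (nearest.weight + 1.0)
            nearest.weight += 1.0
            target = nearest
        else:
            # Nothing close enough: start a new, initially lightweight cluster.
            target = MicroCluster(center=x, weight=1.0)
            self.clusters.append(target)

        # Readings that land in a lightweight cluster are treated as outliers.
        return target.weight < self.min_weight


if __name__ == "__main__":
    detector = FadingClusterer()
    for reading in [20.0] * 20 + [35.0] * 6:
        print(reading, detector.update(reading))
```

With these made-up settings, a sudden jump to a new value is flagged at first, and stops being flagged once the new value has repeated often enough to build its own cluster: the “new normal”. The first few readings of any stream are flagged too, so a real project would want a warm-up period.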
By implementing the machine learning capabilities in the streaming analytics engine, we create the ability to generate and use predictions in real time. Do you want to predict a machine failure based on temperature and vibration? Throw a DenStream cluster on each of the temperature and vibration sensor streams and watch for the outliers. When the readings from either sensor show up as outliers, you know the machine isn’t acting normally any more. That’s a prediction of “bad things to come”. When readings from both sensor streams show outliers, you know something really isn’t normal. That information lets you act. Acting could be as simple as an alert, but that’s old school. New school is to solve the problem before it happens. Temperature going up? Throttle back the machine, slow it down, let it cool off. Is the temperature back under control? Throttle up again, speed up production, optimize throughput. Vibration going up but within bounds? Schedule the maintenance. Temperature and vibration going up? Shut it down. All of that can be done in milliseconds.
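The rules in that paragraph boil down to a small decision table. Here is a rough Python sketch of it. The Action names, the decide function and the throttled flag are hypothetical, and the two outlier flags are assumed to come from whatever detector sits on each sensor stream.

```python
from enum import Enum


class Action(Enum):
    SHUT_DOWN = "shut down"
    THROTTLE_BACK = "throttle back"
    SCHEDULE_MAINTENANCE = "schedule maintenance"
    THROTTLE_UP = "throttle up"
    NO_CHANGE = "no change"


def decide(temp_outlier: bool, vib_outlier: bool, throttled: bool) -> Action:
    """Map the two outlier flags (plus current machine state) to an action."""
    if temp_outlier and vib_outlier:
        # Both sensors look abnormal: something really isn't normal.
        return Action.SHUT_DOWN
    if temp_outlier:
        # Temperature going up: slow the machine down and let it cool off.
        return Action.THROTTLE_BACK
    if vib_outlier:
        # Vibration going up on its own: get maintenance on the calendar.
        return Action.SCHEDULE_MAINTENANCE
    if throttled:
        # Everything is back under control: speed production up again.
        return Action.THROTTLE_UP
    return Action.NO_CHANGE


if __name__ == "__main__":
    print(decide(temp_outlier=True, vib_outlier=False, throttled=False))
```

The order of the checks matters: the "both sensors" case has to come first, otherwise a temperature outlier alone would always win and the shutdown rule would never fire.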
You see, predicting is only half the problem. The other half is making that prediction with enough time to act on it. The closer to the originating event that you can make the prediction, the more time you have to act. By applying machine learning to streaming analytics you can make predictions sooner and act on them before you even reach the database. Or in other words (yeah I’m biased) – predictive needs streaming.
Machine learning with HANA smart data streaming (SDS) isn’t limited to just one algorithm. In addition to the DenStream clustering algorithm, we also provide the Adaptive Hoeffding Tree algorithm, a decision tree classifier, natively within SDS. To extend the range of algorithms available, SAP BusinessObjects Predictive Analytics can also generate CCL code from its Automated Analytics tool. CCL, or Continuous Computation Language, is the SQL-derived scripting language used to build streaming projects. CCL itself is a rich scripting language that can readily be used to implement other machine learning functions, such as statistical control charts, directly within SDS. Finally, given our HANA integration, it is possible to load the scoring model output of a HANA Predictive Analysis Library (PAL) function into a streaming project and execute the scoring there.
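For readers who haven’t met a statistical control chart, here is a minimal sketch of the idea. It is written in Python rather than CCL, purely to show the logic, and it is a simplified chart: the limits are the baseline mean plus or minus three sample standard deviations, whereas textbook individuals charts usually estimate the spread from moving ranges. The function names and the sample numbers are made up for illustration.

```python
from statistics import mean, stdev
from typing import Iterable, Iterator, Sequence, Tuple


def control_limits(baseline: Sequence[float], sigmas: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) control limits computed from in-control baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation; needs at least two points
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(readings: Iterable[float], lower: float,
                   upper: float) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for every reading that falls outside the limits."""
    for index, value in enumerate(readings):
        if value < lower or value > upper:
            yield index, value


if __name__ == "__main__":
    # Hypothetical baseline readings taken while the process was known to be healthy.
    lower, upper = control_limits([10.0, 10.2, 9.8, 10.1, 9.9])
    for index, value in out_of_control([10.0, 10.3, 11.0, 9.4], lower, upper):
        print(f"reading {index} is out of control: {value}")
```

In a streaming project the same check would run on each event as it arrives, with the limits either fixed up front or recalculated over a sliding window.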
The replay video of my TechEd lecture on Applying Machine Learning to Real Time Streaming Analytics goes into a lot more detail on how the machine learning functionality within HANA smart data streaming works, and on the integration between SAP BusinessObjects Predictive Analytics and SDS.