Streaming + Predictive: immediate response based on predictive analytics
A lot of the use cases for HANA streaming involve initiating an immediate response to incoming event data. But the challenge is often: what’s the best response? Or what should you watch for?
This is where the power of the HANA Predictive Analysis Library (PAL) comes in.
PAL offers data mining and machine learning functions that can be called from within HANA SQLScript procedures to perform analytic algorithms. What people may not realize is that HANA smart data streaming and PAL can be used together in a powerful combination for real-time predictive response to event streams.
While a lot of people focus on HANA streaming as a way of capturing streaming event data in the HANA database, what’s often overlooked is the ability of HANA streaming to query the HANA database and even to run stored procedures on HANA. This is what allows HANA streaming to leverage the power of PAL. We recently worked with the PAL team on an internal proof of concept to show how this can be done.
Let me take you through it…
In this scenario, we have a retail online shopping site and we want to predict which products are most likely to appeal to the customer based on what they have just looked at.
The predictive analytics run in two stages. First is the training stage, where historical data is analyzed to map products to interests. To keep the model simple, only user click-through data is used. The idea is that a user’s click history reflects their combination of interests. Using the Latent Dirichlet Allocation (LDA) algorithm, an item (product)-interest matrix can be learned. For example, the product “rose” might have a 30% probability for the interest “birthday celebration” and a 70% probability for “wedding anniversary celebration”, given that only two interests are allowed.
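To make the item-interest matrix a bit more concrete, here is a minimal sketch in plain Python of what the output of the training stage might look like. It does not run LDA or call PAL; it only shows the shape of the result. Only the “rose” row comes from the example above; the other products, their probabilities, and the function names are made up for illustration.

```python
# Hypothetical item-interest matrix: P(interest | item), one row per product.
# Only the "rose" row is taken from the article; the other rows are invented.
ITEM_INTEREST = {
    "rose": {"birthday": 0.30, "anniversary": 0.70},
    "balloons": {"birthday": 0.90, "anniversary": 0.10},
    "champagne": {"birthday": 0.25, "anniversary": 0.75},
}


def rows_are_normalized(matrix, tol=1e-9):
    """Return True if every product's interest probabilities sum to 1."""
    return all(abs(sum(row.values()) - 1.0) <= tol for row in matrix.values())


def dominant_interest(matrix, item):
    """Return the interest with the highest probability for a product."""
    row = matrix[item]
    return max(row, key=row.get)
```

With this data, `dominant_interest(ITEM_INTEREST, "rose")` picks “anniversary”, matching the 70% figure in the example.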
Then, for our real-time response, we feed a click-stream from the website into HANA streaming. HANA streaming filters out all the uninteresting clicks, only watching for clicks on products. When a product click is received, the data model (project) running on the HANA streaming server calls the PAL function to get the item (or items) with the strongest probability of being related to the same interest(s) as the item(s) just viewed. For online prediction, the current interests T are inferred from the click stream using LDA. Other items with a high probability of belonging to those interests are then recommended.
where I_n stands for the nth item belonging to the specific interest.
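The real-time flow can be sketched roughly as follows, again in plain Python rather than in a streaming project or a PAL call. Everything here is an assumption made for illustration: the event fields, the sample matrix (only “rose” is from the article), and especially the inference step, which simply averages the interest rows of the clicked products as a stand-in for real LDA inference.

```python
# Hypothetical item-interest matrix, P(interest | item); only "rose" is from the article.
MATRIX = {
    "rose": {"birthday": 0.30, "anniversary": 0.70},
    "balloons": {"birthday": 0.90, "anniversary": 0.10},
    "champagne": {"birthday": 0.25, "anniversary": 0.75},
}


def product_clicks(events):
    """Keep only product clicks; the 'type' and 'item' fields are assumed names."""
    return [e["item"] for e in events if e.get("type") == "product"]


def infer_interests(matrix, clicked):
    """Estimate current interests T by averaging the rows of the clicked items.

    This is a simplification, not LDA inference.
    """
    if not clicked:
        return {}
    totals = {}
    for item in clicked:
        for interest, p in matrix[item].items():
            totals[interest] = totals.get(interest, 0.0) + p
    return {interest: total / len(clicked) for interest, total in totals.items()}


def recommend(matrix, clicked, top_n=1):
    """Rank items not yet viewed by how well they match the inferred interests."""
    t = infer_interests(matrix, clicked)
    scores = {
        item: sum(t.get(interest, 0.0) * p for interest, p in row.items())
        for item, row in matrix.items()
        if item not in clicked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


events = [
    {"type": "page", "item": "home"},     # filtered out as uninteresting
    {"type": "product", "item": "rose"},  # the click we react to
]
suggestions = recommend(MATRIX, product_clicks(events))
```

In this toy data, a click on “rose” leans toward the anniversary interest, so “champagne” scores higher than “balloons” and is the item suggested.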
This approach could easily be extended to other scenarios. Just a few examples:
- Predicting failure of a piece of equipment and estimating the severity/urgency
- Predicting that demand will very soon exceed capacity by looking at the correlation of drivers of demand (inventory, power, bandwidth,…)
- Identifying patterns of transactions that indicate likely fraud and then producing instant alerts when they are detected
If this is something you think can add value in an application you – or one of your customers or partners – are working on or planning, let me know. We’d be happy to provide more information or help you explore it further.