In this session John MacGregor covered several topics. First he described the landscape for predictive analysis, and then we took a closer look at predictive analysis itself. This was followed by a live demo of k-means clustering and decision tree analysis. The agenda then stepped back to customer benefit stories before the wrap-up.
We were seated on the third floor. Even before John started, it was evident that predictive analysis is getting more attention: the room was overcrowded and people had to stand or sit on the floor.
In BI we first reported on what happened. With real-time data, that moved to what is happening now. With predictive analysis we are moving to what will happen (or rather what might happen, as it is still a prediction and not an inevitable reality). Further questions then arise: Why did it happen? What is the risk if something does or does not happen? And finally, how can we make decisions that give us the best chance that good things happen and bad things are avoided?
John had a nice quote:
“Management is typically only interested in the future, not the past”
Users are becoming more interested in predictive analysis for three reasons. First, they often already know what happened but now want to know more: why it happened and what will happen next. Second, the amount of data is growing exponentially, with many different types of data. Third, technology now performs well enough to carry out these predictive tasks.
You can do predictive analysis in a software tool or directly in HANA by embedding the code in SQLScript. For predictive analysis there is the Predictive Analysis Library (PAL), and you can also use the open-source R environment. For the latter you have to install the R software separately. HANA then sends the data via TCP/IP to the R server, which performs the statistical calculations and sends the result back to HANA.
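To make the round trip described above more concrete, here is a minimal Python sketch that mimics it with the standard library only. It does not use HANA or R: one function plays the role of HANA shipping data over TCP/IP, and a small localhost server plays the role of the R server that computes statistics and returns them. The newline-delimited JSON protocol, the function names and the chosen statistics (mean and population standard deviation) are all assumptions for illustration.

```python
import json
import socket
import socketserver
import statistics
import threading


class StatsHandler(socketserver.StreamRequestHandler):
    """Stand-in for the R server: reads one JSON line, replies with statistics."""

    def handle(self):
        # Assumed protocol: a single line of JSON such as {"values": [...]}
        payload = json.loads(self.rfile.readline())
        values = payload["values"]
        result = {
            "mean": statistics.mean(values),
            "stdev": statistics.pstdev(values),  # population standard deviation
        }
        self.wfile.write((json.dumps(result) + "\n").encode("utf-8"))


def start_stats_server():
    """Start the stand-in server on a free localhost port in a background thread."""
    server = socketserver.TCPServer(("127.0.0.1", 0), StatsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_to_stats_server(host, port, values):
    """Stand-in for HANA: send the data over TCP and wait for the computed result."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((json.dumps({"values": values}) + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())


if __name__ == "__main__":
    server = start_stats_server()
    host, port = server.server_address
    try:
        print(send_to_stats_server(host, port, [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
    finally:
        server.shutdown()
        server.server_close()
```

The point of the split is the same as in the HANA setup: the database side only moves data and receives results, while the statistics engine runs as a separate process reachable over the network.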
If you want, you can also add your own algorithms, so you can work with self-defined algorithms as well. The following image shows the algorithms currently supported in PAL.
The software tool itself is very similar to Lumira. You connect to a data source and import data (or work with HANA live, in which case you work on HANA itself). The data is displayed just as it would be in Lumira. The second step is connecting a statistical algorithm to the data.
You do this by dragging an algorithm from the menu. This can be a native PAL algorithm or an R algorithm. You don't need to know R syntax, because an input screen asks you for the parameters. Once that is done, you can run the analysis and view the results visually.
If you are satisfied with the model, you can save it to HANA and use it there for further data mining. For example, if you have grouped your customers into segments based on several key figures, you can use this model in HANA to automatically assign all customers to a group based on their behavior.
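The demo ran k-means through the tool's drag-and-drop interface. As an illustration of what that algorithm computes, here is a minimal pure-Python sketch of Lloyd's k-means (alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points). It is not the PAL or R implementation; it uses a simple deterministic farthest-first initialization instead of random seeding, and the customer data (annual revenue in k EUR, orders per year) and function names are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def farthest_first_init(points, k):
    """Deterministic seeding: start with the first point, then repeatedly add
    the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        # Assignment step: every point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical customers: (annual revenue in k EUR, orders per year)
    customers = [(1, 2), (2, 1), (1, 1), (2, 2), (10, 10), (11, 10), (10, 11), (11, 11)]
    centroids, labels = kmeans(customers, k=2)
    print(centroids)  # [(1.5, 1.5), (10.5, 10.5)]
    print(labels)     # [0, 0, 0, 0, 1, 1, 1, 1]
```

Because k-means works on raw Euclidean distances, key figures measured on very different scales are usually normalized before clustering; the toy data here skips that step.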
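In HANA the saved model is applied by the database itself. The underlying idea for a clustering model can be sketched in Python as nearest-centroid assignment: the saved model is essentially the list of group centroids, and every customer is placed in the group whose centroid is closest. The group names, centroid values and features below are hypothetical.

```python
import math

# Hypothetical saved model: one centroid per customer group, in the same
# feature order used during training (annual revenue in k EUR, orders per year).
SAVED_MODEL = {
    "occasional": (1.5, 1.5),
    "key account": (10.5, 10.5),
}


def assign_group(customer, model):
    """Return the name of the group whose centroid is nearest to the customer."""
    return min(model, key=lambda name: math.dist(customer, model[name]))


def score_all(customers, model):
    """Apply the saved model to every customer: {customer_id: group_name}."""
    return {cid: assign_group(features, model) for cid, features in customers.items()}


if __name__ == "__main__":
    customers = {"C001": (2.0, 1.0), "C002": (9.5, 12.0), "C003": (3.0, 2.5)}
    print(score_all(customers, SAVED_MODEL))
    # {'C001': 'occasional', 'C002': 'key account', 'C003': 'occasional'}
```

Scoring does not retrain anything, which is why the same model can be reused cheaply whenever new customers arrive.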
The main takeaway John MacGregor gave us was to simply download the tool and give it a try, as that shows what it can do better than anything else. In this and his other presentation he gave many references for information about installation and usage:
The R manual: http://cran.r-project.org/doc/manuals/R-intro.html
One of the best resources for beginners: www.statmethods.net/