Basket Analysis with SAP Predictive Analysis and SAP HANA – Part 1
Market Basket Analysis, or Basket Analysis for short, is one of those things that people have talked about forever; if you search online, you’ll find examples from the 80s and the early 90s. One thing to note: Basket Analysis should really be classed as Advanced Analytics. On its own it’s not really Predictive Analytics, as you are not predicting anything; it is more data mining, although all 3 terms are often used interchangeably. What a predictive tool or predictive algorithm gives you is easy access to this type of analysis without having to be a SQL guru, propeller head or even a data scientist.
Basket Analysis is something that has been possible for a long time, but doing it quickly and easily on a large set of data is where many hit problems. With this in mind I wanted to share a couple of days’ work that I have done recently to improve this experience.
With SAP HANA 1.0 SPS9 we now have 4 algorithms that can be used for Association Analysis, or Market Basket Analysis as it is more often known.
- Apriori Lite
I’m not going to get into the technical details of the differences between these; the HANA PAL documentation provides more details on each. I plan to write a follow-up blog that will aim to explore those differences.
This blogpost was created using SAP Predictive Analytics v1.21, with SAP HANA 1.0 Revision 85. I have performed this on a data set of 80+ million records; some may call this Big Data, others may not. The data required for this task is straightforward: you just need 2 columns, the item or items purchased and the transaction number.
Typically with SAP HANA you may build an Analytic View or Calculation View to invoke the HANA OLAP engine. However, OLAP is typically about aggregating data and then analysing it by slicing, dicing and drilling into the data set. For Basket Analysis the opposite is true: you want to feed in the base transactions so that you can see exactly what people purchased in a single basket.
This can be fed into SAP Predictive Analysis (PA). To use the in-built HANA predictive libraries you need to choose “Connect to SAP HANA”.
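To make the difference between base transactions and aggregated data concrete, here is a minimal Python sketch. The rows and item names are made up for illustration and are not taken from the data set used in this post; the function names are likewise hypothetical. It shows the two-column input (transaction number, item) being grouped into baskets, next to an aggregated per-item total where the basket information is lost.

```python
from collections import defaultdict

# Hypothetical sample rows: (transaction_number, item).
# Illustrative only, not the data set used in this post.
rows = [
    (1001, "bread"),
    (1001, "butter"),
    (1001, "milk"),
    (1002, "bread"),
    (1002, "butter"),
    (1003, "milk"),
    (1004, "bread"),
    (1004, "milk"),
]


def build_baskets(rows):
    """Group item rows by transaction number: one set of items per basket."""
    baskets = defaultdict(set)
    for transaction_number, item in rows:
        baskets[transaction_number].add(item)
    return dict(baskets)


def item_totals(rows):
    """Aggregated (OLAP-style) view: row count per item, baskets are lost."""
    totals = defaultdict(int)
    for _, item in rows:
        totals[item] += 1
    return dict(totals)


baskets = build_baskets(rows)
totals = item_totals(rows)

for transaction_number, items in sorted(baskets.items()):
    print(transaction_number, sorted(items))
print(totals)
```

The aggregated totals tell you how often each item sold, but only the basket view tells you that bread and butter were bought together, which is what the association algorithms need.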
Select your source data
Configure the node parameters, which are fairly self-explanatory. Depending upon the basket size in your data set you may need to set the support quite low.
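As a rough illustration of what the support and confidence settings do, the sketch below counts item pairs in a handful of made-up baskets, using the standard textbook definitions: support is the share of all baskets containing both items, and confidence is the share of baskets with the left-hand item that also contain the right-hand item. This is plain Python for explanation only, not the HANA PAL implementation, and the function name and data are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, illustrative only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk"},
    {"eggs"},
]


def pair_rules(baskets, min_support, min_confidence):
    """Return (lhs, rhs, support, confidence) for item pairs passing both thresholds."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(basket)  # sorted so each pair has one canonical order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # share of lhs baskets that also hold rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


print(pair_rules(baskets, min_support=0.5, min_confidence=0.8))
print(pair_rules(baskets, min_support=0.4, min_confidence=0.8))
```

With a minimum support of 0.5 no rule survives, while lowering it to 0.4 lets the butter to bread rule through. On real data with many distinct items, each combination appears in only a small fraction of baskets, which is why the support often has to be set quite low.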
Save the output result rules to a new HANA table so we can easily analyse the rules generated.
You can run the analysis and receive the output in just a couple of minutes.
Switching to the results view, we receive some pre-built analysis showing the rules that have been generated.
Predictive Analysis Association Chart – Tag Cloud
The output is fairly readable, but you can analyse the generated rules further to gain more insight into what your data is telling you.
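One simple way to dig further into saved rules is to rank them by lift, which compares how often two items occur together against what you would expect if they were bought independently (a lift above 1 suggests a positive association). The sketch below assumes the rules have already been read out of the results table into tuples; the tuple layout, the values and the function name are hypothetical and do not reflect the actual layout of the table that PA writes.

```python
# Hypothetical rule rows: (antecedent, consequent, support, confidence, lift).
# The layout and values are assumptions for illustration only.
rules = [
    ("butter", "bread", 0.40, 1.00, 1.67),
    ("bread", "milk", 0.40, 0.67, 1.11),
    ("milk", "eggs", 0.05, 0.10, 0.50),
]


def strongest_rules(rules, min_lift=1.0, top_n=10):
    """Keep rules with lift above min_lift, strongest first."""
    kept = [rule for rule in rules if rule[4] > min_lift]
    return sorted(kept, key=lambda rule: rule[4], reverse=True)[:top_n]


for antecedent, consequent, support, confidence, lift in strongest_rules(rules):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Because the rules were saved to a HANA table, the same filter and sort could equally be expressed as a query against that table.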
Some follow up blogs on related topics