Market Basket Analysis, or Basket Analysis for short, is one of those things that people have talked about forever; if you search online, you’ll find examples from the 80s and the early 90s. One thing to note: Basket Analysis should really be classed as Advanced Analytics. On its own it is not really Predictive Analytics, as you are not predicting anything; it is closer to data mining, although all 3 terms are often used interchangeably. What a predictive tool or predictive algorithm gives you is easy access to this type of analysis without having to be a SQL guru, a propeller head or even a data scientist.
Basket Analysis has been possible for a long time, but doing it quickly and easily on a large data set is where many hit problems. With this in mind, I wanted to share a couple of days’ work that I have done recently to improve this experience.
With SAP HANA SP9 we now have 4 algorithms that can be used for Association Analysis, or Market Basket Analysis as it is more often known.
- Apriori Lite
I’m not going to get into the technical details of the differences between these; the HANA PAL documentation provides more detail on each. I plan to write a follow-up blog that will explore the differences between them.
I have used SAP Predictive Analytics v1.21 with SAP HANA Revision 85, and performed this on a data set of 80+ million records; some may call this Big Data, others may not. The data required for this task is straightforward: you just need 2 columns, the item or items purchased and the transaction number.
Normally with SAP HANA you might build an Analytical View to invoke the HANA OLAP engine. However, OLAP is typically about aggregating data and then analysing it by slicing, dicing and drilling into the data set. For Basket Analysis the opposite is true: you want to feed in the base transactions so that you can see exactly what people purchased in a single basket.
You can then feed this into SAP Predictive Analysis (PA). If you want to use the in-built HANA predictive libraries, you need to choose “Connect to SAP HANA”.
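To make the two-column layout concrete, here is a minimal Python sketch of how rows of transaction number and item turn into baskets. The sample rows, the item names and the helper `group_into_baskets` are all made up for illustration; this is not how HANA or PA handles the data internally, just a picture of the shape the algorithms expect.

```python
from collections import defaultdict

# Hypothetical sample rows in the two-column layout: (transaction number, item).
# These values are invented for illustration and are not from the real data set.
rows = [
    (1001, "bread"),
    (1001, "butter"),
    (1001, "milk"),
    (1002, "bread"),
    (1002, "butter"),
    (1003, "milk"),
]


def group_into_baskets(rows):
    """Collect the items of each transaction into a set keyed by transaction number."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)


baskets = group_into_baskets(rows)
for transaction_id, items in sorted(baskets.items()):
    print(transaction_id, sorted(items))
```

Each transaction number ends up with the full set of items bought together, which is why the base transactions are needed rather than aggregated totals.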
Select your source data
Configure the node parameters, which are fairly self-explanatory. Depending upon the basket size in your data set, you may need to set the support quite low.
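If the support and confidence parameters are new to you, the following Python sketch shows the standard textbook definitions on a handful of invented baskets. The baskets and the function names `support` and `confidence` are my own assumptions for illustration; the HANA algorithms compute these measures for you.

```python
# Hypothetical baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "jam"},
    {"butter", "milk"},
]


def support(itemset, baskets):
    """Fraction of all baskets that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    both = support(set(antecedent) | set(consequent), baskets)
    return both / support(antecedent, baskets)


# Rule "bread => butter": 2 of 5 baskets hold both items, 3 of 5 hold bread.
rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

In a toy example like this the support values are large. With millions of transactions and many distinct items, any one combination tends to appear in only a small fraction of baskets, which is why the minimum support often has to be set quite low before any rules are generated.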
I have then chosen to output the resulting rules to a new HANA table so that we can easily analyse the rules generated.
You can then run the analysis and receive the output in just a couple of minutes.
If you switch to the results view you then receive some pre-built analysis showing you the rules that have been generated
Predictive Analysis Association Chart – Tag Cloud
As you can see, the output is fairly readable, but you may still want to analyse the generated rules further to understand what they are telling you.
I have now written some follow-up blogs: Part 2, Visualisation of Results, and another looking at The SAP HANA Effect. I have also looked at Enhancing Market Basket Analysis with PA 2.0 and SAP HANA. In the future I also plan to explore the other HANA algorithms and compare their results and how they perform.
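As one simple example of that further analysis, here is a Python sketch that keeps only the rules above a confidence threshold and ranks them. The rule values and the field names (`antecedent`, `consequent`, `support`, `confidence`) are hypothetical and will not match the columns of your HANA output table exactly; `strongest_rules` is a helper I have made up for illustration.

```python
# Hypothetical rules, as they might look once read from the output table.
# Field names and values are invented for illustration.
rules = [
    {"antecedent": "bread", "consequent": "butter", "support": 0.40, "confidence": 0.67},
    {"antecedent": "milk", "consequent": "bread", "support": 0.20, "confidence": 0.33},
    {"antecedent": "jam", "consequent": "bread", "support": 0.20, "confidence": 1.00},
]


def strongest_rules(rules, min_confidence):
    """Keep rules at or above min_confidence, ordered by confidence, then support, descending."""
    kept = [rule for rule in rules if rule["confidence"] >= min_confidence]
    return sorted(kept, key=lambda rule: (rule["confidence"], rule["support"]), reverse=True)


top = strongest_rules(rules, min_confidence=0.5)
for rule in top:
    print(f'{rule["antecedent"]} => {rule["consequent"]} '
          f'(support {rule["support"]:.2f}, confidence {rule["confidence"]:.2f})')
```

Because the rules were written to a HANA table, the same kind of filtering and sorting could equally be done with a SQL query against that table.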