Basket Analysis with SAP Predictive Analysis and SAP HANA – Part 1
Introduction
Market Basket Analysis, or Basket Analysis for short, is one of those things that people have talked about forever; if you search online, you’ll find examples from the 80s and the early 90s. One thing to note: Basket Analysis should really be classed as Advanced Analytics. On its own it’s not really Predictive Analytics, as you are not predicting anything; it is more data mining. Even so, all 3 terms are often used interchangeably. What a predictive tool or predictive algorithm gives you is easy access to this type of analysis without having to be a SQL Guru, Propeller Head or even a Data Scientist.
Basket Analysis has been possible for a long time, but doing it quickly and easily on a large set of data is where many hit problems. With this in mind I wanted to share a couple of days’ work that I have done recently to improve this experience.
HANA Algorithms
With SAP HANA 1.0 SPS9 we now have 4 algorithms that can be used for Association Analysis, or Market Basket Analysis as it is more often known.
- Apriori
- Apriori Lite
- FP-Growth
- KORD
I’m not going to get into the technical differences between these; the HANA PAL documentation provides more detail on each. I plan to write a follow-up blog that explores the differences between them.
This blogpost was created using SAP Predictive Analytics v1.21, with SAP HANA 1.0 Revision 85. I have performed this on an 80+ million record data set; some may call this Big Data, others may not. The data required for this task is straightforward: you just need 2 columns, the item or items purchased and the transaction number.
Typically with SAP HANA you may build an Analytical View or Calculation View to invoke the HANA OLAP engine. However, OLAP is typically about aggregating data and then analysing it by slicing, dicing and drilling into the data set. For Basket Analysis the opposite is true: you want to feed in the base transactions so that you can see exactly what people purchased in a single basket.
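To make the two-column input concrete, here is a minimal Python sketch of that shape and of how the rows roll up into baskets. The transaction numbers and item names are hypothetical sample values, not taken from the data set used in this post, and the helper `group_into_baskets` is purely illustrative.

```python
from collections import defaultdict

# Hypothetical sample rows: (transaction number, item purchased).
# One row per item, so a basket with three items spans three rows.
rows = [
    (1001, "lager"),
    (1001, "crisps"),
    (1002, "ale"),
    (1002, "crisps"),
    (1002, "shandy"),
    (1003, "lager"),
]


def group_into_baskets(rows):
    """Collect the items of each transaction into one set per basket."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)


baskets = group_into_baskets(rows)
for transaction_id, items in sorted(baskets.items()):
    print(transaction_id, sorted(items))
```

Each basket keeps its individual items, which is exactly the detail an aggregated view would lose.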
This can be fed into SAP Predictive Analysis (PA). To use the in-built HANA Predictive libraries you need to “Connect to SAP HANA”.
Select your source data.
Configure the node parameters, which are fairly self-explanatory. Depending upon the basket size in your data set you may need to set the support quite low.
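As a rough illustration of why the support threshold matters, the sketch below computes support and confidence for single-item rules in plain Python. It is not the HANA PAL implementation, only the textbook definitions applied to hypothetical baskets; the function names `support` and `pair_rules` are made up for this example.

```python
from itertools import permutations

# Hypothetical baskets, one set of items per transaction.
baskets = [
    {"lager", "crisps"},
    {"ale", "crisps", "shandy"},
    {"lager", "crisps", "ale"},
    {"shandy"},
    {"lager"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(baskets, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for A -> B rules."""
    items = sorted(set().union(*baskets))
    rules = []
    for lhs, rhs in permutations(items, 2):
        both = support({lhs, rhs}, baskets)
        if both < min_support:
            continue  # the pair is too rare to be reported
        confidence = both / support({lhs}, baskets)
        if confidence >= min_confidence:
            rules.append((lhs, rhs, both, confidence))
    return rules


# A high support threshold finds nothing; a lower one starts to yield rules.
print(len(pair_rules(baskets, min_support=0.5, min_confidence=0.6)))
print(len(pair_rules(baskets, min_support=0.3, min_confidence=0.6)))
```

With many distinct items and small baskets, any given combination appears in only a small fraction of transactions, which is why the support often has to be set low before rules appear.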
Save the output result rules to a new HANA table so we can easily analyse the rules generated.
You can run the analysis and receive the output in just a couple of minutes.
Switching to the results view, we receive some pre-built analysis showing the rules that have been generated.
Predictive Analysis Association Chart – Tag Cloud
The output is fairly readable, but you can analyse the generated rules further to gain more insight into what your data is telling you.
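One simple way to analyse the rules further is to rank them by lift, which is the rule's confidence divided by the support of its consequent. The Python sketch below assumes the saved rules have been read into tuples of antecedent, consequent, support, confidence and lift; that layout, the sample values and the helper `strongest_rules` are assumptions for illustration, not the actual structure of the output table.

```python
# Hypothetical rule rows: (antecedent, consequent, support, confidence, lift).
rules = [
    ("lager", "crisps", 0.40, 0.67, 1.11),
    ("ale", "crisps", 0.40, 1.00, 1.67),
    ("shandy", "ale", 0.20, 0.50, 1.25),
]


def strongest_rules(rules, min_lift=1.0, top_n=10):
    """Keep rules whose lift exceeds min_lift, strongest first."""
    kept = [rule for rule in rules if rule[4] > min_lift]
    return sorted(kept, key=lambda rule: rule[4], reverse=True)[:top_n]


for lhs, rhs, sup, conf, lift in strongest_rules(rules, min_lift=1.2):
    print(f"{lhs} -> {rhs}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than they would if they were purchased independently.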
Some follow up blogs on related topics
Propeller head? Lol. I like it.
Yes, I like to munch a packet of crisps after a shandy, an ale and a lager 🙂
Hi Henry,
Both the blogs (Part 1 and Part 2) are good and interesting. I want to try R-Apriori in SAP PA, as I do not yet have access to HANA PAL. Could you please suggest how I can save the result to re-use it for advanced prediction, as in your Part 2? Especially the table?
Regards
Sudeepti
Hey Sudeepti
One way could be this: in the Predict tab, after you have attached your algorithm to your dataset and run it successfully, attach another component called CSV Writer (under Data Writers) and configure it. Then run the model. A CSV file will be generated, which can be modified as required and consumed in PA/Lumira.
Hi Ranajay,
Thank you!
I am able to do that.
Regards
Sudeepti
Glad I could help!!
Hi Ranajay, Henry,
I am able to use Expert Analytics for MBA (Market Basket Analysis) on a 2-column dataset, i.e. 1 column with transaction IDs and another one with items. Can we also apply Apriori on a dataset that is in transactions format? I tried using the 'Tabular format' option but it is not working. No rules are generated.
Regards
Sudeepti
Hi Sudeepti
That depends on the data. You need to analyse the data first: check whether the columns you are selecting in the configuration of the algorithm really make sense, and whether there are any patterns in them. Apriori is beneficial when you have more correlated dimensions in which to identify hidden associations.
Hi Ranajay,
This is the data set I am talking about. I have taken this from the arules package in R. I am not able to apply Apriori directly on this. I am sure I might be missing something in PA. Please help out.
Regards
Sudeepti
Hi Sudeepti,
It sounds like you are doing the correct thing with the "Tabular Format" Input Data Format. Is there an error or just no rules being generated?
If there are no rules try reducing the support and confidence values.
Try 0.1 or even 0.01 for both if the combinations are not that common.
And by the way, perhaps you should use the Recommendation engine from automated PA 2.2. This is the technique that came third in the Million Song Dataset Challenge; it is much faster than Apriori, easier to use, and offers the possibility of in-database apply.
Sure, I already tried with the 2-column data set and it is pretty good. Will try with the transactions data set and get back. 🙂