Ian Henry

Basket Analysis with SAP Predictive Analysis and SAP HANA – Part 1


Market Basket Analysis, or Basket Analysis for short, is one of those things that people have talked about forever; if you search online, you’ll find examples from the 80s and the early 90s.  One thing to note: Basket Analysis should really be classed as Advanced Analytics. On its own it’s not really Predictive Analytics, as you are not predicting anything; it is more data mining, but all 3 terms are often used interchangeably.  What a Predictive tool or a Predictive Algorithm gives you is easy access to this type of analysis without having to be a SQL Guru, Propeller Head or even a Data Scientist.

Basket Analysis has been possible for a long time, but doing it quickly and easily on a large set of data is where many hit problems. With this in mind, I wanted to share a couple of days’ work that I have done recently to improve this experience.

HANA Algorithms

With SAP HANA 1.0 SPS9 we now have 4 algorithms that can be used for Association Analysis, or Market Basket Analysis as it is more often known.

  • Apriori
  • Apriori Lite
  • FP-Growth
  • KORD

I’m not going to get into the technical details of the differences between these; the HANA PAL documentation provides more detail on each.  I plan to write a follow-up blog that will explore those differences.

This blog post was created using SAP Predictive Analytics v1.21 with SAP HANA 1.0 Revision 85. I have performed this on a data set of 80+ million records; some may call this Big Data, others may not.  The data required for this task is straightforward: you just need 2 columns, the item or items purchased and the transaction number.
Source Data.png
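
To make the two-column layout concrete, here is a minimal Python sketch of how such rows roll up into baskets. The sample rows and the name `build_baskets` are made up for illustration; this is not the code that PA or HANA runs, just a way to picture the input.

```python
from collections import defaultdict

# Hypothetical sample rows: (transaction number, item purchased).
# Real data would be read from the HANA source table instead.
rows = [
    (1001, "lager"),
    (1001, "crisps"),
    (1002, "ale"),
    (1002, "crisps"),
    (1002, "shandy"),
    (1003, "lager"),
]


def build_baskets(rows):
    """Group (transaction, item) rows into one set of items per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)


baskets = build_baskets(rows)
for transaction_id, items in sorted(baskets.items()):
    print(transaction_id, sorted(items))
```

Each transaction number becomes one basket, which is the unit the association algorithms reason about.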

Typically with SAP HANA you may build an Analytic View or Calculation View to invoke the HANA OLAP engine. However, OLAP is typically about aggregating data and analysing it by slicing, dicing and drilling into the data set.  For Basket Analysis the opposite is true: you want to feed in the base transactions so that you can see exactly what people purchased in a single basket.

This can be fed into SAP Predictive Analysis (PA). To use the in-built HANA Predictive libraries you need to “Connect to SAP HANA”.

Connect to HANA Full.png


Select your source data

HANA Source Data.png


Drag in the HANA Apriori node
Apriori Node in PA.png


Configure the node parameters, which are fairly self-explanatory.  Depending upon the basket size in your data set, you may need to set the support quite low.

Actual Apriori Parameters.png
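
For readers new to the terms, here is a rough Python sketch of what support and confidence measure. The baskets and function names are hypothetical, and it uses the standard textbook definitions rather than anything taken from the HANA implementation.

```python
# Hypothetical baskets, one set of items per transaction.
baskets = [
    {"lager", "crisps"},
    {"ale", "crisps", "shandy"},
    {"lager"},
    {"lager", "crisps", "ale"},
    {"shandy"},
]


def support(baskets, itemset):
    """Share of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing antecedent, the share that also contain consequent."""
    antecedent_support = support(baskets, antecedent)
    if antecedent_support == 0:
        return 0.0
    combined = set(antecedent) | set(consequent)
    return support(baskets, combined) / antecedent_support


pair_support = support(baskets, {"lager", "crisps"})
rule_confidence = confidence(baskets, {"lager"}, {"crisps"})
print(f"support={pair_support:.2f} confidence={rule_confidence:.2f}")
```

A rule only survives if its support reaches the minimum you configure. With many distinct items, even a popular combination appears in a small share of all baskets, which is why the threshold often has to be low.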


Save the output result rules to a new HANA table so we can easily analyse the rules generated.

HANA Writer Node.png


You can run the analysis and receive the output in just a couple of minutes.

Execution Status.png


Switching to the results view, we receive some pre-built analysis showing the rules that have been generated.

PA Results v2.png

Predictive Analysis Association Chart – Tag Cloud

PA Tag Cloud v2.png


The output is fairly readable, but you can analyse the generated rules further to gain more insight into what your data is telling you.
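
As one example of further analysis, the sketch below ranks exported rules by lift and drops those at or below a threshold. It assumes the rules have been loaded into Python as dictionaries; the key names, the numbers and the helper `top_rules` are all invented for illustration and are not taken from the result table above.

```python
# Hypothetical rules, e.g. loaded from the saved rules table.
# All values are illustrative, not real results.
rules = [
    {"antecedent": "lager", "consequent": "crisps",
     "support": 0.40, "confidence": 0.67, "lift": 1.11},
    {"antecedent": "ale", "consequent": "crisps",
     "support": 0.40, "confidence": 1.00, "lift": 1.67},
    {"antecedent": "shandy", "consequent": "lager",
     "support": 0.05, "confidence": 0.20, "lift": 0.33},
]


def top_rules(rules, min_lift=1.0, limit=10):
    """Keep rules whose lift exceeds min_lift, strongest first."""
    kept = [rule for rule in rules if rule["lift"] > min_lift]
    kept.sort(key=lambda rule: rule["lift"], reverse=True)
    return kept[:limit]


best = top_rules(rules)
for rule in best:
    print(f'{rule["antecedent"]} -> {rule["consequent"]} (lift {rule["lift"]:.2f})')
```

A lift above 1 suggests the items appear together more often than chance alone would explain, so it is a common first filter before looking at individual rules.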


      Patrick Bachmann

      Propeller head?  Lol.  I like it.

      Henry Banks

      Yes, I like to munch a packet of crisps after a shandy, an ale and a lager 🙂

      Former Member

      Hi Henry,

      Both the blogs (Part 1 and Part 2) are good and interesting. I want to try R-Apriori in SAP PA, as I do not yet have access to HANA PAL. Could you please suggest how I can save the result to re-use it for advanced prediction, like in your Part 2? Especially the table?



      Small set.png

      Former Member

      Hey Sudeepti

      One way would be this: in the Predict tab, after you have attached your algorithm to your dataset and run it successfully, attach another component called CSV Writer (under Data Writers) and configure it. Then run the model. A CSV file will be generated, which can be modified as required and consumed in PA/Lumira.

      Former Member

      Hi Ranajay,

      Thank you!

      I am able to do that.



      Former Member

      Glad I could help!

      Former Member

      Hi Ranajay, Henry,

      I am able to use Expert Analytics for MBA (Market Basket Analysis) on a 2-column dataset, i.e. 1 column with transaction IDs and another one with items. Can we also apply Apriori to a dataset that is in transactions format? I tried using the 'Tabular format' option but it is not working. No rules are generated.



      Former Member

      Hi Sudeepti

      That depends on the data. You need to analyse the data first: check whether the columns you are selecting in the configuration of the algorithm really make sense, and whether there are any patterns there. Apriori is beneficial when you have more correlated dimensions in which to identify hidden associations.

      Former Member

      Hi Ranajay,

      This is the data set I am talking about. I have taken it from the arules package in R. I am not able to apply Apriori directly on this. I am sure I might be missing something in PA. Please help out.


      Sudeepti  Groceries data set.png

      Ian Henry
      Blog Post Author

      Hi Sudeepti,

      It sounds like you are doing the correct thing with the "Tabular Format" Input Data Format. Is there an error or just no rules being generated?

      If there are no rules try reducing the support and confidence values.

      Try 0.1 or even 0.01 for both if the combinations are not that common.

      Erik MARCADE

      And by the way, perhaps you should use the Recommendation engine from automated PA 2.2. This is the technique that came third in the Million Song Dataset Challenge; it is much faster than Apriori, easier to use, and offers the possibility of in-database apply.

      Former Member

      Sure, I already tried with the 2-column data set and it is pretty good. Will try the transactions data set and get back. 🙂