
Introduction

Market Basket Analysis, or Basket Analysis for short, is one of those things that people have talked about forever; if you search online, you’ll find examples from the 80s and the early 90s.  One thing to note: Basket Analysis should really be classed as Advanced Analytics. On its own it’s not really Predictive Analytics, as you are not predicting anything; it is more data mining, but all 3 terms are often used interchangeably.  What a predictive tool or predictive algorithm gives you is easy access to this type of analysis without having to be a SQL guru, propeller head or even a data scientist.

Basket Analysis has been possible for a long time, but doing it quickly and easily on a large set of data is where many hit problems. With this in mind I wanted to share a couple of days’ work that I have done recently to improve this experience.


HANA Algorithms
With SAP HANA SP9 we now have 4 algorithms that can be used for Association Analysis, or Market Basket Analysis as it is more often known:

  • Apriori
  • Apriori Lite
  • FP-Growth
  • KORD

I’m not going to get into the technical differences between these; the HANA PAL documentation provides more detail on each.  I plan to write a follow-up blog that explores the differences between them.

For this example I used SAP Predictive Analytics v1.21 with SAP HANA Revision 85, on a data set of 80+ million records; some may call this Big Data, others may not.  The data required for this task is straightforward: you just need 2 columns, the transaction number and the item purchased, with one row per item in each transaction.
Source Data.png
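
As a rough illustration of that two-column layout, the sketch below groups the rows into one basket per transaction, which is conceptually what an association algorithm works on. It is plain Python with made-up transaction numbers and items; nothing here comes from the actual data set or from HANA, and the function name `to_baskets` is purely hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows in the two-column layout: (transaction number, item).
rows = [
    (1001, "bread"),
    (1001, "milk"),
    (1002, "bread"),
    (1002, "butter"),
    (1002, "milk"),
    (1003, "beer"),
]


def to_baskets(rows):
    """Group (transaction, item) rows into one set of items per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)


baskets = to_baskets(rows)
for transaction_id, items in sorted(baskets.items()):
    print(transaction_id, sorted(items))
```

Each basket is a set, so an item that appears twice in the same transaction is counted only once.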

Normally with SAP HANA you might build an Analytical View to invoke the HANA OLAP engine. However, OLAP is typically about aggregating data and then analysing it by slicing, dicing and drilling into the data set.  For Basket Analysis the opposite is true: you want to feed in the base transactions so that you can see exactly what people purchased in a single basket.

You can then feed this into SAP Predictive Analysis (PA). If you want to use the in-built HANA Predictive libraries you need to “Connect to SAP HANA”

Connect to HANA Full.png


Select your source data

HANA Source Data.png


Drag in the HANA Apriori node
Apriori Node in PA.png


Configure the node parameters, which are fairly self-explanatory.  Depending upon the basket size in your data set, you may need to set the support quite low.

Actual Apriori Parameters.png
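
To see why the support threshold often has to be low, it can help to work out support and confidence by hand. The following is a minimal sketch in plain Python on invented baskets. It is not how HANA PAL computes its rules, and the function names are hypothetical.

```python
# Invented baskets, one set of items per transaction (illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction also containing `consequent`."""
    antecedent_support = support(antecedent, baskets)
    if antecedent_support == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / antecedent_support


rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
print(f"support(bread, milk)     = {rule_support:.2f}")     # 0.40
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")  # 0.67
```

With many distinct items, any one combination appears in only a small fraction of all transactions, so its support can be very small even when the rule is useful. A rule is only reported if it clears both thresholds, which is why an overly high support setting can return no rules at all.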


I have then chosen to output the resulting rules to a new HANA table so that we can easily analyse the rules generated.

HANA Writer Node.png


You can then run the analysis and receive the output in just a couple of minutes.

Execution Status.png


If you switch to the results view you then get some pre-built analysis showing the rules that have been generated.

PA Results v2.png

Predictive Analysis Association Chart – Tag Cloud

PA Tag Cloud v2.png


As you can see, the output is fairly readable, but you may still want to analyse the generated rules further to understand what they are telling you.


I have now written some follow-up blogs: Part 2 Visualisation of Results, and another looking at The SAP HANA Effect. I have also looked at Enhancing Market Basket Analysis with PA 2.0 and SAP HANA. In the future I also plan to explore the other HANA algorithms and compare their results and how they perform.


12 Comments


  1. Sudeepti Bandi

    Hi Henry,

    Both the blogs (Part 1 and Part 2) are good and interesting. I want to try R-Apriori in SAP PA, as I do not yet have access to HANA PAL. Could you please suggest how I can save the result to re-use it for advanced prediction, like in your Part 2? Especially the table?

    Regards

    Sudeepti

    Small set.png

    1. Ranajay Sit

      Hey Sudeepti

      One way to do this: in the Predict tab, after you have attached your algorithm to your dataset and run it successfully, attach another component called CSV Writer (under Data Writers) and configure it. Then run the model. A CSV file will be generated, which can be used, modified as required and consumed in PA/Lumira.

      1. Sudeepti Bandi

        Hi Ranajay, Henry,

        I am able to use Expert Analytics for MBA (Market Basket Analysis) on a 2-column dataset, i.e. 1 column with transaction IDs and another one with items. Can we also apply Apriori to a dataset that is in transactions format? I tried using the ‘Tabular format’ option but it is not working. No rules are generated.

        Regards

        Sudeepti

        1. Ranajay Sit

          Hi Sudeepti

          That depends on the data. You need to analyse the data first. That is, you need to check whether the columns you are selecting in the configuration of the algorithm really make sense, and whether there are any patterns there. Apriori is beneficial when you have some correlated dimensions in which to identify hidden associations.

          1. Sudeepti Bandi

            Hi Ranajay,

            This is the data set I am talking about. I have taken this from the arules package in R. I am not able to apply Apriori directly on this. I am sure I might be missing something in PA. Please help out.

            Regards

            Sudeepti

            Groceries data set.png

            1. Ian Henry Post author

              Hi Sudeepti,

              It sounds like you are doing the correct thing with the “Tabular Format” Input Data Format. Is there an error or just no rules being generated?

              If there are no rules try reducing the support and confidence values.

              Try 0.1 or even 0.01 for both if the combinations are not that common.

  2. Erik MARCADE

    And by the way, perhaps you should use the Recommendation engine from automated PA 2.2. This is the technique that came third in the Million Song Dataset Challenge. It is much faster than Apriori, easier to use, and offers the possibility of in-database apply.

