Ian Henry

Enhancing Market Basket Analysis with PA 2.0 and SAP HANA

Market Basket Analysis (MBA) is interesting as a standalone activity, but where it gets more compelling is when you can find trends, identify unknown relationships, and discover new business opportunities. Using MBA for both Cross-Sell and Up-Sell is very common, and we are all familiar with Amazon, who do this very well at the bottom of almost every page. MBA can be used to optimise store layouts and website designs so that your customers can easily find those items that are frequently bought together. On your website, having multiple layouts should be easy, and having these change automatically depending on day, time, etc. should also be easy. This can be done on either a temporary or a permanent basis. Perhaps you discover a totally different set of baskets being sold Monday to Thursday as opposed to the weekend period; maximising this opportunity may warrant a different user experience. MBA can also be used to identify Driver Items, or PreRules as we call them, which are the initial items that cause the subsequent items to be sold in the same basket. Another use of MBA is to help classify the basket or shopping trip as a whole: what was the reason that person went shopping? It still may not be black and white, but MBA can give some insight into this classification process.

I’ve already shown, in Basket Analysis with SAP Predictive Analysis and SAP HANA – Part 1, how easy it is to use Apriori or Association Rules to perform Association Analysis using SAP Predictive Analytics (PA 2.0) with SAP HANA. Visualising the results is also quick and easy using Lumira or Expert Analytics, as shown in Basket Analysis with SAP Predictive Analysis and SAP HANA – Part 2: Visualisation of Results. By doing so we can cut the time it takes to perform this from hours to a few seconds, or a few minutes if we want to look across a larger data set.

One of the advantages of basket analysis is that it is so easy to perform: you require just 2 columns of data, so some of the usual data hurdles are removed. As there are only 2 input columns, the TRANSACTION and the ITEM, it really can be done by anyone.
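To make the two-column input concrete, here is a minimal sketch in plain Python. It is not the Apriori implementation used by PA 2.0 or SAP HANA; it only counts single-item rules over a small, made-up data set. The sample rows and the function name `pair_rules` are assumptions for illustration. The three measures are the standard ones: support is the share of baskets containing both items, confidence is the share of baskets with the first item that also contain the second, and lift is confidence divided by the second item's overall share.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: (TRANSACTION, ITEM) rows, the only two columns needed.
rows = [
    (1, "bread"), (1, "butter"), (1, "milk"),
    (2, "bread"), (2, "butter"),
    (3, "bread"), (3, "milk"),
    (4, "butter"), (4, "milk"),
    (5, "bread"), (5, "butter"),
]

def pair_rules(rows):
    """Return {(pre, post): (support, confidence, lift)} for single-item rules."""
    # Group the two-column rows into one set of items per transaction.
    baskets = defaultdict(set)
    for transaction, item in rows:
        baskets[transaction].add(item)
    n = len(baskets)

    # Count how many baskets contain each item and each pair of items.
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in baskets.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    # Each pair yields two directed rules: a -> b and b -> a.
    rules = {}
    for (a, b), both in pair_count.items():
        for pre, post in ((a, b), (b, a)):
            support = both / n
            confidence = both / item_count[pre]
            lift = confidence / (item_count[post] / n)
            rules[(pre, post)] = (support, confidence, lift)
    return rules

rules = pair_rules(rows)
```

With this sample, the rule bread -> butter appears in 3 of the 5 baskets, and 3 of the 4 baskets containing bread also contain butter. A real Apriori run would also prune by minimum support and handle rules with more than one item on either side.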

One of the disadvantages of basket analysis is that it is so simple: the algorithms typically do not support additional fields, so it can be more challenging to capitalise on your findings. For example, the output data does not show you which underlying transactions were used to generate a particular rule, so you can’t easily attribute revenue, margin, discount, profit, promotional amount, time or location to the rules that have been generated. This does not mean it can’t be done; it just means that you don’t get it “for free” with the analysis.
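One way to recover that link after the fact is to go back to the line items and find the baskets that contain every item in a rule. The sketch below assumes a hypothetical line-item list with a REVENUE column; the data and the function name `attribute_revenue` are illustrative only, and in practice this join would more likely be done in the database than in Python.

```python
from collections import defaultdict

# Hypothetical line items: (TRANSACTION, ITEM, REVENUE).
line_items = [
    (1, "bread", 1.20), (1, "butter", 2.00), (1, "milk", 0.90),
    (2, "bread", 1.20), (2, "butter", 2.00),
    (3, "bread", 1.20), (3, "milk", 0.90),
]

def attribute_revenue(line_items, pre, post):
    """Return (matching transaction ids, total basket revenue) for rule pre -> post."""
    rule_items = set(pre) | set(post)
    baskets = defaultdict(set)
    revenue = defaultdict(float)
    for transaction, item, amount in line_items:
        baskets[transaction].add(item)
        revenue[transaction] += amount
    # A basket supports the rule when it contains every item in the rule.
    matches = sorted(t for t, items in baskets.items() if rule_items <= items)
    return matches, sum(revenue[t] for t in matches)

matches, total = attribute_revenue(line_items, pre=["bread"], post=["butter"])
```

Here the whole basket's revenue is attributed to the rule. Counting only the revenue of the rule's own items is an equally reasonable choice, depending on the question being asked.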

There are 3 further techniques that I’ve incorporated into the Basket Analysis process.

1. Filtering the input data set for targeted analysis. Perhaps you want to look at just one or two product categories, or only at weekend sales, a promotion or a time of day. This is the easiest to achieve: some of these selections can be applied directly, while others may take a small amount of data modelling. Filtering the dataset reduces the data volume, which speeds up the analysis and cuts the time it takes to just a few seconds.

2. Applying simple data modelling to the output data set. The results of basket analysis are only truly valid at the level at which the analysis is performed. For example, if you were analysing SKUs, the rules created would also hold true for the parent members of the Product hierarchy, but the support, confidence and lift values, which belong to the underlying base products, could only be used as an indication at those higher levels. The easiest way to build an interesting model with the output dataset is either to use the PreRule or to use Apriori Lite, which restricts the PostRule to a single item.

3. Manipulating the input data set. This is also easy to do, and is perhaps the most powerful technique, but also the most computationally expensive. As we can run basket analysis end to end in just 20 seconds, or 180 seconds for almost 100 million records, this is not such a problem anymore. For example, if we wish to split the day into 5 bands, perhaps covering Morning, Lunchtime, Afternoon, Evening and Late Night, we can perform Basket Analysis across these. We can now do this in a single execution of the predictive process. Be aware that having 5 time bands will take more processing than before. How about if we wanted to do this at a store level and we had 200+ stores? Previously the time taken would have been prohibitive, making this impossible, but with the combination of SAP HANA and PA 2.0 this is now easily achievable.
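The filtering in technique 1 and the time bands in technique 3 can be sketched as follows. This is plain Python rather than HANA modelling, and the band boundaries, the sample sales and the helper names `time_band` and `is_weekend` are all assumptions; the five band names are the only part taken from the text above.

```python
from datetime import datetime

# Assumed band boundaries (start hour, inclusive); only the band names come from the text.
BANDS = [(6, "Morning"), (11, "Lunchtime"), (14, "Afternoon"),
         (17, "Evening"), (21, "Late Night")]

def time_band(ts):
    """Map a timestamp to one of five bands; hours before 06:00 count as Late Night."""
    label = "Late Night"
    for start, name in BANDS:
        if ts.hour >= start:
            label = name
    return label

def is_weekend(ts):
    """Saturday and Sunday are weekday() 5 and 6."""
    return ts.weekday() >= 5

# Hypothetical sales rows: (TRANSACTION, ITEM, timestamp).
sales = [
    (1, "bread", datetime(2016, 1, 2, 12, 30)),  # Saturday lunchtime
    (1, "milk", datetime(2016, 1, 2, 12, 30)),
    (2, "bread", datetime(2016, 1, 4, 9, 0)),    # Monday morning
    (3, "wine", datetime(2016, 1, 3, 22, 15)),   # Sunday late night
]

# Technique 1: keep only weekend sales, leaving the usual two columns.
weekend_rows = [(t, item) for t, item, ts in sales if is_weekend(ts)]

# Technique 3: carry the time band alongside each row for later use.
banded_rows = [(t, item, time_band(ts)) for t, item, ts in sales]
```

In a HANA view the same logic would typically be a filter and a calculated column, so nothing needs to be materialised before the analysis runs.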

When we manipulate the input dataset, we can append the time period or store to the input data record, so that this information is retained throughout the analysis process. The input data now contains 3 pieces of information: the Store, the Item and the Transaction. We can then use the output data within a HANA Calculation View to join the store with the store hierarchy, and we’re able to slice and dice the data looking for the best and worst performing stores. We can look at regional variations, compare the different store types, and identify those that respond best to campaigns and those that require individual treatment due to the baskets being sold. Thanks go to Ran Bittmann for advising me here.
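One possible way to append the store, sketched here as an assumption rather than as the exact method used in PA 2.0, is to prefix each ITEM value with its store code. The algorithm still sees only two columns, every rule it produces is specific to one store, and the store can be split back out of the rule output for joining to the store hierarchy. The separator and the helper names `tag_items` and `split_tag` are hypothetical.

```python
def tag_items(rows, sep="|"):
    """Turn (TRANSACTION, ITEM, STORE) rows into two-column rows with ITEM = 'STORE|ITEM'."""
    return [(transaction, f"{store}{sep}{item}") for transaction, item, store in rows]

def split_tag(tagged_item, sep="|"):
    """Recover (store, item) from a tagged item found in the rule output."""
    store, item = tagged_item.split(sep, 1)
    return store, item

# Hypothetical input rows: (TRANSACTION, ITEM, STORE).
store_rows = [
    (1, "bread", "S01"), (1, "butter", "S01"),
    (2, "bread", "S02"), (2, "milk", "S02"),
]

tagged = tag_items(store_rows)
```

The separator must be a character that never appears in a store code. The same approach works for a time band in place of the store.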

The agility of SAP HANA is one of the key differentiators here. All 3 of the above enhancements were achieved quickly, easily and graphically with HANA Modelling. Because the HANA models are purely views, they don’t store data, so they require no further storage, no additional steps and no batch jobs to be executed when you want to re-run the Basket Analysis.
