Enhancing Market Basket Analysis with PA 2.0 and S...

Ian_Henry · ‎03-31-2015

In previous blog posts, I've described the steps to perform Basket Analysis with some SAP tools:

Basket Analysis with SAP Predictive Analysis and SAP HANA - Part 1

Basket Analysis with SAP Predictive Analysis and SAP HANA - Part 2: Visualisation of Results

The SAP HANA Effect – Basket Analysis 60x Faster

This process included using Apriori or Association Rules to perform Affinity Analysis. Visualising the results is quick and easy using your BI tool of choice, such as SAP Lumira. We can reduce the time that it takes to perform this from hours to a few seconds, or to minutes if we want to look across a larger data set.

Market Basket Analysis (MBA) is interesting as a standalone activity, but where it gets more compelling is when you can really find trends, identify unknown relationships, and discover new business opportunities. Using MBA for both Cross-Sell and Up-Sell is very common, and we are all familiar with Amazon, who do this very well at the bottom of almost every page. MBA can be used to optimise store layouts and website designs so that your customers can easily find those items that are frequently bought together. On your website, having multiple layouts should be easy, and having these change automatically depending on day, time, etc. should also be easy. This can be done on either a temporary or a permanent basis. Perhaps you discover a totally different set of baskets being sold Monday to Thursday as opposed to the weekend period; maximising this opportunity may warrant a different user experience. MBA can also be used to identify Driver Items, or PreRules as we call them: the initial item that causes the subsequent items to be sold in the same basket. Another use of MBA is to help classify the basket or shopping trip as a whole: what was the reason that person went shopping? It still may not be black and white, but MBA can give some insight into this classification process.

One of the advantages of basket analysis is that it is so easy to perform. You require just 2 columns of data, so some of the usual data hurdles are removed. As there are only 2 input columns, the TRANSACTION and the ITEM, it can be done by almost anyone.
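To make the two-column input concrete, here is a minimal sketch in plain Python of how support, confidence and lift can be derived from nothing more than (TRANSACTION, ITEM) rows. It only considers single-item PreRules and PostRules, and the function name `pair_rules` and the sample data are my own illustrative assumptions, not part of PA 2.0 or SAP HANA.

```python
from collections import defaultdict
from itertools import permutations

def pair_rules(rows, min_support=0.0):
    """rows: iterable of (transaction_id, item) pairs.

    Returns {(pre, post): (support, confidence, lift)} for single-item rules.
    Hypothetical helper for illustration only.
    """
    # Group the two input columns into one set of items per basket.
    baskets = defaultdict(set)
    for transaction, item in rows:
        baskets[transaction].add(item)
    n = len(baskets)

    item_count = defaultdict(int)   # baskets containing each item
    pair_count = defaultdict(int)   # baskets containing both pre and post
    for items in baskets.values():
        for item in items:
            item_count[item] += 1
        for pre, post in permutations(sorted(items), 2):
            pair_count[(pre, post)] += 1

    rules = {}
    for (pre, post), count in pair_count.items():
        support = count / n                        # share of all baskets
        if support < min_support:
            continue
        confidence = count / item_count[pre]       # P(post | pre)
        lift = confidence / (item_count[post] / n) # confidence vs. baseline
        rules[(pre, post)] = (support, confidence, lift)
    return rules

# Made-up sample data: four baskets.
rows = [
    ("T1", "bread"), ("T1", "butter"),
    ("T2", "bread"), ("T2", "butter"), ("T2", "milk"),
    ("T3", "bread"),
    ("T4", "milk"),
]
rules = pair_rules(rows)
```

In this sample, every basket containing butter also contains bread, so the rule butter → bread has a confidence of 1.0, while bread → butter is weaker because bread is also sold on its own.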

One of the disadvantages of basket analysis is that it is so easy: the algorithms typically do not support additional input variables, so it can be more challenging to capitalise on your findings. For example, the output data does not show you which underlying transactions were used to generate a particular rule, so you can't easily attribute revenue, margin, discount, profit, promotional amount, time or location to the rules that have been generated. This does not mean it can't be done; it just means that you don't get it "for free" with the analysis.
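As a rough illustration of the extra work involved, the sketch below traces a rule back to the transactions that contain all of its items and sums their revenue. It assumes the transaction lines carry a revenue value alongside the transaction and item; the function name `revenue_for_rule` and the sample figures are hypothetical, and in practice this would more likely be a join in a HANA view than Python code.

```python
from collections import defaultdict

def revenue_for_rule(lines, pre_items, post_items):
    """lines: iterable of (transaction_id, item, revenue) tuples.

    Returns (matching_transaction_ids, total_revenue) for baskets that
    contain every PreRule and PostRule item. Illustrative helper only.
    """
    items_by_txn = defaultdict(set)
    revenue_by_txn = defaultdict(float)
    for txn, item, revenue in lines:
        items_by_txn[txn].add(item)
        revenue_by_txn[txn] += revenue  # whole-basket revenue

    needed = set(pre_items) | set(post_items)
    matching = sorted(t for t, items in items_by_txn.items() if needed <= items)
    return matching, sum(revenue_by_txn[t] for t in matching)

# Made-up sample data.
lines = [
    ("T1", "bread", 2.0), ("T1", "butter", 1.5),
    ("T2", "bread", 2.0), ("T2", "butter", 1.5), ("T2", "milk", 1.0),
    ("T3", "bread", 2.0),
]
transactions, revenue = revenue_for_rule(lines, ["bread"], ["butter"])
```

The same pattern extends to margin, discount or any other measure held on the transaction line; the design choice here is to attribute the whole basket's revenue to the rule, which is only one of several reasonable options.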

Here are three further techniques that I've incorporated into the Basket Analysis process:

1. Filtering the input data set for targeted analysis. Perhaps you want to look at just one or two product categories, or only at weekend sales, a promotion or a time of day. This is the easiest technique to achieve: some of these selections can be applied directly, while others may take a small amount of data modelling. Filtering the dataset reduces the data volume, which speeds up the analysis and can cut the execution time to just a few seconds.

2. Applying simple data modelling to the output data set. The results of basket analysis are only truly valid at the level at which the analysis was performed. For example, if you were analysing SKUs, the rules created would also hold true for the parent members of the Product hierarchy, but the support, confidence and lift values could only be used as an indication, because they were calculated for the underlying base products. The easiest way to build an interesting model with the output dataset is to either use the PreRule or use Apriori Lite, which restricts the PostRule to a single item.

3. Manipulating the input data set. This is also easy to do and is perhaps the most powerful technique, but it is also the most computationally expensive. As we can run basket analysis end to end in just 20 seconds, or 180 seconds for almost 100 million records, this is not such a problem anymore. For example, if we wish to split the day into 5 bands, perhaps covering Morning, Lunchtime, Afternoon, Evening and Late Night, we can perform Basket Analysis across these. We can now do this in a single execution of the predictive process. Be aware that having 5 time bands will take more processing than before. How about if we wanted to do this at a store level and we had 200+ stores? Previously the time taken would have been prohibitive, but by using the combination of SAP HANA and PA 2.0 this is now easily achievable.

When we manipulate the input dataset, we can append the time period or store to the input data record, so that this information is retained throughout the analysis process. The input data now contains 3 pieces of information: the Store, Item and Transaction. We can then use the output data within a HANA Calculation View to join the store with the store hierarchy, and we're able to slice and dice the data looking for the best and worst performing stores. We can look at regional variations, compare the different store types, and identify those that respond best to campaigns and those that require individual treatment due to the baskets being sold. Thanks go to ranm.bittmann for advising me here.
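The idea of tagging each input record with a time band and a store can be sketched as follows. This is plain Python for illustration, not how the HANA model itself is built; the band boundaries are my own assumption (the bands are named above, but their hours are not), and `time_band` and `segment_rows` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime

# Assumed start hours for each band; adjust to suit the business.
BANDS = [(6, "Morning"), (11, "Lunchtime"), (14, "Afternoon"),
         (18, "Evening"), (22, "Late Night")]

def time_band(ts):
    """Map a timestamp to one of the five band names."""
    label = "Late Night"  # hours before 06:00 belong to the previous night
    for start_hour, name in BANDS:
        if ts.hour >= start_hour:
            label = name
    return label

def segment_rows(records):
    """records: iterable of (store, timestamp, transaction_id, item).

    Returns {(store, band): [(transaction_id, item), ...]} so that each
    segment keeps the usual two-column shape. Use one timestamp per
    transaction, otherwise a basket could be split across two bands.
    """
    segments = defaultdict(list)
    for store, ts, txn, item in records:
        segments[(store, time_band(ts))].append((txn, item))
    return dict(segments)

# Made-up sample data.
records = [
    ("S001", datetime(2015, 3, 30, 8, 15), "T1", "coffee"),
    ("S001", datetime(2015, 3, 30, 8, 15), "T1", "croissant"),
    ("S002", datetime(2015, 3, 30, 12, 40), "T2", "sandwich"),
]
segments = segment_rows(records)
```

Each segment can then be analysed on its own, and because the store and band travel with the results, they remain available for joining to a store hierarchy afterwards.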

The agility of SAP HANA is one of the key differentiators here. All 3 of the above enhancements were achieved quickly, easily and graphically with HANA Modelling. Because the HANA models are purely virtual views, they don't physically store data, so they require no further storage, no additional steps and no batch jobs when you want to re-run the Basket Analysis.

Enhancing Market Basket Analysis with PA 2.0 and SAP HANA
