Have you ever wondered how retail websites like amazon.com suggest and recommend other products when you are looking at an item on the site? Before I share how it is done, let me lay out some background.


Every business, and especially every retail business, is always looking to answer questions about its customers, such as: How should products be bundled? Which customers are most likely to leave? What are customers talking about? Who are the most valuable customers? How can marketing campaign response be improved?

To respond to these questions, businesses collect huge amounts of data on customers and products, with objectives such as customer acquisition, customer retention, upsell, or cross-sell. Data mining and analysis on this data can potentially answer these questions.

These answers are reached through predictive analysis: spotting patterns in the data and mining it with various clustering, classification, and association analysis techniques to uncover unexpected insights or facts.

Traditional data warehouse implementations, and the data mining and predictive analysis built on them, have typically been employed only by large businesses, i.e. only those that could support a fairly complex information management infrastructure.

Now, with the availability of the SAP HANA technology platform, such analytics are within easy reach of any information worker. It becomes even easier for the business user, since these analytics can also be provided as a HANA Cloud Platform solution.

I have been able to successfully build such applications on HANA Cloud Platform. They only require a data file from the business user; all customer analysis can then be performed in real time and the results shared instantly, secured by single sign-on.

Typical customer analytics include:

  1. Customer segmentation, which lets analysts understand the landscape of the market in terms of customer characteristics, see whether customers can naturally be grouped into segments that have something in common, and run a distinct marketing campaign for each segment.
  2. Product segmentation, which optimizes product offerings using product affinity, in most cases through Market Basket Analysis.
  3. Voice of customer and sentiment analysis on customer feedback and social media. You may see my earlier blog on this topic, "Voice of Customer, Sentiment analysis & Feedback service".
  4. Customer churn analysis, which predicts whether a business is going to lose any segment of its existing customers. Read about employee churn or turnover in my blog "Will your Employees leave in (say) next 2 years?", which is an adaptation of the customer churn example.
  5. Upsell and cross-sell, which aim to offer existing customers additional or higher-value products. This is the category that the recommendations on sites like Amazon.com fall into.

To implement it, I chose an association analysis predictive algorithm, which uncovers hidden patterns and correlations among a set of items or objects. It helps you understand which products and services customers tend to purchase together. By analyzing the purchasing trends of your customers with association analysis, you can predict their future behavior.

To perform association analysis, you need the transaction history, i.e. a list of sales order items; the analysis can then be performed in real time in the HANA in-memory database. The analysis looks at products bought together or bought over a period of time (not necessarily on the same date and time) and identifies rules. Examples of such rules: 98% of customers who purchase tires also get automotive services done, or customers who buy mustard and ketchup also buy burgers.
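Before any rules can be found, the sales order line items have to be grouped into baskets. The following is a minimal sketch of that step in plain Python, not the HANA implementation; the order IDs, product names, and the `build_baskets` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical sales order line items as (order_id, product_id) pairs.
line_items = [
    ("SO-1", "tires"),
    ("SO-1", "auto_service"),
    ("SO-2", "mustard"),
    ("SO-2", "ketchup"),
    ("SO-2", "burgers"),
    ("SO-3", "tires"),
    ("SO-3", "tires"),  # the same product on two lines of one order
]

def build_baskets(items):
    """Group line items into one set of distinct products per order."""
    baskets = defaultdict(set)
    for order_id, product_id in items:
        baskets[order_id].add(product_id)
    return dict(baskets)

baskets = build_baskets(line_items)
for order_id, products in sorted(baskets.items()):
    print(order_id, sorted(products))
```

To analyze products bought over a period of time rather than in a single order, the same grouping could be keyed on a customer ID (and a time window) instead of the order ID.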

Such analysis establishes rules based on past behavior, and proposals such as "frequently bought together" or "customers who bought this item also bought" can be made from these rules. A rule therefore takes the form: if a customer buys a specific product (or products), then the customer also buys another product (or products).

Such proposals are helpful in retail not only for providing recommendations or cross-selling to customers, but also for store layout, planning around buying patterns, and add-on sales.

The key to using these rules or patterns is understanding how useful they are. The statistical indicators of usefulness are Support, Confidence, and Lift.

Support indicates how frequently the buying pattern occurs: the higher the percentage, the more frequent the purchasing pattern. For example, out of all sales transactions, how many contained mustard, ketchup, and burgers?

Confidence indicates how certain the rule is. For example, out of all the transactions that contain mustard and ketchup, how many also contained burgers? There can be transactions where customers bought mustard and ketchup but did not buy burgers.
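The two indicators can be sketched in a few lines of Python. This is an illustration on five made-up transactions, not the HANA algorithm; the `support` and `confidence` helper names are my own.

```python
# Hypothetical transactions; each set is one basket.
transactions = [
    {"mustard", "ketchup", "burgers"},
    {"mustard", "ketchup", "burgers"},
    {"mustard", "ketchup"},
    {"burgers"},
    {"burgers", "buns"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent. Assumes the antecedent occurs at least once."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# 2 of 5 transactions contain mustard, ketchup and burgers.
rule_support = support({"mustard", "ketchup", "burgers"}, transactions)
# 3 transactions contain mustard and ketchup; 2 of those also contain burgers.
rule_confidence = confidence({"mustard", "ketchup"}, {"burgers"}, transactions)

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```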

High Support and Confidence values are a good indicator of a useful rule. However, this is not always true. In many transactions, customers may have bought the "other" product anyway, whether or not they bought the specific product in the first place. For example, they might buy burgers very often without buying mustard and ketchup, which weakens the value of the rule that if mustard and ketchup are bought then so are burgers. So we need to compare the confidence with another indicator: the share of all transactions that contain the other product (burgers in our example).

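The ratio of confidence to this new indicator is called Lift. A Lift value greater than 1 means the products are bought together more often than would be expected if the purchases were unrelated, which is what makes the rule useful.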
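Written as a formula, with a worked example on hypothetical counts: assume 5 transactions in total, 3 of which contain mustard and ketchup, 2 of those 3 also contain burgers, and burgers appear in 4 of the 5 transactions overall.

```latex
\[
\mathrm{lift}(A \Rightarrow B)
  = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)}
\]
\[
\mathrm{confidence} = \frac{2}{3} \approx 0.67, \qquad
\mathrm{support}(\text{burgers}) = \frac{4}{5} = 0.80, \qquad
\mathrm{lift} = \frac{2/3}{4/5} = \frac{5}{6} \approx 0.83
\]
```

In this made-up case the confidence looks reasonable at 67%, but the lift is below 1: customers buy burgers 80% of the time anyway, so knowing that they bought mustard and ketchup does not make burgers more likely.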

Running association analysis on several thousand transactions often results in multiple rules. It is advisable to break the problem into two steps: first find the itemsets that meet a minimum support, called large itemsets, and then use these large itemsets to find rules with at least a minimum confidence.
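The two steps can be sketched as follows. This is a small, level-wise (Apriori-style) illustration in plain Python on made-up data; it is not the algorithm implementation used in HANA, and the function names and thresholds are assumptions chosen for the example.

```python
from itertools import combinations

# Hypothetical transactions; each set is one basket.
transactions = [
    {"mustard", "ketchup", "burgers"},
    {"mustard", "ketchup", "burgers"},
    {"mustard", "ketchup"},
    {"burgers", "buns"},
    {"buns"},
]

def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for itemsets meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    current = [frozenset([i]) for i in items]  # start with single items
    while current:
        level = {}
        for candidate in current:
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= min_support:
                level[candidate] = count / n
        frequent.update(level)
        # Next candidates: unions of two large itemsets that are one item bigger.
        keys = list(level)
        size = len(keys[0]) + 1 if keys else 0
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == size})
    return frequent

def rules_from_itemsets(frequent, min_confidence):
    """Step 2: derive (antecedent, consequent, support, confidence) rules."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                # Every subset of a large itemset is itself large,
                # so its support is already in the dictionary.
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules

large = frequent_itemsets(transactions, min_support=0.4)
rules = rules_from_itemsets(large, min_confidence=0.6)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={sup:.2f} confidence={conf:.2f}")
```

Filtering on support first keeps the second step cheap, because rules are only generated from the comparatively few itemsets that occur often enough to matter.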

For example, see the image below, in which the first row shows the rule that whenever a customer purchases product 15686, the customer also purchases product 15692. This rule has a support value of over 46% (i.e. it occurs quite often) and a confidence value of 95%.

[Image: Shopping basket rules 1.jpg]

In the attached video, I share the results from my test application in a table display. To keep the data private, I display SAP product numbers, though product descriptions would make more sense to the business user.

The rules are available in both JSON and Predictive Model Markup Language (PMML).

A REST web service can easily be made available on HANA so that any website application can read these rules.

My testing was done on roughly 50,000 transactions (sales line items) of test data, extracted from an SAP ECC system for a chosen time period. Choosing different time periods allowed me to do comparative trend analysis of this predictive behavior analysis, and it could be the basis of further what-if analysis for planning sales in future time periods for different regions, sales areas, and/or customer groups. For example, compare the purchasing behavior of your customers in Alaska with that of customers in Florida in the month of May over the past few years. Can you spot a pattern?
