Empower your business with the SQL-Based Recommender
SAP Predictive Analytics – SQL Based Recommender
The new SQL-Based Recommender released with SAP Predictive Analytics 3.2 is a powerful tool for optimizing your business.
The SQL-Based Recommender is a machine learning system capable of processing millions of transactions and predicting, quickly and accurately, the items users will most likely buy.
Why use a recommender system?
Recommender systems have become increasingly popular in recent years with the growth of online commerce, and are used in a variety of sales areas including movies, music, news, and books.
Free of the physical limits that restrict brick-and-mortar businesses, online businesses sell a number of products that is growing exponentially.
A bookstore, for example, has limited shelf space, can display only a small fraction of the books published, and typically features only a few recent bestsellers.
Online commerce can offer more choice through its larger distribution channel and leverage the “long tail” phenomenon, whereby products with low demand and low sales volume can collectively result in greater profitability than the few current bestsellers.
To succeed, online businesses must tailor product recommendations for individual customers based on a detailed analysis of their shopping carts.
This process consists of first extracting recurrent product choices among all customers to establish association rules, and then applying the rules to specific customers.
What is the SQL-Based Recommender?
The SQL-Based Recommender is part of APL (SAP Automated Predictive Library) starting with the recent release 3.2. It is based on the principles of the Apriori algorithm.
These principles first discover the probability of customers buying products together and express these combinations as association rules. An association rule tells you, for a customer who buys an item A (called the antecedent), which item B (called the consequent) that customer is most likely to buy next.
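To make the idea of an association rule concrete, here is a minimal Python sketch that mines pairwise rules from a handful of shopping baskets. It is not the APL implementation. The function name `mine_pair_rules`, the thresholds, and the choice of support and confidence as scores are assumptions made for illustration, and only single-item antecedents are considered.

```python
from collections import Counter
from itertools import permutations

def mine_pair_rules(baskets, min_support=2, min_confidence=0.5):
    """Return rules as (antecedent, consequent, support, confidence).

    support    = number of baskets containing both items
    confidence = support / number of baskets containing the antecedent
    (Hypothetical helper for illustration, not an APL function.)
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        # Count every ordered pair (A, B) of distinct items bought together.
        pair_counts.update(permutations(sorted(items), 2))

    rules = []
    for (a, b), support in pair_counts.items():
        confidence = support / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"butter"},
]
rules = mine_pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support}, confidence={confidence:.2f}")
```

In this toy data, every basket containing jam also contains bread, so the rule "jam -> bread" comes out on top with a confidence of 1.0.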
The SQL-Based Recommender lets you evaluate associations based on different metrics:
Relying on the discovered association rules, you can then propose to customers the items that they are the most likely to buy.
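As a rough illustration of this step, the following Python sketch applies a small set of rules to a customer's basket. The rule list, the `recommend` function, and the ranking by best matching confidence are all hypothetical choices made for the example. They do not describe how the SQL-Based Recommender scores items internally.

```python
from collections import defaultdict

def recommend(basket, rules, top_n=3):
    """Rank items the customer does not own yet by the best rule confidence.

    `rules` is a list of (antecedent, consequent, confidence) tuples.
    (Hypothetical helper for illustration, not an APL function.)
    """
    owned = set(basket)
    best = defaultdict(float)
    for antecedent, consequent, confidence in rules:
        # A rule fires when its antecedent is in the basket
        # and its consequent is not already there.
        if antecedent in owned and consequent not in owned:
            best[consequent] = max(best[consequent], confidence)
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical rules with made-up confidence values.
rules = [
    ("bread", "butter", 0.67),
    ("bread", "jam", 0.67),
    ("jam", "bread", 1.00),
    ("butter", "honey", 0.40),
]

print(recommend({"bread"}, rules))            # ['butter', 'jam']
print(recommend({"bread", "butter"}, rules))  # ['jam', 'honey']
```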
How do you use the SQL-Based Recommender?
SQL-Based Recommender is entirely implemented in SAP HANA. It offers the benefit of high-performance execution with all processing done at the core of the SAP HANA SQL engine.
Use of the SQL-Based Recommender is simple. Three procedures are available:
Please refer to the examples provided with the APL installation (folder samples/sql/procedure/apl_samples/recommender).
Using the SQL-Based recommender on a dataset of 49 million transactions
The “Million Song” dataset is a well-known public dataset for building a recommendation system.
https://www.kaggle.com/c/msdchallenge#description.
It contains 49 million transaction rows.
Number of rows | 48,373,586
Number of users | 1,019,318
Number of items | 384,546
The SQL-Based Recommender demonstrated impressive performance:
- 11 minutes to train the model on a transaction table of 49 million rows,
- 35 million association rules generated,
- 2 minutes to apply those association rules to 110 thousand users in 1.3 million transaction rows,
- 43 million items recommended (about 500 recommended items per user).
This experiment was done on a physical SAP HANA machine:
- 80 logical CPUs / 40 cores / Intel(R) Xeon(R) CPU E7-4870 2.4GHz,
- 512GB RAM,
- SAP HANA version 1.0 SPS 12.
The accuracy, expressed as MAP (Mean Average Precision), is 15.94%, which is 7 times higher than that of recommendations based on best sellers only.
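For readers unfamiliar with the metric, here is a minimal Python sketch of how MAP can be computed from ranked recommendations. It is one common formulation and not necessarily the exact scoring used in this experiment. The function names, the optional cutoff `k`, and the normalization by `min(len(relevant), k)` are assumptions, and each recommendation list is assumed to contain no duplicates.

```python
def average_precision(recommended, relevant, k=None):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    ranked = recommended if k is None else recommended[:k]
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of each hit
    denominator = len(relevant) if k is None else min(len(relevant), k)
    return total / denominator

def mean_average_precision(recs_by_user, relevant_by_user, k=None):
    """Mean of the per-user average precision over all evaluated users."""
    users = list(relevant_by_user)
    if not users:
        return 0.0
    scores = [
        average_precision(recs_by_user.get(user, []), relevant_by_user[user], k)
        for user in users
    ]
    return sum(scores) / len(scores)

# Made-up toy data: u1 has hits at ranks 1 and 3, u2 has one hit at rank 2.
recs = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
truth = {"u1": {"a", "c"}, "u2": {"y"}}
print(round(mean_average_precision(recs, truth), 4))  # 0.6667
```

The metric rewards placing relevant items early in the list: user u1 scores (1/1 + 2/3) / 2 ≈ 0.83, while user u2, whose only hit is at rank 2, scores 0.5.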
Another way to evaluate the accuracy is to plot Precision against Recall. These two metrics express the tradeoff between the ability to predict good items and the sensitivity needed to discover all potentially good items. One typically looks at the area under the Precision-Recall curve. The larger the area (the closer to 1.0), the better the model can both predict the correct items and cover the maximum number of relevant items in a meaningful order.
The chart below shows the Precision-Recall graph for the SQL-Based Recommender applied to the Million Song dataset. The area under its curve (orange line) is much larger than the area under the “Best Sellers based” curve. This means that, in every case, the SQL-Based Recommender provides greater value than the basic method of proposing best sellers to customers.
This performance is comparable to that of the three top-ranked competitors on Kaggle. The “Score” number shown in the following screenshot is the MAP number.
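The following Python sketch shows one way to derive the points of such a curve for a single user, by computing precision and recall at each cutoff k of the ranked list. The function names and the toy data are assumptions made for illustration. A real evaluation would average these values over all users.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision and recall of the top-k recommendations for one user."""
    relevant = set(relevant)
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def pr_curve(recommended, relevant):
    """One (precision, recall) point per cutoff k = 1 .. len(recommended)."""
    return [
        precision_recall_at_k(recommended, relevant, k)
        for k in range(1, len(recommended) + 1)
    ]

# Made-up toy data: 2 of the 3 relevant items appear in the ranked list.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
for k, (precision, recall) in enumerate(pr_curve(recommended, relevant), start=1):
    print(f"k={k}: precision={precision:.2f}, recall={recall:.2f}")
```

As k grows, recall can only stay flat or increase, while precision drops whenever an irrelevant item is added. That tension is what the curve visualizes.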
Using Datasets of different sizes
Another experiment was done with a series of datasets extracted from the dataset (another well-known public dataset).
The purpose was to see how the response time changes when the size of the transaction dataset increases:
Less than 1 minute to train the model on fewer than 12 million transactions. Less than 3 minutes to train the model on 20 million transactions.
Take-away
The SQL-Based Recommender is a proven, scalable solution for large datasets and product catalogs.
It relies on a pure SQL-based implementation. The architecture is very simple and natively integrated into the powerful SAP HANA platform.