# How we built a Movie Recommendation engine by leveraging R on SAP HANA

**1. Background & motivation**

Recommendation engines are popular across many industries and benefit both companies and their customers. Collaborative filtering is one well-known recommendation algorithm used by many large companies, including Amazon. Many of you may have already read the very good blog post New Released Movie Recommender on SAP HANA One, written by Wenjun. His app uses content-based recommendation, because it recommends newly released movies that no one has seen yet. In this app, we use collaborative filtering algorithms to recommend older movies.

HANA integrates the R language directly, which makes it flexible to use R and its abundant external packages. We use the existing recommendation algorithms in the R external package recommenderlab to do movie recommendations. We also collect evaluation results from the different recommenders, so that we can choose the best recommender for a specific training dataset. The system is in fact general: it can recommend not only movies but other items as well.

**2. Functionalities**

Several recommenders are available for selection. After the recommenders are trained on the dataset for a specific problem, the best one is selected. This can be done automatically by comparing the evaluation results of the different recommenders, or with some input from non-expert users, since we can visualize the evaluation results in a friendly, easy-to-understand way. The selection can be repeated periodically, since the characteristics of a problem's dataset may change over time. The selection process itself is yet to be designed and implemented.
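Since automatic selection is not implemented in the prototype, the following is only a minimal Python sketch of one possible rule: pick the recommender with the lowest value of a chosen error metric. The function name `select_best`, the record layout and the numbers are all hypothetical and serve purely as illustration; they are not measured results.

```python
def select_best(results, metric="RMSE", lower_is_better=True):
    """Return the evaluation record with the best value for `metric`.

    `results` is a list of dicts, one per recommender (hypothetical layout).
    """
    if not results:
        raise ValueError("no evaluation results to compare")
    pick = min if lower_is_better else max
    return pick(results, key=lambda row: row[metric])


# Made-up numbers, for illustration only (not results from the prototype).
results = [
    {"ALGORITHM_NAME": "UBCF_COSINE", "RMSE": 1.10, "PRECISION": 0.20},
    {"ALGORITHM_NAME": "UBCF_PEARSON", "RMSE": 1.05, "PRECISION": 0.25},
    {"ALGORITHM_NAME": "IBCF_COSINE", "RMSE": 1.20, "PRECISION": 0.15},
]

best_by_error = select_best(results)
best_by_precision = select_best(results, metric="PRECISION", lower_is_better=False)
print(best_by_error["ALGORITHM_NAME"], best_by_precision["ALGORITHM_NAME"])
```

A real selection rule would probably have to weigh several metrics (prediction error, top-N quality, training and prediction time) rather than a single one.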

Using movie recommendation as an example, we built a demo in which a user is recommended several movies after rating at least 10 movies.

**3. Architecture**

We implement the logic for training, evaluating and comparing recommenders in the XS engine, so that the client can control and access it directly.

You may notice that model 1 and model 2 are stored in different places. Model 1 is stored in main memory on the HANA server, so it is better suited to real-time recommendation: performance is much better when PAL or SQLScript is used to implement the recommendation logic. Model 2 is stored on the hard disk of the R server, and every time a recommendation is made, the model needs to be loaded into memory. The advantage is that we can flexibly use the R language and its abundant external packages. In most cases the models will not be very large, so this approach can also do real-time recommendation. In this prototype, we only use the latter scenario. We also combine the training and evaluation of the recommenders into one procedure, so that the dataset only needs to be transferred once.

**4. R integration with SAP HANA**

SAP HANA is not only a fast in-memory database. It also embeds powerful analytic capabilities, including PAL, BFL, R, text analysis and so on, which make HANA an excellent platform for predictive analytics.

In this app, we leverage the R integration of SAP HANA. R is a free programming language and software environment for statistical computing and graphics, widely used in industry and at universities. SAP HANA integrates R directly: you can write a procedure in the R language, and tables in HANA can be accessed directly from the R code. R's abundant external packages can be used as well. This R integration makes HANA more capable for predictive analytics.

**5. Algorithms**

We use collaborative filtering (CF) algorithms to do the recommendation, as many well-known websites do. recommenderlab is an external R package for developing and researching recommenders. It already implements several CF algorithms together with their evaluation logic, and we use them directly in this prototype. For more complex or specific needs, one can implement a custom recommender, possibly using some of the primitive functions in the package. We want to show how easy and flexible it is to implement data mining algorithms with the HANA and R integration. First the CF algorithms are briefly described, then the code is shown.

#### 5.1 Collaborative Filtering (CF)

CF algorithms examine the information contained in a dataset that records users' ratings of items and, based on that, predict the users' missing ratings. There are two categories: user-based and item-based. User-based CF first finds some users similar to the target user, and then aggregates these similar users' ratings of items as a prediction of the target user's ratings. The similarity between users is defined on their rating patterns. That is, we view every user's ratings of all items as a vector (see the following figure) and define the similarity between two users as the cosine similarity or Pearson similarity of their vectors. Item-based CF finds some similar items for each item. The similarity is defined as in user-based CF, except that the pattern is all users' ratings of the item. Item-based CF is more suitable for real-time recommendation, since the similarities among items can be calculated and stored in advance.
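To make the user-based variant concrete, here is a minimal Python sketch. It is an illustration only, not the recommenderlab implementation used in the prototype. It assumes ratings are held in nested dictionaries, computes the cosine similarity over the items both users have rated, and predicts a rating as the similarity-weighted mean over the `k` most similar users. The function names and the toy data are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two users, restricted to items both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, target, item, k=2):
    """Predict `target`'s rating of `item` from the k most similar users who rated it."""
    neighbours = []
    for user, user_ratings in ratings.items():
        if user == target or item not in user_ratings:
            continue
        sim = cosine_similarity(ratings[target], user_ratings)
        if sim > 0:
            neighbours.append((sim, user_ratings[item]))
    top = sorted(neighbours, reverse=True)[:k]
    if not top:
        return None  # no similar user has rated this item
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


# Toy data (made up): user -> {movie: rating}
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
prediction = predict_rating(ratings, "alice", "m3")
print(prediction)
```

In this toy data, bob rates the shared movies exactly as alice does, so his rating of `m3` carries more weight in the prediction than carol's. An item-based variant would compute the same kind of similarity between item columns instead, which is what allows the similarities to be calculated ahead of time.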

#### 5.2 Code

The following is the main procedure that trains and evaluates the recommenders, written in the R language integrated in HANA.

```
create procedure evaluate_recommender_r(in rating TT_RATING,in cntl TT_EVALUATION_CNTL,out result TT_KPI)
language rlang
as
begin
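library(recommenderlab)
# Columns 1-3 of the rating table: row (user) index, column (item) index, rating value
r <- as(as(rating[1],"list")[[1]],"numeric")
c <- as(as(rating[2],"list")[[1]],"numeric")
v <- as(as(rating[3],"list")[[1]],"numeric")
rowNames <- as(c(1:max(r)),"character")
colNames <- as(c(1:max(c)),"character")
# Build the sparse user x item matrix and wrap it as a realRatingMatrix
sm <- sparseMatrix(r,c,x=v,dimnames=list(rowNames,colNames))
rm <- new("realRatingMatrix",data=sm)
# Evaluation settings come from the first row of the control table
m <- as(cntl[1,1],"character")                   # evaluation method
tr <- cntl[1,2]                                  # value passed as train
g <- as(rowCounts(rm)*cntl[1,3],"integer")       # ratings given per user (fraction of each user's ratings)
tn <- cntl[1,4]                                  # N for the top-N lists
kv <- cntl[1,5]                                  # k for the evaluation scheme (used only if > 0)
gr <- cntl[1,6]                                  # threshold for a good rating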
if(kv>0){
scheme <- evaluationScheme(rm,method=m,train=tr,k=kv,given=g,goodRating=gr)
}else{
scheme <- evaluationScheme(rm,method=m,train=tr,given=g,goodRating=gr)
}
#Evaluation of predicted ratings
t_rec_1 <- system.time( rec_ubcf_cosine <- Recommender(getData(scheme,"train"),"UBCF"))[3]
save(rec_ubcf_cosine,file="/home/ruser/models/rec_ubcf_cosine.rda")
t_pre_1 <- system.time( p1 <- predict(rec_ubcf_cosine,getData(scheme,"known"),type="ratings"))[3]
t_rec_2 <- system.time( rec_ubcf_pearson <- Recommender(getData(scheme,"train"),"UBCF",
parameter=list(method="pearson")))[3]
save(rec_ubcf_pearson,file="/home/ruser/models/rec_ubcf_pearson.rda")
t_pre_2 <- system.time( p2 <- predict(rec_ubcf_pearson,getData(scheme,"known"),type="ratings"))[3]
t_rec_3 <- system.time( rec_ibcf_cosine <- Recommender(getData(scheme,"train"),"IBCF"))[3]
save(rec_ibcf_cosine,file="/home/ruser/models/rec_ibcf_cosine.rda")
t_pre_3 <- system.time( p3 <- predict(rec_ibcf_cosine,getData(scheme,"known"),type="ratings"))[3]
#t_rec_4 <- system.time( rec_popular <- Recommender(getData(scheme,"train"),"POPULAR"))[3]
#save(rec_popular,file="/home/ruser/models/rec_popular.rda")
#t_pre_4 <- system.time( p4 <- predict(rec_popular,getData(scheme,"known"),type="ratings"))[3]
t_rec_4 <- system.time( rec_ibcf_pearson <- Recommender(getData(scheme,"train"),"IBCF",
parameter=list(method="pearson")) )[3]
save(rec_ibcf_pearson,file="/home/ruser/models/rec_ibcf_pearson.rda")
t_pre_4 <- system.time( p4 <- predict(rec_ibcf_pearson,getData(scheme,"known"),type="ratings") )[3]
ratingKPI <- rbind(
calcPredictionError(p1,getData(scheme,"unknown")),
calcPredictionError(p2,getData(scheme,"unknown")),
calcPredictionError(p3,getData(scheme,"unknown")),
calcPredictionError(p4,getData(scheme,"unknown"))
)
#Evaluation of a top-N recommender algorithm
er1 <- evaluate(scheme,method="UBCF",n=c(tn))
er2 <- evaluate(scheme,method="UBCF",n=c(tn),parameter=list(method="pearson"))
er3 <- evaluate(scheme,method="IBCF",n=c(tn))
#Fourth recommender is IBCF with pearson, matching rec_ibcf_pearson and the KPI label below
er4 <- evaluate(scheme,method="IBCF",n=c(tn),parameter=list(method="pearson"))
#er4 <- evaluate(scheme,method="POPULAR",n=c(tn))
er1 <- avg(er1)[,c(6,7,8,9)]
er2 <- avg(er2)[,c(6,7,8,9)]
er3 <- avg(er3)[,c(6,7,8,9)]
er4 <- avg(er4)[,c(6,7,8,9)]
topnKPI <- rbind(er1,er2,er3,er4)
preTimeKPI <- rbind(t_pre_1,t_pre_2,t_pre_3,t_pre_4)
recTimeKPI <- rbind(t_rec_1,t_rec_2,t_rec_3,t_rec_4)
KPI <- cbind(c("UBCF_COSINE","UBCF_PEARSON","IBCF_COSINE","IBCF_PEARSON"),
topnKPI,ratingKPI,preTimeKPI,recTimeKPI)
#KPI <- cbind(c("UBCF_COSINE","UBCF_PEARSON","IBCF_COSINE","POPULAR"),
#topnKPI,ratingKPI,preTimeKPI,recTimeKPI)
result <- data.frame(ALGORITHM_NAME=as(KPI[,1],"character"),RECALL=as(KPI[,2],"character"),
PRECISION=as(KPI[,3],"character"),FPR=as(KPI[,4],"character"),TPR=as(KPI[,5],"character"),
MAE=as(KPI[,6],"character"),MSE=as(KPI[,7],"character"),RMSE=as(KPI[,8],"character"),
TIME_PREDICTION=as(KPI[,9],"character"),TIME_TRAIN=as(KPI[,10],"character"))
end;
```
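The KPI table returned by the procedure contains rating-prediction errors (MAE, MSE, RMSE) and top-N measures such as precision and recall. As a reminder of what these quantities mean, here is a small Python sketch of their textbook definitions. It is an illustration under simplifying assumptions, not recommenderlab's code; the function names and the toy values are hypothetical.

```python
import math


def prediction_errors(predicted, actual):
    """MAE, MSE and RMSE over the (user, item) pairs present in both dicts."""
    pairs = [(predicted[key], actual[key]) for key in predicted if key in actual]
    if not pairs:
        raise ValueError("no overlapping ratings to compare")
    n = len(pairs)
    mae = sum(abs(p - a) for p, a in pairs) / n
    mse = sum((p - a) ** 2 for p, a in pairs) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


def precision_recall(recommended, relevant):
    """Precision and recall of one user's top-N list against the items they rated as good."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy values (made up) for illustration
predicted = {("u1", "m1"): 4.0, ("u1", "m2"): 2.0}
actual = {("u1", "m1"): 5.0, ("u1", "m2"): 2.0}
errors = prediction_errors(predicted, actual)
precision, recall = precision_recall(["m1", "m2", "m3", "m4"], {"m1", "m5"})
print(errors, precision, recall)
```

Lower error values indicate better rating predictions, while higher precision and recall indicate better top-N lists, so the two groups of KPIs have to be read in opposite directions when comparing recommenders.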


**6. Demo**

Here is the link to the movie recommendation demo:

http://10.58.13.7:8001/AMR/ui/WebContent/

This link is temporary and can only be accessed from within the SAP network.

You can also view a short video for this demo.

**7. Summary and next steps**

We built a prototype of a general recommendation system by leveraging R on SAP HANA, using movie recommendation as an example. In the current implementation, we store the model on the R server only. Later we may implement some recommenders whose models are stored in HANA, or even implement some of the recommendation logic in HANA itself.

Below are some references:

R Language http://www.r-project.org/

recommenderlab http://cran.r-project.org/web/packages/recommenderlab/index.html

SAP HANA manuals http://help.sap.com/hana_appliance

SAP HANA Cookbook for MySQL Developers http://www.saphana.com/docs/DOC-4534

For more information about predictive analytics on SAP HANA, here is a good link: saphana.com/predictive
