Skip to Content

1. Background & motivation


This June, I took a mini-fellowship in Palo Alto and built a smart app, Real-time sentiment rating of movies on SAP HANA One. After I came back to Shanghai, my colleagues and I wanted to extend this app. Based on the real-time sentiment rating of movies, we can rank the new released movies according to the sentiment rating. But in the smart app, the result is same to everyone. The result has no difference between different users. In order to improve the smart app, an idea came to our mind: if we combine the info of users e.g. Twitter info, what can we do?

We decided to build an advanced movie rating app with the movie recommender. We can detect the movie taste of users and recommend personalized different new released movies to different users based on their Twitter info. With the smart app, movie goers can go to cinemas directly and choose the favorite movie which the recommender recommended to watch in real-time!

2. Functionalities

#movienight is a smart application based on SAP HANA One. It has three main functionalities:

  1. Show the real-time sentiment rating of new released movies based on Twitter. For each movie, we can show the distribution of strong positive, weak positive, neutral, weak negative and strong negative sentiment. It uses the native text analysis in SAP HANA to do the sentiment analysis.
  2. Recommend the favorite movie from new released movies to the user based on user’s Twitter. It uses the content-based recommender algorithm to calculate the correlation between the new released movies and the old movies in the movie library.

3. Architecture & Workflow

The following is the system architecture:

system arch.png

Why “Rotten Tomatoes Crawler”, “Twitter Crawler” and “Twitter Auth” are outside SAP HANA?

Since SP6, XS has the outbound connectivity feature. So, we can make HTTP/HTTPS request in our XS app. We can crawl data using this feature. However, currently XS does not support background task. For example, we need to crawl the movie metadata and tweets periodically. Currently the XSJS can only be called passively, it cannot run tasks actively. That’s the reason why we use Java to crawl data and handle the auth currently. In SP7, XS will support background task. Maybe we can update it later.

The following shows the workflow.

/wp-content/uploads/2013/08/workflow_260583.png

4. Algorithms

We used content-based recommender algorithm, heuristic approach, TF-IDF. The following is a simple step example:

(1) Get the top 200 critically acclaimed movies last 5 years http://listology.com/nukualofa/list/top-200-critically-acclaimed-movies-last-5-years, use it as the initial old movie library.

(2) For each movie from (1), use Rotten Tomatoes API to get the detail info and insert into table “MOVIE_ATTR”. Fill the column “MOVIE_ID”, “ATTR”, “ATTR_VALUE”, e.g.:

MOVIE_ID

ATTR

ATTR_VALUE

ATTR_WEIGHT

A

GERNE

Action

A

GERNE

Comedy

A

DIRECTOR

Tom

A

CAST

Alex

A

CAST

Max

A

STUDIO

Warner

B

GERNE

Action

B

GERNE

Horror

B

DIRECTOR

Jerry

B

CAST

Mary

B

CAST

Tim

B

STUDIO

Warner

(3) Get the new released movies from Rotten Tomatoes API and insert into table “MOVIE_ATTR”. Also fill the column “MOVIE_ID”, “ATTR”, “ATTR_VALUE”, e.g.:

MOVIE_ID

ATTR

ATTR_VALUE

ATTR_WEIGHT

C

GERNE

Horror

C

GERNE

Comedy

C

DIRECTOR

Jerry

C

CAST

Max

C

STUDIO

Times

D

GERNE

Action

D

GERNE

Comedy

D

DIRECTOR

Tom

D

CAST

Max

D

STUDIO

Warner

(4) Based on “MOVIE_ATTR”, fill column “MOVIE_ID_NEW” and “MOVIE_ID” in table “MOVIE_CORRELATION”, e.g.:

MOVIE_ID_NEW

MOVIE_ID

CORRELATION_WEIGHT

C

A

C

B

C

C

C

D

D

A

D

B

D

C

D

D

(5) Calculate the column “ATTR_WEIGHT” in table “MOVIE_ATTR”. For example: If we use the example MOVIE_ATTR in 2) and 3), for each “ATTR_WEIGHT”, the value can be calculated using the following formula:

f1.PNG

So, we can get the following result in “MOVIE_ATTR”:

MOVIE_ID

ATTR

ATTR_VALUE

ATTR_WEIGHT

A

GERNE

Action

log 4/3

A

GERNE

Comedy

log 4/3

A

DIRECTOR

Tom

log 4/2

A

CAST

Alex

log 4/1

A

CAST

Max

log 4/3

A

STUDIO

Warner

log 4/3

B

GERNE

Action

log 4/3

B

GERNE

Horror

log 4/2

B

DIRECTOR

Jerry

log 4/2

B

CAST

Mary

log 4/1

B

CAST

Tim

log 4/1

B

STUDIO

Warner

log 4/3

C

GERNE

Horror

log 4/2

C

GERNE

Comedy

log 4/3

C

DIRECTOR

Jerry

log 4/2

C

CAST

Max

log 4/3

C

STUDIO

Times

log 4/1

D

GERNE

Action

log 4/3

D

GERNE

Comedy

log 4/3

D

DIRECTOR

Tom

log 4/2

D

CAST

Max

log 4/3

D

STUDIO

Warner

log 4/3

(6) Calculate the column “CORRELATION_WEIGHT” in table “MOVIE_CORRELATION”. Use the following formula, where u stands for the vector of a new movie while v stands for the vector of the other movie:

f2.PNG

For instance, considering the new movie C and the other movie A, actually we have the following attribute weight table (matrix). As one table in SAP HANA can have maximum 1000 columns, we cannot keep the following matrix in database, so we calculate it in real-time:

MOVIE

GERNE

GERNE

GERNE

DIRECTOR

DIRECTOR

CAST

CAST

STUDIO

STUDIO

Horror

Comedy

Action

Jerry

Tom

Max

Alex

Times

Warner

C

log 4/2

log 4/3

0

log 4/2

0

log 4/3

0

log 4/1

0

A

0

log 4/3

log 4/3

0

log 4/2

log 4/3

log 4/1

0

log 4/3

Now, we can use the above formula to calculate the correlation weight between movie C and A:

f3.PNG

Fill the whole “CORRELATION_WEIGHT” column in the same way.

(7) Now we have the complete “MOVIE_CORRELATION” table, e.g.,

MOVIE_ID_NEW

MOVIE_ID

CORRELATION_WEIGHT

C

A

1

C

B

0

C

C

1

C

D

0.5

D

A

0.2

D

B

0.7

D

C

0.5

D

D

1

The value of “CORRELATION_WEIGHT” should be in [0, 1].

(8) When the user logs on, check whether the table “USER_MOVIE_LIKE” contains the info of the user or not.

a. If yes, calculate directly which new released movie should be recommended. For example, user “12345” logs on and we have the following records in “USER_MOVIE_LIKE”:

_USER

MOVIE_ID

_LIKE

12345

A

1

12345

B

-1

We can calculate the score for both movie C and D in the following way:

score(C) = 1 * 1 + (-1) * 0 = 1

score(D) = 1 * 0.2 + (-1) * 0.7 = -0.5

The higher the score is, the more the movie should be recommended. After we calculate the score for all new released movies, we will choose the most three highest score movies to recommend to the user.

b. If no, we first crawl Twitter data from the user to see if there are any tweets about movies in the library.

a) If there are some tweets, we will do sentiment analysis for those tweets and update the table “USER_MOVIE_LIKE”. After that, the same way in a.

b) If there are no such tweets, we will recommend the most three popular movies among the new released movies.

(9) Till now, some new released movies are recommended to the user. It does not matter if they are based on the user’s favor (using “USER_MOVIE_LIKE”) or not (no tweets about movies). For each recommended movie, the user can choose “Like” or “Dislike”. This feedback will be sent to SAP HANA and stored in “USER_MOVIE_LIKE”.

(10) When a new movie becomes an old movie or a new movie is displayed, update/recalculate “MOVIE_ATTR” and “MOVIE_CORRELATION”.

5. XS app step by step

We still choose XS to build the app to prevent data transfer latency between the database and the web application server. The application was build in the following steps:

Step1: Create schema, tables and full text indexes

“Movie” and “Tweet” are two main tables. “Movie” is to store movie metadata crawled from Rotten Tomatoes; “Tweet” is to store tweets crawled from Twitter. And we need to create full text index on Tweet(content) to get tweets’ sentiment of movies with text analysis. Table definition is as following.

Table Movie:

movietable.PNG

Table Tweet:

tweettable.PNG

Full text index:

CREATE FULLTEXT INDEX TWEET_INDEX ON TWEET (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('EN') TEXT ANALYSIS ON;
ALTER FULLTEXT INDEX TWEET_INDEX FLUSH QUEUE;

After that, we need to create tables for recommender algorithm. They are MOVIE_ATTR, MOVIE_ATTR_WEIGHT, MOVIE_WEIGHT, MOVIE_CORRELATION and USER_MOVIE_LIKE. About detailed table definition, you can refer to algorithm description.


Step2: Create stored procedures

There are 8 procedures. We illustrate their function and IO parameters in a table, and show some key ones.

Procedure Name

Function

IO

getMovieList_R

get movies order by rating desc

IN: N/A

OUT:Table Type Movie_Brief

getMovieList_V

get movies order by voting number desc

IN: N/A

OUT:Table Type Movie_Brief

getTopMovies

get top N movies order by voting number desc

IN: N

OUT: Table Type Movie_Synopsis

getMovieDetail

get movie’s detail information

IN: Movie_ID

OUT: Table Type Movie_Detail

getSentiment

get sentiment information about one movie

IN: Movie_ID

OUT: Table Type Sentiment

calculateWeight

fill tables ‘movie_attr_weight’,’movie_weight’,

’movie_correlation’

IN: N/A

OUT: N/A

getUserMovieLike

get like/dislike number of one movie

IN: Movie_ID

OUT: Table Type UserMovieLike

recommendMovies

recommend 3 movies for one user

IN: User_ID

OUT: Table Type Movie_Synopsis

We show 4 key procedures below.

(1) getTopMovies

CREATEPROCEDURE GETTOPMOVIES(IN N INTEGER, OUT RESULT MOVIESYNOPSIS) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
RESULT =
SELECT TOP :N A.ID, A.TITLE, A.SYNOPSIS, A.POSTER, B.RATING, B.NUM, IFNULL(C.LIKE, 0) LIKE, IFNULL(C.DISLIKE, 0) DISLIKE
FROM MOVIE A
LEFT JOIN
(
SELECT A.MOVIEID AS ID, SUM(NUM) AS NUM, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL)/SUM(NUM), 5, 2) END AS RATING
FROM
 (
SELECT A.MOVIEID, B.TA_TYPE, COUNT(B.TA_TYPE) AS NUM,
CASE B.TA_TYPE
WHEN 'StrongPositiveSentiment' THEN COUNT(B.TA_TYPE) * 5
                        WHEN 'WeakPositiveSentiment' THEN COUNT(B.TA_TYPE) * 4
                        WHEN 'NeutralSentiment' THEN COUNT(B.TA_TYPE) * 3
                        WHEN 'WeakNegativeSentiment' THEN COUNT(B.TA_TYPE) * 2
                        WHEN 'StrongNegativeSentiment' THEN COUNT(B.TA_TYPE) * 1
                END AS TOTAL
                FROM TWEET A LEFT JOIN "$TA_TWEET_INDEX" B
                ON A.ID = B.ID
                AND B.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
                GROUP BY A.MOVIEID, B.TA_TYPE
 ) A
GROUP BY A.MOVIEID
) B
ON A.ID = B.ID
LEFT JOIN
(
SELECT MOVIE_ID AS ID,  SUM(CASE WHEN _LIKE =1 THEN 1 END) AS LIKE, SUM(CASE WHEN _LIKE = -1 THEN 1 END) AS DISLIKE
FROM USER_MOVIE_LIKE
GROUP BY MOVIE_ID
) C
ON A.ID = C.ID
ORDER BY B.NUM DESC;
END;

(2) getMovieDetail


CREATEPROCEDURE GETMOVIEDETAIL(IN ID INTEGER, OUT RESULT MOVIEDETAIL) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
RESULT =
SELECT A.ID, A.TITLE, A.YEAR, A.GENRES, A.MPAA_RATING, A.RUNTIME, A.RELEASE_DATE, A.SYNOPSIS, A.POSTER, A.ABRIDGED_CAST, A.DIRECTOR, A.STUDIO,
 A.CLIP1, A.CLIP2, A.CLIP3, A.CLIP4, IFNULL(B.LIKE,0) LIKE, IFNULL(B.DISLIKE, 0) DISLIKE, C.NUM, C.RATING
FROM MOVIE A
LEFT JOIN
(
SELECT MOVIE_ID AS ID,  SUM(CASE WHEN _LIKE =1 THEN 1 END) AS LIKE, SUM(CASE WHEN _LIKE = -1 THEN 1 END) AS DISLIKE
FROM USER_MOVIE_LIKE
WHERE MOVIE_ID = :ID
GROUP BY MOVIE_ID
) B
ON A.ID = B.ID
INNER JOIN
(
SELECT A.MOVIEID AS ID, SUM(NUM) AS NUM, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL)/SUM(NUM), 5, 2) END AS RATING
FROM
 (
SELECT A.MOVIEID, B.TA_TYPE, COUNT(B.TA_TYPE) AS NUM,
CASE B.TA_TYPE
WHEN 'StrongPositiveSentiment' THEN COUNT(B.TA_TYPE) * 5
                        WHEN 'WeakPositiveSentiment' THEN COUNT(B.TA_TYPE) * 4
                        WHEN 'NeutralSentiment' THEN COUNT(B.TA_TYPE) * 3
                        WHEN 'WeakNegativeSentiment' THEN COUNT(B.TA_TYPE) * 2
                        WHEN 'StrongNegativeSentiment' THEN COUNT(B.TA_TYPE) * 1
                END AS TOTAL
                FROM TWEET A LEFT JOIN "$TA_TWEET_INDEX" B
                ON A.ID = B.ID
                AND B.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
                WHERE A.MOVIEID = :ID
                GROUP BY A.MOVIEID, B.TA_TYPE
 ) A
GROUP BY A.MOVIEID
) C
ON A.ID = C.ID
WHERE A.ID = :ID;
END;

(3) calculateWeight

CREATEPROCEDURE CALCULATEWEIGHT() LANGUAGE SQLSCRIPT AS
BEGIN
DELETE FROM MOVIE_ATTR_WEIGHT;
INSERT INTO MOVIE_ATTR_WEIGHT
(
        SELECT A.ID, A.ATTR, A.ATTR_VALUE, LN((SELECT COUNT(DISTINCT ID) FROM MOVIE_ATTR) / B.NUM) ATTR_WEIGHT,
                   CASE C.TAG WHEN 1 THEN 1 ELSE 0 END AS _NEW
        FROM MOVIE_ATTR A
        INNER JOIN
        (
                SELECT ATTR, ATTR_VALUE, COUNT(1) NUM FROM MOVIE_ATTR GROUP BY ATTR, ATTR_VALUE
        ) B
        ON A.ATTR = B.ATTR AND A.ATTR_VALUE = B.ATTR_VALUE
        LEFT JOIN
        (
                SELECT ID, 1 AS TAG FROM MOVIE
        ) C
        ON A.ID = C.ID
);
DELETE FROM MOVIE_WEIGHT;
INSERT INTO MOVIE_WEIGHT (
        SELECT ID, SQRT(SUM(POWER(ATTR_WEIGHT, 2))) WEIGHT FROM MOVIE_ATTR_WEIGHT GROUP BY ID
);
DELETE FROM MOVIE_CORRELATION;
INSERT INTO MOVIE_CORRELATION
(
        SELECT A.*, IFNULL(B.CORRELATION_WEIGHT, 0) CORRELATION_WEIGHT FROM
        (
                SELECT A.ID MOVIE_ID_NEW, B.ID MOVIE_ID FROM
                (SELECT DISTINCT ID FROM MOVIE_ATTR_WEIGHT WHERE _NEW = 1) A,
                (SELECT DISTINCT ID FROM MOVIE_ATTR_WEIGHT) B
        ) A
        LEFT JOIN
        (
                SELECT A.MOVIE_ID_NEW, A.MOVIE_ID, A.NUM / (B.WEIGHT * C.WEIGHT) CORRELATION_WEIGHT
                FROM
                (
                        SELECT A.ID MOVIE_ID_NEW, B.ID MOVIE_ID, SUM(POWER(A.ATTR_WEIGHT, 2)) NUM
                        FROM
                        ( SELECT * FROM MOVIE_ATTR_WEIGHT WHERE _NEW = 1 ) A
                        INNER JOIN
                        MOVIE_ATTR_WEIGHT B
                        ON A.ATTR = B.ATTR AND A.ATTR_VALUE = B.ATTR_VALUE
                        GROUP BY A.ID, B.ID
                ) A
                INNER JOIN MOVIE_WEIGHT B
                ON A.MOVIE_ID_NEW = B.ID
                INNER JOIN MOVIE_WEIGHT C
                ON A.MOVIE_ID = C.ID
        ) B
        ON A.MOVIE_ID_NEW = B.MOVIE_ID_NEW AND A.MOVIE_ID = B.MOVIE_ID
);
END;

(4) recommendMovies

CREATEPROCEDURE RECOMMENDMOVIES(IN USERID VARCHAR(100), OUT RESULT MOVIESYNOPSIS) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
RESULT =
SELECT A.ID, A.TITLE, A.SYNOPSIS, A.POSTER, C.RATING, C.NUM, IFNULL(D.LIKE, 0) LIKE, IFNULL(D.DISLIKE, 0) DISLIKE
FROM MOVIE A
INNER JOIN
(
        SELECT TOP 3 A.MOVIE_ID_NEW AS ID, SUM(A.CORRELATION_WEIGHT*B._LIKE) AS SCORE
        FROM MOVIE_CORRELATION A INNER JOIN USER_MOVIE_LIKE B
        ON A.MOVIE_ID = B.MOVIE_ID
        WHERE B._USER = :USERID
        GROUP BY A.MOVIE_ID_NEW
        ORDER BY SCORE DESC
) B
ON A.ID = B.ID
LEFT JOIN
(
SELECT A.MOVIEID AS ID, SUM(NUM) AS NUM, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL)/SUM(NUM), 5, 2) END AS RATING
FROM
 (
SELECT A.MOVIEID, B.TA_TYPE, COUNT(B.TA_TYPE) AS NUM,
CASE B.TA_TYPE
WHEN 'StrongPositiveSentiment' THEN COUNT(B.TA_TYPE) * 5
                        WHEN 'WeakPositiveSentiment' THEN COUNT(B.TA_TYPE) * 4
                        WHEN 'NeutralSentiment' THEN COUNT(B.TA_TYPE) * 3
                        WHEN 'WeakNegativeSentiment' THEN COUNT(B.TA_TYPE) * 2
                        WHEN 'StrongNegativeSentiment' THEN COUNT(B.TA_TYPE) * 1
                END AS TOTAL
                FROM TWEET A LEFT JOIN "$TA_TWEET_INDEX" B
                ON A.ID = B.ID
                AND B.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
                GROUP BY A.MOVIEID, B.TA_TYPE
 ) A
GROUP BY A.MOVIEID
) C
ON A.ID = C.ID
LEFT JOIN
(
SELECT MOVIE_ID AS ID,  SUM(CASE WHEN _LIKE =1 THEN 1 END) AS LIKE, SUM(CASE WHEN _LIKE = -1 THEN 1 END) AS DISLIKE
FROM USER_MOVIE_LIKE
GROUP BY MOVIE_ID
) D
ON A.ID = D.ID
ORDER BY B.SCORE DESC;
END;

Step3: Create XSJS

XSJS provides service APIs for frontend. XSJS will call SQL and procedures to get data from database and return it back to the frontend. Here we also illustrate them in a table below. After that, we also give some key ones among these XSJSs.

XSJS Name

Function

IO

Procedure and SQL

getMovieDetail.xsjs

get movie detail

IN: movie_id

OUT: JSON movie_detail

getMovieDetail,

getSentiment

getMovieList.xsjs

Get movie list

IN: type(0: order by rating, 1: order by voting)

OUT: JSON movie_list

getMovieList_r,

getMovieList_v

getTop3Movies.xsjs

Get top 3 movies order by voting desc

IN: N/A

OUT: JSON movie_list

getTopMovies,

setLikeTag.xsjs

Set user like some movie or not, and return recommended movies

IN: movie_id, user_id, like_tag

OUT: JSON movie_list

upsert user_movie_like,

recommendMovies

(1) getMovieDetail.xsjs

function createEntry(rs) {             
                        return {                             
                                                id : rs.getInteger(1),
                                                title : rs.getString(2),
                                                year : rs.getInteger(3),
                                                genres : rs.getString(4),
                                                mpaa_rating : rs.getString(5),
                                                runtime : rs.getString(6),
                                                release_date : rs.getDate(7),
                                                synopsis : rs.getString(8),
                                                poster : rs.getString(9),
                                                abridged_cast : rs.getString(10),
                                                director : rs.getString(11),
                                                studio : rs.getString(12),
                                                clip1 : rs.getString(13),
                                                clip2 : rs.getString(14),
                                                clip3 : rs.getString(15),
                                                clip4 : rs.getString(16),
                                                like : rs.getInteger(17),
                                                dislike : rs.getInteger(18),
                                                num : rs.getInteger(19),
                                                rating : rs.getDecimal(20)
                        };
}
try {
                        var body = '';
                        var list = [];
                        var id = parseInt($.request.parameters.get("id"), 10);
                        var movie = null;
                        var conn = $.db.getConnection();
                        var query = "CALL SMARTAPP_MOVIE.GETMOVIEDETAIL(?, ?)";   
                        var pcall = conn.prepareCall(query);
                        pcall.setInteger(1, id);
                        pcall.execute();
                        var rs = pcall.getResultSet();
                        if(rs.next()) {
                                                movie = createEntry(rs);
                        }
                        rs.close();
                        pcall.close();
                        query = "CALL SMARTAPP_MOVIE.GETSENTIMENT(?, ?)";
                        pcall = conn.prepareCall(query);
                        pcall.setInteger(1, id);
                        pcall.execute();
                        rs = pcall.getResultSet();
        var strongPositive = 0;
                  var weakPositive = 0;
                  var neutral = 0;
                  var weakNegative = 0;
                  var strongNegative = 0;
                        while(rs.next()) {
                                    var sentiment = rs.getString(1);
                                    if(sentiment === 'Strong Positive') {
                                                                        strongPositive = rs.getInteger(2);
                                                } else if(sentiment === 'Weak Positive') {
                                                                        weakPositive = rs.getInteger(2);
                                                } else if(sentiment === 'Neutral') {
                                                                        neutral = rs.getInteger(2);
                                                } else if(sentiment === 'Weak Negative') {
                                                                        weakNegative = rs.getInteger(2);
                                                } else if(sentiment === 'Strong Negative') {
                                                                        strongNegative = rs.getInteger(2);
                                                }
                        }
                        rs.close();
                        pcall.close();
                        conn.close();                  
                        var num = strongPositive + weakPositive + neutral + weakNegative + strongNegative;
                        movie.weakPositive = (weakPositive*100/num).toFixed(0);      
                        movie.neutral = (neutral*100/num).toFixed(0);
                        movie.weakNegative = (weakNegative*100/num).toFixed(0);
                        movie.strongNegative = (strongNegative*100/num).toFixed(0);
                        movie.strongPositive = 100 - movie.weakPositive - movie.neutral - movie.weakNegative - movie.strongNegative;
                        body = JSON.stringify(movie);
                        $.response.contentType = "application/json;charset=UTF-8";
                        $.response.status = $.net.http.OK;
                        $.response.setBody(body);
} catch(e) {
                        $.response.status = $.net.http.INTERNAL_SERVER_ERROR;
                        $.response.setBody(e.message);
}

(2) getTop3Movies.xsjs

function createEntry(rs) {
                        return {
                                                id: rs.getInteger(1),
                                                title : rs.getNString(2),
                                                synopsis : rs.getNString(3),
                                                poster : rs.getNString(4),
                                                rating: rs.getDecimal(5),
                                                num: rs.getInteger(6),
                                                like : rs.getInteger(7),
                                                dislike : rs.getInteger(8)
                        };
}
try {
                        var body = '';
                        var list = [];
                        var topN = 3;
                        var query = "CALL SMARTAPP_MOVIE.GETTOPMOVIES(?, ?)";
                        var conn = $.db.getConnection();
                        var pcall = conn.prepareCall(query);            
                                pcall.setInteger(1, topN);
                                pcall.execute();
                        var rs = pcall.getResultSet();
                        while (rs.next()) {
                                                list.push(createEntry(rs));
                        }
                        rs.close();
                        pcall.close();
                        conn.close();
                        body = JSON.stringify({
                                                "entries" : list
                        });
                        $.response.contentType = 'application/json; charset=UTF-8';
                        $.response.setBody(body);
                        $.response.status = $.net.http.OK;
} catch (e) {
                        $.response.status = $.net.http.INTERNAL_SERVER_ERROR;
                        $.response.setBody(e.message);
}

Step4: Code the frontend

You can use SAP UI5 or any other UI library (such as jQuery UI, Bootstrap) you like to implement the presentation layer. The following is part of the frontend code using Bootstrap.

<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="stylesheet" type="text/css" href="bootstrap/css/bootstrap.min.css"/>
<link rel="stylesheet" type="text/css" href="index.css"/>
<base>
<title>New Released Movie Recommender</title>
</head>
<body>
<div class="container containerCtn">
       <div id="header">
 <div class="pull-left"><!-- <a href=""> --><img class="titleImg" src="image/title.png"/><!-- </a> --></div>
 <div class="pull-right twitterLogin">
 <a href="http://10.58.13.133:8080/MovieRecommender/signin" class="logIn">Sign in with Twitter</a>
 <span class="userId"></span>
 <span class="logOut">Sign Out</span>
 </div>
       </div>
       <div id="carouselContainer">
       </div>
       <div id="main">
       </div>
       <div id="movieDetail">
       </div>
</div>
<script type="text/javascript" src="lib/jquery.js"></script>
<script type="text/javascript" src="lib/jsrender.min.js"></script>
<script type="text/javascript" src="bootstrap/js/bootstrap.min.js"></script>
<!-- <script type="text/javascript" src="lib/jquery.ellipsis.min.js"></script> -->
<script type="text/javascript" src="lib/jquery.raty.min.js"></script>
<script type="text/javascript" src="tmpl/index.tmpl.js"></script>
<script type="text/javascript" src="bootstrap/box/bootbox.min.js"></script>
<!--[if IE 7]>
       <script type="text/javascript" src="lib/json2.js"></script>
       <script type="text/javascript" src="lib/jquery.ba-hashchange.min.js"></script>
<![endif]-->
<script type="text/javascript" src="index.js"></script>
</body>
</html>

6. Rotten Tomatoes Crawler, Twitter Crawler, Twitter Auth

(1) Rotten Tomatoes Crawler

We use JTomato to get movie metadata from Rotten Tomatoes API and insert data into SAP HANA.

– Get new released movies: http://developer.rottentomatoes.com/docs/read/json/v10/Opening_Movies

– Get movie info: http://developer.rottentomatoes.com/docs/read/json/v10/Movie_Info

The following is code part:

                    Properties prop = new Properties();
                    prop.load(new FileInputStream(propertyFile));
                    String rottenKey = prop.getProperty(apiKey);
                    JTomato rottenClient = new JTomato(rottenKey);
                    prop.load(new FileInputStream(jdbcPro));
                    String connStr = prop.getProperty(connStrKey);
                    Connection conn = DriverManager.getConnection(connStr);
                    PreparedStatement stmt = conn
                                        .prepareStatement("INSERT INTO MOVIE (ID,TITLE,QUERY,YEAR,GENRES,MPAA_RATING,RUNTIME,RELEASE_DATE,SYNOPSIS,POSTER,"
                                                            + "ABRIDGED_CAST,DIRECTOR,STUDIO,CLIP1,CLIP2,CLIP3,CLIP4) VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
                    List<Movie> movies = rottenClient.getOpeningMovies(null, 0);
                    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
                    for (Movie movie : movies) {
   ......
   }

Similarly, we retrieved the information of 200 critically acclaimed movies last 5 years through querying titles on Rotten Tomatoes. These 200 movies were used as the initial movie library.

public List<Movie> getTopMovies() {
                    List<Movie> result = new ArrayList<Movie>();
                    for(String movietitle : topmovies.topmvs){
                              List<Movie> queryResult = new ArrayList<Movie>();
                              String m = movietitle;
                              int total = searchMovie(movietitle,queryResult,1);
                              boolean flag =false;
                              for(int i=0; i<queryResult.size();i++){
                                        if(queryResult.get(i).title.equals(m)) {
                                                  result.add(queryResult.get(i));
                                                  flag =true;
                                                  break;
                                        }
                              }
                              if(flag ==false && total>0){
                                        result.add(queryResult.get(0));
                              }
                    }
                    return result;
          }

How about using XS outbound connectivity instead of JTomato?

var client = new $.net.http.Client();
var request = new $.net.http.Request($.net.http.GET, "/api/public/v1.0/lists/movies/opening.json?apikey=xxx");
client.request(request, "http://api.rottentomatoes.com", "http://proxy.pal.sap.corp:8080");
var response = client.getResponse();
var body;
if (!response.body) {
          body = "";
} else {
          body = JSON.parse(response.body.asString());
}
$.response.contentType = "application/json";
$.response.setBody(JSON.stringify({
          "status" : response.status,
          "body" : body
}));

It looks much easier!!!


(2) Twitter Crawler

We still use Twitter4J to crawl tweets. We crawl two kinds of tweets: movie’s tweets and user’s tweets. Movie’s tweets are used for sentiment rating of movies. User’s tweets are crawled to detect the movie taste of the user in order to recommend movies.

Code part of crawling movie’s tweets:

Twitter twitter = new TwitterFactory().getInstance();
                    try {
                              Connection conn = DriverManager.getConnection(connStr);
                              Statement stmt = conn.createStatement();
                              ResultSet rs = stmt.executeQuery("SELECT ID,QUERY FROM MOVIE");
                              Map<Integer, Query> queries = new HashMap<Integer, Query>();
                              while (rs.next()) {
                                        Query query = new Query(rs.getNString(2));
                                        query.setLang("en");
                                        query.setCount(100);
                                        query.setResultType(Query.RECENT);
                                        queries.put(rs.getInt(1), query);
                                        logger.info("Movie: " + rs.getInt(1) + " " + rs.getNString(2));
                              }
                              rs.close();
                              stmt.close();
                              PreparedStatement pstmt = conn
                                                  .prepareStatement("INSERT INTO TWEET VALUES (?,?,?,?,?,?,?,?)");
                              Query query;
                              QueryResult result;
                              List<Status> tweets;
                              while (true) {
                                        // System.out.println(new Date());
                                        for (int id : queries.keySet()) {
                                                  query = queries.get(id);
                                                  result = twitter.search(query);
                                                  query.setSinceId(result.getMaxId());
                                                  tweets = result.getTweets();
                                                  logger.info("Tweet: " + query.getQuery() + " "
                                                                      + tweets.size() + " " + query.getSinceId());
                                                  for (Status tweet : tweets) {
   ......
   }

Code part of crawling user’s tweets:

public Map<String, Status> queryUserTweets(Twitter twitter) {
                    Map<String, Status> map = new HashMap<String, Status>();
                    Paging paging;
                    List<Status> tweets = null;
                    int p = 1;
                    int length;
                    while (true) {
                              paging = new Paging(p, 200);
                              try {
                                        tweets = twitter.getUserTimeline(username, paging);
                              } catch (TwitterException e) {
                                        e.printStackTrace();
                              }
                              length = tweets.size();
                              if (length == 0) {
                                        break;
                              }
                              System.out.println("length: " + length);
                              String content;
                              String q;
                              for (Status tweet : tweets) {
                                        content = tweet.getText();
                                        q = getQuery(content);
                                        if (!q.equals("")) {
                                                  map.put(q, tweet);
                                        }
                              }
                              p++;
                    }
                    return map;
          }

(3) Twitter Auth

One major functionality of the app is to recommend movies to users according to their personal tastes. Therefore, it is essential to let the user sign in/out Twitter account. You can find the detail process of implementing sign in with Twitter via https://dev.twitter.com/docs/auth/implementing-sign-twitter.

The application has to obtain a request token with consumer token so that generate an authentication URL for authentication.

        TwitterFactory tf = new TwitterFactory(cb.build());
        Twitter twitter = tf.getInstance();
        request.getSession().setAttribute("twitter", twitter);
        try {
            RequestToken requestToken = twitter.getOAuthRequestToken(callbackurl);
            request.getSession().setAttribute("requestToken", requestToken);
                  response.sendRedirect(requestToken.getAuthenticationURL());
        } catch (TwitterException e) {
            throw new ServletException(e);
        }

Then, redirect to this URL for authentication.

twitter redirect.PNG

After authentication, Twitter will return to the callback URL with a valid OAuth request token. The redirect to twitter.com is not obvious to the user. Upon a successful authentication, the callback_url would receive a request containing the oauth_token and oauth_verifier parameters. The application should verify that the token matches the request token received in first step. Then, backend can get some information of the user’s account, such as username, which will be shown on web page after sign in.

String verifier = (String) request.getParameter("oauth_verifier");
AccessToken at = twitter.getOAuthAccessToken(requestToken,verifier);
request.getSession().setAttribute("accessToken", at);
screenName = at.getScreenName();

For sign out, the server will invalidate the session. However, twitter.com creates cookies on the browser when sign in so the Twitter account is still in sign in status. You have to sign out your account completely from twitter.com.

7. Website, video, screenshots

The live web app is available at http://10.58.13.133:8000/smartapp_movie/ui4/WebContent/. Since it is hosted on a virtual machine of my team instead of on SAP HANA One, the link is currently only accessible via SAP network. I have captured a brief video and some screenshots if you cannot access it.


Some screenshots step by step:


(1) Homepage of the app, new released movies are sorted by mention count.

_#movienight sort by mentions.png

(2) Click “Rating”, movies are sorted by rating.

_#movienight sort by rating.png

(3) Click “Rush” poster to see the details.

#movienight rush.png

(4) Press “Sign in with Twitter” to sign in.

Twitter - Authorize an application.png

(5) Recommend movies for you!

#movienight man of steel.PNG

(6) Add a friend to see common movies in this weekend!!!

#movienight add a friend.png

8. Resources

(1) SAP HANA One & SAP HANA Dev Edition

If you are interested in building your own awesome apps for production or commercial use, you can spin up SAP HANA One.You can find SAP HANA One on AWS Marketplace

If you are interested in just trying out SAP HANA, go with the SAP HANA developer edition. You can follow the steps in Get your own SAP HANA, developer edition on Amazon Web Services

(2) Previous blog – Real-time sentiment rating of movies on SAP HANA One

http://scn.sap.com/community/developer-center/hana/blog/2013/06/19/real-time-sentiment-rating-of-movies-on-sap-hana-one

(3) Webinar on BrightTALK: Big Data – Sentiment analysis of movies on SAP HANA One

https://www.brighttalk.com/webcast/9727/82203

What’s next

This application is just a prototype now and we hope to build a sentiment rating engine & recommendation engine in the future.

Hope you enjoyed reading my blog. 🙂

To report this post you need to login first.

5 Comments

You must be Logged on to comment or reply to a post.

  1. Ken Tsai

    Awesome blog Wenjun. Thanks for sharing your insights on how to build a consumer-facing smart app using machine learning and text analytics in HANA. I am sure many iPhone or Android app developers can embed this capabilities within their app.

    We should offer the product recommender system as a function services within HANA platform so 3rd party developers can easily re-use

    Many 3rd party developers are using open source options to build out the machine-learning and semantic processing capabilities. I would like to hear from a developer perspective where you think the value-add would be to leverage the HANA-based approach instead.

    Thanks for getting this project off the ground and sharing your insights with the community.

    (0) 

Leave a Reply