Skip to Content

This blog is added into SAP HANA Cookbook for MySQL Developers | SAP HANA

I am an intern visiting Palo Alto from SAP’s Shanghai office for a month-long project. It’s my first trip to the bay area so I am soaking up all the sun and all the excitement here. Last weekend, I found myself wanting to watch a movie. I searched the internet and found all the new releases listed on rottentomatoes and imdb but it was hard to pick one. I wanted to get a pulse of the movie before I watch it not from the critics but actual movie goers like me. Also, I wanted one which had high buzz not only in US but also in China. So I decided, why don’t I build one myself, after all I am in the heart of Silicon Valley.

I decided to pick SAP HANA One to power my app not just because I got the db & application server in the cloud but also because the platform would support sentiment analysis for English & Simplified Chinese right out-of-the-box! I used the Rotten Tomatoes API to find newly released movies and twitter & Sina Weibo APIs for sentiment for US & China respectively.

Prerequisites

Before we start to build the application, we need to get SAP HANA.

If you are interested in using this multilingual sentiment analysis for production or commercial use, you can spin up SAP HANA One.You can find SAP HANA One on AWS Marketplace

If you are interested in just trying out SAP HANA, go with the SAP HANA developer edition. You can follow the steps in How to create a SAP HANA Developer Edition in the cloud

I used SPS6 prerelease version for sentiment analysis in Chinese.

Schema

I did most of my work in the HANA Studio which is based on the eclipse IDE so very familiar for Java and other open-source developers.

/wp-content/uploads/2013/06/1_233206.jpg

First, I created a schema and full text index for all the movie metadata, including title, rating, running time, release data, synopsis, etc. Then I used the JTomato (https://github.com/geeordanoh/JTomato) to populate the table. You can find my code via https://github.com/linkinchow/crawler.

MOVIE: Stores movie metadata, including the title, rating, runtime, release date, etc.

/wp-content/uploads/2013/06/2_233252.jpg

Then I used Twitter4J (http://twitter4j.org) to search the movie keywords on Twitter. I found that twitter, given just the keyword, did a good job pulling all combinations of the movie name: fast and furious, fast & furious. You can find my code via https://github.com/linkinchow/crawler.

TWEET: Stores crawled tweets from Twitter, including ID, time, location, content, etc.

/wp-content/uploads/2013/06/3_233253.jpg

However, I ran into problems while crawling Sina Weibo because they have a strict process for usage of their API. So I decided to use Tencent Weibo instead.

TWEET_ZH: Stores crawled tweets from Tencent Weibo

/wp-content/uploads/2013/06/4_233254.jpg

Next I created a fulltext index and sentiment tables (called VoiceOfCustomer) using the following SQL. Voila! I now have sentiment analysis for all twitter and tencent weibo data!

CREATE FULLTEXT INDEX TWEET_I ON TWEET (CONTENT) CONFIGURATION ‘EXTRACTION_CORE_VOICEOFCUSTOMER’ ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION (‘EN’) TEXT ANALYSIS ON;


CREATE FULLTEXT INDEX TWEET_ZH_I ON TWEET_ZH (CONTENT) CONFIGURATION ‘EXTRACTION_CORE_VOICEOFCUSTOMER’ ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION (‘ZH’) TEXT ANALYSIS ON;

TWEET_I: Used to perform sentiment analysis for the table TWEET

/wp-content/uploads/2013/06/5_233255.jpg

TWEET_ZH_I: Used to perform sentiment analysis for the table TWEET_ZH

/wp-content/uploads/2013/06/6_233256.jpg

In addition to the tables in SAP HANA and the full text index to perform sentiment analysis, I also wrote stored procedures to wrap complex SQL making it easy for XS (HANA’s application server) to consume.

Architecture

The final architecture looks like this:

/wp-content/uploads/2013/06/7_233257.jpg

Rating


Now, I had to create a formula to quantify rating. I used a very simple formula for this:

Score = (# of strong positive sentiment * 5 + # of weak positive sentiment * 4 + # of neutral sentiment * 3 + # of weak negative sentiment * 2 + # of strong negative sentiment *1) / # of total sentiments

This score would be helpful to rank movies so I could easily pick the top one.

Additionally, I showed a distribution of the sentiments, positive vs. negative vs. neutral, so I could better understand how strong or weak people’s opinion was on the movie both in US & in China.

XS Application

The application should be built on XS Engine to prevent data transfer latency between the database and the web application server so users can access the website directly. The application was built in the following steps:

Step 1: Create stored procedures for rating and sentiment analysis

Currently, there are two stored procedures in the app. One is for rating and the other is for sentiment analysis:

1. Rating

We can use the following SQLs to create the type and the stored procedure:

CREATE TYPE MOVIEINFO AS TABLE (

      POSTER NVARCHAR(100),

      TITLE NVARCHAR(100),

      RATING DECIMAL(5, 2),

      NUM INTEGER,

      TITLE_ZH NVARCHAR(100),

      RATING_ZH DECIMAL(5, 2),

      NUM_ZH INTEGER,

      YEAR INTEGER,

      MPAA_RATING NVARCHAR(100),

      RUNTIME NVARCHAR(100),

      CRITICS_CONSENSUS NVARCHAR(2000),

      RELEASE_DATE DATE,

      SYNOPSIS NVARCHAR(2000),

      ID INTEGER

);

CREATE PROCEDURE GETMOVIEINFO(OUT RESULT MOVIEINFO) LANGUAGE SQLSCRIPT READS SQL DATA AS

BEGIN

RESULT =

SELECT A.POSTER, A.TITLE, B.RATING, B.NUM, A.TITLE_ZH, C.RATING_ZH, C.NUM_ZH, A.YEAR, A.MPAA_RATING, A.RUNTIME, A.CRITICS_CONSENSUS, A.RELEASE_DATE, A.SYNOPSIS, A.ID

FROM MOVIE A

INNER JOIN

(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING, SUM(NUM) AS NUM FROM

(SELECT

  1. A.ID,
  2. C.TA_TYPE,

COUNT(C.TA_TYPE) AS NUM,

CASE C.TA_TYPE

      WHEN ‘StrongPositiveSentiment’ THEN COUNT(C.TA_TYPE) * 5

      WHEN ‘WeakPositiveSentiment’ THEN COUNT(C.TA_TYPE) * 4

      WHEN ‘NeutralSentiment’ THEN COUNT(C.TA_TYPE) * 3

      WHEN ‘WeakNegativeSentiment’ THEN COUNT(C.TA_TYPE) * 2

      WHEN ‘StrongNegativeSentiment’ THEN COUNT(C.TA_TYPE) * 1

END AS TOTAL

FROM MOVIE A

LEFT JOIN TWEET B

ON A.ID = B.MOVIEID

LEFT JOIN “$TA_TWEET_I” C

ON B.ID = C.ID AND C.TA_TYPE IN (‘StrongPositiveSentiment’, ‘WeakPositiveSentiment’, ‘NeutralSentiment’, ‘WeakNegativeSentiment’, ‘StrongNegativeSentiment’)

GROUP BY

  1. A.ID,
  2. C.TA_TYPE) A

GROUP BY ID) B ON A.ID = B.ID

INNER JOIN

(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING_ZH, SUM(NUM) AS NUM_ZH FROM

(SELECT

  1. A.ID,
  2. C.TA_TYPE,

COUNT(C.TA_TYPE) AS NUM,

CASE C.TA_TYPE

      WHEN ‘StrongPositiveSentiment’ THEN COUNT(C.TA_TYPE) * 5

      WHEN ‘WeakPositiveSentiment’ THEN COUNT(C.TA_TYPE) * 4

      WHEN ‘NeutralSentiment’ THEN COUNT(C.TA_TYPE) * 3

      WHEN ‘WeakNegativeSentiment’ THEN COUNT(C.TA_TYPE) * 2

      WHEN ‘StrongNegativeSentiment’ THEN COUNT(C.TA_TYPE) * 1

END AS TOTAL

FROM MOVIE A

LEFT JOIN TWEET_ZH B

ON A.ID = B.MOVIEID

LEFT JOIN “$TA_TWEET_ZH_I” C

ON B.ID = C.ID AND C.TA_TYPE IN (‘StrongPositiveSentiment’, ‘WeakPositiveSentiment’, ‘NeutralSentiment’, ‘WeakNegativeSentiment’, ‘StrongNegativeSentiment’)

GROUP BY

  1. A.ID,
  2. C.TA_TYPE) A

GROUP BY ID) C ON A.ID = C.ID

ORDER BY B.RATING DESC

;

END;


After creating the type and the stored procedure successfully, we can use the following SQL to test:

CALL GETMOVIEINFO(?);

/wp-content/uploads/2013/06/8_233261.jpg

From the column “RATING” and “RATING_ZH”, we can show the score on the main page.

2. Sentiment analysis

We can use the following SQLs to create the type and the stored procedure:

CREATE TYPE SENTIMENT AS TABLE (SENTIMENT NVARCHAR(100), NUM INTEGER);

CREATE PROCEDURE GETSENTIMENT(IN ID INTEGER, IN LANG VARCHAR(2), OUT RESULT SENTIMENT) LANGUAGE SQLSCRIPT READS SQL DATA AS

BEGIN

      IF LANG = ‘EN’ THEN

      RESULT = SELECT ‘Strong Positive’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘StrongPositiveSentiment’

            UNION ALL

            SELECT ‘Weak Positive’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘WeakPositiveSentiment’

            UNION ALL

            SELECT ‘Neutral’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘NeutralSentiment’

            UNION ALL

            SELECT ‘Weak Negative’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘WeakNegativeSentiment’

            UNION ALL

            SELECT ‘Strong Negative’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A

            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘StrongNegativeSentiment’;

      ELSEIF LANG = ‘ZH’ THEN

      RESULT = SELECT 很好 AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘StrongPositiveSentiment’

            UNION ALL

            SELECT AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘WeakPositiveSentiment’

            UNION ALL

            SELECT 一般 AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘NeutralSentiment’

            UNION ALL

            SELECT AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘WeakNegativeSentiment’

            UNION ALL

            SELECT 很差 AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A

            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B

            ON A.ID = B.ID

            WHERE A.TA_TYPE = ‘StrongNegativeSentiment’;

      END IF;

END;


After creating the type and the stored procedure successfully, we can use the following SQLs to test:

CALL GETSENTIMENT(771313125, ‘EN’, ?);

/wp-content/uploads/2013/06/9_233262.jpg

CALL GETSENTIMENT(771313125, ‘ZH’, ?);

/wp-content/uploads/2013/06/10_233263.jpg

Step 2: Build the application based on XS Engine

Till now, we can access the tables, indexes, data and stored procedures directly from the XS Engine. To build the application, follow the following steps:

1. Create .xsaccess, .xsapp and .xsprivileges to do the access control.


2. Create getMovies.xsjs to call the stored procedure “GETMOVIEINFO

function createEntry(rs) {

      return {

            “poster” : rs.getNString(1),

            “title” : rs.getNString(2),

            “rating”: rs.getDecimal(3),

            “num”: rs.getInteger(4),

            “title_zh” : rs.getNString(5),

            “rating_zh”: rs.getDecimal(6),

            “num_zh”: rs.getInteger(7),

            “year”: rs.getInteger(8),

            “mpaa_rating”: rs.getNString(9),

            “runtime”: rs.getNString(10),

            “critics_consensus”: rs.getNString(11),

            “release_date”: rs.getDate(12),

            “synopsis”: rs.getNString(13),

            “id”: rs.getInteger(14)

      };

}

try {

      var body = ;

      var list = [];

      var query = “{CALL SMARTAPP.GETMOVIEINFO(?)}”;

      $.trace.debug(query);

      var conn = $.db.getConnection();

      var pcall = conn.prepareCall(query);

      pcall.execute();

      var rs = pcall.getResultSet();

      while (rs.next()) {

            list.push(createEntry(rs));

      }

      rs.close();

      pcall.close();

      conn.close();

      body = JSON.stringify({

            “entries” : list

      });

      $.response.contentType = ‘application/json; charset=UTF-8’;

      $.response.setBody(body);

      $.response.status = $.net.http.OK;

} catch (e) {

      $.response.status = $.net.http.INTERNAL_SERVER_ERROR;

      $.response.setBody(e.message);

}

3. Create getSentiment.xsjs to call the stored procedure “GETSENTIMENT”

function createEntry(rs) {

      return {

            “sentiment” : rs.getString(1),

            “num” : rs.getInteger(2)

      };

}

try {

      var id = parseInt($.request.parameters.get(“id”));

      var lang = $.request.parameters.get(“lang”);

      var body = ;

      var list = [];

      var query = “{CALL SMARTAPP.GETSENTIMENT(?, ?, ?)}”;

      $.trace.debug(query);

      var conn = $.db.getConnection();

      var pcall = conn.prepareCall(query);

      pcall.setInteger(1, id);

      pcall.setString(2, lang);

      pcall.execute();

      var rs = pcall.getResultSet();

      while (rs.next()) {

            list.push(createEntry(rs));

      }

      rs.close();

      pcall.close();

      conn.close();

      body = JSON.stringify({

            “entries” : list

      });

      $.response.contentType = ‘application/json; charset=UTF-8’;

      $.response.setBody(body);

      $.response.status = $.net.http.OK;

} catch (e) {

      $.response.status = $.net.http.INTERNAL_SERVER_ERROR;

      $.response.setBody(e.message);

}

4. Create index.html and code the HTML part.

<!DOCTYPE HTML>

<html>

      <head>

            <meta http-equiv=“X-UA-Compatible” content=“IE=edge”>

            <title>Real-Time Movie Rating</title>

            <script src=“/sap/ui5/1/resources/sap-ui-core.js”

                        id=“sap-ui-bootstrap”

                        data-sap-ui-libs=“sap.ui.commons,sap.ui.ux3,sap.viz”

                        data-sap-ui-theme=“sap_goldreflection”>

            </script>

            <!– add sap.ui.table,sap.ui.ux3 and/or other libraries to ‘data-sap-uilibs‘ if required –>

            <script>

                        sap.ui.localResources(“movieui”);

                        var view = sap.ui.view({id:“idMovieMatrix1”, viewName:“movieui.MovieMatrix”, type:sap.ui.core.mvc.ViewType.JS});

                        view.placeAt(“content”);

            </script>

      </head>

      <body class=“sapUiBody” role=“application”>

            <h1>Real-Time Movie Rating</h1>

            <div id=“content”></div>

      </body>

</html>

5. Create some views and controllers to use native SAP UI 5 to accelerate building the application.

Website

The live webapp is available at (http://107.20.137.184:8000/workshop/sessionx/00/ui/MovieUI/WebContent/index.html) but I bring down the AWS instance to reduce the billing cost. I have captured screenshots and a brief video if you find the server is down.

Real-time movie rating homepage

The following screenshot is the app’s main page. For each movie, there are two scores: the upper score is from Twitter and the lower score from Tencent Weibo.

/wp-content/uploads/2013/06/11_233264.jpg

I heard a lot of buzz about “Man of Steel” but it is currently ranked No. 7 so I was really curious. “Man of Steel” had a 3.72 rating but “20 Feet from Stardom” had a 4.54 rating. Interesting! Looking closer I discovered that this was because “20 Feet” had only 351 mentions but “Man of Steel” had more than 20K, meaning that a popular movie may not necessarily be the one with the highest score but could also be one which has the most buzz.     

I then created a page with detailed breakdown of the sentiments of the Movie’s sentiments for both Twitter and Tencent Weibo. Looks like “Man of Steel” has a higher positive sentiment in China compared to the US. Well not surprising, we like superhero movies and Superman is our favorite.

Sentiment/Social Media

Twitter

Tencent Weibo

#

%age

#

%age

Strong Positive

9,986

44%

528

34%

Weak Positive

5,903

26%

723

47%

Neutral

839

4%

12

1%

Weak Negative

2,067

9%

123

8%

Strong Negative

3,757

17%

166

11%

/wp-content/uploads/2013/06/12_233266.jpg

Let’s see what the score on Rotten Tomatoes looks like. The critics have given it a meager 56% but 82% of the audience liked it. That number is compares well with 70% positive sentiment rating from my real-time rating app.

/wp-content/uploads/2013/06/13_233267.jpg

“20 Feet from stardom” has 97% rating from critics and 100% from the audience on rottentomatoes. So my real-time rating app was able to successfully identify this hit from social sentiment on twitter. Looks like the movie is a sleeper hit!

/wp-content/uploads/2013/06/15_233272.jpg

/wp-content/uploads/2013/06/14_233271.jpg

This application is just a prototype now and I hope to make more enhancements to the drill-down page. For the next version, I want to use the predictive libraries in SAP HANA to create a recommendation engine for movie based on a user’s interests, something like a “Pandora for movies”. Hope you enjoyed reading my blog.

What’s next

We get requests to publish the potential multilingual sentiment analysis core service used in this movie rating demo as an API to support similar requests from other industries. SAP HANA SPS6 supports insert/update data through oData service. So, we want to provide a set of RESTful web service calls to help application developers to access this feature easily and fast based on SPS6. With this API, developer just needs to provide the search keyword to query the Twitter data. And the output will be the sentiment analysis result (positive/negative summary or sentiment trend) based on the automatically loaded Twitter data.

Do you think this API as a Service is useful? Please share your feedback in the comment!

To report this post you need to login first.

29 Comments

You must be Logged on to comment or reply to a post.

  1. Richard Hirsch

    Great blog.

     

    Liked all the code samples as well as the demonstration of multi-language support for sentiment analysis.

     

    It would cool to put like this on github (if possible), because the community could really benefit on your design and your crawlers.

     

    D.

    (0) 
  2. Henrique Pinto

    Hello Wenjun,

     

    this is really interesting, but could I make an additional request?


    There are several use cases for sentiment analysis where the initial modeling starts data from Twitter/Facebook. Would it be possible for you to share more details on your Twitter crawler app? I’d be really interested in knowing that (what did you use? Like a python script or something etc.). It’d be even better if you could share the crawler code.

     

    Cheers,

    Henrique.

    (0) 
  3. Miller Hu

    Wonderful work, Wenjun!

    A good example for us to study NLP features and native application development on SAP HANA.

     

    Best Regards,

    Miller

    (0) 
  4. Arun Devan

    Hi Wenjun,

     

    This post is brilliant!

     

    Your app is probably one of the best examples I have seen of integrated Design Thinking with SAP HANA. It is an app that the general public can relate to. Plus you have given us a detailed insight into using the available tools to realise your app. Thank you.

     

    Best Regards,

    Arun Devan

     

    PS: Let me know if you would like to spend some time in Sngapore with our team…

    (0) 
  5. Prashanth K S

    Wenjun, this is real awesome and inspirational stuff.

    Thanks for the exhaustive write-up and sharing the things that you did.

     

    Looking for more write-ups/blogs from you

     

    Cheers,

    Prashanth

    (0) 
  6. Tom Cenens

    Hi Wenjun

     

    It’s a very nice HANA app example you have build there. Thanks for sharing with the community at large.

     

    Enjoy your time in Palo Alto, it’s a marvelous place .

     

    Best regards

     

    Tom

    (0) 
  7. Jason Scott

    Excellent blog Wenjun, would be even more excellent if you could ad another blog about your use of twitter4j when you have time…

    😉

    (0) 
  8. Rajiv Nema

    Hi Wenjun:

     

    This is really cool use case – something that everyone can use in their everyday (or at least in their weekend) life. I am like you – who trusts “friends” opinion more than the critics before watching a movie.

     

    Thanks for doing this and taking time to document the entire process. Looking forward to the next rev.

     

    -Rajiv

    (0) 
  9. Venu Cherupillil

    Hello Wenjun,

     

    This is one of the excellent examples of Text Analysis capability of SAP HANA.  Thanks a lot for your explanation and SQLscript, JavaScript and HTML5 code samples!

     

    Regards,

    Venu

    (0) 
  10. Annie Chan

    Hi Wenjun,

     

    You have picked a topic that everyone could relate to which is brilliant. Very interesting to read the potential on how SAP HANA can apply on ‘soft’ subject. Great work!

     

    Regards,

    Annie

    (0) 
  11. Mahalakshmi Krishnan

    hi Wenjun,

         i felt bored and searching something and then i typed sap in movies then only i found your forum sap hana sentiment rating for movies really very nice job yaar  very interesting to read nice .

     

    With thanks,

    Mahalakshmi.

    (0) 

Leave a Reply