This blog is added into SAP HANA Cookbook for MySQL Developers | SAP HANA
I am an intern visiting Palo Alto from SAP’s Shanghai office for a month-long project. It’s my first trip to the bay area so I am soaking up all the sun and all the excitement here. Last weekend, I found myself wanting to watch a movie. I searched the internet and found all the new releases listed on rottentomatoes and imdb but it was hard to pick one. I wanted to get a pulse of the movie before I watch it not from the critics but actual movie goers like me. Also, I wanted one which had high buzz not only in US but also in China. So I decided, why don’t I build one myself, after all I am in the heart of Silicon Valley.
I decided to pick SAP HANA One to power my app not just because I got the db & application server in the cloud but also because the platform would support sentiment analysis for English & Simplified Chinese right out-of-the-box! I used the Rotten Tomatoes API to find newly released movies and twitter & Sina Weibo APIs for sentiment for US & China respectively.
Prerequisites
Before we start to build the application, we need to get SAP HANA.
If you are interested in using this multilingual sentiment analysis for production or commercial use, you can spin up SAP HANA One.You can find SAP HANA One on AWS Marketplace
If you are interested in just trying out SAP HANA, go with the SAP HANA developer edition. You can follow the steps in How to create a SAP HANA Developer Edition in the cloud
I used SPS6 prerelease version for sentiment analysis in Chinese.
Schema
I did most of my work in the HANA Studio which is based on the eclipse IDE so very familiar for Java and other open-source developers.
First, I created a schema and full text index for all the movie metadata, including title, rating, running time, release data, synopsis, etc. Then I used the JTomato (https://github.com/geeordanoh/JTomato) to populate the table. You can find my code via https://github.com/linkinchow/crawler.
MOVIE: Stores movie metadata, including the title, rating, runtime, release date, etc.
Then I used Twitter4J (http://twitter4j.org) to search the movie keywords on Twitter. I found that twitter, given just the keyword, did a good job pulling all combinations of the movie name: fast and furious, fast & furious. You can find my code via https://github.com/linkinchow/crawler.
TWEET: Stores crawled tweets from Twitter, including ID, time, location, content, etc.
However, I ran into problems while crawling Sina Weibo because they have a strict process for usage of their API. So I decided to use Tencent Weibo instead.
TWEET_ZH: Stores crawled tweets from Tencent Weibo
Next I created a fulltext index and sentiment tables (called VoiceOfCustomer) using the following SQL. Voila! I now have sentiment analysis for all twitter and tencent weibo data!
CREATE FULLTEXT INDEX TWEET_I ON TWEET (CONTENT) CONFIGURATION ‘EXTRACTION_CORE_VOICEOFCUSTOMER’ ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION (‘EN’) TEXT ANALYSIS ON;
CREATE FULLTEXT INDEX TWEET_ZH_I ON TWEET_ZH (CONTENT) CONFIGURATION ‘EXTRACTION_CORE_VOICEOFCUSTOMER’ ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION (‘ZH’) TEXT ANALYSIS ON;
TWEET_I: Used to perform sentiment analysis for the table TWEET
TWEET_ZH_I: Used to perform sentiment analysis for the table TWEET_ZH
In addition to the tables in SAP HANA and the full text index to perform sentiment analysis, I also wrote stored procedures to wrap complex SQL making it easy for XS (HANA’s application server) to consume.
Architecture
The final architecture looks like this:
Rating
Now, I had to create a formula to quantify rating. I used a very simple formula for this:
Score = (# of strong positive sentiment * 5 + # of weak positive sentiment * 4 + # of neutral sentiment * 3 + # of weak negative sentiment * 2 + # of strong negative sentiment *1) / # of total sentiments
This score would be helpful to rank movies so I could easily pick the top one.
Additionally, I showed a distribution of the sentiments, positive vs. negative vs. neutral, so I could better understand how strong or weak people’s opinion was on the movie both in US & in China.
XS Application
The application should be built on XS Engine to prevent data transfer latency between the database and the web application server so users can access the website directly. The application was built in the following steps:
Step 1: Create stored procedures for rating and sentiment analysis
Currently, there are two stored procedures in the app. One is for rating and the other is for sentiment analysis:
1. Rating
We can use the following SQLs to create the type and the stored procedure:
CREATE TYPE MOVIEINFO AS TABLE (
POSTER NVARCHAR(100),
TITLE NVARCHAR(100),
RATING DECIMAL(5, 2),
NUM INTEGER,
TITLE_ZH NVARCHAR(100),
RATING_ZH DECIMAL(5, 2),
NUM_ZH INTEGER,
YEAR INTEGER,
MPAA_RATING NVARCHAR(100),
RUNTIME NVARCHAR(100),
CRITICS_CONSENSUS NVARCHAR(2000),
RELEASE_DATE DATE,
SYNOPSIS NVARCHAR(2000),
ID INTEGER
);
CREATE PROCEDURE GETMOVIEINFO(OUT RESULT MOVIEINFO) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
RESULT =
SELECT A.POSTER, A.TITLE, B.RATING, B.NUM, A.TITLE_ZH, C.RATING_ZH, C.NUM_ZH, A.YEAR, A.MPAA_RATING, A.RUNTIME, A.CRITICS_CONSENSUS, A.RELEASE_DATE, A.SYNOPSIS, A.ID
FROM MOVIE A
INNER JOIN
(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING, SUM(NUM) AS NUM FROM
(SELECT
- A.ID,
- C.TA_TYPE,
COUNT(C.TA_TYPE) AS NUM,
CASE C.TA_TYPE
WHEN ‘StrongPositiveSentiment’ THEN COUNT(C.TA_TYPE) * 5
WHEN ‘WeakPositiveSentiment’ THEN COUNT(C.TA_TYPE) * 4
WHEN ‘NeutralSentiment’ THEN COUNT(C.TA_TYPE) * 3
WHEN ‘WeakNegativeSentiment’ THEN COUNT(C.TA_TYPE) * 2
WHEN ‘StrongNegativeSentiment’ THEN COUNT(C.TA_TYPE) * 1
END AS TOTAL
FROM MOVIE A
LEFT JOIN TWEET B
ON A.ID = B.MOVIEID
LEFT JOIN “$TA_TWEET_I” C
ON B.ID = C.ID AND C.TA_TYPE IN (‘StrongPositiveSentiment’, ‘WeakPositiveSentiment’, ‘NeutralSentiment’, ‘WeakNegativeSentiment’, ‘StrongNegativeSentiment’)
GROUP BY
- A.ID,
- C.TA_TYPE) A
GROUP BY ID) B ON A.ID = B.ID
INNER JOIN
(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING_ZH, SUM(NUM) AS NUM_ZH FROM
(SELECT
- A.ID,
- C.TA_TYPE,
COUNT(C.TA_TYPE) AS NUM,
CASE C.TA_TYPE
WHEN ‘StrongPositiveSentiment’ THEN COUNT(C.TA_TYPE) * 5
WHEN ‘WeakPositiveSentiment’ THEN COUNT(C.TA_TYPE) * 4
WHEN ‘NeutralSentiment’ THEN COUNT(C.TA_TYPE) * 3
WHEN ‘WeakNegativeSentiment’ THEN COUNT(C.TA_TYPE) * 2
WHEN ‘StrongNegativeSentiment’ THEN COUNT(C.TA_TYPE) * 1
END AS TOTAL
FROM MOVIE A
LEFT JOIN TWEET_ZH B
ON A.ID = B.MOVIEID
LEFT JOIN “$TA_TWEET_ZH_I” C
ON B.ID = C.ID AND C.TA_TYPE IN (‘StrongPositiveSentiment’, ‘WeakPositiveSentiment’, ‘NeutralSentiment’, ‘WeakNegativeSentiment’, ‘StrongNegativeSentiment’)
GROUP BY
- A.ID,
- C.TA_TYPE) A
GROUP BY ID) C ON A.ID = C.ID
ORDER BY B.RATING DESC
;
END;
After creating the type and the stored procedure successfully, we can use the following SQL to test:
CALL GETMOVIEINFO(?);
From the column “RATING” and “RATING_ZH”, we can show the score on the main page.
2. Sentiment analysis
We can use the following SQLs to create the type and the stored procedure:
CREATE TYPE SENTIMENT AS TABLE (SENTIMENT NVARCHAR(100), NUM INTEGER);
CREATE PROCEDURE GETSENTIMENT(IN ID INTEGER, IN LANG VARCHAR(2), OUT RESULT SENTIMENT) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
IF LANG = ‘EN’ THEN
RESULT = SELECT ‘Strong Positive’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘StrongPositiveSentiment’
UNION ALL
SELECT ‘Weak Positive’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘WeakPositiveSentiment’
UNION ALL
SELECT ‘Neutral’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘NeutralSentiment’
UNION ALL
SELECT ‘Weak Negative’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘WeakNegativeSentiment’
UNION ALL
SELECT ‘Strong Negative’ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_I” A
INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘StrongNegativeSentiment’;
ELSEIF LANG = ‘ZH’ THEN
RESULT = SELECT ‘很好‘ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘StrongPositiveSentiment’
UNION ALL
SELECT ‘好‘ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘WeakPositiveSentiment’
UNION ALL
SELECT ‘一般‘ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘NeutralSentiment’
UNION ALL
SELECT ‘å·®‘ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘WeakNegativeSentiment’
UNION ALL
SELECT ‘很差‘ AS SENTIMENT, COUNT(*) AS NUM FROM “$TA_TWEET_ZH_I” A
INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
ON A.ID = B.ID
WHERE A.TA_TYPE = ‘StrongNegativeSentiment’;
END IF;
END;
After creating the type and the stored procedure successfully, we can use the following SQLs to test:
CALL GETSENTIMENT(771313125, ‘EN’, ?);
CALL GETSENTIMENT(771313125, ‘ZH’, ?);
Step 2: Build the application based on XS Engine
Till now, we can access the tables, indexes, data and stored procedures directly from the XS Engine. To build the application, follow the following steps:
1. Create .xsaccess, .xsapp and .xsprivileges to do the access control.
2. Create getMovies.xsjs to call the stored procedure “GETMOVIEINFO”
function createEntry(rs) {
return {
“poster” : rs.getNString(1),
“title” : rs.getNString(2),
“rating”: rs.getDecimal(3),
“num”: rs.getInteger(4),
“title_zh” : rs.getNString(5),
“rating_zh”: rs.getDecimal(6),
“num_zh”: rs.getInteger(7),
“year”: rs.getInteger(8),
“mpaa_rating”: rs.getNString(9),
“runtime”: rs.getNString(10),
“critics_consensus”: rs.getNString(11),
“release_date”: rs.getDate(12),
“synopsis”: rs.getNString(13),
“id”: rs.getInteger(14)
};
}
try {
var body = ”;
var list = [];
var query = “{CALL SMARTAPP.GETMOVIEINFO(?)}”;
$.trace.debug(query);
var conn = $.db.getConnection();
var pcall = conn.prepareCall(query);
pcall.execute();
var rs = pcall.getResultSet();
while (rs.next()) {
list.push(createEntry(rs));
}
rs.close();
pcall.close();
conn.close();
body = JSON.stringify({
“entries” : list
});
$.response.contentType = ‘application/json; charset=UTF-8’;
$.response.setBody(body);
$.response.status = $.net.http.OK;
} catch (e) {
$.response.status = $.net.http.INTERNAL_SERVER_ERROR;
$.response.setBody(e.message);
}
3. Create getSentiment.xsjs to call the stored procedure “GETSENTIMENT”
function createEntry(rs) {
return {
“sentiment” : rs.getString(1),
“num” : rs.getInteger(2)
};
}
try {
var id = parseInt($.request.parameters.get(“id”));
var lang = $.request.parameters.get(“lang”);
var body = ”;
var list = [];
var query = “{CALL SMARTAPP.GETSENTIMENT(?, ?, ?)}”;
$.trace.debug(query);
var conn = $.db.getConnection();
var pcall = conn.prepareCall(query);
pcall.setInteger(1, id);
pcall.setString(2, lang);
pcall.execute();
var rs = pcall.getResultSet();
while (rs.next()) {
list.push(createEntry(rs));
}
rs.close();
pcall.close();
conn.close();
body = JSON.stringify({
“entries” : list
});
$.response.contentType = ‘application/json; charset=UTF-8’;
$.response.setBody(body);
$.response.status = $.net.http.OK;
} catch (e) {
$.response.status = $.net.http.INTERNAL_SERVER_ERROR;
$.response.setBody(e.message);
}
4. Create index.html and code the HTML part.
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv=“X-UA-Compatible” content=“IE=edge”>
<title>Real-Time Movie Rating</title>
<script src=“/sap/ui5/1/resources/sap-ui-core.js”
id=“sap-ui-bootstrap”
data-sap-ui-libs=“sap.ui.commons,sap.ui.ux3,sap.viz”
data-sap-ui-theme=“sap_goldreflection”>
</script>
<!– add sap.ui.table,sap.ui.ux3 and/or other libraries to ‘data-sap-ui–libs‘ if required –>
<script>
sap.ui.localResources(“movieui”);
var view = sap.ui.view({id:“idMovieMatrix1”, viewName:“movieui.MovieMatrix”, type:sap.ui.core.mvc.ViewType.JS});
view.placeAt(“content”);
</script>
</head>
<body class=“sapUiBody” role=“application”>
<h1>Real-Time Movie Rating</h1>
<div id=“content”></div>
</body>
</html>
5. Create some views and controllers to use native SAP UI 5 to accelerate building the application.
Website
The live webapp is available at (http://107.20.137.184:8000/workshop/sessionx/00/ui/MovieUI/WebContent/index.html) but I bring down the AWS instance to reduce the billing cost. I have captured screenshots and a brief video if you find the server is down.
Real-time movie rating homepage
The following screenshot is the app’s main page. For each movie, there are two scores: the upper score is from Twitter and the lower score from Tencent Weibo.
I heard a lot of buzz about “Man of Steel” but it is currently ranked No. 7 so I was really curious. “Man of Steel” had a 3.72 rating but “20 Feet from Stardom” had a 4.54 rating. Interesting! Looking closer I discovered that this was because “20 Feet” had only 351 mentions but “Man of Steel” had more than 20K, meaning that a popular movie may not necessarily be the one with the highest score but could also be one which has the most buzz.
I then created a page with detailed breakdown of the sentiments of the Movie’s sentiments for both Twitter and Tencent Weibo. Looks like “Man of Steel” has a higher positive sentiment in China compared to the US. Well not surprising, we like superhero movies and Superman is our favorite.
Sentiment/Social Media |
|
Tencent Weibo |
||
|
# |
%age |
# |
%age |
Strong Positive |
9,986 |
44% |
528 |
34% |
Weak Positive |
5,903 |
26% |
723 |
47% |
Neutral |
839 |
4% |
12 |
1% |
Weak Negative |
2,067 |
9% |
123 |
8% |
Strong Negative |
3,757 |
17% |
166 |
11% |
Let’s see what the score on Rotten Tomatoes looks like. The critics have given it a meager 56% but 82% of the audience liked it. That number is compares well with 70% positive sentiment rating from my real-time rating app.
“20 Feet from stardom” has 97% rating from critics and 100% from the audience on rottentomatoes. So my real-time rating app was able to successfully identify this hit from social sentiment on twitter. Looks like the movie is a sleeper hit!
This application is just a prototype now and I hope to make more enhancements to the drill-down page. For the next version, I want to use the predictive libraries in SAP HANA to create a recommendation engine for movie based on a user’s interests, something like a “Pandora for movies”. Hope you enjoyed reading my blog.
What’s next
We get requests to publish the potential multilingual sentiment analysis core service used in this movie rating demo as an API to support similar requests from other industries. SAP HANA SPS6 supports insert/update data through oData service. So, we want to provide a set of RESTful web service calls to help application developers to access this feature easily and fast based on SPS6. With this API, developer just needs to provide the search keyword to query the Twitter data. And the output will be the sentiment analysis result (positive/negative summary or sentiment trend) based on the automatically loaded Twitter data.
Do you think this API as a Service is useful? Please share your feedback in the comment!
Hi Wenjun, thanks for sharing a detailed walk-through of your development process. Good luck working on the next version! Paul
Great blog.
Liked all the code samples as well as the demonstration of multi-language support for sentiment analysis.
It would cool to put like this on github (if possible), because the community could really benefit on your design and your crawlers.
D.
Hi Wenjun
Great blog and explanation. Thanks for the code samples and demo.
Regards
Kumar
WOW! I wish I could develop something like this at least in the far future
Great one Wenjun
Regards,
Benedict
Hello Wenjun,
this is really interesting, but could I make an additional request?
There are several use cases for sentiment analysis where the initial modeling starts data from Twitter/Facebook. Would it be possible for you to share more details on your Twitter crawler app? I’d be really interested in knowing that (what did you use? Like a python script or something etc.). It’d be even better if you could share the crawler code.
Cheers,
Henrique.
I believe Wenjun is using twitter4j (http://twitter4j.org) for crawling twitter.
Wonderful work, Wenjun!
A good example for us to study NLP features and native application development on SAP HANA.
Best Regards,
Miller
Hi Wenjun,
Great Blog!
Awesome work done
Hi wenjun,
It’s very interesting.And u like a pioneer makes me know more features about HANA sp6.
Hi Wenjun,
This post is brilliant!
Your app is probably one of the best examples I have seen of integrated Design Thinking with SAP HANA. It is an app that the general public can relate to. Plus you have given us a detailed insight into using the available tools to realise your app. Thank you.
Best Regards,
Arun Devan
PS: Let me know if you would like to spend some time in Sngapore with our team…
Wenjun, this is real awesome and inspirational stuff.
Thanks for the exhaustive write-up and sharing the things that you did.
Looking for more write-ups/blogs from you
Cheers,
Prashanth
Fantastic blog. Thank you so much.
Cheers.
Nick.
Being in the entertainment industry and using SAP business one I can see how this is useful.
nice blog
Hi Wenjun
It’s a very nice HANA app example you have build there. Thanks for sharing with the community at large.
Enjoy your time in Palo Alto, it’s a marvelous place
.
Best regards
Tom
Hi Wenjun,
This is really great stuff. As Benedict says, I hope I can do something nearly as cool someday. Keep the nice work!
Cheers,
Custodio
Excellent blog Wenjun, would be even more excellent if you could ad another blog about your use of twitter4j when you have time…
😉
Hi Wenjun:
This is really cool use case – something that everyone can use in their everyday (or at least in their weekend) life. I am like you – who trusts “friends” opinion more than the critics before watching a movie.
Thanks for doing this and taking time to document the entire process. Looking forward to the next rev.
-Rajiv
Hello Wenjun,
This is one of the excellent examples of Text Analysis capability of SAP HANA. Thanks a lot for your explanation and SQLscript, JavaScript and HTML5 code samples!
Regards,
Venu
Wow this is nice! I think this really shows the power of HANA.
A complex scenario without HANA, an easy on with HANA.
really good job.
composite api works greate and use hana to analysis the sns is major idea for usecase
really good job.
composite api works greate and use hana to analysis the sns is major idea for usecase
http://107.20.137.184:8000/workshop/sessionx/00/ui/MovieUI/WebContent/index.html
It’s asking for USER and PASS..What will I use?
Great blog and explanation.
Thanks
This is just awesome Wenjun!!!
Hi Wenjun,
You have picked a topic that everyone could relate to which is brilliant. Very interesting to read the potential on how SAP HANA can apply on ‘soft’ subject. Great work!
Regards,
Annie
Hi Wenjun,
When i am trying to execute the java file moviewcrawler i am getting the error as invalid table name: Could not find table/view OTABLE in schema.
regards
Dinesh.
hi Wenjun,
i felt bored and searching something and then i typed sap in movies then only i found your forum sap hana sentiment rating for movies really very nice job yaar very interesting to read nice .
With thanks,
Mahalakshmi.