You can find the series and project from Real-time sentiment rating of movies on SAP HANA (part 6) – wrap-up and looking ahead

Intro

Hi everyone, welcome to the series of “Real-time sentiment rating of movies”. In this series of blogs I want to share with you how to build a SAP HANA native application named “Real-time sentiment rating of movies” step by step in detail. Of course, I will also share my project to GitHub when I finish it. The goal of this smart application is that we can analyze the sentiment rating of new release movies from social media in real-time. The basic approach to build this app consists of three steps as follows.

1. Crawl metadata of new release movies as well as social media data and insert into SAP HANA in real-time

2. Use the text analysis feature of SAP HANA to analyze the social sentiment

3. Build some fancy UIs to expose the result of sentiment analsyis

Motivation

Actually, what I want to share with you in this series is the second version of this smart app. If you are interested in the first version, you can have a look at Real-time sentiment rating of movies on SAP HANA One. You may ask why I want to build the second version and what’s the difference between the first version and the second version. So, I’ll answer this question first. I’ve already explained some reasons in Use XSJS outbound connectivity to search tweets. The first version was built with SAP HANA SPS05 and at that time XS Engine does not support some advanced features such as Core Data Services (CDS, since SPS06), Outbound Connectivity (since SPS06) and Job Scheduling (since SPS07), etc.. Without these awesome features I could only use some external tools/methods to finish some part of the app, e.g., I used Twitter4J to crawl tweets which means I used Java in the first version. With the rapid development of XS, now I am able to build the smart app with pure XS which means now I can build the pure SAP HANA native application. That’s the major motivation why I want to rebuild the smart app and share with you. The second reason is that I’m not satisfied with the UI of the first version. I’d like to build a more fancy UI and the app should be able to run on mobile devices. Besides I also fixed some bugs in the first version. I’ll also show you in the series later.

I can’t wait to share with you. 🙂 Are you ready? Now let’s start!

Prerequisites

1. A running SAP HANA system, at least SPS07, since Job Scheduling was introduced since SPS07. I am using SAP HANA SPS08 Rev. 80.

2. A developer account for Rotten Tomatoes API – Welcome to the Rotten Tomatoes API. If you don’t have, register here

3. A developer account for Twitter Developers. If you have a Twitter account, just login. If you don’t have, sign up here

Basics about XS

In the series of “Real-time sentiment rating of movies”, I won’t explain basics about XS, for example how to create XS project, how to commit/activate your design-time object and some basic concepts. If you are new to SAP HANA XS, you can first have a look at the following blogs/materials/references.

SAP HANA Extended Application Services

SAP HANA SPS6 – Various New Developer Features

SAP HANA SPS07 – Various New Developer Features

http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf

JSDoc: Index

Find APIs

Before we start to develop the smart app, we need to first do some research work or some tests. The most important thing is that we need to figure out which APIs we can call to get the metadata of new release movies and the social media data, e.g. tweets. I’ll still use Rotten Tomatoes API – Welcome to the Rotten Tomatoes API and Twitter Developers as the data sources which I used in the first version of this app. Now let’s have a look at which APIs we need in our app.

Metadata of new release movies

Among Rotten Tomatoes API – API Overview, we can find that we can get new release movies of current week via Rotten Tomatoes API – Opening Movies. There are some parameters you can configure, e.g., limit the number of movies to return. Let’s give it a shot. The URL is http://api.rottentomatoes.com/api/public/v1.0/lists/movies/opening.json?apikey=[your_api_key] You can find your API key from http://developer.rottentomatoes.com/apps/mykeys if you’ve already signed in.

1.PNG

It works! 🙂 However, we need some details of movie metadata which is not included in this API call, e.g., the director, studio and genres of the movie. Don’t worry! There is another API we can use, Rotten Tomatoes API – Movie Info With this API, we can get all information about a certain movie as follows.

2.PNG

Tweets

Now let’s take a look at REST APIs | Twitter Developers. Since we will search tweets about new release movies and do the sentiment analysis in SAP HANA, we need to use GET search / tweets | Twitter Developers. I’ve already written a blog about this part, so please refer step 1 and 2 in Use XSJS outbound connectivity to search tweets. Make sure you can successfully get the result with Postman – REST Client as follows. If you want to know more about the search API, you can also have a look at The Search API | Twitter Developers

3.PNG

OK. In short we need the following three APIs in our app.

Create schema and tables

Now we can start to develop our smart app with SAP HANA XS. First of all, we need to create XS project, then add .xsapp and .xsaccess. Since we want to build a pure SAP HANA native application, we can use .hdbschema and .hdbdd these two artifacts to create our schema and tables.

MOVIE_RATING.hdbschema


schema_name = "MOVIE_RATING";
















movieRating.hdbdd


namespace movieRating.data;
@Schema: 'MOVIE_RATING'
context movieRating {
  type SString : String(20);
  type MString : String(200);
  type LString : String(2000);
  @Catalog.tableType : #COLUMN
  Entity Movies {
  key id : Integer;
  title : MString;
  year : Integer;
  mpaa_rating : SString;
  runtime : SString;
  release_date : LocalDate;
  synopsis : LString;
  poster : MString;
  studio : MString;
  hashtag : MString;
  timestamp : UTCTimestamp;
  since_id : SString;
  };
  @Catalog.tableType : #COLUMN
  @nokey
  Entity Genres {
  movie_id : Integer not null;
  genre : MString not null;
  };
  @Catalog.tableType : #COLUMN
  @nokey
  Entity AbridgedCast {
  movie_id : Integer not null;
  cast : MString not null;
  };
  @Catalog.tableType : #COLUMN
  @nokey
  Entity AbridgedDirectors {
  movie_id : Integer not null;
  director : MString not null;
  };
  @Catalog.tableType : #COLUMN
  Entity Tweets {
  key id : SString;
  created_at : UTCDateTime;
  text : MString;
  source : MString;
  user_screen_name : SString;
  user_profile_image_url : MString;
  longitude : Decimal(20, 17);
  latitude : Decimal(20, 17);
  movie_id : Integer;
  timestamp : UTCTimestamp;
  };
};
















You can find there are five entities as follows which means after the activation there will be five corresponding runtime tables under schema “MOVIE_RATING”.

  • Movies
  • Genres
  • AbridgedCast
  • AbridgedDirectors
  • Tweets

I’ll explain the above five entities respectively.

Movies

– We can get “id”, “title”, “year”, “mpaa_rating”, “runtime”, “release_date”, “synopsis”, “poster”, “studio” from Rotten Tomatoes API – Movie Info directly.

– For “hashtag”, it’s an improvement in the second version. We can generate a hashtag for each movie with the following three advantages.

1. Hashtag means a topic/keyword in Twitter. See Twitter Help Center | Using hashtags on Twitter It’s common to “hashtag” movie titles in tweets. So, we can use the hashtags of movies to search tweets instead of movie titles in plain text.

2. Since we will use SAP HANA to do the sentiment analysis of tweets, we need to insert tweets into SAP HANA. However, sometimes the movie title contains sentiment itself which we need to avoid, e.g., if the movie title is “I love you”, SAP HANA will detect there is a strong positive sentiment in each of tweet about this movie. But if we use hashtag, SAP HANA will consider it as a topic instead of a potential sentiment.

3. Another case is that sometimes the movie title is a common word/phrase which is used widely. The tweet may contain this word/phrase but the user is not talking about movies. For instance, if the movie title is “go to work”, it is obviously a common phrase. Usually we will post including “go to work” directly instead of #gotowork.

– For “timestamp”, we use it to record when we crawl this movie.

– For “since_id”, in order to only search new tweets of each movie, we will use it as a parameter to search tweets. See GET search / tweets | Twitter Developers

Genres, AbridgedCast, AbridgedDirectors

Since movies and genres/cast/directors have n:m relationship, we can store this info in additional mapping tables. For simplicity, we do not store the info about actor/actress/director, so we do not create tables for these entities. If you create tables about actor/actress/director, you can use associations in CDS. See Create an Association in CDS – SAP HANA Developer Guide – SAP Library

Tweets

First of all, you can find the JSON format of tweet object returned from Tweets | Twitter Developers. As you can see, we just store info what we need in our smart app instead of all information of a tweet object.

– For “id”, we use “id_str” instead of “id”. You can find the reason from this thread Google Groups

– For “created_at”, “text”, “source”, “user_screen_name”, “user_profile_image_url”, we can get them directly from GET search / tweets | Twitter Developers

– For “longitude” and “latitude”, we can get the geo info from the “coordinates” field instead of the “geo” field. See Google Groups. Since currently CDS does not support geospatial data types officially, we just use “longitude” and “latitude” instead.

– For “movie_id”, we need to record which movie this tweet mentioned.

– For “timestamp”, we use it to record when we crawl this tweet.

After the activation, you may find five tables are created under schema “MOVIE_RATING” in run-time. However, now you cannot do select/insert/update/delete operations on these five tables, since both design-time objects and run-time objects are owned by the technical user _SYS_REPO.

4.PNG

Next steps

Till now, we’ve created several tables in our smart app. In the next blog, we will first create few roles and grant these roles to some users, so that they can do select/insert/update/delete operations on these tables. Then we will use Outbound Connectivity and Job Scheduling features to crawl metadata of new release movies from Rotten Tomatoes API and tweets from Twitter API in real-time!

Hope you enjoyed reading my blog. 🙂

To report this post you need to login first.

1 Comment

You must be Logged on to comment or reply to a post.

  1. Eduardo Rubia

    Great material, kudos! Wanted to implement this, but too bad now Rotten Tomatoes is not issuing API keys any longer… Read the full series, though, and great job. Learned a lot from you. Thanks for sharing your knowledge.

    (0) 

Leave a Reply