[Update: April 5th, 2016 – The Live3 on HCP tutorial series was created using the SAP HANA Cloud Platform free developer trial landscape in January 2015. The HCP landscape has significantly evolved over the past year. Therefore one may encounter many issues while following along with the series using the most recent version of the free developer trail edition of HCP.]
Continuing the SAP HANA Academy’s Live3 course Philip Mugglestone details how to create a view that contains scoring for each Twitter user based on their influence and stance (attitude). Check out Philip’s tutorial video below.
(0:20 – 1:50) Twitter Influence Index Explained
The influence score reflects a Tweeter’s influence based on the number of retweets and replies their Tweets have garnered. Our Tweets table notes the Twitter handles of users who have retweeted and replied to each individual Tweet. People are considered more influential if their Tweets are replied to and retweeted than if they are the ones who retweet and reply to others’ Tweets. Taking the total number of Tweets that you retweet and reply to and subtracting it from the total amount your personal Tweets have been retweet and replied to gives a numeric index reflecting your individual Twitter influence.
Twitter Influence = (# of times your Tweets that have been retweeted + # of times your Tweets that have been replied to) – (# of Tweets you’ve retweeted + # of Tweet you’ve replied to)
(1:50 – 3:30) How to Establish a User’s Stance
Stance is based on sentiment analysis. Open the $TA_Tweets table created in the prior video. Open the Analysis tab and left click on the TA_TYPE in the String folder and choose to add it as a filter. Choose to filter on StrongPositive Sentiment.
Drag TA_TOKEN as the Labels axis and id(Count) as the Values axis. Then choose Tag Cloud as your Analytic and you will get an indication of what words are used to represent StrongPositive sentiment. We can also choose WeakPositive, WeakNegative, and StrongNegative as TA_TYPE to see the words that are classified as those particular sentiments.
So we can add up the occurrence of the words categorized into the different sentiments for a particular user, assign those sentiment categories particular numeric values and then subtract the weak total from the strong total to get an overall indicator of a user’s stance (attitude).
(3:30 – 7:05) Overview of SQL View Code – Influence Score
In the scripts folder of the Live3 code repository on GitHub select the 04 SetupView.sql file and choose to edit with Notepad++. Copy the lines of SQL code and paste them into a SQL console in Eclipse. First make sure you’re set to the proper schema.
This code will create a view that will process the influence and stance calculations described above. To showcase part of the code Philip runs the four lines before the last of code (see below). Running that returns a count of the number of retweets each Tweeter has.
The section of code of above this highlighted part counts the number of replies in a similar manner. Both parts of this code are straight forward SQL select group byes. This SQL only looks at data within the time that it has been collected.
The code also looks at the number times a user has been retweeted and the number of replies. So we must run these four subqueries and join the results together. We need to ensure that we capture all of the user because many user may not have preformed one of those four Twitter actions.
At the top of the code, the case statement will determine the influence score by adding together the number of received retweet (if retweets is missing then it’s set to zero so propagation of missing values isn’t returned) and replies and then subtracting the number of sent retweets and replies. Using a select from distinct will ensure we capture all of the Tweeters irrespective if they have used retweets or replies.
(7:05 – 9:20) Overview of SQL View Code – Stance Score
For the stance we will run a query that will pull out the number of StrongPositive, WeakPositive, StrongNegative and WeakNegative sentiments tweeted by an individual user. Instead of having a row for each sentiment per user we transpose that data into four individual columns for each user.
Finally we will join that data together and apply a stance ratio. We have given StrongPositive a score of 5, WeakPositive a score of 2, WeakNegative a score of 2, and StrongNegative a score of 5. We will add up those scores and then subtract the overall negative value from the overall positive value to get a stance score.
Note other ways to do this exist in SAP HANA including utilizing a calculation engine in a calculation view.
(9:20 – 11:30) Running the Code to Create the Stance and Influence View
Highlight all of the SQL code and run it. Creating this view means we aren’t materializing the results until we need them. So this view will be up to date when loading data in real-time.
Refreshing the list of views shows the newly created Tweeters view. After choose to preview the data of the new view we can see the stance and influence score for every Tweeter. For example a user could have a very positive stance but a very low influencer score. This information is incredibly useful as we can now group together users with similar influences and stances.
Follow along with the Live3 course here.
SAP HANA Academy over 900 free tutorial videos on using SAP HANA and SAP HANA Cloud Platform.