
I am a software developer who wants to learn to develop applications on HANA. I have taken HANA-related courses at openSAP and watched several videos provided by the SAP HANA Academy, which are excellent resources for getting familiar with HANA. However, I found myself wanting to learn more. Rather than going through HANA guides without a useful context, I challenged myself to develop a scenario/application that uses HANA.

I came across this blog describing a strategy to detect World Cup goals using Twitter data, so I thought: why not build this on HANA, since all major components (except the Raspberry Pi) are available and can be easily configured in HANA? The major components are:

  1. Connection with Twitter API using HANA XS Outbound Connection
  2. Full-text indexes for text analysis to find GOAL
  3. Exponential smoothing algorithm to produce a smoothed version of the tweet data
  4. SAPUI5 for presentation

After that, I will compare the result with the actual events and describe some improvements that can be made.

My source code is available here.

Requirements


– SAP HANA SPS6 or above is needed to make an XS Outbound connection. I am using the SAP HANA developer edition rev 72 available in SAP CAL. To get your own developer edition, refer to http://scn.sap.com/docs/DOC-28294

– A Twitter application created in your Twitter account. Refer to the following link for more information: http://scn.sap.com/docs/DOC-49203

1. Connection with Twitter API – HANA XS Outbound Connection


To get data from Twitter into HANA, I need an XSJS outbound connection, an XSJS service to store tweets in the database, and an XSJS scheduler to run the service on a regular basis.

XSJS Outbound connection – twitter_connection.xshttpdest

/wp-content/uploads/2014/07/outbound_connection_490150.jpg

XSJS service – TwitterCollector.xsjs

In the service, I want to collect tweets that contain “#BELvsUSA”, which is the hashtag suggested by Twitter for the Belgium vs USA soccer game.

/wp-content/uploads/2014/07/twitter_1_490182.jpg

/wp-content/uploads/2014/07/twitter_2_490183.jpg

If you are familiar with the Twitter API, you may be wondering why my code does not use HTTPS, since Twitter only allows HTTPS. That is because I don’t have access to download the SAP cryptographic library (sapgenpse), which is a prerequisite for making an HTTPS outbound connection with HANA XS (reference). So I created a PHP service to act as an adapter to Twitter. I would have made an HTTPS connection directly to the Twitter API if I had access to the library.

2. Full-text indexes for text analysis


Once I have the tweet data, I use the full-text indexes that I have created on it for text analysis. I use LINGANALYSIS_FULL to break down the tweets into individual words. For example, user @juanvofficial tweeted “Well deserved goal by the Belgium team #BELvsUSA”. The index provides the following information:

/wp-content/uploads/2014/07/full_index_490184.jpg

Why do I need to break down the tweet into words? It makes it easier for me to exclude text that does not indicate an actual goal, e.g. @SmileyElie tweeted “#USA has the best goalkeeper ever!!! So impressed!!! #BELvsUSA”. The tweet contains “goal”, but it does not indicate that one of the teams has scored.
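
To illustrate why word-level matching helps, here is a minimal sketch in plain JavaScript. It is not the HANA full-text index and does not reproduce its output; the tokenize and mentionsGoal functions are hypothetical helpers that only mimic the idea of comparing whole words instead of substrings.

```javascript
// Hypothetical stand-in for the word list a full-text index would produce.
// Splits on anything that is not a letter, digit, '#', '@' or '_'.
function tokenize(text) {
    return text.toLowerCase().split(/[^a-z0-9#@_]+/).filter(function (token) {
        return token.length > 0;
    });
}

// True only when "goal" appears as a word of its own,
// so "goalkeeper" does not count as a goal mention.
function mentionsGoal(text) {
    return tokenize(text).indexOf("goal") !== -1;
}

console.log(mentionsGoal("Well deserved goal by the Belgium team #BELvsUSA"));
console.log(mentionsGoal("#USA has the best goalkeeper ever!!! So impressed!!! #BELvsUSA"));
```

A plain substring search for "goal" would flag both tweets, while the word-level check only flags the first one.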

3. Exponential smoothing algorithm


To indicate whether a goal has happened, I chose the percentage of tweets per minute that mention goal:

     No of tweets mentioning goal per minute * 100 / No of tweets per minute
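
As a rough illustration of this formula, the sketch below computes the percentage per minute in plain JavaScript. The input shape (objects with a minute and a mentionsGoal flag) and the function name are assumptions made for the example, not the structure of the actual tables.

```javascript
// tweets: array of { minute: number, mentionsGoal: boolean } (assumed shape).
// Returns one { minute, percent } entry per minute, sorted by minute.
function goalPercentagePerMinute(tweets) {
    var counts = {};
    tweets.forEach(function (tweet) {
        if (!counts[tweet.minute]) {
            counts[tweet.minute] = { total: 0, goal: 0 };
        }
        counts[tweet.minute].total += 1;
        if (tweet.mentionsGoal) {
            counts[tweet.minute].goal += 1;
        }
    });
    return Object.keys(counts).map(Number).sort(function (a, b) {
        return a - b;
    }).map(function (minute) {
        // No of tweets mentioning goal per minute * 100 / No of tweets per minute
        return { minute: minute, percent: counts[minute].goal * 100 / counts[minute].total };
    });
}
```

Minutes without any tweets simply do not appear in the result, which avoids a division by zero.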

Once I have all tweets that mention GOAL, I want to apply an exponential smoothing algorithm, which is commonly used to remove noise from data. First, here is a line chart showing the percentage of tweets mentioning goal per minute before the algorithm is applied.

/wp-content/uploads/2014/07/percent_raw_490185.jpg

I chose the single exponential smoothing algorithm, which is available in the SAP PAL library. After applying the algorithm, the result is shown as the green line, which is smoother than the original graph.

/wp-content/uploads/2014/07/percent_combination_490186.jpg
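
For readers who have not seen single exponential smoothing before, the idea can be sketched in a few lines of plain JavaScript. This is the textbook recurrence, not a call to PAL; the function name, the choice of the first value as the starting point, and the alpha value in the example are assumptions, and PAL may initialise or parameterise the series differently.

```javascript
// Textbook single exponential smoothing:
//   s[0] = x[0]
//   s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
// alpha close to 1 follows the raw data, alpha close to 0 smooths heavily.
function singleExponentialSmoothing(series, alpha) {
    if (!(alpha > 0 && alpha <= 1)) {
        throw new RangeError("alpha must be in the range (0, 1]");
    }
    var smoothed = [];
    series.forEach(function (value, index) {
        if (index === 0) {
            smoothed.push(value);
        } else {
            smoothed.push(alpha * value + (1 - alpha) * smoothed[index - 1]);
        }
    });
    return smoothed;
}

console.log(singleExponentialSmoothing([2, 4, 8], 0.5)); // [ 2, 3, 5.5 ]
```

Each smoothed point is a weighted average of the current raw value and the previous smoothed value, which is why short spikes are dampened.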

4. SAPUI5 for presentation


Oh well, you can see that the graphs above are SAPUI5 graphs.

Result


Now let’s compare our graphs with the actual events. There were three goals during the Belgium vs USA game in the round of 16 of World Cup 2014.

/wp-content/uploads/2014/07/actual_result_490195.jpg

Translating the timeline onto our graph, the goals happened at the bold green points in the graph below. After each goal, the percentage of tweets mentioning goal rises above approximately 8%.

/wp-content/uploads/2014/07/percent_smooth_490196.jpg
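
The observation about the 8% level could be turned into a simple detector. The sketch below is one possible reading, not part of the original application: it reports the minutes at which the smoothed percentage crosses from at or below a threshold to above it. The function name, the input shape and the sample values are made up for illustration, and a threshold tuned on one match may not carry over to another.

```javascript
// points: array of { minute: number, percent: number }, ordered by minute (assumed shape).
// Returns the minutes where percent rises above the threshold
// after having been at or below it.
function detectGoalMinutes(points, threshold) {
    var minutes = [];
    var wasAbove = false;
    points.forEach(function (point) {
        var isAbove = point.percent > threshold;
        if (isAbove && !wasAbove) {
            minutes.push(point.minute);
        }
        wasAbove = isAbove;
    });
    return minutes;
}

// Made-up sample values, not data from the match.
var samplePoints = [
    { minute: 92, percent: 3 },
    { minute: 93, percent: 9 },
    { minute: 94, percent: 12 },
    { minute: 95, percent: 6 },
    { minute: 105, percent: 8.5 }
];
console.log(detectGoalMinutes(samplePoints, 8)); // [ 93, 105 ]
```

Reporting only the upward crossing keeps one goal from being counted several times while the percentage stays high.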

The first goal by Belgium in extra time definitely got people excited because a lot of people had been expecting a goal. The percentage went up again after Belgium’s second goal, but there weren’t as many tweets as before. Twitter was busy again after the first goal by USA in the second half of extra time. This definitely raised hope that USA could turn the game around. Unfortunately, USA could not score another goal and lost the game.

Improvements


1. Twitter API

I collect the data from search/tweets, a REST API resource that provides relevant tweets from a limited corpus of recent tweets. To get more complete search results, the Streaming API should be used, but it requires a more complex solution. For the purpose of this exercise, I did not want to put too much effort into getting the tweets, and as you can see from the graph above, the tweets collected from the REST API give sufficient information to indicate when a goal happens.

2. Text analysis

Improvements can be made to the analysis, especially if you are working with data in different languages, e.g. golazo means goal in Spanish, and there could be many variations in other languages.

There is also a limitation in the text analysis I chose, e.g. @Guevara_Caro tweeted “Were it not for Howard and Beasley, #USA would be losing by a couple of goals! #BEL has a great team. Let’s pick it up!! #GoUSA #BELvsUSA”. Obviously, the tweet contains the word goal, but the whole sentence does not indicate that a goal has happened. Regular expressions (which are available in R and other programming languages, but not in HANA SQL) should be used to capture goals in tweets more accurately.
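
As a hedged illustration of what such a regular expression might look like, here is a small sketch in plain JavaScript. The pattern and the function name are assumptions chosen for this example: it accepts "goal" with stretched letters (such as "GOOOOAL") and the Spanish "golazo" as standalone words, and rejects "goals" and "goalkeeper". It is a heuristic only and would still misread tweets such as "no goal yet".

```javascript
// Matches "goal" with any letter repeated (goal, gooooal, goalll) or "golazo",
// but only when the word is not preceded or followed by another letter.
var GOAL_PATTERN = /(^|[^a-z])(g+o+a+l+|golazo)(?![a-z])/i;

function looksLikeGoalShout(text) {
    return GOAL_PATTERN.test(text);
}

console.log(looksLikeGoalShout("GOOOOAL!!! #BELvsUSA"));                           // true
console.log(looksLikeGoalShout("#USA would be losing by a couple of goals!"));     // false
console.log(looksLikeGoalShout("#USA has the best goalkeeper ever!!!"));           // false
console.log(looksLikeGoalShout("Golazo de Belgica #BELvsUSA"));                    // true
```

The pattern is deliberately written without the global flag, so repeated calls to test do not depend on a previous match position.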

Conclusion


The various components available in HANA make it easier for me to obtain unstructured data, perform analysis on the data and present the analysis result using SAPUI5. In this case, goals can be identified using Twitter data. The solution is far from perfect, and the 8% threshold may not indicate a goal in other World Cup matches, but I hope to gain more analytical knowledge and explore more interesting scenarios with SAP HANA in the near future.


10 Comments


  1. Stefan Kuehnlein

    Hi Stevanic,

    I’m also happy about this example. I tried this on my own SAP HANA instance. I have one question: in the file TwitterCollector.xsjs I can’t find any information about the necessary authentication for Twitter. How do you resolve this?

    1. Stevanic Artana Post author

      Hi Stefan,

      In my example, I do not establish a connection directly from HANA to Twitter.

      I have used a Twitter client written in PHP to call the Twitter API. Thus, HANA calls a PHP page, which then calls the Twitter API.

      In TwitterCollector.xsjs, you will not find authentication data, but there is a line that specifies the name of the XSJS outbound connection, which in this case is twitter_connection.

      var dest = $.net.http.readDestination("wc2014", "twitter_connection");


      In twitter_connection.xshttpdest (the XSJS outbound connection), you can specify the URL you want to call. In my example, the URL points to a PHP page. You can find the PHP code under the PHP folder.


      Twitter authentication data needs to be specified in PHP/appOnly.php.

      // fill in your consumer key and consumer secret below
      define('CONSUMER_KEY', '<your consumer key>');
      define('CONSUMER_SECRET', '<your consumer secret>');


      Thank you for trying out my example. Let me know if you need more information.


      Regards,

      Stev



  2. Mayank Khandelwal

    Hello Stevanic,

    I am trying to make a connection between Twitter and HANA. Need your help in understanding the steps required for this.

    Looking forward to your support.

    Thanks

    Mayank Khandelwal

    1. Stefan Kuehnlein

      Hello

      I also had some problems establishing a connection to Twitter, but now it is running. There are two different ways to authorize with Twitter: OAuth or application-only authentication. For my application, application-only authentication is sufficient. I obtain the bearer token with the following code:

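      function getBearerToken() {
          // Outbound destination "wetter" in package "opitz.wetteranalyse"
          var dest = $.net.http.readDestination("opitz.wetteranalyse", "wetter");
          var client = new $.net.http.Client();
          var req = new $.web.WebRequest($.net.http.POST,
                  "/oauth2/token?grant_type=client_credentials");
          // getBased64EncodeBearerToken() is a helper that is not shown in this comment
          req.headers.set("Authorization", getBased64EncodeBearerToken());
          // The header name must not include a trailing colon
          req.headers.set("Content-Type",
                  "application/x-www-form-urlencoded;charset=UTF-8");
          client.request(req, dest);
          // The response body is JSON; the token is in its access_token field
          var access = JSON.parse(client.getResponse().body.asString());
          return access.access_token;
      }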

      see also:

      Application-only authentication | Twitter Developers

      Regards

      Stefan

