It was a Friday at work and I didn’t have much to do.
While surfing SCN, I came across a blog
from Hillary Bliss:

Sharknado Social Media Analysis with SAP HANA and Predictive Analysis


I had already planned to watch The Wolf of Wall Street that evening and had heard some good reviews of it
(it has to be good, DiCaprio's movies always are 😉 ...where the hell is his Oscar!!).

I thought I'd try to get the reviews from Twitter, following the basic outline of the Text Data Processing Blueprints.

Here are the steps and my explorations.

Data Extraction

Twitter provides an open Search API that can retrieve “popular tweets” in addition to real-time search results.

The source data consists of unstructured text in the form of tweets, which are retrieved from Twitter's REST-based Search API using a search term. You can try the API out in the Apigee Twitter API console.
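To make the request shape concrete, here is a minimal sketch of how a search URL could be assembled. It assumes the v1.1 `search/tweets.json` endpoint and its `q`, `count`, `lang`, and `result_type` parameters; the function name `build_search_url` is my own, not part of the blueprint. It only builds the URL. An actual call also needs the OAuth credentials described in the steps that follow.

```python
from urllib.parse import urlencode

# Assumed endpoint: Twitter REST API v1.1 search.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(term, result_type="popular", count=100, lang="en"):
    """Return a Search API URL for the given search term (hypothetical helper)."""
    if result_type not in ("recent", "popular", "mixed"):
        raise ValueError("result_type must be recent, popular, or mixed")
    # Dict order is preserved, so the query string is built in this order.
    params = {"q": term, "count": count, "lang": lang, "result_type": result_type}
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url("The Wolf of Wall Street"))
```

Switching `result_type` to `"recent"` asks for the newest matches instead of the popular ones.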

Capture3.PNG

As in the Blueprint:

  • Step 1: Create an account on https://dev.twitter.com
    • Because Twitter is a third-party application, these steps may change.
      • Create or log into your Twitter developer account at http://dev.twitter.com.
      • In My Applications, create a new application with a unique name and a placeholder URL
      • Open the OAuth settings to locate the Consumer Key and the Consumer Secret values, and add the values to the search.cfg file.
      • Create an access token and refresh the page.
      • Add the access token and the access token secret values to the search.cfg file.
      • Edit the search.cfg configuration file to specify the terms or hashtags that are used for the Twitter search.
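As a rough illustration of what the configuration step feeds into, the sketch below reads credentials and search terms from an INI-style text. Everything about the layout is an assumption: the section names (`oauth`, `search`), the key names, and the comma-separated `terms` value are hypothetical, and the blueprint's real search.cfg may be structured differently.

```python
import configparser

# Hypothetical layout; the blueprint's actual search.cfg may differ.
SAMPLE_CFG = """
[oauth]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET

[search]
terms = The Wolf of Wall Street, #WolfOfWallStreet
"""

REQUIRED_KEYS = ("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")


def load_search_config(text):
    """Parse the config text into a flat dict and fail early on missing keys."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    missing = [k for k in REQUIRED_KEYS if not parser.has_option("oauth", k)]
    if missing:
        raise ValueError("missing OAuth settings: " + ", ".join(missing))
    config = {k: parser.get("oauth", k) for k in REQUIRED_KEYS}
    # Split the comma-separated terms and drop empty entries.
    raw_terms = parser.get("search", "terms")
    config["terms"] = [t.strip() for t in raw_terms.split(",") if t.strip()]
    return config


cfg = load_search_config(SAMPLE_CFG)
print(cfg["terms"])
```

Checking for the four OAuth values up front gives a clearer error than a failed request later in the job.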

Capture2.PNG

Capture.PNG

Dataflow Development

  • Now that we have the source, we can design a Data Services job and dataflow to extract the tweets and analyze their sentiment.
  • Create a job in the BODS Designer and develop the dataflows for the process.

  Below is a screenshot of the dataflow implementation:

/wp-content/uploads/2014/02/4_378667.png


Twitter Search Dataflow:

  • This dataflow connects to the Twitter API, extracts the tweets, and loads them into database tables.
  • It uses two User Defined Transforms, which contain the Python code that connects to Twitter.

/wp-content/uploads/2014/02/5_378671.png

  • GET_SEARCH_TASKS: a User Defined Transform that prepares the inputs for the Twitter Search API by reading the search.cfg file.
  • Search Twitter Transform: a User Defined Transform that retrieves the tweets from the Twitter Search API.
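The blueprint's transform code is not reproduced here, but the core of the retrieval step is turning the JSON search response into flat rows that can be loaded into a table. The sketch below shows that idea on a made-up sample payload. The `statuses`, `id_str`, `created_at`, `lang`, `text`, and `user.screen_name` fields follow the v1.1 response format; the output column names and the `tweets_to_rows` helper are my own assumptions.

```python
import json

# Made-up sample data shaped like a v1.1 search response.
SAMPLE_RESPONSE = json.dumps({
    "statuses": [
        {
            "id_str": "1001",
            "created_at": "Fri Feb 07 10:00:00 +0000 2014",
            "lang": "en",
            "text": "The Wolf of Wall Street was great!",
            "user": {"screen_name": "example_user_1"},
        },
        {
            "id_str": "1002",
            "created_at": "Fri Feb 07 10:05:00 +0000 2014",
            "lang": "en",
            "text": "Three hours is too long for any movie.",
            "user": {"screen_name": "example_user_2"},
        },
    ]
})


def tweets_to_rows(payload, search_term):
    """Flatten a search response into one dict per tweet (hypothetical helper)."""
    data = json.loads(payload)
    rows = []
    for status in data.get("statuses", []):
        rows.append({
            "search_term": search_term,
            "tweet_id": status["id_str"],
            "created_at": status.get("created_at", ""),
            "screen_name": status.get("user", {}).get("screen_name", ""),
            "lang": status.get("lang", ""),
            "text": status.get("text", ""),
        })
    return rows


for row in tweets_to_rows(SAMPLE_RESPONSE, "The Wolf of Wall Street"):
    print(row["tweet_id"], row["screen_name"], row["text"])
```

Keeping the search term on every row makes it possible to load results for several terms into the same table.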

Sentiment Analysis:


  • Twitter Process Dataflow:
    • This dataflow uses the Base Entity Extraction transform from Text Data Processing to analyze the sentiment in the tweets.

/wp-content/uploads/2014/02/6_378672.png

  • Entity_Extraction_Transform:
    This transform extracts the basic entities for sentiment analysis.
    It uses the English language module and the “english-tf-voc-sentiment.fsm” rule file provided by SAP for the analysis.
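For readers without Data Services at hand, the toy sketch below gives a feel for what sentiment tagging produces. It is not the Entity Extraction transform and does not use SAP's rule file: it is a plain word-list lookup that I swapped in for illustration, and the word lists, labels, and `tag_sentiment` function are all made up. The real transform applies linguistic rules and is far more capable.

```python
import re

# Tiny made-up word lists; the SAP rule file is not involved here.
POSITIVE = {"good", "great", "loved", "brilliant"}
NEGATIVE = {"bad", "boring", "hated", "awful"}


def tag_sentiment(text):
    """Tag sentiment words in the text and derive an overall label."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [(w, "positive") for w in words if w in POSITIVE]
    hits += [(w, "negative") for w in words if w in NEGATIVE]
    # Each positive word counts +1, each negative word -1.
    score = sum(1 if polarity == "positive" else -1 for _, polarity in hits)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return {"hits": hits, "score": score, "label": label}


print(tag_sentiment("Loved it, great cast"))
```

A lookup like this misses negation and sarcasm, which is part of why a rule-based extraction engine is worth using for real analysis.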

/wp-content/uploads/2014/02/7_378677.png

Run the job and get general reviews from Twitter's public stream.

/wp-content/uploads/2014/02/8_378678.png

Though the tables could be used to build a universe and generate detailed reports, I thought I would try that later.

I had to catch up with the movie 😀


17 Comments


    1. Rishabh Awasthi Post author

      Hi Manuel,

      Proxy needs to be the current proxy that you are using.

      The URL is the address of the Twitter Search API, as in the screenshot.

      Regards,

      Rishabh

    1. Rishabh Awasthi Post author

      Hi Sharayu,

      This is an interesting case but the entity extraction transform uses the specific language packs. Will have to check for multiple language support.

      Regards,

      Rishabh

  1. Sarathy V

    Hi Rishab

    Nice Blog!!!

    I tried following the steps mentioned, but I am getting access violation issues. Could you please suggest how to overcome this?

    “SYSTEM EXCEPTION <ACCESS_VIOLATION> OCCURED”

    Regards

      1. Venkata Ramana Paidi

        Hi Rishabh,

        While I am running the job, I am getting the below error.

        5340    5836    PRINTFN    10/23/2015 11:56:02 AM    https://api.twitter.com/1.1/search/tweets.json?q=uae&count=100&lang=en&result_type=recent

        5340    5836    PRINTFN    10/23/2015 11:56:02 AM    Failed to reach the https://api.twitter.com/1.1/search/tweets.json?q=uae&count=100&lang=en&result_type=recent server due to

        5340    5836    PRINTFN    10/23/2015 11:56:02 AM    [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581).

        Could you please help me to resolve this issue.

        Thanks & Regards,

        Venkata Ramana Paidi

