Rishabh Awasthi

Reviews from Social Media Data Extraction

It was a Friday at work and I didn't have much to do.
While surfing SCN, I came across a blog from Hillary Bliss:

Sharknado Social Media Analysis with SAP HANA and Predictive Analysis


I had already planned to watch The Wolf of Wall Street that evening and had heard some good reviews of it
(it's DiCaprio, he always gets good ones 😉 ... where the hell is his Oscar!!).

I thought of trying to get the reviews from Twitter, following the basic outline of the Text Data Processing Blueprints.

Here are the steps and my explorations.

Data Extraction

Twitter provides an open Search API that can return “popular tweets” in addition to real-time search results.

The source data consists of unstructured text in the form of tweets, which are retrieved from Twitter's REST-based Search API using a search term. You can try this out in the Apigee Twitter API console:

Capture3.PNG
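
To make the search request concrete, here is a minimal sketch of how the URL for one search term could be put together. It assumes the v1.1 search endpoint and its `q`, `count`, `lang` and `result_type` parameters; the helper name `build_search_url` is made up for illustration and is not part of the Blueprint. Authentication (the OAuth keys from the steps below) is not shown, and the endpoint may have changed since.

```python
from urllib.parse import urlencode

# Search endpoint of the Twitter REST API v1.1 (assumed; may change over time).
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(term, count=100, lang="en", result_type="recent"):
    """Return the full search URL for one search term.

    result_type selects which tweets come back: "recent", "popular" or "mixed".
    """
    params = [
        ("q", term),
        ("count", count),
        ("lang", lang),
        ("result_type", result_type),
    ]
    # urlencode escapes spaces and characters such as '#' in the search term.
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("The Wolf of Wall Street", result_type="popular"))
```

Passing `result_type="popular"` is what asks for the “popular tweets” mentioned above instead of the real-time results.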

As in the Blueprint:

  • Step 1: Create an account on https://dev.twitter.com
    • Because Twitter is a third-party application, these steps may change.
      • Create or log into your Twitter developer account at http://dev.twitter.com.
      • In My Applications, create a new application with a unique name and a placeholder URL.
      • Open the OAuth settings to locate the Consumer Key and the Consumer Secret values, and add the values to the search.cfg file.
      • Create an access token and refresh the page.
      • Add the access token and the access token secret values to the search.cfg file.
      • Edit the search.cfg configuration file to specify the terms or hashtags that are used for the Twitter search.

Capture2.PNG
Capture.PNG
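
The steps above end with the keys, tokens and search terms sitting in search.cfg. As a rough sketch of how such a file could be read from Python, the example below uses an INI-style layout. That layout, the section names `oauth` and `search`, the key names and the helper `load_search_config` are all assumptions for illustration; the real Blueprint file may be organised differently.

```python
import configparser

# Hypothetical search.cfg content. The real Blueprint file may use
# different section and key names; the values here are placeholders.
SAMPLE_CFG = """
[oauth]
consumer_key = YOUR_CONSUMER_KEY
consumer_secret = YOUR_CONSUMER_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET

[search]
terms = #WolfOfWallStreet, The Wolf of Wall Street
"""


def load_search_config(text):
    """Return (credentials dict, list of search terms) from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    credentials = dict(parser["oauth"])
    # Terms are stored comma-separated; drop surrounding blanks and empties.
    raw_terms = parser["search"]["terms"].split(",")
    terms = [term.strip() for term in raw_terms if term.strip()]
    return credentials, terms


if __name__ == "__main__":
    creds, terms = load_search_config(SAMPLE_CFG)
    print(sorted(creds))
    print(terms)
```

Keeping the credentials and the search terms in one file means the job can be pointed at a different movie or hashtag without touching the dataflow.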

Dataflow Development

  • Now that we have the source, we can design a Data Services job and dataflow to extract the tweets and analyze their sentiment.
  • Create a job in the BODS Designer and develop the dataflows for the process.

  Below is a screenshot of the dataflow implementation:

/wp-content/uploads/2014/02/4_378667.png


Twitter Search Dataflow:

  • This dataflow connects to the Twitter API, extracts the tweets, and loads them into the database tables.
  • It uses two User-Defined Transforms, which contain the Python code that connects to Twitter.

/wp-content/uploads/2014/02/5_378671.png

  • GET_SEARCH_TASKS: a User-Defined Transform that prepares the inputs for the Twitter Search API by extracting information from the search.cfg file.
  • Search Twitter Transform: a User-Defined Transform that retrieves the tweets from the Twitter Search API.
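
To give an idea of what the second transform has to do with the API response before the rows reach the database tables, here is a small sketch that flattens a search response into one row per tweet. It is not the Blueprint's code. It assumes the v1.1 response shape, where the tweets arrive in a `statuses` list with `id_str`, `created_at`, `text` and a nested `user` object; the function name and the output column names are invented for illustration, and the sample payload is made up.

```python
import json


def tweets_to_rows(response_text, search_term):
    """Flatten a search response (JSON text) into a list of row dicts."""
    payload = json.loads(response_text)
    rows = []
    for status in payload.get("statuses", []):
        rows.append({
            "search_term": search_term,
            "tweet_id": status["id_str"],
            "created_at": status["created_at"],
            "screen_name": status["user"]["screen_name"],
            "text": status["text"],
        })
    return rows


if __name__ == "__main__":
    # Made-up sample payload, only to show the shape of the data.
    sample = json.dumps({
        "statuses": [
            {
                "id_str": "1",
                "created_at": "Fri Feb 07 12:00:00 +0000 2014",
                "text": "Sample tweet text",
                "user": {"screen_name": "sample_user"},
            }
        ]
    })
    for row in tweets_to_rows(sample, "sample term"):
        print(row)
```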

Sentiment Analysis:


  • Twitter Process Dataflow:
    • This dataflow uses the Base Entity Extraction transform of Text Data Processing to analyze the sentiment in the tweets.

/wp-content/uploads/2014/02/6_378672.png

  • Entity_Extraction_Transform:
    This transform extracts the basic entities for sentiment analysis.
    It uses the English language module and the “english-tf-voc-sentiment.fsm” rule file provided by SAP for the analysis.

/wp-content/uploads/2014/02/7_378677.png
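
Once the transform has written its entities to a table, a quick tally shows the overall mood of the tweets. The sketch below is an assumption-heavy illustration, not part of the Blueprint. It assumes each output row carries a tweet id, an entity type and the matched text, and that sentiment types have names containing "Positive", "Negative" or "Neutral" (for example `StrongPositiveSentiment`); check the actual type names in your output table. The function names and sample rows are invented.

```python
from collections import Counter


def polarity(entity_type):
    """Map an entity type name to a coarse polarity, or None if unrelated."""
    name = entity_type.lower()
    if "positive" in name:
        return "positive"
    if "negative" in name:
        return "negative"
    if "neutral" in name:
        return "neutral"
    return None  # e.g. topics or other non-sentiment entities


def summarize(rows):
    """Count sentiment entities per polarity over (id, type, text) rows."""
    counts = Counter()
    for _tweet_id, entity_type, _text in rows:
        label = polarity(entity_type)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    # Made-up rows, only to show the expected input shape.
    sample_rows = [
        ("1", "StrongPositiveSentiment", "brilliant"),
        ("1", "Topic", "movie"),
        ("2", "WeakNegativeSentiment", "too long"),
        ("3", "WeakPositiveSentiment", "good"),
    ]
    print(summarize(sample_rows))
```

Rows whose type is not a sentiment, such as topics, are skipped so they do not distort the counts.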

Run the job, and you get the general reviews from Twitter's public stream.

/wp-content/uploads/2014/02/8_378678.png

The tables could be used to build a universe and generate detailed reports, but I thought I would try that later.

I had to catch up with the Movie 😀


      18 Comments
      shiva sahu

      An exceptional document.

      Can you please write algo for your python code. 🙂

      Cheers,

      Shiva Sahu

      Rishabh Awasthi
      Blog Post Author

      Hi Shiva,

      See the attached code as in Blueprint.

      Regards,

      Rishabh

      Former Member

      Nice Blog Awasthi ji 🙂

      Rishabh Awasthi
      Blog Post Author

      thanks kaka 🙂

      Former Member

      Nice Article

      Rishabh Awasthi
      Blog Post Author

      Thanks Charuta 🙂

      Former Member

      Great Article!!

      Can you please help me... where can I find the proxy and the URL for the search.cfg file?

      thanks in advance.

      Regards,

      Manuel

      Rishabh Awasthi
      Blog Post Author

      Hi Manuel,

      The proxy needs to be the proxy that you are currently using.

      The URL is the address of the Twitter Search API, as in the screenshot.

      Regards,

      Rishabh

      Former Member

      Interesting case!

      We try to build a similar thing in BODS 4.0....would you maybe provide the ATL-file for this case?

      kind regards

      matthias

      Former Member

      Great Awasthiji.. 🙂

      Mathan Msd

      Nice blog Awasthi..

      Former Member

      Hi Rishab,

      Nice Blog!!

      Is there a way to process a single text which has multiple languages in it, using the Entity Extraction transform?

      Regards,

      Sharayu

      Rishabh Awasthi
      Blog Post Author

      Hi Sharayu,

      This is an interesting case, but the Entity Extraction transform uses specific language packs. I will have to check for multiple-language support.

      Regards,

      Rishabh

      Former Member

      Hi Rishab

      Nice Blog!!!

      I tried following the steps mentioned, but I am getting access violation issues. Could you please suggest how to overcome this?

      "SYSTEM EXCEPTION <ACCESS_VIOLATION> OCCURED"

      Regards

      Rishabh Awasthi
      Blog Post Author

      Hi Sarathy,

      Could you please elaborate where you are facing this issue?

      Regards,

      Rishabh

      Venkata Ramana Paidi

      Hi Rishabh,

      While I am running the job, I am getting the below error.

      5340    5836    PRINTFN    10/23/2015 11:56:02 AM    https://api.twitter.com/1.1/search/tweets.json?q=uae&count=100&lang=en&result_type=recent

      5340    5836    PRINTFN    10/23/2015 11:56:02 AM    Failed to reach the https://api.twitter.com/1.1/search/tweets.json?q=uae&count=100&lang=en&result_type=recent server due to

      5340    5836    PRINTFN    10/23/2015 11:56:02 AM    [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581).

      Could you please help me to resolve this issue.

      Thanks & Regards,

      Venkata Ramana Paidi

      Former Member

      Hi Rishab,

      I am facing an issue, which is described at the link below:

      Error while executing TDP blueprint for twitter

      Thanks,

      Swapnil

      Luis Gonzalez

      Where can I find the blueprints for TDP? I only see the DQM blueprints