
Ah, Twitter, that blissful paradise for bad grammar and teen fanatics. In this video by the SAP HANA Academy, Tahir Hussain Babar (aka Bob) shows us how to replicate Twitter into SAP HANA.

Picture1.png

What you are going to do

In this video you are going to take the view of the Twitter adapter, or data source, made in the last video and replicate it into SAP HANA using the Web IDE. You are going to create a replication task that will filter the objects that are coming into your SAP HANA system.

How you are going to do it

The Twitter adapter takes a few minutes to connect to your tweets. You take the various keys from the Twitter API and put them into a remote data source, which creates a virtual view in SAP HANA that connects to your Tweets via dev.twitter.com.

Picture2.png

Picture3.png

Look out for

The problem is that this is just a view: the data is not stored in HANA, and the view returns all Tweets.

Picture4.png

Historic Moment

Bob describes making his first tweet using this test account. He also demonstrates that, after refreshing, this tweet (called, imaginatively enough, “This is a test”) appears at the top with all the relevant columns: the source, whether it was replied to, whether it was a re-Tweet, etc.

Picture5.png

Filtering Tweet data in SAP HANA

The focus of this video is the creation of a replication task which will filter the data to be stored. This simple example, which filters for the Tweets that Bob has made, can be easily customised according to your requirements.

Step 1 – Launch the SAP Web IDE

Log in as the developer user using the link below; you may have a different instance number. Remember that when you created the user you should have allocated it the developer role. You then need to create a file with a specific extension.

Bob then describes how to make a new file in the package for DEV01.

Picture6.png

The file extension must be .hdbreptask, as shown below.

Picture7.png

Step 2 – Using the UI

Bob then outlines the key features of the UI that has been generated. In this example Bob selects DEV01 as both the Target Schema and the Virtual Table Schema. He wants to capture all the updates, so he deselects the box for Initial load only. He also wants to clearly identify the table as virtual, so he prefixes it with VT_.

Picture8.png

At this point you can Add Objects. Bob then adds the table called Status from the Remote Data Source, which, as you may recall from the previous video, contains all your tweets.

Step 3 – Filtering

Once the table has been added, you can see the actual columns in that table. Bob then demonstrates filtering by ScreenName and emphasises that this is the only way you can accomplish this task.
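
To make the effect of such a filter concrete, here is a minimal Python sketch of the row selection. It is only a model and not the expression typed in the video; the screen names, the sample rows and the helper `filter_by_screen_name` are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    screen_name: str  # models the ScreenName column of the Status table
    text: str


def filter_by_screen_name(rows, screen_name):
    """Return only the rows whose ScreenName equals the given value."""
    return [row for row in rows if row.screen_name == screen_name]


# Hypothetical rows standing in for the virtual table (every tweet on the timeline).
virtual_table = [
    Tweet("example_user", "This is a test"),
    Tweet("someone_else", "Unrelated tweet"),
    Tweet("example_user", "Another test"),
]

# Only the rows that pass the filter would end up in the target table.
target_table = filter_by_screen_name(virtual_table, "example_user")
print([tweet.text for tweet in target_table])
```

The virtual table keeps returning everything, while the target table holds only the rows that match the filter.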

Picture9.png

Bob uses the filter tab to type in the command below, then clicks on Save and Activate.

Step 4 – Invoking the Procedure


Picture10.png

Now that the task has been saved and activated, you need to invoke the procedure, which can be accessed in SAP HANA Studio. However, Bob demonstrates another method, outlined below, to access the Catalog and invoke the procedure from there.

Picture11.png

Picture12.png

All you need to do now is execute the call statement, as shown below.

Picture13.png

You will be able to see if the procedure has been successfully completed in the dark grey area under the call statement.

Picture14.png

Step 5 – Demonstrating the Task is Completed

Bob returns to his list of tables and highlights the two objects that have been created during the replication task. The source table contains VT_ as part of the name, whilst the output table includes the word Status.

Picture15.png

Bob then shows how the source table still contains all Tweets but the output table only contains the filtered Tweets which Bob created. He does this in both cases by right-clicking and selecting Generate Select, then Execute. In typically thorough style, Bob tests this by adding another Tweet, which of course works. He goes on to explain how the SQL commands actually work.

Picture16.png

The first line creates the procedure using SQL. The rest of the code establishes what you are going to do: it creates a replication task, but while this task is being set up you don’t want to miss any incoming data. The queue ensures that this does not happen.
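
The idea behind the queue can be sketched in a few lines of Python. This is a toy model and not the generated SQL from the video: it assumes that changes arriving during the initial load are held in a queue and applied in arrival order once the load has finished. The class `ReplicationSketch`, its methods and the sample rows are hypothetical.

```python
from collections import deque


class ReplicationSketch:
    """Toy model: queue changes during the initial load, apply them afterwards."""

    def __init__(self, row_filter):
        self.row_filter = row_filter  # e.g. a ScreenName filter
        self.target = []              # stands in for the target table
        self.queue = deque()          # holds rows that arrive during the load
        self.loading = False

    def start_initial_load(self):
        self.loading = True

    def on_incoming(self, row):
        # During the initial load nothing is dropped; it is queued instead.
        if self.loading:
            self.queue.append(row)
        elif self.row_filter(row):
            self.target.append(row)

    def finish_initial_load(self, snapshot):
        # Copy the filtered snapshot first, then drain the queue in arrival order.
        self.target.extend(row for row in snapshot if self.row_filter(row))
        while self.queue:
            row = self.queue.popleft()
            if self.row_filter(row):
                self.target.append(row)
        self.loading = False


task = ReplicationSketch(lambda row: row["ScreenName"] == "example_user")
task.start_initial_load()
task.on_incoming({"ScreenName": "example_user", "Text": "arrived during load"})
task.finish_initial_load([
    {"ScreenName": "example_user", "Text": "This is a test"},
    {"ScreenName": "someone_else", "Text": "Unrelated tweet"},
])
task.on_incoming({"ScreenName": "example_user", "Text": "arrived after load"})
texts = [row["Text"] for row in task.target]
print(texts)
```

The tweet that arrives while the load is still running is not lost; it simply lands in the target table after the snapshot rows.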

As always, Bob concludes by outlining what’s coming next. So there it was, folks: how you can replicate Twitter into SAP HANA, courtesy of the SAP HANA Academy. #schooled. Did I mention the veritable gods of the modern age, the ‘twitter famous’? Funnily enough, a website dedicated entirely to talking to yourself whilst thousands can hear is an excellent form of marketing and connecting people. Twitter can be held responsible for many things: the creation of a whole subculture and slang and also improving your business.


12 Comments


  1. Wenjun Zhou

    Tahir Hussain Babar

    Great demo! But I still have a question about the Twitter adapter. Can we search tweets on hashtags? From page 16 of this slide deck, it seems we can access tweets based on hashtags, users, etc.

    Capture.PNG

    However, after adding the Twitter remote source in SAP HANA Academy – Smart Data Integration : Twitter Replication Pt 2 of 3 [SPS09] – YouTube, there are only two tables, Rate_Limit_Status and Status. And according to the demo, the content in Status should be the tweets on the homepage of the testing user, which means his tweets plus following tweets.

    Capture.PNG

    So now I’m wondering if we can configure something to search tweets based on one specific hashtag? Something like the way I searched tweets based on movie hashtags in “Real-time sentiment rating of movies on SAP HANA (part 6) – wrap-up and looking ahead”.

    Best regards,

    Wenjun

    1. Tahir Hussain Babar

      Wenjun,

      Yes. You could utilize the Fuzzy Search feature in HANA to filter out specific hashtags, as long as the Twitter User the Twitter Adapter is using is following that hashtag in the first place. The advantage with fuzzy search is that you can look for similar hashtags as well, as a score is allocated to the match. Have tested this, and it works fine.

      If you have filtered in the .hdbreptask using a LIKE statement, you would have to manually change this.

      In the SAP HANA Academy, we also show how you can dynamically change what you are searching for using a different technology called node.js. Check out the Live3 project here: Live3 – Advanced Real-Time Social Media Analytics – YouTube …

      Cheers

      Bob

        1. Tahir Hussain Babar

          Wenjun,

          I don’t think we can, but having worked with the public Twitter API before, you need to be careful, as not all tweets are returned when you perform a query with GET search/tweets. Also, if you request too much data (I think more than 2000 tweets an hour), Twitter will suspend your connection (we experienced this before when teaching a class in Vegas using node.js) 🙁 Therefore, it is better (although probably less dynamic) to get a user to follow a hashtag, connect using SDI, and then filter using SAP HANA fuzzy search, and then you would have no limit 😉 To summarize, there are pros and cons to using a public, freely available API! It depends on what you want to do.

          Cheers

          Bob

  2. Jean-Paul de Vooght

    Thank you for this series! I have a question about the endpoints configured in the adapter. We can see /application/rate_limit_status, /search/tweets, /statuses/home_timeline and /statuses/user_timeline from Rate_Limit_Status.

    I still have two doubts here.

    What I fail to understand is which endpoint is used by the adapter to fill data in Status…

    Then, how would I proceed to perform a Twitter search setting my own defined query e.g. q = #cop21 as q is a mandatory parameter for the search endpoint on the twitter API.

    TIA

          1. Marian Canciu

            The view of the twitter adapter looks good.

            The issue is that replicating and filtering is not working as it should: data is not replicated.

            I’m missing the option to add attachments in order to add the indexserver logs.

            Any other options to upload the logs?

            Twitter_Rep.png

