Twitter Paradise: Smart Data Integration : Twitter Replication Pt 3 of 3 [SPS09]
Ah Twitter, that blissful paradise for bad grammar and teen fanatics. In this video by the SAP HANA Academy, Tahir Hussain Babar aka Bob shows us how to replicate Twitter into SAP HANA.
What you are going to do
In this video you take the view of the Twitter adapter, or data source, created in the last video and replicate it into SAP HANA using the Web IDE. You create a replication task that filters the objects coming into your SAP HANA system.
How you are going to do it
The Twitter adapter takes a few minutes to connect to your tweets. You take the various keys from the Twitter API and put them into a remote data source, which creates a virtual view in SAP HANA connecting to your tweets via dev.twitter.com.
Look out for
The problem is that this is just a view: the data is not stored in HANA, and it consists of all tweets.
Historic Moment
Bob describes making his first tweet with this test account. He also demonstrates that, after refreshing, this tweet, imaginatively called “This is a test”, appears at the top with all the relevant columns covering the source, whether it was replied to, whether it was a retweet, and so on.
Filtering Tweet data in SAP HANA
The focus of this video is the creation of a replication task that filters the data to be stored. This simple example, which filters for the tweets that Bob has made, can easily be customised to your requirements.
Step 1 – Launch the SAP Web IDE
Log in as the developer user using the link below (you may have a different instance number). Remember that when you created the user, you should have allocated it the developer role. You then need to create a file with a specific extension.
Bob then describes how to make a new file in the package for DEV01.
The file extension must be .hdbreptask.
Step 2 – Using the UI
Bob then outlines the key features of the UI that has been generated. In this example Bob selects DEV01 as both the Target Schema and the Virtual Table Schema. He wants to capture all the updates, so he deselects the box for Initial load only. He also wants to clearly identify the table as virtual, so he prefixes it with VT_.
At this point you can add objects. Bob adds the table from the remote data source called Status, which, as you may recall from the previous video, contains all your tweets.
Step 3 – Filtering
Once the table has been added you can see the actual columns in that table. Bob then demonstrates filtering by ScreenName and emphasises that this is the only way you can accomplish this task.
Bob uses the filter tab to type in the command shown below, then clicks Save and Activate.
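The exact filter string appears on screen in the video; as a hedged illustration only (the account name sapbob is a placeholder, not the one Bob uses), a filter on the ScreenName column would look something like this:

```sql
-- Hypothetical filter expression entered on the Filter tab of the
-- .hdbreptask editor: replicate only tweets posted by one account.
-- 'sapbob' is a placeholder; use your own test account's screen name.
"ScreenName" = 'sapbob'
```

The same pattern extends to any column in the Status table, for example filtering on whether a tweet is a retweet.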
Step 4 – Invoking the Procedure
Now that the task has been saved and activated, you need to invoke the procedure, which can be accessed in SAP HANA Studio. However, Bob demonstrates another method, outlined below, to access the Catalog and invoke the procedure from there.
All you need to do now is execute the call statement for the procedure, as shown below.
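As a hedged sketch (the schema, package path and task name here are illustrative, not the ones in the video), activating the .hdbreptask generates a stored procedure, typically suffixed START_REPLICATION, and the call looks along these lines:

```sql
-- Hedged sketch: "DEV01" and the package/task name are placeholders.
-- Calling the generated procedure runs the initial load and, unless
-- "Initial load only" was ticked, starts real-time replication.
CALL "DEV01"."dev01.replication::RT_Twitter.START_REPLICATION";
```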
You can see whether the procedure completed successfully in the dark grey area under the call statement.
Step 5 – Demonstrating the Task is Completed
Bob returns to his list of tables and highlights the two objects that have been created during the replication task. The source table's name contains VT_, whilst the output table's name includes the word Status.
Bob then shows how the source table still contains all tweets, while the output table contains only the filtered tweets that Bob created. He does this in both cases by right-clicking and selecting Generate Select, then Execute. In typically thorough style, Bob tests this by adding another tweet, which of course works. He goes on to explain how the SQL commands actually work.
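Generate Select simply produces a SELECT statement for the chosen table. Assuming the object names from the demo (a virtual table prefixed VT_ and a target table whose name includes Status; the schema is illustrative), the comparison amounts to running:

```sql
-- All tweets, read live from Twitter through the virtual table
SELECT * FROM "DEV01"."VT_Status";

-- Only the filtered tweets, physically replicated into HANA
SELECT * FROM "DEV01"."Status";
```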
The first line creates the procedure using SQL. The rest of the code establishes what you are going to do and creates the replication task; while the task is being created you do not want to miss any incoming data, and the queue ensures that this does not happen.
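Under the hood, SDI real-time replication is built on remote subscriptions. The generated code is more involved than this, but a hedged sketch of the queue-then-distribute pattern it follows (all object names are placeholders) looks like:

```sql
-- 1. Create a subscription on the virtual table, targeting the local table
CREATE REMOTE SUBSCRIPTION "DEV01"."SUB_Status"
  ON "DEV01"."VT_Status"
  TARGET TABLE "DEV01"."Status";

-- 2. QUEUE: start capturing changes before the initial load runs,
--    so no tweets arriving during the load are lost
ALTER REMOTE SUBSCRIPTION "DEV01"."SUB_Status" QUEUE;

-- 3. Initial load of the existing rows
INSERT INTO "DEV01"."Status" SELECT * FROM "DEV01"."VT_Status";

-- 4. DISTRIBUTE: apply the queued changes and switch to real time
ALTER REMOTE SUBSCRIPTION "DEV01"."SUB_Status" DISTRIBUTE;
```

The QUEUE/DISTRIBUTE pair is what closes the gap between the initial load and real-time capture that the paragraph above describes.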
As always, Bob concludes by outlining what's coming next. So there it is, folks: how you can replicate Twitter into SAP HANA. #schooled. Did I mention the veritable gods of the modern age, the 'Twitter famous'? Funnily enough, a website dedicated entirely to talking to yourself whilst thousands can hear is an excellent form of marketing and of connecting people. Twitter can be held responsible for many things: the creation of a whole subculture and slang, and also improving your business.
Tahir Hussain Babar
Great demo! But I still have a question about the Twitter adapter. Can we search tweets on hashtags? From page 16 of this slide deck, it seems we can access tweets based on hashtags, users, etc.
However, after adding the Twitter remote source in SAP HANA Academy - Smart Data Integration : Twitter Replication Pt 2 of 3 [SPS09] - YouTube, there are only two tables, Rate_Limit_Status and Status. And according to the demo, the content in Status should be the tweets on the homepage of the testing user, which means his tweets plus following tweets.
So now I'm wondering if we can configure something to search tweets based on one specific hashtag? Something like I searched tweets based on movie hashtag Real-time sentiment rating of movies on SAP HANA (part 6) - wrap-up and looking ahead
Best regards,
Wenjun
Wenjun,
Yes. You could utilize the Fuzzy Search feature in HANA to filter out specific hashtags, as long as the Twitter User the Twitter Adapter is using is following that hashtag in the first place. The advantage with fuzzy search is that you can look for similar hashtags as well, as a score is allocated to the match. Have tested this, and it works fine.
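As a hedged example of that fuzzy-search approach (schema, table, column and hashtag are placeholders, and it assumes a full-text index exists on the text column), filtering the replicated tweets for a hashtag and scoring near matches could look like:

```sql
-- Fuzzy search over the tweet text of the replicated table.
-- FUZZY(0.8) also matches slight variations of the hashtag;
-- SCORE() exposes how close each match is.
SELECT SCORE() AS match_score, "Text"
FROM   "DEV01"."Status"
WHERE  CONTAINS("Text", '#saphana', FUZZY(0.8))
ORDER  BY match_score DESC;
```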
If you have filtered in the .hdbreptask using a LIKE statement, you would have to manually change this.
In the SAP HANA Academy, we also show how you can dynamically change what you are searching for using a different technology called Node.js. Check out the Live3 project here: Live3 - Advanced Real-Time Social Media Analytics - YouTube ...
Cheers
Bob
Hi Bob,
Thank you very much for your reply. Just had a look at Live3 - Advanced Real-Time Social Media Analytics - YouTube, very useful material. Will learn it later. You're right, we can use fuzzy search to filter the tweets we crawled. But what I meant is: can we directly crawl tweets with a specific hashtag, like using GET search/tweets | Twitter Developers, with SDI?
Best regards,
Wenjun
Wenjun,
I don't think we can, but having worked with the public Twitter API before, you need to be careful, as not all tweets are returned when you perform a query with GET search/tweets. Also, if you request too much data (I think more than 2000 tweets an hour), Twitter will suspend your connection (we experienced this before when teaching a class in Vegas using node.js) 🙁 Therefore, it is better (although probably less dynamic) to get a user to follow a hashtag, connect using SDI, and then filter using SAP HANA fuzzy search; then you would have no limit 😉 To summarize, there are pros and cons to using a public, freely available API! It depends on what you want to do.
Cheers
Bob
Hi Bob,
Got it. Thanks for your explanation.
Best regards,
Wenjun
Thank you for this series! I have a question about the endpoints configured in the adapter. We can see /application/rate_limit_status, /search/tweets, /statuses/home_timeline and /statuses/user_timeline from Rate_Limit_Status.
I still have two doubts here.
What I fail to understand is which endpoint the adapter uses to fill the data in Status...
Then, how would I proceed to perform a Twitter search with my own defined query, e.g. q=#cop21, since q is a mandatory parameter for the search endpoint in the Twitter API?
TIA
The initial load is working but the delta load does not work. What should be done in this case?
Marian. It's hard to say without seeing any error/log messages 🙁
Please let me know which logs you want to check and I will more than gladly provide them.
Marian. Please send the IndexServer logs and a screenshot of the error message.
The view of the twitter adapter looks good.
The issue is that replicating and filtering is not working as it should: data is not replicated.
I'm missing the option to add attachments in order to attach the indexserver logs. Are there any other options for uploading the logs?
In the meantime I've opened an Incident and currently I'm waiting for Feedback from SAP.
Hi, I have the same issue: the initial load happens but real-time replication does not. Could anyone please help with this issue?