Integrating Twitter with SAP HANA for Text Analysis
Hi, I want to share what I learned while streaming tweets with Java and inserting them into SAP HANA for text and token analysis. The idea is simple: use SAP HANA's built-in text analysis on real-time data. Twitter is a good source of real-time text for understanding how that analysis works. Below is a short implementation. Several people and organizations have already done this, so I am only adding my own experience, learnings and challenges. Before you start implementing text analysis, keep the following in mind.
- Which language will you write the code in? I used Java, but Python works as well.
- How will you get real-time data? You need access to an API that provides it, and the Twitter APIs do exactly that. For example, you can analyze political tweets, sports tweets, technology tweets or geo-tagged tweets.
I chose to analyze tweets related to SAP HANA (#SAPHANA, #IoT, #SAP). These hashtags are used later to fetch tweets through the Twitter API.
- A basic understanding of SAP HANA's text analysis capability. For background, see https://blogs.sap.com/2014/04/08/text-analysis-with-sap-hana/ and https://open.sap.com/courses/hsta1.
- Eclipse IDE. If it is not installed, download the latest version from http://www.eclipse.org/downloads/eclipse-packages/ and add the SAP HANA Tools from https://tools.hana.ondemand.com/neon/.
- A Twitter developer account, created at https://dev.twitter.com/.
You will be taken to Twitter's developer page. Click Create New App and fill in the required information.
Create your Twitter Application
Next, note down all the security tokens, because you need them to call the Twitter APIs. Below is a snapshot of mine.
Now click Create Access Token.
Your access token is generated.
Download the latest version of Twitter4J, the Java library for the Twitter API, to use in your project:
http://twitter4j.org/en/index.html
Below is a snapshot of the latest Twitter4J release:
The Twitter4J libraries are used later.
Install the SAP HANA client if it is not already installed. Get it from the SAP Service Marketplace; it contains the JDBC library needed to access HANA from Java.
Go to Service Marketplace -> Software Downloads -> Installation and Upgrades -> Browse Our Download Catalog -> SAP In-Memory (SAP HANA) -> SAP HANA Platform and download the HANA client.
Below is a snapshot of the HDB client. The important thing to check is that it includes the JDBC driver.
Install the HDB client on your machine (check whether you need the 32-bit or 64-bit version before downloading).
Download the Twitter-analysis app here.
Once the above is done, open the Eclipse IDE, switch to the Java perspective, right-click in the Package Explorer and choose Import.
Click Finish. Your project is imported into the Package Explorer.
Switch to the SAP HANA Development perspective to create the table that stores the tweets. Execute the following commands in the SAP HANA SQL Console:
SET SCHEMA "<YOUR_SCHEMA>";
CREATE COLUMN TABLE "TWEETS" (
"ID" BIGINT NOT NULL, -- tweet IDs are 64-bit numbers and overflow INTEGER
"USER_NAME" NVARCHAR(100),
"CREATED_AT" DATE,
"TEXT" NVARCHAR(280), -- tweets can be up to 280 characters long
"HASH_TAGS" NVARCHAR(100),
PRIMARY KEY ("ID"));
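The post does not show how the HASH_TAGS column gets filled. Here is a minimal sketch under one assumption: the value is derived from the tweet text with a simple regular expression. Twitter4J also exposes parsed hashtags through Status.getHashtagEntities(), which is probably the more faithful source. The sketch keeps each tag once, in order, and stops adding tags before the NVARCHAR(100) limit is exceeded, so the insert does not fail on tweets with long hashtag lists. The class name HashtagColumnSketch is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds the HASH_TAGS value from raw tweet text.
// Assumption: a hashtag is '#' followed by letters, digits or '_'.
final class HashtagColumnSketch {
    // UNICODE_CHARACTER_CLASS lets \w also match non-ASCII letters.
    private static final Pattern HASHTAG =
            Pattern.compile("#(\\w+)", Pattern.UNICODE_CHARACTER_CLASS);

    // Matches the NVARCHAR(100) size of HASH_TAGS in the table definition.
    static final int MAX_LENGTH = 100;

    private HashtagColumnSketch() {}

    static String hashtagsFor(String tweetText) {
        // LinkedHashSet drops duplicate tags but keeps first-seen order.
        Set<String> tags = new LinkedHashSet<>();
        Matcher m = HASHTAG.matcher(tweetText);
        while (m.find()) {
            tags.add("#" + m.group(1));
        }
        StringBuilder joined = new StringBuilder();
        for (String tag : tags) {
            int extra = (joined.length() == 0 ? 0 : 1) + tag.length();
            // Stop at the first tag that would overflow the column.
            if (joined.length() + extra > MAX_LENGTH) {
                break;
            }
            if (joined.length() > 0) {
                joined.append(',');
            }
            joined.append(tag);
        }
        return joined.toString();
    }
}
```

For example, hashtagsFor("Loving #SAPHANA and #IoT, more #SAPHANA") gives "#SAPHANA,#IoT".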
After creating the table in HANA, open the configuration folder and change the settings for HANA and Twitter connectivity. Open the Java configuration file and make the following changes to connect to the HANA server:
1- If you are behind a proxy, set the proxy variable to true and enter the proxy details.
2- The HANA database host, port, user, schema and password.
3- The Twitter tokens received above, including the consumer keys and secret keys.
4- The search term you want to fetch from Twitter, such as #SAP or #SAPHANA.
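The post does not show the configuration file itself. Here is a minimal sketch that assumes a plain Java properties file. All key names (hana.host, twitter.consumerKey, proxy.enabled and so on) are hypothetical and are not necessarily the ones the downloadable app uses. It fails fast when any value from items 2 to 4 is missing, and it turns a comma-separated search term into a single query joined with Twitter's OR search operator. With Twitter4J, that string could then be passed to new Query(...).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

// Hypothetical configuration holder; key names are illustrative only.
final class AppConfigSketch {
    static final List<String> REQUIRED_KEYS = List.of(
            "hana.host", "hana.port", "hana.user", "hana.password", "hana.schema",
            "twitter.consumerKey", "twitter.consumerSecret",
            "twitter.accessToken", "twitter.accessTokenSecret",
            "twitter.searchTerm");

    private final Properties props;

    private AppConfigSketch(Properties props) {
        this.props = props;
    }

    // Reads key=value pairs and rejects the file if a required key is missing or blank.
    static AppConfigSketch load(Reader source) {
        Properties p = new Properties();
        try {
            p.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = REQUIRED_KEYS.stream()
                .filter(key -> p.getProperty(key, "").isBlank())
                .collect(Collectors.toList());
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing configuration keys: " + missing);
        }
        return new AppConfigSketch(p);
    }

    String get(String key) {
        return props.getProperty(key);
    }

    // Item 1: the proxy is used only when proxy.enabled=true (default false).
    boolean useProxy() {
        return Boolean.parseBoolean(props.getProperty("proxy.enabled", "false"));
    }

    // Item 4: turns "#SAPHANA, IoT, #SAP" into "#SAPHANA OR #IoT OR #SAP".
    String searchQuery() {
        return Arrays.stream(props.getProperty("twitter.searchTerm").split(","))
                .map(String::trim)
                .filter(term -> !term.isEmpty())
                .map(term -> term.startsWith("#") ? term : "#" + term)
                .collect(Collectors.joining(" OR "));
    }
}
```

To read a real file, pass a java.io.FileReader to load.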
After updating these details, open TwitterConnection.java and run it:
Test Connection to Twitter
Test Connection to SAP HANA
Open HDBConnection.java and run it:
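The post does not show the contents of HDBConnection.java. As a rough sketch of what such a connection test can look like, the hypothetical class below builds a HANA JDBC URL of the form jdbc:sap://host:port/ and runs a trivial query against HANA's DUMMY table. It assumes ngdbc.jar from the HANA client is on the classpath. The SQL port is typically 3<instance>15 (for example 30015 for instance 00), but check your own system.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical connection check; host, port and credentials are placeholders.
final class HanaConnectionCheckSketch {
    private HanaConnectionCheckSketch() {}

    // Builds a URL of the form jdbc:sap://host:port/
    static String jdbcUrl(String host, int port) {
        if (host == null || host.isBlank()) {
            throw new IllegalArgumentException("host is required");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return "jdbc:sap://" + host + ":" + port + "/";
    }

    // Returns true if we can log in and read one row from DUMMY.
    static boolean canConnect(String host, int port, String user, String password) {
        try (Connection con = DriverManager.getConnection(jdbcUrl(host, port), user, password);
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1 FROM DUMMY")) {
            return rs.next();
        } catch (SQLException e) {
            System.err.println("HANA connection failed: " + e.getMessage());
            return false;
        }
    }
}
```

For example, canConnect("myhanahost", 30015, "MYUSER", "secret") returns true when the login and the query succeed. Older setups may also need Class.forName("com.sap.db.jdbc.Driver") before connecting.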
Before running TwitterSearch.java, configure the Twitter4J libraries on the build path. Otherwise the application will not run and you will see errors such as "the source of this class is not found". Here is how to add them.
Right-click the project.
Click Configure Build Path -> Java Build Path -> Add External JARs, go to the libraries folder of Twitter4J and select all JARs.
Make sure all JARs are available in the libraries folder.
Click Apply to make all the classes available to your application. All the JARs then appear under Referenced Libraries.
TweetDAO.java inserts the tweet data into the HANA system: the SQL statement is prepared first and then executed.
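TweetDAO.java itself is not shown, so here is a minimal sketch of the same prepare-then-execute approach. It is not the author's code. The Tweet record and the class name are hypothetical. The sketch also assumes the ID column is BIGINT, because tweet IDs are 64-bit numbers that overflow INTEGER. The ? placeholders keep quotes or other special characters in tweet text from breaking the SQL, and batching sends many rows to HANA in one round trip.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Date;
import java.util.List;

// Hypothetical stand-in for TweetDAO: batch-inserts rows into TWEETS.
final class TweetDaoSketch {
    record Tweet(long id, String userName, Date createdAt, String text, String hashTags) {}

    static final String INSERT_SQL =
            "INSERT INTO \"TWEETS\" (\"ID\", \"USER_NAME\", \"CREATED_AT\", \"TEXT\", \"HASH_TAGS\") "
            + "VALUES (?, ?, ?, ?, ?)";

    private TweetDaoSketch() {}

    // Inserts all tweets in one transaction; returns the per-row update counts.
    static int[] insertAll(Connection con, List<Tweet> tweets) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(INSERT_SQL)) {
            for (Tweet t : tweets) {
                ps.setLong(1, t.id());      // 64-bit tweet ID, hence setLong
                ps.setString(2, t.userName());
                // CREATED_AT is a DATE column, so only the day is stored.
                ps.setDate(3, new java.sql.Date(t.createdAt().getTime()));
                ps.setString(4, t.text());
                ps.setString(5, t.hashTags());
                ps.addBatch();
            }
            int[] counts = ps.executeBatch();
            con.commit();
            return counts;
        } catch (SQLException e) {
            con.rollback();                 // undo the partial batch
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }
}
```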
With all the configuration and code in place, it is time to call the Twitter API, fetch the tweets and insert them into the HANA system. Run TwitterSearch.java.
Go to the HANA system and run a SELECT on the TWEETS table.
Now, to use the text analysis capabilities of SAP HANA, create a full-text index on the TWEETS table with the following syntax:
CREATE FULLTEXT INDEX "TWEETS_FTI" ON "TWEETS" ("TEXT")
TEXT ANALYSIS ON CONFIGURATION 'EXTRACTION_CORE';
Executing this command creates a full-text index on the table and runs text analysis on its data. It also creates a $TA_TWEETS_FTI table, which contains the token information for the tweets.
Below is the structure of the $TA_TWEETS_FTI table:
Now preview the data in $TA_TWEETS_FTI to get a better understanding of the text analysis performed by SAP HANA.
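To go beyond the preview, you can aggregate $TA_TWEETS_FTI directly. The sketch below (hypothetical class name) counts occurrences per TA_TOKEN and TA_TYPE and prints the most frequent ones over a JDBC connection, such as the one HDBConnection.java opens. It assumes the connection's current schema contains both TWEETS and $TA_TWEETS_FTI.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical report over the text analysis table.
final class TokenFrequencySketch {
    // Counts how often each token/type pair occurs across all analysed tweets.
    static final String TOP_TOKENS_SQL =
            "SELECT \"TA_TOKEN\", \"TA_TYPE\", COUNT(*) AS \"OCCURRENCES\" "
            + "FROM \"$TA_TWEETS_FTI\" "
            + "GROUP BY \"TA_TOKEN\", \"TA_TYPE\" "
            + "ORDER BY COUNT(*) DESC";

    private TokenFrequencySketch() {}

    // Prints the `limit` most frequent token/type pairs.
    static void printTopTokens(Connection con, int limit) throws SQLException {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        try (PreparedStatement ps = con.prepareStatement(TOP_TOKENS_SQL)) {
            ps.setMaxRows(limit); // the ResultSet holds at most `limit` rows
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.printf("%-30s %-30s %d%n",
                            rs.getString(1), rs.getString(2), rs.getLong(3));
                }
            }
        }
    }
}
```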
Here is the analysis done by SAP HANA text analysis:
In the image above, the search term #SAPHANA is highlighted and has the highest count in the table. You can now build a data model on the $TA_TWEETS_FTI table and apply different WHERE clauses for analysis, for example tweets that mention both SAP HANA and IoT, or SAP HANA and Cloud.
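As an example of such a WHERE clause, the sketch below (hypothetical class and method names) returns the tweets for which text analysis produced both of two given tokens, such as SAP HANA and IoT. It relies on $TA_TWEETS_FTI carrying the key column ID of TWEETS. Check the TA_TOKEN values in the preview first to see exactly how your hashtags were tokenized (for example, with or without the leading #).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical co-occurrence query over TWEETS and its text analysis table.
final class CoOccurrenceSketch {
    // Tweets whose analysis produced BOTH tokens, compared case-insensitively.
    static final String BOTH_TOKENS_SQL =
            "SELECT t.\"ID\", t.\"TEXT\" FROM \"TWEETS\" t "
            + "WHERE t.\"ID\" IN (SELECT \"ID\" FROM \"$TA_TWEETS_FTI\" WHERE UPPER(\"TA_TOKEN\") = UPPER(?)) "
            + "AND t.\"ID\" IN (SELECT \"ID\" FROM \"$TA_TWEETS_FTI\" WHERE UPPER(\"TA_TOKEN\") = UPPER(?))";

    private CoOccurrenceSketch() {}

    // Returns "id: text" lines for tweets that contain both tokens.
    static List<String> tweetsMentioningBoth(Connection con, String tokenA, String tokenB)
            throws SQLException {
        List<String> results = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(BOTH_TOKENS_SQL)) {
            ps.setString(1, tokenA);
            ps.setString(2, tokenB);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    results.add(rs.getLong(1) + ": " + rs.getString(2));
                }
            }
        }
        return results;
    }
}
```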
Queries/Questions are most welcome.
Thanks.
Really great article for getting an idea of text analysis with real-time data.
Very elaborative and nice blog!
Thanks Shivam, it's a nice article.
Can you please advise what type of account we should have on Twitter? I tried to create an account on the dev site and it asks which API we need. Is it the Search API we should request?
Hi Rubane,
When I created the account it did not ask for any API, but yes, you can request the Search API if you are going to extract Twitter data based on some token.
Thanks,
Shivam
Hi Shivam,
Somehow I managed to create an app in Twitter. I used your dump and updated the properties from my app.
But when I reached the point where I need to test the connection with Twitter, I get the error below (SSL is required):
Exception in thread "main" 403:The request is understood, but it has been refused. An accompanying error message will explain why. This code is used when requests are being denied due to update limits (https://support.twitter.com/articles/15364-about-twitter-limits-update-api-dm-and-following).
message – SSL is required
I updated the JAR files from Twitter4J and this error went away, but while testing the Twitter connection I now get:
Exception in thread "main" java.lang.AssertionError: java.lang.IllegalAccessException: Class twitter4j.internal.logging.Logger can not access a member of class twitter4j.StdOutLoggerFactory with modifiers ""
at twitter4j.internal.logging.Logger.getLoggerFactoryIfAvailable(Logger.java:90)
at twitter4j.internal.logging.Logger.<clinit>(Logger.java:46)
at twitter4j.auth.OAuthAuthorization.<clinit>(OAuthAuthorization.java:46)
at twitter4j.auth.AuthorizationFactory.getInstance(AuthorizationFactory.java:40)
at twitter4j.TwitterFactory.<clinit>(TwitterFactory.java:39)
at com.saphanatutorial.util.TwitterConnection.getInstance(TwitterConnection.java:26)
at com.saphanatutorial.util.TwitterConnection.main(TwitterConnection.java:35)
Caused by: java.lang.IllegalAccessException: Class twitter4j.internal.logging.Logger can not access a member of class twitter4j.StdOutLoggerFactory with modifiers ""
at sun.reflect.Reflection.ensureMemberAccess(Unknown Source)
at java.lang.Class.newInstance(Unknown Source)
at twitter4j.internal.logging.Logger.getLoggerFactoryIfAvailable(Logger.java:83)
... 6 more
Any idea how to resolve this?
Thanks
Thanks
Hi,
I too got the same error.
Did you manage to resolve that?
Hi,
For your info....
I resolved it by adding the following line to the TwitterConnection.java file:
cb.setUseSSL(true);
Rgds,
Murali
Thanks Shivam for the share. For people like me who have shifted focus from technology to business, blogs such as these with screen-shots for each step are a great respite. Thanks again!
Hi! Thanks for all the information. On the page you added a Download Twitter-analysis App link, but it redirects to a URL where you can't download the file. Can you help me?
Thanks in advance
Kevin
Hi Kevin,
Go here please - http://twitter4j.org/archive/twitter4j-4.0.7.zip
Thanks,
Shivam
Hi Shivam,
When I try to import the twitter4j-4.0.7.zip file into Eclipse, it says no projects were found to import.
Make sure you are in the Java perspective, or share a screenshot of what you tried.
Hi Shivam,
Thanks for the information. I could not download the Twitter-analysis app; when I tried the link you provided, it shows "This site can't be reached". Can you help with this?
http://twitter4j.org/archive/twitter4j-4.0.7.zip
Hello Shivam,
I am not able to access the site. Has it changed? Please share the site to download the source.
Thanks
Hi Shivam,
When I select the twitter4j-4.0.7 zip file, it shows the warning below. Can you please suggest how to proceed? I am in the Java perspective.
Thanks.
Hello Krishna Chaitanya,
Please follow the steps below.
(1) Create a New Project (for e.g. TEMP) in Java Perspective
(2) Right-click --> Show In --> System Explorer, then copy the .classpath and .project files
(3) Paste them into the twitter4j-4.0.7 zip
(4) Delete the Newly Created Project (TEMP)
(5) Now Try to Import
Thanks Harshil. It worked now.
Hi Harshil,
When I imported the twitter4j-4.0.7 zip file, I couldn't see the src folder or the packages under it. Any reason?
You can create src
I am facing the error below after importing the Twitter API. Can you help me with that?
Can you let me know how to do that? I'm unable to see any .java files.
Also, where exactly are you facing the above error?