Analyze Social media tweets using SAP HANA Cloud Platform
I was recently at SAP TechED and was having a conversation with my colleague Jon Gooding as to what people are actually talking about SAP TechED. Jon had earlier used SAP HANA to pull information form twitter and report on the data in another SAP event. I wanted do something similar in a real time fashion using HANA Cloud Platform in order to quickly understand what people are talking about SAP TechED. There are lot of organizations who want to obtain information on what their customers are actually talking about them in social media. Social media can be an incredibly important tool for many businesses out there. This can help them to understand their customer’s sentiments and accordingly change some of their business processes. There are standard integration contents available in HANA Cloud Integration to quickly obtain twitter feeds into HANA Cloud Platform. In this blog, I am going to show how this can be done using Smart Data Integration services which are now available on HANA Cloud Platform.
- HANA Cloud Platform Trial account
- Multi-tenant HANA Database on trial account
- Installation & configuration steps are outlined by Former Member in this blog.
Data Provisioning agent
Assuming the Data provisioning agent is installed and setup, launch the Data Provisioning agent and register the twitter adapter
You should now be able to use the twitter adapter setting in the HANA System which is available on HANA Cloud Platform
Create a twitter application by navigating to https://dev.twitter.com and click on “Manage my Apps”
Follow the wizard to create an application. Once it is created, under “Keys and access token” tab, you will be able to find some info around the keys which will be required later.
Setup remote source in HANA
Switch your HANA system which is available on HANA Cloud Platform. In the catalog view, under Provisioning add a new Remote source.
Select Twitter Adapter and provide all the keys which were provided when the Twitter application was created earlier.
If you expand the remote source (I have named mine as “MyTwitter”) , you should see more folders. Under Search > Tweets, create a virtual function.
Provide a name for the virtual function and a schema to be used.
This should have created a function under the SDI_TABLES schema
I can now use this Virtual Function to query the tweets which I am after. In the below example, I am trying to search 100 recent tweets which have hash tag #saphcp and #SAPTechED.
SELECT *FROM "Tweets_Search"('#saphcp && #SAPTechED',100,null,null,null,null,null,null,null);
This should list tweets as below
The data is not persisted in HANA. If you wish to persist this data and perform some mash-up and data manipulation, it makes sense to store the data temporarily. You can create a table as shown below
CREATE COLUMN TABLE my_tweets( Id BIGINT, ScreenName NVARCHAR(256), Tweet NVARCHAR(256), Source NVARCHAR(256), Truncated TINYINT, InReplyToStatusId BIGINT, InReplyToUserId BIGINT, InReplyToScreenName NVARCHAR(256), Favorited TINYINT, Retweeted TINYINT, FavoriteCount INTEGER, Retweet TINYINT, RetweetCount INTEGER, RetweedByMe TINYINT, CurrentUserRetweetId BIGINT, PossiblySensitive TINYINT, isoLanguageCode NVARCHAR(256), CreatedAt TIMESTAMP, Latitude DOUBLE, Longitude DOUBLE, Country NVARCHAR(256), Place_name NVARCHAR(256), Place_type NVARCHAR(256), UserId BIGINT, UserName NVARCHAR(256), UserUrl NVARCHAR(256) );
Push all your tweets into this table using the below SQL command
INSERT INTO "SDI_TABLES"."MY_TWEETS" SELECT * FROM "Tweets_Search"( '#saphcp && #SAPTechED', 100, null, null, null, null, null, null, null );
The new table “MY_TWEETS” should contain all the tweets which I am after.
Under the remote source system, there are other options. Lets take the User_Stream under “Streams” folder
unlike the previous one, you will now be provided an option to create a virtual table. Provide the table name and the schema.
Verify that the virtual table under the schema has been created.
Execute the below SQL command to create a HANA Table as the same structure of the Virtual table
CREATE COLUMN TABLE "SDI_TABLES"."T_USER_STREAM" AS (SELECT * FROM "SDI_TABLES"."VT_USER_STREAM")
Now there should be a HANA Table and the corresponding virtual table
Lets switch to the Workbench editor and create a flow graph from a package which you can use for this development
Select the virtual table under “Input Types” and the target HANA table under “Output types”. Also, place a filter in the content area as shown below and link them all up.
Select the Filter to view the details. You can perform all the mappings and also apply filter conditions as to what type of tweets you want to persist in the target table. You can specify all the conditions in the “Filter Node” tab. For example, if you want to know what one particular user is tweeting on this topic, you can apply the filter here. I am just leaving this blank for now.
Click on the properties icon at the top left hand corner of the flow graph.
In the properties menu, ensure that “Real-time” is selected.
Also, navigate to the details of the Virtual table under Input types and select “Real-time”.
Execute the flowgraph. It will provide a popup screen to provide the parameters. Provide the name of the Virtual table and Target table under the schema which is being used.
This would load all the twitter feeds into the target HANA Table in real-time. Below is the content of the target HANA table.
Once the twitter feeds are loaded into a HANA table, the next step is to apply sentiment analysis which will add more context to what we are trying interpret with the data collected.
We will use a native feature of SAP HANA to perform text analysis.
Run the below SQL command
CREATE FULLTEXT INDEX "SA_TWITTER" ON "SDI_TABLES"."MY_TWEETS"("TWEET") CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' TEXT ANALYSIS ON;
This will create a new object as shown below.
When you view the contents of this table, you will find the contents of the real-time tweets categorized by positive or negative sentiments as shown below.
You can now take this data and further create rich charts which show the current sentiment of users on a particular event or brand and help the business to act accordingly.