[SAP HANA Academy] Live3: Text Analysis
[Update: April 5th, 2016 – The Live3 on HCP tutorial series was created using the SAP HANA Cloud Platform free developer trial landscape in January 2015. The HCP landscape has evolved significantly over the past year, so you may encounter issues when following along with the series on the most recent version of the free developer trial edition of HCP.]
In the next tutorial video in the SAP HANA Academy’s Live3 series, Philip Mugglestone shows how to perform text analysis using SAP HANA’s native capabilities. Text analysis allows us to identify the sentiment of a tweet. Watch Philip’s tutorial video below.
(0:40 – 5:10) How to Create a Text Index
Now that we have loaded real-time Twitter data into our SAP HANA table, we can view the actual text of each individual tweet.
To create a Text Index, open a new SQL console in Eclipse and make sure your current session is set to the proper schema. Next, use Notepad++ to open 03 setupTextAnalysis.sql, found in the scripts folder of the Live3 course GitHub code repository. Copy and paste the code into the SQL console.
The script creates a full Text Index, called tweets, on the text column of the Tweets table. It sets the configuration to the core voice of customer extraction, which is what the sentiment analysis part of the text analysis relies on. The SQL identifies the language of each Tweet via the Language column of the Tweets table and also specifies which languages the sentiment analysis will be run on: English, French, German, Spanish, and Chinese. These five languages are currently supported for sentiment analysis. The final line of code actually turns on the text analysis.
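The script itself is not reproduced in this post. As a rough sketch of a statement matching the description above, it might look like the following. The schema name is a placeholder, and the column names "text" and "lang" are assumptions (the post only refers to the text column and the Language column), so check the actual file in the course repository before running anything.

```sql
-- Placeholder schema name; substitute the schema of your own trial account.
SET SCHEMA "MY_SCHEMA";

-- Sketch only: the column names "text" and "lang" are assumptions.
CREATE FULLTEXT INDEX "tweets" ON "Tweets" ("text")
    CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER'    -- voice of customer rules, used for sentiment
    LANGUAGE COLUMN "lang"                             -- column holding each tweet's language
    LANGUAGE DETECTION ('EN', 'FR', 'DE', 'ES', 'ZH')  -- languages to run the analysis on
    TEXT ANALYSIS ON;                                  -- turns the text analysis on
```

SAP HANA names the generated result table $TA_ followed by the index name. Quoted identifiers are case-sensitive, so the index name chosen here determines the exact table name to query later.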
Highlight the code and click the run button to execute it. A new table is now created automatically; logically it is treated as an index. It covers all of the data currently in the Tweets table as well as any new rows that are added later.
(5:10 – 9:00) Overview of $TA_Tweets Text Index
After refreshing the list of tables, our newly created text index, called $TA_Tweets, will appear. A data preview on $TA_Tweets shows some important columns. TA_RULE confirms that entity extraction is occurring. TA_TOKEN lists the piece of information that the text analysis is performed on. TA_TYPE shows what type of text analysis was applied and what has been extracted. Examples include location, social media, and sentiment.
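The same preview can be run from the SQL console instead of the data preview. This is a minimal sketch that assumes the index was created with the quoted name "tweets"; the post writes the table name as $TA_Tweets, so adjust the case to match your own index.

```sql
-- Look at the first few extracted tokens and how they were classified.
-- Assumes the index is named "tweets", so the result table is "$TA_tweets".
SELECT "TA_RULE", "TA_TOKEN", "TA_TYPE"
FROM "$TA_tweets"
LIMIT 20;
```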
The text index classifies sentiment into five types. For example, the word “Grand” is classified as strong positive sentiment and “Disappointed” as weak negative sentiment. Some words, such as “loser,” are labeled ambiguous because they could be considered profane.
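To see how the extracted sentiment is distributed, the TA_TYPE values can be counted. This sketch assumes that the sentiment types end in "Sentiment" (for example StrongPositiveSentiment or WeakNegativeSentiment) and that the result table is named "$TA_tweets"; confirm both against the values shown in your own data preview.

```sql
-- Count extracted tokens per sentiment type.
-- The '%Sentiment' suffix and the table name are assumptions; verify them in the data preview.
SELECT "TA_TYPE", COUNT(*) AS "MENTIONS"
FROM "$TA_tweets"
WHERE "TA_TYPE" LIKE '%Sentiment'
GROUP BY "TA_TYPE"
ORDER BY "MENTIONS" DESC;
```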
Follow along with the Live3 course here.
SAP HANA Academy: over 900 free tutorial videos on using SAP HANA and SAP HANA Cloud Platform.