First steps with sentiment analysis in SAP Predictive Analytics 2.4
Yesterday I installed the latest version of SAP Predictive Analytics (a trial is available for download from SAP) and I couldn’t wait to try out the new ‘sentiment analysis’ component in Expert Analytics, one of the most intriguing additions in this release.
The component looks at an SAP HANA table or view with a text column and, for each record, determines whether the text conveys a positive, negative or neutral sentiment. It also identifies the sentiment of emoticons, problem statements and profanities (which you might want to filter out of your analysis automatically). Behind the scenes, this component drives the SAP HANA Text Analysis engine and exposes it through a simple interface.
Let’s see what I could do in a few minutes as a first exploration of the functionality.
The only prerequisite is SAP HANA with a table (or view) that has a column containing text.
My colleague and friend Jayanta Roy provided me with a sample dataset containing tweets on football (related to Manchester United, #MANU, and Liverpool, #LFC).
After launching SAP Predictive Analytics 2.4 I created a new document ‘connected’ to SAP HANA, selected the table containing the tweets and isolated only a few fields which I wanted to use for my test: the tweet text, its hashtag and the country where the tweet originated.
In the next screenshot you can see some of the content. The dataset contains more than 300 000 records.
Now I was ready to go to the Predict room and drag the Sentiment Analysis component into the project page and apply it to the dataset. Notice that this component is found under the Data Preparation blocks.
This is quite important: sentiment analysis is not necessarily a goal in itself but is rather intended as a step to enrich data in a predictive project.
Opening the Configure Settings page of the component I declared that the text field to analyze is ‘TWEET’
then, in the Advanced tab, I specified that the analysis didn’t need to identify problem statements and only had to return three sentiment values: Positive, Negative and Neutral.
I did so by simply typing the return value text next to each possible sentiment detected by the tool (in the example below, all the different ‘positive’ levels of sentiment map to the Positive keyword, and likewise for Negative and Neutral).
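For readers who think in code, the mapping step above amounts to a simple lookup table. The following is only a minimal Python sketch of that idea, not what the component runs internally. The fine-grained label names are assumptions for illustration; the names shown in the Advanced tab may differ.

```python
# Assumed fine-grained labels; the names in the Advanced tab may differ.
SENTIMENT_MAP = {
    "StrongPositiveSentiment": "Positive",
    "WeakPositiveSentiment": "Positive",
    "NeutralSentiment": "Neutral",
    "WeakNegativeSentiment": "Negative",
    "StrongNegativeSentiment": "Negative",
}


def collapse_sentiment(label):
    """Return the coarse label, or None for a type that was left unmapped."""
    return SENTIMENT_MAP.get(label)


if __name__ == "__main__":
    # "MajorProblem" stands in for a problem-statement type we chose to ignore.
    for raw in ("StrongPositiveSentiment", "WeakNegativeSentiment", "MajorProblem"):
        print(raw, "->", collapse_sentiment(raw))
```

Leaving a type out of the table is the code equivalent of not asking the tool to report it.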
After the mappings were typed in, I clicked on Done and then executed the project.
In the picture below you can see an excerpt of the output showing how the sentiment analysis component has identified the sentiment of the text, the ‘token’ (the word or concept in the text that justifies the sentiment choice) and the Parent_Token, which gives the context in which the token was used.
The data generated here could now be used to enrich an existing dataset (e.g. the number of positive or negative tweets about a football team could be related to the renewal rate of subscriptions to the team magazine or to the number of home match tickets sold).
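To make the enrichment idea concrete, here is a rough Python sketch that counts positive and negative tweets per hashtag and attaches those counts to a second dataset. Everything in it is an assumption for illustration: the column names (HASHTAG, SENTIMENT, RENEWAL_RATE) and the sample rows are invented, and in a real project the join would more likely happen in SAP HANA or in the Predict room.

```python
from collections import Counter


def sentiment_counts(sentiment_rows):
    """Count (hashtag, sentiment) pairs in the sentiment analysis output."""
    return Counter((r["HASHTAG"], r["SENTIMENT"]) for r in sentiment_rows)


def enrich(business_rows, counts):
    """Attach positive/negative tweet counts to each business record."""
    enriched = []
    for row in business_rows:
        tag = row["HASHTAG"]
        enriched.append({
            **row,
            # A Counter returns 0 for pairs that never occurred.
            "POSITIVE_TWEETS": counts[(tag, "Positive")],
            "NEGATIVE_TWEETS": counts[(tag, "Negative")],
        })
    return enriched


if __name__ == "__main__":
    # Invented rows, only to show the shape of the data.
    tweets = [
        {"HASHTAG": "#LFC", "SENTIMENT": "Positive"},
        {"HASHTAG": "#LFC", "SENTIMENT": "Negative"},
        {"HASHTAG": "#MANU", "SENTIMENT": "Positive"},
    ]
    business = [
        {"HASHTAG": "#LFC", "RENEWAL_RATE": None},   # placeholder, no real figure
        {"HASHTAG": "#MANU", "RENEWAL_RATE": None},
    ]
    for record in enrich(business, sentiment_counts(tweets)):
        print(record)
```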
The sentiment analysis component can also be a source for another component (e.g. if you want to filter it or write back the analysis to SAP HANA for further processing).
In my small and unrepresentative dataset I just wanted to visualize the results and have a very basic summary of the tweets.
In the Visualize room of SAP Predictive Analytics I created a new measure from the “Sentiment” dimension with the Count aggregation and then defined a bar chart in which the Sentiment was plotted against the hashtag.
I could see that the largest number of tweets was related to the #LFC hashtag and that, in general, there were many more positive tweets than negative ones.
Using a tag cloud I also isolated the 30 most used token words; here again, positive ones appear most often (and if you look closely, you can see that I blurred out a profanity from the image because I forgot to enable the profanity filter in the Configure Settings panel beforehand :-)).
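The ranking behind such a tag cloud is a frequency count over the tokens. As a hedged illustration, the Python sketch below picks the most frequent tokens and skips rows flagged as profanity, which is roughly what enabling the filter beforehand would have spared me. The column names TOKEN and SENTIMENT and the "Profanity" label are assumptions, not the component's actual output schema.

```python
from collections import Counter


def top_tokens(rows, n=30, excluded_sentiments=("Profanity",)):
    """Return the n most frequent tokens as (token, count) pairs.

    Rows whose sentiment label is in excluded_sentiments are skipped,
    and tokens are lower-cased so "Goal" and "goal" count together.
    """
    counts = Counter(
        r["TOKEN"].lower()
        for r in rows
        if r["SENTIMENT"] not in excluded_sentiments
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Invented rows, only to show the shape of the data.
    sample = [
        {"TOKEN": "great", "SENTIMENT": "Positive"},
        {"TOKEN": "Great", "SENTIMENT": "Positive"},
        {"TOKEN": "poor", "SENTIMENT": "Negative"},
        {"TOKEN": "xxxx", "SENTIMENT": "Profanity"},
    ]
    print(top_tokens(sample, n=2))
```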
That’s it. In about 10 minutes I was able to run a very simple project to analyze some tweets in SAP Predictive Analytics, driving the SAP HANA Text Analysis engine without writing a single line of code.
This visual approach saved me a lot of time and reduced the trial-and-error phase I would have gone through if I had wanted to do everything from scratch within SAP HANA Studio.
Now that I understand what the sentiment analysis component of SAP Predictive Analytics does and which results I can get, it is time to look around and see how I can apply it to improve my business. I hope you are going to do the same!