Visualizing Data-Driven Insights from Sentiment Analysis Models
This is a continuation of my previous blog post, https://blogs.sap.com/2020/11/30/how-to-deploy-a-natural-language-processing-model-with-tensorflow-on-sap-data-intelligence/, which showed how to create, train and deploy a basic Sentiment Analysis model with TensorFlow and Deep Learning on SAP Data Intelligence for inference. That post was largely technical: it walked through the steps without showcasing any use cases that could be derived from them. In this very short follow-up post, I shall show some of the insights that can be derived from such models. I recently used Sentiment Analysis together with Aspect Term Extraction for a POC whose goal was to discover the specific aspects mentioned in a body of laptop reviews and the sentiment associated with each aspect. Please note that the laptop review dataset used in this blog is non-academic, meaning that it is not one of the datasets commonly found in academic papers, such as the SemEval datasets; almost all of its reviews are from 2020. Aspect-Based Sentiment Analysis (ABSA) can be a powerful tool to understand what your customers like or dislike about a product, and to what extent. I shall be using SAP Analytics Cloud to showcase my results. First, I show my results for Aspect Term Extraction, obtained by running the ABSA system on a dataset of over 3500 laptop reviews.
The seven aspects on which my model was trained were battery, heat, memory, ports, price, quality and speed. My ABSA system has determined that over half of all the reviews talk about the memory of the laptop, i.e. the hard drive, disk drive and Random Access Memory (RAM). From personal experience, this is typically what I complained about when I used to buy lower-quality laptops. While Price and Battery also made up large portions of the reviews, there were extremely few reviews concerning the battery or quality of the laptop being purchased. Of course, while it is very useful to know what your customers are talking about, let's see whether they were speaking about it in a positive or negative way, and to what extent.
As you can see, most reviews are positive, with each aspect receiving a share of positive and negative reviews that is in line with its share of the reviews in the previous visualization. However useful, this does not take full advantage of the Sentiment Analysis portion of the ABSA system. As such, I will now show the degree of polarity, or rather how positive a review is.
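The positive and negative split per aspect described above can be sketched in a few lines of Python. This is only a minimal illustration and not the pipeline behind the charts in this post: the function name `sentiment_breakdown`, the `(aspect, label)` tuple layout and the sample rows are all assumptions made up for the example.

```python
from collections import Counter

def sentiment_breakdown(rows):
    """Return, per aspect, the share of positive and negative reviews.

    `rows` is a list of (aspect, label) tuples, where label is either
    "positive" or "negative". This layout is hypothetical.
    """
    pair_counts = Counter(rows)                        # reviews per (aspect, label)
    aspect_totals = Counter(aspect for aspect, _ in rows)  # reviews per aspect
    return {
        aspect: {
            label: pair_counts[(aspect, label)] / total
            for label in ("positive", "negative")
        }
        for aspect, total in aspect_totals.items()
    }

# Made-up sample rows, purely for illustration.
sample_rows = [
    ("memory", "positive"),
    ("memory", "positive"),
    ("memory", "negative"),
    ("price", "positive"),
]
breakdown = sentiment_breakdown(sample_rows)
```

Each aspect maps to two shares that sum to 1, which is the quantity a stacked bar chart per aspect would display.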
Using a Sentiment Analysis model trained with TensorFlow, much as I showed in the previous post mentioned at the beginning of this blog, I have been able to determine the average polarity with which customers speak about each aspect. It seems that the most positively discussed aspect in this dataset is the battery, while the others tend to fall between 0.19 and 0.21. However, this is only an average taken across the entire dataset. It would be nice to see how the polarity of reviews changes over time in relation to the ratings that customers give. This could also show whether the polarity of customers' reviews matches the ratings that they give.
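Averaging polarity per aspect, as described above, amounts to a simple group-and-mean step. The sketch below assumes each review has already been scored by the model and tagged with an aspect; the name `mean_polarity_by_aspect`, the `(aspect, polarity)` layout and the sample scores are hypothetical and are not values from the dataset.

```python
from collections import defaultdict
from statistics import mean

def mean_polarity_by_aspect(rows):
    """Average the polarity scores of all reviews that mention each aspect.

    `rows` is an iterable of (aspect, polarity) pairs, with polarity
    assumed to be a float between 0 and 1.
    """
    grouped = defaultdict(list)
    for aspect, polarity in rows:
        grouped[aspect].append(polarity)
    return {aspect: mean(scores) for aspect, scores in grouped.items()}

# Made-up scores, purely for illustration.
sample_scores = [("battery", 0.5), ("battery", 0.25), ("memory", 0.25)]
averages = mean_polarity_by_aspect(sample_scores)
```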
As our polarity is a measure of sentiment between 0 and 1 and the ratings that customers give range between 0 and 5, I have decided to normalize the ratings so that both are on the same scale and comparable. I also think it would be more useful to look at the change in polarity towards a particular aspect over time than at the polarity averaged over all aspects.
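One way to put the ratings on the same 0 to 1 scale as the polarity is min-max scaling, followed by a per-month average for a single aspect. The sketch below is an assumption about how this could be done, not the calculation configured in SAP Analytics Cloud: the function names, the dictionary keys (`aspect`, `date`, `polarity`, `rating`) and the sample reviews are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def normalize_rating(rating, low=0.0, high=5.0):
    """Min-max scale a rating from [low, high] onto [0, 1]."""
    return (rating - low) / (high - low)

def monthly_comparison(reviews, aspect):
    """Return {month: (mean polarity, mean normalized rating)} for one aspect.

    Each review is a dict with the assumed keys "aspect", "date"
    (an ISO "YYYY-MM-DD" string), "polarity" and "rating".
    """
    by_month = defaultdict(lambda: ([], []))
    for review in reviews:
        if review["aspect"] != aspect:
            continue
        month = review["date"][:7]  # keep only the "YYYY-MM" part
        polarities, ratings = by_month[month]
        polarities.append(review["polarity"])
        ratings.append(normalize_rating(review["rating"]))
    return {m: (mean(p), mean(r)) for m, (p, r) in sorted(by_month.items())}

# Made-up reviews, purely for illustration.
sample_reviews = [
    {"aspect": "memory", "date": "2020-03-01", "polarity": 0.5, "rating": 5},
    {"aspect": "memory", "date": "2020-03-15", "polarity": 0.25, "rating": 0},
    {"aspect": "speed", "date": "2020-03-20", "polarity": 0.75, "rating": 4},
]
memory_trend = monthly_comparison(sample_reviews, "memory")
```

Subtracting the normalized rating from the mean polarity for each month gives a single gap value; a negative gap means the review text is less positive than the rating.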
Fortunately, thanks to the easy customization features of SAP Analytics Cloud, I am able to show the change in polarity over time for the Memory aspect. A vendor or e-commerce site could then see whether there is a sudden drop or increase at any particular point in time, for example due to a software or hardware upgrade. Having done so, it seems that the polarity of reviews tends, on average, to fall below the normalized rating that customers give. This means that, on average, customers write about the product less positively than they rate it. This could be for a variety of reasons: perhaps customers are merely stating critiques meant to showcase things they believe could be improved, or perhaps the rating scale needs to be changed to give customers a wider array of ratings. Nonetheless, this is where a data scientist should look into the data or reviews to see why this may be the case. For now, I will look into the distribution of words within each specific aspect and see whether there are any keywords that customers use to describe the product. I will look at the keywords in negative reviews for both Speed and Memory.
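A keyword distribution of this kind can be approximated with a plain word count over the negative reviews of one aspect. The sketch below is a rough illustration under stated assumptions: the tokenizer is a simple regular expression, the `STOPWORDS` set is a tiny hand-picked list, and the function name `top_keywords` and the sample sentences are made up. It is not the method used to produce the visualizations in this post.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stopword list; a real one would be longer.
STOPWORDS = {"the", "and", "not", "will", "all", "this", "that", "with", "for"}

def top_keywords(texts, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens across the given texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Skip stopwords and very short tokens such as "to" or "it".
        counts.update(t for t in tokens if t not in stopwords and len(t) > 2)
    return counts.most_common(n)

# Made-up negative Speed reviews, purely for illustration.
negative_speed_reviews = ["Slow to start", "It will not start at all"]
keywords = top_keywords(negative_speed_reviews, n=1)
```

Running the same function separately on the negative reviews of each aspect gives one ranked keyword list per aspect, which can then be compared side by side.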
Looking at these keywords, you may see an issue that is common in the natural text of product reviews: customers use the same or similar words to describe products. However, what is of use here is that within the negative reviews for Speed, the keyword "start" comes up, which may allude to the startup of the computer when it is turned on, or perhaps to a user attempting to start an app. Additionally, the keywords "Windows" and "Support" come up in the negative Memory reviews, which may indicate an issue with the Windows operating system and its support for other applications. However, one would need to look into the data to find out exactly why. While I have done so, I will refrain from showing all of this for the sake of keeping this blog short and simple. We have now seen some of the insights that we can gain from using NLP on SAP Data Intelligence together with visualizations from SAP Analytics Cloud.