
Digital and social media platforms play a significant role in everyday decisions such as selecting a product or service, picking a movie to watch or choosing a restaurant. Businesses, in turn, use online opinion as part of their marketing strategy to promote their products, identify new opportunities, and manage their brand image and reputation.

Nowadays, users post comments and feedback about a wide variety of products and services across social networking sites, e-commerce websites and blogs. The rise of digital and social media has fueled interest in text sentiment analysis, because user-generated text is a rich source of opinions about products and services. Such data can reveal how users feel about individual features or aspects of a product, and those insights can be used to improve and differentiate products and create competitive advantage.

Data and analytics are transforming the way industries work, disrupting established business models and paving the way for innovative ones based on unprecedented insights into markets and customers.

A well-designed analytics framework can provide meaningful insights to support such decisions. However, building one in the era of digital transformation comes with a few challenges, some of which are highlighted below:

  1. Capture and Store the huge dataset (Handle Big data)
  2. Analyze the dataset (Require Appropriate tools)
  3. Gather the right information at the right time without loss of data (Understand the data)
  4. Visual Representation (For decision making)

Traditionally, one would need a multitude of tools from several vendors to perform all of these activities, and becoming familiar with every one of those tools and technologies is a daunting task. SAP HANA overcomes this limitation by providing an overarching solution that covers the necessary tools from end to end.

In this blog, we’ll show how to perform text sentiment analysis on an aggregation of movie reviews using HANA information views. We will also highlight how SQL and HANA information views can be used to perform text analysis and to represent the result set visually in diverse ways.

**********************************************************************************

Pre-work

The pre-work for sentiment analysis is to create a database table, insert the dataset into it, and then capture the sentiments by creating a full-text index on the table with an appropriate text analysis configuration.

The detailed steps for this pre-work are explained in the following blogs:

  1. https://blogs.sap.com/2018/02/01/sap-hana-text-analysis-3/
  2. https://blogs.sap.com/2018/03/15/custom-dictionaries-sap-hana-text-analysis/
  3. https://blogs.sap.com/2018/02/15/sap-hana-full-text-index/

**********************************************************************

Graphical Calculation View

Once the pre-work is completed, the next step is to create a graphical calculation view in the HANA Modeler perspective, which is the focus of this blog. Going forward with XSA, the graphical calculation view will be the one unified artifact for creating HANA models.

Follow the steps below:

  1. Create a Graphical Calculation view
  2. Select the full-text index created in the pre-work
  3. In the Aggregation node, select the text analysis table generated by the full-text index (its name starts with $TA).
  4. Add the following columns as attributes to the Output.
    • ID
    • TA_TOKEN
    • TA_TYPE
  5. Add TA_TOKEN to the Output a second time, this time as an aggregated column.
  6. Rename TA_TOKEN_1 to Counter
  7. Rename the other TA_TOKEN to Word
  8. Rename TA_TYPE to Sentiments
  9. Save, validate and activate the calculation view
  10. Open the data preview

Figure 1: Calculation view
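
For readers who prefer to see the logic as a query, the aggregation performed by the view can be approximated as follows. This is a minimal sketch and not the HANA artifact itself: it uses Python's built-in sqlite3 module as a stand-in for HANA, the table name TA_REVIEWS and the sample rows are made up for illustration, and it assumes that the aggregated TA_TOKEN column uses a count aggregation. The TA_TYPE values are assumed examples and should be checked against your own $TA table.

```python
import sqlite3

# Stand-in for the $TA table; the table name and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TA_REVIEWS (ID INTEGER, TA_TOKEN TEXT, TA_TYPE TEXT)")
rows = [
    (1, "great", "StrongPositiveSentiment"),
    (1, "great", "StrongPositiveSentiment"),
    (1, "boring", "WeakNegativeSentiment"),
    (2, "great", "StrongPositiveSentiment"),
    (2, "awful", "StrongNegativeSentiment"),
]
conn.executemany("INSERT INTO TA_REVIEWS VALUES (?, ?, ?)", rows)

# ID, Word and Sentiments act as attributes; Counter is the aggregated TA_TOKEN.
query = """
    SELECT ID,
           TA_TOKEN        AS Word,
           TA_TYPE         AS Sentiments,
           COUNT(TA_TOKEN) AS Counter
    FROM TA_REVIEWS
    GROUP BY ID, TA_TOKEN, TA_TYPE
    ORDER BY ID, Word
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each output row corresponds to one combination of review, word and sentiment type, with Counter holding how often that combination occurs.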

**********************************************************

Visual Representation

The analytics framework is used to gain insight into the data. Once the calculation view is activated and the data preview is opened, three tabs are available (Analysis, Distinct Values and Raw Data), as shown in Figure 2.

Figure 2: Tabs in Data Preview

The tag cloud option can be used to identify the most frequent terms used in the reviews, as shown in Figure 3 below. Sentiment analysis then classifies the orientation of the text as either positive or negative feedback.

On the Analysis tab, place Word on the Label axis as an attribute and the sum of Counter on the Value axis as a measure. The result can then be rendered in different forms such as bar charts, pie charts, tag clouds and radar charts.

Figure 3: Tag Cloud Option in Analysis tab
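
The tag cloud is driven by the sum of Counter per Word. The ranking behind it can be sketched in a few lines of Python; the sample rows are invented, and top_terms is a hypothetical helper rather than anything provided by HANA.

```python
from collections import Counter

def top_terms(rows, n=3):
    """Sum Counter per Word across all reviews and return the n largest."""
    totals = Counter()
    for _review_id, word, counter in rows:
        totals[word] += counter
    return totals.most_common(n)

# (ID, Word, Counter) rows as a data preview might return them; values are illustrative.
rows = [
    (1, "great", 2),
    (1, "boring", 1),
    (2, "great", 1),
    (2, "awful", 1),
    (3, "boring", 1),
]
print(top_terms(rows, n=2))
```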

In this example, we used a pie chart to show positive and negative emotions, with the positive or negative sentiment captured in a calculated column (Positive_Negative). As Figure 4 below shows, positive sentiments outnumber negative ones, which hints that the audience received this movie positively.
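
The exact expression behind the Positive_Negative calculated column is not shown in this blog. One plausible way to derive it is to map each sentiment type to a polarity label, sketched below in Python. The TA_TYPE values are assumptions about what the text analysis configuration emits, so verify them against the distinct values in your own data; the function name polarity is hypothetical.

```python
def polarity(ta_type):
    """Map a text-analysis type to 'Positive', 'Negative' or None."""
    # Assumed type names; check the distinct TA_TYPE values in your $TA table.
    if ta_type.endswith("PositiveSentiment"):
        return "Positive"
    if ta_type.endswith("NegativeSentiment"):
        return "Negative"
    return None  # topics and other entity types are not counted

# (Word, Sentiments, Counter) rows; values are illustrative only.
rows = [
    ("great", "StrongPositiveSentiment", 3),
    ("nice", "WeakPositiveSentiment", 1),
    ("awful", "StrongNegativeSentiment", 1),
    ("movie", "Topic", 5),
]

totals = {"Positive": 0, "Negative": 0}
for _word, ta_type, counter in rows:
    label = polarity(ta_type)
    if label is not None:
        totals[label] += counter

print(totals)
```

In the calculation view, the same mapping could be expressed as a conditional expression on the Sentiments column, with the resulting label used as the pie chart's attribute and Counter as its measure.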

Of course, the sample size of the dataset is a crucial factor in judging whether such a trend is statistically significant. While analyzing the dataset, we need to filter out noise, understand the context, identify the relevant content and take appropriate action. In other words, it is important to pre-process the dataset carefully for the use case, considering anomalies and the various permutations and combinations in which terms can appear, before analyzing it to reach conclusions.
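
To make the sample-size point concrete, a confidence interval on the share of positive mentions shows how much the estimate can move when there are few observations. The sketch below uses the Wilson score interval, a standard statistical choice that is not part of the original analysis; the counts are invented.

```python
import math

def wilson_interval(positives, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

# Same 70% positive share, very different sample sizes.
small = wilson_interval(7, 10)
large = wilson_interval(700, 1000)
print(small)
print(large)
```

With 7 positive mentions out of 10, the interval still includes 50%, so the data cannot rule out an evenly split audience. With 700 out of 1000, the interval lies well above 50%.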

As an example of noise filtering, the word “bad” in a movie review might indicate a sentiment. However, “bad” could also be part of the movie title or a song title, and would then skew the sentiment analysis if data cleansing or pre-processing is neglected. It is therefore crucial not to count such words when they occur in a movie title or song title.
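
One simple way to implement this kind of filtering is to blank out known titles before counting sentiment words. This is a sketch under stated assumptions: the review text, the title list and the tiny sentiment lexicon are invented for illustration, and in a HANA setup the cleansing would more likely happen before the text is indexed.

```python
import re
from collections import Counter

NEGATIVE_WORDS = {"bad", "boring"}  # toy lexicon, for illustration only

def strip_titles(text, titles):
    """Remove known movie or song titles so their words are not counted."""
    for title in titles:
        text = re.sub(re.escape(title), " ", text, flags=re.IGNORECASE)
    return text

def count_negative(text, titles=()):
    """Count lexicon words in the text after removing the given titles."""
    cleaned = strip_titles(text, titles)
    tokens = re.findall(r"[a-z']+", cleaned.lower())
    return Counter(t for t in tokens if t in NEGATIVE_WORDS)

review = "Bad Boys was not bad at all, and the soundtrack was never boring."
print(count_negative(review))                       # title word is counted
print(count_negative(review, titles=["Bad Boys"]))  # title word is ignored
```

The sketch only removes title noise. It does not handle negation such as “not bad”, which is a separate context problem of the kind mentioned above.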

Figure 4: Chart Showing Positive or Negative Sentiments

The same information can also be represented as a bar chart using the Distinct Values tab, as shown in Figure 5.

Figure 5: Distinct Value tab

The third tab, Raw Data, shows the resulting columns (ID, Word, Sentiment and Positive_Negative) with their actual values in table format.

To simplify the use case, we captured only positive and negative emotions. In a real scenario, however, one can analyze multiple emotions such as Positive, Anticipation, Anger, Fear, Joy, Surprise, Sarcasm and Negative, depending on the use case.

 


6 Comments


  1. Ishan Mukewar

    Hello Esha,

    Nice Blog, very useful. Could you please elaborate more on logic of the calculated Column “Positive_Negative”.

    I am seeking for a logic that will help me to segregate all TA_TOKEN as either positive or negative in the form of a measure.

    TIA

    Kind Regards,

    Ishan Mukewar
