Posted by Flavia Moser

Creating A Word Cloud with R-Visualizations in SAP Analytics Cloud

Authors: Benton Li, Jiying Wen

In this blog, we show how to create word clouds to visually represent textual data. Word clouds allow users to quickly gain insights from their data by showing higher frequency words in larger font sizes.

By building word clouds within SAP Analytics Cloud, users can:

  • Create an easy-to-understand visual from text data
  • Identify important textual data points
  • Uncover valuable feedback to guide business decisions

Creating a word cloud is straightforward with two R packages: tm for text mining and wordcloud for generating the word cloud. Together they help users analyze text and quickly visualize keywords. Let’s look at two scenarios where word clouds are useful:

  • Top Performing Product Items
  • Most Discussed Topics

Example 1: Top Performing Product Items

Note that the dimension used to create a word cloud should have at least 10 distinct values. In this example, there are 29 unique product names, which produce a full, cloud-like output.
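If you want the script itself to flag a dimension that falls short of this guideline, a small check like the following could run before the word cloud is drawn. This is a minimal sketch: `check_distinct_values` is a hypothetical helper, not part of SAP Analytics Cloud or the wordcloud package, and the threshold of 10 simply mirrors the note above.

```r
# Hypothetical helper (not part of SAC or wordcloud): warn when a dimension
# has fewer distinct values than the suggested minimum of 10
check_distinct_values <- function(words, minimum = 10) {
  n <- length(unique(words))
  if (n < minimum) {
    warning(sprintf("Only %d distinct values (suggested minimum: %d); the word cloud may look sparse.",
                    n, minimum))
  }
  invisible(n)
}

# In the SAC editor this could be called as, for example:
# check_distinct_values(BestRun_Demo$Product)
```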

This example uses transactional data that involves the sales of drinks. We want to easily visualize the top performing product items.

Steps:

  1. Upload the dataset
  2. Go to the “Insert” toolbar and click “R Visualization”
  3. Click “Add Input Data”; under Rows, click “Add Dimension” to specify the textual data that will appear in the word cloud. In this example, we want to see the different product names in the word cloud
  4. Under “Columns”, select the “Manage Filters” icon on the top right to specify the measure that the word sizes will be based on. Here, Quantity Sold is chosen, so the higher the quantity sold of a product, the larger and bolder the product name will be.
  5. Now that the measure and dimension have been specified, select “OK”
  6. Next, select “Add Script”
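  7. Paste the following code into the “Editor” (BestRun_Demo is the name of the input data in this example; replace it with the name of your own input data):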
# load package
library(wordcloud)

# get words
words <- BestRun_Demo$Product

# get frequency
frequency <- BestRun_Demo$`Quantity sold`

# generate word cloud
wordcloud(words, frequency, scale = c(3, 1), rot.per=0.2, colors=brewer.pal(8, "Dark2"))

8. Click “Execute”, followed by “Apply”
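Because this example starts from transactional data, it is worth making sure each product appears only once before it reaches wordcloud(), which draws every element of `words` as its own word. If your input ever contains several rows per product (for example, because another dimension was added under Rows), you could sum the measure per product first. The snippet below is a sketch with a made-up `sales` data frame standing in for `BestRun_Demo`; the column names follow the example above.

```r
# Made-up transactional rows standing in for BestRun_Demo (several sales per product)
sales <- data.frame(
  Product = c("Dark Beer", "Orange Crush", "Dark Beer", "Lemonade", "Orange Crush", "Dark Beer"),
  "Quantity sold" = c(5, 3, 7, 2, 4, 1),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Sum the measure per product so each product name occurs exactly once
totals <- tapply(sales[["Quantity sold"]], sales$Product, sum)

# Largest totals first (optional, but handy for checking the top performers)
totals <- sort(totals, decreasing = TRUE)

words <- names(totals)
frequency <- as.numeric(totals)

# In the SAC editor you would then draw the cloud as before, e.g.:
# wordcloud(words, frequency, scale = c(3, 1), rot.per = 0.2, colors = brewer.pal(8, "Dark2"))
```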

Result: At a glance, the word cloud shows that Orange with pulp, Dark Beer, and Orange Crush are among the top-performing drinks in this example.


Example 2: Most Discussed Topics

This example uses data scraped from the official @SAP Twitter account, covering March 2016 to June 2017.

Steps:
From “R Visualization” in the Insert toolbar, click “Add Input Data”. Under Rows, click “Add Dimension” and select “text” to analyze all the tweets. Click “OK”.

Paste all of the following code chunks, in order, into the Editor. Each chunk performs one step:

1. Load the packages and extract the text data

library(wordcloud)
library(tm)
tweets <- as.character(sapTweetscsv$text)

2. Clean up the text by removing emoji and links, converting text to lower case, and removing punctuation, numbers, and common stop words (“the”, “we”, etc.)

Note: you can customize the stop-word filtering by adding your own words to the stopwords collection. In this example, we added “amp” (what remains of the HTML entity &amp; in tweet text once punctuation is removed), “will”, and others.


# data cleaning
# remove emoji and other non-ASCII characters (sub = '' drops them instead of turning the whole tweet into NA)
tweets <- iconv(tweets, 'UTF-8', 'ASCII', sub = '')

# remove http links
tweets <- gsub("http[s]?://[[:alnum:].\\/]+", "", tweets)

# create a corpus
corpus <- Corpus(VectorSource(tweets))

# create a term-document matrix, applying some transformations
tdm <- TermDocumentMatrix(corpus,
                          control = list(removePunctuation = TRUE,
                                         stopwords = c("amp", "will", "sap", "via", "can", "just", stopwords("english")),
                                         removeNumbers = TRUE, tolower = TRUE))
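Before pasting the cleaning chunk into SAC, you can check what its two text-level steps do by running them on a made-up tweet in a local R session (this snippet is only for experimentation and is not part of the visualization script). It shows why the `sub` argument of iconv() matters: without it, any tweet containing an emoji or an accented letter becomes NA as a whole. It also shows that the link pattern only removes runs of letters, digits, `.` and `/`, so a URL containing characters such as `-` or `?` would be only partly removed.

```r
# Made-up tweet with an accented letter, a heart symbol, and a short link
tweet <- "Caf\u00e9 talk on #IoT \u2764 https://t.co/abc123"

# Without 'sub', iconv() returns NA for the whole string
no_sub <- iconv(tweet, "UTF-8", "ASCII")

# With sub = "", only the non-ASCII bytes are dropped
with_sub <- iconv(tweet, "UTF-8", "ASCII", sub = "")

# Same link pattern as in the cleaning chunk
no_links <- gsub("http[s]?://[[:alnum:].\\/]+", "", with_sub)

print(no_sub)
print(no_links)
```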

3. Build a data frame with words and their frequencies

# convert tdm to a regular matrix
m <- as.matrix(tdm)

# get word counts in decreasing order
word_freqs <- sort(rowSums(m), decreasing = TRUE)

# create a data frame with words and their frequencies
dm <- data.frame(word = names(word_freqs), freq = word_freqs, stringsAsFactors = FALSE)
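To make the counting step less of a black box: TermDocumentMatrix() builds a matrix with one row per term and one column per tweet, and rowSums() adds up how often each term occurs across all tweets. The base-R sketch below approximates the same pipeline without tm, on three made-up tweets, so you can inspect the result directly. It is an approximation, not tm's exact behaviour: it replaces punctuation with spaces (tm's removePunctuation deletes it), uses a tiny hypothetical stop-word list instead of stopwords("english"), and mimics tm's default of discarding terms shorter than three characters (its wordLengths control). The function name `count_words` is hypothetical.

```r
# Made-up tweets standing in for sapTweetscsv$text
tweets <- c("Digital transformation with IoT!",
            "IoT and AI: the digital core, via the cloud.",
            "Join us at 10am for the digital keynote & more")

# Tiny illustrative stop-word list (tm's stopwords("english") is much longer)
stop_words <- c("the", "and", "with", "for", "at", "us", "via", "amp", "more")

count_words <- function(texts, stop_words, min_length = 3) {
  texts <- tolower(texts)
  texts <- gsub("[[:punct:]]", " ", texts)   # remove punctuation
  texts <- gsub("[[:digit:]]", " ", texts)   # remove numbers
  tokens <- unlist(strsplit(texts, "[[:space:]]+"))
  # drop short terms (like tm's default wordLengths) and stop words
  tokens <- tokens[nchar(tokens) >= min_length & !(tokens %in% stop_words)]
  # count each term across all tweets, most frequent first
  freqs <- sort(table(tokens), decreasing = TRUE)
  data.frame(word = names(freqs), freq = as.integer(freqs), stringsAsFactors = FALSE)
}

dm <- count_words(tweets, stop_words)
print(dm)
```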

4. Generate the word cloud

# plot wordcloud
wordcloud(dm$word, dm$freq, scale = c(5, 1), max.words = 50, random.order = FALSE,
          rot.per = 0.2, use.r.layout = FALSE, colors=brewer.pal(8, "Dark2"))



5. Click “Execute” and “Apply”

Result: SAP, digital, and IoT are among the most frequently discussed topics on Twitter. Sharing these findings with the marketing and communications teams can help them tailor their promotional strategy.

Arguments of the wordcloud function:

  • words : the words to be plotted
  • freq : word frequencies
  • min.freq : words with frequency below min.freq will not be plotted (default 3)
  • max.words : maximum number of words to be plotted
  • random.order : plot words in random order. If FALSE, they will be plotted in decreasing frequency
  • rot.per : proportion of words plotted with 90 degree rotation (vertical text)
  • scale : a vector of length 2 giving the range of font sizes, from largest to smallest
  • use.r.layout : if FALSE, C++ code is used for collision detection; otherwise R code is used
  • colors : colors for words from least to most frequent. Use, for example, colors = "black" for a single color.
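The scale, min.freq, and max.words arguments interact in a way that is easy to check numerically. Based on our reading of the wordcloud package source (treat this as an approximation rather than a documented guarantee), wordcloud() keeps at most max.words of the most frequent words, drops those below min.freq (falling back to min.freq = 0 if no word reaches it), and then sizes each word linearly between scale[2] and scale[1] in proportion to freq / max(freq). The hypothetical `relative_sizes` helper below reproduces only that arithmetic with made-up frequencies. Note that Example 1 does not set min.freq, so the default of 3 applies there.

```r
# Hypothetical helper mirroring wordcloud()'s size arithmetic (our reading of the
# package source; tie-breaking and word placement are not reproduced)
relative_sizes <- function(freq, scale = c(4, 0.5), min.freq = 3, max.words = Inf) {
  # If no word reaches min.freq, show everything instead of nothing
  if (min.freq > max(freq)) min.freq <- 0
  # Keep the max.words most frequent words
  freq <- sort(freq, decreasing = TRUE)
  freq <- freq[seq_len(min(length(freq), max.words))]
  # Drop words below min.freq
  freq <- freq[freq >= min.freq]
  # Linear interpolation between the smallest and the largest size
  scale[2] + (scale[1] - scale[2]) * freq / max(freq)
}

# Made-up frequencies, using the scale from Example 1
sizes <- relative_sizes(c(orange = 40, beer = 20, cola = 10, tea = 2), scale = c(3, 1))
print(sizes)
```

With these numbers, "tea" falls below the default min.freq of 3 and is not drawn, while the remaining sizes fall between 1 and 3.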

Summary

With the R Visualization feature, SAP Analytics Cloud can turn text data into visuals that reveal in-depth insights and help you make decisions with confidence.


      5 Comments
      Scott Stellmon

      Is it possible to get a copy of the beverage dataset?  Trying to understand how the data is structured to get a nice outcome.

      Gagan Bakhshi

      Hello,

      I see an anomaly on this platform: once you load the dataset, if you vary an input parameter, some records in the dataset are skipped, i.e.:

      set = Name - A,B,C,D

      Age - 10,11,12,13

      input parameter - i1 [set to a default]

      reading the set:

      s1 <- as.data.frame(set) [Works fine]

      changing input parameter i1 value

      s1 <- as.data.frame(set)

      Name - B,D

      Age -11,13


      Why does the dataset change with a change in input parameter (without even filtering based on that parameter, just a value change)

      Ulf Arbstig

      Hi,

      Thanks for sharing this.

      In your first example, where and how in the script would you add an encoding = "UTF-8" to get correct result for non English letters?

      Tracey-Lee February

      Hi there


      I would just like to understand how user-friendly this solution is. I understand that I need to link R to SAC in order to use R visualizations. However, once the solution is implemented and the visual is available to end users, would it be a problem if R is not installed on their PCs or linked online? In other words, is it just the developer who needs the link between R and SAC?

      Flavia Moser
      Blog Post Author

      Hi Tracey-Lee,


      The R server is associated with the SAC tenant. Thus, once it's set up / enabled, any user of this tenant will benefit from R (without having to do any additional configuration / setting up their own R server).


      Best wishes,

      Flavia