Creating A Word Cloud with R-Visualizations in SAP Analytics Cloud
Authors: Benton Li, Jiying Wen
In this blog, we show how to create word clouds to visually represent textual data. Word clouds allow users to quickly gain insights from their data by showing higher frequency words in larger font sizes.
By building word clouds within SAP Analytics Cloud, users can:
- Create a simple-to-understand visual from text data
- Identify important textual data points
- Uncover valuable feedback to guide business decisions
Creating a word cloud is straightforward with two R packages: tm for text mining and wordcloud for generating the cloud. Together they help users analyze text and quickly visualize keywords. Let’s take a look at two scenarios where word clouds are beneficial:
- Top Performing Product Items
- Most Discussed Topics
Example 1: Top Performing Product Items
Note that datasets used to create a word cloud should have at least 10 dimension values. In this example, there are 29 unique product names that will produce a full, cloud-like output.
This example uses transactional data that involves the sales of drinks. We want to easily visualize the top performing product items.
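The note about needing at least 10 dimension values can be checked before building the visualization. The following is a minimal sketch in base R; the `products` vector is a hypothetical stand-in for the product column of the uploaded dataset, not the actual data.

```r
# Hypothetical product column; in practice this would be the dimension
# you plan to add to the R visualization.
products <- c("Cola", "Lemonade", "Dark Beer", "Cola", "Orange Crush")

# Count the distinct values and compare against the suggested minimum of 10.
n_unique <- length(unique(products))
enough_values <- n_unique >= 10

print(n_unique)
print(enough_values)
```

With only a handful of distinct values, as in this toy vector, the cloud would likely look sparse.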
1. Upload the dataset
2. Go to the “Insert” toolbar and click “R Visualization”
3. Click “Add Input Data”. Under “Rows”, click “Add Dimension” to specify the textual data that will appear in the word cloud. In this example, we want to see the different products in the word cloud.
4. Under “Columns”, select the “Manage Filters” icon on the top right to specify the measure that the frequency will be based on. Here we choose Quantity Sold, so the higher the quantity sold of a product, the larger and bolder its name will appear.
5. Now that the measure and dimension have been specified, select “OK”
6. Next, select “Add Script”
7. Paste the following code into the “Editor”:
    # load package
    library(wordcloud)
    # get words
    words <- BestRun_Demo$Product
    # get frequency
    frequency <- BestRun_Demo$`Quantity sold`
    # generate word cloud
    wordcloud(words, frequency, scale = c(3, 1), rot.per = 0.2,
              colors = brewer.pal(8, "Dark2"))
8. Click “Execute”, followed by “Apply”
Result: With a simple look at the word cloud, we can see that Orange with pulp, Dark Beer, and Orange Crush are amongst the top performing drinks within this example.
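To cross-check what the word cloud suggests, it can help to total the measure per product and sort the result. The sketch below uses base R only. The `sales` data frame, its column names, and its values are hypothetical and stand in for the input data; whether the data reaches the script already aggregated depends on how the input is configured.

```r
# Hypothetical stand-in for the input data frame; real column names may differ.
sales <- data.frame(
  Product  = c("Orange with pulp", "Dark Beer", "Orange with pulp",
               "Lemonade", "Dark Beer", "Orange with pulp"),
  Quantity = c(120, 90, 80, 30, 60, 50)
)

# Total quantity per product.
totals <- aggregate(Quantity ~ Product, data = sales, FUN = sum)

# Sort so the best sellers come first; these would be the largest words.
totals <- totals[order(-totals$Quantity), ]
print(totals)
```

The first rows of `totals` correspond to the names that should appear largest in the cloud.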
Example 2: Most Discussed Topics
This example uses data scraped from the @SAP official Twitter account with the date range: March 2016 – June 2017.
From “R Visualization” in the Insert toolbar, click “Add Input Data”. Under “Rows”, click “Add Dimension” and specify “text” to analyze all the tweets. Click “OK”.
Paste all of the following code chunks into the Editor. Each code chunk pertains to certain actions:
1. Load packages and extract text data
    library(wordcloud)
    library(tm)
    tweets <- as.character(sapTweetscsv$text)
2. Clean up the text by removing emoji and links, converting text to lower case, and removing punctuation, numbers, and common stop words (“the”, “we”, etc.)
Note: you can customize the filtering of stop words by adding selected words into the stopwords collection. In this example, we added “amp”, “will” and others.
    # data cleaning
    # remove emoji and other non-ASCII characters; sub = "" drops only the
    # unconvertible bytes (without it, the whole tweet would become NA)
    tweets <- iconv(tweets, "UTF-8", "ASCII", sub = "")
    # remove http links
    tweets <- gsub("http[s]?://[[:alnum:].\\/]+", "", tweets)
    # create a corpus
    corpus <- Corpus(VectorSource(tweets))
    # create term document matrix applying some transformations
    tdm <- TermDocumentMatrix(corpus,
      control = list(removePunctuation = TRUE,
                     stopwords = c("amp", "will", "sap", "via", "can", "just",
                                   stopwords("english")),
                     removeNumbers = TRUE,
                     tolower = TRUE))
3. Build a data frame with words and their frequencies
    # define tdm as matrix
    m <- as.matrix(tdm)
    # get word counts in decreasing order
    word_freqs <- sort(rowSums(m), decreasing = TRUE)
    # create a data frame with words and their frequencies
    dm <- data.frame(word = names(word_freqs), freq = word_freqs)
4. Generate the word cloud
    # plot wordcloud
    wordcloud(dm$word, dm$freq, scale = c(5, 1), max.words = 50,
              random.order = FALSE, rot.per = 0.2, use.r.layout = FALSE,
              colors = brewer.pal(8, "Dark2"))
5. Click “Execute” and “Apply”
Result: SAP, digital, and iot are some of the most frequent topics that have been discussed on Twitter. Sharing this with the marketing and communications teams can help them tailor their promotional strategy.
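To make the cleaning and counting steps easier to follow, here is a minimal sketch that reproduces the same idea in base R, without tm. The sample tweets, the short stop word list, and the simplified tokenizer are assumptions made for illustration; tm's tokenization and its English stop word list will behave differently on real data.

```r
# Hypothetical sample tweets, used only for illustration.
tweets <- c(
  "Digital transformation with IoT https://t.co/abc123",
  "IoT and digital supply chains &amp; more",
  "The digital core is here"
)

# Drop non-ASCII characters and links, then lower-case the text.
clean <- iconv(tweets, "UTF-8", "ASCII", sub = "")
clean <- gsub("https?://[[:alnum:]./]+", "", clean)
clean <- tolower(clean)

# Split on anything that is not a letter and drop empty tokens.
tokens <- unlist(strsplit(clean, "[^a-z]+"))
tokens <- tokens[nchar(tokens) > 0]

# Remove a small, hand-picked stop word list (an assumption, not tm's list).
stop_words <- c("the", "and", "with", "is", "amp", "here", "more")
tokens <- tokens[!tokens %in% stop_words]

# Count and sort, mirroring the word/freq data frame built with tm.
counts <- sort(table(tokens), decreasing = TRUE)
dm_sketch <- data.frame(word = names(counts), freq = as.integer(counts))
print(dm_sketch)
```

The resulting `dm_sketch` has the same two columns as `dm` in the steps above, so it could be passed to `wordcloud()` in the same way.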
Arguments in the word cloud function:
- words : the words to be plotted
- freq : word frequencies
- min.freq : words with frequency below min.freq will not be plotted
- max.words : maximum number of words to be plotted
- random.order : plot words in random order. If FALSE, they will be plotted in decreasing frequency
- rot.per : proportion of words with 90-degree rotation (vertical text)
- scale : largest and smallest relative font sizing
- colors : color words from least to most frequent. Use, for example, colors = "black" for a single color.
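The arguments listed above can be tried out on a small vector before applying them to real data. In this sketch the words and counts are made up, and the argument values are arbitrary choices rather than recommendations. The plot is drawn only if the wordcloud package is installed; the `kept` vector is computed in base R to show which words pass the `min.freq` threshold.

```r
# Hypothetical words and counts, used only to illustrate the arguments.
words <- c("cloud", "digital", "iot", "data", "analytics", "rare")
freq  <- c(40, 25, 18, 9, 5, 1)

# Words that pass a min.freq threshold of 5 (plain base R filter).
min_freq <- 5
kept <- words[freq >= min_freq]
print(kept)

# Draw the cloud only if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(42)  # word placement is randomized; a seed makes it repeatable
  wordcloud::wordcloud(
    words, freq,
    min.freq = min_freq,      # drop words with a count below 5
    max.words = 5,            # plot at most 5 words
    random.order = FALSE,     # most frequent words are placed first
    rot.per = 0.2,            # roughly 20% of words rotated 90 degrees
    scale = c(3, 0.5),        # largest and smallest relative font size
    colors = RColorBrewer::brewer.pal(8, "Dark2")
  )
}
```

Here "rare" has a count of 1, so it falls below `min.freq` and is left out of the plot.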
With the R Visualization feature, SAP Analytics Cloud can reveal in-depth insights that help you make end-to-end decisions with confidence.