SAP PA and Twitter – Building Wordcloud
While doing some research on sentiment and text analysis for one of my projects, I came across a really nice blog post.
Inspired by it, I decided to try some text mining and sentiment analysis on tweets in SAP PA, using the twitteR package for R.
I have created a multi-part blog series that looks at the different things we can do with SAP PA, R and Twitter.
This first post shows how to get Twitter data into SAP PA and build a word-cloud from a text corpus.
I downloaded some public opinion data on car manufacturers from the NCSI-UK website.
The data covers 2009-2013. My intention was to see what the public sentiment towards these manufacturers is on Twitter, and to build a probable score for 2014 based on a sample of tweets. I loaded the data into SAP PA. First I build a word-cloud for some of the car hashtags and plot a graph of the number of retweets. In the next posts I will do sentiment analysis and emotion classification on this data.
Before I start, let me make it clear that this is only sample data, analyzed purely for the purpose of learning. It is not meant to target or influence any brand. The outputs and analysis shown here are based on opinion only and should not be considered facts.
Step 1: Setting up the Twitter account and API for the handshake with R
Please refer to this step-by-step document to set up the Twitter API and the settings required to call the API and get tweet data into R.
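As a rough sketch of what the handshake can look like once the keys exist: newer releases of the twitteR package provide setup_twitter_oauth(), which takes the four values shown on the Twitter application page. This is an alternative to loading a saved credential object, not the procedure from the linked document. The environment variable names and the read_twitter_keys() helper are my own placeholders and are not part of twitteR.

```r
# Hypothetical helper: collect the four Twitter keys from environment variables.
# The variable names are placeholders; use whatever names you exported.
read_twitter_keys <- function() {
  vars <- c(consumer_key    = "TWITTER_CONSUMER_KEY",
            consumer_secret = "TWITTER_CONSUMER_SECRET",
            access_token    = "TWITTER_ACCESS_TOKEN",
            access_secret   = "TWITTER_ACCESS_SECRET")
  keys <- Sys.getenv(vars, unset = "")
  names(keys) <- names(vars)
  absent <- names(keys)[!nzchar(keys)]
  if (length(absent) > 0) {
    stop("Missing Twitter keys: ", paste(absent, collapse = ", "))
  }
  as.list(keys)
}

# Only attempt the handshake when twitteR is installed and all keys are set.
keys <- tryCatch(read_twitter_keys(), error = function(e) NULL)
if (!is.null(keys) && requireNamespace("twitteR", quietly = TRUE)) {
  twitteR::setup_twitter_oauth(keys$consumer_key, keys$consumer_secret,
                               keys$access_token, keys$access_secret)
}
```

Keeping the keys in environment variables means they never have to be typed into the R component itself.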
Step 2: Getting the tweet data into SAP PA and building a word-cloud
Now we need to create a custom R component that gets the data into SAP PA, creates a text corpus and displays it as a word-cloud. I have used the tm_map function that comes with the tm package to prepare the text corpus for the word-cloud. The individual commands are explained in the comments. I have used the wordcloud package to generate the word-cloud.
The code below lists the steps needed to get the desired output. The configuration settings are shown in the screenshots below.
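mymain <- function(mydata, mytweet, mytweetnum)
{
##Load the necessary packages
library(twitteR)
library(tm)
library(wordcloud)
## Enable Internet access (on older Windows builds of R this was done with setInternet2(TRUE))
##Load the environment containing twitter credential data (saved in Step 1)
load("twitter_credentials.RData") ## placeholder file name, use the file you saved in Step 1
##Establish the handshake with R
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
registerTwitterOAuth(cred) ## assumes the saved credential object is named cred and a twitteR release that still provides registerTwitterOAuth
##Get the tweet list from twitter site (based on parameters entered by user)
tweetList <- searchTwitter(mytweet, n=mytweetnum)
##Convert the tweet list to a data frame for the output table
Audi.df <- twListToDF(tweetList)
##create text corpus
r_stats_text <- sapply(tweetList, function(x) x$getText())
r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))
##clean up of twitter Text data by lower-casing, removing punctuation and English stop words like "the", "an"
##(recent tm releases need content_transformer() around base functions such as tolower)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation)
r_stats_text_corpus <- tm_map(r_stats_text_corpus, removeWords, stopwords("english"))
##stemDocument needs the SnowballC package to be installed
r_stats_text_corpus <- tm_map(r_stats_text_corpus, stemDocument)
##Build and print wordcloud
wordcloud(r_stats_text_corpus, scale=c(10,1), random.order=FALSE, rot.per=0.35, use.r.layout=FALSE, colors="blue")
## Return the twitter data in a table (cbind turns every column into character)
result <- as.data.frame(cbind(Audi.df$text, Audi.df$created, Audi.df$statusSource, Audi.df$retweetCount))
##The output name "out" is an assumption, match it to the output name configured for the component
return(list(out = result))
}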
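To see what the cleaning steps do without calling Twitter at all, here is a minimal sketch that runs the same tm transformations on three made-up tweets and counts the remaining terms. The sample tweets are invented for illustration, stemming is left out so the words stay readable, and content_transformer() assumes a reasonably recent tm release.

```r
library(tm)

# Made-up sample tweets standing in for the searchTwitter() result
sample_tweets <- c("Love the new Audi A4, great drive!",
                   "The Audi service was great",
                   "Audi recall again... not happy")

corpus <- Corpus(VectorSource(sample_tweets))
# Same cleaning steps as in the component (stemming left out for readability)
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# Count how often each remaining term occurs across all tweets
tdm  <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
print(head(freq))

# Draw the cloud from explicit words and counts when wordcloud is installed
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), freq, min.freq = 1,
                       random.order = FALSE, colors = "blue")
}
```

Passing words and frequencies explicitly, instead of the corpus, makes it easy to inspect or filter the counts before the cloud is drawn.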
Running the Algorithm and getting the output:
The output table (the created column comes back as char):
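The created column is char because cbind() builds a single character matrix out of all the columns. The sketch below reproduces this on made-up values and then restores the types so that the retweet counts can be plotted. The sample data and the variable names are assumptions for illustration, not the actual tweets.

```r
# Made-up values standing in for the columns returned by twListToDF()
text    <- c("Great drive today", "Service was slow", "New model looks good")
created <- as.POSIXct(c("2014-05-01 10:15:00", "2014-05-01 11:30:00",
                        "2014-05-02 09:00:00"), tz = "UTC")
retweetCount <- c(12, 3, 40)

# cbind() builds a character matrix, so dates and counts become text
result <- as.data.frame(cbind(text, created, retweetCount),
                        stringsAsFactors = FALSE)
str(result)

# Restore the types: epoch seconds back to date-times, counts back to numbers
result$created <- as.POSIXct(as.numeric(result$created),
                             origin = "1970-01-01", tz = "UTC")
result$retweetCount <- as.numeric(result$retweetCount)

# Bar chart of retweets per tweet, most retweeted first
by_retweets <- result[order(result$retweetCount, decreasing = TRUE), ]
barplot(by_retweets$retweetCount, names.arg = substr(by_retweets$text, 1, 12),
        ylab = "Retweets", las = 2)
```

Building the table with data.frame() instead of cbind() would keep the original column types and avoid the conversion altogether.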
Judging from the word-cloud, the general opinion of the public seems positive. However, in my next blog we will do a detailed sentiment analysis of the various brands in our source file and plot a heat map based on the 2013 survey findings. This will tell us whether current public sentiment is in line with the survey findings.
To be continued in Sentiment Analysis.