SAP PA and Twitter – Building Wordcloud
While doing some research on sentiment and text analysis for one of my projects, I came across a really nice presentation on SlideShare:
http://www.slideshare.net/jeffreybreen/r-by-example-mining-twitter-for
Inspired by the above, I decided to try some text mining and sentiment analysis on tweets in SAP PA, using the twitteR package for R.
I have created a multi-part blog series in which we look at the different things we can do with SAP PA, R and Twitter.
This first blog shows how to get Twitter data into SAP PA and build a word cloud from a text corpus.
Scenario:
I downloaded some public opinion data on car manufacturers from the NCSI-UK website.
http://ncsiuk.com/index.php?option=com_content&task=view&id=18&Itemid=33
The data covers 2009-2013. My intention was simply to see what the public sentiment towards these manufacturers is on the social networking site Twitter, and to build a probable score for 2014 based on a sample of tweets. I loaded the data into SAP PA. First I build a word cloud for some of the car hashtags and plot a graph of the number of retweets. In the next blog posts I will do sentiment analysis and emotion classification on this data.
Before I start, let me make it clear that this is only sample data, analyzed purely for the purpose of learning. It is not meant to target or influence any brand. The outputs and analysis shown here are based on opinion only and should not be considered facts.
Step1: Setting up the Twitter account and API for handshake with R
Please refer to this step-by-step document to set up the Twitter API and the settings required to call the API and get tweet data into R.
Setting up Twitter API to work with R
Step2: Getting the tweet data in SAP PA and building a word-cloud.
Now we need to create a custom R component that gets the data into SAP PA, creates a text corpus and displays it as a word cloud. I have used the tm_map function that comes with the tm package to prepare the text corpus for the word cloud. The individual commands are explained in the comments. I have used the wordcloud package to generate the word cloud.
The code below lists the steps you need to get the desired output. The configuration settings are shown in the screenshots below.
mymain <- function(mydata, mytweet, mytweetnum)
{
  ## Load the necessary packages
  library(twitteR)
  library(RJSONIO)
  library(bitops)
  library(RCurl)
  library(wordcloud)
  library(tm)
  library(SnowballC)
  ## Enable Internet access (Windows only)
  setInternet2(TRUE)
  ## Load the environment containing the Twitter credential data (saved in Step 1)
  load('C:/Users/bimehta/Documents/twitter authentication.Rdata')
  ## Register the credentials (handshake between R and Twitter)
  registerTwitterOAuth(credential)
  options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
  ## Get the tweet list from Twitter (based on the parameters entered by the user)
  tweetList <- searchTwitter(mytweet, n = mytweetnum)
  ## Convert the tweet list to a data frame for the output table
  Audi.df <- twListToDF(tweetList)
  ## Create the text corpus
  r_stats_text <- sapply(tweetList, function(x) x$getText())
  r_stats_text_corpus <- Corpus(VectorSource(r_stats_text))
  ## Clean up the tweet text: lower-case it, remove punctuation and English stop words like "the", "an", then stem
  ## (tm 0.6 and later need content_transformer() around base functions such as tolower)
  r_stats_text_corpus <- tm_map(r_stats_text_corpus, content_transformer(tolower))
  r_stats_text_corpus <- tm_map(r_stats_text_corpus, removePunctuation)
  r_stats_text_corpus <- tm_map(r_stats_text_corpus, removeWords, stopwords("english"))
  r_stats_text_corpus <- tm_map(r_stats_text_corpus, stemDocument)
  ## Build and print the word cloud
  out2 <- wordcloud(r_stats_text_corpus, scale = c(10, 1), random.order = FALSE, rot.per = 0.35, use.r.layout = FALSE, colors = "blue")
  print(out2)
  ## Return the Twitter data in a table
  result <- as.data.frame(cbind(Audi.df$text, Audi.df$created, Audi.df$statusSource, Audi.df$retweetCount))
  return(list(out = result))
}
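To make the effect of the cleaning steps more concrete, here is a minimal base R sketch that applies similar steps (lower-casing, punctuation removal, stop-word removal) to a few made-up sentences and counts the remaining words. The sample texts, the short stop-word list and the helper name clean_words are assumptions for illustration only; the component itself relies on tm's tm_map and the full stopwords("english") list, and stemming is not reproduced here.

```r
# Hypothetical sample texts standing in for tweets (not real Twitter data)
sample_texts <- c(
  "Loving the new car, the drive is smooth!",
  "The car service was slow and the staff unhelpful.",
  "Smooth drive, great car."
)

# A tiny illustrative stop-word list; tm's stopwords("english") is much longer
stop_words <- c("the", "an", "a", "is", "was", "and")

# Hypothetical helper: roughly mirrors tolower, removePunctuation and removeWords
clean_words <- function(texts, stop_words) {
  texts <- tolower(texts)                            # lower-case everything
  texts <- gsub("[[:punct:]]", "", texts)            # strip punctuation
  words <- unlist(strsplit(texts, "[[:space:]]+"))   # split on whitespace
  words <- words[nzchar(words)]                      # drop empty tokens
  words[!words %in% stop_words]                      # drop stop words
}

# Word frequencies, most frequent first; a word cloud sizes words by these counts
word_freq <- sort(table(clean_words(sample_texts, stop_words)), decreasing = TRUE)
print(word_freq)
```

In this sample, "car" is the most frequent word, so it would be drawn largest in a word cloud.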
Configuration Setting:
Running the Algorithm and getting the output:
The output table (the created-on column is returned as character):
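The created-on column comes back as character because of how the result table is built: cbind() on plain vectors produces a single character matrix, so dates and counts lose their types. The sketch below shows this with a small made-up data frame (tweets_df and its values are hypothetical) and contrasts it with data.frame(), which keeps each column's type. Whether SAP PA handles a date-time column directly in the output is not something the post confirms, so treat the second variant as an option to try rather than a recommendation.

```r
# Hypothetical stand-in for the tweet data frame (values are made up)
tweets_df <- data.frame(
  text = c("first sample tweet", "second sample tweet"),
  created = as.POSIXct(c("2014-03-01 10:00:00", "2014-03-01 11:30:00"), tz = "UTC"),
  retweetCount = c(3, 0),
  stringsAsFactors = FALSE
)

# cbind() builds one character matrix, so every column loses its original type
via_cbind <- as.data.frame(
  cbind(tweets_df$text, tweets_df$created, tweets_df$retweetCount),
  stringsAsFactors = FALSE
)

# data.frame() keeps each column's own type
via_data_frame <- data.frame(
  text = tweets_df$text,
  created = tweets_df$created,
  retweetCount = tweets_df$retweetCount,
  stringsAsFactors = FALSE
)

# Compare the column classes of the two results
print(sapply(via_cbind, class))
print(lapply(via_data_frame, class))
```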
Visualizations:
The general opinion of the public in the word cloud seems positive. However, in my next blog we will do a detailed sentiment analysis of the various brands in our source file and plot a heat map based on the 2013 survey findings. This will help us see whether current public sentiment is in line with the survey findings.
To be continued in Sentiment Analysis.
Hello,
I am trying to implement a similar model.
Since OAuthFactory is no longer supported, I tried using
setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret)
When I run the model, this particular code crashes the "Expert Analytics" application.
Do you have any idea how to handle this?
Thank you.
Tamilnesan
Hi,
What is the error you get in the PA logs?
Logs are here: C:\Users\<Your Windows User>\AppData\Local\Temp\sappa\logs
Thanks & regards
Antoine
Hi Antoine,
Attaching the log snippet.
This happens only with SAP PA, the same script runs properly in R Studio.
Looking forward to your suggestions.
Thank you.
Thanks.
Can you also provide us with the R script and the error message you see? You can share them via this post or send them to my email address at sap.com.
Cheers,
Antoine
Hi,
Sorry for the delay. Would it be possible for you to share the LUMS file as well? I tried to reproduce this morning with no luck, I guess because I am missing the original data.
Thanks & regards
Antoine
Any luck yet? How did you resolve this issue? I'm bumping into the same...
With kind regards,
Martijn
For me the code runs perfectly in R Studio, but when I try it in SAP PA expert mode version 2.3, the tool closes automatically. I tried several times, but the tool exits whenever I run the custom component.
Ranajay Sit, would it be possible for you to share the LUMS file with me? I tried to reproduce this morning with no luck, I guess because I am missing the original data.
Hi Antoine,
I am using the below dataset.
Regards
Ranajay
Bingo - I reproduced it. It took me a while to configure the missing R packages (twitteR and the likes) but I am now done. Thanks for the hint!
Antoine
So did you figure out the problem?
Not yet, I passed this to our engineering team for further investigation.
Thanks Appreciate it 🙂
Hi,
I tried to run this in PA, but it seems Twitter has stopped registering new apps. I could not generate the twitter_authentication.rdata file that the component in the main post requires (because I lack a Consumer Key and Consumer Secret). Does anyone know how to deal with this?
Cheers
Wei
Hi Wei,
You can still get the consumer key and consumer secret from Twitter. I have recently done this for the HANA Data Provisioning agent which also requires the same parameters. To find these you can look at the HANA Academy video, SAP HANA Academy - Smart Data Integration/Quality : Twitter Replication Pt 1 of 3 [SPS09] - YouTube at 2:40.
Hi Ian,
Twitter requires a mobile phone to be registered to the account before a Twitter app can be set up, but it fails to register my mobile phone. It seems to have been like this for a while. Anyway, the problem is solved: I got a script that has the Twitter auth info. Thank you for the information.
Thanks,
Wei
Hi, I have a question here.
You did the text mining in R with the #WT20 data. But when it comes to SAP EA, why are the results "Car Satisfaction reviews"?