I worked on a project recently where one of the requirements was to perform sentiment analysis on Twitter data and to surface the results in a meaningful fashion in a dashboard. We wanted to isolate the text tokens that were identified and assign measure values to them: the number of times each token occurred in the data (i.e., a count on each ID) and the overall sentiment score that we calculated inside of HANA. We also wanted it to be possible to link back to the source tweets based on a selection (i.e., summary-to-details navigation).
Once the idea was baked, I needed a visualization to support the model we had created. I settled on a word cloud (or tag cloud if you prefer) using D3 and the D3 word cloud layout by Jason Davies. Although there’s some argument about the value of word clouds, it seemed to be a good fit for text analysis and for isolating Twitter tokens. I also noticed that a word cloud was available in Lumira, but I couldn’t find an acceptable equivalent for Design Studio, primarily because I needed to handle a click event to retrieve a list of tweets based on the originating token.
Here is a picture of the Word cloud layout from the site above in case you aren’t familiar:
It took me a little while to create the layout properly, as some of the examples were incomplete, but eventually I came out with a word cloud that avoided collisions pretty well and that I could attach up to 2 measures to. The size of the tags is derived from a measure of your choice; I used the number of occurrences of a given text token. If you don’t attach a 2nd measure, the word cloud renders the words in dark gray:
I pulled the Twitter data from the feed of the JW Marriott Marquis in Dubai, which is not the customer we were working with, just a really cool hotel that generates some positive buzz on Twitter. In my case, I wanted to use HANA text analysis (I couldn’t find a version of this guide for SPS09) to also derive sentiment from the tags, and color them green or red depending on whether the sentiment is positive or negative (neutral sentiments remain gray). The text analysis tables in HANA contain information about whether a given token is associated with strong positive, weak positive, neutral, weak negative, or strong negative sentiment. The contents look like this:
I built a calculation view that converted TA_TYPE into a numeric column ranging from -2 (strong negative) to 2 (strong positive). This gave me a chance to use a polylinear color scale in D3:
var colorScale = d3.scale.linear()
    .domain([-2, 0, 2])                    // strong negative, neutral, strong positive
    .range(["red", "#5D6770", "green"]);
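For anyone who wants to see the idea spelled out, here is a minimal sketch in plain JavaScript of the two steps: turning a sentiment label into a score, and turning that score into a color. It is only an illustration. The conversion described above was done in a calculation view, not in JavaScript, and the TA_TYPE labels, the function names, and the clamping are assumptions made for this example. The color function is a hand-rolled stand-in that mimics a three-stop linear scale so the snippet runs without D3.

```javascript
// Assumed TA_TYPE labels; check them against your own text analysis table.
var SENTIMENT_SCORES = {
  StrongNegativeSentiment: -2,
  WeakNegativeSentiment: -1,
  NeutralSentiment: 0,
  WeakPositiveSentiment: 1,
  StrongPositiveSentiment: 2
};

// Returns the numeric score, or null for token types that carry no sentiment.
function sentimentScore(taType) {
  return Object.prototype.hasOwnProperty.call(SENTIMENT_SCORES, taType)
    ? SENTIMENT_SCORES[taType]
    : null;
}

// The three color stops of the scale: two linear segments that meet at 0.
var STOPS = [
  { at: -2, rgb: [255, 0, 0] },    // "red"
  { at: 0, rgb: [93, 103, 112] },  // "#5D6770"
  { at: 2, rgb: [0, 128, 0] }      // "green"
];

// Plain-JS stand-in for the polylinear scale (hypothetical helper, not D3).
function tagColor(score) {
  if (score === null || score === undefined) {
    return "#5d6770"; // no color measure: assumed to fall back to the neutral gray
  }
  var s = Math.max(-2, Math.min(2, score)); // clamping is a choice made for this sketch
  var lo = s < 0 ? STOPS[0] : STOPS[1];
  var hi = s < 0 ? STOPS[1] : STOPS[2];
  var t = (s - lo.at) / (hi.at - lo.at);    // position within the segment, 0..1
  var hex = lo.rgb.map(function (c, i) {
    var v = Math.round(c + (hi.rgb[i] - c) * t);
    return (v < 16 ? "0" : "") + v.toString(16);
  });
  return "#" + hex.join("");
}
```

With these stops, a weak sentiment lands halfway between gray and the full color, which is why mildly positive tags look muted rather than bright green.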
Once a measure is attached for color, the word cloud looks like this:
As I said before, the buzz is pretty positive about this particular hotel (looking at the pictures I can imagine why!). I borrowed Manfred Schwarz’s D3 on-click event to link back to the details behind a token (i.e., specific tweets) and made it available in a BIAL on-click event:
Finally, I added some Display properties you can set to rotate text tags < 5 characters 90 degrees (as above), randomly, or not at all, as well as a few fonts for the text itself.
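The rotation rule can be sketched as a small function. This is an illustration rather than the extension's actual code: the mode names ("short", "random", "none") are made up for the example, and "randomly" is assumed here to mean a coin flip between 0 and 90 degrees.

```javascript
// mode "short":  tags with fewer than 5 characters are rotated 90 degrees
// mode "random": each tag is rotated 0 or 90 degrees at random (assumed meaning)
// mode "none":   every tag stays horizontal
// `random` is an optional stand-in for Math.random, handy for testing.
function tagRotation(text, mode, random) {
  if (mode === "short") {
    return text.length < 5 ? 90 : 0;
  }
  if (mode === "random") {
    return (random || Math.random)() < 0.5 ? 0 : 90;
  }
  return 0;
}
```

The D3 word cloud layout accepts an accessor through its rotate() method, so a function like this could be passed along as something like `.rotate(function (d) { return tagRotation(d.text, mode); })`.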
I tried to make the rotation and font settings available through BIAL methods, but I could only manage it if I deleted all the text elements first. Since the keys weren’t really changing, just the property, the enter() method wasn’t picking it up any other way. This worked, but it had the additional (undesired) effect that the cloud would re-compute with each click. It looked cool but was too distracting for the final product. I removed the BIAL methods from the final commit, but you can always use the .wordcloud selector in CSS to change font-family beyond the 3 I provided as predefined options. Check the animated GIF below (click if it doesn’t start) for what it did look like. If anyone has a better idea for the rotation, I would take it!
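One idea that might help, offered as an untested sketch rather than a fix: in a D3 data join, enter() only contains nodes for keys that are new, so property changes on existing tokens have to be applied to the update selection returned by data(). The function below assumes D3 v3 (where appending to enter() merges the new nodes into the update selection), a hypothetical `cloud` selection holding the text nodes, and a `words` array already positioned by the layout with x, y, rotate, and size set on each word.

```javascript
// Hypothetical redraw helper; `cloud` and `words` are assumptions for this sketch.
function redrawWords(cloud, words, fontFamily) {
  // Join on the token text so existing nodes are matched, not recreated.
  var tags = cloud.selectAll("text")
    .data(words, function (d) { return d.text; });

  // Tokens that were not on screen yet.
  tags.enter().append("text")
    .attr("text-anchor", "middle")
    .text(function (d) { return d.text; });

  // Update selection: in D3 v3 this now covers new and existing nodes,
  // so changed properties reach tags whose keys did not change.
  tags
    .style("font-family", fontFamily)
    .style("font-size", function (d) { return d.size + "px"; })
    .attr("transform", function (d) {
      return "translate(" + d.x + "," + d.y + ")rotate(" + d.rotate + ")";
    });

  // Tokens that are no longer in the data.
  tags.exit().remove();
  return tags;
}
```

This avoids deleting the text elements, but it probably does not remove the movement entirely: the layout places words with their rotation and font in mind, so a rotation change still means re-running the layout and the tags will shift to their new positions. In newer D3 versions the enter and update selections have to be merged explicitly.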
So there you have it, my first extension contribution! While I work out how to contribute to the development community (needing to read this post: SDK Development Community Git Repository (sdkpackage)) I put the source here:
I’ll update the post once it’s there.