I am not as avid an online shopper as some of my friends, but I find it useful and interesting to read the customer product reviews posted on online retailers such as Amazon.com. One day, on the bus home after work, the lady sitting next to me was reading “To the Lighthouse” by Virginia Woolf, which immediately aroused my curiosity. To see whether it could be my next vacation book, I went to Amazon.com to find out what other people said about it. Amazon.com does a good job of collecting customer reviews on products. However, when it comes to analyzing and presenting those reviews, it didn't give me much beyond a simple “star” column chart and a couple of the most helpful favorable and critical reviews, followed by a list of all the reviews (see figure below).
For products with hundreds or thousands of reviews, it is impractical for a person to read through and analyze all of them. Data discovery tools like SAP Lumira can be very helpful for this kind of insight hunting.
Deeper insights from customer product reviews are not only valuable to end consumers. From a business point of view, product designers and sellers are just as interested in them.
Two data sources are used in this blog post (although I collected many more customer product reviews from Amazon.com):
The figure below shows their data volumes.
I ran into three challenges while using SAP Lumira to analyze Amazon customer product reviews.
My first challenge was to extract the data from the Amazon.com web site into a format that Lumira could consume. After some online research, I could not find a straightforward public API for retrieving the review data. To avoid spending too much time on this, I ended up writing a small VBA (Visual Basic for Applications) script inside MS Excel. The script automates Internet Explorer to fetch the web pages and parses the review data directly into Excel sheets, which can easily be fed into Lumira. The figure below shows the simple front-end GUI of the script.
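My script was written in VBA and drove Internet Explorer. Purely to illustrate the "parse the page into a table" step, here is a rough Python sketch using the standard-library `html.parser`. Fetching is left out. The markup it expects is an assumption made up for this example: a `div` with class `review` containing a `span` with class `rating` and a `p` with class `review-text`. Real Amazon pages use different markup that changes over time. `ReviewParser` and `reviews_to_csv` are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser

# ASSUMED markup (hypothetical, NOT Amazon's real page structure):
#   <div class="review">
#     <span class="rating">4</span>
#     <p class="review-text">...</p>
#   </div>


class ReviewParser(HTMLParser):
    """Collects one {"rating", "text"} dict per review block."""

    def __init__(self):
        super().__init__()
        self.reviews = []   # parsed reviews, in page order
        self._field = None  # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "review" in classes:
            self.reviews.append({"rating": "", "text": ""})
        elif self.reviews and tag == "span" and "rating" in classes:
            self._field = "rating"
        elif self.reviews and tag == "p" and "review-text" in classes:
            self._field = "text"

    def handle_endtag(self, tag):
        # Leaving the rating span or the review paragraph ends the current field.
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.reviews[-1][self._field] += data


def reviews_to_csv(html):
    """Return CSV text with one row per review, ready to import into a tool like Lumira."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["review_id", "rating", "text"])
    for i, r in enumerate(parser.reviews, start=1):
        # Collapse runs of whitespace/newlines so each review fits on one row.
        writer.writerow([i, r["rating"].strip(), " ".join(r["text"].split())])
    return out.getvalue()
```

Writing plain CSV rather than Excel keeps the sketch dependency-free. The resulting file has the same one-row-per-review shape as the Excel sheets the VBA script produced.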
To my knowledge, at the time this blog post was written, Lumira offered few sentiment analysis capabilities. Again, to keep things simple, I ended up writing another small VBA script that adds a lexical-level sentiment analysis algorithm based on the research paper “Language-independent Bayesian sentiment mining of Twitter” by Alex and Zoubin. To let Lumira analyze the lexical sentiment, I created a second dataset by breaking the review dataset down into rows of words. I then merged (joined) it back with the review dataset. This operation can produce a large dataset, which led to my next challenge.
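The following small Python sketch makes the word-level dataset and the join back concrete. It is not the Bayesian method from the cited paper. The polarity lexicon `LEXICON` and its scores are made-up placeholders, and `explode_words`, `join_back` and `review_scores` are hypothetical names. The sketch only shows the data shape. There is one row per word occurrence, keyed by `review_id`. That table is then joined back so every word row also carries its review's columns, such as the star rating.

```python
import re

WORD_RE = re.compile(r"[a-z']+")

# Toy polarity lexicon: HYPOTHETICAL words and scores, for illustration only.
LEXICON = {"great": 1, "love": 1, "easy": 1, "leaks": -1, "broke": -1, "poor": -1}


def explode_words(reviews):
    """Second dataset: one row per word occurrence, keyed by review_id."""
    return [
        {"review_id": r["review_id"], "position": pos, "word": w,
         "sentiment": LEXICON.get(w, 0)}  # words outside the lexicon count as neutral
        for r in reviews
        for pos, w in enumerate(WORD_RE.findall(r["text"].lower()))
    ]


def join_back(word_rows, reviews):
    """Inner join on review_id: each word row also gets its review's columns."""
    by_id = {r["review_id"]: r for r in reviews}
    return [{**by_id[w["review_id"]], **w} for w in word_rows if w["review_id"] in by_id]


def review_scores(word_rows):
    """Aggregate word sentiments into one (naive) score per review."""
    scores = {}
    for w in word_rows:
        scores[w["review_id"]] = scores.get(w["review_id"], 0) + w["sentiment"]
    return scores
```

The joined table has one row per word, and every review column is repeated on each of those rows. Its row count therefore equals the total number of words across all reviews, which is why this step can produce a far larger dataset than the original review table.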
I used the free downloadable version of Lumira Desktop, 1.13.0, which has limitations when working with large datasets. When performing sentiment analysis on “To the Lighthouse”, I ran into a performance bottleneck that I could not easily overcome. To continue my insight-hunting journey, I had to pick another data source: the customer reviews of “Happy Camper Two Person Tent With Carry Bag”. This dataset is smaller, but it is good enough to illustrate the kinds of insights that could be found in the reviews of “To the Lighthouse”, in any other reviews, or in other text content.
SAP Lumira did a good job of helping me see, imagine, and show the review data I got from Amazon.com, although in some areas it could do more to reduce the challenges I faced. It is a good sign that extensibility and big-dataset support features appear on the Lumira roadmap. So this seems a good time to write a blog post in response to the data geek challenge, as a stop on the trip before the busy pre-Christmas season. However, my journey to seek insights with the help of SAP Lumira is far from its destination, especially given the potential in areas such as higher-level sentiment analysis.
I have also attached the datasets used in this blog post (the review data I collected from Amazon.com plus the calculated sentiments) for anyone interested in following along. To use them, remove the .txt file extension after unzipping.