Sentiment Analysis: Analyzing Big Data Is a Big Deal in the Middle Kingdom
At the start of July, NetBase announced that its customers now have access to Weibo content through Socialgist’s Certified Weibo Data Program. When I saw the news, my immediate reaction was “excellent news for SAP in China!” So I thought I’d share some of my observations, and why I think there may be big opportunities for SAP China to pursue.
Sentiment Analysis in China
Before we talk about NetBase, I should give you a little background. Sentiment Analysis (which includes Social Media Analytics) is a fairly significant niche market in our space within China. One estimate I read put China’s sentiment analysis software sales at roughly $150 million in 2013, and $1.5 billion for all related services and infrastructure. These estimates seem high, but at least they give you a rough idea of the market. The market is still quite immature and fragmented at the moment, consisting of over 1,000 known vendors in China alone, with the top vendor owning roughly a 15% share of the market.
When we talk about “sentiment” analysis, it means gathering sentiments of consumers and those of the general public from all online (and/or internal) sources, including news sites, forums, blogs, micro-blogs, social sharing sites, academic publications, etc. In a nutshell, a massive amount of data is collected from these sources using search technologies, automatically cleansed, categorized, and analyzed, and finally, visualized in an intuitive manner so that insights can be extracted from the data.
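To make the collect, cleanse, categorize, and analyze steps a little more concrete, here is a minimal Python sketch. It is only an illustration: the word lists and the function names (`cleanse`, `categorize`, `summarize`) are my own assumptions, not how NetBase or any other vendor does it, and a real system would use far richer language models than a word count.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (assumption for illustration only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def cleanse(text):
    """Lowercase the text, then strip URLs and punctuation."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.sub(r"[^a-z0-9\s]", " ", text)


def categorize(text):
    """Label a post by counting positive and negative lexicon hits."""
    tokens = cleanse(text).split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(posts):
    """Aggregate labels across posts; this is what a dashboard would chart."""
    return Counter(categorize(p) for p in posts)


posts = [
    "I love this phone, great screen!",
    "Terrible battery. http://example.com",
    "It arrived on Tuesday.",
]
print(summarize(posts))
```

A word-counting approach like this is easy to build and easy to demo, but it has no notion of context, which is why sentiment data is so much harder to handle in practice than in theory.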
The Complexities of Sentiment Data
In theory, it sounds quite straightforward, but in practice it’s extremely difficult to obtain accurate (or even correct) results. This is mainly due to 1.) the complexity and variety of the languages we use today, and 2.) the difficulty of interpreting emotions through the language we use.
A few common problems are that the opinions expressed by users can be:
- Incorrect or misleading information/mistakes (e.g. “nice sunny day” when it’s actually raining outside)
- Sarcastic comments (e.g. “can’t wait to get this new phone with industry-leading 3.5-inch screen”)
- Heavily skewed by influencers/celebrities (e.g. “Ms. X said this is a good product, so it must be”). I find this to be a more pronounced problem in China than the rest of the world.
- Slang, dialects, informal or internet language, translated terms, acronyms, and ancient expressions (e.g. “LOL”, “selfie”, “dim-sum”, “#FOSS”, “whassup”)
- Difficulty in gauging emotions (e.g. “Do I like the re-design? Yes and no.”)
- Non-structured data is difficult to analyze (e.g. how do you detect someone giving a thumbs-down to a product in a video?)
- Reluctance to express true emotions due to public nature (e.g. Saying “I love my job” online when I really hate it, but my boss will see it so I better say what he wants to hear)
- Social media data, especially micro-blogs, tend to contain grammatically incorrect sentences and shortened or non-existent words due to character limits (e.g. “#TGIF its tme 4 bzzr”)
Adding to all the complexity is the Chinese language itself, which is known to be one of the most difficult languages in the world to use and to interpret: there are 50,000+ known, distinct Chinese characters, 3,000+ of which are commonly used. Also, more than 2,000 characters have been simplified, yet many regions (Taiwan, Hong Kong, etc.) still use the traditional forms. And each character can be a word by itself or combined with others to form words and phrases.
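Because Chinese is written without spaces, software has to decide where one word ends and the next begins before it can analyze anything. The sketch below shows one classic, simple approach, forward maximum matching, using a tiny dictionary that I made up for illustration. The `VOCAB` set and the `segment` function are assumptions for this example only; production tools use much larger dictionaries and statistical models.

```python
# Hypothetical toy dictionary (assumption for illustration only).
VOCAB = {"我", "喜欢", "这", "个", "这个", "手机", "新", "新手", "机"}


def segment(text, vocab=VOCAB, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if nothing matches."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


print(segment("我喜欢这个手机"))  # "I like this phone"
print(segment("新手机"))          # "new phone"
```

With this dictionary, the first sentence splits cleanly into 我 / 喜欢 / 这个 / 手机. The second shows the trap: the greedy match grabs 新手 ("novice") first and leaves 机 on its own, instead of the intended 新 ("new") plus 手机 ("phone"). Get the word boundaries wrong and every later step of the analysis is working with the wrong words.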
New Opportunities for SAP China
Needless to say, there’s a great need to analyze public and consumer sentiments in China. The government wants to know that people are content (and that no one is causing trouble); companies and agencies want to know about consumer demands and feedback on new products; academia and investors want to gather data for their own research. The demand is certainly there, but the market has not grown as fast as it should have over the past few years.
My previous experience in this space suggests that this is mainly due to the quality of the tools available in China today. Vendors are able to make a sale because the visualizations that are presented in demos are enticing. Yet the underlying search technology and analytics are too weak to reveal any meaningful insights after the system is actually in production.
This is an opportunity that I see for SAP China. I am a fan of the partnership with NetBase. NetBase has gained incredible traction in recent years, and it fully supports analysis of data in the Chinese language. Dr. Wei Li, Chief Scientist of NetBase, is an internationally well-respected expert on Natural Language Processing (NLP) and a native of China. So I know that NetBase appreciates the intricacies of the Chinese language, and that should give it an edge over its bigger competitors.
Also, I took time to read up on NetBase’s approach to collecting and analyzing data. From what I can see, they appear to be doing all the right things. So it will be exciting to see NetBase really prove its ability to process and interpret sources of Chinese data (Dr. Li’s personal blog gives a glimpse into NetBase’s capabilities for those interested in reading more). And with the recent addition of Weibo Data to its list of data sources, SAP China and NetBase are well positioned to capitalize on this niche market.
Of course, SAP China may already be working on growing this business. But from what I can see (e.g. through a Baidu search), the market is still saturated with Chinese vendors that are offering inferior solutions, on both the front end and the back end. My search returned very little mention, if any, of the sentiment analysis solutions that are offered by well-established global software vendors. It would be great to see a few success stories on SAP Lumira and NetBase in China.
The Need for End-to-End Consulting Services
When I was working in this space back in 2012, we talked to quite a few prospects and customers. Customers often came to us with the same request: they needed our help to extract insights from their data. We were asked to help 1.) tweak the automated analyses so they could slice and dice the information in different ways, and 2.) interpret the analytics for them, as most customers didn’t have sentiment analysis competencies on their team. Another common requirement was to have well-defined, canned reports created and delivered to them periodically, as the consumers of analytics are typically executives. They require executive summaries of the analytics, again due to the lack of sentiment analysis capabilities on their team.
Vendors such as Converseon, which offers consulting services in addition to their technology, address this market need quite well outside of China. No matter how sophisticated the software becomes, there’s always going to be a need for human intervention and interpretation. One day, our software may get to that level of maturity, where everything is fully automated. Until then, this is another area that is under-served within the sentiment analysis space in China today.
Untapped Markets and New Directions
Finally, I want to discuss one niche space that appears completely untapped in China at the moment: a sentiment analysis solution for conversations within instant messaging platforms on mobile devices. Weibo, the Chinese Twitter, was enjoying healthy growth until WeChat gained traction in early 2012, reaching its first 100 million registered users. Today, WeChat has over 600 million registered users, with monthly active users (MAU) exceeding 355 million, which is about three times that of Weibo.
It’s not hard to see why WeChat is taking traffic away from Weibo at an alarming pace. WeChat is far more private, so users are more willing to reveal their identities and express their true opinions. This is where sentiment analysis gets really interesting, because it will be far more accurate (or should I say, far less inaccurate? 🙂 ). But this is unfamiliar territory for me, so I should pause here and see if other experts in the space want to share their knowledge.
Tell me: Are other SAP or NetBase experts already working on a solution here? Please comment and add your thoughts. I look forward to hearing from you!