SAP HANA Text Analysis: Extracting insights from the written word
In the SAP Startup Focus program, part of my work is educating startups on the platform capabilities of SAP HANA. Recently, there has been a lot of interest in the Text Analysis features in SAP HANA. So, as a short introduction to the topic, here is a recap of my conversation with Anthony Waite, Text Analysis Product Manager at SAP.
Anthony Waite is the SAP HANA product manager for Text Analysis. He has been involved in Enterprise Information Management for over 5 years, evangelizing data integration and data quality on structured or unstructured data with customers. Before joining SAP, he worked at Oracle as a Data Warehousing product manager for over 7 years.
For someone just starting in the field, would you explain what Text Analysis is? And how is it different from Text Mining?
The two terms are used interchangeably by a lot of people. There is a lot of gray area in defining ‘Text Analysis’ and differentiating it from ‘Text Mining’.
But from the SAP perspective, ‘Text Analysis’ refers to the ability to do Natural Language Processing: to linguistically understand the text and apply statistical techniques to refine the results. ‘Text Mining’ is the application of algorithms, such as predictive analytics, to post-process the data (akin to data mining).
When will Text Analysis capabilities be available to developers, specifically startups?
In SP05, SAP HANA exposes Text Analysis capabilities out of the box. Text Mining will be supported in a future release.
What is Sentiment Analysis?
Sentiment Analysis helps determine the attitude of the author of the text. For example, one output of sentiment analysis is whether the person is being negative or positive about the topic at hand.
Does SAP HANA Text Analysis account for sentiment and context in Text Analysis?
Context is critical to Text Analysis since it involves Natural Language Processing. Text Analysis in SAP HANA can identify the language, apply appropriate linguistic rules for the particular language and then semantically interpret the data.
For example, a company would like to analyze tweets in different languages that reference a particular product. HANA can classify the tweet according to the appropriate language rules and provide sentiment analysis like ‘strong positive’, ‘weak positive’ and so on with the associated topic.
Text Analysis is a new feature in SAP HANA SP05, released in Q1 2013. Can you highlight some of the key capabilities?
SAP HANA Text Analysis has market-leading, out-of-the-box predefined entity types that are packaged as part of the platform. Looking at a clause, sentence, paragraph, or document, the technology can identify the “who”, “what”, “where”, “when” and “how much” and classify it accordingly. For example, in the following sentence “Mexico celebrates Cinco De Mayo in May”, the analysis can identify the country, holiday and month using our predefined core extraction.
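To make the idea of dictionary-driven entity extraction concrete, here is a minimal Python sketch applied to the sentence above. It is a toy illustration only, not how SAP HANA implements core extraction: the dictionary, the type names (`COUNTRY`, `HOLIDAY`, `MONTH`) and the `extract_entities` helper are all assumptions made up for this example.

```python
import re

# Hypothetical toy dictionary: surface form -> entity type.
# The type names are illustrative, not SAP HANA's actual entity type names.
ENTITY_DICTIONARY = {
    "Mexico": "COUNTRY",
    "Cinco De Mayo": "HOLIDAY",
    "May": "MONTH",
}


def extract_entities(text):
    """Return (start offset, matched text, entity type) tuples in order of appearance."""
    # List longer surface forms first so a multi-word entry is preferred
    # over a shorter entry that could match at the same position.
    forms = sorted(ENTITY_DICTIONARY, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b")
    return [
        (m.start(), m.group(0), ENTITY_DICTIONARY[m.group(0)])
        for m in pattern.finditer(text)
    ]


entities = extract_entities("Mexico celebrates Cinco De Mayo in May")
for offset, surface, entity_type in entities:
    print(offset, surface, entity_type)
```

A real extraction engine also relies on linguistic rules and context rather than exact string lookup, which is why "May" the month and "may" the verb can be told apart there but not in this sketch.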
For basic Text Analysis, like tokenization and stemming, SAP HANA supports 31 languages. In the upcoming SP06, there is predefined core extraction support for 13 languages and sentiment analysis support for 5 languages.
Text Analysis needs vary by industry: healthcare, for example, is different from CPG (consumer packaged goods). How is this handled in SAP HANA?
Text Analysis in SAP HANA is a horizontal solution. Delivered in SAP HANA are extensive dictionaries and associated rules. This is an extensible approach that lends itself to customization and will be configurable at some point in the future.
How are enterprises leveraging Text Analysis in SAP HANA?
One use case is an APJ-based airline that wanted to automate the process of responding to customer requests via email. Using SAP Text Analysis technology, they are able to classify incoming emails and respond to requests accurately and effectively. This also helps them reduce their call-center costs.
Another example is a financial services company that uses SAP Text Analysis technology as the backbone for their automatic content enrichment platform. They use Text Analysis to discover metadata in input text data feeds, making document categorization, search and retrieval a seamless process.
For a developer getting started with Text Analysis, what are some resources you would recommend?
You may want to start out with the SAP HANA SPS05 Learning Maps.
Look for and open What’s New:
- Find Text Analysis
- Fulltext Search and Fuzzy Search may be of interest too.
Fulltext Search uses our text analysis libraries. Fuzzy Search does not use our text analysis libraries but might be interesting for clustering similar entities extracted from text analysis.
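As a rough illustration of why fuzzy matching helps when clustering similar extracted entities, the following Python sketch groups spelling variants using the standard library's `difflib`. It is not SAP HANA's Fuzzy Search; the `cluster_entities` helper, the sample entity strings and the 0.8 threshold are assumptions chosen for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_entities(entities, threshold=0.8):
    """Greedily group entities whose similarity to a cluster's first member meets the threshold."""
    clusters = []
    for entity in entities:
        for cluster in clusters:
            # Compare against the first member, which acts as the cluster's representative.
            if similarity(entity, cluster[0]) >= threshold:
                cluster.append(entity)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([entity])
    return clusters


# Hypothetical entity strings, as they might come out of an extraction step.
clusters = cluster_entities(["SAP HANA", "SAP Hana", "S.A.P. HANA", "Oracle", "Oracel"])
for cluster in clusters:
    print(cluster)
```

The greedy approach is order-dependent and compares only against the first member of each cluster, which keeps the sketch short; a production system would want a more robust clustering strategy.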
Text Analysis in SAP HANA details can be found in the Developer Guide for HANA SPS05.
However, if you would like to better understand our predefined core entity extraction or Voice of Customer (sentiment analysis) coverage, please refer to the Text Data Processing Language Reference Guide. Hopefully these resources will help you get started.
You may also try HANA search for your SAP HANA Cloud applications.
One thing that should be mentioned here is that SAP HANA still does not support regular-expression-based queries, which are one of the primary requirements for any textual analysis.
We hope this feature gets added to HANA as soon as possible, so that text analysis applications can truly benefit from its query performance and in-memory architecture.
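Until then, one possible workaround is to apply regular expressions on the client after the text has been fetched. The Python sketch below shows the idea under stated assumptions: the `rows` list is hypothetical data standing in for a query result, the order-reference pattern is invented for illustration, and no SAP HANA client calls are shown.

```python
import re

# Hypothetical (id, text) rows standing in for results already fetched from a table.
rows = [
    (1, "Order #A-1042 arrived late, very disappointed."),
    (2, "Great service, thank you!"),
    (3, "Please refund order #B-77 as soon as possible."),
]

# Illustrative pattern: '#', one uppercase letter, '-', then one or more digits.
ORDER_REF = re.compile(r"#[A-Z]-\d+")


def find_order_refs(rows):
    """Return (row id, matched reference) pairs for every pattern match in the text."""
    matches = []
    for row_id, text in rows:
        for m in ORDER_REF.finditer(text):
            matches.append((row_id, m.group(0)))
    return matches


order_refs = find_order_refs(rows)
print(order_refs)
```

The obvious cost is that every candidate row has to leave the database before it can be filtered, which gives up exactly the in-memory query performance that makes native support desirable.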