Anthony Waite

Text Analysis: Natural Language Processing as a Core Feature of SAP HANA

Text is everywhere. We use natural language to express ourselves in complete or incomplete sentences of any length. Words can be thought of as the intuitive units of language, but they are difficult for many organizations to process. Detecting key information trapped in enterprise text accurately, rapidly and at large scale remains a challenge.

Text Analysis (TA) is a native feature of SAP HANA. TA – at its core a Natural-Language Processing (NLP) engine – is a strategic asset that forms the foundation for search and information discovery applications. By applying statistical, linguistic, and machine-learning techniques, TA supports indexing and semantic annotations that identify or describe features of interest in text – in multiple languages.

In general, extending the coverage of a platform like SAP HANA from structured data to unstructured textual data requires at least a basic form of NLP technology. The versatility of SAP’s TA engine within HANA reaches beyond the basics – it enables a wide range of applications that can rely on it to varying degrees. The text processing capabilities of these applications depend on the combination of TA functionality areas they consume:

Linguistic analysis

This is the most fundamental form of TA – tokenization, stemming and part-of-speech tagging. These operations are essential for building optimized indexes and search-oriented systems, since they help achieve both high precision and high recall. Full-Text Search within SAP HANA is built on top of this.
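To make the link between linguistic analysis and search concrete, here is a minimal Python sketch of tokenization, stemming and index building. It is not how SAP HANA implements Text Analysis: the suffix-stripping `stem` function is a deliberately naive stand-in for a real linguistic stemmer, and all function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    """Toy stemmer: strip a few common English suffixes (illustrative only)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def build_index(docs):
    """Map each stem to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[stem(token)].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every stemmed query term."""
    stems = [stem(t) for t in tokenize(query)]
    if not stems:
        return set()
    result = set(index.get(stems[0], set()))
    for s in stems[1:]:
        result &= index.get(s, set())
    return result

# Hypothetical sample documents.
docs = {
    1: "The customer reported failing pumps.",
    2: "A pump failed during testing.",
    3: "Customers praised the new interface.",
}
index = build_index(docs)
print(search(index, "pump failing"))
```

Because both the documents and the query are reduced to stems, a query for "pump failing" also finds the document that says "A pump failed", which is the recall benefit stemming brings to a search index.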

Entity extraction

The identification of named entities (persons, organizations, products, etc.) allows for the elimination of ‘noise’ in textual data, essentially highlighting salient information in large text collections. This process transforms unstructured textual data into structured information, which can then be leveraged by Information Management and Business Intelligence solutions.
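As a rough illustration of how entity extraction turns free text into structured rows, the following Python sketch matches text against a small dictionary of known names. This is a simplified dictionary lookup, not SAP HANA's extraction engine; the `GAZETTEER` entries, the `extract_entities` function and the sample sentence are all assumptions made for the example.

```python
import re

# Hypothetical gazetteer: surface form -> entity type.
GAZETTEER = {
    "SAP": "ORGANIZATION",
    "SAP HANA": "PRODUCT",
    "Jane Doe": "PERSON",
}

def extract_entities(doc_id, text):
    """Return (doc_id, entity, type, offset) rows for every gazetteer match."""
    # Try longer names first so that "SAP HANA" wins over "SAP".
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (doc_id, m.group(1), GAZETTEER[m.group(1)], m.start())
        for m in pattern.finditer(text)
    ]

text = "Jane Doe wrote about SAP HANA for SAP."
rows = extract_entities(1, text)
for row in rows:
    print(row)
```

Each row has a fixed shape (document id, entity, type, character offset), so it could be loaded into an ordinary table and joined with other structured data.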

Fact extraction

Higher-level semantic processing links entities as ‘facts’ in domain-specific applications. One such application is sophisticated Sentiment Analysis (‘Voice of the Customer’), which classifies sentiments and links them to their corresponding topics.
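The idea of linking a sentiment to its topic can be sketched in a few lines of Python. This toy version pairs each sentiment word with the nearest topic word in the same sentence; it is far simpler than real fact extraction and is not SAP HANA's method. The `SENTIMENT` and `TOPICS` lexicons and the sample review are hypothetical.

```python
import re

# Hypothetical mini-lexicons; a real system would use far richer resources.
SENTIMENT = {
    "great": "positive",
    "love": "positive",
    "slow": "negative",
    "poor": "negative",
}
TOPICS = {"battery", "screen", "support"}

def extract_sentiment_facts(text):
    """Pair each sentiment word with the nearest topic word in its sentence."""
    facts = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        topic_positions = [i for i, t in enumerate(tokens) if t in TOPICS]
        for i, token in enumerate(tokens):
            if token in SENTIMENT and topic_positions:
                # Choose the topic closest to the sentiment word.
                nearest = min(topic_positions, key=lambda p: abs(p - i))
                facts.append((tokens[nearest], SENTIMENT[token]))
    return facts

review = "The battery is great. Support was slow and the screen is poor."
facts = extract_sentiment_facts(review)
print(facts)
```

The output is a list of (topic, sentiment) pairs, which is the kind of 'fact' a Voice of the Customer application would aggregate across many reviews.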

TA in SAP HANA takes unstructured text data in a wide variety of file formats and turns it into something you can search, analyze, and act on. It allows you to deal with information overload by mining big data and making sense of all of the information without having to read every single sentence. Simply put, TA automates intelligent discovery from data sources that were previously unprocessable.

Future blog posts will take a deep dive into the following topics and how they relate to Text Analysis in SAP HANA:

  • Linguistic Markup
  • Named Entity Recognition
  • Relations & Events
  • Sentiment Analysis
  • Precision & Recall Numbers
  • Customization
  • Text Mining Algorithms
  • Semantic Roles
  • Semantic Inference


      1 Comment
      Sergey Korolev

      I am waiting for more.