Unstructured Data Tells Us Why
How many times have you been misunderstood? I know I have! We give out signals every day, from the clothes we wear to the gestures we adopt when interacting with others. All these snippets of information are gathered and processed in our big, juicy brains, which are still far better at this than any computer. Context changes everything, and it is hard to interpret. The same applies to the written word. Computer technology has proven adept at handling quantitative, objective and Boolean information: how many people answered yes or no to a particular question, or how they ranked something. Where computers have struggled is in deriving meaning.
This tutorial from the SAP HANA Academy is part of a series on enabling search within SAP HANA. The series shows how unstructured data can be extracted and transformed into structured data for analysis. Once structured, it can be queried, analyzed and reported on.
This tutorial looks at text analysis. It gives a simple introduction to the concept and demonstrates how text analysis can be performed even on relatively short sentences. Text analysis can be run in SAP HANA Studio with SQL scripts. The tutorial uses a simple example table with an ID for each row and the text to be analyzed in a single column.
It uses a short English string that contains a noun, a verb and an abbreviation: “Bob likes working at SAP”.
This string is loaded into the table.
The tutorial demonstrates in detail how to create the necessary table structures. Once a full-text index has been created with Text Analysis turned on, you can look up the results SAP HANA derives, including the sentiment.
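The tutorial's own SQL script is not reproduced here. As a minimal sketch, assuming SAP HANA's CREATE FULLTEXT INDEX ... TEXT ANALYSIS ON syntax and the EXTRACTION_CORE_VOICEOFCUSTOMER configuration (the text analysis configuration that adds sentiment extraction), the Python below only builds the setup statements described above as strings. The table name TA_DEMO, the column CONTENT and the index name TA_DEMO_IDX are hypothetical; the tutorial may use different names, and executing the statements (in SAP HANA Studio or through a driver) is left to the reader.

```python
# Sketch only: builds SQL strings, does not connect to SAP HANA.
# TA_DEMO, CONTENT and TA_DEMO_IDX are hypothetical names for illustration.


def quote_ident(name: str) -> str:
    """Quote a SQL identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal with single quotes, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def text_analysis_setup(table: str, index: str, rows: list[tuple[int, str]]) -> list[str]:
    """Return statements that create the table, load rows and index them."""
    t = quote_ident(table)
    # One ID column plus one text column, as in the tutorial's example.
    stmts = [f'CREATE COLUMN TABLE {t} ("ID" INTEGER PRIMARY KEY, "CONTENT" NVARCHAR(500))']
    for row_id, text in rows:
        stmts.append(f"INSERT INTO {t} VALUES ({int(row_id)}, {quote_literal(text)})")
    # Assumed configuration: Voice of Customer adds sentiment to entity extraction.
    stmts.append(
        f'CREATE FULLTEXT INDEX {quote_ident(index)} ON {t} ("CONTENT") '
        "CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' TEXT ANALYSIS ON"
    )
    return stmts


if __name__ == "__main__":
    for stmt in text_analysis_setup("TA_DEMO", "TA_DEMO_IDX", [(1, "Bob likes working at SAP")]):
        print(stmt + ";")
```

Quoting is done in the helpers so that text such as “Bob's beer” cannot break the generated INSERT statement.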
You can see that the software has identified the language, the type of each word, the number of paragraphs and the number of sentences. Delving deeper, it has identified Bob as a person, SAP as an organisation, the statement itself as weakly positive, and the topic of the sentence as “working at SAP”. This was a very short sentence and the result could be a fluke, so the tutorial adds another sentence and a new paragraph. Are the same results achieved?
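SAP HANA stores text analysis output in a separate table named $TA_ followed by the index name, with columns such as TA_TOKEN, TA_TYPE, TA_LANGUAGE, TA_PARAGRAPH, TA_SENTENCE and TA_OFFSET, plus the source table's key column. A hedged sketch of a query that lists these results, assuming the hypothetical index TA_DEMO_IDX on a table keyed by ID; as before, the Python only builds the SQL string.

```python
# Sketch only: builds a SELECT string; TA_DEMO_IDX and ID are hypothetical names.


def quote_ident(name: str) -> str:
    """Quote a SQL identifier with double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def ta_results_query(index: str, key_column: str = "ID") -> str:
    """Build a SELECT over the $TA_ result table of a text analysis index."""
    # Assumption: the result table is named "$TA_<index name>".
    ta_table = quote_ident("$TA_" + index)
    key = quote_ident(key_column)
    return (
        f'SELECT {key}, "TA_TOKEN", "TA_TYPE", "TA_LANGUAGE", '
        '"TA_PARAGRAPH", "TA_SENTENCE" '
        f"FROM {ta_table} "
        # Order tokens as they appear in the text: by row, paragraph, sentence, offset.
        f'ORDER BY {key}, "TA_PARAGRAPH", "TA_SENTENCE", "TA_OFFSET"'
    )


if __name__ == "__main__":
    print(ta_results_query("TA_DEMO_IDX"))
```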
You don’t have to refresh the index; the sentiment is derived automatically.
New York now appears as a locality and is identified as a region. Bob seems to like everything: beer, soccer and New York. You could change the text so that Bob really likes beer and dislikes soccer. Beer would then come up as a strong positive and soccer as a weak negative, which, as we all know of Bob, is highly unlikely.
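The strong and weak sentiments described above surface in the TA_TYPE column. We assume the Voice of Customer configuration uses type names such as StrongPositiveSentiment, WeakPositiveSentiment, NeutralSentiment, WeakNegativeSentiment and StrongNegativeSentiment; check the exact names against your own results. The hypothetical helper below splits such a value into strength and polarity and ignores non-sentiment types such as PERSON.

```python
import re
from typing import Optional

# Assumed TA_TYPE naming pattern for sentiment; parse_sentiment is a hypothetical helper.
_SENTIMENT = re.compile(r"^(Strong|Weak)?(Positive|Negative|Neutral)Sentiment$")


def parse_sentiment(ta_type: str) -> Optional[tuple[str, str]]:
    """Return (strength, polarity) for a sentiment TA_TYPE, or None otherwise."""
    m = _SENTIMENT.match(ta_type)
    if m is None:
        return None  # e.g. PERSON, LOCALITY, Topic
    strength = (m.group(1) or "").lower()  # NeutralSentiment has no strength prefix
    return (strength, m.group(2).lower())


if __name__ == "__main__":
    for t in ["StrongPositiveSentiment", "WeakNegativeSentiment", "PERSON"]:
        print(t, parse_sentiment(t))
```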
With the introduction over, you can see there is much to look forward to in the rest of the series.
There are many opportunities for companies and individuals in this field. Companies struggle to filter information from unstructured textual data, and once the data is captured they struggle with its volume. Big data is usually unstructured and difficult to analyze, and using unstructured data in real time to inform business priorities is a further challenge. It is estimated that 80% of enterprise-relevant information originates in “unstructured” data. This data is qualitative, subjective and, most important of all, valuable. Like the proverbial picture worth a thousand words, it carries meaning. Governments and corporations want to know what is hidden in this unstructured data. Some of it relies on colloquialisms, idioms and meanings that can only be derived from context, and some of it forms part of a private code between people.
There is much to learn and discover by following the Text Analysis and Search series with the SAP HANA Academy. Make Text Analysis a priority for your business, and be brave enough to find out why things happened rather than only reflecting on what happened.