Big data, structured data, unstructured data: Making sense of it all
ASUG webcast by Trish Harman, SAP Solution Marketing
(For attribution of the following stats, see the Information Governance Infographic.) Data is growing very quickly: it is up 48% from 2011, and projections are 7.9 zettabytes by 2015. The types of data are also changing quickly. Unstructured data is growing by up to 80%, while structured data continues to grow by 40%.
If you don’t address these changes, data quality problems quickly affect your organization, and poor data quality remains the cause of failure for 40% of business initiatives.
The classic Big Data dimensions are still Variety, Velocity, and Volume, and at least two of these must be present to qualify as Big Data. Of course, you must also constantly tie your efforts to the fourth V: Value. Does your organization deal with Big Data based on this definition? In the audience poll, 42% answered Yes and 15% answered No (the rest didn’t respond).
How does Text Analysis help you? Its main components are:
- Language Identification: identify the language the text is written in.
- Entity Extraction: identify entities, the key nouns in the text (customers, people, etc.).
- Entity Relationships: determine how those entities relate to each other, such as job titles, marital status, or mergers and acquisitions.
- Linguistic Analysis: model the rules that speakers of a specific language use. Much of the heavy software intelligence lives here.
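To make the first three components more concrete, here is a toy Python sketch. It is not how SAP's Text Analysis works: it assumes tiny hand-written stopword profiles for language identification, a small hypothetical dictionary (gazetteer) of names for entity extraction, and a single hypothetical regular expression for one relationship type (job titles). Production engines use statistical models plus the linguistic analysis described above.

```python
import re

# Hypothetical stopword profiles; real language identifiers use statistical
# models (for example, character n-grams) trained on large corpora.
LANGUAGE_PROFILES = {
    "en": {"the", "is", "and", "of", "to", "a", "in", "on"},
    "de": {"der", "die", "das", "ist", "und", "von", "zu", "ein"},
    "fr": {"le", "la", "les", "est", "et", "de", "un", "une"},
}

# Hypothetical gazetteer: lowercase surface form -> entity type.
GAZETTEER = {"acme": "ORGANIZATION", "tokyo": "LOCATION", "jane smith": "PERSON"}

# Hypothetical pattern for one relationship type: "<First Last>, <title> of <Org>".
JOB_TITLE_RE = re.compile(r"([A-Z][a-z]+ [A-Z][a-z]+), (CEO|CFO|CTO) of ([A-Z]\w+)")


def identify_language(text):
    """Language identification: pick the profile with the most stopword hits."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {
        lang: sum(t in words for t in tokens)
        for lang, words in LANGUAGE_PROFILES.items()
    }
    return max(scores, key=scores.get)


def extract_entities(text):
    """Entity extraction: (surface form, type) for each gazetteer entry found."""
    lowered = text.lower()
    return [
        (name, etype)
        for name, etype in GAZETTEER.items()
        if re.search(r"\b" + re.escape(name) + r"\b", lowered)
    ]


def extract_job_titles(text):
    """Entity relationships: (person, title, organization) triples."""
    return JOB_TITLE_RE.findall(text)


if __name__ == "__main__":
    sample = "Jane Smith, CEO of Acme, flew in from Tokyo on the red-eye."
    print(identify_language(sample))   # en
    print(extract_entities(sample))
    print(extract_job_titles(sample))  # [('Jane Smith', 'CEO', 'Acme')]
```

The dictionary and regex approach is only a teaching device: it finds exactly what it was told to look for, which is why real products invest so heavily in the linguistic layer.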
Key use scenarios for text analytics include:
- Law Enforcement
- Life Sciences
- Media & Publishing
- HR (pulling key information from resumes)
- Legal
- Marketing (developing campaigns based on social media feedback)
- Customer Service
- Competitive Intelligence
When polled, by far the largest share of the audience gave no answer (interpreted as not using text analytics yet). Among those who did answer, Marketing and Customer Service were the most commonly adopted scenarios.
A key customer provides a service that lets their customers ask questions via text message. They found that 80% of the texted questions were the same, so they thought they could automate some of the responses. To do that, they needed to understand multiple languages, extract all of the text, and search for multiple words at a time. Without these capabilities, every question went to a Customer Service Rep, who had to answer each text manually. Data Services, however, can cover the 11 different languages that this APJ customer needed. For example, given the question "Is flight 243 on time from TKL?", it can extract the flight number and the location, match the misspelled location to Tokyo, run the query, and automatically reply if the flight is on time. This makes them much more responsive to customer questions AND lets their Customer Service Reps focus on the more difficult questions.
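The flow in this example (extract the flight number and place, tolerate the misspelling, query, reply) can be sketched in plain Python. This is an illustration, not how Data Services implements it. It assumes a regular expression for the flight number, a small hypothetical alias table of city names and airport codes, the standard library's `difflib.get_close_matches` for fuzzy matching, and a stub dictionary (`FLIGHT_STATUS`, hypothetical) in place of the real flight-status query.

```python
import difflib
import re

# Hypothetical alias table: surface forms (city names, airport codes) -> city.
LOCATION_ALIASES = {
    "tokyo": "Tokyo", "tyo": "Tokyo",
    "osaka": "Osaka", "kix": "Osaka",
    "sydney": "Sydney", "syd": "Sydney",
}

# Hypothetical stub standing in for the real flight-status query.
FLIGHT_STATUS = {("243", "Tokyo"): "on time"}

FLIGHT_RE = re.compile(r"\bflight\s+(\d+)", re.IGNORECASE)
PLACE_RE = re.compile(r"\b(?:from|to)\s+([A-Za-z]+)", re.IGNORECASE)


def resolve_location(raw):
    """Map a possibly misspelled place name to a canonical city, or None."""
    matches = difflib.get_close_matches(
        raw.lower(), LOCATION_ALIASES.keys(), n=1, cutoff=0.4
    )
    return LOCATION_ALIASES[matches[0]] if matches else None


def auto_reply(question):
    """Return an automatic answer, or None to route the text to a human rep."""
    flight = FLIGHT_RE.search(question)
    place = PLACE_RE.search(question)
    if not (flight and place):
        return None
    number = flight.group(1)
    city = resolve_location(place.group(1))
    status = FLIGHT_STATUS.get((number, city))
    if status is None:
        return None
    return f"Flight {number} from {city} is {status}."


if __name__ == "__main__":
    print(auto_reply("Is flight 243 on time from TKL?"))
    # Flight 243 from Tokyo is on time.
```

Returning `None` models the fallback to a Customer Service Rep. The 0.4 cutoff is an arbitrary choice for this tiny alias table: a looser cutoff catches more misspellings but risks matching the wrong city.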
Data Services can handle many data sources, and Text Analysis is part of that solution. Data Services can also connect to Apache Hadoop, using HDFS and Hive as both sources and targets. We push down the Entity Extraction as a MapReduce job, which improves performance because the processing runs very close to the data.
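The idea of running entity extraction as a MapReduce job can be illustrated with a single-process Python simulation of the map, shuffle, and reduce phases. This is not the job Data Services generates: the hypothetical `KNOWN_ENTITIES` dictionary is a deliberately naive stand-in for a real extraction engine, and the phases are ordinary functions rather than Hadoop tasks.

```python
import re
from collections import defaultdict
from itertools import chain

# Hypothetical dictionary standing in for a real entity extraction engine.
KNOWN_ENTITIES = {"Tokyo", "Osaka", "SAP"}


def mapper(doc_id, text):
    """Map phase: emit (entity, 1) for every known entity mention in one document.
    doc_id is unused here; a real mapper might emit it to track provenance."""
    for token in re.findall(r"\w+", text):
        if token in KNOWN_ENTITIES:
            yield token, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key (the framework does this)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(entity, counts):
    """Reduce phase: total mentions of one entity across all documents."""
    return entity, sum(counts)


def run_job(documents):
    """Run map, shuffle, and reduce in one process over {doc_id: text}."""
    mapped = chain.from_iterable(mapper(d, t) for d, t in documents.items())
    return dict(reducer(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    docs = {
        "d1": "Tokyo flights are delayed.",
        "d2": "Flights from Tokyo to Osaka are on time.",
    }
    print(run_job(docs))  # {'Tokyo': 2, 'Osaka': 1}
```

On a real cluster, Hadoop schedules each mapper on a node that holds its input split, which is where the "processing close to the data" benefit comes from.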
Essentially, Data Services pulls content from Notes fields, HTML, Word documents, PowerPoint files, etc. Then we extract entities and facts, and that data is structured into a database where BI tools can query and report on the results.
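Once entities and facts are extracted, structuring them into a database amounts to writing one row per extracted item. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the target database; the `entities` table, its columns, and the sample rows are all hypothetical, and `mentions_by_value` shows the kind of aggregate a BI report might run.

```python
import sqlite3

# Hypothetical extraction output: (source document, entity type, entity value).
EXTRACTED = [
    ("complaint_001.docx", "PRODUCT", "Model X200"),
    ("complaint_002.html", "PRODUCT", "Model X200"),
    ("complaint_002.html", "LOCATION", "Tokyo"),
    ("notes_17.txt", "PRODUCT", "Model Z9"),
]


def load_entities(conn, rows):
    """Create the hypothetical entities table and insert one row per extraction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entities "
        "(source TEXT, entity_type TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", rows)
    conn.commit()


def mentions_by_value(conn, entity_type):
    """Count mentions per value for one entity type, most frequent first."""
    cur = conn.execute(
        "SELECT value, COUNT(*) FROM entities WHERE entity_type = ? "
        "GROUP BY value ORDER BY COUNT(*) DESC, value",
        (entity_type,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_entities(conn, EXTRACTED)
    print(mentions_by_value(conn, "PRODUCT"))  # [('Model X200', 2), ('Model Z9', 1)]
    conn.close()
```

Keeping the source document in each row lets a report drill back from an aggregate to the unstructured text it came from.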
A person can scan through this kind of text quickly, but at scale the process needs to be automated. We identify the important data elements and leave the rest behind (words such as "born", "is", "an", etc.).
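Leaving filler words behind is commonly done with a stopword list. A minimal Python sketch, assuming a small hand-picked English list (hypothetical); production text analysis engines typically use much larger, language-specific resources together with linguistic analysis.

```python
import re

# Hypothetical, deliberately small stopword list. "born" is included only
# because the example above discards it; general-purpose lists usually would not.
STOPWORDS = {"a", "an", "the", "is", "was", "in", "on", "at", "of", "and", "born"}


def key_terms(text):
    """Drop stopwords and keep the remaining tokens in their original order and case."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [t for t in tokens if t.lower() not in STOPWORDS]


if __name__ == "__main__":
    print(key_terms("Jane Doe was born in Boston and is an engineer at Acme"))
    # ['Jane', 'Doe', 'Boston', 'engineer', 'Acme']
```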
This example outlines the software approach to the above scenario.
The capabilities are built right into Data Services. Choose the Text Data Processing transform and work with its Entity Extraction functionality. We feed the models and serial numbers into Data Cleanse.
Next, pull in the unstructured data and highlight the key entities, then extract those concepts into a structured form (Concept). The Match transform can then aggregate the concepts into groups based on similarity. Those groups are what you’ll use to report on trends. See the kind of results below.
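The grouping step (collecting similar concepts so trends can be counted) can be approximated with greedy clustering on string similarity. This sketch is not the Match transform's actual matching logic; it assumes `difflib.SequenceMatcher.ratio()` from the Python standard library as the similarity measure and a hypothetical threshold of 0.8.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_concepts(concepts, threshold=0.8):
    """Greedy grouping: each concept joins the first group whose representative
    (its first member) is at least `threshold` similar, else starts a new group."""
    groups = []
    for concept in concepts:
        for group in groups:
            if similarity(concept, group[0]) >= threshold:
                group.append(concept)
                break
        else:
            groups.append([concept])
    return groups


if __name__ == "__main__":
    print(group_concepts(
        ["battery overheats", "Battery overheating", "screen cracked", "screen cracks"]
    ))
    # [['battery overheats', 'Battery overheating'], ['screen cracked', 'screen cracks']]
```

Because the ratio compares character sequences, reordered phrasings such as "cracked screen" and "screen cracked" score only 0.5 with this measure and end up in separate groups; a token-based measure would be one way to handle that.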
HANA also has some text analysis capabilities, which come from the Text Analysis SDK. So when should you use Data Services vs. HANA? If you don’t want to load lots of unstructured data and instead just want to load the relevant pieces, use Data Services.
Thanks, Trish, for this great webinar! For follow-up questions, please reach out to Trish at firstname.lastname@example.org.