We all know and agree that data size is increasing at a tremendous pace.

With Social media like Twitter and Facebook, organizations are faced with a new question… How can this unstructured data be used to its potential?

In this blog, we will look at how to consume and make sense of unstructured data using HANA and SAP Lumira.

Following are the various things we will look at-

1. Importing raw data into HANA

2. Using Text Analysis to redefine the unstructured data

3. Create Analytical Model to consume the data in SAP Lumira

4. Create a LUMIRA report and get the insight.

1. Importing raw data into HANA

There are various ways to import data into HANA, like SLT, SAP Data Services or plain CSV upload.

In this example, we have downloaded the CSV files(FLAT_CMPL.zip) containing the COMPLAINTS received by NHTSA (an organization which maintains data regarding Complaints, Defects & Recalls of the Cars in USA).

Office of Defects Investigation (ODI), Flat File Downloads | Safercar.gov | NHTSA

Upload the CSV file directly to HANA and create a new Table with Fields and mapping suggested by HANA.

Modify the Field Name in the Target Table.

Mapping.jpg

The Table contains details about the Type of Car, Make, Model, Year and Description” with lot of Text Data about the complaint.

Description_Column.jpg

2. Using Text Analysis to redefine the unstructured data

At this point, the data is not ‘complete’ for analysis. Using Text Analysis by means of a simple SQL execution, this data can be structured.

FULLTEXT INDEX GOUTAM.TEXT_CORE ON “GOUTAM”.“COMPLAINT”(“Description”) CONFIGURATION ‘EXTRACTION_CORE’ TEXT ANALYSIS ON;

After the Text Analysis process is complete(could be checked in M_FULLTEXT_QUEUE) the data  is extracted based on Type(TA_TYPE) and also will include the no of occurrence(TA_COUNTER) into a new TA Table “$TA_TEXT_CORE”.

Below is the snapshot as to how the TA process categorizes the extracted data. Since we are interested in only the parts of the car, we will use NOUN_GROUP as filter.

TA_TYPE.jpg

TA_TEXT_CORE.jpg

3. Create Analytical Model to consume the data in SAP Lumira

Let’s combine the original table “COMPLAINT” with the new TA table “$TA_TEXT_CORE” in an Attribute View. This view will eventually be consumed in Analytic View.

Define the Output Columns and Activate the Attribute View.

Attribute View.jpg

As we are only interested to see the complaints related to Car part or Incident, we will restrict the Measure TA_COUNTER to only records of TA_TYPE= NOUN_GROUP. This could be achieved by creating a variable on TA_TYPE Column with value NOUN_GROUP”

Variable.jpg

The Analytic View is now available and can be consumed in any front end Tool supported by HANA.

We will be using SAP Lumira Desktop version.

Analytic View.jpg

4. Create a LUMIRA report and get the insight.

Create a new Lumira Document and choose the SAP HANA for the Dataset. In the next screen select the Analytic View COMPLAINT_ANALYSIS.

Define the Measure and dimension for report.

Since we have included a Variable, it will ask you to choose the value. Choose the default “NOUN_GROUP”

HANA_SOURCE.jpg

HANA_VIEW_STRUCTURE.jpg

Create a basic Column Chart with Counter in Y Axis and Make, Model and TA_TOKEN in X Axis.

Here we can see the Components and how they are related to the Make of the Vehicle. For e.g. FORD has 12 Air Bag related issue.

This information was initially hidden in the Text, but no more so 🙂 .

We can scroll across the various lines to see the Component and the Model involved in the Complaint.

However our dataset is large, and it will get more interesting once we start applying filters like Date Range for Model Year of the Car,

filter values across states/cities or Components.

Screenshot 2014-12-08 15.22.17.png

If we further Filter the data only related to Air Bag related issue, the report will look something like this.

Counter here is nothing but the number of occurrence of Air Bag issue for various Make/Model.

AIR_BAG_GRAPH.jpg

Of course Lumira offers much more, and for e.g we could create a Geographical Hierarchy on the City Dimension and we could see the Complaint, Component and Model across the States of US. Compose a Story board or Infographics and share across seamlessly.

In the Graph below, it is filtered to Air Bag issue.

AIR_BAG_GEO.jpg

To conclude, we saw how easy it is to consume text/unstructured data in SAP HANA, with SQL based integration of Text Analysis Process and define the models in Graphical modeling environment i.e. HANA Modeler. And we saw how seamlessly HANA is integrated with SAP Lumira to consume the models.

To report this post you need to login first.

1 Comment

You must be Logged on to comment or reply to a post.

Leave a Reply