
I came across a number of text mining examples, and all of them dealt with text from outside my area of expertise, for example microbiology. Such examples do not hold your attention because you cannot judge from your own knowledge whether text mining is working. So I created a data set from the SAP TechEd 2015 session catalogue to play with HANA Text Mining. Any SAP professional is familiar with this text data and can form their own opinion about HANA Text Mining.

So here is my data. As you can see, there are SESSION and TITLE information fields, a CATEGORY field that classifies each document, and a DESCRIPTION field that contains the text data for text mining (the full-text index is built on this field).

TM1.jpg

I prepared a number of examples. Let's see text mining in action.

 

Find Similar Documents

For example, find sessions similar to DEV260 ‘Building Applications with ABAP Using Code Pushdown to the Database’.

TM2.jpg
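The SQL behind this result is in the attached query.sql and is not reproduced here. To give a feel for what "similar documents" means, here is a minimal Python sketch of one common approach: weight terms with TF-IDF and rank documents by cosine similarity. This is an illustration of the general idea only, not HANA's implementation; the function names and the three toy documents are made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn {doc_id: text} into {doc_id: {term: tf-idf weight}}."""
    tokens = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokens.values() for term in set(toks))
    return {
        doc_id: {t: c * (math.log(n / df[t]) + 1.0) for t, c in Counter(toks).items()}
        for doc_id, toks in tokens.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_documents(docs, doc_id, top=3):
    """Rank all other documents by similarity to doc_id, best first."""
    vecs = tfidf_vectors(docs)
    scores = [(other, cosine(vecs[doc_id], vecs[other])) for other in docs if other != doc_id]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top]


# Toy data (hypothetical, not taken from the TechEd catalogue).
docs = {
    "S1": "abap applications code pushdown database",
    "S2": "sqlscript code database performance",
    "S3": "fiori apps user experience",
}

for other, score in similar_documents(docs, "S1"):
    print(other, round(score, 3))
```

In this toy set S2 ranks above S3 for S1 because it shares the terms "code" and "database", while S3 shares none.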

 

Find Relevant Terms

For example, find terms relevant to the DEV260 ‘Building Applications with ABAP Using Code Pushdown to the Database’ session.

TM3.jpg
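As a rough intuition for "relevant terms", the sketch below picks the terms of one document that carry the highest TF-IDF weight, that is, terms that are frequent in the document but rare elsewhere. This is an assumption about the general technique, not a description of what HANA computes internally; `relevant_terms` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def relevant_terms(docs, doc_id, top=3):
    """Return the `top` terms of one document, ranked by TF-IDF weight."""
    tokens = {d: text.lower().split() for d, text in docs.items()}
    n = len(docs)
    # Document frequency of every term across the whole collection.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    tf = Counter(tokens[doc_id])
    weights = {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
    # Highest weight first; ties are broken alphabetically for a stable result.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top]]


# Toy data (hypothetical, not taken from the TechEd catalogue).
docs = {
    "S1": "abap applications code pushdown database",
    "S2": "sqlscript code database performance",
    "S3": "fiori apps user experience",
}

print(relevant_terms(docs, "S1"))
```

Terms such as "code" and "database" rank lower for S1 because they also occur in S2, so they say less about what makes S1 distinctive.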

 

Find Related Terms

For example, find terms related to ‘Fiori’

TM4.jpg

 

Find Relevant Documents

For example, find documents relevant to the term ‘Fiori’

TM5.jpg

 

Categorize Documents

For example, you have a new session that you have to assign to the proper category. I took the SAP TechEd 2014 session DEV161 ‘SQLScript – Push Code Down into SAP HANA to Achieve Maximum Performance’, which belongs to the ‘Development and Extension Platform for SAP HANA and Cloud’ category, and classified it using the SAP TechEd 2015 catalog documents. Let's see whether the document is classified correctly.

TM7.jpg

As you can see, the document was classified correctly.
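Categorizing a new document against already-labelled ones is often done with a k-nearest-neighbours vote: find the k most similar labelled documents and take the most frequent category among them. The Python sketch below illustrates that idea with plain word counts and cosine similarity. It is an assumption-laden toy, not HANA's algorithm; the function names, the category labels and the training texts are invented for the example.

```python
import math
from collections import Counter


def vector(text):
    """Bag-of-words vector: term -> raw count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(c * b.get(t, 0) for t, c in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize_knn(new_text, labelled_docs, k=3):
    """Assign the majority category of the k most similar labelled documents."""
    query = vector(new_text)
    ranked = sorted(
        labelled_docs,
        key=lambda doc: cosine(query, vector(doc[0])),
        reverse=True,
    )
    votes = Counter(category for _, category in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy labelled documents (hypothetical texts and category names).
training = [
    ("abap code pushdown database", "Development"),
    ("sqlscript procedures hana database", "Development"),
    ("fiori apps user experience", "User Experience"),
    ("fiori launchpad design", "User Experience"),
]

print(categorize_knn("sqlscript push code down hana performance", training))
```

With k=3 the two nearest neighbours are both "Development" documents, so that category wins the vote even though the third neighbour carries a different label.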

 

You can import the attached TM_DEMO-sap.com.tgz delivery unit into your HANA system and play with the examples and data. Once the delivery unit is imported, you will have the following objects created in the tm_demo package:

TM8.jpg

Note: the examples described above are in the query.sql file.

 

Execute the install.sql script to assign the proper authorizations, fix the data in the session table, and create the full-text index for text mining.

The following catalog objects will be created:

TM9.jpg

Note: for the text mining functions to work correctly, you need to be on HANA SPS10.

 

Installation instructions:

  1. Import the TM_DEMO-sap.com.tgz Delivery Unit
  2. Execute the install.sql script

Here is the content of the Delivery Unit:

TM_DEMO.hdbschema

TM_DEMO_ROLE.hdbrole

install.sql

query.sql

session.csv

session.hdbdd

session.hdbti

session_fix.hdbprocedure

 

Note: here is a helpful link: How to Import Delivery Unit to HCP HANA MDC
