Text Mining of SAP TechEd Session Catalogue
I came across a number of text mining examples, and all of them dealt with text from outside my area of expertise, for example microbiology. Such examples do not hold your attention, because you cannot judge from your own knowledge whether text mining is working. So I built a data set from the SAP TechEd 2015 Session Catalogue to play with HANA Text Mining. Any SAP professional is familiar with this text and can form their own opinion about HANA Text Mining.
So here is my data. As you can see, there are SESSION and TITLE information fields, a CATEGORY field that classifies each document, and a DESCRIPTION field that contains the text to be mined (the full-text index is built on this field).
I prepared a number of examples. Let's see text mining in action.
Find Similar Documents
For example, find sessions similar to DEV260 ‘Building Applications with ABAP Using Code Pushdown to the Database’
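As a rough mental model of what a similar-documents query does, the sketch below ranks documents by cosine similarity between bag-of-words vectors. It is not HANA's implementation and does not use HANA's SQL functions; the three short descriptions, the IDs A, B and C, the tokenizer, and the function names are all invented for illustration.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercase the text and count word occurrences (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def most_similar(doc_id, docs, top=3):
    """Rank every other document by its similarity to docs[doc_id]."""
    vectors = {k: term_vector(v) for k, v in docs.items()}
    scores = [(cosine(vectors[doc_id], vec), k) for k, vec in vectors.items() if k != doc_id]
    return [k for _, k in sorted(scores, reverse=True)[:top]]

# Toy descriptions, invented for this sketch (not the real catalogue text).
docs = {
    "A": "push abap code down to the database",
    "B": "abap code pushdown to the hana database",
    "C": "design fiori apps for mobile users",
}
print(most_similar("A", docs))  # B shares five terms with A, C shares none
```

A real text mining engine would also normalize word forms and weight terms, so its rankings on the actual catalogue can differ from what plain word counts give.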
Find Relevant Terms
For example, find terms relevant to DEV260 ‘Building Applications with ABAP Using Code Pushdown to the Database’ session
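One common way to decide which terms are relevant to a document is TF-IDF weighting: a term scores high when it is frequent in the document but rare in the rest of the collection. The sketch below assumes that weighting purely for illustration; the post does not say which weighting HANA applies, and the toy documents and function names are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split lowercase text into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevant_terms(doc_id, docs, top=3):
    """Score each term of one document by TF-IDF against the whole collection."""
    tokens = {k: tokenize(v) for k, v in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents does each term occur?
    df = Counter(t for toks in tokens.values() for t in set(toks))
    tf = Counter(tokens[doc_id])
    scores = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
    # Sort by descending score, then alphabetically so ties have a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]

# Toy descriptions, invented for this sketch (not the real catalogue text).
docs = {
    "A": "abap code pushdown to the database with abap",
    "B": "fiori apps on the database",
    "C": "the cloud platform",
}
print(relevant_terms("A", docs, top=2))
```

Note how a word such as "the", which occurs in every document, gets a score of zero and drops to the bottom of the list.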
Find Related Terms
For example, find terms related to ‘Fiori’
Find Relevant Documents
For example, find documents relevant to the term ‘Fiori’
Classify Documents
For example, suppose you have a new session that you have to assign to the proper category. I took the SAP TechEd 2014 session DEV161 ‘SQLScript – Push Code Down into SAP HANA to Achieve Maximum Performance’, which belongs to the ‘Development and Extension Platform for SAP HANA and Cloud’ category, and classified it using the SAP TechEd 2015 catalogue documents. Let's see if the document is classified correctly.
As you can see, the document was classified correctly.
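Classifying a new document against an already categorized collection can be pictured as a k-nearest-neighbours vote: find the k most similar labelled documents and take the category most of them share. The sketch below is a minimal illustration of that idea, not HANA's implementation; the category labels, the toy descriptions, and the function names are invented.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercase the text and count word occurrences (a bag-of-words vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify_knn(text, labelled_docs, k=3):
    """Return the majority category among the k most similar labelled documents."""
    query = term_vector(text)
    ranked = sorted(labelled_docs, key=lambda d: cosine(query, term_vector(d[1])), reverse=True)
    votes = Counter(category for category, _ in ranked[:k])
    return votes.most_common(1)[0][0]

# (category, description) pairs, invented for this sketch.
labelled_docs = [
    ("Development", "write sqlscript procedures in hana"),
    ("Development", "push abap code down to the hana database"),
    ("User Experience", "design fiori apps for mobile users"),
    ("User Experience", "build a fiori launchpad theme"),
]
print(classify_knn("push sqlscript code down into hana", labelled_docs, k=3))
```

With k=3 the two "Development" documents outvote the single "User Experience" neighbour, which mirrors how the 2014 session lands in the category of the 2015 sessions it most resembles.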
You can import the attached TM_DEMO-sap.com.tgz delivery unit into your HANA system and play with the examples and data. Once the delivery unit is imported, you will have the following objects created in the tm_demo package:
Note: the examples described above are in the query.sql file.
Execute the install.sql script to assign the proper authorizations, fix the data in the session table, and create the full-text index for text mining.
The following catalog objects will be created:
Note: for the text mining functions to work correctly, you need to be on HANA SPS10.
Here is the content of the delivery unit:
Note: here is a helpful link: How to Import Delivery Unit to HCP HANA MDC