Text Mining: A distinctive aspect of HANA
Hello Members,
This blog is about how we can use the Text Mining feature of HANA. As we know, most data today is unstructured, and the need of the hour is to extract meaningful and useful information from it. Text Mining gives us a way to do exactly that.
More information on Text Mining can be found at the link below:
http://help.sap.com/hana/SAP_HANA_Text_Mining_Developer_Guide_en.pdf
Let us see how we can use the power of text mining in the scenario below:
I have fetched unstructured data from the Samsung Facebook page into the HANA database for text mining. The following steps are required:
- Create a table in HANA to store the unstructured data.
- Create a full-text index, with Text Mining ON, on the column in which the text is stored. (For the syntax of creating a full-text index, see the Reference Guide.)
- Up to SAP HANA SPS09, text mining was supported only through the SAP HANA XS API. From SPS10 onward, we can use either the SAP HANA XS API or SQL to call the text mining functions.
I have used the SAP HANA XS API to implement the text mining functions and displayed the output as an HTML table.
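The first two steps above can be sketched as DDL statements. This is a minimal sketch and not the code used in this post: the schema, table, column and index names are hypothetical, the column types are assumptions, and the `TEXT MINING ON` clause should be checked against the Reference Guide for your HANA revision. The Python helper below only composes the statements as strings; it does not connect to HANA.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def text_mining_setup_sql(schema: str, table: str, text_column: str,
                          index_name: str) -> List[str]:
    """Return DDL for a comments table and a full-text index with text mining on.

    All names are hypothetical; COMMENT_ID / POST_ID and the column types
    are assumptions made for illustration only.
    """
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    create_table = (
        f"CREATE COLUMN TABLE {target} ("
        '"COMMENT_ID" NVARCHAR(64) PRIMARY KEY, '
        '"POST_ID" NVARCHAR(64), '
        f"{quote_ident(text_column)} NCLOB)"
    )
    create_index = (
        f"CREATE FULLTEXT INDEX {quote_ident(schema)}.{quote_ident(index_name)} "
        f"ON {target} ({quote_ident(text_column)}) TEXT MINING ON"
    )
    return [create_table, create_index]


if __name__ == "__main__":
    # Print the statements so they can be reviewed before running them in HANA.
    for stmt in text_mining_setup_sql("SOCIAL", "FB_COMMENTS", "MESSAGE",
                                      "IDX_FB_COMMENTS_MSG"):
        print(stmt + ";")
```

The statements can be pasted into a SQL console once the names and types are adapted to the actual data.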
1. SUGGESTED TERMS
This function suggests terms based on an input substring. I passed ‘Sams’ as input, and it returned suggested terms from the data loaded in HANA. I have restricted the output to the top 16 rows.
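For readers on SPS10 or later who prefer the SQL route over the XS API, a call along these lines could look as follows. This is a hedged sketch, not the code behind this post: the `TERM ... SEARCH ... FROM ... RETURN TOP n` clause layout of `TM_GET_SUGGESTED_TERMS` follows the Text Mining Developer Guide as far as I recall and should be verified there, and the schema, table and column names are hypothetical. The helper only builds the query string.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def suggested_terms_sql(prefix: str, schema: str, table: str,
                        text_column: str, top: int = 16) -> str:
    """Build a TM_GET_SUGGESTED_TERMS query for the given input substring.

    The clause layout is an assumption to be checked against the
    Developer Guide; table and column names are hypothetical.
    """
    if not isinstance(top, int) or top < 1:
        raise ValueError("top must be a positive integer")
    return (
        "SELECT * FROM TM_GET_SUGGESTED_TERMS ("
        f"TERM {quote_literal(prefix)} "
        f"SEARCH {quote_ident(text_column)} "
        f"FROM {quote_ident(schema)}.{quote_ident(table)} "
        f"RETURN TOP {top})"
    )


if __name__ == "__main__":
    # 'Sams' and the limit of 16 rows mirror the example in the text.
    print(suggested_terms_sql("Sams", "SOCIAL", "FB_COMMENTS", "MESSAGE"))
```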
[Code snippet and output screenshots]
2. RELATED DOCUMENTS
This function returns related comments based on an input string. I passed ‘Samsung S6 Edge’ and displayed the Comment ID, Post ID and Message, i.e. the comments on Facebook related to the input string. Using this function, we fetched the comments posted by Facebook users about a specific product. I have restricted the output to the top 16 rows.
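An SQL equivalent for SPS10 or later might look like the sketch below. Again, this is not the code behind this post: the `DOCUMENT ... SEARCH ... FROM ... RETURN TOP n <columns>` layout of `TM_GET_RELATED_DOCUMENTS` is my reading of the Text Mining Developer Guide and should be verified there, and the schema, table and column names are hypothetical. The helper only builds the query string.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def related_documents_sql(text: str, schema: str, table: str, text_column: str,
                          return_columns: Sequence[str], top: int = 16) -> str:
    """Build a TM_GET_RELATED_DOCUMENTS query for an inline input text.

    The clause layout is an assumption to be checked against the
    Developer Guide; table and column names are hypothetical.
    """
    if not isinstance(top, int) or top < 1:
        raise ValueError("top must be a positive integer")
    if not return_columns:
        raise ValueError("at least one return column is required")
    cols = ", ".join(quote_ident(c) for c in return_columns)
    return (
        "SELECT * FROM TM_GET_RELATED_DOCUMENTS ("
        f"DOCUMENT {quote_literal(text)} "
        f"SEARCH {quote_ident(text_column)} "
        f"FROM {quote_ident(schema)}.{quote_ident(table)} "
        f"RETURN TOP {top} {cols})"
    )


if __name__ == "__main__":
    # Input string, returned columns and the 16-row limit mirror the text.
    print(related_documents_sql("Samsung S6 Edge", "SOCIAL", "FB_COMMENTS",
                                "MESSAGE", ["COMMENT_ID", "POST_ID", "MESSAGE"]))
```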
[Code snippet and output screenshots]
This is just a glimpse of how we can use the text mining feature of HANA in websites and applications to extract only the data that is useful to the organization from a huge amount of unstructured data.
I hope you find my first blog interesting and useful! 🙂
Cheers,
Deepak Varandani
very nice 🙂
Thanks Ranajay 🙂
Hi Deepak,
thanks for sharing your experience with SAP HANA technology.
I'd love to see more of that.
It would be especially interesting to see more about your hands-on. Show us your code. Tell us how you did it and why this way.
How did you like it? What didn't you like?
Give us more than just a "this feature does that". This is your blog, your opinion and your statement.
And I hope I'm right in assuming that you don't want it to be just another piece of example documentation.
Keep on blogging!
- Lars
Hi Lars,
Thanks for your valuable input on my first blog. I will take your suggestions into account in my next blogs. I have added code snippets, which should help in understanding how to use the text mining functions.
Best Regards,
Deepak Varandani
I agree with Lars - please see this link (by the way, note that we can format links this way and don't have to post them as text 😉 ) for the differences between a blog and a document. "How-to's" should rather be posted as documents; from blogs, readers expect more of a story, as Lars mentioned. This blog and the ones that follow it could also be helpful. Similar content can be found in the Getting Started section (cleverly hidden behind a tiny link in the top right corner of the SCN page).
P.S. Hmm, perhaps SAP should do some data mining on SCN... 🙂
Nice blog Deepak.
It would be good if you could provide some more technical information about text mining.
-Shweta
Hey Deepak
I can understand if you have constraints on sharing the whole code, but even a few code snippets would be a great help to others.
Regards
Ranajay
Hey Ranajay,
Thanks for understanding the constraint. I have attached the code snippets. They show how we can use the text mining functions.
Best Regards,
Deepak Varandani
Thanks Shweta !!
You can refer to the Text Mining Developer Guide for detailed information about Text Mining.
Let me know if you face any issues related to Text Mining.
Best Regards,
Deepak Varandani
Nice Blog.
Thanks for sharing this capability of HANA.
Keep it up.
Thanks Papil !!
Very nice Blog Deepak.
Thanks Arden !!!
Great blog Deepak!! Good Work!! 🙂
Thanks Pragati 🙂
Great Blog Deepak. Very explanatory and knowledgeable. Thanks
Thanks a lot Rohit !!!