This is part one of a blog series I am planning to write on unstructured data analytics and on how SAP NetWeaver Cloud, powered by HANA, makes a good platform for this kind of service.
Earlier this year I participated in the HANA InnoJam contest and had my first brush with HANA. It was highly rewarding to get access to a HANA system hosted on Cloudshare and implement my idea for unstructured data analytics on it. I picked the customer complaints published by the US Department of Transportation as my source of unstructured data. This database has approximately 9 million customer complaints spanning more than a decade. Each record has a free-text field where the vehicle owner has entered the details of the complaint. This is a rich source of unstructured data, and analyzing it can give insight into customer sentiment. Specific terms within the complaint descriptions were correlated across the complete database to build a unique search tool. The Apache UIMA framework was used as the information extractor, pulling nouns out of the unstructured text. The extracted terms were then loaded into HANA for associative analysis. The UI had a search field that queried the HANA database and showed the search results as a tree diagram. Terms are linked via tree branches, and the user can drill down progressively to narrow the search results.
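To make the drill-down idea concrete, here is a minimal sketch in plain Python. It is not the InnoJam implementation: it uses neither UIMA nor HANA, and the sample terms and the `drill_down` helper are hypothetical. It assumes the nouns have already been extracted, so each complaint is just a set of terms, and it shows how selecting terms one by one narrows the matching complaints and suggests the next branch of the tree.

```python
from collections import Counter

# Hypothetical sample data: the set of nouns extracted from each complaint.
complaint_terms = [
    {"brake", "pedal", "noise"},
    {"brake", "pedal", "failure"},
    {"brake", "noise"},
    {"engine", "stall", "noise"},
]

def drill_down(docs, selected, top_n=3):
    """Return (number of matching complaints, top co-occurring terms).

    A complaint matches when it contains every term in `selected`.
    The co-occurring terms are candidates for the next tree branch.
    """
    selected = set(selected)
    matching = [terms for terms in docs if selected <= terms]
    counts = Counter(t for terms in matching for t in terms - selected)
    # Highest count first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return len(matching), ranked[:top_n]

# First level: search for "brake".
print(drill_down(complaint_terms, ["brake"]))
# Second level: the user follows the "pedal" branch.
print(drill_down(complaint_terms, ["brake", "pedal"]))
```

Each call corresponds to one level of the tree: the returned terms are the child nodes, and adding one of them to the selection narrows the result set further.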
You can find further details in this Vimeo submission. The video quality is not the best in the world, my apologies!
Three things made SAP NetWeaver Cloud look like the right platform for taking this further:
1) HANA – SAP Cloud comes with HANA, and I was looking for HANA access to take my idea to the next level.
2) SAP Cloud is built on open standards – it supports Java libraries for building cloud applications. This means I can run Apache UIMA and my custom JS-based UI in the cloud.
3) Predictive Analysis Library (PAL) – The algorithms packaged with PAL in HANA can do cluster analysis, classification analysis and association analysis. My idea needs association analysis for the terms extracted from unstructured data.
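For readers new to association analysis, the sketch below shows the two basic measures, support and confidence, computed for pairs of terms in plain Python. It does not call PAL or HANA; the sample data, the thresholds and the `association_rules` helper are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import permutations

# Hypothetical sample data: the set of nouns extracted from each complaint.
complaint_terms = [
    {"brake", "pedal", "noise"},
    {"brake", "pedal", "failure"},
    {"brake", "noise"},
    {"engine", "stall", "noise"},
]

def association_rules(docs, min_support=0.5, min_confidence=0.6):
    """Find rules "a -> b" between single terms.

    support    = share of complaints containing both a and b
    confidence = share of complaints containing a that also contain b
    """
    n = len(docs)
    term_count = Counter()
    pair_count = Counter()
    for terms in docs:
        term_count.update(terms)
        # Ordered pairs, so both (a, b) and (b, a) are counted.
        pair_count.update(permutations(sorted(terms), 2))
    rules = []
    for (a, b), both in pair_count.items():
        support = both / n
        confidence = both / term_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)

for a, b, support, confidence in association_rules(complaint_terms):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, every complaint that mentions "pedal" also mentions "brake", so the rule pedal -> brake has confidence 1.0, while brake -> pedal is weaker. Counting every pair like this does not scale to millions of complaints, which is why an in-database algorithm is attractive.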
First, though, I had to get access to SAP Cloud, and the beta test program came to the rescue. As part of this program, I am currently developing a cloud-based solution. In my next blog on this topic I will talk about the business case and give an architectural overview of the proposed solution.