Unstructured Data Analytics using SAP NetWeaver Cloud – Part 2
This is part two of the blog series I am writing on unstructured data analytics using SAP NetWeaver Cloud powered by HANA. The first blog talks about HANA and its use for analysing unstructured data, and discusses the specific case of the automotive industry.
A case for analysing unstructured data
For most organizations, a significant amount of information is locked away in the form of unstructured data. Unstructured data is generated in various forms during formal and informal operations. Some of the textual forms of unstructured data are emails, contracts, medical records, customer reviews, customer feedback, claims, agreements, etc. Current BI and analytical systems are observed to be less efficient at analyzing unstructured textual information. Research institutes are making use of open source frameworks like Apache UIMA to extract information from textual unstructured data for research purposes. This research is mostly around feature-based opinion mining, summarization, aspect-based sentiment analysis, detecting fake reviews, etc. However, a huge gap is observed when it comes to applications that businesses can use to drive decisions based on insights gained from textual unstructured data.
This blog describes a cloud-based solution that uses information extractors like Apache UIMA to extract information from textual unstructured data, and SAP HANA to generate business insights by correlating it with back-end structured enterprise data. SAP NetWeaver Cloud provides a robust platform as a service, which is leveraged for scalability, RESTful APIs, authentication, integration and HANA.
Use of SAP NetWeaver Cloud and HANA-
SAP NetWeaver Cloud is used as the platform that hosts the various components of the solution. An information extractor like Apache UIMA is deployed on the cloud. Unstructured data is fed to the information extractor either in real time or periodically. The information extractor uses industry-specific models as a reference for extracting information. The extracted entities and the structured enterprise data are saved to persistent data storage. SAP NetWeaver Cloud provides HANA as the storage layer. Data is stored in industry-specific schemas.
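The flow described above (extract entities against an industry model, then persist them in an industry-specific table) can be sketched roughly as follows. This is only an illustration under loud assumptions: a keyword dictionary stands in for a real Apache UIMA extractor, an in-memory SQLite database stands in for HANA, and every name (`AUTOMOTIVE_MODEL`, `extract_entities`, `store_entities`, `automotive_entities`) is hypothetical.

```python
import re
import sqlite3

# Hypothetical automotive model: maps an entity type to the terms that signal it.
# A real deployment would use an information extractor such as Apache UIMA instead.
AUTOMOTIVE_MODEL = {
    "component": ["brake", "engine", "gearbox", "clutch"],
    "issue": ["noise", "leak", "vibration", "failure"],
}


def extract_entities(text, model):
    """Return (entity_type, term) pairs for every model term found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [
        (entity_type, term)
        for entity_type, terms in model.items()
        for term in terms
        if term in words
    ]


def store_entities(conn, doc_id, entities):
    """Persist extracted entities in an industry-specific table (SQLite stands in for HANA)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS automotive_entities "
        "(doc_id TEXT, entity_type TEXT, term TEXT)"
    )
    conn.executemany(
        "INSERT INTO automotive_entities VALUES (?, ?, ?)",
        [(doc_id, entity_type, term) for entity_type, term in entities],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
review = "The brake makes a grinding noise and the engine has an oil leak."
entities = extract_entities(review, AUTOMOTIVE_MODEL)
store_entities(conn, "review-1", entities)
rows = conn.execute(
    "SELECT entity_type, term FROM automotive_entities ORDER BY entity_type, term"
).fetchall()
```

Swapping in a different model dictionary and table name is what "industry-specific" would amount to in this simplified picture.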
Industry-specific apps for business insights-
Industry-specific apps are used to analyze the data stored in HANA and derive insights from it. These apps make use of HANA's Predictive Analysis Library (PAL) to execute various algorithms on the data. Algorithms for cluster analysis, classification analysis and association analysis are currently available. New apps can be developed and deployed as per client requirements.
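To give a feel for what association analysis over extracted entities produces, here is a minimal pure-Python sketch. It is not PAL and does not call HANA; it only counts co-occurring terms and reports support and confidence. The function name `association_rules`, the sample data and the 0.5 support threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent term pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # ignore repeated terms within one document
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Confidence of a -> b is P(b | a); each pair is reported in both directions.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


# Hypothetical entities extracted from four customer reviews.
reviews = [
    ["brake", "noise"],
    ["brake", "noise", "leak"],
    ["engine", "leak"],
    ["brake", "noise"],
]
rules = association_rules(reviews)
```

With this sample data the only pair that clears the threshold is brake and noise, which is the kind of finding an app could surface to a quality team.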
API for the external world-
An application programming interface is exposed for external consumption. External applications, whether on a cloud or on an on-premise system, can use these APIs to consume business insights. Developers can make creative use of the insights to build rich user interfaces using diverse technologies. The APIs can be consumed by mobile apps as well.
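As a rough idea of what such an API could look like to a consumer, the sketch below routes a GET path to a JSON payload. The URL shape `/insights/<industry>`, the `INSIGHTS` store and the `handle_get` function are purely hypothetical; the actual API of the solution is not specified in this post.

```python
import json

# Hypothetical insight store; in the described solution this would be computed in HANA.
INSIGHTS = {
    "automotive": {"top_issue": "brake noise", "mentions": 3},
}


def handle_get(path):
    """Route GET /insights/<industry> to a (status, JSON body) pair."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 2 and parts[0] == "insights" and parts[1] in INSIGHTS:
        return 200, json.dumps(INSIGHTS[parts[1]], sort_keys=True)
    return 404, json.dumps({"error": "not found"})


# A client (web, mobile or on-premise) would simply parse the JSON body.
status, body = handle_get("/insights/automotive")
insight = json.loads(body)
```

Returning plain JSON keeps the insights usable from any UI technology, which is the point made above about diverse consumers.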
Data loading mechanism-
- A) Persistent storage scenario
Customers of this cloud service can opt to store their data persistently in the HANA system. This gives the customer continuously available analytics. Customers can develop their own external apps that make use of the insights provided by the cloud service. They can also have custom apps developed on SAP NetWeaver Cloud that sit on top of HANA.
- B) Temporary storage scenario
This suits one-time consumption of the analytics better, e.g. generating insights for an advertising campaign by analyzing consumer reviews. Once the campaign is completed, there is no need to keep the data on the cloud.
- C) On the fly scenario
This is purely in-memory analytics where data is not stored permanently in the HANA database. Unstructured data can be fed to the information extractor manually via a UI or programmatically via API services. The extracted information is stored in temporary tables in HANA, and PAL is used to analyze the entities and provide output as per requirements. Once the output is calculated, the temporary table is deleted so that there is no permanent storage. This will be of use to other software and applications that require analytics to complete a task or transaction.
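The on the fly scenario (load into a temporary table, analyze, then delete the table) could look something like the sketch below. Again this is an assumption-heavy illustration: SQLite's temporary tables stand in for HANA temporary tables, a simple frequency count stands in for a PAL algorithm, and the names `analyze_on_the_fly` and `extracted` are hypothetical.

```python
import sqlite3


def analyze_on_the_fly(conn, entities):
    """Load entities into a temporary table, analyze them, and always drop the table."""
    conn.execute("CREATE TEMP TABLE extracted (entity_type TEXT, term TEXT)")
    try:
        conn.executemany("INSERT INTO extracted VALUES (?, ?)", entities)
        # Stand-in analysis: how often each term was mentioned.
        return conn.execute(
            "SELECT term, COUNT(*) FROM extracted "
            "GROUP BY term ORDER BY COUNT(*) DESC, term"
        ).fetchall()
    finally:
        # Nothing is kept once the output has been calculated.
        conn.execute("DROP TABLE extracted")


conn = sqlite3.connect(":memory:")
result = analyze_on_the_fly(
    conn, [("issue", "noise"), ("issue", "leak"), ("issue", "noise")]
)
leftover = conn.execute(
    "SELECT name FROM sqlite_temp_master WHERE type = 'table'"
).fetchall()
```

Dropping the table in a `finally` block means the data is removed even if the analysis step fails, which matches the intent of leaving no permanent storage behind.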
SAP NetWeaver Cloud with the power of HANA provides the right mix of the cloud's economic advantage and HANA's high-end computing power. HANA's built-in Predictive Analysis Library makes it even easier to get the insights that help in taking business decisions. A REST API can put this solution in the hands of all developers who are interested in building their own applications around insights sourced from unstructured data. Overall, the solution bridges unstructured data analytics with enterprise functions.
Disclaimer: The solution proposed is a 'work in progress'. Certain things might not turn out as per the specifications. I will do my best to keep this series updated on the progress.
Finally, I would like to hear your feedback on the design and the topic overall. It would be great to hear your experiences in this domain.