Some time ago, I was contacted by a Harvard student called Hung Tran. Hung was working on a project involving SAP HANA and R, and he needed some guidance and help with some procedures. Of course, I helped him without actually knowing how big the project was.
Time passed and Hung sent me a first draft so I could test it live…I was totally blown away…I could never have expected something like that…
Before we continue…allow me to introduce “TEAM4Solutions”, composed of students working on a really nice project.
So…what makes this project so interesting? The technologies they use to bring this project to life…
1. SAP HANA Appliance (in-memory database and analytics engine)
2. SAP HANA Studio (data modeling and management tool)
3. R analytics engine and text mining packages
4. Java-based middleware
5. Flex-based user interface (browser and mobile support)
6. Crystal Reports Reporting
7. Active Directory Integration
Pretty impressive…so…in order to provide a better understanding…I asked the team a couple of questions…
Blag: Can you tell us why your team built this project?
Team: Our team built this project as part of an academic requirement for Harvard Extension’s Information Management Systems Capstone class. This project was meant to showcase the group’s capability to produce a well-designed system which takes into consideration functional, technical and operational requirements in an enterprise environment.
Blag: Can you briefly tell us about the application?
Team: The application the group has created is an analytics application that consolidates and analyzes various structured and unstructured product data. The goal of the project is to provide a platform that gives users real-time access to, and correlation and analysis of, product issue data. Specifically, the application focuses on the following use cases:
- Providing on-site technicians with real-time access to issue and solution information to facilitate faster resolution of product issues.
- Providing proactive analysis of user complaints in terms of product risk probabilities to reduce product recall instances and frequency.
- Providing a means to analyze unstructured data from business units such as complaints resolution, maintenance reports, knowledge bases, product documentation and quality assurance reports.
- Providing help desk with an easily searchable central repository of correlated information to assist with product and issue inquiries.
- Providing visibility to various business units regarding customer complaints and issues to facilitate better product development and enhancements.
Blag: Why did you choose SAP HANA?
Team: SAP HANA was chosen primarily for the following reasons:
- The performance benefits of using an in-memory database. SAP HANA was obviously one of the leading technologies in this area.
- The built-in integration of R. As the project is primarily an analytics project, the flexibility and the wide variety of data mining packages of R were essential for the project.
- The ability to embed R scripts in SQL. The group felt that the ease and flexibility of this approach would make development more efficient and make it easier to add new analytics functionality.
- The column store structure provided by SAP HANA, which is conducive to the unstructured data used in the project.
Blag: Which cloud provider did you use?
Team: Amazon Web Services was used for the project.
Blag: Can you tell us about your experience using the integrated R side of SAP HANA?
Team: The built-in integration of R is one of the strengths of SAP HANA that was essential to the project. Since the project is essentially an analytics and data mining system, having R scripts integrate with SQL scripts made extracting and analyzing datasets very efficient. The combination of R and its wealth of data mining packages, data sets that are processed in memory and a database environment tailored for unstructured text makes R + SAP HANA an ideal environment for the project. For this project, our team used one of the natural language processing libraries within R to extract the common phrases from the unstructured data column. It was implemented in just a few lines.
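To give a rough idea of what "extracting the common phrases" from a text column can look like, here is a minimal sketch in Python. It is not the team's R code and does not use the R library they chose (the interview does not name it); it simply counts two-word phrases with the standard library. The function name `common_phrases` and the sample complaints are made up for illustration.

```python
import re
from collections import Counter


def common_phrases(texts, n=2, top=3):
    """Return the `top` most frequent n-word phrases across `texts`."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep only word-like tokens.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Slide a window of n words over the token list.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(top)


# Hypothetical sample of an unstructured complaints column.
complaints = [
    "Battery drains fast after update",
    "After update the battery drains fast",
    "Screen flickers and battery drains fast",
]
phrases = common_phrases(complaints)
```

A real text mining package would typically also remove stop words and normalize word forms, which this sketch skips to stay short.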
Blag: How did you implement the fuzzy search on SAP HANA?
Team: The fuzzy search was implemented using the text engine that is embedded in SAP HANA, which makes it usable inside a SQL query. The fuzzy search can simply be included in a “SELECT” statement using the “CONTAINS()” function with the “FUZZY()” option in the “WHERE” clause. For the purpose of this project, only the minimal options were used in the fuzzy search.
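The shape of such a query can be sketched as follows. This is only an illustration of the “CONTAINS()” with “FUZZY()” pattern the team describes, not their actual query: the helper `fuzzy_search_sql`, the table `PRODUCT_ISSUES`, the column `DESCRIPTION` and the 0.8 threshold are all assumptions. The helper just builds the SQL text and leaves the search term as a `?` placeholder; whether your driver accepts a bound parameter in that position is something to verify against your own setup.

```python
import re


def fuzzy_search_sql(table, column, threshold=0.8):
    """Build a SELECT that fuzzy-matches `column` against one bound search term."""
    # Identifiers cannot be bound as parameters, so validate them strictly.
    for name in (table, column):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # The search term is left as a `?` placeholder for the database driver to bind.
    return (
        f"SELECT * FROM {table} "
        f"WHERE CONTAINS({column}, ?, FUZZY({threshold}))"
    )


# Hypothetical table and column names, for illustration only.
sql = fuzzy_search_sql("PRODUCT_ISSUES", "DESCRIPTION", 0.8)
```

A higher threshold demands closer matches, so lowering it makes the search more tolerant of typos at the cost of more false hits.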
Blag: Can you tell us about your whole experience using SAP HANA, R, Flex and Java?
Team: As mentioned above, SAP HANA + R was essential for the project. Flex was used not only to provide a clean, sleek look but also to increase the usability of the application by adding drag-and-drop functionality on both desktop and mobile. Java was primarily used as middleware to allow for expandability of the solution. One of our team members had experience with Flex + Java, and connectivity to SAP HANA was easily implemented using JDBC. The only issue we came across was embedding R in HANA, but we found a tutorial that Alvaro Tejado Galindo wrote on how to embed R in HANA, which was very informative.
Blag: Would you recommend SAP HANA?
Team: Yes, we would definitely recommend SAP HANA, particularly for analytics applications. The integrated R framework plus the speed of an in-memory environment provides tremendous possibilities, particularly in real-time data analysis, not seen before in traditional implementations.
Enough said…their project was graded the best of the semester and also raised the bar for future semesters.
So after all this talking, I’m sure you want to see the application in action…gladly, the team provided a video demonstration, so please…enjoy…
Before I finish…I would like to thank Greg, Hung, Julio, Michael and Ryan for allowing me to share their story…a story where SAP HANA is a big player.