If you store your enterprise transactional data in an SAP HANA database and want to apply predictive or machine learning algorithms to its tables or views using RapidMiner, this blog is for you. It gives the steps to connect to an SAP HANA database from RapidMiner, retrieve data from tables, views, or procedures, and apply RapidMiner's statistical algorithms or machine learning techniques to gain insights from the data.
Background and use case:
RapidMiner is a data science platform for teams that unites data prep, machine learning, text mining, and predictive model deployment. It is used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development, and it supports all steps of the machine learning process, including data preparation, results visualization, model validation, and optimization.
SAP HANA is an in-memory, column-oriented, relational database management system developed and marketed by SAP SE. Its primary function as a database server is to store and retrieve data as requested by the applications.
With structured and unstructured data on the SAP HANA 2.0 data platform, I wanted to build proofs of concept for text analysis, market basket analysis, customer segmentation, and other data science use cases using RapidMiner Studio.
The step-by-step procedure below shows how to connect to HANA DB tables, views, and procedures from RapidMiner Studio.
SAP HANA DB & RapidMiner integration steps:
- Prerequisites: you need the following before trying RapidMiner models on HANA data.
  - HANA client tools, which install the required drivers on the client machine.
  - RapidMiner Studio client, the interface for developing RapidMiner models.
  - A user account in the HANA DB with read access to the DB objects.
  - Optional: SAP HANA Studio or Web IDE, to analyze or preview SAP HANA data.
- Once you install the HANA client tools, you will find the ngdbc.jar file in the HANA client installation folder.
Copy this ngdbc.jar file into the following RapidMiner installation folder:
C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc
- Open RapidMiner Studio and create a database driver.
Go to Manage Database Drivers and enter the following parameters:
Name of driver: HANA_JDBC
Jar file: C:\Program Files\RapidMiner\RapidMiner Studio\lib\jdbc\ngdbc.jar
Port: 30115
Driver class: com.sap.db.jdbc.Driver
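The port 30115 above follows SAP HANA's usual SQL port pattern, 3<instance number>15, so it corresponds to instance 01. Here is a minimal Java sketch of that arithmetic. It assumes a single-container system or the first tenant database; other tenants can be assigned different ports, so confirm the value with your HANA administrator. The class and method names are illustrative only.

```java
// Sketch only: derives the conventional SAP HANA SQL port from an instance number.
// Assumption: single-container system or first tenant database (pattern 3<NN>15).
final class HanaSqlPort {

    /** Returns 3<NN>15 for a two-digit instance number NN (0 to 99). */
    static int forInstance(int instanceNumber) {
        if (instanceNumber < 0 || instanceNumber > 99) {
            throw new IllegalArgumentException("instance number must be 0..99");
        }
        return 30000 + instanceNumber * 100 + 15;
    }

    public static void main(String[] args) {
        // Instance 01 gives the port used in this post.
        System.out.println(HanaSqlPort.forInstance(1)); // 30115
    }
}
```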
- Now go to Manage Database Connections and create a database connection with the following parameters. Select the previously created driver (HANA_JDBC) as the database system, then enter the HANA system host, port, user ID, and password.
Run the connection test; you should see a success message.
You can now use tables or views, or write a SQL script, to retrieve the data.
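If the connection test fails, it can help to try the same driver and credentials outside RapidMiner Studio. The sketch below is one possible way to do that with plain JDBC, not part of the RapidMiner setup itself. It assumes ngdbc.jar is on the Java classpath, and the environment variable names (HANA_HOST, HANA_PORT, HANA_USER, HANA_PASSWORD) are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch only: checks a HANA JDBC connection outside RapidMiner Studio.
// Assumptions: ngdbc.jar is on the classpath; the HANA_* environment
// variable names are hypothetical.
final class HanaConnectionCheck {

    /** Builds the JDBC URL in the form jdbc:sap://<host>:<port>. */
    static String buildUrl(String host, int port) {
        return "jdbc:sap://" + host + ":" + port;
    }

    public static void main(String[] args) {
        String host = System.getenv("HANA_HOST");
        if (host == null) {
            System.out.println("Set HANA_HOST, HANA_PORT, HANA_USER and HANA_PASSWORD first.");
            return;
        }
        try {
            // Same driver class as entered in Manage Database Drivers.
            Class.forName("com.sap.db.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            System.out.println("ngdbc.jar is not on the classpath.");
            return;
        }
        String portText = System.getenv().getOrDefault("HANA_PORT", "30115");
        String url = buildUrl(host, Integer.parseInt(portText));
        try (Connection conn = DriverManager.getConnection(
                     url, System.getenv("HANA_USER"), System.getenv("HANA_PASSWORD"));
             Statement stmt = conn.createStatement();
             // DUMMY is HANA's built-in one-row table.
             ResultSet rs = stmt.executeQuery("SELECT CURRENT_USER FROM DUMMY")) {
            while (rs.next()) {
                System.out.println("Connected as " + rs.getString(1));
            }
        } catch (SQLException e) {
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```

A failure here points to the host, port, credentials, or network rather than to the RapidMiner configuration.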
- Now use the Read Database operator, with the database connection created above, in a process to import the data, then process the data using RapidMiner models.
In the Read Database parameters, you can select tables or views, or write a custom HANA SQL query. Example: the DW.COMPANY_DIM table.
Writing a custom SQL query: with this option you can write your own HANA SQL query to pull only the required fields, aggregate data, and apply filters.
Once the Read Database operator has retrieved the data, process it using the available operators that suit your ML or predictive use case.
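To make the custom query option concrete, here is a small Java sketch that composes a query combining a field selection, an aggregate, and a filter, of the kind that could be pasted into the query field of Read Database. The table DW.COMPANY_DIM comes from the example above; the column names COUNTRY and ACTIVE_FLAG are hypothetical, so replace them with columns that exist in your table.

```java
// Sketch only: composes a custom SQL query for the Read Database operator.
// Assumption: COUNTRY and ACTIVE_FLAG are hypothetical columns of DW.COMPANY_DIM.
final class CompanyDimQuery {

    /** Counts rows per group column, keeping only rows where filterColumn equals filterValue. */
    static String countPerGroup(String table, String groupColumn,
                                String filterColumn, String filterValue) {
        // Double any single quotes so the value stays a valid SQL string literal.
        String literal = filterValue.replace("'", "''");
        return "SELECT " + groupColumn + ", COUNT(*) AS COMPANY_COUNT"
             + " FROM " + table
             + " WHERE " + filterColumn + " = '" + literal + "'"
             + " GROUP BY " + groupColumn
             + " ORDER BY COMPANY_COUNT DESC";
    }

    public static void main(String[] args) {
        // Prints the finished statement so it can be copied into RapidMiner.
        System.out.println(countPerGroup("DW.COMPANY_DIM", "COUNTRY", "ACTIVE_FLAG", "Y"));
    }
}
```

Pushing the aggregation and filter into the query means HANA does that work in memory and RapidMiner receives only the reduced result set.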
Joseph Reddy Y