Over the last month, I've been testing the connectivity of SAP Lumira to Cloudera, and I thought I'd document the steps to show how easy it is, with no scripting required:
SAP Lumira allows you to connect to Cloudera Distribution for Hadoop (CDH). Once connected, you can do ad-hoc visualization of the data so you can see patterns and outliers. You can’t get value from your data unless you can see what’s inside of it.
There are two ways to connect: using Cloudera Impala or using Apache Hadoop Hive.
This blog uses Cloudera Impala. For Apache Hadoop Hive, please click here
1. Launch SAP Lumira:
2. Acquire Data or File >> New >> Query with SQL
3. Select Cloudera Impala 1.0 Simba ….etc >> Next
4. Enter your security credentials, server and port, and hit Connect
5. Initial screen
6. Click and expand the nodes in the catalog to view the tables you have in your CDH
- You can also search for a table name from the text box
7. To select all the data in a certain table, click on the table and the Impala script will be generated automatically
- To filter / join / union / group the data or tables, you can modify or write Impala statements
8. Click on Create
- The data set is now in SAP Lumira, and the data can be prepared, visualized, etc.
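To make step 7 more concrete, here is a minimal sketch of the difference between a select-all statement and a hand-modified one that filters, joins and groups. It is only an illustration: an in-memory SQLite database stands in for the Impala connection so the example runs anywhere, the `orders` and `customers` tables and their columns are hypothetical, and the select-all is an assumption about what the generated script roughly looks like rather than Lumira's exact output.

```python
import sqlite3

# In-memory SQLite stands in for Impala here; tables and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO orders VALUES (1, 10, 250.0), (2, 11, 75.5), (3, 10, 120.0);
    INSERT INTO customers VALUES (10, 'EMEA'), (11, 'APAC');
""")

# Assumed shape of the statement produced by clicking a table: select everything.
select_all = "SELECT * FROM orders"

# A hand-modified statement that filters, joins and groups in one query.
modified = """
    SELECT c.region, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.amount > 100
    GROUP BY c.region
    ORDER BY c.region
"""

all_rows = conn.execute(select_all).fetchall()
grouped_rows = conn.execute(modified).fetchall()

print(all_rows)      # every row of the orders table
print(grouped_rows)  # one row per region, totals for orders above 100
```

In Lumira itself only the SQL text matters: a statement along the lines of `modified` would be typed into the query box in place of the generated one before clicking Create.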