Over the last month, I’ve been testing the connectivity of SAP Lumira to Cloudera, and I thought I would document the steps to show how easy it is. No scripting is required:
SAP Lumira allows you to connect to Cloudera Distribution for Hadoop (CDH). Once connected, you can do ad-hoc visualization of the data so you can see patterns and outliers. You can’t get value from your data unless you can see what’s inside of it.
There are two ways to connect: using Apache Hadoop Hive or using Cloudera Impala.
In this blog, Apache Hadoop Hive is used. For Cloudera Impala, please click here
1. Launch SAP Lumira:
2. Acquire Data Or File > New Query with SQL
3. Select Apache Hadoop Hive 0.13 Simba …, then click Next
4. Enter your security credentials, server, and port, then click Connect
6. Click and expand the nodes in the catalog to view the tables you have in your CDH
- You can also search for a table name from the text box
7. To select all the data in a certain table, click on the table and the Hive script will be generated automatically
- To filter, join, or union the data or tables, you can modify the generated script or write your own Hive statements
8. Click on Create
- The data set is now in SAP Lumira, and the data can be prepared, visualized, etc.
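
As a rough illustration of the Hive statements mentioned in step 7, here are a few sketches. The table and column names (sales, customers, sales_archive, region and so on) are hypothetical and only for illustration, and the exact script that SAP Lumira generates when you click a table may look different. Each statement is meant to be used on its own in the query box.

```sql
-- Hypothetical tables: sales, customers, sales_archive

-- Select all the data in one table
-- (similar in spirit to what is generated when you click a table)
SELECT * FROM sales

-- Filter: keep only the rows for one region
SELECT * FROM sales WHERE region = 'EMEA'

-- Join: combine sales rows with the matching customer names
SELECT s.order_id, s.amount, c.customer_name
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id

-- Union: stack current and archived sales
-- (wrapped in a subquery, which older Hive versions require for UNION ALL)
SELECT u.order_id, u.amount
FROM (
  SELECT order_id, amount FROM sales
  UNION ALL
  SELECT order_id, amount FROM sales_archive
) u
```

Filtering or aggregating in the Hive statement, rather than after acquisition, should reduce the amount of data pulled into SAP Lumira.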