
Carsten Mönning and Waldemar Schiller


Part 1 - Single node Hadoop on Raspberry Pi 2 Model B (~120 mins), http://bit.ly/1dqm8yO
Part 2 - Hive on Hadoop (~40 mins), http://bit.ly/1Biq7Ta

Part 3 - Hive access with SAP Lumira (~30 mins)
Part 4 - A Hadoop cluster on Raspberry Pi 2 Model B(s) (~45 mins), http://bit.ly/1eO766g

Part 3 - Hive access with SAP Lumira (~30 mins)


In the first two parts of this blog series, we installed Apache Hadoop 2.7.2 and Apache Hive 1.1.0 on a Raspberry Pi 2 Model B, i.e. a single node Hadoop 'cluster'. This proved surprisingly straightforward: the Hadoop principle allows for all sorts of commodity hardware, and HDFS, MapReduce and Hive run just fine on top of the Raspbian operating system. We demonstrated some basic HDFS and MapReduce processing capabilities by word counting the Apache Hadoop license file with the help of the word count programme, a standard element of the Hadoop jar file. By uploading the result file into Hive's managed data store, we also experimented a little with HiveQL via the Hive command line interface and queried the contents of the word count result file.


In this Part 3 of the blog series, we will pick up things at exactly this point by replacing the HiveQL command line interaction with a standard SQL layer over Hive/Hadoop in the form of the Apache Hive connector of the SAP Lumira desktop trial edition. We will be interacting with our single node Hadoop/Hive setup just like any other SAP Lumira data source and will be able to observe the actual SAP Lumira-Hive server interaction on our Raspberry Pi in the background. This will be illustrated using the word count result file example produced in Parts 1 and 2.


Preliminaries

Apart from having worked your way through the first two parts of this blog series, you will need to get hold of the latest SAP Lumira desktop trial edition at http://saplumira.com/download/ and operate the application on a dedicated (Windows) machine locally networked with your Raspberry Pi.


If interested in details regarding SAP Lumira, you may want to have a look at [1] or the SAP Lumira tutorials at http://saplumira.com/learn/tutorials.php.


Hadoop & Hive server daemons

Our SAP Lumira queries of the word count result table created in Part 2 will interact with the Hive server operating on top of the Hadoop daemons. So, to kick off things, we need to launch those Hadoop and Hive daemon services first.


Launch the Hadoop server daemons from your Hadoop sbin directory. Note that I chose to rename the standard Hadoop directory to "hadoop" in Part 1, so you may have to replace the directory path below with whatever Hadoop directory name you chose to set (or chose to keep).


          /opt/hadoop/sbin/start-dfs.sh

          /opt/hadoop/sbin/start-yarn.sh


Similarly, launch the Hive server daemon from your Hive bin directory, again paying close attention to the actual Hive directory name set in your particular case.

     /opt/hive/bin/hiveserver2


The Hadoop and Hive servers should be up and running now and ready for serving client requests. We will submit these (standard SQL) client requests with the help of the SAP Lumira Apache Hive connector.
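Before moving over to the Windows machine, it can be worth confirming that the daemons really are up. The following is a minimal sketch only: port_open is a hypothetical helper of our own, not part of Hadoop or Hive, and it assumes bash and the coreutils timeout command are available on the Raspberry Pi.

```shell
#!/usr/bin/env bash
# port_open is a hypothetical helper (not part of Hadoop or Hive).
# It returns 0 if a TCP connection to host:port can be opened, non-zero otherwise.
port_open() {
  local host="$1" port="$2"
  # bash's /dev/tcp pseudo-device opens a TCP connection;
  # timeout stops the check from hanging on an unreachable host.
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage on the Raspberry Pi (commented out so that sourcing this file has no side effects):
#   jps    # JDK tool listing the running Java processes, e.g. NameNode, DataNode, ResourceManager, NodeManager
#   port_open localhost 10000 && echo "Hive server is listening on port 10000"
```

Running the same port_open check from another Linux machine in the local network, with the IP address of the Raspberry Pi in place of localhost, would also tell you whether a firewall is getting in the way.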

SAP Lumira installation & configuration

Launch the SAP Lumira installer downloaded earlier on your dedicated Windows machine. Make sure the machine is sharing a local network with the Raspberry Pi device with no prohibitive firewall or port settings activated in between.

The Lumira Installation Manager should go smoothly through its motions as illustrated by the screenshots below.

On the SAP Lumira start screen, activate the trial edition by clicking the launch button in the bottom right-hand corner. When done, your home screen should show the number of trial days left, see also the screenshot below. Advanced Lumira features such as the Apache Hive connector will not be available to you if you do not activate the trial edition by starting the 30-day trial period.


With the Hadoop and Hive services running on the Raspberry Pi and the SAP Lumira client running on a dedicated Windows machine within the same local network, we are all set to put a standard SQL layer on top of Hadoop in the form of the Lumira Apache Hive connector.

Create a new file and select "Query with SQL" as the source for the new data set.

Select the "Apache Hadoop Hive 0.13 Simba JDBC HiveServer2 - JDBC Drivers" in the subsequent configuration screen.

Enter the Hadoop user (here: "hduser") and password chosen in Part 1 of this blog series as well as the IP address of your Raspberry Pi in your local network. Append the Hive server port number 10000 to the IP address (see Part 2 for details on some of the most relevant Hive port numbers).
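The same connection details can be tried out from a command line with Beeline, the JDBC client shipped in the Hive bin directory, which helps to tell a Lumira configuration problem from a network or Hive problem. This is a sketch under assumptions: hive_jdbc_url is a hypothetical helper, the IP address 192.168.0.10 is a placeholder for your own Raspberry Pi, and "default" is assumed to be the Hive database in use.

```shell
#!/usr/bin/env bash
# hive_jdbc_url is a hypothetical helper that assembles a HiveServer2 JDBC URL.
# Arguments: host, port (defaults to 10000), database (defaults to "default").
hive_jdbc_url() {
  local host="$1" port="${2:-10000}" db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

# Usage (192.168.0.10 is a placeholder IP address, replace <password> with your own):
#   /opt/hive/bin/beeline -u "$(hive_jdbc_url 192.168.0.10)" -n hduser -p '<password>'
```

If Beeline connects with these values, the host, port, user and password entered in the Lumira connection dialogue should work as well.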

If everything is in working order, you should be shown the catalog view of your local Hive server running on Raspberry Pi upon pressing "Connect".

In other words, connectivity to the Hive server has been established and Lumira is awaiting your free-hand standard SQL query against the Hive database. A simple 'select all' against the word count result Hive table created in Part 2, for example, means that the full result data set will be uploaded into Lumira for further local processing.
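For illustration, such a 'select all' could look as follows. Treat this as a sketch: select_all_sql is a hypothetical helper and "wordcount" is a placeholder, so substitute the table name you actually chose in Part 2. The optional row limit is our own addition for previewing a few rows first.

```shell
#!/usr/bin/env bash
# select_all_sql is a hypothetical helper that prints a 'select all' statement.
# Arguments: table name, optional row limit.
select_all_sql() {
  local table="$1" limit="${2:-}"
  if [ -n "$limit" ]; then
    printf 'SELECT * FROM %s LIMIT %s\n' "$table" "$limit"
  else
    printf 'SELECT * FROM %s\n' "$table"
  fi
}

# Usage: paste the printed statement into the Lumira query editor, e.g.
#   select_all_sql wordcount        # "wordcount" is a placeholder table name
#   select_all_sql wordcount 10     # preview of the first ten rows only
```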

Although this might not seem all that mightily impressive to the undiscerning, remind yourself of what Parts 1 and 2 taught us about the things actually happening behind the scenes. More specifically, rather than launching a MapReduce job directly within our Raspberry Pi Hadoop/Hive environment to process the word count data set on Hadoop, we launched a HiveQL query and its subsequent MapReduce job using standard SQL pushed down to the single node Hadoop 'cluster' with the help of the SAP Lumira Hive connector.

Since the Hive server writes its output to standard out, we can actually observe the MapReduce job processing of our SQL query in the terminal on the Raspberry Pi.


An example (continued)

We already followed up on the word count example built up over the course of the first two blog posts by showing how to upload the word count result table sitting in Hive into the SAP Lumira client environment. With the word count data set fully available within Lumira now, the entire data processing and visualisation capabilities of the Lumira trial edition are available to you to visualise the word count results.

By way of inspiration, you may, for example, want to cleanse the license file data in the Lumira data preparation stage first by removing any punctuation data from the Lumira data set so as to allow for a proper word count visualisation in the next step.
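To make the idea of this cleansing step concrete, here is what it amounts to when expressed on the command line rather than in Lumira. This is not how Lumira does it internally, merely a sketch: clean_wordcount is a hypothetical helper, and it assumes the word count result is available as tab-separated "word<TAB>count" lines.

```shell
#!/usr/bin/env bash
# clean_wordcount is a hypothetical helper illustrating the cleansing step.
# It reads "word<TAB>count" lines on stdin, strips punctuation from the words
# and sums the counts of words that become identical as a result.
clean_wordcount() {
  # tr removes punctuation only; tabs and the digits of the counts are untouched.
  tr -d '[:punct:]' |
    awk -F'\t' '
      $1 != "" { total[$1] += $2 }   # skip entries that consisted of punctuation only
      END { for (w in total) printf "%s\t%d\n", w, total[w] }
    ' |
    LC_ALL=C sort
}

# Usage (the file name is a placeholder for your own word count result file):
#   clean_wordcount < wordcount_result.txt
```

Entries such as "the" and "the," collapse into a single word with a combined count, which is the effect we are after before visualising the data.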

With the word count data properly cleansed, the powerful Lumira visualisation capabilities can be applied freely to the data set, for example by way of a word count aggregate measure as shown immediately below.

Let's conclude this part with some Lumira visualisation examples.

In the next and final blog post, we will complete our journey from a non-assembled Raspberry Pi 2 Model B bundle kit via a single node Hadoop/Hive installation to a 'fully-fledged' Raspberry Pi Hadoop cluster. (It will be a two-node cluster only, but it will do just fine to showcase the principle.)

Links

SAP Lumira desktop trial edition - http://saplumira.com/download/

SAP Lumira tutorials - http://saplumira.com/learn/tutorials.php
A Hadoop data lab project on Raspberry Pi - Part 1/4 - http://bit.ly/1dqm8yO
A Hadoop data lab project on Raspberry Pi - Part 2/4 - http://bit.ly/1Biq7Ta

A Hadoop data lab project on Raspberry Pi - Part 4/4 - http://bit.ly/1eO766g

A BOBI document dashboard with Raspberry Pi - http://bit.ly/1Mv2Rv5

References

[1] C. Ah-Soon and P. Snowdon, "Getting Started with SAP Lumira", SAP Press, 2015
