As I promised in my previous post, I am sharing here my Hadoop/Hive installation on my local machine using the Windows 10 Bash shell.

I will show you my setup and versions of Hadoop and Hive. In the next blog, I will show my local SAP HANA Express Edition connecting to Hadoop/Hive using SDA.

To proceed, make sure you have Bash on Ubuntu installed and running on your Windows 10 machine.

I also have the new SAP HANA Express Edition 2.0 installed on the same hardware, followed by SAP HANA Studio, the SAP IDE for HANA (new, web-based, running on XS Advanced), SAP EA Design, and SAP SHINE.

For more information about SAP HXE check here

So, let’s start:

  1. Apache Hadoop installation
    • Create Hadoop Group and User
    • Add hduser as sudoers
    • Generate SSH key for hduser
    • Installing Java on the Linux Ubuntu box
    • Installing MySQL
    • Hadoop installation
    • Changing bashrc file for hduser
    • Additional Hadoop folders
    • Setup of Hadoop startup files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and hadoop-env.sh
    • Formatting the Hadoop HDFS file system 
    • Starting up Hadoop services
    • Hadoop Web Interfaces
  2. Apache Hive Installation
    • Downloading and installing Apache Hive
    • Configuring MySQL Metastore for Hive
    • Creating hive-site.xml
    • HDFS commands to create HIVE directories
    • Starting Hive console
    • Running HiveServer2 and Beeline

Bash on Windows 10:

1. Apache Hadoop installation

Create a new Hadoop group and user. The new user hduser will require a password (of course):

sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser

Add hduser to the sudoers list to allow root permissions:

cd /etc
sudo vi sudoers

Add the following line:

hduser ALL=(ALL:ALL) ALL

Generate an SSH key for hduser with an empty passphrase and append the public key to the authorized_keys file. You need to be logged in as hduser:

su -l hduser

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
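
Hadoop's start scripts log in to localhost over SSH, so it is worth confirming that the key really ended up in authorized_keys. The following is only a minimal sketch: the helper name key_is_authorized is my own, and it assumes the default key file names used above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the (single-line) public key stored
# in file $1 appears verbatim as a line of the authorized_keys file $2.
key_is_authorized() {
    pub_file=$1
    auth_file=$2
    # Both files must be readable and the key file must not be empty.
    [ -s "$pub_file" ] && [ -r "$auth_file" ] || return 1
    # -F fixed string, -x whole-line match, -q quiet
    grep -qxF -e "$(cat "$pub_file")" "$auth_file"
}

if key_is_authorized "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"; then
    echo "public key is authorized"
else
    echo "public key is NOT in authorized_keys"
fi
```

If the key is present but "ssh localhost" still asks for a password, tightening permissions with "chmod 700 ~/.ssh" and "chmod 600 ~/.ssh/authorized_keys" is a common fix, since sshd in its default strict mode ignores key files that other users can write to.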

Installing Java on Linux Ubuntu:
I have Java 8 installed.

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

To confirm Java is installed:

java -version
java version "1.8.0_111"  > Shows my installation


Installing MySQL on Linux Ubuntu: I use MySQL for the Hive metastore database. I had issues on Ubuntu 14.04 (Trusty) under Bash on Windows, but succeeded installing on 16.04 (Xenial).

sudo apt install mysql-server mysql-client
sudo apt-get install libmysql-java

To test MySQL installation:

sudo service mysql start

To check the status:

sudo /etc/init.d/mysql status

To connect to localhost (you need the password set during installation):

mysql -u root -p --host=localhost

Key: a successful connection is required here. To check the status, use “sudo /etc/init.d/mysql status”.

To stop MySQL server use:

sudo /etc/init.d/mysql stop

 

Hadoop installation: I installed Hadoop 2.7.3. More information can be found at Apache Hadoop. I downloaded the installation file from a mirror website into /usr/local. After unzipping, I renamed the folder to hadoop just to simplify things:

cd /usr/local

sudo wget http://apache.mirror.rafal.ca/hadoop/common/stable2/hadoop-2.7.3.tar.gz

sudo tar -xzvf hadoop-2.7.3.tar.gz

sudo mv hadoop-2.7.3 hadoop

The bashrc file for hduser needs changing to add the paths for Hadoop and Java:

vi ~/.bashrc

# Hadoop
export HADOOP_PREFIX=/usr/local/hadoop

#Hadoop bin/ directory to path
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin

# java home
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:/usr/share/java
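
After reloading the file (for example with ". ~/.bashrc"), a quick check that the Hadoop directories actually landed on the PATH can save some head-scratching later. This is a small sketch only; path_contains is a hypothetical helper of mine, and the default value of HADOOP_PREFIX simply mirrors the one used in this post.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if directory $1 is one of the
# colon-separated entries of the PATH-style string $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed default, matching the value exported in ~/.bashrc above.
: "${HADOOP_PREFIX:=/usr/local/hadoop}"

for dir in "$HADOOP_PREFIX/bin" "$HADOOP_PREFIX/sbin"; do
    if path_contains "$dir" "$PATH"; then
        echo "ok:      $dir"
    else
        echo "missing: $dir"
    fi
done
```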

Hadoop needs additional folders in my setup:

cd /usr/local/hadoop
sudo mkdir input
sudo mkdir -p tmp
sudo mkdir logs
sudo chown hduser:hadoop /usr/local/hadoop/input
sudo chown hduser:hadoop /usr/local/hadoop/tmp
sudo chown hduser:hadoop /usr/local/hadoop/logs

Hadoop startup files: the following Hadoop startup files need to be set up. My installation files are located at /usr/local/hadoop/etc/hadoop. I use the vi editor; for more information about vi commands, check here.

cd /usr/local/hadoop/etc/hadoop

sudo vi core-site.xml

Then add/change the following lines:

<configuration>

<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

</configuration>

sudo vi hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>localhost:50090</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/input/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/usr/local/hadoop/input/hdfs/datanode</value>
</property>
</configuration>

sudo cp mapred-site.xml.template mapred-site.xml

sudo vi mapred-site.xml

<configuration>

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If “local”, then jobs are run in-process as a single map
and reduce task.
</description>
</property>

</configuration>

sudo vi yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

</configuration>
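
With four XML files edited by hand, a typo in a property name or value is easy to miss. Below is a rough sketch of a reader that prints the value of one property from a *-site.xml file. The function name get_prop is hypothetical, and the parsing assumes each <name> and <value> element sits on a single line, as in the files above; it is not a real XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows <name>$2</name>
# in the Hadoop-style XML file $1. Assumes one element per line.
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            val = $0
            sub(/.*<value>/, "", val)
            sub(/<\/value>.*/, "", val)
            print val
            exit
        }
    ' "$1"
}

# Assumed location, matching the directory used in this post.
conf_dir=${HADOOP_CONF_DIR:-/usr/local/hadoop/etc/hadoop}
if [ -r "$conf_dir/core-site.xml" ]; then
    echo "fs.default.name = $(get_prop "$conf_dir/core-site.xml" fs.default.name)"
fi
```

Once Hadoop is on the PATH, "hdfs getconf -confKey fs.default.name" reports the value Hadoop itself resolved, which is the more authoritative check.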

hadoop-env.sh. This is very important! In my setup I am changing the SSH port from 22 to 60022. It took me a while to figure out that I had to add an additional parameter here:

sudo vi hadoop-env.sh

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:/usr/share/java
..
export HADOOP_SSH_OPTS="-p 60022"
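
The port given in HADOOP_SSH_OPTS has to match the port sshd actually listens on. As a sketch, the snippet below reads the Port line from the SSH server configuration and prints the matching export line. The helper name sshd_port is my own, and /etc/ssh/sshd_config is assumed to be the configuration file in use.

```shell
#!/bin/sh
# Hypothetical helper: print the port set in the sshd_config-style
# file $1, falling back to 22 when no active Port line is present.
sshd_port() {
    port=$(awk 'tolower($1) == "port" { print $2; exit }' "$1" 2>/dev/null)
    echo "${port:-22}"
}

cfg=/etc/ssh/sshd_config   # assumed default location
if [ -r "$cfg" ]; then
    echo "sshd port: $(sshd_port "$cfg")"
    echo "line for hadoop-env.sh:"
    echo "export HADOOP_SSH_OPTS=\"-p $(sshd_port "$cfg")\""
fi
```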

 

Formatting the Hadoop HDFS file system

Before formatting the HDFS file system, the file “/etc/hosts” needs the current hostname added:

sudo vi /etc/hosts

Right beside ‘localhost’, add the hostname of your Bash on Windows 10 installation.

In my case, that’s my laptop’s name:

127.0.0.1 localhost MY-ASUS-LAPTOP  

Another example: on a machine named FERNANDO-PC, run sudo vi /etc/hosts and add FERNANDO-PC on the same line.
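
To confirm the hostname entry before formatting, a small check along these lines can help. It is a sketch only: hosts_has_name is a hypothetical helper, and it just looks for the name among the aliases on any non-comment line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if hostname $2 appears as a name or
# alias on a non-comment line of the hosts-style file $1.
hosts_has_name() {
    awk -v h="$2" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # ignore trailing comments
                if ($i == h) found = 1
            }
        }
        END { exit !found }
    ' "$1"
}

# Fall back to "localhost" if the hostname command is unavailable.
name=$(hostname 2>/dev/null || echo localhost)
if hosts_has_name /etc/hosts "$name"; then
    echo "$name is present in /etc/hosts"
else
    echo "$name is missing from /etc/hosts"
fi
```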

cd /usr/local/hadoop/input

/usr/local/hadoop/bin/hadoop namenode -format

This creates the following folder:
../input/hdfs -> it will host datanode and namenode sub-folders

 

Starting up Hadoop services: First, make sure the SSH service is up. In my case, I always use hduser:

su -l hduser

sudo /etc/init.d/ssh status

If it is not running, run ‘sudo /etc/init.d/ssh start’. SSH needs to be up before starting Hadoop services.

 

Finally, we start Hadoop services in the following order:

start-dfs.sh
start-yarn.sh

If everything is fine, we can check the services using the ‘jps’ command. You should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (plus Jps itself).
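
Rather than eyeballing the jps output each time, the check can be scripted. The sketch below assumes the usual five daemons of a single-node Hadoop 2.x setup started with start-dfs.sh and start-yarn.sh; missing_daemons is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read jps-style output ("<pid> <name>") on stdin
# and print the name of every expected Hadoop daemon that is absent.
missing_daemons() {
    awk '
        BEGIN {
            n = split("NameNode DataNode SecondaryNameNode ResourceManager NodeManager", want, " ")
        }
        { seen[$2] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}

if command -v jps >/dev/null 2>&1; then
    missing=$(jps | missing_daemons)
    if [ -z "$missing" ]; then
        echo "all Hadoop daemons are running"
    else
        echo "not running:"
        echo "$missing"
    fi
fi
```

When a daemon is missing, the files under /usr/local/hadoop/logs are the first place I would look.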

 

Hadoop Web Interfaces

Hadoop comes with several web interfaces which are by default available at these locations:

http://localhost:8088/    web UI of the YARN ResourceManager

http://localhost:50070/  web UI of the NameNode daemon

http://localhost:50090/  web UI of the SecondaryNameNode

 

2. Apache Hive installation

Downloading and installing Apache Hive

  • Get it from an Apache mirror (see www.apache.org). Download and unzip the file. To make things easier, rename the folder to just ‘hive’.

cd /usr/local

sudo wget http://apache.parentingamerica.com/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz

sudo tar -xzvf apache-hive-1.2.1-bin.tar.gz

sudo mv apache-hive-1.2.1-bin hive

  • create the following entries in ~/.bashrc

vi ~/.bashrc

add the following entries:

export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export HIVE_PREFIX=$HIVE_HOME
export PATH=$PATH:$HIVE_PREFIX/bin

  • Create a soft link for the MySQL connector in the Hive lib directory, or copy the connector jar to the lib folder. This depends on the MySQL installation. In my case I found it here: ./local/hive/lib/
sudo apt-get install libmysql-java
ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar

 

Configure MySQL Metastore for Hive. MySQL needs to be active; if not, start it. The user “hiveuser” needs to be created, to be used later with the SDA connection through SAP HANA Studio.

mysql -u root -p --host=localhost

In the MySQL console, perform the following steps:

mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/local/hive/scripts/metastore/upgrade/mysql/hive-schema-0.14.0.mysql.sql;

mysql> CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
mysql> GRANT all on *.* to 'hiveuser'@localhost identified by 'hivepassword';
mysql> flush privileges;

mysql> quit;

 

Creating hive-site.xml (if not already present). Use the template from the Hive folder:

cd /usr/local/hive/conf

sudo cp hive-default.xml.template hive-site.xml
sudo vi hive-site.xml

Edit the hive-site.xml file as follows:

<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hiveuser</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepassword</value>
<description>password for connecting to mysql server</description>
</property>
</configuration>

Make sure SSH is active.

HDFS commands to create Hive directories

  • Make sure Hadoop services are up
  • Create the Hive directories. Some may already exist, that’s totally fine
  • Grant access to the folders

hadoop fs -mkdir /tmp

hadoop fs -mkdir /user/

hadoop fs -mkdir /user/hive
hadoop fs -mkdir /user/hive/warehouse

hadoop fs -chmod g+w /tmp

hadoop fs -chmod g+w /user/hive/warehouse
hadoop fs -mkdir /tmp/hive
hadoop fs -chmod 777 /tmp/hive
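
Since some of these directories may already exist, the same steps can be made repeatable with the -p option of "hadoop fs -mkdir", which creates parent directories and does not complain about existing ones. This is a sketch under my own assumptions: make_hive_dirs and the HADOOP_CMD override are hypothetical names, the latter added so the commands can be previewed without touching HDFS.

```shell
#!/bin/sh
# HADOOP_CMD is a hypothetical override: set HADOOP_CMD=echo to print
# the commands instead of running them against HDFS.
HADOOP_CMD=${HADOOP_CMD:-hadoop}

# Hypothetical helper: create the Hive directories and set the same
# permissions as the individual commands above.
make_hive_dirs() {
    for dir in /tmp /tmp/hive /user/hive/warehouse; do
        $HADOOP_CMD fs -mkdir -p "$dir" || return 1
    done
    $HADOOP_CMD fs -chmod g+w /tmp /user/hive/warehouse || return 1
    $HADOOP_CMD fs -chmod 777 /tmp/hive
}
```

Running "HADOOP_CMD=echo; make_hive_dirs" first shows the five commands that would be issued; calling make_hive_dirs with the default runs them for real.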

 

Before starting the Hive console, create a soft link for the connector in the Hive lib directory, or copy the connector jar to the lib folder. This will probably be required; it addresses the error message “(“com.mysql.jdbc.Driver”) was not found in the CLASSPATH”. More details here.

ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar

 

Starting Hive console: To start Hive, just type ‘hive’. My setup once in a while fails with a message saying the name node is in safe mode. When that happens, just run the following:

hadoop dfsadmin -safemode leave
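
Forcing the NameNode out of safe mode is only needed when it is actually on, so it can be worth checking first with "hdfs dfsadmin -safemode get". The snippet below is a sketch of that idea; leave_safemode_if_on and the HDFS_CMD override are hypothetical names of mine.

```shell
#!/bin/sh
# HDFS_CMD is a hypothetical override so the logic can be exercised
# without a running cluster.
HDFS_CMD=${HDFS_CMD:-hdfs}

# Hypothetical helper: leave safe mode only when the NameNode reports
# that it is ON; otherwise just say so.
leave_safemode_if_on() {
    state=$($HDFS_CMD dfsadmin -safemode get) || return 1
    case "$state" in
        *"is ON"*) $HDFS_CMD dfsadmin -safemode leave ;;
        *) echo "safe mode already off" ;;
    esac
}
```

The NameNode normally leaves safe mode on its own shortly after startup, so "hdfs dfsadmin -safemode wait", which blocks until that happens, is a gentler alternative to forcing it.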

Other than that, the Hive console should pop up. Try typing “show databases;” for instance.

 

Running HiveServer2 and Beeline

  • HiveServer2 service needs to be started to allow JDBC connections
hive --service hiveserver2 start

Very important: the Bash on Windows session running the HiveServer2 service will be blocked. It took me a while to figure that out. We need to open a new Bash on Windows session and leave that one alone.

Using a new Bash session we can check if HiveServer2 is up and running by simply typing ‘jps’. The process ‘RunJar’ indicates that.

 

  • Beeline console: connect using localhost and Hive port 10000 (the same port used in the DSN file for the SAP SDA connection). In this case, we are only testing whether we can connect to Hive:

beeline -u jdbc:hive2://

jdbc:hive2://> !connect jdbc:hive2://localhost:10000/default

 

In my installation, the user is “hduser”. A password is required.

A few Beeline commands:

show databases;

use <databasename>;

show tables;

select * from <tablename>;
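
The same connection test can also be run non-interactively, which is handy for a quick check from a script. Beeline accepts the JDBC URL with -u, the user with -n and a statement with -e. The sketch below builds the URL from the host, port and database used in this post; hive_jdbc_url, run_hive_sql, BEELINE_CMD and HIVE_USER are hypothetical names of mine, and a password, where one is required, can be supplied with Beeline's -p option.

```shell
#!/bin/sh
# Hypothetical helper: build a HiveServer2 JDBC URL from
# host ($1), port ($2) and database ($3), with this post's defaults.
hive_jdbc_url() {
    echo "jdbc:hive2://${1:-localhost}:${2:-10000}/${3:-default}"
}

# BEELINE_CMD is a hypothetical override for dry runs.
BEELINE_CMD=${BEELINE_CMD:-beeline}

# Hypothetical helper: run one SQL statement ($1) through Beeline
# as HIVE_USER (assumed default: hduser).
run_hive_sql() {
    $BEELINE_CMD -u "$(hive_jdbc_url)" -n "${HIVE_USER:-hduser}" -e "$1"
}
```

For example, run_hive_sql "show databases;" should list the databases if HiveServer2 is listening on port 10000.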

 

That’s all. This means HiveServer2 is running and port 10000 is listening for JDBC/ODBC connections.

 

In the next blog I will explain my SAP SDA connection on my new SAP HXE: SDA to the Hadoop/Hive database using the Simba ODBC driver.

 

Best wishes,

Fernando
