Triple H: Hadoop, Hive, HANA on Windows 10’s Bash Shell (Part-2)
As promised in my previous post, I will share here my Hadoop/Hive installation on my local machine using the Windows 10 Bash shell.
I will show you my setup and the versions of Hadoop and Hive. In the next blog, I will show my local SAP HANA Express Edition connecting to Hadoop/Hive using SDA.
To proceed, make sure you have Bash on Ubuntu on Windows installed and running on your Windows 10 machine.
I also have the new SAP HANA Express Edition 2.0 installed on the same hardware, along with SAP HANA Studio, the SAP Web IDE for SAP HANA (the new web-based IDE running on XS Advanced), SAP EA Designer, and SAP SHINE.
For more information about SAP HXE check here.
So, let's start:
- Apache Hadoop installation
- Create Hadoop Group and User
- Add hduser as sudoers
- Generate SSH key for hduser
- Installing Java on the Linux Ubuntu box
- Installing MySQL
- Hadoop installation
- Changing bashrc file for hduser
- Additional Hadoop folders
- Setup of Hadoop startup files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and hadoop-env.sh
- Formatting the Hadoop HDFS file system
- Starting up Hadoop services
- Hadoop Web Interfaces
- Apache Hive Installation
- Downloading and installing Apache Hive
- Configuring MySQL Metastore for Hive
- Creating hive-site.xml
- HDFS commands to create HIVE directories
- Starting Hive console
- Running HiveServer2 and Beeline
Bash on Windows 10:
1. Apache Hadoop installation
Create a new Hadoop group and user. The new user hduser will require a password (of course):
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
Add hduser to the sudoers list to allow root permissions:
cd /etc
sudo vi sudoers

Add the following line:

hduser ALL=(ALL:ALL) ALL
Generate an SSH key for hduser with an empty passphrase and append it to the authorized_keys file. You need to be logged in as hduser:
su -l hduser
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
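Before moving on, it is worth confirming that key-based login works, since Hadoop's start scripts rely on it. A quick check (add -p 60022 if you have already moved the SSH port, as done later in hadoop-env.sh):

ssh localhost
exit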
Installing Java on Linux Ubuntu:
I have Java 8 installed.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
To confirm Java is installed:

java -version

In my case it returns:

java version "1.8.0_111"
Installing MySQL on Linux Ubuntu: I use MySQL as the Hive metastore database. I had issues with Bash on Windows running Ubuntu 14.04 Trusty; the installation succeeded on 16.04 Xenial.
sudo apt install mysql-server mysql-client
sudo apt-get install libmysql-java
To test the MySQL installation, start the service:

sudo service mysql start
To check the status:
sudo /etc/init.d/mysql status
To connect to localhost (you will need the password set during installation):

mysql -u root -p --host=localhost
Key point: a successful connection is required here. To check the status, use "sudo /etc/init.d/mysql status".
To stop MySQL server use:
sudo /etc/init.d/mysql stop
Hadoop installation: I installed Hadoop 2.7.3; more information can be found at Apache Hadoop. I downloaded the installation file from a mirror website into /usr/local. After unzipping, I renamed the folder to hadoop just to simplify things:
cd /usr/local
sudo wget http://apache.mirror.rafal.ca/hadoop/common/stable2/hadoop-2.7.3.tar.gz
sudo tar -xzvf hadoop-2.7.3.tar.gz
sudo mv hadoop-2.7.3 hadoop
The .bashrc file for hduser needs to be changed to add the Hadoop and Java paths:

vi ~/.bashrc
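The exact entries depend on where everything is installed. For the layout used in this post (Hadoop under /usr/local/hadoop and Oracle Java 8 under /usr/lib/jvm/java-8-oracle), a typical set looks like this; adjust the paths if yours differ:

# Hadoop
export HADOOP_HOME=/usr/local/hadoop
# Hadoop bin/ directory to path
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# java home
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Reload the file with "source ~/.bashrc" so the new variables take effect in the current session.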
Hadoop needs some additional folders as per my setup:
cd /usr/local/hadoop
sudo mkdir input
sudo mkdir -p tmp
sudo mkdir logs
sudo chown hduser:hadoop /usr/local/hadoop/input
sudo chown hduser:hadoop /usr/local/hadoop/tmp
sudo chown hduser:hadoop /usr/local/hadoop/logs
Hadoop startup files: the following Hadoop configuration files need to be set up. In my installation they are located at /usr/local/hadoop/etc/hadoop. I use the vi editor; for more information about vi commands, check here.
cd /usr/local/hadoop/etc/hadoop
sudo vi core-site.xml
Then add/change the following lines:

<configuration>
  <property>
    ...
  </property>
  <property>
    ...
  </property>
</configuration>
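The values to put here depend on your environment. A minimal single-node core-site.xml typically points the default filesystem at the local NameNode and reuses the tmp folder created earlier; the host, port and path below are assumptions, so adjust them to your setup:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- assumed single-node default; some guides use hdfs://localhost:54310 -->
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- the tmp directory created in the earlier step -->
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>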
sudo vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>localhost:50090</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/input/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/input/hdfs/datanode</value>
  </property>
</configuration>
sudo cp mapred-site.xml.template mapred-site.xml
sudo vi mapred-site.xml
<configuration>
  <property>
    ...
  </property>
</configuration>
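On a single-node YARN setup, the property that usually goes here tells MapReduce to run on YARN; the typical entry looks like this:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>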
sudo vi yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
</configuration>
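Again, the exact properties depend on the setup; for a basic single-node YARN configuration the shuffle auxiliary service is the one usually needed:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>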
hadoop-env.sh: this is very important! In my setup I changed the SSH port from 22 to 60022, and it took me a while to figure out that I had to add an additional parameter here:
sudo vi hadoop-env.sh
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export PATH=$PATH:/usr/share/java
..
export HADOOP_SSH_OPTS="-p 60022"
Formatting the Hadoop HDFS file system
Before formatting the HDFS file system, the file /etc/hosts needs the current hostname added:

sudo vi /etc/hosts

Right beside 'localhost', add the hostname of your Bash on Windows 10 installation. In my case, that's my laptop's name:

127.0.0.1 localhost MY-ASUS-LAPTOP
cd /usr/local/hadoop/input
/usr/local/hadoop/bin/hadoop namenode -format
The following folders are created:

../input/hdfs -> hosts the datanode and namenode sub-folders
Starting up Hadoop services: first, make sure the SSH service is up. In my case, I always use hduser:
su -l hduser
sudo /etc/init.d/ssh status
If it is not running, perform "sudo /etc/init.d/ssh start". SSH needs to be up before starting the Hadoop services.
Finally, we start Hadoop services in the following order:
start-dfs.sh
start-yarn.sh
If everything is fine, we can check the services using the 'jps' command. You should see the NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager processes (plus Jps itself).
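When you are done, the services can be stopped with the matching scripts, in reverse order:

stop-yarn.sh
stop-dfs.sh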
Hadoop Web Interfaces
Hadoop comes with several web interfaces which are by default available at these locations:
http://localhost:8088/  web UI of the ResourceManager (YARN)
http://localhost:50070/ web UI of the NameNode daemon
http://localhost:50090/ web UI of the Secondary NameNode
2. Apache Hive installation
Downloading and installing Apache Hive
- Get it from an Apache mirror at www.apache.org. Download and unzip the file; to make things easier, rename the folder to just 'hive':
cd /usr/local
sudo wget http://apache.parentingamerica.com/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
sudo tar -xzvf apache-hive-1.2.1-bin.tar.gz
sudo mv apache-hive-1.2.1-bin hive
- Create the following entries in ~/.bashrc (see also the PATH note below):

vi ~/.bashrc

Add the following entries:

export HIVE_HOME=/usr/local/hive
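The hive and beeline commands used below also assume that the Hive bin/ directory is on the PATH, so you will most likely want an entry like this as well:

export PATH=$PATH:$HIVE_HOME/bin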
- Create a soft link for the MySQL connector in the Hive lib directory, or copy the connector jar into the lib folder. This depends on the MySQL installation; in my case I found it here: ./local/hive/lib/
sudo apt-get install libmysql-java
ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar
Configure the MySQL metastore for Hive. MySQL needs to be active; if not, start it. The user "hiveuser" needs to be created, to be used later with the SDA connection through SAP HANA Studio.
mysql -u root -p --host=localhost

In the MySQL console, perform the following steps:

mysql> CREATE DATABASE metastore;
mysql> CREATE USER 'hiveuser'@'%' IDENTIFIED BY 'hivepassword';
mysql> quit;
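Note that hiveuser also needs privileges on the metastore database before Hive can use it. If your setup does not grant them elsewhere, something like the following (run as root before quitting) takes care of it:

mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'%';
mysql> FLUSH PRIVILEGES;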
Creating hive-site.xml (if not already present): use the template from the Hive conf folder.
cd /usr/local/hive/conf
sudo cp hive-default.xml.template hive-site.xml
Edit the hive-site.xml file as follows:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
    <description>user name for connecting to mysql server</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
    <description>password for connecting to mysql server</description>
  </property>
</configuration>
HDFS commands to create HIVE directories
- make sure SSH is active
- make sure Hadoop services are up
- create the Hive directories (some may already exist, and that's totally fine)
- Grant access to the folders
hadoop fs -mkdir /tmp
hadoop fs -mkdir /user/
hadoop fs -mkdir /user/hive
# the warehouse folder must exist before the chmod below
hadoop fs -mkdir /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
Create a soft link for the connector in the Hive lib directory, or copy the connector jar to the lib folder, before starting the Hive console. This will probably be required; it addresses the error message "("com.mysql.jdbc.Driver") was not found in the CLASSPATH". More details here.
ln -s /usr/share/java/mysql-connector-java.jar $HIVE_HOME/lib/mysql-connector-java.jar
Starting Hive console: to start Hive, just type 'hive'. My setup once in a while fails because the name node is in safe mode. When that happens, just perform the following:
hadoop dfsadmin -safemode leave
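On Hadoop 2.x the hadoop dfsadmin form is deprecated; the equivalent command is:

hdfs dfsadmin -safemode leave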
Other than that, the Hive console should pop up. Try typing "show databases;" for instance.
Running HiveServer2 and Beeline
- HiveServer2 service needs to be started to allow JDBC connections
hive --service hiveserver2 start
Very important: the "Bash on Windows" session running the HiveServer2 service will be locked; it took me a while to figure that out. We need to open a new "Bash on Windows" session and leave that one alone.
Using a new Bash session we can check whether HiveServer2 is up and running by simply typing 'jps'. The process "RunJar" indicates that.
- Beeline console: start the Beeline console using localhost and Hive port 10000 (the same one used in the DSN file for the SAP SDA connection). In this case, we are only testing whether we can connect to Hive:
beeline -u jdbc:hive2://
jdbc:hive2://> !connect jdbc:hive2://localhost:10000/default
In my installation here, the user is "hduser"; a password is required.
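Alternatively, the user and password can be passed directly on the Beeline command line with the -n and -p options; replace <password> with hduser's password:

beeline -u jdbc:hive2://localhost:10000/default -n hduser -p <password>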
A few Beeline commands:

show databases;
use <databasename>;
show tables;
select * from <tablename>;
That's all. This means HiveServer2 is running and port 10000 is listening for JDBC/ODBC connections.
In the next blog I will explain my SAP SDA connection on my new SAP HXE: SDA to the Hadoop/Hive database using the Simba ODBC driver.
Best wishes;
Fernando