This article is for those interested in learning how to install Apache™ Hadoop® and the Apache Hive™ data warehouse on Windows 10’s Bash shell, and then connect SAP HANA to Apache Hive™ through SAP HANA Studio using SDA (Smart Data Access).
The goal here is to use Hadoop and Hive on a local Linux machine, taking advantage of the Windows 10 Anniversary Update as an alternative to a Linux instance on the Cloud.
Main advantage: I have my own Ubuntu Linux system out of the box with Windows 10 and do not need to worry about any additional cost for instance usage. Yes!
I will not discuss the installation of Hadoop and Hive in this post; otherwise it would be too long. However, I think it is worth writing a second post to share my installation, which by the way is slightly different from the conventional installations, especially when using MySQL 5.7 for the Hive metastore database. I also had trouble with HiveServer2 (or HS2) and port 10000, which needs to be open and listening for SQL statements. In that new post I will also share my SAP HANA One installation and SDA setup, and talk about the Hive ODBC connectors I used in my scenario.
For now, I will explain the basics of installing Bash on Windows 10, followed by the setup needed to allow external connections, such as from SAP HANA One on the Cloud.
The steps are:
- Bash on Ubuntu on Windows 10 installation and upgrade to LSW with Ubuntu 16.04
- SSH service
- Port Forwarding
So, let’s get started:
1. Installing Bash on Ubuntu on Windows 10
There is plenty of information out there on how to do it, so I will not spend time explaining it here. Use the link here for more information:
Now, the official installation is Ubuntu 14.04 (trusty). However, I upgraded mine to Ubuntu 16.04 LTS (xenial) to fix the MySQL installation, since I use MySQL with Hive.
You can check the version using the following bash command: $ lsb_release -a
|Trusty version out of the box (Windows Anniversary update)|Ubuntu 16.04 (upgraded)|
|No LSB modules are available.|No LSB modules are available.|
|Distributor ID: Ubuntu|Distributor ID: Ubuntu|
|Description: Ubuntu 14.04.5 LTS|Description: Ubuntu 16.04.1 LTS|
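If you want to script this check instead of reading the output by eye, something along these lines should work. It is only a sketch: `version_ge` is a helper name I made up, and it assumes GNU `sort` (for `-V`) and the `lsb_release -rs` short form, both of which are available on a stock Ubuntu install.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the installed Ubuntu release still needs the upgrade.

# Succeeds when version $1 is greater than or equal to version $2.
# (version_ge is a hypothetical helper name, not part of any package.)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Release number only, e.g. 14.04 or 16.04 (empty if lsb_release is missing).
current="$(lsb_release -rs 2>/dev/null || true)"

if [ -z "$current" ]; then
  echo "lsb_release not available; cannot tell which release is installed"
elif version_ge "$current" "16.04"; then
  echo "Ubuntu $current: already on 16.04 or newer"
else
  echo "Ubuntu $current: still on the older release, upgrade recommended"
fi
```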
Once again, you can find a lot of information out there. If you are starting fresh I would suggest upgrading it to 16.04 as soon as you complete the regular install.
Tip: Before the upgrade I changed “/etc/sudoers” as follows:
|# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
ffaian ALL=(ALL:ALL) ALL
This is to avoid the tty error when using sudo later on. In my installation I use ffaian as the Bash user. With this change I never need to use “sudo -l” or “sudo su” to get root access; for everything I do, I simply put sudo at the beginning of my bash command.
Steps I carried out:
To avoid the sudo tty issue and others, run these commands just before running do-release-upgrade:
|sudo -S apt-mark hold sudo
sudo -S apt-mark hold procps
sudo -S apt-mark hold strace
|sudo apt-get update
sudo apt-get upgrade
sudo apt-get dist-upgrade
sudo apt-get autoremove
sudo do-release-upgrade -f DistUpgradeViewNonInteractive -d
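The same sequence can be wrapped in a small script so you can review it before anything runs. This is a sketch, not what I actually ran: the `run` wrapper, the `upgrade_to_xenial` function name and the `DRY_RUN` variable are all my own inventions for illustration. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: the upgrade sequence from the steps above as one script.
# DRY_RUN=1 (the default here) only prints each command; DRY_RUN=0 executes it.
# DRY_RUN, run and upgrade_to_xenial are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

upgrade_to_xenial() {
  local pkg
  # Hold the packages named in the steps above before the release upgrade.
  for pkg in sudo procps strace; do
    run sudo -S apt-mark hold "$pkg" || return 1
  done
  # Bring the current release fully up to date first.
  run sudo apt-get update || return 1
  run sudo apt-get upgrade || return 1
  run sudo apt-get dist-upgrade || return 1
  run sudo apt-get autoremove || return 1
  # Then start the release upgrade itself.
  run sudo do-release-upgrade -f DistUpgradeViewNonInteractive -d
}

upgrade_to_xenial
```

Run it once as is to see the command list, then with `DRY_RUN=0` when you are ready. Stopping at the first failing step is a design choice of this sketch, so a broken `apt-get` run does not lead straight into the release upgrade.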
The upgrade takes time. At first I thought my installation was stuck. Patience is the key to success! It will finish eventually.
2. Installing SSH service
I had to install the SSH service. Hadoop services also use the SSH port, so it is needed regardless.
Changing the SSH port: By default, SSH connections go through port 22. I decided to follow some advice I’ve seen on the forums and changed it to 60022, because port 22 is usually the one attacked most frequently. This is not mandatory.
Here I am commenting out Port 22 and adding Port 60022, along with the “user privilege separation”, “Password authentication” and AllowUsers parameters:
|sudo vi /etc/ssh/sshd_config
# What ports, IPs and protocols we listen for
# Privilege Separation is turned on for security -> very important to set to ‘no’
# Change to no to disable tunnelled clear text passwords
# -> very important: allow your user to log in over SSH without a password
# Set this to ‘yes’ to enable PAM authentication, account processing,
# and session processing. If this is enabled, PAM authentication will
# be allowed through the ChallengeResponseAuthentication and
# PasswordAuthentication. Depending on your PAM configuration,
# PAM authentication via ChallengeResponseAuthentication may bypass
# the setting of “PermitRootLogin without-password”.
# If you just want the PAM account and session checks to run without
# PAM authentication, then enable this but set PasswordAuthentication
# and ChallengeResponseAuthentication to ‘no’.
AllowUsers ffaian hduser
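Since the listing above shows mostly comments, here is a sketch of how the three changes that the text spells out (port 60022, privilege separation set to no, and the AllowUsers line) could be applied to a copy of the file for review. `apply_sshd_changes` is a made-up function name, and the directive values are my reading of the description, not a dump of my actual file. PasswordAuthentication is deliberately left untouched because its value is not shown. Note that `UsePrivilegeSeparation` applies to the OpenSSH version shipped with Ubuntu 16.04; newer OpenSSH releases have deprecated it.

```shell
#!/usr/bin/env bash
# Sketch: apply the port / privilege-separation / AllowUsers changes to a COPY
# of sshd_config so the result can be reviewed before touching the real file.
# apply_sshd_changes is a hypothetical helper; $1 = source file, $2 = output file.
apply_sshd_changes() {
  local src="$1" dst="$2"
  # Comment out an active "Port 22" line and any active
  # UsePrivilegeSeparation / AllowUsers lines.
  sed -E \
    -e 's/^[[:space:]]*Port[[:space:]]+22[[:space:]]*$/#Port 22/' \
    -e 's/^[[:space:]]*(UsePrivilegeSeparation|AllowUsers)[[:space:]].*$/#&/' \
    "$src" > "$dst"
  # Append the values described in the text. This assumes the file has no
  # "Match" blocks at the end, which is true for the stock Ubuntu config.
  {
    echo "Port 60022"
    echo "UsePrivilegeSeparation no"
    echo "AllowUsers ffaian hduser"
  } >> "$dst"
}
```

A possible way to use it: `apply_sshd_changes /etc/ssh/sshd_config /tmp/sshd_config.new`, then validate with `sudo sshd -t -f /tmp/sshd_config.new` before copying the file into place and restarting the service.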
Installing SSH service:
|sudo apt-get update
sudo apt-get install openssh-server openssh-client
Starting SSH service:
|sudo /etc/init.d/ssh start
* Starting OpenBSD Secure Shell server sshd
Checking SSH service status:
|sudo /etc/init.d/ssh status
* sshd is running
Additional SSH commands
|sudo /etc/init.d/ssh reload
sudo /etc/init.d/ssh stop
sudo service ssh --full-restart
Adding an Inbound Firewall rule on your Windows 10 machine: Just like with AWS instances, I had to set up the ports on my machine. Here are some screenshots of my Windows 10 Firewall inbound rule, which allows port 60022 (SSH) plus a few others I am using with Hadoop and Hive:
Ports are: 60022, 10000, 8088, 50070, 50030
Important!! I have Vipre antivirus on all my devices (full protection). With Vipre running, the Windows 10 Firewall is off, so to allow the SSH service through my PC I always need to shut down Vipre and let the Windows 10 Firewall take over. Otherwise I am not able to connect over SSH remotely, and Hadoop services will also complain. I also had to allow “Public” (Figure 3); otherwise I am not able to connect from my SAP HANA One instance to my Hadoop/Hive using my public IP.
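To see quickly which of these ports actually accept connections, a small probe like the one below can help. It is a sketch under a few assumptions: it relies on bash's built-in `/dev/tcp` redirection and the coreutils `timeout` command, `probe_port` and `report_ports` are names I made up, and "closed" here only means the connection did not open within two seconds (the service may be down, or a firewall may be filtering the port).

```shell
#!/usr/bin/env bash
# Sketch: report which of the ports listed above accept TCP connections.
# probe_port and report_ports are hypothetical helper names.

# Succeeds if a TCP connection to host $1, port $2 opens within 2 seconds.
probe_port() {
  timeout 2 bash -c ': < "/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Prints one "<port> open" or "<port> closed" line per port.
report_ports() {
  local host="$1" port
  shift
  for port in "$@"; do
    if probe_port "$host" "$port"; then
      echo "$port open"
    else
      echo "$port closed"
    fi
  done
}

# Assumption: run from inside Bash on Windows against the local machine.
report_ports 127.0.0.1 60022 10000 8088 50070 50030
```

Running the same function from another machine with the desktop's IP address instead of 127.0.0.1 also exercises the firewall rule.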
Generate an SSH key for user ffaian with an empty passphrase, and add the public key to the “authorized_keys” file:
|$ ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
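One thing to watch with the `cat ... >>` step is that running it twice adds the key twice, and sshd can refuse keys when the file permissions are too open. The sketch below appends a key only if it is not already there and tightens the permissions. `authorize_key` is a hypothetical helper name, and it assumes the public key file contains a single line.

```shell
#!/usr/bin/env bash
# Sketch: append a public key to authorized_keys only once, with strict permissions.
# authorize_key is a hypothetical helper: $1 = public key file, $2 = authorized_keys file.
authorize_key() {
  local pub="$1" auth="$2"
  # Create the containing directory (normally ~/.ssh) with mode 700 if missing;
  # an existing directory is left as it is.
  mkdir -p -m 700 "$(dirname "$auth")"
  touch "$auth"
  # -F: fixed string, -x: whole line, so a key that is already present is not added again.
  if ! grep -qxF -- "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

In this setup the call would be `authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"`.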
To connect over SSH from inside my private network:
- From another device; in this case, my ASUS laptop
- Using Putty on my remote Laptop
- create new connection
- Hosts: firstname.lastname@example.org -> in my case, I use ffaian and the private IP address of my desktop (with Bash on Windows running and the SSH service active)
- Port 60022 (again, my setup)
SUCCESS!!! I can access my Bash on Windows remotely from my private network. Access from outside of my network is more complex. I will also discuss it here.
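I used PuTTY, but if the remote machine has the OpenSSH client, the same connection settings can be kept in `~/.ssh/config`. The sketch below just prints such an entry. `ssh_config_entry` and the alias `wsl-desktop` are made-up names, and 192.168.1.50 is a placeholder private address, not my real one.

```shell
#!/usr/bin/env bash
# Sketch: print an OpenSSH client config entry matching the PuTTY settings above.
# ssh_config_entry is a hypothetical helper: $1 = alias, $2 = host, $3 = user, $4 = port.
ssh_config_entry() {
  local name="$1" host="$2" user="$3" port="$4"
  printf 'Host %s\n    HostName %s\n    User %s\n    Port %s\n' \
    "$name" "$host" "$user" "$port"
}

# 192.168.1.50 is a made-up private address; substitute your desktop's IP.
ssh_config_entry wsl-desktop 192.168.1.50 ffaian 60022
```

After appending the output to `~/.ssh/config`, `ssh wsl-desktop` is equivalent to `ssh -p 60022 ffaian@192.168.1.50`.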
3. Port forwarding to allow external access
Port forwarding was needed on my private router so that my public SAP HANA One instance on the Cloud could connect to my Hadoop/Hive systems.
Link above explains the basics. Pretty good indeed!
In short, I set up my router to redirect the connection to my local machine, which hosts my Bash on Windows, Hadoop and Hive systems. This is done on my router:
Here I forward SSH to my personal PC, followed by two additional ports for Hadoop/Hive, both based on internal port 10000. On my laptop I also have a second setup, including Hadoop/Hive up and running. In this case, my SAP HANA One instance at AWS has two DSNs (remote sources), and port forwarding redirects each of them to the right machine. It works perfectly!
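To make the idea concrete, the router's forwarding table can be thought of as a lookup from an external port to an internal address and port. The sketch below models that lookup only; it does not configure anything. All values are hypothetical: the two 192.168.1.x addresses and the external ports 10000 and 10001 are placeholders I picked for illustration, and only port 60022 and internal port 10000 come from my setup.

```shell
#!/usr/bin/env bash
# Sketch: a model of a router forwarding table (illustration only).
# forward_target is a hypothetical helper; addresses and the external ports
# 10000/10001 are made-up values.

# Maps an external (router-side) port to "internal-ip:internal-port".
forward_target() {
  case "$1" in
    60022) echo "192.168.1.50:60022" ;;  # SSH to the desktop
    10000) echo "192.168.1.50:10000" ;;  # HiveServer2 on the desktop
    10001) echo "192.168.1.51:10000" ;;  # HiveServer2 on the laptop
    *)     return 1 ;;                   # not forwarded
  esac
}

for port in 60022 10000 10001; do
  echo "public-ip:$port -> $(forward_target "$port")"
done
```

With a table like this, each DSN on the SAP HANA side points at the same public IP but a different external port, and the router decides which machine receives the connection.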
I use http://www.yougetsignal.com/ to check for my Public IP and Port Forwarding status.
Testing SSH from outside my network using my Public IP:
- I use my second machine: my ASUS laptop
- Internet connection through my iPhone, used as a hotspot over Bluetooth on my provider's wireless network (Kood)
- Once again using Putty on my remote Laptop
- create new connection
- Hosts: email@example.com -> Public IP address
- Port 60022 -> Port Forwarding works beautifully here!
SUCCESS!!! I can access my Bash on Windows remotely over the internet.
That’s all for now. If you are able to make this work, you are very close to successfully connecting to the Hadoop/Hive systems installed on your local Linux machine (Bash on Windows 10) from outside your network. In my case, I use it with my SAP HANA One instance on AWS.
By the way, I use the Simba Hive ODBC driver, which works perfectly. Just great.
I hope you find this useful.