[SAP HANA Academy] SAP HANA Vora Multiple Node Install
In a nine-part tutorial series, the SAP HANA Academy’s Tahir Hussain Babar (Bob) details how to install SAP HANA Vora on a five-node cluster. In a little under an hour and a half you will learn, step by step, how to install a multiple-node SAP HANA Vora system from scratch.
All of the scripts that Bob uses throughout the series are available here on GitHub.
Creating a VPC and Five SUSE Linux Instances in AWS
The only prerequisite for this series is that you have already created an Amazon Web Services account and have a Windows machine with Google Chrome, Notepad, PuTTY and PuTTYgen installed.
First in AWS, Bob creates a Virtual Private Cloud network. Next in his EC2 dashboard, Bob shows how to create the five SUSE Linux instances. There will be one master node and four additional nodes.
The preparation for each of the machines is exactly the same. Bob chooses the m3.xlarge instance type, which provides 4 vCPUs and 15 GB of RAM, sets the number of instances to five and selects his recently created VPC as the network. Bob then creates a new security group and makes sure that the machines can be reached on any port.
As always make sure to save your key pair. Bob names his machines Master and then Node 1-4.
Connecting to the Five SUSE Linux Nodes in AWS using PuTTY
In the second video in the series Bob shows how to connect to the five SUSE Linux instances using PuTTY. Bob also details how to allocate one node as the master and the other four as just nodes.
First, Bob uses PuTTYgen to convert the PEM key used for all five of the instances to a PPK so that PuTTY can read it. Bob then shows how to create an elastic IP address and how to associate it with the master node. This ensures that the IP address doesn’t change every time the instance is stopped and started. Bob records the IP address and the internal machine name in Notepad for each of his five machines.
In PuTTY Bob shows how to connect to the master instance. Bob uses the PPK key to authenticate his connection. Next, Bob follows the same process using the IP address for each of the other four nodes to establish a PuTTY connection for each.
Preparing the Master Node and Creating Two Users
Bob details how to prepare the master node in the series’ third tutorial video. Bob walks through how to create two Linux users on the master node, which will be used during the SAP HANA Vora installation.
Using PuTTY, Bob connects to the master node and uses an echo command to append the IP addresses and machine names of all of the instances to the hosts file. Then Bob runs a script to create a new group called sysadmin and a new user named cluster_admin. Once logged in as the cluster_admin user, Bob generates a key pair consisting of a public key and a private key. The public key will go on each of the nodes, but you won’t be able to connect to a machine unless you have the private key.
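The hosts file step can be sketched in Python rather than with a hand-typed echo command. This is only an illustration, not the script Bob uses: the IP addresses and machine names below are hypothetical placeholders for the values recorded in Notepad, and the helper names are invented for this example.

```python
from typing import Dict, List

# Hypothetical private IPs and internal machine names; replace them with
# the values recorded for your own five instances.
CLUSTER_NODES: Dict[str, str] = {
    "10.0.0.10": "ip-10-0-0-10.ec2.internal",  # master
    "10.0.0.11": "ip-10-0-0-11.ec2.internal",  # node 1
    "10.0.0.12": "ip-10-0-0-12.ec2.internal",  # node 2
    "10.0.0.13": "ip-10-0-0-13.ec2.internal",  # node 3
    "10.0.0.14": "ip-10-0-0-14.ec2.internal",  # node 4
}


def hosts_lines(nodes: Dict[str, str]) -> List[str]:
    """Return one '<ip> <full name> <short name>' line per node."""
    lines = []
    for ip, fqdn in nodes.items():
        short_name = fqdn.split(".")[0]
        lines.append(f"{ip} {fqdn} {short_name}")
    return lines


def missing_lines(existing_text: str, nodes: Dict[str, str]) -> List[str]:
    """Return only the lines whose IP is not already in the hosts file text."""
    present = {
        line.split()[0]
        for line in existing_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [line for line in hosts_lines(nodes) if line.split()[0] not in present]


if __name__ == "__main__":
    # Print the lines that would be appended to the hosts file.
    print("\n".join(hosts_lines(CLUSTER_NODES)))
```

Because the same block of lines goes onto every machine, generating it once and checking for existing entries avoids duplicate rows if the step is repeated.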
Bob runs a cat command to output both the public key and the private key for the master node and pastes them into notepad. Then Bob creates another user named Vora.
Then Bob logs in as his EC2 user in PuTTY and installs all of the C++ extensions he has already downloaded. This step is covered much more in depth in this video from the single node install series.
Preparing the Four Individual Nodes
In the next tutorial video Bob shows how to prepare each of the four individual nodes. Bob logs into one of his individual nodes with PuTTY. Bob then pastes in the same echo command he used in the previous video to modify the hosts file of the individual node.
Next, Bob creates and logs into a cluster_admin user and gives that user the rights to add files. Each individual node holds the public key, so that the cluster_admin user on the master, who owns the matching private key, can connect to it. Then Bob logs in as his EC2 user and downloads and installs the C++ extensions on the individual node.
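Placing the public key on a node can be sketched as follows. This assumes the standard OpenSSH layout, where keys live in .ssh/authorized_keys under the user's home directory with restrictive permissions; the function name is hypothetical and this is not the script from the video.

```python
import os
from pathlib import Path


def install_public_key(home: Path, public_key: str) -> bool:
    """Append public_key to <home>/.ssh/authorized_keys if it is not there yet.

    Returns True when the key was added and False when it was already present.
    """
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(mode=0o700, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # sshd expects the directory to be private

    auth_file = ssh_dir / "authorized_keys"
    key = public_key.strip()
    existing = auth_file.read_text() if auth_file.exists() else ""

    if key in (line.strip() for line in existing.splitlines()):
        os.chmod(auth_file, 0o600)
        return False

    with auth_file.open("a") as handle:
        # Keep one key per line even if the file lacks a trailing newline.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    os.chmod(auth_file, 0o600)
    return True
```

Running the function twice with the same key leaves a single entry, which is convenient when the same preparation is repeated on all four nodes.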
Follow these exact same steps for the three additional individual nodes before continuing on with the series.
Installing Ambari on the Master Node
In the next video Bob shows how to install the Ambari server on the master node. Ambari will then be used to deploy Hadoop on the master and the four additional nodes.
With PuTTY Bob logs into his master node and logs in as the cluster_admin user. Bob follows the same steps to install Ambari that he outlined much more in-depth in this video.
Bob adds the repository that hosts the Ambari packages and then starts the installation. During setup Bob installs the JDK and keeps the default PostgreSQL database for Ambari’s own data.
Next Bob starts the Ambari server and then confirms the successful installation by accessing Ambari via a web browser.
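The browser check can also be done from a script. The sketch below assumes Ambari is listening on its default web port, 8080, on the master node; the function names and the example host are hypothetical.

```python
import urllib.error
import urllib.request


def ambari_url(host: str, port: int = 8080) -> str:
    """Build the URL of the Ambari web UI (8080 is Ambari's default port)."""
    return f"http://{host}:{port}/"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except urllib.error.HTTPError:
        # The server answered, even if with an error status such as 401.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, name lookup failure, timeout and similar.
        return False


if __name__ == "__main__":
    # Hypothetical master host name; use your elastic IP or machine name.
    url = ambari_url("master.example.internal")
    print(url, "reachable" if is_reachable(url) else "not reachable")
```

If the check fails while the server is running, the security group rules are a likely place to look, since the port must be open to the machine making the request.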
Installing Hadoop on the Five Nodes
In the sixth video of the series Bob shows how to install Hadoop on a five node cluster. A prerequisite for this video is that you have viewed the series on installing SAP HANA Vora on a single node.
First, Bob logs into Ambari in a web browser and runs the install wizard. Bob enters the internal machine name of each of the five nodes for the Target Hosts and then enters the RSA private key. In Bob’s example the nodes aren’t yet able to connect to each other, so Bob shows how to change the security group to resolve this issue by enabling all traffic to access each node.
For the Hadoop services Bob chooses HDFS, YARN, ZooKeeper, Nagios and Ganglia. Then Bob elects to have ZooKeeper installed on all of the nodes. Next, Bob has all of the client tools installed on every one of the boxes and uses Nagios to set up email alerts before completing the Hadoop installation.
Apache Spark and HDFS Verification
In the next video Bob shows how to install and test Apache Spark. After, Bob verifies that the user can write to HDFS.
Bob installs Apache Spark on his master node. The steps are the same as when installing Apache Spark on a single node so please watch this video for a more in-depth explanation. You will install SAP HANA Vora on all of the nodes so you must ensure that when installing Spark it knows each and every node that Vora will be on.
Next, Bob runs Spark Shell and tests its connectivity using SparkPi. Bob also tests the connection to HDFS by loading a test CSV table to HDFS as the Vora user.
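SparkPi is the example job bundled with Spark that estimates pi by random sampling, which makes it a handy smoke test: a result close to 3.14 shows that the job was scheduled and executed. The plain Python sketch below shows the same calculation on a single machine, without Spark, purely to illustrate what the test computes; the function name and the fixed seed are choices made for this example.

```python
import random


def estimate_pi(samples: int, seed: int = 0) -> float:
    """Estimate pi with the Monte Carlo method that SparkPi uses.

    Points are drawn uniformly from the square [-1, 1] x [-1, 1]. The share
    that falls inside the unit circle approaches pi / 4.
    """
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(f"Pi is roughly {estimate_pi(200_000)}")
```

In Spark the sampling loop is split across the cluster and the counts are summed, so the estimate itself matters less than the fact that the job completes.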
SAP HANA Vora Deployment on Five Nodes
In the second-to-last video of the series Bob shows how to install SAP HANA Vora on a five-node cluster. The installation process is exactly the same regardless of the number of nodes.
In PuTTY Bob restarts the Ambari server as the EC2 user. Next in Ambari on a web browser, Bob selects to add the SAP HANA Vora service and chooses to install it on every node. Once Vora has been installed, Bob restarts all of his Hadoop services.
Back in PuTTY Bob logs in as the Vora user and installs the file that contains the SAP HANA Vora extensions. Bob then uses Spark Shell to validate Vora’s successful installation by creating a test table using SAPSQLContext that reads from the test CSV table in HDFS.
In the final video of the series Bob highlights how to use Zeppelin. Bob also details how to run queries in Zeppelin to extract data from a five node cluster using SAP HANA Vora. Watch this video from the single node installation series for in depth coverage of installing Zeppelin.
With a much friendlier user interface than Spark Shell, Zeppelin enables you to create tables and test your SAP HANA Vora connection. In a new note in Zeppelin, Bob creates a test table with a path to the test.csv file in HDFS. The syntax Bob uses lists all of the hosts where Vora and ZooKeeper are installed and specifies the NameNode. Bob then confirms that he can view the data in the table.
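On a five-node cluster the host lists in that statement get long, so it can help to generate them. The sketch below only assembles the connection strings. The dictionary keys are descriptive labels invented for this example rather than Vora option names, and the ports are assumptions: 2181 is ZooKeeper's default client port and 8020 is a common NameNode port. Take the exact option names and syntax from Bob's scripts on GitHub.

```python
from typing import Dict, List


def table_connection_values(
    vora_hosts: List[str],
    zookeeper_hosts: List[str],
    namenode_host: str,
    zookeeper_port: int = 2181,  # assumed: ZooKeeper's default client port
    namenode_port: int = 8020,   # assumed: a common HDFS NameNode port
) -> Dict[str, str]:
    """Build the comma-separated host strings for a table definition.

    The keys are illustrative labels, not the option names Vora expects.
    """
    return {
        "vora_hosts": ",".join(vora_hosts),
        "zookeeper_urls": ",".join(
            f"{host}:{zookeeper_port}" for host in zookeeper_hosts
        ),
        "namenode_url": f"{namenode_host}:{namenode_port}",
    }


if __name__ == "__main__":
    # Hypothetical machine names standing in for the five instances.
    nodes = [f"node{i}.example.internal" for i in range(5)]
    for label, value in table_connection_values(nodes, nodes, nodes[0]).items():
        print(f"{label}: {value}")
```

Since Vora and ZooKeeper are installed on every node in this series, the same list of machine names feeds both values.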
Back in PuTTY as the Vora user in the master node, Bob creates a file called stats.csv and puts the file into HDFS. Now back in Zeppelin Bob builds another note that creates a table for the stats.csv file with a similar syntax.
A main takeaway from this series is that whether you install SAP HANA Vora on a two-node system or a 2,000-node system, the process is nearly identical.
For more SAP HANA Vora tutorial videos please check out this playlist.
SAP HANA Academy – Over 1,200 free tutorial videos on SAP HANA, SAP Analytics and the SAP HANA Cloud Platform.