This article walks through a quick installation of Vora 1.4 on HDP 2.6. It is a follow-up to my earlier post, Hadoop Installation on Monsoon.
As such, I assume you already have a working Hadoop installation based on the above guide. If you are starting from scratch, I’d recommend following the Hadoop installation first. Also, be aware that commands and directories may differ if you are using a different cluster manager or operating system.
- Master: 4 CPU / 32 GB
- 2 Workers: 4 CPU / 16 GB
- SuSE SLES 12.1
- SAP Vora 1.4 (latest patch at the time of writing)
Obtaining Vora 1.4:
Vora can be downloaded from the SAP Software Center. Go to the product download page and search for “Vora 1.4”. Vora patches are all full installs, so choose the latest patch. Make sure to select the install for your operating system (SLES, RedHat, or CentOS).
Once downloaded, move the installation to your Hadoop master node (I use FileZilla for this, but any FTP application will work.)
Part I – Running the install script:
SSH to your master node and unpack the install media. It is a good idea to do this in the /tmp directory as the hdfs user will need access:
> sudo su
# tar -xzvf VORA04P_XXXXX.TGZ -C /tmp
# cd /tmp/SAPHanaVora
# ./install.sh --install-dep
This will launch an interactive install prompt. The --install-dep flag installs any dependencies the Vora services need. The installer will then ask you to:
- Set a user/password for Vora Tools and Vora Manager
- Confirm your OS and cluster manager
- Confirm install of dependencies
- Confirm file access for the hdfs user
- Choose whether to specify uid/gid for the Vora service users (selecting ‘No’ will use the default uid/gid)
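Before the unpack step above, it can be worth confirming that the transfer to the master node did not corrupt the archive. The following is a minimal sketch; check_archive is a hypothetical helper of my own, not part of the Vora installer, and the archive name is the same placeholder used earlier.

```shell
#!/usr/bin/env bash
# Sketch: confirm the archive is a readable gzip tarball before extracting.
# check_archive is a hypothetical helper, not something shipped with Vora.
check_archive() {
  local archive="$1"
  if [ ! -r "$archive" ]; then
    echo "cannot read $archive" >&2
    return 1
  fi
  # tar -tzf lists the archive contents without extracting anything
  if ! tar -tzf "$archive" > /dev/null 2>&1; then
    echo "$archive is not a valid .tgz" >&2
    return 1
  fi
  echo "ok"
}

# Usage (archive name is a placeholder):
# check_archive VORA04P_XXXXX.TGZ && tar -xzvf VORA04P_XXXXX.TGZ -C /tmp
```

If the check fails, re-transferring the file in binary mode is usually the first thing to try.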
Once the installation finishes, your cluster manager (Ambari in my case) should restart automatically. In my case it did not, so I restarted it manually:
# /etc/init.d/ambari-server restart
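Ambari can take a little while to come back after a restart. As a rough sketch, the helper below polls a TCP port until something is listening. wait_for_port is a hypothetical name, and it relies on bash's /dev/tcp feature, so it assumes bash rather than a plain POSIX shell.

```shell
#!/usr/bin/env bash
# Sketch: wait until a TCP port accepts connections, or give up after a timeout.
# wait_for_port is a hypothetical helper; /dev/tcp is a bash feature.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-120}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Opening /dev/tcp/<host>/<port> succeeds only if something is listening
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "up"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "timeout" >&2
  return 1
}

# Usage: replace <master_node> with your host name
# wait_for_port <master_node> 8080
```

The timeout counts retry attempts of roughly one second each, so a host that silently drops packets may take longer than the stated number of seconds.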
Part II – Adding Vora Manager:
Once Ambari starts back up, open the cluster manager UI (http://<master_node>:8080) and navigate to Actions > Add Service.
In the list of services there should now be an option for Vora Manager.
Select Vora Manager and click Next.
Select where you want your Vora Manager node deployed (generally I choose the master node in the cluster).
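Next, assign the Vora Manager Worker and Client components to ALL nodes in the cluster.
Next, under “Customize Services” there are some configurations you will want to confirm. The commands below print the Java home, the Spark home, and the node's network interfaces, which you can check the settings against: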
> echo $JAVA_HOME
> echo $SPARK_HOME
> sudo su
# ifconfig
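An echo prints an empty line when a variable is unset, which is easy to miss. A small sketch like the following makes that case explicit; check_home_dir is a hypothetical helper, and it only checks that the value names an existing directory, not that the directory holds a usable Java or Spark install.

```shell
#!/usr/bin/env bash
# Sketch: report whether a *_HOME value is set and points at a directory.
# check_home_dir is a hypothetical helper name.
check_home_dir() {
  local name="$1" value="$2"
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name points to a missing directory: $value"
    return 1
  fi
  echo "$name=$value"
}

# Usage:
# check_home_dir JAVA_HOME "${JAVA_HOME:-}"
# check_home_dir SPARK_HOME "${SPARK_HOME:-}"
```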
All the other options can remain defaults for now. These can always be adjusted later if you want to run Vora as a different user or under a different port. If you do want to run Vora as a non-root user please see Section 3.5 of the Install and Admin guide as additional steps are needed.
Click Next and the install process should start. As with the Hadoop installation, if any of the nodes fail, a failure log can be checked by clicking on the ‘Failed’ status within the installation UI.
Once successful, restart Vora Manager and any other services that require it.
Part III – Starting the Services:
At this point the Vora Manager UI should be accessible from http://<master_node>:19000
Log in with the credentials you created when running the initial install script, then open the Services tab.
Default configurations and node assignments are created during the installation, so clicking the “Start All” button at the top will attempt to start all services using these default settings.
If any services fail or show “Critical” status for more than 30-60 seconds, you can troubleshoot using the Vora 1.4 Troubleshooting Guide.
Using the default configurations, with the settings described in this and the previous Hadoop installation, all my services started without issue.
Once running, the Vora Tools UI can be accessed via port 9225 by default. Click Vora Tools > Node Assignment to confirm which node Vora Tools is running on, then open the UI either by typing in the URL http://<assigned_node>:9225
or from the Welcome tab in Vora Manager under “External Links”. You can log in with the same username / password used for Vora Manager.
Part IV – Testing Vora:
Finally, we’ll use the vora-spark command-line tool to make sure the Vora libraries are accessible from Spark / Vora Tools.
On your master node console, we’ll first add an environment variable for the vora-spark library:
> sudo su
# vi /etc/bash.bashrc
At the bottom of the file, add a line that exports VORA_SPARK_HOME, pointing it at the vora-spark directory of your Vora installation (note: directories may be different for different OSes and cluster managers).
Save and exit with :wq
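If you would rather not edit the file by hand, the same change can be scripted. The sketch below appends a line to a file only if it is not already there, so running it twice does not duplicate the export. append_once is a hypothetical helper, and /path/to/vora-spark is a placeholder, not the real location: substitute the vora-spark directory from your own installation.

```shell
#!/usr/bin/env bash
# Sketch: append a line to a file only when the exact line is not already present.
# append_once is a hypothetical helper name.
append_once() {
  local file="$1" line="$2"
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    echo "already present"
  else
    printf '%s\n' "$line" >> "$file"
    echo "added"
  fi
}

# Usage (as root). The path below is a PLACEHOLDER, not the real directory:
# append_once /etc/bash.bashrc 'export VORA_SPARK_HOME=/path/to/vora-spark'
```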
Now we’ll launch the vora-spark shell. Open a new shell session first (or run source /etc/bash.bashrc) so that the new variable is set:
> ls $VORA_SPARK_HOME/bin
> sudo $VORA_SPARK_HOME/bin/start-spark-shell.sh
Once the Spark shell loads, you should be presented with a scala> prompt. Run the following commands to test connectivity:
scala> import org.apache.spark.sql.SapSQLContext
scala> val vc = new SapSQLContext(sc)
scala> vc.sql("show tables").show
After the show tables command, you should get output showing that no tables currently exist:
17/12/11 19:11:28 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 940 bytes result sent to driver
17/12/11 19:11:28 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 374 ms on localhost (1/1)
17/12/11 19:11:28 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/12/11 19:11:28 INFO DAGScheduler: ResultStage 0 (show at <console>:31) finished in 0.388 s
17/12/11 19:11:28 INFO DAGScheduler: Job 0 finished: show at <console>:31, took 0.540487 s
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+
Next we’ll create a simple test table, select from it, and drop it. These commands use the test.csv file we created in the cadmin folder during the Hadoop tutorial. Please see the Hadoop guide if you haven’t created this file, or just create a simple one-line csv file and add it to the hdfs folder /user/cadmin. Below is the full list of commands:
scala> val testsql = """
     | create table testtable (a1 int, a2 double, a3 string)
     | using com.sap.spark.vora
     | options(files "/user/cadmin/test.csv")
     | """
scala> vc.sql(testsql).show
scala> vc.sql("show tables").show
scala> vc.sql("select * from testtable").show
scala> vc.sql("drop table testtable").show
scala> vc.sql("show tables").show
1. After running the testsql statement and “show tables”, we should see testtable listed:
17/12/11 19:32:49 INFO DAGScheduler: Job 2 finished: show at <console>:31, took 0.020684 s
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
|testtable|       true|
+---------+-----------+
2. The “select * from testtable” statement should return the row from our test.csv file:
17/12/11 19:33:53 INFO DAGScheduler: Job 3 finished: show at <console>:31, took 0.106106 s
+---+---+-----+
| a1| a2|   a3|
+---+---+-----+
|  1|2.5|Hello|
+---+---+-----+
3. Finally, we drop the table and run a final “show tables” command to confirm that testtable is dropped.
Type exit at the scala> prompt to leave the Spark shell.
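For readers who skipped the Hadoop guide and do not have test.csv yet, here is a minimal sketch for creating the one-line file and placing it in /user/cadmin. The row matches the output shown above (1, 2.5, Hello). Both function names are hypothetical, and the upload step assumes the hdfs client is on the PATH and that the current user may write to /user/cadmin.

```shell
#!/usr/bin/env bash
# Sketch: create a one-line CSV matching the table columns and upload it to HDFS.
# make_test_csv and upload_test_csv are hypothetical helper names.
make_test_csv() {
  local path="$1"
  # One row for the columns a1 int, a2 double, a3 string
  printf '1,2.5,Hello\n' > "$path"
  echo "$path"
}

upload_test_csv() {
  local path="$1"
  # Only attempt the upload where the hdfs client is available
  if command -v hdfs > /dev/null 2>&1; then
    hdfs dfs -mkdir -p /user/cadmin
    # -f overwrites an existing test.csv
    hdfs dfs -put -f "$path" /user/cadmin/test.csv
  else
    echo "hdfs client not found; skipping upload" >&2
    return 1
  fi
}

# Usage:
# make_test_csv /tmp/test.csv && upload_test_csv /tmp/test.csv
```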
Congratulations! You now have a working instance of Vora 1.4!