In a series of 11 tutorial videos the SAP HANA Academy’s Tahir Hussain Babar (Bob) details and demonstrates step-by-step the entire process of installing and running an SAP HANA Vora system on a single node. From creating a SUSE Linux instance in AWS to visualizing data from tables connected to Vora with Zeppelin, Bob’s tutorial series will show you how to install SAP HANA Vora in under two hours.
All of the scripts used throughout this series are available here on GitHub.
In the first video Bob shows how to create an m3.xlarge SUSE Linux server instance in Amazon Web Services. Make sure to save the key pair file you download for the instance you create.
Bob details how to assign an Elastic IP address that is associated with your instance’s private IP address. This ensures the IP address doesn’t change every time you stop and start the machine.
After the instance has started up, Bob walks through how to convert the Vora PEM file into a PPK file using PuTTYgen so you can log into your newly created instance using PuTTY.
Bob shows how to properly prepare the SUSE Linux server by creating a pair of users (cluster_admin and vora) that will be necessary for the installation.
After ensuring that the proper IP address is in the hosts file, Bob installs a set of five files: ambaripkg (used to install the Vora service in Ambari), datasourcedist (the Vora extensions), SLES11-compat-c++ (C++ extensions needed for the Vora installation), Spark and Zeppelin (the front-end tool).
*Be aware that the names of the files may change once Vora is fully released. Also, a video will soon be published that shows how to download these five files from the SAP Service Marketplace.*
Next, Bob creates a group for the new users he will create. This way the users won’t need to use a password when they log in. Then Bob creates a master user named cluster_admin, which will be used when installing Ambari and Hadoop. Next, Bob shows how to generate a private and public key for the user to act as a security layer.

Continuing on, Bob creates another user named vora. This user will be used for the Vora installation. Finally, Bob shows how to install the C++ extensions that will be used by Vora.
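The user-preparation steps above can be sketched roughly as follows. The user names follow the video, but the group name, flags and key paths are assumptions, and the commands must be run as root:

```shell
# Sketch of the user preparation (run as root; group name, flags and
# paths are assumptions -- the video's GitHub scripts are authoritative).

# Shared group for the new users
groupadd hadoopusers                      # hypothetical group name

# Master user for the Ambari/Hadoop installation
useradd -m -g hadoopusers cluster_admin

# Password-less SSH key pair for cluster_admin, acting as the security layer
sudo -u cluster_admin ssh-keygen -t rsa -N "" \
    -f /home/cluster_admin/.ssh/id_rsa

# Authorize the public key so cluster_admin can log in without a password
sudo -u cluster_admin sh -c \
    'cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

# Second user, used later for the Vora installation itself
useradd -m -g hadoopusers vora
```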
In the next video Bob walks through how to install Ambari, a tool that will be used to create and monitor a Hadoop cluster.
To start, in PuTTY Bob logs in as the cluster_admin user and then runs a wget of a publicly available URL (listed in the GitHub repository). Then Bob runs through the commands to perform the install using Zypper. After the installation is complete, Bob shows how to run the Ambari server setup command. Once the JDK has been installed you can choose whether to deploy your repository to a specific database. Bob leaves the default, which is a PostgreSQL database.
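A rough sketch of the Ambari installation steps, assuming a SLES repository file; the repository URL is a placeholder, so substitute the one listed in the GitHub repository:

```shell
# Sketch of the Ambari server installation on SLES (requires root).
# The repo URL is a placeholder -- use the URL from the GitHub scripts.
wget http://example.com/ambari/sles11/ambari.repo   # placeholder URL
cp ambari.repo /etc/zypp/repos.d/                   # register the repository
zypper install ambari-server                        # install via Zypper

# Interactive setup: installs the JDK and, if you accept the defaults,
# configures the embedded PostgreSQL repository database.
ambari-server setup
ambari-server start
```

Once the server is started, browsing to the AWS IP address on port 8080 should show the Ambari login page.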
To confirm the successful installation, Bob enters his AWS IP address along with the port number (8080 in Bob’s example) as a URL in a browser. Bob is directed to an Ambari login page, proving that Ambari was installed successfully.
In the series’ next installment Bob shows how to create a small Hadoop cluster.
First, Bob logs into Ambari using the default username and password (admin/admin) and recommends changing your password. Then Bob opens the install wizard and follows the simple steps, using HDP 2.2 as his stack. Bob uses the internal DNS name from AWS as his target host and authenticates it with the private key from the cluster_admin user. After the cluster has been registered, Bob selects the components he must install to use Vora: HDFS, YARN and ZooKeeper. Bob then continues through the wizard, keeping the defaults, to complete his Hadoop cluster creation.
Continuing the series, Bob shows how to install Apache Spark so you can access HDFS. At the time these videos were recorded, Bob uses Spark release 1.4.1 for Hadoop 2.6.
To start, in PuTTY Bob logs in as his cluster_admin user and then runs the commands to install his already downloaded Spark file. Next, Bob modifies some parameters and paths in the .bashrc file. Bob inserts a script at the bottom of the .bashrc file that specifies the Hadoop path, the Java home, the Hadoop and Spark conf directories and the path to the executables. Next, Bob creates his own conf directory with a script which adds the Spark driver memory, the Hadoop version, the number of cores, and where the master node and ZooKeeper will be for Vora.
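The .bashrc additions look roughly like the following; every path and version number here is an assumption, so adjust them to match your actual install locations:

```shell
# Sketch of the environment variables appended to ~/.bashrc for
# cluster_admin; all paths and versions below are assumptions.
export JAVA_HOME=/usr/jdk64/jdk1.7.0_67           # hypothetical JDK path
export HADOOP_HOME=/usr/hdp/current/hadoop-client # hypothetical HDP path
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/opt/spark-1.4.1-bin-hadoop2.6  # hypothetical Spark path
export SPARK_CONF_DIR=$SPARK_HOME/conf
export PATH=$PATH:$SPARK_HOME/bin:$HADOOP_HOME/bin
```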
In the next part of the series Bob examines how to configure HDFS so that the Vora user can write to and store files in the HDFS system.
First, in PuTTY, Bob logs in as the HDFS user and creates a new directory in HDFS, called Vora, to store the files. Then Bob assigns ownership of the newly created Vora folder to the Vora user. To ensure that the system works, Bob outputs a test file (test.csv) to the Vora folder and then confirms its existence.
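These HDFS steps can be sketched as follows; the exact HDFS path and the file contents are assumptions:

```shell
# Sketch of the HDFS preparation (requires a running Hadoop cluster;
# the /user/vora path and file contents are assumptions).
sudo -u hdfs hadoop fs -mkdir /user/vora          # directory for the vora user
sudo -u hdfs hadoop fs -chown vora /user/vora     # hand ownership to vora

# Write a small test file and confirm it landed in HDFS
echo "1,hello" > test.csv                         # hypothetical contents
sudo -u vora hadoop fs -put test.csv /user/vora/test.csv
sudo -u vora hadoop fs -ls /user/vora
```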
This next video in the series details how to test Apache Spark on top of the Hadoop cluster.
Back in PuTTY, Bob enters spark-shell to test that Spark is running on YARN using the Vora user. This test makes sure that Spark can talk to HDFS and that the HDFS installed via the Ambari server is working correctly on the Linux server. Then Bob runs a sample piece of code (SparkPi) in PuTTY to prove that the Vora user can actually use Spark.
To further confirm, Bob navigates in a browser to his AWS IP address on another port (8088), which hosts a tool called Cluster Apps. The Cluster Apps page shows that Spark Shell and SparkPi have run successfully, and when Bob runs SparkPi again in PuTTY, another entry is displayed in the Cluster Apps tool.
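The two smoke tests can be sketched like this for Spark 1.4.1; the install path is an assumption:

```shell
# Sketch of the Spark-on-YARN smoke tests (run as the vora user against
# a live cluster; the Spark install path is an assumption).
export SPARK_HOME=/opt/spark-1.4.1-bin-hadoop2.6   # hypothetical path

# 1) Interactive check: spark-shell against the YARN cluster
$SPARK_HOME/bin/spark-shell --master yarn-client

# 2) Batch check: submit the bundled SparkPi example to YARN
$SPARK_HOME/bin/spark-submit --master yarn-cluster \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/lib/spark-examples-1.4.1-hadoop2.6.0.jar 10
```

Each run should appear as a new entry on the Cluster Apps page at port 8088.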
Installing SAP HANA Vora
In this video Bob shows how to install SAP HANA Vora.
First, as the base user, Bob runs a package in PuTTY that installs a service which will enable you to install Vora in Ambari. Then, after restarting the Ambari server in PuTTY, Bob logs back into Ambari in a browser, chooses Actions > Add Service and selects SAP HANA Vora. Bob leaves all of the defaults and successfully installs SAP HANA Vora on Ambari.
After restarting all of his services in Ambari, Bob logs into PuTTY as the Vora user and installs the Vora extensions into a newly created folder. The extensions consist of an SAP Spark datasource file and a series of scripts.
Testing SAP HANA Vora
Bob continues the series by showing how Vora can connect to Spark which will then ultimately connect to HDFS.
As the Vora user in PuTTY, Bob starts the Spark Shell, which tests that Vora can talk to Spark and that Spark can talk to HDFS, which is running on the SUSE Linux server. Next, Bob runs a series of commands using SAP Spark SQL in PuTTY to test Vora by creating a table that uses the data from the test.csv file that was stored in HDFS.
Back in PuTTY, Bob enters a script that assigns the SQL to a value called test.sql. Then Bob opens the table and runs a select statement to see the table’s data. Now in PuTTY you can see that the data from the test.csv file in the HDFS system is in the table created in Vora.
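The table registration looks roughly like the following when driven from the Spark Shell. The `com.sap.spark.vora` datasource comes from the installed Vora extensions, but the column definitions, option names and host values here are assumptions, so treat this as a sketch rather than the exact statements from the video:

```shell
# Sketch: registering a Vora table over test.csv from spark-shell
# (run as the vora user; columns, options and hosts are assumptions).
$SPARK_HOME/bin/spark-shell --master yarn-client <<'EOF'
val testsql = """
  CREATE TEMPORARY TABLE test (a1 int, a2 string)
  USING com.sap.spark.vora
  OPTIONS (
    tableName "test",
    paths "/user/vora/test.csv",
    hosts "ip-10-0-0-1.example.internal",
    zkurls "ip-10-0-0-1.example.internal:2181"
  )"""
sqlContext.sql(testsql)                       // register the table in Vora
sqlContext.sql("SELECT * FROM test").show()   // display the test.csv rows
EOF
```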
In the second to last video in the series, Bob shows how to install Apache Zeppelin. Compared to Spark SQL, Zeppelin offers a much more user-friendly interface for creating tables.
First, Bob corrects an issue in the mapreduce.application.classpath property in Ambari and restarts MapReduce. Next, Bob logs into PuTTY as the Vora user and installs the Zeppelin file. Then, as the base user, Bob inserts some additional parameters into the .bashrc file using a script that specifies the Zeppelin home and the version of Hadoop.
Next, as the Vora user, Bob builds a Zeppelin conf file and changes the interpreter to the SAP interpreter. Then, using a template, Bob builds a Zeppelin environment file and modifies it by adding four additional paths. Next, Bob creates a symbolic link from the datasources file to the Zeppelin environment file.
Then Bob runs a script to start Zeppelin. Bob opens up a pair of ports (9099 and 9100) so that he can access Zeppelin via a browser.
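The configuration and start-up steps can be sketched as follows; the install paths, port number and linked file name are assumptions:

```shell
# Sketch of the Zeppelin configuration and start-up (run as the vora
# user; all paths and the port are assumptions).
export ZEPPELIN_HOME=/opt/zeppelin          # hypothetical install path

# Build zeppelin-env.sh from the shipped template and set the port
cp $ZEPPELIN_HOME/conf/zeppelin-env.sh.template \
   $ZEPPELIN_HOME/conf/zeppelin-env.sh
echo 'export ZEPPELIN_PORT=9099' >> $ZEPPELIN_HOME/conf/zeppelin-env.sh

# Symbolic link from the Vora datasources file into the Zeppelin conf
ln -s /opt/vora/datasources.conf $ZEPPELIN_HOME/conf/   # hypothetical paths

# Start the Zeppelin daemon; ports 9099 and 9100 must be open in the
# AWS security group for browser access
$ZEPPELIN_HOME/bin/zeppelin-daemon.sh start
```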
In the final video of the series Bob shows how to use Apache Zeppelin to access data which has been stored and configured using SAP HANA Vora. Zeppelin is a web-based notebook that enables interactive data analytics.
Accessing Zeppelin from his browser, Bob checks to make sure his interpreter list includes %velo. If it isn’t present, something went wrong with the installation.
Next, on the main Zeppelin page, Bob creates a new note. Bob then creates a table using a file from the local system by entering a script. Note that all Vora commands in Zeppelin start with %velo. The script outputs the table locally with a Vora path for test.csv and specifies the correct ZooKeeper machine name. Bob then confirms that the table has the same data as the test.csv file from the Vora system and highlights Zeppelin’s available native analytics.
Next, Bob tests the connection to HDFS by creating a new note with a different, but similar, script that points to an HDFS path, node and machine. Bob then confirms the table’s existence with a select * statement.
Next, in PuTTY, Bob logs in as the Vora user and creates a new simple file (stats.csv) that contains five rows. Then Bob uses a Hadoop command to put the file onto the HDFS system and confirms its existence in Ambari. Back in PuTTY, Bob removes the file from his local Linux system to prove that the file isn’t connected to the local system.
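The stats.csv round trip can be sketched like this; the file contents and HDFS path are assumptions:

```shell
# Sketch of the stats.csv round trip (run as the vora user against a
# live cluster; contents and HDFS path are assumptions).
cat > stats.csv <<'EOF'
1,10
2,20
3,30
4,40
5,50
EOF

hadoop fs -put stats.csv /user/vora/stats.csv   # copy the file into HDFS
hadoop fs -ls /user/vora                        # confirm it exists in HDFS
rm stats.csv                                    # remove the local copy
```

Because the local copy is deleted, any table Zeppelin builds over stats.csv must be reading from HDFS.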
Now back in Zeppelin Bob creates a new note in which he creates a table with the data from the stats.csv file. Then Bob views the data and showcases Zeppelin’s analytics.
To recap: Bob instantiated a SUSE Linux server in AWS and prepared the server by creating a pair of users. Then Bob installed Ambari and used it to create a small Hadoop cluster. Next, Bob installed and tested Apache Spark before installing SAP HANA Vora. Bob tested SAP HANA Vora with the Spark Shell. Finally, to visualize the data, Bob installed Zeppelin. With Zeppelin, Bob created tables that were connected through Vora and Apache Spark to HDFS.
For more tutorial videos about SAP HANA Vora please check out this playlist.
SAP HANA Academy – Over 1,200 free tutorial videos on SAP HANA, Analytics and the SAP HANA Cloud Platform.