SAP HANA Vora is an in-memory, distributed computing solution that helps organizations uncover actionable business insights from Big Data. SAP HANA Vora can be used to quickly and easily run enriched, interactive analytics on both enterprise and Hadoop data.
In a series of tutorial videos, the SAP HANA Academy’s Tahir Hussain “Bob” Babar details how to install and use the newest release of SAP HANA Vora, SAP HANA Vora 1.3. Bob walks through the steps necessary to install SAP HANA Vora 1.3 on a single node system.
SAP HANA Vora 1.3 Overview
Watch the video below for an introduction to SAP HANA Vora 1.3 and for an overview of the series’ architecture.
Although Hadoop is highly scalable, it’s a challenging infrastructure to manage, for instance when it comes to schema flexibility and out-of-the-box enterprise-grade analytics. Often specialized programming skills are required to extract business value from the data stored therein. SAP HANA Vora is an in-memory computing framework composed of specialized processing engines purposefully designed for big data environments.
For developers and data scientists, SAP HANA Vora allows the mash-up of enterprise data and data from a Hadoop data lake. For business users, SAP HANA Vora provides modeling capabilities and enterprise features such as graph processing, which displays complex relationships, and also time series modeling, which forecasts future values based on historical data.
Big data is both distributed and processed on multiple nodes. At the lowest layer is the Hadoop Distributed File System, HDFS. HDFS is the primary storage system used by Hadoop applications. It’s distributed to provide high performance access to data across all of the nodes within a cluster.
To process the data you can use tools like Apache Spark. Spark is an open source big data processing framework, which runs in-memory. SAP HANA Vora is an in-memory query engine that plugs into the execution framework to provide enriched interactive analytics on data stored in Hadoop.
As well as being able to perform business intelligence on that data, you can also build your own apps. You can also connect SAP HANA Vora to notebooks such as Jupyter and Zeppelin. It’s also easy to connect SAP HANA Vora to SAP HANA, and the connection is bidirectional. If you build apps on the SAP HANA side, you can connect to data in SAP HANA or Hadoop. Conversely, if you build apps that connect directly to SAP HANA Vora, you can use data contained in SAP HANA as a data source.
For this series, imagine that you’re a user who wants to investigate and play with SAP HANA Vora and you want to install it yourself. First you will create a SUSE Linux Instance in Amazon Web Services. You will then use SSH through a Mac Terminal to access the instance. You can also use PuTTY if you have a Windows machine.
Then you will use an easy deployment tool, named Ambari, to install and monitor the Hadoop cluster. Next, you will install the bare minimum of Hadoop services, such as HDFS, YARN, and Spark. After testing HDFS and Spark, then you will install SAP HANA Vora.
The two main SAP HANA Vora tools that you will examine are the SAP HANA Vora Manager and the SAP HANA Vora Tools. There will be upcoming videos on the SAP HANA Vora engines.
After getting data into SAP HANA Vora, you will want to get it out. So you will install and use Apache Zeppelin to graphically visualize the data. You can use Apache Zeppelin or any BI tool to connect to SAP HANA Vora. This will enable you to connect to your data in HDFS.
Also, soon there will be some videos on how to connect SAP HANA Vora 1.3 to SAP HANA.
All of the commands and code used throughout this series can be found in the SAP HANA Vora 1.3 file on the SAP HANA Academy’s GitHub.
Create Linux Instance
In the series’ second video, linked below, Bob shows how to create a Linux Instance in Amazon Web Services.
First Bob creates a VPC network in AWS to house his instances. This ensures that every time the server is stopped and/or started the server name remains the same. Next, within EC2, Bob launches a SUSE Linux 11 Service Pack 4 image. The version that Bob has tested and has confirmed that the SAP HANA Vora installation works on is suse-sles-11-sp4-sapcal-v20160515-hvm-ssd-x86_64.
While configuring, Bob disables the Auto-assign Public IP as he will be using an elastic IP. Bob elects to use an existing security group that will be modified later. When launching the instance, Bob creates a key pair via a PEM file. Make sure to download and store the Key Pair so you can log into your server.
Then, Bob allocates an elastic IP to his recently created VPC. The elastic IP will never change. Bob then associates his elastic IP with his instance. Finally, Bob sets the security group so that traffic can come only from his Mac computer. Bob will be using Terminal on his Mac to access the Linux server.
Connecting to the Instance
In the SAP HANA Vora 1.3 series’ next tutorial Bob shows how to connect to the AWS Linux Node using SSH from Terminal. Bob then details how to prepare the instance for the installation of Ambari, Hadoop and SAP HANA Vora.
If you want to connect to your instance using PuTTY on a Windows machine instead of using Terminal on a Mac, then please watch this video from the SAP HANA Academy’s SAP HANA Vora 1.2 playlist.
In Terminal, Bob copies his Vora13.pem.txt key to his $HOME/.ssh folder.
Next, Bob changes the rights to the PEM file so he can log in using SSH. Finally, Bob logs in by entering the command shown below where he uses his Public IP address.
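The two steps above, tightening the key file's permissions and logging in with it, can be sketched in Python. This is only an illustration and not the command used in the video: the login name ec2-user and the address 203.0.113.10 are placeholder assumptions, and the sketch works on a throwaway file rather than a real key. SSH refuses private keys that other users can read, which is why the mode is reduced to owner read-only.

```python
import os
import stat
import tempfile


def prepare_key(pem_path):
    """Restrict the key file to owner read-only (the equivalent of chmod 400)."""
    os.chmod(pem_path, stat.S_IRUSR)
    return stat.S_IMODE(os.stat(pem_path).st_mode)


def ssh_command(pem_path, user, host):
    """Build the argument list for an ssh login that uses an identity file."""
    return ["ssh", "-i", pem_path, f"{user}@{host}"]


# Demonstrate on a throwaway file instead of a real private key.
with tempfile.NamedTemporaryFile(suffix=".pem", delete=False) as handle:
    demo_key = handle.name

mode = prepare_key(demo_key)
print(oct(mode))

# "ec2-user" and the IP address are placeholders, not values from the video.
print(" ".join(ssh_command(demo_key, "ec2-user", "203.0.113.10")))

os.remove(demo_key)
```

Replace the placeholders with your own key path, login name and elastic IP before running the printed ssh command in Terminal.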
Once logged in you need to enter a few commands to install some packages that are required by the various SAP HANA Vora Servers. All of the scripts are listed on the SAP HANA Academy’s GitHub in the Vora_1.3_InstallNotes.txt file. For more information on SAP HANA Vora 1.3 Installation please read the SAP HANA Vora Installation and Administration guide found on the Vora 1.3 help.sap.com page.
First, as the root user, Bob makes sure his network time protocol daemon is running. Next, Bob installs a libaio file and changes the config file’s max size.
Next, Bob appends a line to the limits file. Then, Bob exports the locale for everything as US English. Then, Bob installs a pair of packages, numactl and libtool, that prepare the document store server and the disk engine server respectively. You may need to install additional packages depending on your environment.
In the next video, linked below, Bob shows how to install Ambari on the Linux Instance. Ambari is a cluster provisioning tool which is used to both install and monitor Hadoop.
First, in Terminal as the root user, Bob pastes in the set of commands shown below to create a new specific user, cluster_admin, for installing Ambari. These commands can be copied from lines 56-61 of the Vora_1.3_InstallNotes.txt file on GitHub.
After logging in as the cluster_admin user, Bob generates a public/private RSA key pair. Then Bob changes the rights to the files and appends the public key to the authorized_keys file.
Next Bob enters the command below to add the Ambari 2.2.2 repository to the Linux server’s list of repositories.
After ensuring the repo is up to date, Bob installs the Ambari server. Once Ambari is installed, Bob sets it up on the Linux server. This includes using Oracle JDK 1.8. Bob elects not to change any of the advanced database configuration, as Postgres is automatically installed. Finally, Bob restarts the server and then connects to the Ambari login page using port 8080 and his Public IP address.
In the next part of the series, Bob details how to install Hadoop. The Hadoop components HDFS, Hive, Spark and YARN are both installed and configured. Hadoop is used for the processing and storage of extremely large data sets in a distributed computing environment and is a prerequisite for SAP HANA Vora on the Linux instance.
First, Bob logs into Ambari as the default user and goes through the various tasks in the installation wizard. The most important part is that you choose the stack HDP 2.4. Please check the installation guide for the exact versions that are supported.
Bob copies the target host name and the private key from his Terminal and adds them to the install wizard before confirming his host.
For the services Bob chooses HDFS, YARN+MapReduce2, Spark and Ambari Metrics. HDFS is the Apache Hadoop Distributed File System and is where your files are stored. YARN+MapReduce2 helps you to do processing on the server. Spark is an open source processing engine built around speed. Spark is needed because SAP HANA Vora complements Apache Spark. Ambari Metrics provides information about network and disk space usage. After clicking next, the wizard also includes the necessary services ZooKeeper, Hive, Pig and Tez.
To continue with the Hadoop installation, Hive must be configured. In the video below, Bob shows how to configure the Hive Service. This is a prerequisite for installing the Spark Service.
If you want to use the PostgreSQL database you need to create a database on the server. You need a Hive schema and user on the repository database. You will be installing the postgresql.jar file on the Linux server.
Back in Terminal, as the root user, Bob installs the postgresql-jdbc file and changes the rights on the file before making Ambari aware of the file.
Next, Bob logs in as the Postgres user, opens psql and then runs the commands shown below to create a database, user and password. He calls all three of these Hive.
Now, back as the root user, Bob copies the psql config file before opening it. Then Bob modifies the file to give Hive access to the database.
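As a rough illustration of what such an access rule can look like, the sketch below builds one host-based authentication record in the PostgreSQL pg_hba.conf layout (connection type, database, user, address, method). It assumes the config file in question is pg_hba.conf, and the address range and md5 method are placeholder assumptions. The exact line Bob adds is in the GitHub install notes and may differ.

```python
def pg_hba_entry(database, user, address, method, conn_type="host"):
    """Return one pg_hba.conf record with whitespace-separated fields."""
    return " ".join([conn_type, database, user, address, method])


# Database and user follow the naming in the text; the address range and
# authentication method are assumptions made for this sketch.
entry = pg_hba_entry("hive", "hive", "0.0.0.0/0", "md5")
print(entry)
```

An address range as wide as the one in this sketch is only reasonable on a sandbox whose security group already restricts incoming traffic.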
After the database is restarted, back in Ambari, Bob completes the Hadoop installation. Once done make sure that all of the services have been started.
Testing HDFS & Spark
Now that Hadoop is installed, Bob shows how to test it in the tutorial video below. These tests ensure that the Linux Instance is ready for the SAP HANA Vora 1.3 installation.
First, to test HDFS, in Terminal Bob logs in as his HDFS user and creates a new folder in the cluster_admin directory. After giving the user rights, Bob tests HDFS by creating a simple test.csv file that contains a single row with three columns, as the cluster_admin user. Bob then puts that file into the folder he just created and shows that he can output the file as the cluster_admin user. The cluster_admin user can both read and write to HDFS.
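A test file of the shape described above, a single row with three columns, can be produced with a few lines of Python. This is a sketch only: the three values are made up for illustration and are not the ones used in the video.

```python
import csv
import io


def build_test_csv(row):
    """Serialize one row as CSV text with a trailing newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue()


# Hypothetical values; any single row with three columns serves the test.
content = build_test_csv([1, "a", 2.5])
print(content, end="")

# Reading the text back confirms the one-row, three-column shape.
rows = list(csv.reader(io.StringIO(content)))
print(len(rows), len(rows[0]))
```

Writing the returned text to test.csv gives a file that can then be put into the HDFS folder.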
Next, to test Spark first Bob sets some paths to the user in the bashrc file by locating his Java and Hadoop homes. Bob then inserts five commands from the GitHub file into the bashrc file.
The first test is to run Spark Shell which Bob successfully does as the cluster_admin user. Then still as the cluster_admin user Bob runs the command shown below to use a Spark library to return the value of Pi.
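The Pi example that ships with Spark estimates the value by Monte Carlo sampling: it throws random points at a square and counts how many land inside the inscribed circle. The plain Python sketch below shows the same idea on a single machine without Spark. The function name, sample count and seed are choices made for this illustration.

```python
import random


def estimate_pi(samples, seed=0):
    """Estimate Pi from the share of random points that fall inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Circle area / square area = pi / 4, so scale the ratio by 4.
    return 4.0 * inside / samples


print(estimate_pi(100_000))
```

Spark runs the same sampling in parallel across partitions and sums the counts, which is why the job is a convenient smoke test for the cluster.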
SAP HANA Vora Installation
In the next tutorial video Bob details how to download, install and configure the SAP HANA Vora package on the Linux Instance that contains Ambari and Hadoop.
First, go to the SAP Service Marketplace and select SAP HANA Vora and choose SAP HANA VORA FOR AMBARI 1. Download the file and then place it into the root directory of your cluster_admin user. Then extract the file into the HDP services folder by running the command below.
In SAP HANA Vora 1.2 you installed multiple services within Ambari. The difference in SAP HANA Vora 1.3 is that now there is only one service in Ambari. Instead, a separate tool outside Ambari is used to manage and configure the Vora services. Now you should see vora-manager if you do an ls on your services folder.
Now you need to restart both the Ambari agent and the Ambari server using a pair of commands from the GitHub file. As it’s restarting Ambari will become aware of the additional vora-manager services which will be available for installation in Ambari.
Log back into Ambari and make sure all of the services are running. Click on add services and scroll down to find and select Vora Manager. You need to install three things, Vora Manager Master, Vora Manager Worker and Vora Client on the single node.
On the next screen in the wizard choose advanced vora-manager-config and add your vora_default_java_home and your vora_default_spark_home. You can confirm the path of your Java and Spark home in the Terminal. Finally click deploy at the end of the wizard to complete the installation of SAP HANA Vora 1.3. To confirm make sure that the Vora Manager Master, Vora Manager Worker and Vora Client have all started.
In the next video linked below, Bob covers the post installation steps for SAP HANA Vora 1.3.
If you’re using AWS you may notice that the Vora Manager Worker goes from Live to Not Live after the installation. This is the result of a mismatch in AWS of the internal and external machine names. To fix it go back into the Terminal and create a new AWS file as the cluster_admin user. Then modify the params.py file in the HDP/2.4/services/vora-manager/package/scripts folder by inserting two lines. These lines are import socket and self_host = socket.getfqdn() as shown below.
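To see what those two lines do, the snippet below runs them on their own and prints the result next to the short host name. Only the import and the self_host assignment come from the text. The comparison with socket.gethostname() is added here for illustration.

```python
import socket

# The two lines described above: resolve this machine's fully qualified name.
self_host = socket.getfqdn()

# Added for comparison only: the short host name as the OS reports it.
short_name = socket.gethostname()

print("fqdn:", self_host)
print("hostname:", short_name)
```

On an AWS instance the fully qualified name is typically the internal one, which is presumably why the worker is registered under the name the cluster itself uses.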
After stopping and then starting the Vora Manager Worker it should be Live once again.
To access the Vora Manager, copy your IP Address and then append :1900 to it. However, you can’t access it unless you generate the password file. So back in Terminal, as the root user, run a find / -name 'genpasswd.sh' command to locate the genpasswd.sh script in the HDP services directory. Then use it to create a user name and password for the Vora Manager.
Next, make the Vora user the owner of the htpassword file and give it the rights. Then place the file into the Vora Manager folder.
Then, after you stop and start the Vora Manager, you can access it on port 1900 using your login and password.
In the tutorial linked below Bob details how the SAP HANA Vora Manager works. The Vora Manager is a new feature in SAP HANA Vora 1.3. It is a UI which is used to configure, start/stop and troubleshoot the SAP HANA Vora Servers.
The SAP HANA Vora Manager has four tabs. The User Management tab enables you to create and edit other users. The Nodes tab details the services on and the stats of your Vora nodes.
The most important tab is Services. There you can configure and switch on and off each of your SAP HANA Vora services. Each service has both a configuration and a node assignment tab. Unlike in Vora 1.2, you won’t need to change anything for each of the services at the moment. The only thing you need to do is start all of the services. Now in the Nodes tab you can see that all of the services have started up.
Next Bob covers some troubleshooting steps in case not all of the SAP HANA Vora services start up. The log file for the Vora Manager is in a folder under /var/log and can be accessed as the cluster_admin user in the Terminal. Alongside the log file is a subfolder for each of the services.
Back in the Vora Manager, if you click on the Connection Status icon in the top right corner you can see the pair of third party tools, Consul and Nomad, that are utilized. You can then use these tools for advanced troubleshooting.
Back in the Terminal, turn on the Consul UI, which is a monitoring tool, and then locate the nomad tool found inside its bin folder. Then use the nomad tool to check the status of the services. Next, if you put the name of the SAP HANA Vora service after nomad, then you can see if any issues are occurring.
To turn on the web-based UI for Consul, append :8500 to the external IP Address in a new tab. This works the same way as it did for SAP HANA Vora 1.2.
The final tab, External Links, links you to the SAP HANA Vora Tools, which are used to model within SAP HANA Vora. Make sure you paste in the external IP address in front of port 9225 to open the Tools. The SAP HANA Vora Tools allow you to create tables in SAP HANA Vora and to model various views. They also allow you to combine datasets in SAP HANA Vora.
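Several web UIs have now been reached by appending a port to the instance's external IP address. A small helper like the one below can keep them straight. The port numbers are the ones mentioned in this series, while the dictionary keys, the http scheme and the sample address are assumptions made for this sketch.

```python
# Ports as mentioned in the series; the key names are chosen for this sketch.
SERVICE_PORTS = {
    "ambari": 8080,
    "vora_manager": 1900,
    "consul_ui": 8500,
    "vora_tools": 9225,
}


def service_url(host, service):
    """Return the address of a service UI on the given host (http is assumed)."""
    return f"http://{host}:{SERVICE_PORTS[service]}"


# 203.0.113.10 is a documentation placeholder, not a real instance address.
for name in SERVICE_PORTS:
    print(name, service_url("203.0.113.10", name))
```

Each of these ports also has to be reachable through the instance's security group before the page will load.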
Testing SAP HANA Vora 1.3
In the next tutorial video, linked below, Bob shows how to verify that the installation of SAP HANA Vora 1.3 worked. Bob tests both the Vora Spark Shell and the Vora Tools UI.
Back in the Terminal on the Linux server as the cluster_admin user, navigate to the vora bin folder where SAP HANA Vora Spark Shell is contained. Then run the start-spark-shell command.
Once the shell has started, run the command to import the spark.sql.SapSQLContext. Next assign it to a variable called vc. Then enter the SQL below to create a test table. After, enter a show table command to see that the table exists and then enter a select * from command to see the table’s data. This proves the SAP HANA Vora Spark Shell works.
The next test creates a similar table using the SQL Editor in the SAP HANA Vora Tools. Bob runs the same command in the SQL Editor to create the table and then performs a select * from to view the data.
Back on the SAP HANA Vora Tools home page, the testtable is now contained in the data browser.
In the next video Bob shows how to install Apache Zeppelin. Apache Zeppelin is a new and incubating multi-purposed web-based notebook which brings data ingestion, data exploration, visualization, sharing and collaboration features to Hadoop, Spark and SAP HANA Vora.
For information about Apache Zeppelin visit their website. To see which version of Zeppelin you need you must check which version of Scala you’re using. Scala is how you connect to Spark.
To find out your version of Scala, launch the SAP HANA Vora Spark Shell as the cluster_admin user on the Terminal in your Linux Instance. Then run the command below to see your version of Scala. Which version of Scala corresponds to which version of Zeppelin is detailed in the installation guide.
On the Apache Zeppelin downloads page select the binary package with all interpreters for the proper version of Zeppelin to download it. Download it to Dropbox or an FTP server so that it can be transferred to your Linux Instance.
Back in Terminal go to the Home directory of the cluster_admin user and run the wget command shown below.
After the Zeppelin tar file is extracted, add the Zeppelin location to your bashrc file. Then log out of and back into your cluster_admin user and navigate into the zeppelin folder. Now that Zeppelin has been installed it must be linked to SAP HANA Vora.
SAP provides a SAP HANA Vora Spark extension. This is an interpreter that the UI within Zeppelin will use. First, copy the Zeppelin jar file from the Ambari Spark folder and insert it into the Zeppelin folder. Next open the Zeppelin folder that now contains both jar files and run the command shown below. This will remove the interpreter-setting.json file.
Then run a similar uf command to combine the interpreter-setting.json files together. Now it contains an interpreter for both Spark and SAP HANA Vora.
In this tutorial video Bob shows how to configure Apache Zeppelin to work with SAP HANA Vora by combining the SAP HANA Vora Interpreter with Zeppelin.
Back in the Terminal navigate to the zeppelin config folder as the cluster_admin user. Copy the zeppelin-env.cmd.template file and change its rights so it can be edited. Open the environment file and insert some class paths for YARN, Hadoop and Spark using the commands shown below.
Then copy the zeppelin-site.xml.template file, change the rights and modify the file by adding the sap.zeppelin.spark.SapSqlInterpreter as the second interpreter. Also change the Zeppelin server port from 8080 to 9099.
Next, go into Ambari and select the YARN service. Then navigate to the advanced tab from the Configs tab and choose to add a custom yarn-site property. Give the property a key and then specify the version of HDP as the value. Save the configuration and restart YARN.
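The two changes to the site file can also be scripted. The sketch below works on a heavily shortened, made-up stand-in for zeppelin-site.xml that keeps only the zeppelin.server.port and zeppelin.interpreters properties. The real file lists many more interpreters, so treat this as an outline of the edit rather than a drop-in script.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for zeppelin-site.xml, used only for this illustration.
SAMPLE = """<configuration>
  <property>
    <name>zeppelin.server.port</name>
    <value>8080</value>
  </property>
  <property>
    <name>zeppelin.interpreters</name>
    <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter</value>
  </property>
</configuration>"""


def find_value(root, name):
    """Return the <value> element of the property with the given name."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.find("value")
    raise KeyError(name)


root = ET.fromstring(SAMPLE)

# Move the Zeppelin server from port 8080 to 9099.
find_value(root, "zeppelin.server.port").text = "9099"

# Insert the Vora interpreter class as the second entry in the list.
interpreters = find_value(root, "zeppelin.interpreters")
classes = interpreters.text.split(",")
classes.insert(1, "sap.zeppelin.spark.SapSqlInterpreter")
interpreters.text = ",".join(classes)

print(ET.tostring(root, encoding="unicode"))
```

Moving Zeppelin off port 8080 avoids a clash with Ambari, which the series already reaches on that port.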
To start Zeppelin run the command shown below in the Terminal.
To access Zeppelin append the port 9099 to the end of the public IP address. Next, choose the Interpreters option in Zeppelin. Find and remove the Spark Interpreter. Then create a new Interpreter named Spark and put it in the Spark group. Change the master to yarn-client and then add the jar file as an artifact in the dependencies. Then restart the Spark Interpreter.
In the final video of the series Bob shows how to use Apache Zeppelin to work with SAP HANA Vora. This confirms that the installation and configuration of Zeppelin on Vora is working.
When using Zeppelin you create notes, so Bob creates a sample note called MyFirst note. Unlike the SAP HANA Vora Tools, Zeppelin allows you to display data graphically. Because you’re using an interpreter in Zeppelin, you must always prefix statements with %spark.vora. Bob uses the same command to build a table as he did in the Vora Tools’ SQL Editor but with the prefix tacked on.
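The prefixing rule is mechanical enough to sketch in code. The helper below is hypothetical and not part of Zeppelin or SAP HANA Vora: it simply puts the %spark.vora marker in front of a statement, leaving text that already carries the marker untouched. The sample statement is a placeholder.

```python
VORA_PREFIX = "%spark.vora"


def as_vora_paragraph(statement):
    """Prefix a statement with the interpreter marker unless it already has it."""
    text = statement.strip()
    if text.startswith(VORA_PREFIX):
        return text
    return f"{VORA_PREFIX} {text}"


# Placeholder statement; paste the resulting text into a Zeppelin paragraph.
print(as_vora_paragraph("select * from testtable"))
```

Without the marker, Zeppelin hands the paragraph to its default interpreter instead of the SAP HANA Vora one.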
To view the table run the show table command and to view the data run a select * from statement. With this test we know that the Zeppelin interpreter works when connecting to SAP HANA Vora. That means that both Spark and Hadoop work as well.
To load data into HDFS, open Terminal on the Linux server and log in as the cluster_admin user. Then create a simple data file (aggdata.csv) with the commands shown below.
Then use the hdfs dfs -put command to add the file to HDFS.
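For reference, the upload follows the pattern hdfs dfs -put, then the local source, then the HDFS destination. The sketch below only assembles and prints such a command. The HDFS folder /user/cluster_admin is an assumed destination, so check the GitHub notes for the exact path used in the video.

```python
import shlex


def hdfs_put_command(local_path, hdfs_dir):
    """Build the argument list for uploading a local file into an HDFS folder."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


# The destination folder is an assumption made for this sketch.
command = hdfs_put_command("aggdata.csv", "/user/cluster_admin")
print(shlex.join(command))
```

To run it from Python on the Linux instance, the list can be passed to subprocess.run. Otherwise, paste the printed line into Terminal.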
Then back in Zeppelin Bob creates a new note and runs a command to create a new table from aggdata.csv. Then after doing a select * from command Bob is able to use the different graphics to visualize the data with Zeppelin.
That concludes the tutorial series on how to install and use SAP HANA Vora 1.3.
Please visit the SAP HANA Academy to learn about SAP HANA, SAP Analytics, and the SAP HANA Cloud Platform from more than 1,800 free tutorial videos. Subscribe to keep up to date with the latest videos.
All code snippets used in every video are available on GitHub.
Please follow on Twitter @saphanaacademy and connect with us on LinkedIn.