In this documentation I’ll explain how to install and configure SAP HANA Vora 1.2 with SAP HANA SPS 11 integration, and demonstrate in detail how to set up a Hortonworks ecosystem to realize this configuration.

For my setup I’ll use my own lab on VMware vSphere 6.0, running SAP HANA Vora 1.2, SAP HANA Revision 112, and the Hadoop HDFS stack 2.7.2.

Disclaimer: my deployment is for test purposes only; I keep security simple from a network perspective in order to realize this configuration, and I use open-source software.

Order of execution

  • Deploy Hortonworks ecosystem
  • Install SAP HANA Vora for Ambari
  • Install SAP HANA Spark Controller 1.5
  • Install Spark assembly and dependent libraries
  • Configure Hive Metastore
  • Configure Spark queue
  • Adjust MapReduce2 class path
  • Connect SAP HANA to SAP HANA Vora

Guides used

SAP HANA Vora Installation and Developer Guide

SAP HANA Administration Guide

Notes used

2284507 – SAP HANA Vora 1.2 Release Note

2203837 – SAP HANA Vora: Central Release Note

2213226 – Prerequisites for installing SAP HANA Vora: Operating Systems and Hadoop Components

Links used

SAP Help for SAP HANA Vora 1.2

HDP Documentation Ver 2.3.4

Overview Architecture

5-9-2016 9-14-03 AM.jpg

The architecture is based on a fully virtual environment. Running SAP HANA Vora 1.2 requires the following mandatory components as part of the Hadoop ecosystem:

• HDFS 2.6.x or 2.7.x

• ZooKeeper

• Spark 1.5.2

• YARN cluster manager

For my configuration, all my servers are registered in my DNS and synced with an NTP server.

Deploy Hortonworks Ecosystem

The Hortonworks ecosystem deployment consists of several steps:

1. Prepare the server by sharing SSH Public Key

2. Install MySQL connector

3. Install Ambari

4. Install Hive database

5. Install and configure HDP cluster

To keep the installation simple, I decided to use the “Ambari Automated Installation” based on HDP version 2.3.4, which can be deployed with Spark version 1.5.2.

/wp-content/uploads/2016/05/8_2_947293.jpg

To realize this configuration, my deployment consists of 3 VMs:

Ambari: ambari.will.lab

Yarn: yarn.will.lab

Hana: vmhana02.will.lab


Prepare the server by sharing SSH Public Key

With my 3 servers up and running, we have to set the SSH public key on the Ambari server in order to allow it to install the Ambari agent on the hosts that are part of the cluster.

I first create the rsa key-pair

/wp-content/uploads/2016/05/1_947324.jpg

And copy the public key on the remote server “yarn”

/wp-content/uploads/2016/05/2_947325.jpg

And SSH into my remote server to confirm that I don’t need to use the password

/wp-content/uploads/2016/05/3_947327.jpg
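In shell terms, the key exchange above boils down to the following sketch (the host root@yarn.will.lab is from my lab; I generate into a scratch directory here purely for illustration — on the real Ambari server the key pair lives in ~/.ssh):

```shell
# Generate an RSA key pair with no passphrase (lab setup only).
keydir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$keydir/id_rsa"

# Push the public key to each cluster host, then verify passwordless login:
#   ssh-copy-id root@yarn.will.lab
#   ssh root@yarn.will.lab hostname
cat "$keydir/id_rsa.pub"
```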

Install MySQL connector

Hive requires a relational database to store the Hive Metastore. I install the MySQL connector and note its path; it will be required during the Ambari initialization setup.

/wp-content/uploads/2016/05/3_1_947328.jpg

/wp-content/uploads/2016/05/3_2_947329.jpg

Install Ambari

On the Ambari server, download the Ambari repository for SLES 11:

wget -nv http://public-repo-1.hortonworks.com/ambari/suse11/2.x/updates/2.2.0.0/ambari.repo -O /etc/zypp/repos.d/ambari.repo

/wp-content/uploads/2016/05/4_947330.jpg

And finally install Ambari

/wp-content/uploads/2016/05/5_947331.jpg

Once installed, the Ambari server needs to be set up:

Note: I decided to use Oracle JDK 1.8 and the embedded PostgreSQL database for Ambari.

/wp-content/uploads/2016/05/6_947332.jpg

Once done, start the server and check its status

/wp-content/uploads/2016/05/8_947338.jpg

Note: I did not specify the MySQL connector path during the initial Ambari setup. To include it, stop Ambari and load the connector by re-executing the following command

/wp-content/uploads/2016/05/8_1_947339.jpg

Install Hive Database

By default on RHEL/CentOS/Oracle Linux 6, Ambari installs an instance of MySQL on the Hive Metastore host. Since I’m using SLES, I need to create a MySQL instance for the Hive Metastore myself.

/wp-content/uploads/2016/05/20_947341.jpg

Install and configure HDP cluster

With the server up and running, we can start the installation and configuration of the HDP cluster components. To proceed, open the Apache Ambari URL and run the wizard with the default user and password “admin/admin”

/wp-content/uploads/2016/05/9_947345.jpg

Follow the steps provided by the wizard to create your cluster

/wp-content/uploads/2016/05/10_947348.jpg

/wp-content/uploads/2016/05/11_947349.jpg

For this section, provide the private key generated earlier on the Ambari server

/wp-content/uploads/2016/05/12_947350.jpg

/wp-content/uploads/2016/05/13_947351.jpg

Hosts added successfully, but check the warning message

/wp-content/uploads/2016/05/14_947353.jpg

Choose the services you want to deploy

/wp-content/uploads/2016/05/17_947354.jpg

Assign the services you want to run on the selected master node; since I’m using only one host, it’s a no-brainer. Additional hosts can be assigned later as needed

/wp-content/uploads/2016/05/18_947355.jpg

Assign slaves and clients

/wp-content/uploads/2016/05/18_1_947356.png

Customize your services as needed as well; in my case I use a MySQL database, so I need to provide the database information

/wp-content/uploads/2016/05/19_947360.jpg

/wp-content/uploads/2016/05/19_1_947361.png

Review the configuration for all services and execute

/wp-content/uploads/2016/05/21_947362.jpg

/wp-content/uploads/2016/05/21_2_947369.jpg

/wp-content/uploads/2016/05/21_3_947370.jpg

Once completed, access the Ambari web page and perform some checks to see the running services

/wp-content/uploads/2016/05/22_947371.jpg

With the Hortonworks ecosystem now installed, we can proceed with the SAP HANA Vora for Ambari installation

SAP HANA Vora for Ambari

SAP HANA Vora 1.2 is now available for download as a single installation package for the Ambari and Cloudera cluster provisioning tools. These packages also contain the SAP HANA Vora Spark extension library (spark-sap-datasources-<VERSION>-assembly.jar), which no longer needs to be downloaded separately.

/wp-content/uploads/2016/05/23_947378.jpg

The following components will be deployed from the provisioning tool:

/wp-content/uploads/2016/05/24_947379.jpg

The Vora DLog component requires a specific library on the server, “libaio”; make sure it is installed

/wp-content/uploads/2016/05/25_947380.jpg

Once downloaded, from the Ambari server copy the VORA_AM* file into the following folder:

/var/lib/ambari-server/resources/stacks/HDP/2.3/services

/wp-content/uploads/2016/05/26_947381.jpg

Then decompress it; this will generate the various Vora application folders

/wp-content/uploads/2016/05/27_947382.jpg

Then restart the Ambari server in order to load the new services

/wp-content/uploads/2016/05/27_1_947383.png

Once completed, install the new Vora services from the Ambari dashboard

/wp-content/uploads/2016/05/29_947390.jpg

Select the Vora applications to deploy and hit Next to install them

/wp-content/uploads/2016/05/28_947391.jpg

The Vora Discovery and Thriftserver services will require some customization entries, such as hostname and Java location

/wp-content/uploads/2016/05/30_947392.jpg


/wp-content/uploads/2016/05/30_1_947399.png

/wp-content/uploads/2016/05/31_947400.jpg

The new services now appear. Yes, I have some red services, but they will be fixed.

/wp-content/uploads/2016/05/31_1_947401.jpg

With the Vora engine installed, I need to install the Spark Controller

Install SAP Hana Spark Controller 1.5

The Spark Controller needs to be downloaded from the marketplace; it is an .rpm package.

/wp-content/uploads/2016/05/32_947403.jpg

Once downloaded, execute the rpm command to install it

/wp-content/uploads/2016/05/33_947404.jpg

When the installation is complete, the /usr/sap/spark/controller folder is generated

/wp-content/uploads/2016/05/33_1_947405.jpg

The next phase is to install the Spark assembly file and dependent libraries

Install Spark assembly and dependent libraries

The Spark assembly file and dependent libraries need to be copied into the Spark Controller’s external lib folder.

Note: at the time of writing, the 1.5.2 assembly .jar is the only version supported to work with Vora 1.2. I’ll get it from the download page at https://spark.apache.org/downloads.html

/wp-content/uploads/2016/05/34_947406.jpg

Decompress the archive and copy the necessary library into the /usr/sap/spark/controller/lib/external folder

/wp-content/uploads/2016/05/34_1_947407.jpg

Then update the content of the hanaes-site.xml file in the /usr/sap/spark/controller/conf folder

/wp-content/uploads/2016/05/34_2_947420.jpg

Spark and YARN create staging directories under the /user/hanaes directory in HDFS. This directory needs to be created manually with the following command, run as the hdfs user:

hdfs dfs -mkdir /user/hanaes

/wp-content/uploads/2016/05/35_947421.jpg

Configure Hive Metastore

Since the SAP HANA Spark Controller connects to the Hive Metastore, the hive-site.xml file needs to be available in the controller’s class path.

To do this, I create a symbolic link in the /usr/sap/spark/controller/conf folder

/wp-content/uploads/2016/05/36_947422.jpg

And adjust the hive-site.xml file with the following parameters:

• hive.execution.engine = mr

• hive.metastore.client.connect.retry.delay = remove the trailing (s)

• hive.metastore.client.socket.timeout = remove the trailing (s)

• hive.security.authorization.manager = org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider
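For reference, the adjusted entries in hive-site.xml end up looking roughly like the fragment below. The numeric values shown (1 and 600) are the HDP defaults with the “s” suffix stripped; keep whatever numbers your cluster already uses — this is a sketch, not a drop-in file.

```xml
<!-- hive-site.xml adjustments for the Spark Controller (Hortonworks only). -->
<!-- Keep your cluster's existing numbers; just drop the trailing "s". -->
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>
<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <value>1</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>600</value>
</property>
<property>
  <name>hive.security.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider</value>
</property>
```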

Note: these changes are made only because we are using the Hortonworks distribution in this example; with Cloudera they are not required

Configure Spark queue

To prevent Spark from taking all available resources from the YARN resource manager, and thus leaving no resources for any other application running on it, I need to configure Spark dynamic allocation by setting up a queue in the “Queue Manager”

/wp-content/uploads/2016/05/37_947424.jpg

Create it, then save and refresh from the Actions button

/wp-content/uploads/2016/05/38_947425.jpg

Once done, add the spark.yarn.queue property to the hanaes-site.xml file

/wp-content/uploads/2016/05/39_947429.jpg

/wp-content/uploads/2016/05/39_1_947430.jpg
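The property added above amounts to a fragment like this in hanaes-site.xml (the queue name “hanaes” is my assumption — use whatever name you gave the queue in Queue Manager):

```xml
<property>
  <!-- Route Spark Controller jobs to the dedicated YARN queue. -->
  <name>spark.yarn.queue</name>
  <value>hanaes</value>
</property>
```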

Adjust MapReduce2 class path

One important point to take into consideration about the Spark Controller is that the component library path called during startup doesn’t support variables such as “${hdp.version}”.

This variable is declared in the MapReduce2 configuration

/wp-content/uploads/2016/05/39_2_947431.jpg

Expand the Advanced mapred-site properties and locate the parameter “mapreduce.application.classpath”

/wp-content/uploads/2016/05/39_3_947435.jpg

Copy/paste the whole string into your favorite editor and replace every ${hdp.version} entry with the current HDP version

/wp-content/uploads/2016/05/39_4_947436.jpg

Before the change

$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure

After the change

$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.3.0.0-2557/hadoop/lib/hadoop-lzo-0.6.0.2.3.0.0-2557.jar:/etc/hadoop/conf/secure
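A quick way to produce the edited string without replacing each occurrence by hand is sed. The build number 2.3.0.0-2557 is the one from my lab; check yours (for example with hdp-select versions) before substituting:

```shell
# Replace the ${hdp.version} placeholder with the literal HDP build number.
# 2.3.0.0-2557 is the build in my lab -- adjust to your cluster's version.
classpath='/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure'
echo "$classpath" | sed 's/\${hdp\.version}/2.3.0.0-2557/g'
```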

Once done, as the “hanaes” user, start the Spark Controller from the /usr/sap/spark/controller/bin directory

/wp-content/uploads/2016/05/40_2_947437.jpg

Check the Spark log at /var/log/hanaes/hana_controller.log to see if it’s running properly

As we can see I have an error in my config file

/wp-content/uploads/2016/05/40_1_947441.jpg

Connect SAP Hana to SAP Hana Vora

With my Hortonworks ecosystem in place and SAP HANA Vora 1.2 deployed, I can connect my HANA instance to it over the Spark adapter.

Before trying to make any connection, one specific library needs to be copied into the /usr/sap/spark/controller/lib folder: from /var/lib/ambari-agent/cache/stacks/HDP/2.3/services/vora-base/package/lib/vora-spark/lib, copy the spark-sap-datasources-1.2.33-assembly.jar file

/wp-content/uploads/2016/05/41_947442.jpg

Once done restart the Spark Controller

Now, to connect to my Hadoop system from HANA, I need to create a new remote source by using the following SQL statement

/wp-content/uploads/2016/05/42_947443.jpg
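As a sketch, the remote source statement shown in the screenshot has roughly this shape. The adapter name and port follow the SAP HANA documentation for the Spark adapter; the server name is from my lab and the credentials are placeholders — check the exact option string against the SAP HANA Administration Guide for your revision:

```sql
-- Create a remote source over the Spark Controller (SPS 11 Spark adapter).
CREATE REMOTE SOURCE "VORA" ADAPTER "sparksql"
  CONFIGURATION 'port=7860;ssl_mode=disabled;server=yarn.will.lab'
  WITH CREDENTIAL TYPE 'PASSWORD'
  USING 'user=hanaes;password=<password>';
```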

Since I did not create any tables in my Hadoop environment yet, nothing appears below “default”. To test it, I’ll create a new schema, load a table (CSV) into it, and see the result in HANA

/wp-content/uploads/2016/05/43_947450.jpg

Note: you can download some CSV samples here

Sample insurance portfolio

Real estate transactions

Sales transactions

Company Funding Records

Crime Records

/wp-content/uploads/2016/05/44_947451.jpg

Once done, check the result from the Hive view

/wp-content/uploads/2016/05/46_947452.jpg

And verify in HANA by creating and querying the virtual table

/wp-content/uploads/2016/05/47_947453.jpg

/wp-content/uploads/2016/05/48_947454.jpg

/wp-content/uploads/2016/05/49_947456.jpg

It’s all good; I have my data

/wp-content/uploads/2016/05/50_947457.jpg

My configuration is now complete, with SAP HANA Vora 1.2 set up and connected to SAP HANA SPS 11.


4 Comments


  1. Rahul Deo Vishwakarma

    I see that you made a single-node installation of Vora 1.2, but have you verified the Vora 1.2 installation as per the SAP HANA Vora Installation and Administration Guide (Document Version: 1.0 – 2016-03-31)?

    Testing the Vora installation:

    scala> import org.apache.spark.sql.SapSQLContext

    scala> val vc = new SapSQLContext(sc)

    scala> val testsql = """
    CREATE TABLE table001 (a1 double, a2 int, a3 string)
    USING com.sap.spark.vora
    OPTIONS (
    tablename "table001",
    paths "/user/vora/test.csv"
    )"""

    scala> vc.sql(testsql)

    scala> vc.sql("show tables").show
    +---------+-----------+
    |tableName|isTemporary|
    +---------+-----------+
    | table001|      false|
    +---------+-----------+

    scala> vc.sql("SELECT * FROM table001").show
    +---+--+-----+
    | a1|a2|   a3|
    +---+--+-----+
    |1.0| 2|Hello|
    +---+--+-----+

    scala> <Ctrl-C to quit>

    The problem we are facing is with the Vora Discovery Services, which are not able to start. It would be great if you could mention how you have configured the Vora Discovery Services.

    1. Williams Ruter Post author

      Hello Rahul,

      Yes, I did fix the Discovery Service. How did you configure the machine which hosts the Vora discovery service? Do you have several IPs or network cards set up on it? If yes, make sure the parameter “vora_discovery_bind_interface” is using the correct eth(x).

      Williams

      1. Rahul Deo Vishwakarma

        Hi William,

        Thank you very much for the reply. I tried your instructions, but it did not work. I have Vora 1.2, and the version you installed on your setup is Vora 1.1.

        William, I already have Vora 1.2 from SAP Marketplace, but I need Vora 1.1 for the installation to complete. Vora 1.1 is not available on SAP Marketplace for download.

        I request you to please share the binaries of Vora 1.1? It would be of great help. I look forward to hearing from you.

        Thanks,

        Rahul Vishwakarma

        1. Williams Ruter Post author

          Hello Rahul,

          My documentation is based on Vora 1.2, not 1.1. Can you please forward me the log of your DS?

          Can you confirm whether the service starts and then shuts down by itself?

          Unfortunately I can’t provide you with the Vora 1.1 binaries.

          Williams

