How to deploy Vora 1.4 Patch 1 on a Kerberized Hadoop cluster


This guide is for reference only. It is derived from the official Vora 1.4 Administration and Installation Guide, which is always the most current and authoritative document on how to install and configure Vora.

 

This guide assumes that:

  • The Hadoop cluster is managed by either Ambari (Hortonworks 2.5.x) or Cloudera Manager 5.8 or later

  • The Hadoop services have already been enabled for MIT Kerberos or Kerberos on Active Directory

  • The cluster meets the minimum specification criteria for Vora as documented in the Product Availability Matrix (PAM)


 

If you encounter any issues, we encourage you to consult the Installation and Troubleshooting guides at https://help.sap.com/viewer/p/SAP_VORA


 

Installation


The steps below document how to install Vora on a cluster that has already been Kerberized. If Vora is already installed, you may proceed to the "Configuration" section of this guide.

Step 0: Download the Vora installer to the Ambari/Cloudera Manager host


Vora can be downloaded from http://support.sap.com and extracted using the following command:

tar -xvf VORA04P_1-70002662.TGZ -C /<Vora_root_install_dir>/



Step 1: Distribute Vora's RPM files to Hadoop nodes


By default, the Vora installer will attempt to distribute the Vora RPM files via HDFS. However, if the Hadoop cluster has already been enabled for Kerberos, the alternative SSH mode must be used to distribute the files instead.


Modify the following file with a list of the hostnames that will be running Vora services:




<Vora_root_install_dir>/config/hosts.txt
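
For illustration only, assuming three worker hosts and one hostname per line (the hostnames below are placeholders), hosts.txt would look like:

vora-node1.example.com
vora-node2.example.com
vora-node3.example.com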


Run the installer with the following argument and follow the on-screen prompts:

<Vora_root_install_dir>/install.sh --use-ssh 


For a complete list of available arguments, see section 2.6.1.9 of the Vora 1.4 Administration and Installation Guide.

The installer will afterwards attempt to restart the Ambari Server or Cloudera Manager.


Step 2: Distribute Vora to worker nodes using the cluster manager


In Cloudera

  1. Activate and distribute the parcel to all Vora hosts

  2. Add the service "Vora Manager"

  3. Ensure that the Vora Manager Worker and Gateway nodes are enabled on all Vora hosts

  4. Having more than one Vora Manager Master is optional, not required

  5. Define the parameters as mentioned below


 

In Ambari

  1. Add the service Vora Manager

  2. Follow the on-screen wizard

  3. Ensure that the Vora Manager Worker and Gateway nodes are enabled on all Vora hosts

  4. Having more than one Vora Manager Master is optional, not required

  5. Define the parameters as shown below (for details on all other parameters, refer to the Vora installation guide)




vora_default_java_home = </path/to/javaJDK/directory>
vora_default_spark_home = </path/to/spark-client/>
vora_discovery_bind_interface = <Default network interface>
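
As a purely illustrative example, on a Hortonworks node these might be set as follows (the JDK path, Spark client path, and interface name are placeholders that depend on your distribution and hosts):

vora_default_java_home = /usr/jdk64/jdk1.8.0_112
vora_default_spark_home = /usr/hdp/current/spark-client
vora_discovery_bind_interface = eth0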



... then proceed to start all Vora Manager services on all hosts.




 




Configuration


Step 0: Generate Vora service and client principals


It is not required, but it is typical, that all Vora service principals and the single client principal be named vora.


In this guide we assume that the client principal and the service principals are all named vora and have keytab files named vora.client.keytab and vora.service.keytab, respectively (a hypothetical kadmin sketch follows the steps below).

  1. Generate a Vora service principal and keytab file for each host that will run Vora services. All Vora service keytab filenames must be identical

  2. Generate only a single unique Vora client principal and keytab file, regardless of number of hosts

  3. For each Vora service principal host, upload that host's corresponding keytab file to /etc/security/keytabs/

  4. Copy the vora.client.keytab file to the /etc/security/keytabs/ directory on every host
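
As a sketch only, on an MIT Kerberos KDC the principals and keytabs could be generated with kadmin roughly as follows (the realm, hostname, and staging paths are placeholders; on Active Directory, keytabs are typically generated with ktpass instead):

# One service principal and keytab per Vora host; keep the keytab filename identical on every host
kadmin -q "addprinc -randkey vora/vora-node1.example.com@EXAMPLE.COM"
kadmin -q "ktadd -k /tmp/vora-node1/vora.service.keytab vora/vora-node1.example.com@EXAMPLE.COM"

# A single client principal and keytab, shared across all hosts
kadmin -q "addprinc -randkey vora@EXAMPLE.COM"
kadmin -q "ktadd -k /tmp/vora.client.keytab vora@EXAMPLE.COM"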



Step 1: Create jaas.conf file


On each Vora host, create the file /etc/vora/jaas.conf with the following contents. Be sure to substitute <REALM> with your Kerberos realm name.

vora {

 com.sun.security.auth.module.Krb5LoginModule required

 useKeyTab=true

 keyTab="/etc/security/keytabs/vora.client.keytab"

 storeKey=true

 useTicketCache=false

 principal="vora@<REALM>"

 doNotPrompt=true;

};


 

Notes:

  • The jaas.conf file must be created on all Vora hosts

  • The jaas.conf file must be accessible/readable by the OS-user vora
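
As an optional sanity check on each host (assuming the MIT Kerberos client tools are installed), you can verify that the principal and keytab referenced in jaas.conf actually authenticate:

kinit -kt /etc/security/keytabs/vora.client.keytab vora@<REALM>
klist
kdestroy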



Step 2: Enable Authentication for Default HDFS


Using the Ambari or Cloudera cluster manager, set the following HDFS service parameters for core-site.xml:

vora.security.kerberos.hdfs.principal = vora
vora.security.kerberos.hdfs.keytab.path = /etc/security/keytabs/vora.client.keytab
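
For reference, if these settings were placed directly in core-site.xml rather than through the manager UI, they would appear as ordinary Hadoop properties:

<property>
  <name>vora.security.kerberos.hdfs.principal</name>
  <value>vora</value>
</property>
<property>
  <name>vora.security.kerberos.hdfs.keytab.path</name>
  <value>/etc/security/keytabs/vora.client.keytab</value>
</property>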



Step 3: Configure Kerberos settings in spark-env.sh


Using the Ambari or Cloudera cluster manager, modify spark-env.sh to include the following environment variable definition:

In Cloudera this setting is defined under the Spark configuration section -> Spark Service Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-env.sh

In Ambari this setting is defined under Spark configuration -> Advanced spark-env


V2_AUTH_CONFIG='{
   "auth_type": "KERBEROS",
   "components": [{
          "kerberos": {
            "keytab": "/etc/security/keytabs/vora.service.keytab",
            "principal": "vora"
        },
           "name": "CAUTH_SERVER"
 }, {
        "kerberos": {
            "keytab": "/etc/security/keytabs/vora.service.keytab",
            "principal": "vora"
        },
          "name": "CAUTH_CLIENT"
   }, {
          "kerberos": {
             "keytab": "/etc/security/keytabs/vora.client.keytab",
             "principal": "vora"
          },
          "name": "JAUTH_CLIENT"
    }]
}'
export V2_AUTH_CONFIG

 

Step 4: Configure Kerberos settings in spark-defaults.conf


Using the Ambari or Cloudera cluster manager, add the following parameters to spark-defaults.conf.

In Cloudera these parameters must be defined in Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf



In Ambari each parameter gets its own field:



spark.v2server.principal=vora

spark.jdbcvora.authenticate=KERBEROS

spark.executorEnv.V2_AUTH_CONFIG={'auth_type': 'KERBEROS', 'components': [{'kerberos': {'keytab': '/etc/security/keytabs/vora.service.keytab','principal': 'vora'}, 'name': 'CAUTH_SERVER'}, {'kerberos': {'keytab': '/etc/security/keytabs/vora.service.keytab','principal': 'vora'}, 'name': 'CAUTH_CLIENT'}, {'kerberos': {'keytab': '/etc/security/keytabs/vora.client.keytab','principal': 'vora'}, 'name': 'JAUTH_CLIENT'}]}

spark.driver.extraJavaOptions=-Djava.security.auth.login.config=/etc/vora/jaas.conf

spark.executor.extraJavaOptions=-Djava.security.auth.login.config=/etc/vora/jaas.conf


 

 


Step 5: Enable Kerberos authentication between Vora components


In Vora-Manager UI, configure Vora Tools to have the following settings:

  • Set Kerberos principal to the client principal (e.g. vora)

  • Set Kerberos principal of Hive Thrift Server 2 to the service principal (e.g. vora)

  • Set Kerberos keytab to the client keytab /etc/security/keytabs/vora.client.keytab

  • Set authentication type to KERBEROS


 

In Vora-Manager UI, configure Vora Thriftserver to have the following settings:

Set the extra arguments field to have the following string (copy/paste the whole block):

--hiveconf spark.jdbcvora.authenticate=KERBEROS
--hiveconf hive.server2.enable.doAs=false
--hiveconf hive.server2.authentication=KERBEROS
--hiveconf hive.server2.authentication.kerberos.principal=vora/_HOST@AD.HADOOP
--hiveconf hive.server2.authentication.kerberos.keytab=/etc/security/keytabs/vora.service.keytab
--principal vora
--keytab /etc/security/keytabs/vora.client.keytab


Notes:

  • The literal _HOST placeholder is intentional; it is automatically resolved to the FQDN of whichever host runs the Thriftserver

  • The hive.server2 parameters are documented in section 4.1.11 of the Vora installation guide

  • The --hiveconf arguments are an alternative way of defining the hive.server2 parameters, which are normally defined in the /etc/vora/hive-site.xml file. In other words, if the hiveconf arguments are not defined directly in Vora Manager, they must be defined in /etc/vora/hive-site.xml on the host where the Vora Thriftserver will run; otherwise an error will occur during Thriftserver startup (see the hive-site.xml sketch below)
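
A hypothetical /etc/vora/hive-site.xml covering the same hive.server2 parameters as the --hiveconf arguments above might look as follows (the principal, realm, and keytab path must match your environment):

<configuration>
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>vora/_HOST@AD.HADOOP</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>/etc/security/keytabs/vora.service.keytab</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>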




 

Finally, for all services except...

  • Vora Tools

  • Vora Thriftserver


... perform the following configuration steps:

  • Set the Kerberos principal to be your service principal name (e.g. vora)

  • Set keytab path to /etc/security/keytabs/vora.service.keytab

  • Set authentication type to KERBEROS




 

Step 6: Start all Vora services


Start the Vora services using the "Start all" button in the upper-left corner of the Vora Manager UI. This may take a few minutes.



 

Step 7: Validate the cluster


Use the hdfs dfs -put command to upload a test.csv file with the following contents into /user/vora in HDFS:
1.1,2,Hello
2,3.4,World
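
For example, the file could be created and uploaded as follows (the kinit step assumes the client keytab from the configuration steps above):

kinit -kt /etc/security/keytabs/vora.client.keytab vora
printf '1.1,2,Hello\n2,3.4,World\n' > test.csv
hdfs dfs -mkdir -p /user/vora
hdfs dfs -put test.csv /user/vora/test.csv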

Then, either directly via spark-shell or using the Vora Tools SQL editor, check that all three of the following SQL statements execute successfully:
CREATE TABLE helloWorld(a1 double, a2 int, a3 string)
USING com.sap.spark.engines.relational
OPTIONS (files "/user/vora/test.csv");

SHOW TABLES USING com.sap.spark.engines.relational;

SELECT * FROM helloWorld;