
In this blog post I will describe how to run SparkR together with SAP HANA Vora.

First, you have to download and install R on each node where Vora is installed.

The following steps should get it running with Vora on Red Hat Enterprise Linux 7.2 (administrator privileges and access to the Red Hat repositories are required):

On every node, run the following commands in bash:

> sudo yum update

> sudo yum install R
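To confirm on each node that the installation put R on the PATH, a small check like the one below can help. This is a sketch: the helper names `require_cmd` and `check_r_install` are hypothetical and not part of R or Vora. It only relies on `command -v` and on the fact that the R installation provides both the `R` and `Rscript` executables.

```bash
# Hypothetical helper: succeed if a command is on the PATH,
# otherwise print a message to stderr and return 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing command: $1" >&2
  return 1
}

# Hypothetical helper: check the executables installed by 'sudo yum install R'.
check_r_install() {
  require_cmd R && require_cmd Rscript
}
```

Usage on a node: `check_r_install && R --version | head -n 1`.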

After you have successfully installed R, you need to install the SparkRVora package, which is delivered with Vora:

> cd $VORA_SPARK_HOME/R 

> sudo bash install-dev.bash

(The environment variable VORA_SPARK_HOME should already be set in your installation; it points to the directory where the Vora Spark packages are installed. For example, when using the Ambari cluster manager it should be something like “/var/lib/ambari-agent/cache/stacks/HDP/2.4/services/vora-manager/package/lib/vora-spark”.)
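If VORA_SPARK_HOME is empty, `cd $VORA_SPARK_HOME/R` resolves to `/R` and the install step fails. A minimal check before installing might look like this; the function name `check_vora_spark_home` is hypothetical, and it only assumes the `R/install-dev.bash` layout used in the installation step above.

```bash
# Hypothetical helper: verify that VORA_SPARK_HOME is set and that it
# contains R/install-dev.bash, the script run during SparkRVora installation.
check_vora_spark_home() {
  if [ -z "${VORA_SPARK_HOME:-}" ]; then
    echo "VORA_SPARK_HOME is not set" >&2
    return 1
  fi
  if [ ! -f "${VORA_SPARK_HOME}/R/install-dev.bash" ]; then
    echo "no R/install-dev.bash under ${VORA_SPARK_HOME}" >&2
    return 1
  fi
  echo "VORA_SPARK_HOME=${VORA_SPARK_HOME}"
}
```

Usage: `check_vora_spark_home && cd "$VORA_SPARK_HOME/R" && sudo bash install-dev.bash`.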

 

To run R together with Spark and Vora you have the following options:

1. Run SparkR directly

Using the sparkR executable is similar to using spark-shell or pyspark.

Execute as user “vora” (replace 1.X.YY with the version number of your Vora installation, e.g. 1.3.88):

> sparkR --jars ${VORA_SPARK_HOME}/lib/spark-sap-datasources-1.X.YY-assembly.jar
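Instead of typing the version number by hand, the jar can be located with a glob on the file name pattern shown above. This is a sketch: `find_vora_jar` is a hypothetical helper, and it assumes exactly one `spark-sap-datasources-*-assembly.jar` in `${VORA_SPARK_HOME}/lib`.

```bash
# Hypothetical helper: print the path of the single Vora datasources
# assembly jar in $VORA_SPARK_HOME/lib, or fail with a message.
find_vora_jar() {
  # Expand the glob into this function's positional parameters.
  # If nothing matches, $1 stays the literal pattern and the -f test fails.
  set -- "${VORA_SPARK_HOME:-}"/lib/spark-sap-datasources-*-assembly.jar
  if [ ! -f "$1" ]; then
    echo "no spark-sap-datasources assembly jar in ${VORA_SPARK_HOME:-}/lib" >&2
    return 1
  fi
  if [ "$#" -gt 1 ]; then
    echo "more than one assembly jar found; pass one explicitly" >&2
    return 1
  fi
  echo "$1"
}
```

Usage (as user “vora”): `jar="$(find_vora_jar)" && sparkR --jars "$jar"`. Failing on multiple matches avoids silently picking an old jar left over from an upgrade.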

After SparkR has started, load the SparkRVora package from the Vora R library directory and initialize the Vora SQL context (in sparkR):


> library(SparkRVora, lib.loc = c(file.path(Sys.getenv("VORA_SPARK_HOME"), "R", "lib")))

> sqlCtx <- sparkRVora.init(sc)

16/08/17 19:26:09 INFO SapSQLContext: SapSQLContext [version: 1.3.88] created


2. Run SparkRVora within RStudio or plain R

In order to run SparkRVora from within R or RStudio, the SparkR library has to be loaded as well, before SparkRVora (in R):

> library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

> library(SparkRVora, lib.loc = c(file.path(Sys.getenv("VORA_SPARK_HOME"), "R", "lib")))

> sc <- sparkR.init(master = "local[*]", sparkJars = c(file.path(Sys.getenv("VORA_SPARK_HOME"), "lib", "spark-sap-datasources-1.3.88-assembly.jar")))

> sqlCtx <- sparkRVora.init(sc)
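The R commands for this option read SPARK_HOME and VORA_SPARK_HOME through `Sys.getenv()`. When R or RStudio is not started from a shell in which these variables are exported, they come back empty and `library()` cannot find the packages. One way to make them available, assuming both are set in your login shell, is to append them to `~/.Renviron`, which R reads at startup. The function name `write_renviron` is hypothetical; note that running it twice appends the lines twice.

```bash
# Hypothetical helper: persist SPARK_HOME and VORA_SPARK_HOME for R and
# RStudio sessions by appending them to ~/.Renviron (read by R at startup).
# Assumes both variables are set in the current shell.
write_renviron() {
  if [ -z "${SPARK_HOME:-}" ] || [ -z "${VORA_SPARK_HOME:-}" ]; then
    echo "SPARK_HOME and VORA_SPARK_HOME must both be set in this shell" >&2
    return 1
  fi
  printf 'SPARK_HOME=%s\nVORA_SPARK_HOME=%s\n' \
    "$SPARK_HOME" "$VORA_SPARK_HOME" >> "${HOME}/.Renviron"
}
```

After running it, start a new R or RStudio session so that `~/.Renviron` is read again.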

Now you are ready to use SparkR with Vora.

For example, the following lines can be used to inspect a table from the Vora database:

> sql(sqlCtx, 'REGISTER ALL TABLES USING com.sap.spark.vora OPTIONS (eagerLoad "false")') # registers all tables from the Vora catalog in the Spark SQL context

> df <- sql(sqlCtx, 'SELECT * FROM NATION') # creates a SparkR DataFrame for the table (assuming the NATION table exists in the catalog)

> print(df) # shows the table schema

> head(df, 10) # shows the first 10 rows of data

As another example (see screenshot below), we create a table, load it from a file in HDFS, and plot the “val” column against the “id” column.

> sql(sqlCtx, 'CREATE TABLE VoraTable(id integer, val double, mdate date, number decimal(6,2)) USING com.sap.spark.vora OPTIONS (files "/user/vora/test_int_double_date_dec2.csv")') # load the table from a CSV file in HDFS

> sparkDataFrame <- sql(sqlCtx, ‘SELECT * FROM VoraTable’)

> RDataFrame <- as.data.frame(sparkDataFrame) #convert to R dataframe

> plot(RDataFrame$id, RDataFrame$val) #plot variables
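The CREATE TABLE statement in this example reads /user/vora/test_int_double_date_dec2.csv from HDFS. If that file only exists on a node's local disk, it has to be copied into HDFS first. A minimal sketch using the standard `hdfs dfs` shell, assuming the `hdfs` client is on the PATH and the current user may write to /user/vora; the function name `upload_to_hdfs` is hypothetical:

```bash
# Hypothetical helper: copy a local file into an HDFS directory
# (default /user/vora, the directory used in the CREATE TABLE example).
upload_to_hdfs() {
  local src="$1"
  local dest_dir="${2:-/user/vora}"
  if [ ! -f "$src" ]; then
    echo "local file not found: $src" >&2
    return 1
  fi
  # Create the target directory if needed, then copy (-f overwrites).
  hdfs dfs -mkdir -p "$dest_dir" &&
    hdfs dfs -put -f "$src" "$dest_dir/"
}
```

Usage, from the directory containing the file: `upload_to_hdfs test_int_double_date_dec2.csv`, then `hdfs dfs -ls /user/vora` to confirm it arrived.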
