Skip to Content

Creating Remote Sources and Virtual Tables in HANA to Hive and Vora can be accomplished using HANA Studio to create remote sources and virtual tables, but what about using DDL?

There are 3 types of connections that can be created from HANA to Vora or Hive using a Remote Source.  Two use the SparkController, while the third uses the newly available HANA Wire Protocol.

SparkController 2.0 can support HANA 1.0 SPS11 or 12 and HANA 2.0 and access Vora 1.2 or 1.3 or Hive on a Hadoop Cluster.

The use of the HANA Wire Protocol requires Vora 1.3 and the enablement of the wire protocol support in the Vora Transaction Coordinator service.  (As of this writing, only Vora Disk tables can be accessed using the HANA Wire Protocol connection, not in-memory tables).

For these examples, we will be using HANA 12 SPS122.5, Vora 1.3.65 and the SparkController 2.0.0.

Details on configuring the SparkControllers for this exercise can be found in: https://blogs.sap.com/2016/10/13/configuring-multiple-sparkcontrollers-ambari-hana-dlm-hive-vora-access/

First things first, let’s set up some simple test data.

  1. Log into a console and create the test data:
    >su - vora
    >echo 1,2,Hello > test.csv
    >hadoop fs -put test.csv
    >hadoop fs -ls /user/vora
    [...]
    -rw-r--r--   3 vora hdfs         10 2017-01-06 21:12 /user/vora/test.csv
    [...]
    

    The test data is now in the local file system and hdfs.

Now we need to load this into Hive and Vora.  We can run everything from the Vora spark shell:

>$VORA_SPARK_HOME/bin/start-spark-shell.sh
....
SQL context available as sqlContext.
scala>
  1. First we will create a Hive ‘test’ table in the ‘default’ Hive database:
    scala> import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.hive.HiveContext
    
    scala> val hc = new HiveContext(sc)
    hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@457512b
    
    scala> hc.sql("CREATE TABLE IF NOT EXISTS test(a1 INT, a2 INT, a3 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
    res0: org.apache.spark.sql.DataFrame = [result: string]
    
    scala> hc.sql("LOAD DATA LOCAL INPATH '/home/vora/test.csv' INTO TABLE test")
    res1: org.apache.spark.sql.DataFrame = [result: string]
    
    scala> val result = sqlContext.sql("FROM test SELECT a1, a2, a3")
    result: org.apache.spark.sql.DataFrame = [a1: int, a2: int, a3: string]
    
    scala> result.show()
    +---+---+-----+
    | a1| a2|   a3|
    +---+---+-----+
    |  1|  2|Hello|
    +---+---+-----+
    
    scala> sys.exit
  2. Now, let’s load the same data into Vora in-memory engine:
    scala> import org.apache.spark.sql.SapSQLContext
    import org.apache.spark.sql.SapSQLContext
    
    scala> val vc = new SapSQLContext(sc)
    vc: org.apache.spark.sql.SapSQLContext = org.apache.spark.sql.SapSQLContext@7a6ccc46
    
    scala> val testsql = """
    CREATE TABLE table001 (a1 double, a2 int, a3 string)
    USING com.sap.spark.vora
    OPTIONS (
    files "/user/vora/test.csv"
    )"""
    
    testsql: String =
    "
    CREATE TABLE table001 (a1 double, a2 int, a3 string)
    USING com.sap.spark.vora
    OPTIONS (
    files "/user/vora/test.csv"
    )"
    
    scala> vc.sql(testsql)
    res0: org.apache.spark.sql.DataFrame = []
    
    scala> vc.sql("show tables").show
    +---------+-----------+
    |tableName|isTemporary|
    +---------+-----------+
    | table001|      false|
    +---------+-----------+
    
    scala>
    
  3. and for the last table, we will load the data into the Vora disk engine:
    scala> val testsql = """
    CREATE TABLE TABLE002 (A1 double, A2 int, A3 string)
    USING com.sap.spark.engines.disk
    OPTIONS (
    files "/user/vora/test.csv",
    csvdelimiter ",",
    format "csv",
    tableName "TABLE002",
    tableSchema "A1 double, A2 integer, A3 varchar(10)",
    storagebackend "hdfs"
    )"""
    testsql: String =
    "
    CREATE TABLE TABLE002 (A1 double, A2 int, A3 string)
    USING com.sap.spark.engines.disk
    OPTIONS (
    files "/user/vora/test.csv",
    csvdelimiter ",",
    format "csv",
    tableName "TABLE002",
    tableSchema "A1 double, A2 integer, A3 varchar(10)",
    storagebackend "hdfs"
    )"
    
    scala> vc.sql(testsql).show
    ++
    ||
    ++
    ++
    
    scala> vc.sql("select * from TABLE002").show
    +---+---+-----+
    | A1| A2|   A3|
    +---+---+-----+
    |1.0|  2|Hello|
    +---+---+-----+
    
    scala> sys.exit
    >
    
    

Now we have 3 tables loaded into various datastores, we can switch to HANA Studio, but as we are using DDL, we can also use hdbsql.

We will use the SparkSQL HANA adapter to connect to the SparkControllers for Hive and Vora in-memory access.

Creating the Hive remote source

  1. Start HANA Studio and navigate to the server you want to use.  Open a SQL Console window.  To create the Hive Remote source, issue the following command (using the DSN name of your Hive-configured SparkController):
    CREATE REMOTE SOURCE "SparkHive" ADAPTER "sparksql"       
    CONFIGURATION 'port=7860;ssl_mode=disabled;server=<HiveSparkControllerServerDNS>'        
    WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=hanaes'
    
    Statement 'CREATE REMOTE SOURCE "SparkHive" ADAPTER "sparksql" CONFIGURATION ...' 
    successfully executed in 107 ms 32 µs  (server processing time: 3 ms 655 µs) - Rows Affected: 0 

Creating the Vora remote source

  1. Issue the following in the HANA Studio SQL Console to create a Vora remote source using the Vora-configured SparkController:
    CREATE REMOTE SOURCE "SparkVora" ADAPTER "sparksql"       
    CONFIGURATION 'port=7860;ssl_mode=disabled;server=<VoraSparkControllerServerDNS>'        
    WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=hanaes
    
    Statement 'CREATE REMOTE SOURCE "SparkVora" ADAPTER "sparksql" CONFIGURATION ...' 
    successfully executed in 114 ms 748 µs  (server processing time: 3 ms 398 µs) - Rows Affected: 0 

Lastly, we will use the new HANA Wire Protocol to access the table in the Vora disk store.
Creating the HANA Wire remote source

  1. First, Vora has to be configured to use the Wire.  Log into the Vora Manager GUI:http://<Vora Manager Node:19000/vora-manager/web/select ‘Services’ and open the ‘Vora Transaction Coordinator’ service ‘Configuration’ tab.  Ensure that ‘HANA Wire activation’ is selected and that an ‘Instance number for Vora Transaction Coordinator’ has been assigned:
  2. From the HANA Studio SQL Console window, you can now issue the following command (replacing the <TC Server DNS Name> with the server name of your Vora Transaction Coordinator service and the <TC HANA Wire Port> with 30<Instance number for Vora Transaction Coordinator>5 e.g. 30115):
    CREATE REMOTE SOURCE "VoraWire" ADAPTER "voraodbc"
    CONFIGURATION 'ServerNode=<TC Server DNS Name>:<TC HANA Wire Port>;Driver=libodbcHDB'
    WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=hanaes;password=hanaes';
    
    Statement 'CREATE REMOTE SOURCE "VoraWire" ADAPTER "voraodbc" CONFIGURATION ...' 
    successfully executed in 105 ms 922 µs  (server processing time: 3 ms 317 µs) - Rows Affected: 0 

Refreshing and Expanding all the folders underneath the Provisioning folder should show the new remote datasources and the tables available for virtual table creation in HANA:

Any of these tables can be created in HANA as a virtual table for access by HANA by selecting the table and using a right-mouse click and creating the table in a HANA schema:

but we will use DDL to create these tables instead.

  1. From the SQL Console, issue the following DDL commands for each table:
    create virtual table "<schema>"."Hive_test"
    at "SparkHive"."hive"."default"."test";
    
    create virtual table "<schema>"."Vora_table001"
    at "SparkVora"."vora"."spark_vora"."table001";
    
    create virtual table "<schema>"."VoraWire_TABLE002"
    at "VoraWire"."voraodbc"."VORA"."TABLE002";
    
    --I used my default "SYSTEM" schema in place of "<schema>"
    
    Statement 'create virtual table "SYSTEM"."Hive_test" at "SparkHive"."hive"."default"."test"' 
    successfully executed in 1.618 seconds  (server processing time: 1.511 seconds) - Rows Affected: 0 
    
    
    Statement 'create virtual table "SYSTEM"."Vora_table001" at "SparkVora"."vora"."spark_vora"."table001"' 
    successfully executed in 2.096 seconds  (server processing time: 1.964 seconds) - Rows Affected: 0 
    
    
    Statement 'create virtual table "SYSTEM"."VoraWire_TABLE002" at "VoraWire"."voraodbc"."VORA"."TABLE002"' 
    successfully executed in 2.451 seconds  (server processing time: 2.342 seconds) - Rows Affected: 0 
    Duration of 3 statements: 6.166 seconds 

The tables will now appear in your Catalog/<schema>/Tables folder as virtual tables:

You can now use them as you would any other virtual table.

As of this writing, the following notes apply:

  • The HANA Wire activation only supports tables cataloged in the Vora disk engine.
  • Re-starting Vora will require the Vora disk tables to be re-loaded (see the ‘SAP HANA Vora Developer Guide’).
  • As you can see from step 4, cataloging a Vora disk table in HANA requires the table name in Uppercase and additional tableSchema options to allow HANA to use HANA datatypes instead of Vora/Spark datatypes (e.g. Varchar(x) instead of string)

Chris

To report this post you need to login first.

2 Comments

You must be Logged on to comment or reply to a post.

  1. Abhay Mhatre

    Hi Chris,

    Very good article on how to create virtual tables with HANA Vora. What do you suggest how this configuration can be transported from Dev to QA and then to Production (specifically creating the Virtual Tables)?

    If the Virtual tables are created in specific project schema (not in SYSTEM or any developer’s schema), what roles other/business users need who would be using the data from the Virtual Tables?

     

    Thanks,

    Abhay

    (0) 
  2. Shailesh Moholkar

    Hi Chris,

    Wonderful blog! Thanks so much for taking time to put this up in so simple and detailed steps .

    One question , I keep reading of SparkController and Vora adapter as two ways of connection between HANA and VORA . Any specific difference in these two , as in, does SparkController gives more features while accessing data ( complex SQL. procedure to relocate for execution) .

    Though you have used the voraodbc for wire remote source. I believe it can be used just like that normally too…

    so when should we go for vora adapter or for Sparkcontroller. Can you advise on the parameters to decide this….

    Thanks a lot in anticipation !   Also please correct me if my basic understanding is flawed –through  this question — I know I am thinking linear here 😉

    Thanks and kind regards,

    Shailesh

    (0) 

Leave a Reply