
HANA Hadoop Integration with Federated Access

Data lake analytics has become a reality, but the challenge is to access the data quickly and derive meaningful insights. There are several techniques to speed up data access. In this blog we will see how to integrate HANA with Hadoop to get insights from large data sets quickly.

If you have both HANA and Hadoop in your ecosystem, SAP provides an option to integrate them using the HANA Spark Controller: the power of in-memory processing can be used for real-time insights, while Hadoop's ability to process huge data sets is used in parallel.

Driving business decisions in real time, combining operational business data with big data from various sources, has become a competitive necessity. The HANA Spark Controller integrates HANA and Hadoop so that HANA's in-memory capability can be used for real-time analytics while Hadoop processes large volumes of data cost-effectively. HANA Hadoop integration with the Spark Controller gives us federated data access between HANA and the Hive metastore. In this blog we will see this capability with a simple example.

The basic use case is the ability to use Hadoop as a cold data store for less frequently accessed data. The tool that enables it is the Data Lifecycle Manager (DLM), a component of the SAP HANA Data Warehousing Foundation. This data tiering enables bi-directional data movement between HANA and Hadoop.


We will now see an overview of how to establish a connection to the Hadoop cluster from HANA, and then walk through an example of federated access between HANA and Hadoop.

Integration Steps


Install the HANA Spark Controller on the Hadoop cluster.


  • Almost all major Hadoop distributions are supported (Hortonworks, Cloudera, MapR).


  • HANA 1.0 SP12 and HANA 2.0 are supported for this installation. If you would like to create virtual Spark procedures, you must be on HANA 2.0.


  • The installation prerequisites and steps are covered in the SAP Spark Controller installation guide.


I will document the installation steps for the Spark Controller on a Hadoop cluster in my next blog.


Establish a smart data access connection between HANA and Hadoop.


  • Once the Spark Controller is installed and started on the Hadoop cluster, we need to establish a smart data access connection between HANA and Hadoop.
- If you have LDAP-based authentication to the Hadoop cluster:

CREATE REMOTE SOURCE "spark_demo" ADAPTER "sparksql"       
	CONFIGURATION 'server=<x.x.x.x>;port=7860;ssl_mode=disabled'        
	WITH CREDENTIAL TYPE 'PASSWORD' USING 'user=<user name>;password=<password>';

- If you have Kerberos authentication to the Hadoop cluster (no credential clause is needed here, since authentication is handled by the Spark Controller's Kerberos configuration):

CREATE REMOTE SOURCE "spark_demo" ADAPTER "sparksql"
	CONFIGURATION 'server=<x.x.x.x>;port=7860;ssl_mode=disabled';


  • You can verify connectivity for the created remote source, as shown below.
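
In HANA Studio the new source appears under Provisioning > Remote Sources; you can also check the standard REMOTE_SOURCES system view with a quick query (a minimal check, assuming the remote source name created above):

SELECT remote_source_name, adapter_name
  FROM "SYS"."REMOTE_SOURCES"
 WHERE remote_source_name = 'spark_demo';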

We can see the connection is successfully established. We will now look at an example of federated data access between HANA and Hadoop.


Federated Data Access between HANA and Hadoop

In this example, we will see how to fetch data from HANA and Hadoop in parallel. Below are the steps to achieve this.

  1. Create a sample table in HANA.
  2. Create a sample table in Hive.
  3. Create a virtual Spark procedure in HANA to fetch data from Hive.
  4. Create another procedure to union the HANA results with the Hive results and display the combined output.


1) Create a sample table in HANA

We created the table below in HANA and inserted a few records.
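
A minimal sketch of an equivalent definition, assuming the SYSTEM.TEAM table and column layout referenced by the union procedure in step 4; the sample rows are illustrative:

CREATE COLUMN TABLE SYSTEM.TEAM (
    EMP_ID    INTEGER,
    EMP_NAME  NVARCHAR(100),
    JOIN_DATE TIMESTAMP,
    COUNTRY   NVARCHAR(100)
);

-- Illustrative sample rows
INSERT INTO SYSTEM.TEAM VALUES (1, 'John',  '2018-01-15 00:00:00', 'USA');
INSERT INTO SYSTEM.TEAM VALUES (2, 'Anita', '2018-03-02 00:00:00', 'India');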




2) Create a sample table in Hive

hive> show create table hana_hive_team;
CREATE TABLE `hana_hive_team`(
  `emp_id` int,
  `emp_name` string,
  `join_date` timestamp,
  `country` string)


Below is the sample data I inserted into the table default.hana_hive_team.
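
As a sketch, matching rows can be inserted with HiveQL (Hive 0.14+ supports INSERT ... VALUES; the values here are illustrative):

hive> INSERT INTO TABLE default.hana_hive_team
      VALUES (3, 'Wei',   '2018-05-20 00:00:00', 'China'),
             (4, 'Maria', '2018-07-11 00:00:00', 'Brazil');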


3) Create a virtual Spark procedure in HANA to fetch data from Hive

Below is the sample code for a virtual procedure written in HANA to read data from the Hive table default.hana_hive_team.

Note: We are not creating any virtual tables; we have written Spark SQL using Scala.

For a Scala programming reference, see the official Scala documentation.
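
For context, the Scala body below is typically wrapped in a CREATE VIRTUAL PROCEDURE statement. A minimal sketch, assuming the HANA 2.0 syntax; the procedure name GET_HIVE_TEAM and the column types are illustrative, and the Scala code shown below goes between BEGIN and END:

CREATE VIRTUAL PROCEDURE GET_HIVE_TEAM(
    OUT TEAM_OUT TABLE (EMP_ID INTEGER, EMP_NAME NVARCHAR(100),
                        JOIN_DATE TIMESTAMP, COUNTRY NVARCHAR(100)))
AT "spark_demo"
AS BEGIN
    // Scala body shown below
END;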





import sqlContext.implicits._
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql._

// Query the Hive table through the Spark SQL context and bind the
// resulting DataFrame to the procedure's output parameter TEAM_OUT
val result = sqlContext.sql("select * from default.hana_hive_team")
TEAM_OUT = result



Below is the output of the virtual Spark procedure, triggered from HANA. The output from steps 2 and 3 would be the same.
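
The virtual procedure can be invoked from HANA like any other procedure (using the illustrative name from the sketch above):

CALL GET_HIVE_TEAM(?);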



4) Create another procedure to union the HANA results with the Hive results

To access HANA and Hadoop in parallel and display the combined results, we use a union. Below is the sample procedure body.


-- Reading Hive data by calling the virtual Spark procedure
-- (GET_HIVE_TEAM is the illustrative name used earlier in this blog)
CALL GET_HIVE_TEAM(a);

-- Storing Hive data in a temporary table variable
TEAM_OUT1 = select emp_id, emp_name, join_date, country from :a;

-- Combining Hive data with HANA data
TEAM_OUT = select emp_id, emp_name, join_date, country from SYSTEM.TEAM
union all
select emp_id, emp_name, join_date, country from :TEAM_OUT1;



The output of the procedure combines the HANA and Hive data results.



Done. This is how we can access HANA and Hadoop data sets in parallel using the HANA Spark Controller.


– Rohit Gurejala

1 Comment

Srinivasa Narasimhan

Hi Rohit, thanks for the blog.

I need your help/suggestion.

In my project I want to export data from SAP S/4HANA / HANA DB to Hadoop based on daily updates from a few SAP tables. This flow may be online or scheduled. We are using HDB (2.00.037). How can this be achieved?