Big Data Tools (Like the Hadoop Framework): Integration with SAP
Hadoop, Big Data, and SAP HANA are some of the buzzwords in the data management / enterprise data warehousing space.
SAP has been working to ensure that there is solid integration between SAP analytical tools and big data frameworks like Hadoop.
For a POC, we are trying to leverage various integration options between SAP and Hadoop, and through this document I would like to share the integration options that we have seen so far.
Our developments and configurations are still at a very early stage, so for now I will mostly be taking you through the theoretical part. The actual lessons learned, the real pain points, the challenges, the limitations – all will be added to the document at later stages wherever possible.
I would certainly like to hear from you as well, in case you manage to find more integration options.
I am a lazy person, so I will loop in various links along the way that explain each particular topic in more detail.
So, let’s look at the options in detail:
1) Using BODS (Business Objects Data services)
SAP HANA Academy – Using Data Services to import data from a HADOOP system — https://www.youtube.com/watch?v=ls_MGp8R7Yk
The above HANA Academy video explains the connectivity in detail.
In BODS, we have a file format named HDFS Files.
We just have to give the Name Node host and Name Node port details initially, and then provide the root directory and file name.
Name Node –> The Name Node is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself.
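To make the configuration above concrete, here is a small sketch of how a Name Node host/port plus a root directory and file name map to an HDFS file location, using the WebHDFS REST convention. The host, port, and paths are hypothetical examples, not from any real cluster.

```python
# Minimal sketch: how Name Node host/port plus root directory and file name
# identify a file in HDFS. Host, port, and paths are made-up examples.

def webhdfs_open_url(namenode_host, namenode_port, root_dir, file_name):
    """Build the WebHDFS REST URL for reading a file from HDFS.

    The Name Node itself only serves metadata; an OPEN request is
    redirected to the Data Node that actually holds the file's blocks.
    """
    path = root_dir.rstrip("/") + "/" + file_name
    return (f"http://{namenode_host}:{namenode_port}"
            f"/webhdfs/v1{path}?op=OPEN")

url = webhdfs_open_url("namenode.example.com", 50070, "/user/bods", "sales.csv")
print(url)
# http://namenode.example.com:50070/webhdfs/v1/user/bods/sales.csv?op=OPEN
```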
We have some Pig scripting related options as well.
Pig is a high level scripting language that is used with Apache Hadoop. Pig enables data workers to write complex data transformations without knowing Java. Pig’s simple SQL-like scripting language is called Pig Latin, and appeals to developers already familiar with scripting languages and SQL.
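To give a flavor of the kind of dataflow Pig Latin expresses, the comment below shows a typical LOAD –> FILTER –> GROUP –> COUNT script, and the Python underneath runs the equivalent logic over a small in-memory sample. The data is made up for illustration; a real Pig script would run distributed across HDFS.

```python
# The classic Pig pattern -- LOAD, FILTER, GROUP, COUNT -- in plain Python.
# The equivalent Pig Latin script would look roughly like:
#   raw    = LOAD 'sales.csv' USING PigStorage(',') AS (region:chararray, amount:int);
#   big    = FILTER raw BY amount > 100;
#   groups = GROUP big BY region;
#   counts = FOREACH groups GENERATE group, COUNT(big);
from collections import Counter

rows = [("EMEA", 120), ("APAC", 80), ("EMEA", 300), ("AMER", 150)]

big = [(region, amount) for region, amount in rows if amount > 100]  # FILTER
counts = Counter(region for region, _ in big)                        # GROUP + COUNT

print(dict(counts))   # {'EMEA': 2, 'AMER': 1}
```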
Sample Use case:
In BODS, we basically have to create a project –> create a job –> create a dataflow –> drag in the HDFS file as the source –> add a Query transform –> create a HANA datastore as the target.
2) SAP VORA
SAP HANA Vora is a new in-memory query engine which leverages and extends the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
John Appleby has given a lot of information in the following blog:
SAP HANA Vora includes a unique set of features and capabilities:
• An in-memory query engine that runs on Apache Spark execution framework
• Compiled queries for accelerated processing across Hadoop Distributed File System (HDFS) nodes
• Enhanced Spark SQL semantics, including hierarchies, to enable OLAP and drill-down analysis
• Enhanced mashup application programming interface (API) for easier access to enterprise application data for machine learning workloads
• Support for all Hadoop distributions
• An open development interface.
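The hierarchy support mentioned above is one of Vora’s headline features. As a concept sketch only (this is not Vora’s actual API), the following pure-Python example shows the kind of parent/child hierarchy roll-up that hierarchy-enhanced SQL makes possible for drill-down analysis. Node names and values are invented.

```python
# Concept sketch only -- NOT Vora's API. Illustrates rolling a measure up a
# parent/child hierarchy, the operation behind OLAP drill-down analysis.

# parent-child hierarchy: node -> parent (None marks the root)
hierarchy = {"World": None, "EMEA": "World", "APAC": "World",
             "Germany": "EMEA", "France": "EMEA", "Japan": "APAC"}

# leaf-level measure values (e.g. revenue), made up for illustration
leaf_values = {"Germany": 50, "France": 30, "Japan": 20}

def children(node):
    return [n for n, p in hierarchy.items() if p == node]

def rollup(node):
    """Recursively aggregate the measure over a node's subtree."""
    kids = children(node)
    if not kids:
        return leaf_values.get(node, 0)
    return sum(rollup(k) for k in kids)

print(rollup("EMEA"))   # 80
print(rollup("World"))  # 100
```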
A lot of Vora topics are covered in the HANA Academy videos:
More details can be found in this document:
3) Universe (IDT) –> Connection to Hadoop via JDBC Drivers
The following are three wonderful documents (though a little old) that explain this integration in detail:
In the following wiki, Jacqueline Rahn has quite extensively explained the connection of Hadoop Hive with IDT:
Obviously, once we are able to reach the universe level, we can take the same data further into the various BO reporting tools/dashboards.
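The piece of configuration at the heart of this integration is the Hive JDBC URL that the relational connection points at. As a small illustrative sketch (host and database are hypothetical; 10000 is the conventional HiveServer2 default port):

```python
# Sketch: the JDBC URL shape an IDT relational connection to Hive relies on.
# Host and database are hypothetical; 10000 is HiveServer2's conventional port.

def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/db."""
    return f"jdbc:hive2://{host}:{port}/{database}"

print(hive_jdbc_url("hive.example.com"))
# jdbc:hive2://hive.example.com:10000/default
```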
4) Hadoop connectivity using SDA
SDA (Smart Data Access) is a method in SAP HANA for accessing data stored in remote data sources.
Here we can see an adapter with the name HADOOP (ODBC).
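As a rough sketch of what registering such a remote source involves, the helper below generates the general shape of the HANA SQL for a Hive ODBC remote source and a virtual table over it. The source name, DSN, schema, and table are hypothetical, and the exact CONFIGURATION options and adapter name can vary by HANA release and ODBC driver, so treat this as illustrative only and check your installation’s documentation.

```python
# Illustrative only: the general shape of the SQL used to register a Hive
# remote source over SDA and expose a remote table virtually. All names
# (source, DSN, schema, table, credentials) are made-up examples.

def sda_remote_source_sql(source_name, dsn, user, password):
    return (f'CREATE REMOTE SOURCE "{source_name}" ADAPTER "hiveodbc" '
            f"CONFIGURATION 'DSN={dsn}' "
            f"WITH CREDENTIAL TYPE 'PASSWORD' "
            f"USING 'user={user};password={password}'")

def sda_virtual_table_sql(vt_schema, vt_name, source_name, remote_db, remote_table):
    return (f'CREATE VIRTUAL TABLE "{vt_schema}"."{vt_name}" '
            f'AT "{source_name}"."<NULL>"."{remote_db}"."{remote_table}"')

print(sda_remote_source_sql("MY_HIVE", "hive_dsn", "hive", "secret"))
print(sda_virtual_table_sql("POC", "VT_SALES", "MY_HIVE", "default", "sales"))
```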
Leo has explained the details in the following blog:
Debajit has explained SDA access using Hive/Hadoop in detail in the following document:
A lot of Hadoop/Hive/Spark/SDA topics are covered in the HANA Academy videos:
5) Hadoop connectivity using Lumira
Please find some useful links here:
The following document shows us the connection using the “Open with SQL” method.
These days we can also see direct connectivity to Hadoop:
This is a collaborative document. My humble request to all of you is to add more points/options to it. Let us all work together and make this a very useful repository covering the ins and outs of integrating big data frameworks like Hadoop with our SAP tools.
Once again, thanks a lot for reading my document.