
Apache Hadoop as NLS solution for SAP BW/HANA Part 1

This is a three-part blog series covering the end-to-end integration scenario for SAP BW on HANA with Apache Hadoop as a Near Line Storage (NLS) solution.

For Part 2: Here

For Part 3: Coming Soon


Apache Hadoop has become the poster child for big data, largely due to its highly scalable analytics platform capable of processing large volumes of structured and unstructured data. SAP HANA, on the other hand, has gained ground as the leading in-memory analytics platform, accelerating business processes and delivering quantifiable business intelligence at lightning speed. The two platforms are independent of each other, and their complementary strengths and weaknesses make them a good fit for a long-term, sustainable, high-performance data lake strategy for any large multinational corporation.

This blog is intended as a walkthrough of implementing Apache Hadoop as a Near Line Storage solution for SAP BW on HANA, leveraging the SAP HANA Spark Controller. For the purposes of this blog, we will work with the following software versions:

SAP BW 7.5 SPS 5

SAP HANA 1.0 SPS 12 and higher

Core Apache Hadoop 2.7.1 or higher (HDFS, MapReduce2, YARN)

Tez 0.7.0 as the execution engine for Hive (instead of MapReduce2, if preferred)

Spark 1.5.2 or higher

SAP HANA Spark Controller 2.0 SP01 Patch 1 or higher

SAP recommends these as the baseline requirements, and my experience has been that these versions work very well with each other in terms of dependencies and interoperability. Both Hortonworks (HDP) and Cloudera (CDH) offer packaged Apache Hadoop platforms that include the above versions. I have not worked with MapR personally, so I cannot confirm whether they do as well, but they likely offer a comparable distribution that works well with SAP.


Hadoop Cluster Architecture and Sizing:

If you are considering a proof of concept (POC), my recommendation is to start with at least a three-node Hadoop cluster: one NameNode and two DataNodes. This gives the administration team a good feel for a production cluster in terms of setup and administrative duties. Apache Hadoop is a multi-component solution, and as such its tuning and configuration aspects are fairly diverse yet interdependent. The overall architecture is depicted below at a high level.
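As an illustration of one such interdependency, a minimal hdfs-site.xml for a two-DataNode POC would typically lower the block replication factor, since the HDFS default of 3 cannot be satisfied with only two DataNodes. The storage paths below are assumptions for illustration, not values from the actual POC:

```xml
<!-- hdfs-site.xml (POC sketch): 1 NameNode, 2 DataNodes -->
<configuration>
  <!-- With only 2 DataNodes, the default replication factor of 3 cannot be met -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- Example local storage paths; adjust to your own mount points -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///hadoop/hdfs/data</value>
  </property>
</configuration>
```

In an Ambari- or Cloudera-managed cluster you would set these through the management UI rather than editing the file by hand.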


Hadoop 3 Node Cluster: (architecture diagram)


SAP BW on HANA, Spark Controller and Hadoop: (integration diagram)


Near Line Storage: (data flow diagram)

Please refer to vendor documentation for Hadoop sizing. For reference, below is the cluster sizing I used for the proof of concept. We went with a virtualized cluster for the POC.
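As a rough illustration of how raw HDFS capacity is commonly estimated, the sketch below applies the usual rule of thumb: raw capacity = usable data × replication factor ÷ (1 − intermediate/temp overhead). The overhead fraction and data volumes are assumptions for illustration, not figures from the POC:

```python
def raw_hdfs_capacity_tb(usable_data_tb, replication=3, temp_overhead=0.25):
    """Estimate raw HDFS capacity (TB) needed for a given usable data volume.

    replication   -- HDFS block replication factor (default 3; a 2-DataNode
                     POC cluster would use 2)
    temp_overhead -- fraction of disk reserved for intermediate/shuffle data
    """
    return usable_data_tb * replication / (1 - temp_overhead)

# Example: 10 TB of archived BW data on a production-style cluster (replication 3)
print(round(raw_hdfs_capacity_tb(10), 1))                  # 40.0 TB raw
# The same data on a 2-DataNode POC (replication 2)
print(round(raw_hdfs_capacity_tb(10, replication=2), 1))   # 26.7 TB raw
```

Vendor sizing guides add further headroom for OS, logs, and growth, so treat this as a lower bound.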


Hadoop Installation:

Depending on the flavor of Hadoop chosen for the POC, you can install the Hadoop cluster through Apache Ambari (HDP) or through Cloudera Manager (CDH). Links to the detailed step-by-step installation guides are below:

Apache Ambari

Cloudera Manager
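For orientation, bootstrapping an Ambari-managed installation typically looks like the sketch below, run on the designated management node. This assumes an RHEL/CentOS host with the Ambari package repository already configured; exact package names, repo URLs, and options are in the Ambari guide linked above:

```shell
# On the Ambari host (RHEL/CentOS, Ambari repo already added)
sudo yum install -y ambari-server   # install the Ambari server package
sudo ambari-server setup -s         # silent setup with defaults (embedded DB)
sudo ambari-server start            # start the Ambari server
# Then open http://<ambari-host>:8080 and run the cluster install wizard
# to register the DataNodes and deploy HDFS, YARN, Hive, Tez, and Spark.
```

Cloudera Manager follows an analogous pattern: install the server package, start the service, and complete the cluster setup through its web wizard.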


Coming up: Apache Hadoop as NLS solution for SAP HANA Part 2

Former Member:

Thanks for sharing!

Former Member (Blog Post Author):

You are welcome, Linda!

Aniruddha Shinde:


Hi Shantanu, how are you storing SAP BW relational data in Hadoop? Are you using HBase to store that relational data in column-oriented data stores?

What criteria are you using to classify warm and cold data for storage in Hadoop?

How is relational integrity maintained in Hadoop?

Keen to know more in the next parts too.

Former Member (Blog Post Author):

      Hi Aniruddha,

Thank you for your question. It is important to understand that Hadoop is not a relational database; in essence it is file system storage. The relational layer is provided by add-on products such as Apache Hive and Spark. The HDFS files are linked to the BW ADSO through a Hive table (metadata) and a virtual view in HANA. I have updated the link to the second part in the blog above, which delves a bit into this topic.
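To illustrate the linkage described in this reply, a Hive external table over archived files might look like the sketch below. The database, table, columns, and HDFS path are hypothetical, purely for illustration; in practice the DDL is generated by the BW archiving process:

```sql
-- Hypothetical Hive external table over NLS data in HDFS.
-- Hive holds only the metadata; the data itself stays in the HDFS files.
CREATE EXTERNAL TABLE sapbw_nls.sales_adso_cold (
  calday     STRING,
  material   STRING,
  sales_qty  DECIMAL(17,3)
)
STORED AS ORC
LOCATION '/sap/bw/nls/sales_adso_cold';
```

HANA then exposes this Hive table through the Spark Controller as a virtual object, so BW queries can federate to the cold data transparently.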

Which data is hot vs. warm vs. cold is driven by the business. The idea is to segregate data based on reporting needs.


Roland Kramer:


I have added your guidance to the SAP NLS blog under Implementation.

      Thanks and Best Regards
      Roland Kramer, PM SAP EDW, SAP SE

Former Member (Blog Post Author):

Thank you, Roland! Hope this helps others. I will be adding blog number 3 soon.

Marcelo Viveiros:

Thank you for sharing!

Michael Fluch:

Nice blog! Would it be possible to set up the scenario for testing/demo purposes on a Cloudera-provided QuickStart VM, i.e. a pseudo-distributed single-node cluster? Or is an actual cluster installation necessary?

I am asking because I tried to install SAP Spark Controller on such a Cloudera QuickStart VM but was not successful, at least not with the standard SAP installation guide.

      Thanks a lot!

Shantanu Sardeshmukh:

Had to move this to LinkedIn since my account was deactivated.