Integrating SAP HANA with Hadoop – all you always wanted to know
After publishing a post called 3 tips to survive as a SAP BI consultant in 2016 and another one called Enterprise Data Warehouse + SAP Vora = High Value, I got a great question from Sivaramakrishnan M, basically asking for some documents or links that could help him get started with SAP HANA and SAP Vora integration. After a quick search I realized it was not really easy to get hold of all the information you need, so here it is: all you always wanted to know about SAP HANA and Hadoop, but were afraid to ask.
Before we get started on the link material, we can benefit from some context. I’m assuming you already know what SAP HANA is: SAP’s in-memory platform that allows real-time transactional and analytical processing. It is composed of several different engines (Geospatial, Predictive, Business Function Library and so on), which has enabled a huge transformation in what we are able to do. More information can be found here at SCN: SAP HANA and In-Memory Computing
According to Hortonworks, the definition of Hadoop is: “… an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly gain insight from massive amounts of structured and unstructured data”. For those really interested in what the Hadoop (Hortonworks) architecture looks like, I’d suggest the following link, where you will find a diagram of the architecture: http://hortonworks.com/hadoop/
With the concepts of each component clear, the next step is to define what the integration between the two could look like. It basically depends on the use case you have:
- Smart Data Access –> in case you need to read data out of Hadoop, you can use SAP HANA Smart Data Access (SDA) to do it. SDA is widely used in hybrid models (SAP HANA + SAP NetWeaver BW powered by SAP HANA) and in Near Line Storage (NLS) scenarios. You can basically access a “table” in a different repository (mainstream databases all included) from SAP HANA without actually having to bring the data over to SAP HANA. So you could keep your “hot” data in SAP HANA and your cold data in Hadoop, and with SDA a simple UNION would bring data from both “tables” together.
- SAP BusinessObjects Universe –> in case you only need to report on Hadoop data from the SAP BusinessObjects suite, you can combine data from any source with Hadoop data using the Universe, the SAP BusinessObjects semantic layer, to get the job done. There you can set up relationships, rules, etc.
- SAP Data Services 4.1 (and above) –> in case you really need to bring data from Hadoop into SAP HANA and maybe apply some heavy transformation on the way, that is the path to take. SAP Data Services has been tuned to read and write huge amounts of data both ways.
- SAP Lumira –> in case you only need front-end integration and less complex data handling and transformation, that is an easy way to go. SAP Lumira can access and combine data from Hadoop (an HDFS Data Set, a Hive or Impala Data Set or an SAP Vora Data Set) and SAP HANA.
- SAP Vora –> in case you need to correlate Hadoop and SAP HANA data for instant insight that drives contextually-aware decisions, which can be processed either on Hadoop or in SAP HANA.
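To make the hot/cold idea from the Smart Data Access bullet a bit more concrete, here is a minimal sketch in Python. It does not talk to SAP HANA or Hadoop at all: the standard-library sqlite3 module stands in for both, with the main in-memory database playing the role of SAP HANA and an attached database playing the role of the remote Hadoop source. The table names sales_hot and sales_archive and the function sales_by_year are hypothetical and exist only for illustration. The sketch uses UNION ALL rather than a plain UNION so that identical rows are not silently de-duplicated before aggregation.

```python
import sqlite3

# Stand-in for SAP HANA: an in-memory SQLite database holding the "hot" data.
conn = sqlite3.connect(":memory:")

# Stand-in for the remote Hadoop source: a second in-memory database,
# attached under the schema name "cold" (an assumption for this sketch).
conn.execute("ATTACH DATABASE ':memory:' AS cold")

# Hypothetical tables with the same structure on both sides.
conn.execute(
    "CREATE TABLE main.sales_hot (order_id INTEGER, year INTEGER, amount REAL)"
)
conn.execute(
    "CREATE TABLE cold.sales_archive (order_id INTEGER, year INTEGER, amount REAL)"
)

# Recent ("hot") rows live in the main database, older ("cold") rows in the archive.
conn.executemany(
    "INSERT INTO main.sales_hot VALUES (?, ?, ?)",
    [(3, 2016, 300.0), (4, 2016, 150.0)],
)
conn.executemany(
    "INSERT INTO cold.sales_archive VALUES (?, ?, ?)",
    [(1, 2014, 100.0), (2, 2015, 250.0)],
)


def sales_by_year(connection):
    """Total sales per year across hot and cold data, combined in one query."""
    query = """
        SELECT year, SUM(amount) AS total
        FROM (
            SELECT year, amount FROM main.sales_hot
            UNION ALL
            SELECT year, amount FROM cold.sales_archive
        )
        GROUP BY year
        ORDER BY year
    """
    return connection.execute(query).fetchall()


totals = sales_by_year(conn)
print(totals)
```

In a real landscape the second SELECT would point at a virtual table exposed through SDA instead of an attached SQLite database, but the shape of the query stays the same: the consumer sees one result set and does not need to know where each row is physically stored.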
With all those use cases in mind, Hortonworks drew a great picture of what the architecture could look like.
With all that clear, I believe we can jump directly to the main topic of this post. Please find below useful links and their descriptions to bring you up to speed when integrating SAP HANA with Hadoop.
Descriptions | Content |
---|---|
Hadoop and HANA Integration Overview | Hadoop and HANA Integration |
How to Use Hadoop with Your SAP® Software Landscape from a CIO viewpoint | How to Use Hadoop with Your SAP® Software Landscape |
Different methods of integrating SAP HANA with Hadoop | http://hortonworks.com/partner/sap/ |
SAP Press reference book | Integrating SAP HANA and Hadoop |
SAP Help Vora landing page | SAP HANA Vora 1.1 – SAP Help Portal Page |
SAP HANA Data Warehousing Foundation 1.0, to integrate Hadoop into your SAP HANA model for cold (not frequently used) data | SAP HANA Data Warehousing Foundation 1.0 – SAP Help Portal Page |
How to start SAP HANA Spark Controller | Start SAP HANA Spark Controller – SAP HANA Administration Guide – SAP Library |
Calling a Hadoop Map Reduce function from SAP HANA | Creating a Virtual Function – SAP HANA Administration Guide – SAP Library |
Adding Ambari to your SAP HANA Cockpit. Once you integrate SAP HANA and Hadoop, it would be pretty smart to manage everything in one place | Adding Ambari URL to SAP HANA Cockpit – SAP HANA Administration Guide – SAP Library |
How to go from ZERO to a working application using SAP Lumira, SAP HANA and SAP Vora with Hadoop in 12 steps | |
SAP HANA Vora and Hadoop by Stephan Kessler & Óscar Puertas at Big Data Spain 2015 | |
How to get access to SAP Vora Development Edition | |
SAP HANA Integration with Hadoop using SDI (Smart Data Integration) to power Smart Forms | |
SAP HANA Integration with Hadoop using SDA (Smart Data Access) | |
SAP HANA Integration with Hadoop using SAP Data Services | |
SAP HANA VORA & Hadoop | SAP HANA VORA & Hadoop |
I’m very confident that once you reach the bottom of this post (and visit all, or at least most, of the links I have compiled here) you will be able to get your SAP HANA and Hadoop integration going. In case you have additional links that I should include in this post, please let me know via the comments and I’ll be more than glad to add them here.
All the best,
Very useful article. Thanks for sharing this information.
Regards,
Igor
I'm glad you think so, thanks for the feedback.
Very nice SAP HANA Vora landing page. Great job putting it all together.
Thanks,
Tiago
Hi Tiago, thanks for the feedback.
Great material!
Already studying it!
KR,
Nicolas
Glad it helps 🙂
Hi Edu,
great contributions and a most welcome entry point to a wide (or: big) field.
As you mentioned, there are many different ways HANA & Hadoop can work together. What about a use case suggestion to enter the game?
Best, Frank
Wow, always great to hear from you, I'm happy you enjoyed it. Great point, let me think it through and maybe I can put it in a new article 🙂
Great article Eduardo!
Thanks, Hage
Hi Eduardo, can you highlight the difference between HANA VORA and using HANA SDA to access a Spark SQL system?
Scenario: we use Lumira to consume data out of HANA, but the data is growing too big. If we push that data down to a big data distribution and use Spark as the SQL engine, will VORA and HANA SDA behave the same or differently, and how? And of course we would keep consuming the data from Lumira.
Very useful article indeed! The link to the SAP Press reference book for integrating HANA and Hadoop is not opening. Could you please provide the correct URL?
Thanks,
Arindam
This is a very good start for studying the topic, and it is very clearly described. It could also be updated with some news or with what has changed.
Thanks
Kosto
Outstanding blog. Keep it up!
Awesome information, thank you.
For those who would also like to know about Big Data availability in the ABAP cockpit, see this solution or check some of our blogs about SAP <-> Hadoop integration.
Have a nice day,
David