In this blog, I would like to share how Hadoop and HANA can be integrated with each other.
Let's start with the advantages of using Hadoop:
It can easily handle very large data volumes
It is very good for storing unstructured data
It is reliable, scalable and fault tolerant
It is open source, so it costs less
It provides batch processing
Now let's look at some of the limitations of Hadoop:
It is not efficient for small amounts of data
It is less mature
It is difficult to find qualified talent
It is not suited for real-time scenarios
HANA and Hadoop:
As you already know by now, Hadoop can store very large amounts of data. It is well suited for storing unstructured data, is good at manipulating very large files, and is tolerant of hardware and software failures.
But the main challenge with Hadoop is getting information out of this huge volume of data in real time.
Now, we also have HANA, and as you already know, HANA is well suited for processing data in real time.
So to get real-time information from massive storage such as Hadoop, we can use HANA, which can be integrated directly with Hadoop.
In short, combining Hadoop and HANA lets us get real-time information out of very large data sets.
Read Solving Big Data with SAP HANA and Hadoop:
Watch the replay of SAP Big Data Chat on HANA and Hadoop:
Read Demystifying Big Data with SAP HANA and Hadoop:
Read Hadoop + SAP HANA: Turning Infinite Storage into Instant Insights:
SAP, Hadoop and HANA:
As explained in the SAP CIO Guide on Using Hadoop, Hadoop can be used in various ways, as mentioned below:
I have added Smart Data Access myself, as it was not available when the guide was written; we can now use Smart Data Access to connect HANA with Hadoop.
Let's see how Hadoop can be used in the SAP world:
Hadoop as a flexible data store:
Because Hadoop is inexpensive, we can use it as a flexible data store for data from many sources, both SAP and non-SAP, such as social data, streaming data and transaction data. With all the data kept in Hadoop, we can run whatever analysis we need on it.
Hadoop as a simple database:
We can also use Hadoop as a simple database for storing and retrieving data in very large data sets. We can retrieve data from Hadoop using Hive or HBase.
Hadoop as a processing engine:
We can use the power of the MapReduce programming model for many purposes; for example, Pig can be used for data analysis and Mahout for data mining. We can write MapReduce application code in the language of our choice, which can then be deployed and executed on Hadoop.
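To make the MapReduce programming model a bit more concrete, here is a minimal word-count sketch in plain Python. It runs in a single local process and does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) are my own illustrative choices, and on a real cluster the framework would take care of the shuffle step and of distributing the work.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework on a cluster)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce one after another in a single local process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["hadoop stores data", "hana processes data in real time"]
    print(word_count(sample))
```

The point of the split is that the mapper and the reducer each see only one line or one key at a time, which is what allows the work to be spread over many machines.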
Hadoop for data analytics:
We can mine the data held in Hadoop for business intelligence and analytics.
We have a huge amount of data in Hadoop, but not all of it is useful, since much of it is low-value data, so we will load only the useful data into HANA.
To load data from Hadoop into HANA, we will use SAP Data Services.
You can check the YouTube video below on how to load data from Hadoop to HANA:
For more detail on the above scenarios, please refer to the SAP CIO Guide on Using Hadoop.
Accessing Hadoop using Smart Data Access:
Smart Data Access is a new feature that was introduced with SAP HANA SPS 06. It exposes remote data as virtual tables that can be queried as if they were local tables, without copying the data into HANA.
One of the main benefits of Smart Data Access is that we don’t need any special syntax to access heterogeneous data sources.
Let's say we have structured data stored in HANA and unstructured data stored in Hadoop.
We can now access the Hadoop data remotely using Smart Data Access and combine the structured and unstructured data in new models, which gives us real insight into our business and helps us make better decisions.
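The idea of querying a remote source with the same SQL as a local table can be illustrated with a small analogy. The sketch below uses Python's built-in sqlite3 module and an attached second database as a stand-in for Hadoop; it is not HANA or Smart Data Access code, and the names hadoop_src, customers and web_clicks are made up for this example.

```python
import sqlite3
from typing import List, Tuple


def build_demo_connection() -> sqlite3.Connection:
    """Create a 'local' database and attach a second one that plays the remote source."""
    conn = sqlite3.connect(":memory:")
    # Assumption: the attached in-memory database stands in for Hadoop.
    conn.execute("ATTACH DATABASE ':memory:' AS hadoop_src")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE hadoop_src.web_clicks (customer_id INTEGER, page TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Alice"), (2, "Bob")],
    )
    conn.executemany(
        "INSERT INTO hadoop_src.web_clicks VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )
    return conn


def clicks_per_customer(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Join the local table with the 'remote' one using ordinary SQL."""
    query = """
        SELECT c.name, COUNT(w.page) AS clicks
        FROM customers AS c
        JOIN hadoop_src.web_clicks AS w ON w.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_demo_connection()
    print(clicks_per_customer(connection))
    connection.close()
```

Apart from the schema prefix, the join looks exactly like a join between two local tables, which is the property that makes combined models easy to build.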
How this works:
Let's say we created a combined model using structured as well as unstructured data, as described above, and this model is available for reporting.
When we send a request through our reporting tool, HANA determines the best way to retrieve the data, including where and how the data should be processed for optimum use of application and system resources, and then sends the corresponding request to Hadoop.
To learn more about Smart Data Access, check the blogs below:
You can also check the HANA Academy videos on how to use Smart Data Access (in these videos, HANA is connected to Sybase IQ using Smart Data Access):
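Why it matters where the data gets processed can be shown with a toy example. The sketch below is purely illustrative and says nothing about how HANA's optimizer is actually implemented; RemoteSource, run_query and the push_down flag are hypothetical names. It compares evaluating a filter at the remote source with pulling every row across and filtering locally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class RemoteSource:
    """Hypothetical stand-in for a remote store such as Hadoop."""
    rows: List[Row]
    rows_transferred: int = 0

    def scan(self, predicate: Optional[Predicate] = None) -> List[Row]:
        """Return rows; a predicate, if given, is evaluated on the remote side."""
        result = [row for row in self.rows if predicate is None or predicate(row)]
        self.rows_transferred += len(result)  # only returned rows cross the network
        return result


def run_query(source: RemoteSource, predicate: Predicate, push_down: bool) -> List[Row]:
    """Answer the same query either by pushing the filter down or by filtering locally."""
    if push_down:
        return source.scan(predicate)
    return [row for row in source.scan() if predicate(row)]


if __name__ == "__main__":
    data = [{"store": i % 3, "amount": i} for i in range(1000)]

    def is_large(row: Row) -> bool:
        return int(row["amount"]) >= 990

    pushed = RemoteSource(list(data))
    pulled = RemoteSource(list(data))
    same = run_query(pushed, is_large, True) == run_query(pulled, is_large, False)
    print(same, pushed.rows_transferred, pulled.rows_transferred)
```

Both strategies return the same rows, but the pushed-down version moves far fewer rows between the systems, which is the kind of trade-off a federated query engine has to weigh.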
Also check the below blog by Aaron on Streaming Real-time Data to HADOOP and HANA:
Check out this video by Intel on how Hadoop and HANA can work together:
Hadoop and HANA Use Cases:
1.) Genome Analysis:
MKI is using HANA with Hadoop to improve patient care in the realm of cancer research.
Genome analysis is the technique used to determine and compare genetic sequences (for example, the DNA in chromosomes).
Learn why HANA was selected for Real time Big Data Analysis to deliver advanced medical treatment
Check the below video:
Also check out the YouTube video below:
2.) Real Time Retail Point of Sales:
3.) Using Big Data In the Stadium to improve fan service:
Check out more HANA Customer Stories:
Check the blog below to learn more about Hadoop use cases:
SAP’s Hadoop Strategy:
To get the latest news regarding SAP and Hadoop, follow SAP’s Big data site: http://www.sapbigdata.com/
Check this blog to know about SAP’s Hadoop Strategy:
SAP recently signed agreements to redistribute the Intel Distribution for Apache Hadoop and the Hortonworks Data Platform to customers and to support both.
Hortonworks is a company that develops, distributes and supports Hadoop.
Also read the article below by InformationWeek:
If you are interested, you can also join tomorrow's SAP Big Data Chat with Hortonworks:
Learn more about Hadoop and HANA Integration:
Follow the SAP Database and Technology channel at https://www.brighttalk.com/channel/9727 and watch all the webinars for free.
Check the document below for links to all the Big Data webinars:
Read about SAP Hortonworks Reference Architecture:
Read about Combining SAP Real-Time Data Platform with Hortonworks Data Platform
Thank you for reading my blog.