This blog is part of the series My Learning Journey for Hadoop. In this blog I will focus on a basic introduction to Hadoop. If you have reached this blog directly, I would recommend reading my previous blog first - What is Big Data and Why do we need Hadoop for Big Data?

What is Hadoop?


Hadoop is an open-source framework licensed under the Apache License.

  • Hadoop allows you to store huge volumes of data

  • It provides the capability to process that data using a simple programming model



A little history behind Hadoop


Hadoop was created by Doug Cutting and Mike Cafarella in 2005.


Advantages of Hadoop


Below are the major advantages of Hadoop.

Hadoop is a Cost-Effective System




Hadoop does not require any expensive or specialized hardware. It can be implemented on simple hardware, which is technically referred to as commodity hardware.

Hadoop is also an open-source framework and does not require any license fee.

Hadoop supports Large Clusters of Nodes




A Hadoop cluster can be made up of a huge number of nodes. The main advantages of having a large cluster are:

  • It offers more computing power

  • And a huge storage system (illustrated in the sketch below)
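
To make "a huge storage system" concrete, here is a minimal sketch, assuming a NameNode reachable at localhost:9000 (that address is an assumption for illustration, not part of this blog), that asks the cluster for its aggregate storage figures through the HDFS Java API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsStatus;

    public class ClusterCapacity {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address

            FileSystem fs = FileSystem.get(conf);

            // FsStatus aggregates the disks of every DataNode in the cluster,
            // so these numbers grow simply by adding more nodes.
            FsStatus status = fs.getStatus();
            System.out.println("Capacity : " + status.getCapacity() + " bytes");
            System.out.println("Used     : " + status.getUsed() + " bytes");
            System.out.println("Remaining: " + status.getRemaining() + " bytes");

            fs.close();
        }
    }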


Hadoop provides automatic fail-over management


If any of the nodes within the cluster fails, the Hadoop framework will replace that machine with another machine.

It also copies all the configuration settings and data from the failed machine to the newly added machine, so admins do not need to worry about any of this.
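
This fail-over is built on replication: HDFS keeps multiple copies of every block on different nodes, so a failed node's data can be rebuilt from the surviving copies. Below is a minimal sketch, assuming the same illustrative localhost NameNode and a hypothetical file path, that sets and reads a file's replication factor through the HDFS Java API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/data.txt"); // hypothetical existing file

            // Ask HDFS to keep 3 copies of every block of this file.
            // If a DataNode dies, the NameNode re-creates the lost copies
            // on healthy nodes until 3 copies exist again.
            fs.setReplication(file, (short) 3);

            short actual = fs.getFileStatus(file).getReplication();
            System.out.println("Replication factor: " + actual);

            fs.close();
        }
    }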


Hadoop is flexible


Hadoop can manage any type of data, structured or unstructured, from any number of sources.

Data from multiple sources can be joined, enabling deeper analyses.


Hadoop Core Components


Hadoop has two core components: HDFS and MapReduce.


HDFS - Hadoop Distributed File System




HDFS is the Hadoop distributed file system for storing large files on a cluster of machines. HDFS chunks the data into blocks and distributes those blocks across different machines.
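
As a small illustration of this chunking, the sketch below writes a file through the HDFS Java API and then asks for the block size the file was stored with (again, the NameNode address and file path are assumptions for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/user/demo/sample.txt"); // hypothetical path

            // HDFS splits the file into fixed-size blocks and spreads the
            // blocks across the DataNodes of the cluster.
            try (FSDataOutputStream out = fs.create(file)) {
                out.writeBytes("Hello, HDFS!\n");
            }

            FileStatus status = fs.getFileStatus(file);
            System.out.println("File length: " + status.getLen() + " bytes");
            System.out.println("Block size : " + status.getBlockSize() + " bytes"); // 128 MB by default

            fs.close();
        }
    }

A file smaller than one block still occupies only its real size on disk; the block size just caps how much of a file lands on a single node.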


MapReduce



MapReduce is the heart of Hadoop.

It is a programming model designed for processing the distributed data in parallel by dividing the work into a set of independent tasks.
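
The classic illustration of this model is word counting: each map task independently emits (word, 1) pairs from its chunk of the input, and the reduce tasks sum the counts per word. The sketch below follows the well-known WordCount example from the Hadoop MapReduce tutorial:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Map: runs in parallel on each chunk of input, emits (word, 1)
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reduce: receives all counts for one word and sums them
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Packaged into a jar, a job like this is typically launched with the hadoop jar command, with the input and output paths pointing into HDFS.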

What’s Next?


Check out the blog series My Learning Journey for Hadoop or jump directly to the next article, Hadoop Ecosystem.