Introduction to Hadoop in simple words
This blog is part of the series My Learning Journey for Hadoop. In this blog I will give a basic introduction to Hadoop. If you have reached this blog directly, I would recommend reading my previous blog first – What is Big Data and Why do we need Hadoop for Big Data?
What is Hadoop?
Hadoop is an open source framework licensed under the Apache License 2.0.
- Hadoop allows you to store huge volumes of data
- It provides the capability to process that data using a simple programming model
A little history behind Hadoop
Hadoop was created by Doug Cutting and Mike Cafarella in 2005.
Advantages of Hadoop
Below are the major advantages of Hadoop.
Hadoop is a Cost-Effective System
Hadoop does not require any expensive or specialized hardware. It can be implemented on simple, inexpensive machines. Such machines are technically referred to as commodity hardware.
Hadoop is also an open source framework, so there are no license fees to pay.
Hadoop supports Large Clusters of Nodes
A Hadoop cluster can be made up of a huge number of nodes. The main advantages of having a large cluster are
- It offers more computing power
- It offers a huge amount of storage
Hadoop provides automatic fail-over management
If any node in the cluster fails, the Hadoop framework automatically shifts that node's work to the remaining healthy nodes.
Because HDFS keeps multiple copies of each block of data on different machines, the data that was on the failed machine is re-created from the surviving copies on other nodes. Admins usually do not need to step in for any of this.
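To picture how a cluster can recover data after losing a node, here is a minimal Python sketch. It is a toy model and not Hadoop's actual code: the function name `re_replicate`, the node names, and the simple "pick the first free node" rule are all my own assumptions for illustration. The only idea it relies on is that each block of data lives on several nodes, so surviving copies can be used to restore the lost ones.

```python
def re_replicate(placement, failed, live_nodes, replication):
    """Return a new block-to-nodes mapping after `failed` is lost.

    placement:   dict of block id -> list of nodes holding a copy
    live_nodes:  nodes that are still healthy
    replication: how many copies of each block we want
    (Hypothetical helper for illustration; not a Hadoop API.)
    """
    new_placement = {}
    for block, holders in placement.items():
        # Copies that survived the failure
        survivors = [n for n in holders if n != failed]
        # Healthy nodes that do not already hold this block
        candidates = [n for n in live_nodes if n not in survivors]
        # How many extra copies are needed to get back to the target
        needed = max(0, replication - len(survivors))
        new_placement[block] = survivors + candidates[:needed]
    return new_placement


# Two blocks, three copies each, spread over four made-up nodes
placement = {
    0: ["node1", "node2", "node3"],
    1: ["node2", "node3", "node4"],
}
live = ["node1", "node3", "node4"]  # node2 has failed
recovered = re_replicate(placement, "node2", live, replication=3)
print(recovered)
```

After the failure every block is back to three copies, and none of them had to be read from the failed machine.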
Hadoop is flexible
Hadoop can manage any type of data, structured or unstructured, from any number of sources.
Data from multiple sources can be joined, enabling deeper analysis.
Hadoop Core Components
Hadoop has 2 core components: HDFS and MapReduce.
HDFS – Hadoop Distributed File System
HDFS is Hadoop's distributed file system for storing large files on a cluster of machines. HDFS splits each file into chunks (called blocks) and distributes those blocks across different machines.
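The chunking and distribution idea can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions: the function names, the tiny block size, and the round-robin placement rule are made up for the example, and real HDFS uses a more elaborate placement policy. In real clusters the default block size is 128 MB and each block is stored three times by default.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list, replication: int) -> dict:
    """Assign every block to `replication` distinct nodes, round-robin.

    Toy placement rule for illustration only; not the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


# A 10-byte "file" with a 4-byte block size gives blocks of 4, 4 and 2 bytes
data = b"x" * 10
blocks = split_into_blocks(data, block_size=4)
placement = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)

for block_id, holders in placement.items():
    print(block_id, len(blocks[block_id]), holders)
```

Each block ends up on three different machines, so no single machine holds the whole file and no block lives in only one place.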
MapReduce is the heart of Hadoop.
It is a programming model designed for processing the distributed data in parallel, by dividing the work into a set of independent tasks.
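The classic way to see this model is a word count. The sketch below runs on a single machine in plain Python, so it only imitates the flow of a MapReduce job and does not use the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`) are my own labels for the usual steps: map each input line to (word, 1) pairs, group the pairs by word, then reduce each group to a total.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: turn one line of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line could be mapped on a different machine, since the
    # map step for one line does not depend on any other line.
    mapped = []
    for line in lines:
        mapped.extend(map_phase(line))
    grouped = shuffle(mapped)
    # Each key could likewise be reduced independently of the others.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["big data needs hadoop", "hadoop stores big data"])
print(counts)
```

The important point is the independence: no map call needs the result of another map call, and no reduce call needs another key's values, which is what lets a real cluster run them on many nodes at once.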
Check the blog series My Learning Journey for Hadoop, or jump directly to the next article, Hadoop Ecosystem.