By Dolly Mishra

Introduction to Hadoop in simple words

This blog is part of the series My Learning Journey for Hadoop. In this blog I will give a basic introduction to Hadoop. If you have reached this blog directly, I would recommend reading my previous blog first: What is Big Data and Why do we need Hadoop for Big Data?

What is Hadoop?

Hadoop is an open source framework released under the Apache License 2.0.

  • Hadoop lets you store huge volumes of data
  • It provides the capability to process that data using a simple programming model


A little history behind Hadoop

Hadoop was created by Doug Cutting and Mike Cafarella in 2005.

Advantages of Hadoop

Below are the major advantages of Hadoop.

Hadoop is a Cost-Effective System

Hadoop does not require any expensive or specialized hardware. It can run on ordinary, inexpensive machines, which are technically referred to as commodity hardware.

Hadoop is also open source, so there are no license fees to pay.

Hadoop supports Large Clusters of Nodes

A Hadoop cluster can be made up of a huge number of nodes. The main advantages of having a large cluster are:

  • It offers more computing power
  • It offers a huge storage system

Hadoop provides automatic fail-over management

If any node within the cluster fails, the Hadoop framework automatically moves that node's work to other machines in the cluster.

Because HDFS keeps copies of each piece of data on several machines, the data held by the failed machine is still available, and Hadoop re-creates the lost copies on healthy nodes. Admins usually do not need to handle this manually.
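To picture how this recovery can work, here is a minimal sketch in Python. It is not Hadoop code: the function name `handle_node_failure`, the node names and the replication factor of 3 are all assumptions made for illustration. It only shows the idea that every block lives on several nodes, so when one node fails the surviving copies are used to restore the missing ones on other live nodes.

```python
def handle_node_failure(placement, failed_node, live_nodes, replication):
    """Return a new block placement with the failed node removed and,
    where possible, the replica count restored from the live nodes.

    All names here are hypothetical; this is an illustration, not Hadoop's API.
    """
    new_placement = {}
    for block_id, holders in placement.items():
        # Copies that survived the failure.
        survivors = [node for node in holders if node != failed_node]
        # Live nodes that do not yet hold this block.
        candidates = [node for node in live_nodes if node not in survivors]
        # Add new copies until the replication factor is met again.
        while len(survivors) < replication and candidates:
            survivors.append(candidates.pop(0))
        new_placement[block_id] = survivors
    return new_placement


# Two blocks, each stored on three nodes (assumed layout).
placement = {
    0: ["node1", "node2", "node3"],
    1: ["node2", "node3", "node4"],
}
live_nodes = ["node1", "node3", "node4"]  # node2 has failed

recovered = handle_node_failure(placement, "node2", live_nodes, replication=3)
for block_id, holders in recovered.items():
    print(block_id, holders)
```

After the call, no block refers to `node2` any more and each block is back to three copies. In a real cluster this bookkeeping is done by Hadoop itself, which is why admins rarely have to step in.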

Hadoop is flexible

Hadoop can manage any type of data, structured or unstructured, from any number of sources.

Data from multiple sources can be joined, enabling deeper analysis.

Hadoop Core Components

Hadoop has two core components: HDFS and MapReduce.

HDFS – Hadoop Distributed File System

HDFS, the Hadoop Distributed File System, stores large files on a cluster of machines. It splits each file into chunks, called blocks, and distributes them across different machines.
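The chunking and distribution idea can be sketched in a few lines of Python. This is a toy model, not the HDFS API: the helper names `split_into_blocks` and `place_blocks`, the 4-byte block size, the node names and the simple round-robin placement are assumptions chosen to keep the example small. Real HDFS uses much larger blocks (128 MB by default in recent versions) and a smarter placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: list, replication: int) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule, used only for illustration.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


data = b"abcdefghij"  # a 10-byte "file"
nodes = ["node1", "node2", "node3", "node4"]

blocks = split_into_blocks(data, block_size=4)
placement = place_blocks(len(blocks), nodes, replication=3)

for block_id, block in enumerate(blocks):
    print(block_id, block, "->", placement[block_id])
```

The 10-byte file becomes three blocks, and each block is stored on three different nodes, so no single machine holds the whole file and no block depends on a single machine.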

MapReduce

MapReduce is the heart of Hadoop.

It is a programming model for processing data stored across the cluster in parallel, by dividing the work into a set of independent tasks.
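The classic way to illustrate this model is a word count. The sketch below runs on a single machine in plain Python and does not use Hadoop at all; the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions made for this example. It only shows the shape of the model: a map step that works on each line independently, a grouping step, and a reduce step that works on each key independently.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one line of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that belong to the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Each line is mapped on its own, so in a cluster these calls
    # could run on different machines at the same time.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    # Each key is reduced on its own, so this step can be parallel too.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data needs Hadoop", "Hadoop stores big data"])
print(counts)
```

Because no map call depends on another line and no reduce call depends on another key, the tasks are independent, which is what lets a framework like Hadoop spread them over many nodes.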

What’s Next?

Check the blog series My Learning Journey for Hadoop, or jump directly to the next article, Hadoop Ecosystem.

 
