What is Hadoop?

former_member45323 · ‎06-24-2019

This blog is part of the series My Learning Journey for Hadoop. In this blog I will give a basic introduction to Hadoop. If you have reached this blog directly, I would recommend reading my previous blog first - What is Big Data and Why do we need Hadoop for Big Data?

What is Hadoop?

Hadoop is an open source framework licensed under the Apache License 2.0.

Hadoop lets you store huge volumes of data across a cluster of machines.

It provides the capability to process that data using a simple programming model.

A little history behind Hadoop

Hadoop was created by Doug Cutting and Mike Cafarella in 2005.

Advantages of Hadoop

Below are the major advantages of Hadoop.

Hadoop is a Cost-Effective System

Hadoop does not require any expensive or specialized hardware. It can be implemented on simple, inexpensive machines. Such machines are commonly referred to as commodity hardware.

Hadoop is also an open source framework, so there are no license fees for using it.

Hadoop supports Large Clusters of Nodes

A Hadoop cluster can be made up of a very large number of nodes. The main advantages of having a large cluster are:

It offers more computing power

It offers a huge amount of storage

Hadoop provides automatic fail-over management

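In case any of the nodes within the cluster fails, Hadoop does not stop working. The tasks that were running on the failed node are rescheduled on the remaining healthy nodes.

The data is not lost either. HDFS keeps several copies (replicas) of each piece of data on different machines, three by default, and it re-creates the missing copies on other nodes automatically. Admins usually do not need to worry about all this.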
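To make the idea of surviving a node failure more concrete, here is a minimal sketch in plain Python. It is only a toy model and not Hadoop code: the function names `place_replicas` and `readable_blocks` and the node names are made up for illustration, and the copies are placed in a simple rotating order, whereas real HDFS uses a smarter, rack-aware placement. The only value taken from Hadoop is the default of three copies per block.

```python
def place_replicas(num_blocks, nodes, replication=3):
    """Toy placement: put each block on `replication` distinct nodes.

    Assumption: nodes are chosen in rotating order; real HDFS placement
    is rack-aware and more sophisticated.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one copy on a healthy node."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
placement = place_replicas(num_blocks=4, nodes=nodes)

# One node fails: every block still has healthy copies elsewhere.
survivors = readable_blocks(placement, failed={"node2"})
print(len(survivors), "of", len(placement), "blocks still readable")
```

With three copies per block, a block only becomes unreadable if all three machines holding it fail at the same time.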

Hadoop is flexible

Hadoop can manage any type of data, structured or unstructured, from any number of sources.

Data from multiple sources can be joined, enabling deeper analysis.

Hadoop Core Components

Hadoop has 2 core components: HDFS and MapReduce.

HDFS - Hadoop Distributed File System

HDFS is Hadoop's distributed file system for storing large files on a cluster of machines. HDFS splits each file into chunks (called blocks) and distributes them across different machines.
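The chunking idea can be pictured with a small Python sketch. This is a simplified illustration, not how HDFS is implemented: the helper names `split_into_blocks` and `distribute`, the node names, and the tiny block size of 256 bytes are all assumptions chosen to keep the example readable (real HDFS blocks are far larger, 128 MB by default in recent versions, and placement is not plain round-robin).

```python
from itertools import cycle


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def distribute(blocks: list[bytes], nodes: list[str]) -> dict[str, list[int]]:
    """Assign block indexes to nodes in round-robin order (a simplification)."""
    placement = {node: [] for node in nodes}
    for index, node in zip(range(len(blocks)), cycle(nodes)):
        placement[node].append(index)
    return placement


data = b"x" * 1000  # stand-in for a large file
blocks = split_into_blocks(data, block_size=256)
placement = distribute(blocks, ["node1", "node2", "node3"])  # hypothetical nodes

print("number of blocks:", len(blocks))
print("placement:", placement)
```

Joining the blocks back together in order gives the original file, which is essentially what happens when a client reads a file from HDFS.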

MapReduce

MapReduce is the heart of Hadoop.

It is a programming model for processing the data stored across the cluster in parallel, by dividing the work into a set of independent tasks.
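The classic way to illustrate this model is counting words. The sketch below imitates the map, shuffle and reduce steps in plain Python on a single machine. It does not use any Hadoop API, and the function names `map_phase`, `shuffle` and `reduce_phase` are made up for this example. In a real cluster, the map and reduce calls would run as independent tasks on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one line of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


lines = ["Hadoop stores data", "Hadoop processes data"]

# Each line can be mapped independently of the others.
mapped = [pair for line in lines for pair in map_phase(line)]

# Each key can be reduced independently of the others.
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

Because no map call depends on another line and no reduce call depends on another key, the work can be spread over as many machines as are available.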

What’s Next?

Check the blog series My Learning Journey for Hadoop or jump directly to the next article, Hadoop Ecosystem.

