
How Hadoop Works Internally – Inside Hadoop

1. How Hadoop Works Tutorial – Objective

Apache Hadoop is an open-source software framework that stores data in a distributed manner and processes that data in parallel. Hadoop provides a fault-tolerant storage layer (HDFS), a batch processing engine (MapReduce) and a resource management layer (YARN). In this tutorial on how Hadoop works internally, we will learn what Hadoop is, the components and daemons of Hadoop, the roles of HDFS and MapReduce, and the steps Hadoop follows to store and process data.


2. Hadoop Components and Daemons

Before learning how Hadoop works, let us brush up on the basics. If you have any questions about how Hadoop works, please ask us in the comments.

There are 2 layers in Hadoop, the HDFS layer and the MapReduce layer, and 5 daemons run across these 2 layers. Daemons are processes that run in the background. The Hadoop daemons are:

a) Namenode – Runs on the master node for HDFS.

b) Datanode – Runs on the slave nodes for HDFS.

c) Resource Manager – Runs on the YARN master node for MapReduce.

d) Node Manager – Runs on the YARN slave nodes for MapReduce.

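e) Secondary Namenode – Periodically checkpoints the namenode's metadata by merging the edit log into the file system image. It is a helper for the namenode rather than a hot standby. It usually runs on a separate system (other than the master and slave nodes), although it can also be configured on a slave node.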

These 5 daemons run together to make Hadoop functional.


HDFS provides the storage layer and MapReduce provides the computation layer in Hadoop. There is one namenode and several datanodes on the storage layer, i.e. HDFS. Similarly, there is one resource manager and several node managers on the computation layer, i.e. MapReduce.

The namenode (HDFS) and the resource manager (MapReduce) run on the master, while the datanodes (HDFS) and the node managers (MapReduce) run on the slaves.
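
As a rough illustration of this layout, the sketch below models a small cluster as a Python dictionary and groups the daemons by layer. The host names (master, slave1, slave2, aux) are hypothetical, and this is only a plain data structure for illustration, not a Hadoop configuration format.

```python
# Hypothetical host names; the daemons are the five listed in this section.
CLUSTER = {
    "master": ["NameNode", "ResourceManager"],
    "slave1": ["DataNode", "NodeManager"],
    "slave2": ["DataNode", "NodeManager"],
    "aux": ["SecondaryNameNode"],
}

# Which layer each daemon belongs to, following the tutorial's two-layer view.
LAYER = {
    "NameNode": "HDFS",
    "DataNode": "HDFS",
    "SecondaryNameNode": "HDFS",
    "ResourceManager": "MapReduce",
    "NodeManager": "MapReduce",
}


def daemons_by_layer(cluster):
    """Group (host, daemon) pairs by the layer the daemon belongs to."""
    result = {}
    for host, daemons in cluster.items():
        for daemon in daemons:
            result.setdefault(LAYER[daemon], []).append((host, daemon))
    return result


layout = daemons_by_layer(CLUSTER)
for layer, members in layout.items():
    print(layer, members)
```

Each slave host carries one daemon from each layer, which is what later allows computation to run next to the data it needs.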


3. How Does Hadoop Work?

Hadoop performs distributed processing of huge data sets across a cluster of commodity servers, working on multiple machines simultaneously. To process any data, the client submits the data and the program to Hadoop. HDFS stores the data, while MapReduce processes it.


As we know, HDFS is the storage layer of Hadoop. There are 2 daemons that run for HDFS:

  • Namenode runs on the master node.
  • Datanode runs on slaves.

The namenode daemon stores the metadata, while the datanode daemons store the actual data.

The data is broken into chunks called blocks, and these blocks are distributed across different nodes in the cluster. Each block is replicated according to the replication factor (3 by default).
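
The splitting and replication described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how HDFS is implemented: the function names and datanode names are hypothetical, the block size is taken as 128 MB, and replicas are placed round-robin, whereas real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB
REPLICATION = 3                 # default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block for a file of file_size bytes."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller than block_size
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct datanodes.

    Round-robin placement is a simplification chosen for this sketch.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            datanodes[(block_id + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print([size // (1024 * 1024) for size in blocks])  # [128, 128, 44]
print(placement[0])                                # ['dn1', 'dn2', 'dn3']
```

In this example a 300 MB file becomes three blocks, and with a replication factor of 3 the cluster stores nine block replicas in total, so losing a single datanode does not lose any block.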

Let us now understand how data is processed in Hadoop.

MapReduce is the processing layer of Hadoop. It has 2 daemons:

  • Resource manager, which allocates cluster resources to the small tasks into which the client's job is split.
  • Node managers, which run those tasks in parallel, in a distributed manner, on the data stored in the datanodes.
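
To make the idea of splitting a job into small parallel tasks concrete, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API: the function names are hypothetical, each input string stands in for one block, and a thread pool stands in for the node managers that run tasks in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_task(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def run_job(chunks):
    """Run one map task per chunk in parallel, then shuffle and reduce."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_task, chunks))
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


chunks = ["hadoop stores data", "hadoop processes data", "data is split"]
counts = run_job(chunks)
print(counts["hadoop"], counts["data"])  # 2 3
```

Each map task only sees its own chunk, which is why the tasks can run independently. The grouping step is the only point where results from different chunks are brought together.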

To process the data, the client needs to submit the algorithm to the master node. Hadoop works on the principle of data locality, i.e. instead of moving the data to the algorithm, the algorithm is moved to the datanodes where the data is stored.
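
A simplified view of data locality is sketched below: for each block, the scheduler prefers a node that already stores a replica and only falls back to another node when none of those has capacity. The function name, the block and node names, and the notion of "free slots" are assumptions made for this illustration. The actual YARN scheduler is considerably more involved.

```python
def schedule_tasks(block_locations, free_slots):
    """Assign one task per block, preferring a node that already stores the block.

    block_locations: dict mapping block id -> list of nodes holding a replica
    free_slots: dict mapping node -> number of tasks the node can still accept
    Returns a dict mapping block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block_id, holders in block_locations.items():
        local = [node for node in holders if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No replica holder has capacity: fall back to any free node,
            # which means the block must be read over the network.
            remote = [node for node, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free slots left in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block_id] = (node, is_local)
    return assignment


block_locations = {"b0": ["dn1", "dn2"], "b1": ["dn1", "dn2"], "b2": ["dn1"]}
free_slots = {"dn1": 1, "dn2": 1, "dn3": 1}
assignment = schedule_tasks(block_locations, free_slots)
print(assignment)
# {'b0': ('dn1', True), 'b1': ('dn2', True), 'b2': ('dn3', False)}
```

Here the tasks for b0 and b1 run where their data lives, while b2 has to run on dn3 because dn1 is already busy. Only that last task needs to pull its block across the network.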

Let us summarize how Hadoop works step by step:

  • Input data is broken into blocks of 128 MB (the default block size), and the blocks are then moved to different nodes.
  • Once all the blocks of the data are stored on data-nodes, the user can process the data.
  • Resource Manager then schedules the program (submitted by the user) on individual nodes.
  • Once all the nodes have processed the data, the output is written back to HDFS.


4. How Hadoop Works Tutorial – Conclusion

To conclude, the client first submits the data and the program. HDFS stores that data and MapReduce processes it. We have now covered an introduction to Hadoop and how Hadoop works.

The document How Hadoop Works Internally – Inside Hadoop | Hadoop Tutorials: Brief Introduction - Software Development is a part of the Software Development Course Hadoop Tutorials: Brief Introduction.