
Comparison Between Hadoop 2.x and Hadoop 3.x


1. Objective

In this Hadoop tutorial, we compare Hadoop 2.x with Hadoop 3.x. What new features were added in Hadoop 3? Are Hadoop 2 programs compatible with Hadoop 3? What are the differences between Hadoop 2 and Hadoop 3? We hope this feature-wise comparison of Hadoop 2 and Hadoop 3 will help you answer these questions.


2. Feature-wise Comparison Between Hadoop 2.x and Hadoop 3.x

This section covers the top 22 differences between Hadoop 2.x and Hadoop 3.x. Let us discuss each feature one by one.

2.1. License

  • Hadoop 2.x – Apache 2.0, Open Source
  • Hadoop 3.x – Apache 2.0, Open Source

2.2. Minimum supported version of Java

  • Hadoop 2.x – The minimum supported version of Java is Java 7.
  • Hadoop 3.x – The minimum supported version of Java is Java 8.

2.3. Fault Tolerance

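  • Hadoop 2.x – Fault tolerance is provided by replication, which wastes storage space because every block is stored in full several times.
  • Hadoop 3.x – Fault tolerance can also be provided by erasure coding, which stores parity data instead of full copies.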
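The simplest erasure code is a single XOR parity block: XOR the data blocks together and store the result, and any one lost block can be rebuilt by XOR-ing the survivors with the parity. Hadoop 3 ships an XOR-2-1 policy alongside its Reed-Solomon policies; the sketch below only illustrates the principle on tiny hypothetical byte strings, whereas real HDFS stripes data in cells across block groups.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)

# Two hypothetical data blocks of equal length.
data = [b"HADOOP", b"HDFS!!"]
parity = xor_blocks(data)  # stored as a third, parity block

# Suppose the first data block is lost: XOR the survivor with the parity.
recovered = xor_blocks([data[1], parity])
print(recovered == data[0])  # True
```

Because XOR is its own inverse, a ^ b ^ b == a, which is why one parity block is enough to survive the loss of any single block in the group.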

2.4. Data Balancing

  • Hadoop 2.x – Data is balanced across DataNodes with the HDFS balancer.
  • Hadoop 3.x – In addition, data can be balanced across the disks inside a single DataNode with the intra-DataNode balancer, which is invoked through the HDFS disk balancer CLI.
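To decide which disks inside one DataNode need rebalancing, the HDFS Disk Balancer documentation describes a "volume data density" metric: the node's ideal used ratio (total used / total capacity) minus each volume's own used ratio. The sketch below is modelled on that description, assuming sizes in GB and a hypothetical node where one disk was just replaced; a positive value means the disk holds less than its fair share, a negative value means more.

```python
def volume_data_density(volumes: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return ideal_ratio - used_ratio for each volume.

    `volumes` maps a volume name to (used, capacity) in the same unit.
    """
    total_used = sum(used for used, _ in volumes.values())
    total_capacity = sum(cap for _, cap in volumes.values())
    ideal_ratio = total_used / total_capacity
    return {
        name: round(ideal_ratio - used / cap, 3)
        for name, (used, cap) in volumes.items()
    }

# Hypothetical DataNode: /data3 was recently replaced and is almost empty.
disks = {"/data1": (900.0, 1000.0), "/data2": (850.0, 1000.0), "/data3": (50.0, 1000.0)}
print(volume_data_density(disks))
```

Here the ideal ratio is 0.6, so /data1 and /data2 come out negative (over-used) and /data3 strongly positive, which is the imbalance the inter-node HDFS balancer cannot see.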

2.5. Storage Scheme

  • Hadoop 2.x – Uses a 3x replication scheme.
  • Hadoop 3.x – Adds support for erasure coding in HDFS.
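The two schemes differ in how many stored units can be lost before data is lost. A minimal sketch, assuming the default 3x replication and a Reed-Solomon RS(6,3) policy (6 data blocks, 3 parity blocks): with n-way replication any n - 1 copies may be lost, and with RS(k, m) any m of the k + m blocks may be lost. The `StorageScheme` class is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageScheme:
    name: str
    data_units: int       # units carrying the original data
    redundant_units: int  # extra replicas or parity blocks

    def tolerated_losses(self) -> int:
        # Any `redundant_units` units can be lost without losing data:
        # n-way replication keeps n - 1 spare copies, RS(k, m) keeps m parity blocks.
        return self.redundant_units

    def total_units(self) -> int:
        return self.data_units + self.redundant_units

replication = StorageScheme("3x replication", data_units=1, redundant_units=2)
rs_6_3 = StorageScheme("RS-6-3", data_units=6, redundant_units=3)
for scheme in (replication, rs_6_3):
    print(f"{scheme.name}: {scheme.total_units()} units, survives {scheme.tolerated_losses()} lost")
```

So RS(6,3) tolerates more simultaneous losses (3) than 3x replication (2) while storing less extra data.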

2.6. Storage Overhead

  • Hadoop 2.x – HDFS has a 200% storage overhead (two extra copies of every block).
  • Hadoop 3.x – With erasure coding, the storage overhead is only 50%.
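Both percentages follow from dividing the extra storage by the original data, assuming 3x replication for Hadoop 2.x and an RS(6,3) erasure coding policy (6 data blocks, 3 parity blocks) for Hadoop 3.x:

```latex
\text{overhead} = \frac{\text{extra storage}}{\text{original data}} \times 100\%
\qquad
\text{3x replication: } \frac{3 - 1}{1} \times 100\% = 200\%
\qquad
\text{RS}(6,3)\text{: } \frac{3}{6} \times 100\% = 50\%
```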

2.7. Storage Overhead Example

  • Hadoop 2.x – If there are 6 blocks, 18 blocks of space are occupied because of the replication scheme.
  • Hadoop 3.x – If there are 6 blocks, 9 blocks of space are occupied: 6 data blocks and 3 parity blocks.
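The block counts above can be reproduced with a simplified block-level model, assuming 3x replication and RS(6,3) with one set of 3 parity blocks per group of up to 6 data blocks. Real HDFS erasure coding stripes data in cells across block groups, so this is an approximation for counting, not the HDFS layout.

```python
import math

def replicated_blocks(data_blocks: int, replication: int = 3) -> int:
    """Total blocks stored when every block is copied `replication` times."""
    return data_blocks * replication

def erasure_coded_blocks(data_blocks: int, k: int = 6, m: int = 3) -> int:
    """Total blocks stored with RS(k, m): m parity blocks per group of k data blocks."""
    groups = math.ceil(data_blocks / k)
    return data_blocks + groups * m

print(replicated_blocks(6))     # 18
print(erasure_coded_blocks(6))  # 9
```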

2.8. YARN Timeline Service

  • Hadoop 2.x – Uses the older timeline service (v1), which has scalability issues.
  • Hadoop 3.x – Introduces Timeline Service v2, which improves the scalability and reliability of the timeline service.

2.9. Default Ports Range

  • Hadoop 2.x – Some default ports fall within the Linux ephemeral port range, so at startup these services can fail to bind if the operating system has already handed the port to another process.
  • Hadoop 3.x – These default ports have been moved out of the ephemeral range.
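A quick way to see the problem is to check whether a fixed service port lies inside the ephemeral range. This sketch assumes the common Linux default of 32768 to 60999 (the actual range on a host is in /proc/sys/net/ipv4/ip_local_port_range) and uses the NameNode web UI default, which moved from 50070 in Hadoop 2.x to 9870 in Hadoop 3.x.

```python
# Assumption: Linux default ephemeral range 32768-60999 inclusive; hosts may differ.
EPHEMERAL = range(32768, 61000)

def in_ephemeral_range(port: int, ephemeral: range = EPHEMERAL) -> bool:
    """True if a fixed service port could collide with OS-assigned client ports."""
    return port in ephemeral

# NameNode web UI default port in each release line.
for version, port in (("Hadoop 2.x", 50070), ("Hadoop 3.x", 9870)):
    print(version, port, "ephemeral" if in_ephemeral_range(port) else "outside range")
```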

2.10. Tools

  • Hadoop 2.x – Uses Hive, Pig, Tez, Hama, Giraph and other Hadoop tools.
  • Hadoop 3.x – Hive, Pig, Tez, Hama, Giraph and other Hadoop tools are also available.

2.11. Compatible File System

  • Hadoop 2.x – HDFS (the default file system); the FTP file system, which stores all its data on remotely accessible FTP servers; the Amazon S3 (Simple Storage Service) file system; and the Windows Azure Storage Blobs (WASB) file system.
  • Hadoop 3.x – Supports all of the above as well as the Microsoft Azure Data Lake file system.
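Hadoop chooses a file system implementation from the scheme of the path's URI. The sketch below mimics that selection with a hypothetical lookup table built from the list above; the scheme names are those used by Hadoop's connectors (hdfs://, ftp://, s3a://, wasb://, adl://), while the table, the function and the example account name are illustrative assumptions, not Hadoop's API.

```python
from urllib.parse import urlparse

# Hypothetical table: URI scheme -> (file system, first Hadoop major version
# in which the comparison above lists it).
SCHEMES = {
    "hdfs": ("HDFS", 2),
    "ftp": ("FTP file system", 2),
    "s3a": ("Amazon S3", 2),
    "wasb": ("Windows Azure Storage Blobs", 2),
    "adl": ("Azure Data Lake", 3),
}

def filesystem_for(uri: str, hadoop_major: int) -> str:
    """Pick a file system by URI scheme, rejecting ones the version lacks."""
    scheme = urlparse(uri).scheme
    if scheme not in SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    name, since = SCHEMES[scheme]
    if hadoop_major < since:
        raise ValueError(f"{name} needs Hadoop {since}.x or later")
    return name

print(filesystem_for("adl://example.azuredatalakestore.net/data", 3))  # Azure Data Lake
```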

2.12. Datanode Resources

  • Hadoop 2.x – DataNode resources are not dedicated to MapReduce; they can also be used by other applications.
  • Hadoop 3.x – DataNode resources can likewise be used by other applications.

2.13. MR API Compatibility

  • Hadoop 2.x – The MR API is compatible with Hadoop 1.x, so Hadoop 1.x programs can run on Hadoop 2.x.
  • Hadoop 3.x – The MR API is likewise compatible, so Hadoop 1.x programs can run on Hadoop 3.x.

2.14. Support for Microsoft Windows

  • Hadoop 2.x – Can be deployed on Microsoft Windows.
  • Hadoop 3.x – Also supports Microsoft Windows.

2.15. Slots/Container

  • Hadoop 2.x – Hadoop 1 works on the concept of slots, but Hadoop 2.x works on the concept of containers. A container can run any generic task, not only map or reduce tasks.
  • Hadoop 3.x – Also works on the concept of containers.
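Unlike a fixed number of map and reduce slots, a container is a bundle of resources (memory and CPU vcores) that any kind of task can use, so the number of containers a node can host depends on what each request asks for. A minimal sketch, assuming a hypothetical node with 16 GB and 8 vcores available to YARN and ignoring scheduler rounding rules such as minimum allocation sizes:

```python
def containers_per_node(node_mb: int, node_vcores: int,
                        container_mb: int, container_vcores: int) -> int:
    """How many identical containers fit on one NodeManager.

    The count is limited by whichever resource runs out first.
    """
    return min(node_mb // container_mb, node_vcores // container_vcores)

print(containers_per_node(16384, 8, 2048, 1))  # 8: memory and vcores run out together
print(containers_per_node(16384, 8, 4096, 1))  # 4: memory runs out first
```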

2.16. Single Point of Failure

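  • Hadoop 2.x – Has features to overcome the NameNode single point of failure (SPOF): with automatic failover enabled, a standby NameNode takes over when the active NameNode fails.
  • Hadoop 3.x – Also overcomes the SPOF, so when the NameNode fails the cluster recovers automatically without manual intervention; in addition, Hadoop 3.x supports more than one standby NameNode.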
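The failover decision can be sketched as "if the active NameNode is unhealthy, promote the first healthy standby". This is a simplified model with hypothetical names (nn1, nn2, nn3): real HDFS relies on ZooKeeper and the ZKFailoverController, plus fencing, to make this decision safely. Having more than one standby, as in the example, is what lets the cluster survive two NameNode failures.

```python
from dataclasses import dataclass

@dataclass
class NameNode:
    name: str
    state: str            # "active" or "standby"
    healthy: bool = True

def fail_over(nodes: list[NameNode]) -> NameNode:
    """Return the active NameNode, promoting a healthy standby if needed."""
    active = next(n for n in nodes if n.state == "active")
    if active.healthy:
        return active
    active.state = "standby"  # the failed NameNode is no longer active
    for candidate in nodes:
        if candidate.state == "standby" and candidate.healthy:
            candidate.state = "active"
            return candidate
    raise RuntimeError("no healthy standby NameNode available")

# Hypothetical cluster with two standbys; the active and one standby fail.
cluster = [NameNode("nn1", "active"), NameNode("nn2", "standby"), NameNode("nn3", "standby")]
cluster[0].healthy = False
cluster[1].healthy = False
print(fail_over(cluster).name)  # nn3
```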

2.17. HDFS Federation

  • Hadoop 2.x – In Hadoop 1.0 a single NameNode manages the whole namespace, but Hadoop 2.0 allows multiple NameNodes for multiple namespaces.
  • Hadoop 3.x – Also supports multiple NameNodes for multiple namespaces.
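With federation, a client needs to know which NameNode owns a given path. Hadoop's ViewFs does this with a client-side mount table; the sketch below imitates the idea with a plain dictionary and longest-prefix matching. The mount points and NameNode host names are hypothetical placeholders, and this is not the ViewFs configuration format.

```python
# Hypothetical mount table: each mount point is served by its own NameNode.
MOUNT_TABLE = {
    "/user": "hdfs://nn-user",
    "/logs": "hdfs://nn-logs",
    "/logs/audit": "hdfs://nn-audit",
}

def resolve(path: str) -> str:
    """Return the full URI for `path`, using the longest matching mount point."""
    matches = [m for m in MOUNT_TABLE if path == m or path.startswith(m + "/")]
    if not matches:
        raise KeyError(f"no namespace mounted for {path}")
    mount = max(matches, key=len)
    return MOUNT_TABLE[mount] + path

print(resolve("/logs/app/2022-01-01.log"))  # hdfs://nn-logs/logs/app/2022-01-01.log
print(resolve("/logs/audit/login.log"))     # hdfs://nn-audit/logs/audit/login.log
```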

2.18. Scalability

  • Hadoop 2.x – A cluster can scale up to about 10,000 nodes.
  • Hadoop 3.x – Better scalability: a cluster can scale beyond 10,000 nodes.

2.19. Faster Access to Data

  • Hadoop 2.x – DataNode caching provides faster access to data.
  • Hadoop 3.x – DataNode caching likewise provides faster access to data.

2.20. HDFS Snapshot

  • Hadoop 2.x – Adds support for snapshots, which provide disaster recovery and protection against user error.
  • Hadoop 3.x – Also supports snapshots.
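Protection against user error works because a snapshot is a read-only, point-in-time view of a directory from which deleted files can be copied back. The toy model below (the `SnapshottableDir` class is hypothetical) copies the listing to keep the idea visible; real HDFS snapshots do not copy data blocks and are exposed under the directory's .snapshot path.

```python
import copy

class SnapshottableDir:
    """Toy model of a snapshottable directory: file name -> contents."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.snapshots: dict[str, dict[str, str]] = {}

    def create_snapshot(self, name: str) -> None:
        # Freeze a point-in-time view; this toy simply copies the listing.
        self.snapshots[name] = copy.deepcopy(self.files)

    def restore_file(self, snapshot: str, filename: str) -> None:
        # Bring one file back from the snapshot after an accidental delete.
        self.files[filename] = self.snapshots[snapshot][filename]

d = SnapshottableDir()
d.files["report.csv"] = "q1,q2\n10,20\n"
d.create_snapshot("s1")
del d.files["report.csv"]          # user error
d.restore_file("s1", "report.csv")
print("report.csv" in d.files)     # True
```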

2.21. Platform

  • Hadoop 2.x – Serves as a platform for a wide variety of data analytics; event processing, streaming and real-time operations can run on it.
  • Hadoop 3.x – Event processing, streaming and real-time operations can likewise run on top of YARN.

2.22. Cluster Resource Management

  • Hadoop 2.x – Uses YARN for cluster resource management, which improves scalability, high availability and multi-tenancy.
  • Hadoop 3.x – Also uses YARN for cluster resource management, with all of these features.

3. Conclusion

Having covered 22 important differences between Hadoop 2.x and Hadoop 3.x, you can now decide which version is better suited to your installation.

The document Comparison Between Hadoop 2.x vs Hadoop 3.x Notes | Study Hadoop Tutorials: Brief Introduction - IT & Software is a part of the IT & Software Course Hadoop Tutorials: Brief Introduction.