Hadoop 2.6 Multi Node Cluster Setup and Hadoop Installation

1. Hadoop 2.6 Multi Node Cluster Setup Tutorial – Objective

In this tutorial, we will learn how to install a Hadoop 2.6 multi-node cluster with YARN on Ubuntu. We will cover the recommended platform for the setup, the prerequisites for installing Hadoop on the master and slaves, the software required, and how to start and stop the Hadoop cluster. The tutorial uses the Hadoop CDH5 distribution, which will also help you with programming in Hadoop.



2. Hadoop 2.6 Multi Node Cluster Setup

Let us now walk through the steps to set up a Hadoop multi-node cluster on Ubuntu, starting with the recommended platform.

2.1. Recommended Platform for Hadoop 2.6 Multi Node Cluster Setup

  • OS: Linux is supported as a development and production platform. You can use Ubuntu 14.04, 16.04, or later (other Linux flavors such as CentOS and Red Hat also work)
  • Hadoop: Cloudera Distribution for Apache Hadoop CDH5.x (you can use Apache Hadoop 2.x)

2.2. Install Hadoop on Master

Let us now start with installing Hadoop on the master node in distributed mode.

I. Prerequisites for Hadoop 2.6 Multi Node Cluster Setup

Let us now start with learning the prerequisites to install Hadoop:

a. Add Entries in hosts file

Edit the /etc/hosts file and add entries for the master and slaves:

  1. sudo nano /etc/hosts
  2. MASTER-IP master
  3. SLAVE01-IP slave01
  4. SLAVE02-IP slave02

(NOTE: In place of MASTER-IP, SLAVE01-IP, and SLAVE02-IP, put the corresponding IP addresses; see the example below.)
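
For example, assuming three machines on a private network (these addresses are placeholders; use your machines' actual IPs), the entries would look like:

192.168.1.10 master    # placeholder IP, use your master's address
192.168.1.11 slave01   # placeholder IP
192.168.1.12 slave02   # placeholder IP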

b. Install Java 8 (Recommended Oracle Java)

  • Install Python Software Properties

sudo apt-get install python-software-properties
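
(Note: On newer Ubuntu releases this package has been replaced; if python-software-properties is not found, install the successor package instead:)

sudo apt-get install software-properties-common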

  • Add Repository

sudo add-apt-repository ppa:webupd8team/java

  • Update the source list

sudo apt-get update

  • Install Java

sudo apt-get install oracle-java8-installer
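
To verify the installation, check the Java version (it should report Java 1.8.x):

java -version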

c. Configure SSH

  • Install Open SSH Server-Client

sudo apt-get install openssh-server openssh-client

  • Generate Key Pairs

ssh-keygen -t rsa -P ""

  • Configure passwordless SSH

Copy the contents of ~/.ssh/id_rsa.pub on the master to ~/.ssh/authorized_keys on all the slaves as well as on the master itself; the ssh-copy-id shortcut below does this for you.
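
As a shortcut, ssh-copy-id can append the key for you (this assumes the login user is ubuntu on every node; adjust the user name to your setup):

ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@master    # 'ubuntu' is an assumed user name
ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@slave01
ssh-copy-id -i ~/.ssh/id_rsa.pub ubuntu@slave02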

  • Check by SSH to all the Slaves
  1. ssh slave01
  2. ssh slave02

II. Install Apache Hadoop in distributed mode

Let us now learn how to download and install Hadoop.

a. Download Hadoop

Below is the link to download the Hadoop 2.x (CDH5) tarball.

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz
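
You can fetch the tarball directly from the terminal with wget:

wget http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz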

b. Untar Tarball

tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz

(Note: All the required jars, scripts, configuration files, etc. are available in the HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).)


III. Hadoop multi-node cluster setup Configuration

Let us now learn how to configure Hadoop while installing it.

a. Edit .bashrc

Edit the .bashrc file located in the user's home directory and add the following environment variables:

  1. export HADOOP_PREFIX="/home/ubuntu/hadoop-2.5.0-cdh5.3.2"
  2. export PATH=$PATH:$HADOOP_PREFIX/bin
  3. export PATH=$PATH:$HADOOP_PREFIX/sbin
  4. export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
  5. export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
  6. export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
  7. export YARN_HOME=${HADOOP_PREFIX}

(Note: After the above step, restart the terminal/PuTTY session so that all the environment variables take effect, or reload the file as shown below.)
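
Alternatively, reload the file in the current session instead of restarting:

source ~/.bashrc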

b. Check environment variables

Check whether the environment variables added in the .bashrc file are available:

  1. bash
  2. hdfs

(It should not give the error "command not found".)
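
You can also print one of the variables to confirm it points at your installation directory:

echo $HADOOP_PREFIX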

c. Edit hadoop-env.sh

Edit configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (eg: /usr/lib/jvm/java-8-oracle/)
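
If you are unsure of the path, you can list the JVMs installed on the machine (on Ubuntu they normally live under /usr/lib/jvm):

ls /usr/lib/jvm/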

d. Edit core-site.xml

Edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

  1. <configuration>
  2. <property>
  3. <name>fs.defaultFS</name>
  4. <value>hdfs://master:9000</value>
  5. </property>
  6. <property>
  7. <name>hadoop.tmp.dir</name>
  8. <value>/home/ubuntu/hdata</value>
  9. </property>
  10. </configuration>

(Note: /home/ubuntu/hdata is a sample location; specify a location where you have read/write privileges, and create it as shown below.)
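
Create the directory beforehand if it does not already exist (using the sample location from above):

mkdir -p /home/ubuntu/hdata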

e. Edit hdfs-site.xml

Edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

  1. <configuration>
  2. <property>
  3. <name>dfs.replication</name>
  4. <value>2</value>
  5. </property>
  6. </configuration>

f. Edit mapred-site.xml

Edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop; see the note after this listing if the file does not exist) and add the following entries:

  1. <configuration>
  2. <property>
  3. <name>mapreduce.framework.name</name>
  4. <value>yarn</value>
  5. </property>
  6. </configuration>
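
(Note: In plain Apache Hadoop 2.x tarballs, mapred-site.xml may not exist by default; if it is missing in your distribution, create it from the bundled template first, running this inside HADOOP_HOME/etc/hadoop:)

cp mapred-site.xml.template mapred-site.xml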

g. Edit yarn-site.xml

Edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

  1. <configuration>
  2. <property>
  3. <name>yarn.nodemanager.aux-services</name>
  4. <value>mapreduce_shuffle</value>
  5. </property>
  6. <property>
  7. <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  8. <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  9. </property>
  10. <property>
  11. <name>yarn.resourcemanager.resource-tracker.address</name>
  12. <value>master:8025</value>
  13. </property>
  14. <property>
  15. <name>yarn.resourcemanager.scheduler.address</name>
  16. <value>master:8030</value>
  17. </property>
  18. <property>
  19. <name>yarn.resourcemanager.address</name>
  20. <value>master:8040</value>
  21. </property>
  22. </configuration>

h. Edit slaves

Edit the configuration file slaves (located in HADOOP_HOME/etc/hadoop) and add the following entries:

  1. slave01
  2. slave02

“Hadoop is set up on the master; now set up Hadoop on all the slaves.”


2.3. Install Hadoop On Slaves

I. Setup Prerequisites on all the slaves

Run the following steps on all the slaves:

  • Add Entries in hosts file
  • Install Java 8 (Recommended Oracle Java)

II. Copy configured setups from master to all the slaves

a. Create tarball of configured setup

tar czf hadoop.tar.gz hadoop-2.5.0-cdh5.3.2

(NOTE: Run this command on Master)

b. Copy the configured tarball on all the slaves

scp hadoop.tar.gz slave01:~

(NOTE: Run this command on Master)

scp hadoop.tar.gz slave02:~

(NOTE: Run this command on Master)

c. Un-tar configured Hadoop setup on all the slaves

tar xzf hadoop.tar.gz

(NOTE: Run this command on all the slaves)

“Hadoop is set up on all the slaves. Now start the cluster.”

2.4. Start the Hadoop Cluster

Let us now learn how to start the Hadoop cluster.

I. Format the NameNode

bin/hdfs namenode -format

(Note: Run this command on Master)

(NOTE: Format the NameNode only once, when you first install Hadoop; formatting it again will delete all the data from HDFS.)

II. Start HDFS Services

sbin/start-dfs.sh

(Note: Run this command on Master)

III. Start YARN Services

sbin/start-yarn.sh

(Note: Run this command on Master)
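
To confirm that the NodeManagers on the slaves have registered with the ResourceManager, list the cluster nodes; both slaves should appear with state RUNNING:

bin/yarn node -list

(Note: Run this command on Master)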

IV. Check for Hadoop services

a. Check daemons on Master

  1. jps
  2. NameNode
  3. ResourceManager

b. Check daemons on Slaves

  1. jps
  2. DataNode
  3. NodeManager
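
You can also verify from the master that both DataNodes have joined HDFS; the report should show two live datanodes:

bin/hdfs dfsadmin -report

(Note: Run this command on Master)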

2.5. Stop The Hadoop Cluster

Let us now see how to stop the Hadoop cluster.

I. Stop YARN Services

sbin/stop-yarn.sh

(Note: Run this command on Master)

II. Stop HDFS Services

sbin/stop-dfs.sh

(Note: Run this command on Master)

This is how we do a Hadoop 2.6 multi-node cluster setup on Ubuntu.


FAQs on Hadoop 2.6 Multi Node Cluster Setup and Hadoop Installation

1. What is Hadoop?
Ans. Hadoop is an open-source framework that allows for distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
2. What is a multi-node cluster setup in Hadoop?
Ans. A multi-node cluster setup in Hadoop refers to configuring and deploying Hadoop on multiple machines to create a distributed computing environment. It allows for parallel processing of data across multiple nodes, enabling faster and more efficient data analysis.
3. How do I set up a multi-node Hadoop cluster?
Ans. To set up a multi-node Hadoop cluster, you need to follow several steps:
  1. Install Hadoop on each node.
  2. Configure the Hadoop environment variables.
  3. Update the Hadoop configuration files on each node.
  4. Set up the SSH connection between the nodes for seamless communication.
  5. Start the Hadoop daemons on the master and slave nodes.
  6. Verify the cluster setup using the Hadoop web interface or command-line tools.
4. What are the benefits of a multi-node Hadoop cluster?
Ans. A multi-node Hadoop cluster offers several benefits, including:
  1. Increased processing power and capability to handle large datasets.
  2. Improved fault tolerance and reliability through data replication across nodes.
  3. Scalability to add or remove nodes based on the changing workload.
  4. Parallel processing of data, resulting in faster data analysis and reduced processing time.
  5. Effective utilization of resources by distributing the workload across multiple nodes.
5. What are the minimum hardware requirements for setting up a multi-node Hadoop cluster?
Ans. The minimum hardware requirements for setting up a multi-node Hadoop cluster depend on the size and complexity of the data being processed. However, some general guidelines include:
  1. Each node should have sufficient RAM (minimum 8 GB) to handle the data processing tasks.
  2. Each node should have multiple CPU cores (minimum 4 cores) to enable parallel processing.
  3. Adequate storage space (minimum 1 TB) to store the input and output data.
  4. Fast and reliable network connectivity between the nodes to ensure efficient data transfer.