
Installation of Hadoop 3.x on Ubuntu on Single Node Cluster

1. Objective

In this tutorial on the installation of Hadoop 3.x on Ubuntu, we will set up a pseudo-distributed, single-node Hadoop 3.x cluster. We will cover how to install Java, how to install SSH and configure passwordless SSH, how to download Hadoop, how to set up the Hadoop configuration files (.bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml), how to start the Hadoop cluster, and how to stop the Hadoop services.



2. Installation of Hadoop 3.x on Ubuntu

Before we start with the Hadoop 3.x installation on Ubuntu, it helps to understand the key features added in Hadoop 3 and how it compares with Hadoop 2.

2.1. Java 8 installation

Hadoop requires a working Java installation. Let us start with the steps for installing Java 8:

a. Install Python Software Properties

sudo apt-get install python-software-properties

b. Add Repository

sudo add-apt-repository ppa:webupd8team/java

c. Update the source list

sudo apt-get update

d. Install Java 8

sudo apt-get install oracle-java8-installer

e. Check that Java is correctly installed

java -version
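Note: the webupd8team PPA and the oracle-java8-installer package have since been discontinued, so the commands above may fail on newer Ubuntu releases. If that happens, OpenJDK 8 from the standard Ubuntu repositories works with Hadoop 3.x as well (an alternative, not part of the original steps; the JAVA_HOME set later would then point to /usr/lib/jvm/java-8-openjdk-amd64 instead of the Oracle path):

# Alternative: install OpenJDK 8 from the standard Ubuntu repositories
sudo apt-get install openjdk-8-jdk
java -version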

2.2. Configure SSH

SSH is used for remote login. Hadoop requires SSH to manage its nodes, i.e. remote machines plus the local machine if you want to run Hadoop on it. Let us now see the SSH setup for Hadoop 3.x on Ubuntu:

a. Install SSH and pdsh

sudo apt-get install ssh
sudo apt-get install pdsh

b. Generate Key Pairs

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

c. Configure passwordless ssh

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

d. Change the permissions of the file that contains the key

chmod 0600 ~/.ssh/authorized_keys

e. Check SSH to localhost

ssh localhost
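This should log you in without asking for a password. As a quick sanity check (a small optional sketch, not part of the original steps), BatchMode makes SSH fail instead of prompting, so a key problem is reported immediately:

# Fails with "Permission denied" instead of prompting if the key setup is wrong
ssh -o BatchMode=yes localhost echo "passwordless SSH works"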

2.3. Install Hadoop

a. Download Hadoop

http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz

(This tutorial uses hadoop-3.0.0-alpha2.tar.gz; a newer 3.x release can be substituted.)

b. Untar Tarball

tar -xzf hadoop-3.0.0-alpha2.tar.gz
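The configuration in the next section assumes the extracted directory lives at /home/dataflair/hadoop-3.0.0-alpha2, where dataflair is just the example user used throughout this tutorial. If you extracted the tarball somewhere else, move it into place (adjust the username and path to your own setup):

# Example only: place the extracted directory where the .bashrc entries below expect it
mv hadoop-3.0.0-alpha2 /home/dataflair/hadoop-3.0.0-alpha2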

2.4. Hadoop Setup Configuration

a. Edit .bashrc

Open .bashrc

nano ~/.bashrc

The .bashrc file is located in the user's home directory. Add the following parameters to it:

export HADOOP_PREFIX="/home/dataflair/hadoop-3.0.0-alpha2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}

Then run

source ~/.bashrc
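To confirm that the new environment variables are in effect in the current shell (a quick check, assuming the paths above match where you extracted Hadoop):

echo $HADOOP_PREFIX
hadoop version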

b. Edit hadoop-env.sh

Edit configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
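If you are unsure where Java is installed, the path can be derived from the java binary itself (a small sketch; with OpenJDK the result is typically /usr/lib/jvm/java-8-openjdk-amd64 rather than the Oracle path shown above):

# Resolve the real java binary and strip the trailing /bin/java to get JAVA_HOME
readlink -f /usr/bin/java | sed 's:/bin/java::'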

c. Edit core-site.xml
Edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/dataflair/hdata</value>
  </property>
</configuration>
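The directory named in hadoop.tmp.dir should exist and be writable by the user running Hadoop; creating it up front avoids permission surprises later (a minimal sketch, assuming the same path as above):

mkdir -p /home/dataflair/hdata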

d. Edit hdfs-site.xml

Edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

e. Edit mapred-site.xml

If the mapred-site.xml file is not available, copy it from the template:

cp mapred-site.xml.template mapred-site.xml

Edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

f. Edit yarn-site.xml

Edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
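Before starting the daemons, it can save time to verify that the edited files are still well-formed XML. A quick sketch, assuming xmllint is available (on Ubuntu it ships in the libxml2-utils package):

# Prints nothing on success; reports the file and line on a syntax error
xmllint --noout $HADOOP_PREFIX/etc/hadoop/core-site.xml \
                $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml \
                $HADOOP_PREFIX/etc/hadoop/mapred-site.xml \
                $HADOOP_PREFIX/etc/hadoop/yarn-site.xml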


2.5. How to Start the Hadoop services

Let us now see how to start the Hadoop cluster:

The first step to starting up your Hadoop installation is formatting the Hadoop filesystem which is implemented on top of the local filesystem of your “cluster”. This is done as follows:

a. Format the namenode

bin/hdfs namenode -format

NOTE: Format the NameNode only once, when you first install Hadoop. Do not format an already running Hadoop filesystem, or all data in HDFS will be deleted.

b. Start HDFS Services

sbin/start-dfs.sh

If starting the HDFS services fails with a pdsh-related error, set ssh as pdsh's default remote command and try again:

echo "ssh" | sudo tee /etc/pdsh/rcmd_default

c. Start YARN Services

sbin/start-yarn.sh

d. Check how many daemons are running

Let us now see whether expected Hadoop processes are running or not:

jps
2961 ResourceManager
2482 DataNode
3077 NodeManager
2366 NameNode
2686 SecondaryNameNode
3199 Jps
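If all five daemons (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager) are listed, a small smoke test confirms HDFS is accepting commands (a minimal sketch; the directory name is only an example):

# Create a home directory for the current user in HDFS and list the root
bin/hdfs dfs -mkdir -p /user/$(whoami)
bin/hdfs dfs -ls /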


2.6. How to Stop the Hadoop services

Let us learn how to stop Hadoop services now:

a. Stop YARN services

sbin/stop-yarn.sh

b. Stop HDFS services

sbin/stop-dfs.sh

Note:

Browse the web interface for the NameNode; by default, it is available at:

NameNode – http://localhost:9870/

Browse the web interface for the ResourceManager; by default, it is available at:

ResourceManager – http://localhost:8088/

FAQs on Installation of Hadoop 3.x on Ubuntu on Single Node Cluster

1. What is Hadoop and why is it used?
Ans. Hadoop is an open-source framework that allows for distributed processing and storage of large datasets across clusters of computers. It is used to efficiently process and analyze big data, providing scalability, fault-tolerance, and high availability.
2. How do I install Hadoop 3.x on Ubuntu for a single node cluster?
Ans. To install Hadoop 3.x on Ubuntu for a single node cluster, follow these steps:
  1. Download the Hadoop distribution from the official Apache Hadoop website.
  2. Extract the downloaded tar file to a directory of your choice.
  3. Set up the necessary environment variables in the ~/.bashrc file.
  4. Configure the Hadoop files by editing the core-site.xml, hdfs-site.xml, and yarn-site.xml files.
  5. Format the Hadoop file system by running the command: hdfs namenode -format.
  6. Start the Hadoop daemons by running the command: start-all.sh.
  7. Verify the installation by accessing the Hadoop web interface.
3. What are the system requirements for installing Hadoop 3.x on Ubuntu?
Ans. The system requirements for installing Hadoop 3.x on Ubuntu are as follows:
  - Ubuntu operating system (version 18.04 or higher is recommended)
  - Java Development Kit (JDK) version 8 or higher
  - Sufficient RAM and disk space to handle the data and processing requirements of your specific use case
  - Good network connectivity for communication between nodes in a cluster
4. Can Hadoop be used for a multi-node cluster on Ubuntu?
Ans. Yes, Hadoop can be used for a multi-node cluster on Ubuntu. In a multi-node setup, multiple machines or nodes are connected to form a Hadoop cluster. Each node contributes its processing power and storage capacity to the cluster, allowing for distributed processing of large datasets. The installation and configuration process for a multi-node cluster is more complex than a single node cluster, but it provides greater scalability and performance.
5. How does Hadoop ensure fault-tolerance in a cluster?
Ans. Hadoop ensures fault-tolerance in a cluster through various mechanisms:
  - Data replication: Hadoop replicates data across multiple nodes in the cluster to ensure that even if one node fails, the data is still available on other nodes.
  - Task monitoring and reassignment: If a node fails during the execution of a task, Hadoop detects the failure and reassigns the task to another node to ensure its completion.
  - Node monitoring: Hadoop continuously monitors the health and status of each node in the cluster. If a node becomes unresponsive, it is marked as failed and its tasks are reassigned to other nodes.
  - Job recovery: In the event of a job failure, Hadoop can recover and restart the job from the point of failure, ensuring that the processing is not lost.