Install Hadoop 2.8.x on Ubuntu | Hadoop Installation Steps

1. Install Hadoop on Ubuntu Tutorial: Objective

This article explains how to install Hadoop on Ubuntu in simple steps. We will deploy Hadoop as a single-node cluster on Ubuntu Linux and enable YARN on the cluster during installation.


2. Hadoop Overview

Hadoop is a framework for running distributed computing programs. It comprises HDFS (the distributed file system) and MapReduce (the programming framework).

Earlier versions of Hadoop could run only MapReduce programs, so they were suited mainly to batch processing.

YARN, introduced in Hadoop 2, provides an API for requesting and allocating resources in the cluster. This API lets application programs other than MapReduce jobs process large-scale data stored in HDFS.

3. Prerequisites to Install Hadoop on Ubuntu

  • Hardware requirement: The machine should have at least 4 GB of RAM and 60 GB of hard disk space for good performance.
  • Check the Java version: Oracle Java 8 is recommended. If Java is not installed yet, follow a Java 8 installation tutorial first. You can check the installed version with the command below.

$ java -version
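
The version check can also be scripted. The sketch below is only an illustration: the function name java_major_version is hypothetical, and it assumes the first line of the output has the usual form java version "1.8.0_151" (or openjdk version "11.0.2" on newer JDKs).

```shell
# Extract the major Java version from the first line of `java -version` output.
# Legacy strings such as "1.8.0_151" map to 8; newer ones such as "11.0.2" map to 11.
# java_major_version is a hypothetical helper name, not part of Java or Hadoop.
java_major_version() {
    # $1: a line such as: java version "1.8.0_151"
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$ver" in
        "")  return 1 ;;                                  # no version string found
        1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;        # 1.8.0_151 -> 8
        *)   printf '%s\n' "$ver" | cut -d. -f1 ;;        # 11.0.2 -> 11
    esac
}

# Usage (java -version writes to stderr, hence 2>&1):
#   java_major_version "$(java -version 2>&1 | head -n 1)"
```

For this tutorial the function should print 8.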

4. Easy Steps to install Hadoop on Ubuntu

Let's now go through the steps to install a Hadoop single-node cluster on Ubuntu.

4.1. Setup passwordless ssh

a) Install OpenSSH Server and OpenSSH Client

First, install the OpenSSH server and client with the following command.

sudo apt-get install openssh-server openssh-client


b) Generate Public & Private Key Pairs

ssh-keygen -t rsa -P ""

The terminal will prompt for the file in which to save the key. Press Enter to accept the default. The key pair is saved in the .ssh directory inside the home directory, as id_rsa (private key) and id_rsa.pub (public key).

c) Configure password-less SSH

The command below appends the public key to authorized_keys, which enables password-less ssh to this machine.

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys


d) Verify that password-less ssh works

Run “ssh localhost”. On the first connection, ssh asks whether to continue connecting to the host. Type ‘yes’ and press Enter to proceed. It should not ask for a password.
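
If ssh still asks for a password, one common cause is that the .ssh directory or the authorized_keys file has overly open permissions, which sshd rejects by default. The following is a minimal sketch of a fix; the function name fix_ssh_permissions is hypothetical.

```shell
# Tighten permissions on an SSH directory so that sshd accepts key-based logins.
# fix_ssh_permissions is a hypothetical helper name.
# $1: path to the .ssh directory (defaults to $HOME/.ssh)
fix_ssh_permissions() {
    ssh_dir=${1:-"$HOME/.ssh"}
    # Nothing to fix if the key has not been authorized yet.
    [ -f "$ssh_dir/authorized_keys" ] || return 1
    chmod 700 "$ssh_dir" &&
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage:
#   fix_ssh_permissions
#   ssh -o BatchMode=yes localhost true && echo "password-less ssh works"
```

BatchMode=yes makes ssh fail instead of prompting, so the second command is a non-interactive check. It assumes the host key of localhost was already accepted once.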

e) Install rsync

$ sudo apt-get install rsync


4.2. Configure and Setup Hadoop

a) Download the Hadoop package 2.8.x

Use the link below to download Hadoop 2.8.x from the Apache mirrors.


http://www-eu.apache.org/dist/hadoop/common/hadoop-2.8.2/

b) Untar the Tarball

Then extract Hadoop into the home directory:

tar xzf hadoop-2.8.2.tar.gz


4.3. Setup Configuration

We add only the minimum properties needed to the Hadoop configuration. You can add more properties later as required.

a) Setting Up the environment variables

  • Edit .bashrc: open the .bashrc file in the home directory and add Hadoop to the path:

nano ~/.bashrc

Add the following path variables to it:

export HADOOP_HOME=/home/hduser/hadoop-2.8.2
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin



  • Source .bashrc in the current terminal session:

source ~/.bashrc
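
After sourcing the file, it can help to confirm that the variables took effect. The sketch below is one way to do that; check_hadoop_env is a hypothetical helper name, and it assumes the extracted Hadoop directory contains bin/hadoop.

```shell
# Sanity-check the environment set up in .bashrc.
# check_hadoop_env is a hypothetical helper name.
check_hadoop_env() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set"
        return 1
    fi
    if [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
        echo "no executable at $HADOOP_HOME/bin/hadoop"
        return 1
    fi
    # Look for $HADOOP_HOME/bin as a complete PATH entry.
    case ":$PATH:" in
        *":$HADOOP_HOME/bin:"*) ;;
        *) echo "$HADOOP_HOME/bin is not on PATH"; return 1 ;;
    esac
    echo "ok"
}

# Usage:
#   check_hadoop_env
```

If it prints ok, running hadoop version from any directory should also work.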

b) Hadoop configuration file changes

  • Edit hadoop-env.sh

Edit the hadoop-env.sh file, which is in etc/hadoop inside the Hadoop installation directory, and set JAVA_HOME:

export JAVA_HOME=<root directory of Java installation> (e.g. /usr/lib/jvm/jdk1.8.0_151/)
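
If the Java root directory is not known, it can usually be derived from the location of the java executable. This is a sketch under that assumption; java_home_from_binary is a hypothetical helper name, and the OpenJDK path in the comment is only an illustrative layout.

```shell
# Derive a JAVA_HOME candidate from the resolved path of the java executable.
# java_home_from_binary is a hypothetical helper name.
java_home_from_binary() {
    # $1: resolved path, e.g. /usr/lib/jvm/jdk1.8.0_151/bin/java
    path=${1%/bin/java}
    # Fail if the input did not end in /bin/java.
    [ "$path" != "$1" ] || return 1
    # Some Java 8 layouts place the launcher under <root>/jre/bin/java
    # (for example /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java).
    printf '%s\n' "${path%/jre}"
}

# Usage (readlink -f resolves the /usr/bin/java symlink chain):
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```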


  • Edit core-site.xml

Edit core-site.xml with “nano core-site.xml”. The file is in etc/hadoop inside the Hadoop installation directory. Add the following entries:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hdadmin/hdata</value>
  </property>
</configuration>


  • Edit hdfs-site.xml

Edit hdfs-site.xml with “nano hdfs-site.xml”. This file is also in etc/hadoop inside the Hadoop installation directory. Add the following entries:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>



  • Edit mapred-site.xml

Create mapred-site.xml as a copy of mapred-site.xml.template using the cp command (cp mapred-site.xml.template mapred-site.xml), then edit it with “nano mapred-site.xml”. This file is also in etc/hadoop inside the Hadoop installation directory. Add the following entries:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>


  • Edit yarn-site.xml

Now edit yarn-site.xml with “nano yarn-site.xml”. It is also in etc/hadoop inside the Hadoop installation directory. Add the following entries:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
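
A typo in any of these four files can keep a daemon from starting, so it may be worth reading the values back before starting the cluster. The sketch below is a simple text-based reader, not a real XML parser: get_property is a hypothetical helper name, and it assumes each <name> and <value> element sits on one line, as in the listings above.

```shell
# Print the value of one property from a Hadoop *-site.xml file.
# get_property is a hypothetical helper name; this is line-oriented text
# matching and assumes <name>...</name> and <value>...</value> each fit on one line.
get_property() {
    # $1: configuration file, $2: property name
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Drop the 7-character <value> prefix and the 8-character </value> suffix.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Usage, from the etc/hadoop directory:
#   get_property core-site.xml fs.defaultFS
#   get_property hdfs-site.xml dfs.replication
```

Once the Hadoop commands are on the path, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which is the more authoritative check.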



4.4. Start the cluster

We will now start the single node cluster with the following commands.

a) Format the namenode

Format the namenode before using it for the first time.

hdfs namenode -format
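
Formatting is a one-time step: formatting again erases the HDFS metadata. The sketch below guards against that. It assumes dfs.namenode.name.dir is left at its default, which places the namenode metadata under dfs/name inside hadoop.tmp.dir; namenode_is_formatted is a hypothetical helper name.

```shell
# Succeed if a namenode has already been formatted under the given hadoop.tmp.dir.
# namenode_is_formatted is a hypothetical helper name. It assumes the default
# dfs.namenode.name.dir, i.e. <hadoop.tmp.dir>/dfs/name.
namenode_is_formatted() {
    # $1: the value of hadoop.tmp.dir from core-site.xml
    [ -f "$1/dfs/name/current/VERSION" ]
}

# Usage, with the hadoop.tmp.dir value configured in core-site.xml:
#   namenode_is_formatted /home/hdadmin/hdata || hdfs namenode -format
```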


b) Start the HDFS

Start HDFS using the Hadoop start-up script:

start-dfs.sh


c) Start the YARN services

To start YARN, run:

start-yarn.sh


d) Verify that all processes have started

Run the jps command. All five Hadoop daemons should be listed (the process IDs will differ):

$ jps
6775 DataNode
7209 ResourceManager
7017 SecondaryNameNode
6651 NameNode
7339 NodeManager
7663 Jps
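
Comparing the jps listing by eye is easy to get wrong, so the check can be scripted. This is a minimal sketch: missing_daemons is a hypothetical helper name, and the list of expected daemons is taken from the jps output shown above.

```shell
# Read jps output on stdin and print every expected Hadoop daemon that is absent.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the second column exactly, so that NameNode does not
        # match the SecondaryNameNode line.
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            printf '%s\n' "$daemon"
    done
}

# Usage:
#   jps | missing_daemons
```

No output means all five daemons are running. Any name printed points to the daemon whose log file under the logs directory of the Hadoop installation is worth checking.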


e) NameNode web UI

To view the web UI of the NameNode, visit http://localhost:50070


f) Resource Manager UI (http://localhost:8088)

This web interface displays information about all jobs running on the cluster, which helps to monitor their progress.
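
On a machine without a browser, both web interfaces can be probed from the terminal. The sketch below assumes curl is installed; check_hadoop_uis is a hypothetical helper name, and the two URLs are the ones given in this tutorial.

```shell
# Report whether the NameNode and Resource Manager web UIs answer over HTTP.
# check_hadoop_uis is a hypothetical helper name; it assumes curl is installed.
check_hadoop_uis() {
    for url in http://localhost:50070 http://localhost:8088; do
        # -f: treat HTTP errors as failure, -s: silent, -L: follow redirects
        if curl -fsL -o /dev/null --max-time 5 "$url"; then
            echo "up   $url"
        else
            echo "down $url"
        fi
    done
}

# Usage:
#   check_hadoop_uis
```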


4.5. Stopping the cluster

To stop the HDFS services, run stop-dfs.sh. To stop the YARN services, run:

stop-yarn.sh


You have successfully installed Hadoop 2.8.x on Ubuntu. Now you can play with big data using Hadoop HDFS commands.
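
As a first exercise with the HDFS commands, the sketch below copies a local file into HDFS and reads it back while the cluster is running. The function name hdfs_smoke_test and the paths in the usage line are hypothetical; the hdfs dfs subcommands themselves are standard.

```shell
# Copy a local file into HDFS, list the target directory, and print the file back.
# hdfs_smoke_test is a hypothetical helper name.
hdfs_smoke_test() {
    # $1: local file to upload, $2: target directory in HDFS
    hdfs dfs -mkdir -p "$2" &&
    hdfs dfs -put -f "$1" "$2/" &&        # -f overwrites an existing copy
    hdfs dfs -ls "$2" &&
    hdfs dfs -cat "$2/$(basename "$1")"
}

# Usage (example paths only):
#   hdfs_smoke_test /etc/hostname /user/hduser/demo
```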

The document Install Hadoop 2.8.x on Ubuntu | Hadoop Installation Steps | Hadoop Tutorials: Brief Introduction - Software Development is a part of the Software Development Course Hadoop Tutorials: Brief Introduction.
FAQs on Install Hadoop 2.8.x on Ubuntu

1. How do I install Hadoop 2.8.x on Ubuntu?
Ans. Here are the steps to install Hadoop 2.8.x on Ubuntu:
  1. Update the system: sudo apt update
  2. Install the Java Development Kit (JDK): sudo apt install default-jdk
  3. Download the Hadoop 2.8.x package from the official Apache Hadoop website.
  4. Extract the downloaded package: tar -xzvf hadoop-2.8.x.tar.gz
  5. Move the extracted folder to a desired location: sudo mv hadoop-2.8.x /usr/local/hadoop
  6. Set up the environment variables. Open the .bashrc file (nano ~/.bashrc) and add the following lines at the end of the file:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME

  7. Save the file and exit. Then run the command: source ~/.bashrc
  8. Configure Hadoop by editing the configuration files in the Hadoop installation directory.
  9. Format the Hadoop file system: hdfs namenode -format
  10. Start the Hadoop daemons: start-all.sh
2. What are the system requirements for installing Hadoop 2.8.x on Ubuntu?
Ans. The system requirements for installing Hadoop 2.8.x on Ubuntu are as follows:
  • Ubuntu operating system (version 16.04 or higher)
  • Java Development Kit (JDK) installed (version 7 or higher)
  • Minimum 2GB RAM (8GB or higher recommended)
  • At least 2 CPU cores
  • Sufficient disk space for Hadoop installation and data storage
  • Strong network connectivity for cluster communication
3. How can I verify the successful installation of Hadoop 2.8.x on Ubuntu?
Ans. You can verify the successful installation of Hadoop 2.8.x on Ubuntu by following these steps:
  1. Open a terminal and run the command: hadoop version
  2. If the installation is successful, you will see the Hadoop version and other details displayed in the terminal.
  3. Additionally, you can check the Hadoop web user interfaces by opening a web browser and entering the following URLs: http://localhost:50070 (for HDFS) and http://localhost:8088 (for YARN).
  4. If the web interfaces load without any errors, Hadoop is installed correctly.
4. How do I configure Hadoop after installation on Ubuntu?
Ans. To configure Hadoop after installation on Ubuntu, you need to edit the configuration files in the Hadoop installation directory. Here are the steps:
  1. Navigate to the Hadoop configuration directory: cd /usr/local/hadoop/etc/hadoop
  2. Edit the core-site.xml file to specify the default filesystem and its port.
  3. Edit the hdfs-site.xml file to configure HDFS-specific settings such as replication factor and block size.
  4. Edit the yarn-site.xml file to configure YARN-specific settings such as resource allocation and auxiliary services.
  5. Edit the mapred-site.xml file to configure MapReduce-specific settings such as the execution framework (mapreduce.framework.name).
  6. Save the changes and exit the text editor.
  7. Restart the Hadoop daemons for the changes to take effect.
5. How can I start and stop Hadoop services on Ubuntu?
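Ans. To start and stop Hadoop services on Ubuntu, you can use the following commands:
  • Start all Hadoop services: start-all.sh
  • Stop all Hadoop services: stop-all.sh
  • Start HDFS services: start-dfs.sh
  • Stop HDFS services: stop-dfs.sh
  • Start YARN services: start-yarn.sh
  • Stop YARN services: stop-yarn.sh
  • Start the MapReduce JobHistory server: mr-jobhistory-daemon.sh start historyserver
  • Stop the MapReduce JobHistory server: mr-jobhistory-daemon.sh stop historyserver
In Hadoop 2.x, MapReduce jobs run on YARN, so there are no separate start-mapred.sh and stop-mapred.sh scripts. All of these scripts are in the sbin directory of the Hadoop installation. With $HADOOP_HOME/sbin on the PATH, they can be run from any directory.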