1. Objective
In this tutorial on installing Hadoop 3.x on Ubuntu, we will set up a pseudo-distributed, single-node Hadoop 3.x cluster. We will cover how to install Java, how to install SSH and configure passwordless SSH, how to download Hadoop, how to set up the Hadoop configuration files (.bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml), and how to start and stop the Hadoop services.
2. Installation of Hadoop 3.x on Ubuntu
Before we start with the Hadoop 3.x installation on Ubuntu, it helps to understand the key features added in Hadoop 3 and how Hadoop 3 compares with Hadoop 2.
2.1. Java 8 installation
Hadoop requires a working Java installation. Let us start with the steps for installing Java 8:
a. Install Python Software Properties
sudo apt-get install python-software-properties
b. Add Repository
sudo add-apt-repository ppa:webupd8team/java
c. Update the source list
sudo apt-get update
d. Install Java 8
sudo apt-get install oracle-java8-installer
e. Check if java is correctly installed
java -version
2.2. Configure SSH
SSH is used for remote login. Hadoop requires SSH to manage its nodes, i.e. remote machines, plus the local machine if you want to use Hadoop on it. Let us now set up SSH for Hadoop 3.x on Ubuntu:
a. Install SSH
sudo apt-get install ssh
b. Generate Key Pairs
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
c. Configure passwordless ssh
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
d. Change the permission of the file that contains the key
chmod 0600 ~/.ssh/authorized_keys
e. Check SSH to the localhost
ssh localhost
2.3. Install Hadoop
a. Download Hadoop
http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz
(Here we download hadoop-3.0.0-alpha2.tar.gz; use the latest Hadoop 3.x release available.)
b. Untar Tarball
tar -xzf hadoop-3.0.0-alpha2.tar.gz
2.4. Hadoop Setup Configuration
a. Edit .bashrc
Open .bashrc
nano ~/.bashrc
The .bashrc file is located in the user's home directory. Add the following parameters to it:
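The exact values depend on where you extracted Hadoop; assuming the tarball was untarred into hadoop-3.0.0-alpha2 in your home directory (adjust the path if yours differs), typical entries look like:

```shell
# Assumes Hadoop was extracted to ~/hadoop-3.0.0-alpha2; adjust to your install path.
export HADOOP_HOME="$HOME/hadoop-3.0.0-alpha2"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop"
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export YARN_HOME="$HADOOP_HOME"
# Put the hadoop, hdfs, and yarn commands plus the start/stop scripts on the PATH.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```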
Then run
source ~/.bashrc
b. Edit hadoop-env.sh
Edit configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
c. Edit core-site.xml
Edit configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:
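For a pseudo-distributed setup, a minimal core-site.xml typically points the default filesystem at HDFS on the local machine (port 9000 is the conventional NameNode RPC port):

```xml
<configuration>
  <!-- URI of the default filesystem; the NameNode listens here -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```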
d. Edit hdfs-site.xml
Edit configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:
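On a single-node cluster there is only one DataNode, so the block replication factor is usually lowered from the default of 3 to 1:

```xml
<configuration>
  <!-- One DataNode means replication above 1 can never be satisfied -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```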
e. Edit mapred-site.xml
If the mapred-site.xml file is not present, create it from the template:
cp mapred-site.xml.template mapred-site.xml
Edit configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:
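The usual entry here tells MapReduce to run on YARN rather than as standalone local jobs:

```xml
<configuration>
  <!-- Execute MapReduce jobs on the YARN resource manager -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```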
f. Edit yarn-site.xml
Edit configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add following entries:
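A minimal yarn-site.xml enables the shuffle auxiliary service, which MapReduce jobs need in order to move map output to the reducers:

```xml
<configuration>
  <!-- Auxiliary service that serves map outputs to reduce tasks -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```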
2.5. How to Start the Hadoop services
Let us now see how to start the Hadoop cluster:
The first step to starting up your Hadoop installation is formatting the Hadoop filesystem which is implemented on top of the local filesystem of your “cluster”. This is done as follows:
a. Format the namenode
bin/hdfs namenode -format
NOTE: Format the NameNode only once, when you first install Hadoop. Do not format a running Hadoop filesystem, or you will lose all the data stored in HDFS.
b. Start HDFS Services
sbin/start-dfs.sh
If you get an error (typically from pdsh) when starting the HDFS services, run:
echo "ssh" | sudo tee /etc/pdsh/rcmd_default
c. Start YARN Services
sbin/start-yarn.sh
d. Check how many daemons are running
Let us now see whether the expected Hadoop processes are running:
jps
On a healthy single-node cluster, the list includes NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.
2.6. How to Stop the Hadoop services
Let us learn how to stop Hadoop services now:
a. Stop YARN services
sbin/stop-yarn.sh
b. Stop HDFS services
sbin/stop-dfs.sh
Note:
Browse the web interface for the NameNode; by default, it is available at:
NameNode – http://localhost:9870/
Browse the web interface for the ResourceManager; by default, it is available at:
ResourceManager – http://localhost:8088/