
Hadoop 2 Installation on Ubuntu – Setup of Hadoop CDH5


1. Hadoop 2 Installation Tutorial: Objective

This Hadoop 2 installation tutorial describes how to install and configure a single-node Hadoop cluster on Ubuntu. A single-node Hadoop cluster is also called “Hadoop Pseudo-Distributed Mode”. The steps are kept simple and to the point, so that you can complete the Hadoop CDH5 installation in about 10 minutes. Once the installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.




2. Hadoop 2 Installation: Video Tutorial

https://edurev.in/studytube/Easiest-way-to-install--setup-hadoop--Hadoop-tutor/9a1e6494-41a1-4e6a-894d-f380be774c2d_v


3. Install Hadoop 2 on Ubuntu

Follow the steps below to install and configure a Hadoop 2 cluster on Ubuntu:

3.1. Recommended Platform

  • OS – Linux is supported as a development and production platform. You can use Ubuntu 14.04 or later (other Linux flavors such as CentOS or Red Hat also work).
  • Hadoop – Cloudera Distribution for Apache Hadoop CDH5.x (you can also use Apache Hadoop 2.x)

I. Setup Platform

If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu in it using VMware Player or Oracle VirtualBox.

3.2. Prerequisites

I. Install Java 8 (Oracle Java recommended)

a. Install Python Software Properties

sudo apt-get install python-software-properties

b. Add Repository

sudo add-apt-repository ppa:webupd8team/java

c. Update the source list

sudo apt-get update

d. Install Java

sudo apt-get install oracle-java8-installer

II. Configure SSH

a. Install OpenSSH Server and Client

sudo apt-get install openssh-server openssh-client

b. Generate Key Pairs

ssh-keygen -t rsa -P ""

c. Configure password-less SSH

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

d. Verify by connecting to localhost over SSH

ssh localhost
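
If ssh localhost still asks for a password, file permissions are a common cause: with default OpenSSH settings, sshd may ignore an authorized_keys file that other users can write to. The sketch below tightens those permissions. It assumes default OpenSSH settings, and secure_ssh_dir is a hypothetical helper name, not an OpenSSH command.

```shell
# Hypothetical helper: tighten permissions on an SSH directory so that
# sshd accepts the authorized_keys file inside it.
secure_ssh_dir() {
  dir="${1:-$HOME/.ssh}"
  mkdir -p "$dir"
  touch "$dir/authorized_keys"
  chmod 700 "$dir"                  # directory: owner only
  chmod 600 "$dir/authorized_keys"  # key list: owner read/write only
}

# Typical use, after the first interactive login has accepted the host key:
#   secure_ssh_dir
#   ssh -o BatchMode=yes localhost true && echo "password-less SSH works"
```

BatchMode=yes makes ssh fail instead of prompting, so the check never hangs waiting for a password.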

3.3. Install Hadoop

I. Download Hadoop 2

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz


II. Extract the Tarball

tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz

Note: All the required jars, scripts, configuration files, etc. are in the extracted hadoop-2.5.0-cdh5.3.2 directory, referred to below as HADOOP_HOME.


III. Hadoop 2 Setup Configuration

a. Edit .bashrc

Now edit the .bashrc file located in the user’s home directory and add the following lines:


export HADOOP_PREFIX="/home/hdadmin/hadoop-2.5.0-cdh5.3.2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}


Note: After this step, restart the terminal so that the environment variables take effect.
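
To confirm that the new shell picked up the settings, you can check the variables before moving on. This is a minimal sketch; check_hadoop_env is a hypothetical helper name, and it only inspects HADOOP_PREFIX and PATH as set in the .bashrc lines above.

```shell
# Hypothetical helper: report whether the Hadoop variables from .bashrc
# are visible in the current shell.
check_hadoop_env() {
  if [ -z "$HADOOP_PREFIX" ]; then
    echo "HADOOP_PREFIX is not set"
    return 1
  fi
  # Wrap PATH in colons so each entry can be matched as ":entry:".
  case ":$PATH:" in
    *":$HADOOP_PREFIX/bin:"*) ;;
    *) echo "$HADOOP_PREFIX/bin is not on PATH"; return 1 ;;
  esac
  case ":$PATH:" in
    *":$HADOOP_PREFIX/sbin:"*) ;;
    *) echo "$HADOOP_PREFIX/sbin is not on PATH"; return 1 ;;
  esac
  echo "Hadoop environment looks consistent: $HADOOP_PREFIX"
}
```

Once the check passes, running hadoop version should print the Hadoop release, which confirms the bin directory is reachable.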

b. Edit hadoop-env.sh

Now edit the configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (eg: /usr/lib/jvm/java-8-oracle/)
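
If you are not sure where Java is installed, one way to find a candidate is to resolve the java binary and strip the trailing /bin/java. This is only a sketch; java_home_from_binary is a hypothetical helper name, and the path it returns should be checked by eye before you put it in hadoop-env.sh.

```shell
# Hypothetical helper: turn the resolved path of the java binary into a
# JAVA_HOME candidate by stripping the trailing "/bin/java".
java_home_from_binary() {
  printf '%s\n' "${1%/bin/java}"
}

# Typical use (readlink -f follows the /usr/bin/java symlink chain):
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```

On some Java 8 installations the resolved binary lives under a jre subdirectory, so the result may end in /jre. That directory also contains bin/java.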

c. Edit core-site.xml

Now edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:


<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hdadmin/hdata</value>
</property>
</configuration>

Note: /home/hdadmin/hdata is a sample location; specify a location where you have read and write privileges.
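
Before starting any daemon, you can create the hadoop.tmp.dir location and confirm that your user can write to it. This is a minimal sketch; ensure_writable_dir is a hypothetical helper name, and the directory you pass should be the value you set in core-site.xml.

```shell
# Hypothetical helper: create a directory for hadoop.tmp.dir and confirm
# the current user can write to it.
ensure_writable_dir() {
  dir="$1"
  mkdir -p "$dir" || return 1
  probe="$dir/.write_test.$$"   # throwaway file, removed right after
  if touch "$probe" 2>/dev/null; then
    rm -f "$probe"
    echo "writable: $dir"
  else
    echo "not writable: $dir"
    return 1
  fi
}

# Typical use:
#   ensure_writable_dir /home/hdadmin/hdata
```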

d. Edit hdfs-site.xml

Now edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>


e. Edit mapred-site.xml

Now edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>


f. Edit yarn-site.xml

Now edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
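
After editing the configuration files, it can help to read a few values back and catch typos before starting the cluster. The sketch below is a plain-text reader, not an XML parser: it assumes each <name> and <value> element sits on a single line, as in the entries above. get_site_property is a hypothetical helper name.

```shell
# Hypothetical helper: print the value of one property from a Hadoop
# *-site.xml file. Assumes every <name> and <value> element fits on one
# line; it does not handle comments or multi-line values.
get_site_property() {
  awk -v want="$2" '
    /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
    /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                if (n == want) print v }
  ' "$1"
}

# Typical use, run from the Hadoop directory:
#   get_site_property etc/hadoop/core-site.xml fs.defaultFS
#   get_site_property etc/hadoop/hdfs-site.xml dfs.replication
```

Once the environment is set up, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves, which is the more authoritative check.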


3.4. Start the Cluster

I. Format the name node

bin/hdfs namenode -format

NOTE: Do this only once, when you first install Hadoop. Formatting the name node again deletes all your data from HDFS.
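
One way to avoid formatting by accident is to check for existing NameNode metadata first. This sketch assumes that dfs.namenode.name.dir is left at its default, file://${hadoop.tmp.dir}/dfs/name, so the metadata lives under the hadoop.tmp.dir configured in core-site.xml. namenode_is_formatted is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if NameNode metadata already exists under
# the given hadoop.tmp.dir. Assumes dfs.namenode.name.dir is left at its
# default, file://${hadoop.tmp.dir}/dfs/name.
namenode_is_formatted() {
  [ -f "$1/dfs/name/current/VERSION" ]
}

# Typical use, run from the Hadoop directory:
#   if namenode_is_formatted /home/hdadmin/hdata; then
#     echo "NameNode already formatted; skipping"
#   else
#     bin/hdfs namenode -format
#   fi
```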


II. Start HDFS Services

sbin/start-dfs.sh


III. Start YARN Services

sbin/start-yarn.sh



IV. Check whether services have been started

jps

The output should include the following daemons:

NameNode
DataNode
ResourceManager
NodeManager
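
Checking the jps listing by eye is easy to get wrong, because SecondaryNameNode also contains the string NameNode. The sketch below reads jps output and prints any of the four daemons that are missing. missing_daemons is a hypothetical helper name; the daemon names are the ones listed above.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon that is missing. Exit status is 0 only when all four are present.
missing_daemons() {
  listing=$(cat)
  rc=0
  for daemon in NameNode DataNode ResourceManager NodeManager; do
    # jps prints "<pid> <name>"; compare the whole second field so that
    # "SecondaryNameNode" does not count as "NameNode".
    if ! printf '%s\n' "$listing" |
        awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
      rc=1
    fi
  done
  return $rc
}

# Typical use:
#   jps | missing_daemons || echo "some daemons are not running"
```

If a daemon is missing, its log file under the logs directory of the Hadoop installation is usually the first place to look.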


3.5. Run Map-Reduce Jobs

I. Run word count example


bin/hdfs dfs -mkdir /inputwords
bin/hdfs dfs -put <data-file> /inputwords
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0-cdh5.3.2.jar wordcount /inputwords /outputwords
bin/hdfs dfs -cat /outputwords/*
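
For a small input file, you can sanity-check the job's result against a local count. The sketch below is not the Hadoop implementation: it is a plain shell pipeline that splits on whitespace and prints word, tab, count, sorted by word. That should resemble the job's output when a single reducer is used. local_wordcount is a hypothetical helper name.

```shell
# Hypothetical helper: count whitespace-separated words on stdin and print
# "word<TAB>count" lines sorted by word, for comparison with the output of
# the wordcount example on small inputs.
local_wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Typical use:
#   local_wordcount < data.txt
```

Note that the job fails if the output directory already exists, so remove it with bin/hdfs dfs -rm -r /outputwords before running the example a second time.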




3.6. Stop the Cluster

I. Stop HDFS Services

sbin/stop-dfs.sh

II. Stop YARN Services

sbin/stop-yarn.sh
