This guide walks you through installing a single-node Apache Hadoop cluster on your machine.

System Requirements

  • Ubuntu 16.04
  • Java 8 installed (the paths in this guide assume Oracle Java 8 at /usr/lib/jvm/java-8-oracle)

1. Download Hadoop

wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.0/hadoop-2.7.0.tar.gz
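
Before unpacking, it can help to confirm that the download completed. The sketch below assumes the archive is in the current directory. check_tarball is a hypothetical helper name, and it only tests that the file is a readable gzip-compressed tar archive, not that it matches the checksums Apache publishes.

```shell
# check_tarball FILE (hypothetical helper)
# Succeeds only if FILE exists, is non-empty, and can be listed by tar.
check_tarball() {
  [ -s "$1" ] || { echo "missing or empty: $1" >&2; return 1; }
  tar tzf "$1" > /dev/null 2>&1 || { echo "corrupt archive: $1" >&2; return 1; }
  echo "ok: $1"
}

# Example call, assuming the file from step 1 is in the current directory:
# check_tarball hadoop-2.7.0.tar.gz
```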

2. Prepare for Installation

tar xfz hadoop-2.7.0.tar.gz
sudo mv hadoop-2.7.0 /usr/local/hadoop

3. Create Dedicated Group and User

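sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo adduser hduser sudo
# Give the new user ownership of the Hadoop directory from step 2 so it can write logs there
sudo chown -R hduser:hadoop /usr/local/hadoop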

4. Switch to Newly Created User Account

su - hduser

5. Add Variables to ~/.bashrc

#Begin Hadoop Variables
 
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
 
#End Hadoop Variables
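
Repeating this step can leave duplicate export lines in ~/.bashrc. Here is a minimal sketch of one way to guard against that, assuming the block above has been saved to a separate file. append_once and the file name hadoop-vars.sh are hypothetical, and the helper relies on the "#Begin Hadoop Variables" marker line shown above.

```shell
# append_once BLOCK_FILE TARGET (hypothetical helper)
# Appends BLOCK_FILE to TARGET unless TARGET already contains the marker line.
append_once() {
  if grep -qxF '#Begin Hadoop Variables' "$2" 2>/dev/null; then
    echo "already present in $2"
  else
    cat "$1" >> "$2"
    echo "appended to $2"
  fi
}

# Example call (hadoop-vars.sh is an assumed file holding the block above):
# append_once hadoop-vars.sh ~/.bashrc
```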

6. Source ~/.bashrc

source ~/.bashrc
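
After sourcing the file, a quick check that the variables reached the current shell can save debugging later. This is a sketch only; check_vars is a hypothetical helper, not part of Hadoop.

```shell
# check_vars NAME... (hypothetical helper)
# Reports every listed variable that is unset or empty in the current shell.
check_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "all set"
  return "$missing"
}

# Example call with some of the variables from step 5:
# check_vars JAVA_HOME HADOOP_HOME HADOOP_INSTALL YARN_HOME
```

If PATH was picked up as well, running hadoop version should print the installed release.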

7. Set Java Home for Hadoop

  • Open the file: /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  • Find the line that sets JAVA_HOME and change it to:
    export JAVA_HOME=/usr/lib/jvm/java-8-oracle
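
The same edit can be scripted instead of made by hand. The sketch below assumes hadoop-env.sh has an "export JAVA_HOME=..." line starting in the first column and that the JDK path contains no "|" or "&" characters. set_java_home is a hypothetical helper name.

```shell
# set_java_home FILE JDK_PATH (hypothetical helper)
# Rewrites every line of FILE that starts with "export JAVA_HOME=".
set_java_home() {
  tmp_file=$(mktemp) || return 1
  sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$2|" "$1" > "$tmp_file" &&
    cat "$tmp_file" > "$1"   # overwrite in place so file permissions are kept
  status=$?
  rm -f "$tmp_file"
  return "$status"
}

# Example call with the paths used in this guide:
# set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-8-oracle
```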

8. Edit core-site.xml

  • Open the file: /usr/local/hadoop/etc/hadoop/core-site.xml
  • Add the following lines between the <configuration> … </configuration> tags.

<property>
 <name>fs.default.name</name>
 <value>hdfs://localhost:9000</value>
</property>
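
To confirm that the value was saved as intended, the property can be read back from the file. The sketch below is not a general XML parser: it assumes each tag sits on its own line, as in the snippet above. get_property is a hypothetical helper name.

```shell
# get_property FILE NAME (hypothetical helper)
# Prints the <value> that follows <name>NAME</name> in FILE.
get_property() {
  awk -v want="$2" '
    index($0, "<name>" want "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Example call for the property added in this step:
# get_property /usr/local/hadoop/etc/hadoop/core-site.xml fs.default.name
```

Once the PATH from step 5 is active, hdfs getconf -confKey fs.default.name should report the same value as Hadoop itself resolves it.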

9. Edit yarn-site.xml

  • Open the file: /usr/local/hadoop/etc/hadoop/yarn-site.xml
  • Add the following lines between the <configuration> … </configuration> tags.

<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>
<property>
 <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
 <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

10. Edit mapred-site.xml

  • Copy the mapred-site.xml template first using:
    cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
  • Open the file: /usr/local/hadoop/etc/hadoop/mapred-site.xml
  • Add the following lines between the <configuration> … </configuration> tags.

<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
</property>

11. Edit hdfs-site.xml

First, create the following directories:

sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown hduser:hadoop -R /usr/local/hadoop_store
sudo chmod 777 -R /usr/local/hadoop_store

Now open /usr/local/hadoop/etc/hadoop/hdfs-site.xml and add the following lines between the <configuration> … </configuration> tags.

<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>
<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
</property>
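
Hand-edited XML is easy to leave unbalanced, and Hadoop will refuse to start with a malformed configuration file. The sketch below is a rough sanity check rather than a real XML validator: it only compares how many lines contain an opening and a closing property tag. check_property_tags is a hypothetical helper name.

```shell
# check_property_tags FILE (hypothetical helper)
# Compares the number of lines containing <property> and </property>.
check_property_tags() {
  [ -r "$1" ] || { echo "cannot read $1"; return 1; }
  opens=$(grep -c '<property>' "$1" || true)
  closes=$(grep -c '</property>' "$1" || true)
  if [ "$opens" -eq "$closes" ]; then
    echo "balanced: $opens properties in $1"
  else
    echo "unbalanced: $opens <property> vs $closes </property> in $1"
    return 1
  fi
}

# Example: check the four files edited in steps 8 to 11
# for f in core-site yarn-site mapred-site hdfs-site; do
#   check_property_tags "/usr/local/hadoop/etc/hadoop/$f.xml"
# done
```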

12. Format NameNode

cd /usr/local/hadoop/
bin/hdfs namenode -format
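
Formatting is meant to be done once. Formatting a NameNode directory that is already in use discards the existing HDFS metadata. A formatted directory contains a current/VERSION file, so a script can test for it first. This is a sketch, and namenode_is_formatted is a hypothetical helper name.

```shell
# namenode_is_formatted DIR (hypothetical helper)
# Succeeds if DIR already holds a formatted NameNode (current/VERSION exists).
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Example, using the directory created in step 11:
# if namenode_is_formatted /usr/local/hadoop_store/hdfs/namenode; then
#   echo "already formatted, skipping"
# else
#   /usr/local/hadoop/bin/hdfs namenode -format
# fi
```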

13. Start Hadoop Daemons

cd /usr/local/hadoop/
sbin/start-dfs.sh
sbin/start-yarn.sh

14. Check Service Status

jps
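
On a working single-node setup, jps should list NameNode, DataNode and SecondaryNameNode (started by start-dfs.sh) and ResourceManager and NodeManager (started by start-yarn.sh), plus Jps itself. The sketch below reads the jps output and prints any of those five daemons that is absent. missing_daemons is a hypothetical helper name.

```shell
# missing_daemons (hypothetical helper)
# Reads `jps` output on stdin and prints each expected daemon that is absent.
missing_daemons() {
  # jps prints "<pid> <name>" per line; keep only the name column
  names=$(awk '{ print $2 }')
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$names" | grep -qxF "$daemon" || echo "$daemon"
  done
}

# Example call; no output means all five daemons are running:
# jps | missing_daemons
```

If a daemon is missing, its log file under /usr/local/hadoop/logs is usually the first place to look.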

15. Check Running Jobs

Open the YARN ResourceManager web UI in your browser:

http://localhost:8088

Done!
