How to Install Hadoop 2.7.1 on CentOS, LinuxMint and Ubuntu


Apache Hadoop 2.7.1 is a minor release in the 2.x.y line, building on 2.7.0, and the first stable release after Apache Hadoop 2.6.0. It drops support for the JDK6 runtime and works with JDK7+ only. On the HDFS side it adds support for files with variable-length blocks, file truncation, and quotas per storage type; on the MapReduce side it can speed up FileOutputCommitter for very large jobs with many output files.

In this post I describe the step-by-step process of installing Apache Hadoop 2.7.1 on CentOS, LinuxMint, and Ubuntu.

 

Step 1: Install Java

Hadoop 2.7.1 requires Java (JDK7 or later), so first check whether Java is already available on your system using the following command.

# java -version 

java version "1.8.0_66"

Java(TM) SE Runtime Environment (build 1.8.0_66-b17)

Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
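If the command reports that Java is missing, install a JDK first. The following is a minimal sketch using the stock OpenJDK packages from the distribution repositories; the exact package names vary by release, so treat them as an assumption and adjust if needed.

On Ubuntu and LinuxMint:

$ sudo apt-get update

$ sudo apt-get install openjdk-7-jdk

On CentOS:

# yum install java-1.7.0-openjdk-devel

Any JDK7 or newer will do; the sample output above just happens to show Oracle Java 8.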

 

 

Step 2: Create an Account for Hadoop

It is recommended to run Hadoop under a dedicated, unprivileged account, so first create one using the following commands. (On Ubuntu and LinuxMint, adduser prompts for a password itself; on CentOS, set it separately with passwd as shown.)

 

# adduser hadoop

# passwd hadoop

 

 

After creating the account, switch to it and set up key-based (passwordless) SSH to localhost, which Hadoop's start-up scripts use to launch the daemons.

 

# su - hadoop

$ ssh-keygen -t rsa

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

$ chmod 0600 ~/.ssh/authorized_keys

 

Now verify the key-based login. The command below should not ask for a password, although the first time it will prompt you to add the host's RSA key to the list of known hosts.

 

$ ssh localhost

$ exit
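If ssh localhost still prompts for a password, a common cause is overly permissive directory permissions; sshd usually insists that ~/.ssh is accessible only by its owner. A quick fix to try:

$ chmod 700 ~/.ssh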

 

Step 3: Download the Hadoop 2.7.1 Archive

Download the Hadoop 2.7.1 release archive using the commands below; this is the pre-built binary distribution, not the source tarball. You can also select an alternate download mirror for better download speed.

$ cd ~

$ wget http://apache.claz.org/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

$ tar xzf hadoop-2.7.1.tar.gz

$ mv hadoop-2.7.1 hadoop
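If the mirror above no longer carries 2.7.1, older releases remain available on the Apache archive server; a sketch assuming its standard layout:

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz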

 

Step 4: Configure Hadoop in Pseudo-Distributed Mode

 

Now set up the environment variables. Edit the ~/.bashrc file and append the following lines at the end of the file.

export HADOOP_HOME=/home/hadoop/hadoop

export HADOOP_INSTALL=$HADOOP_HOME

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Apply the changes to the current environment.

$ source ~/.bashrc

Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable. The Java path varies with your operating system version and installation source, so make sure you use the correct path for your system.

Set the following line, adjusting the path to match your installation:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
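If you are unsure where Java is installed, one way to locate it (assuming the java binary is on your PATH) is to resolve its symlinks and strip the trailing bin/java (or jre/bin/java) component; the path below is only an example and will differ on your machine:

$ readlink -f $(which java)

/usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java

Once JAVA_HOME is set, the hadoop command should work and report the release (the first line of its output):

$ hadoop version

Hadoop 2.7.1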

Step 5: Set Up the Hadoop Configuration Files

Hadoop has several configuration files, which need to be adjusted to the requirements of your infrastructure. Let us start with the configuration for a basic single-node cluster.

Navigate to the configuration directory:

$ cd $HADOOP_HOME/etc/hadoop

Now edit each file as shown below.

Edit core-site.xml

Add the following configuration. (fs.default.name is the old 1.x name, deprecated in favor of fs.defaultFS; Hadoop 2.7.1 accepts both.)

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Edit hdfs-site.xml

Add the following configuration. (Likewise, dfs.name.dir and dfs.data.dir are the deprecated names for dfs.namenode.name.dir and dfs.datanode.data.dir.)

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
  </property>
</configuration>

Edit mapred-site.xml

This file does not exist out of the box in Hadoop 2.7.1; only a template ships with the distribution, so create it first as shown below.
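Assuming the stock 2.7.1 layout, the template sits in the same configuration directory:

$ cp mapred-site.xml.template mapred-site.xml

Then type the following code: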

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Edit yarn-site.xml

Add the following configuration, which enables the shuffle auxiliary service that MapReduce jobs need when running on YARN:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

 

Step 6: Format the Namenode

Now format the namenode (as the hadoop user) using the following command.

$ hdfs namenode -format
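On success, the output should include a line similar to the following, with the path matching the dfs.name.dir value configured above:

INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.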

Step 7: Start the Hadoop Cluster


After formatting the namenode, start the Hadoop cluster.

Move to the Hadoop sbin directory and execute the scripts one by one.

$ cd $HADOOP_HOME/sbin/

And now run the start-dfs.sh script.

$ ./start-dfs.sh

After the DFS daemons are up, run the start-yarn.sh script:

$ ./start-yarn.sh
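To check that everything came up, you can list the running Java processes with jps (shipped with the JDK). On a healthy single-node setup you should see all five Hadoop daemons; the process IDs below are illustrative and will differ on your machine:

$ jps

2386 NameNode
2520 DataNode
2730 SecondaryNameNode
2914 ResourceManager
3031 NodeManager
3342 Jps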

Step 8: Access the Hadoop Services

Access the NameNode web interface on its default port, 50070, in a web browser.

For example: http://svr1.ubuntumag.com:50070/

Next, access port 8088 to get information about the cluster and all applications (this is the ResourceManager web interface).

For example: http://svr1.ubuntumag.com:8088/

Access port 50090 for details about the secondary namenode.

For example: http://svr1.ubuntumag.com:50090/

And finally, access port 50075 for details about the datanode.

For example: http://svr1.ubuntumag.com:50075/

Step 9: Test the Hadoop Single-Node Setup

Create the required HDFS directories using the following commands. (Because Step 4 added $HADOOP_HOME/bin to the PATH, hdfs can be run from any directory.)

$ hdfs dfs -mkdir /user

$ hdfs dfs -mkdir /user/hadoop

Now copy some sample files from the local file system into the Hadoop distributed file system using the command below. This example uses the Apache web-server logs, which live in /var/log/apache2 on Ubuntu and LinuxMint; on CentOS the equivalent directory is /var/log/httpd.

$ hdfs dfs -put /var/log/apache2 logs
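You can confirm the copy by listing the directory in HDFS:

$ hdfs dfs -ls logs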


Now browse the Hadoop distributed file system by opening the URL below in a browser. You will see the logs directory listed under /user/hadoop; click the folder name to open it and you will find all the copied log files there.

For Example:

 http://svr1.ubuntumag.com:50070/explorer.html#/user/hadoop/logs/

Now copy the logs directory from the Hadoop distributed file system back to the local file system:

$ hdfs dfs -get logs /tmp/logs

$ ls -l /tmp/logs/

And with that, Hadoop 2.7.1 is installed and running on your system.

 

 

 

 

 
