Working with Hadoop 2.7.3

Assalamualaykum Wr Br..:)

Today we discuss some basic commands in Hadoop 2.7.3.

  1. To know the version of Hadoop installed on your system, use the command $ hadoop version
  2. Start Hadoop using the command $ start-all.sh
  3. Check the status of the Java Virtual Machine processes with the command $ jps
  4. Create a text file in the home directory with the command gedit aejaaz.txt, write some data in it, then save and close it.
  5. Now move this file into HDFS using the command $ hdfs dfs -put aejaaz.txt /input
  6. The above image shows the result of all the commands stated above.
  7. Find the count of all the words in the file using Hadoop's predefined WordCount MapReduce program. Note: type hadoop first, then jar, then the path of the WordCount examples jar, as sketched after this list.
  8. Now let's start exploring it in a web browser from the URL http://localhost:50070 (the NameNode web UI).
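A minimal sketch of the WordCount invocation from step 7, assuming the standard Hadoop 2.7.3 layout under $HADOOP_HOME and that the /output directory does not exist yet (adjust both paths to your setup):

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
$ hdfs dfs -cat /output/part-r-00000

That's it for today..!!!  Jazakallah khair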

Installing Hadoop Accumulo

Assalamualaykum..:)
Today we discuss installing Apache Accumulo on an Ubuntu machine.

Download the package from the following link:
$ wget http://www-eu.apache.org/dist/accumulo/1.8.0/accumulo-1.8.0-bin.tar.gz

Extract the package and add ACCUMULO_HOME to your bashrc file as shown:
$ tar zxf accumulo-1.8.0-bin.tar.gz
$ gedit ~/.bashrc

Add following lines in bashrc file:
#ACCUMULO_HOME
export ACCUMULO_HOME=<Installation location>
export PATH=$PATH:$ACCUMULO_HOME/bin
#ACCUMULO_HOME_END

Verify the bashrc file by reloading it with: $ source ~/.bashrc
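To sanity-check the setup, you can echo the variable and ask Accumulo for its version; a quick sketch (the version command assumes the default configuration under $ACCUMULO_HOME/conf is in place):

$ echo $ACCUMULO_HOME
$ accumulo version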

That's it for today..:)

Jazakallah khair

Installing Hadoop HBase On Ubuntu

Assalamualaykum Wr Br..:)

Today we discuss the installation steps of HBase on the Ubuntu OS.

Download the HBase software from the following link using the command:
$ wget http://www-eu.apache.org/dist/hbase/0.98.23/hbase-0.98.23-hadoop1-bin.tar.gz

Extract the Package using the command:
$ tar xvf hbase-0.98.23-hadoop1-bin.tar.gz

Edit the bashrc file by adding the HBASE_HOME parameters in it as shown below:
$ gedit ~/.bashrc

#HBASE_HOME
export HBASE_HOME=<LOCATION OF INSTALLATION>
export PATH=$PATH:$HBASE_HOME/bin
#HBASE_HOME_END

Verify the bashrc file using the command:
$ source ~/.bashrc
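To confirm that the variables took effect, you can echo HBASE_HOME and print the HBase version (a quick sketch):

$ echo $HBASE_HOME
$ hbase version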

That's it for today..:)

Happy Learning..Jazakallah khair

Installing Hadoop ZooKeeper

Assalamualaykum Wr Br..:)

Today we discuss the steps required to install ZooKeeper on Ubuntu.

  1. Open the terminal using Ctrl + Shift + T
  2. Download the ZooKeeper package using the wget command:
    $ wget http://www-eu.apache.org/dist/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz
  3. Extract the downloaded package using the command: $ tar -xvf zookeeper-3.4.8.tar.gz
  4. Open the bashrc file and update it with the ZooKeeper home location using the command $ gedit ~/.bashrc, then add the following lines:
  5. #zookeeper_home
    export ZOOKEEPER_HOME=/home/<system_name>/zookeeper-3.4.8

    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    #zookeeper_home end
  6. To verify the correctness of the bashrc file, reload it using the command:
    $ source ~/.bashrc
  7. Finally, close the terminal so that all the updates take effect, then open a new terminal and execute the following command:
    $  zkServer.sh start
    which displays the information as follows:
    Zookeeper JMX enabled by default..
    ……
    Starting Zookeeper….STARTED
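If zkServer.sh complains that it cannot find its configuration, note that ZooKeeper expects a conf/zoo.cfg file; copying the bundled sample is usually enough (a sketch, assuming the home set above):

$ cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg

Once the server is started, its status can be checked and a client session opened as below (the port assumes the default clientPort from the sample config); type quit inside the client shell to leave it:

$ zkServer.sh status
$ zkCli.sh -server localhost:2181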

That's it..Thanks for the day..:) Enjoy technology.

Jazakallah Khair.

Installing Hadoop on Ubuntu

Assalamualaykum Wr Br..:)

Today we discuss the installation of Hadoop on Ubuntu 14.04. The very first requirements for a Hadoop installation are the availability of Java and SSH on your OS.

1. To do a quick installation of Java, just type the commands below.

$ sudo apt-get update

$ sudo apt-get install default-jdk

To check that Java installed successfully on the machine, just type the command:

$ java -version

The above command shows the version of Java installed on the system.

2. Create and Setup SSH Certificates

Hadoop uses SSH (to access its nodes) which would normally require the user to enter a password. However, this requirement can be eliminated by creating and setting up SSH certificates using the following commands:

ssh-keygen -t rsa -P ''
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

After executing the first of these two commands, you might be asked for a filename. Just leave it blank and press the enter key to continue. The second command adds the newly created key to the list of authorized keys so that Hadoop can use SSH without prompting for a password.
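To confirm that passwordless SSH works, you can try connecting to localhost (a quick check; the very first connection may still ask you to confirm the host fingerprint, after which it should log in without a password, and exit returns you to your shell):

ssh localhost
exit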


3. Fetch and Install Hadoop

First let’s fetch Hadoop from one of the mirrors using the following command:

wget http://www.motorlogy.com/apache/hadoop/common/current/hadoop-2.3.0.tar.gz

After downloading the Hadoop package, execute the following command to extract it:

tar xfz hadoop-2.3.0.tar.gz

This command will extract all the files in this package in a directory named hadoop-2.3.0.
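Note that the configuration files edited in the next step assume Hadoop lives in /usr/local/hadoop, so you will most likely want to move the extracted directory there and make it owned by your user; a sketch (adjust paths if you prefer a different location):

sudo mv hadoop-2.3.0 /usr/local/hadoop
sudo chown -R $USER:$USER /usr/local/hadoop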

4. Edit and Setup Configuration Files

To complete the setup of Hadoop, the following files will have to be modified:

  • ~/.bashrc
  • /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  • /usr/local/hadoop/etc/hadoop/core-site.xml
  • /usr/local/hadoop/etc/hadoop/yarn-site.xml
  • /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
  • /usr/local/hadoop/etc/hadoop/hdfs-site.xml

i. Editing ~/.bashrc

Before editing the .bashrc file in your home directory, we need to find the path where Java has been installed to set the JAVA_HOME environment variable. Let’s use the following command to do that:

update-alternatives --config java

This will display something like the following:


The value for JAVA_HOME is everything before /jre/bin/java in the above path – in this case, /usr/lib/jvm/java-7-openjdk-amd64. Make a note of this as we’ll be using this value in this step and in one other step.
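If the listing is hard to read, an alternative sketch is to resolve the symlink behind the java binary; everything before /jre/bin/java in its output is the JAVA_HOME value:

readlink -f /usr/bin/java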

Now use nano (or your favored editor) to edit ~/.bashrc using the following command:

nano ~/.bashrc

This will open the .bashrc file in a text editor. Go to the end of the file and paste/type the following content in it:

#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

Note 1: If the value of JAVA_HOME is different on your VPS, make sure to alter the first export statement in the above content accordingly.

Note 2: Files opened and edited using nano can be saved using Ctrl + X. Upon the prompt to save changes, type Y. If you are asked for a filename, just press the enter key.

The end of the .bashrc file should look something like this:

.bashrc contents

After saving and closing the .bashrc file, execute the following command so that your system recognizes the newly created environment variables:

source ~/.bashrc

Putting the above content in the .bashrc file ensures that these variables are always available when your VPS starts up.

ii. Editing /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Open the /usr/local/hadoop/etc/hadoop/hadoop-env.sh file with nano using the following command:

nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

In this file, locate the line that exports the JAVA_HOME variable. Change this line to the following:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Note: If the value of JAVA_HOME is different on your VPS, make sure to alter this line accordingly.

The hadoop-env.sh file should look something like this:

hadoop-env.sh contents

Save and close this file. Adding the above statement in the hadoop-env.sh file ensures that the value of JAVA_HOME variable will be available to Hadoop whenever it is started up.

iii. Editing /usr/local/hadoop/etc/hadoop/core-site.xml

The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up. This file can be used to override the default settings that Hadoop starts with.

Open this file with nano using the following command:

nano /usr/local/hadoop/etc/hadoop/core-site.xml

In this file, enter the following content in between the <configuration></configuration> tag:

<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>

The core-site.xml file should look something like this:

core-site.xml contents

Save and close this file.

iv. Editing /usr/local/hadoop/etc/hadoop/yarn-site.xml

The /usr/local/hadoop/etc/hadoop/yarn-site.xml file contains configuration properties that YARN uses when starting up. This file can be used to override the default settings that YARN starts with.

Open this file with nano using the following command:

nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

In this file, enter the following content in between the <configuration></configuration> tag:

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

The yarn-site.xml file should look something like this:

yarn-site.xml contents

Save and close this file.

v. Creating and Editing /usr/local/hadoop/etc/hadoop/mapred-site.xml

By default, the /usr/local/hadoop/etc/hadoop/ folder contains the /usr/local/hadoop/etc/hadoop/mapred-site.xml.template file which has to be renamed/copied with the name mapred-site.xml. This file is used to specify which framework is being used for MapReduce.

This can be done using the following command:

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

Once this is done, open the newly created file with nano using the following command:

nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

In this file, enter the following content in between the <configuration></configuration> tag:

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

The mapred-site.xml file should look something like this:

mapred-site.xml contents

Save and close this file.

vi. Editing /usr/local/hadoop/etc/hadoop/hdfs-site.xml

The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file has to be configured for each host in the cluster. It is used to specify the directories which will be used by the namenode and the datanode on that host.

Before editing this file, we need to create two directories which will contain the namenode and the datanode for this Hadoop installation. This can be done using the following commands:

mkdir -p /usr/local/hadoop_store/hdfs/namenode
mkdir -p /usr/local/hadoop_store/hdfs/datanode

Note: You can create these directories in different locations, but make sure to modify the contents of hdfs-site.xml accordingly.
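Also note that creating directories under /usr/local usually requires root privileges, and the directories must be writable by the user that runs Hadoop; a sketch, assuming you run Hadoop as the current user:

sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown -R $USER:$USER /usr/local/hadoop_store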

Once this is done, open the /usr/local/hadoop/etc/hadoop/hdfs-site.xml file with nano using the following command:

nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

In this file, enter the following content in between the <configuration></configuration> tag:

<property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
 </property>

The hdfs-site.xml file should look something like this:

hdfs-site.xml contents

Save and close this file.

Format the New Hadoop Filesystem

After completing all the configuration outlined in the above steps, the Hadoop filesystem needs to be formatted so that it can start being used. This is done by executing the following command:

$ hadoop namenode -format

Note: This only needs to be done once before you start using Hadoop. If this command is executed again after Hadoop has been used, it’ll destroy all the data on the Hadoop file system.
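On newer Hadoop releases the hadoop namenode script is marked as deprecated in favour of the hdfs command; the equivalent invocation would be:

$ hdfs namenode -format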

 Start Hadoop

All that remains to be done is starting the newly installed single node cluster:

start-dfs.sh

While executing this command, you’ll be prompted twice with a message similar to the following:

Are you sure you want to continue connecting (yes/no)?

Type in yes for both these prompts and press the enter key. Once this is done, execute the following command:

start-yarn.sh

Executing the above two commands will get Hadoop up and running. You can verify this by typing in the following command:

jps

Executing this command should show you something similar to the following:

jps command

If you can see a result similar to the one depicted in the screenshot above, it means that you now have a functional instance of Hadoop running on your VPS.
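For reference, the jps listing on a single-node setup typically contains entries like the following (process IDs omitted; they will differ on your machine, and the exact list depends on which daemons you started):

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps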

 

Alhamdulillah…That's it..Enjoy Hadoop…:)


Getting Started PIG 1

Assalamualaykum wr br..:)

In this post we discuss the basics of Hadoop Pig. Pig is a platform used to analyze data in Hadoop; its language is known as Pig Latin. It is a high-level data processing language with rich data types and operators to perform various operations on data in Hadoop.

To analyze data in Hadoop we write Pig scripts, which are executed in the Grunt shell. Internally, Apache Pig converts these scripts into a series of MapReduce jobs, which makes the programmer's job easy. The architecture of Pig can be illustrated as below:

Apache Pig architecture

As we see, there are various components involved in Apache Pig. Let us briefly describe them.

 

Parser

The parser checks the syntax and semantics of the script, and is also involved in type checking and other miscellaneous checks. The output of the parser will be a DAG (directed acyclic graph), which represents the Pig Latin statements and logical operators.

In the DAG, the logical operators of the script are represented as the nodes and the data flows are represented as edges.

Optimizer

The logical plan (DAG) is passed to the logical optimizer, which carries out the logical optimizations such as projection and pushdown.

Compiler

The compiler compiles the optimized logical plan into a series of MapReduce jobs.

Execution engine

Finally, the MapReduce jobs are submitted to Hadoop in sorted order, where they are executed to produce the desired results.

Pig Latin Data Model

The data model of Pig Latin is fully nested and it allows complex non-atomic datatypes such as map and tuple. Given below is the diagrammatical representation of Pig Latin’s data model.

Data Model

Atom

Any single value in Pig Latin, irrespective of its data type, is known as an atom. It is stored as a string and can be used as a string or a number. int, long, float, double, chararray, and bytearray are the atomic types of Pig. A piece of data or a simple atomic value is known as a field.

Example − ‘Aejaaz’ or ‘27’

Tuple

A record that is formed by an ordered set of fields is known as a tuple; the fields can be of any type. A tuple is similar to a row in a table of an RDBMS.

Example − (Aejaaz,27)

Bag

A bag is an unordered set of tuples. In other words, a collection of tuples (non-unique) is known as a bag. Each tuple can have any number of fields (flexible schema). A bag is represented by ‘{}’. It is similar to a table in RDBMS, but unlike a table in RDBMS, it is not necessary that every tuple contain the same number of fields or that the fields in the same position (column) have the same type.

Example − {(Aejaaz,27), (Mohammad, 45)}

A bag can be a field in a relation; in that context, it is known as inner bag.

Example − {Aejaaz,27, {008022008, aaejaaz@gmail.com,}}

Map

A map (or data map) is a set of key-value pairs. The key needs to be of type chararray and should be unique. The value might be of any type. It is represented by ‘[]’

Example − [name#aejaaz, age#27]

Relation

A relation is a bag of tuples. The relations in Pig Latin are unordered (there is no guarantee that tuples are processed in any particular order).
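To make these ideas concrete, here is a small sketch of a Grunt shell session; it assumes a hypothetical comma-separated file students.txt containing the two records shown (the file name and fields are illustrative):

grunt> students = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);
grunt> DUMP students;
(Aejaaz,27)
(Mohammad,45)

Here students is a relation (a bag of tuples), each output line is a tuple, and name and age are its fields.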

That's all for the day..

Jazakallah khair..:)  Alhamdulillah.

Installing Hadoop PIG

Assalamualaykum Wr Br..:)

Today we discuss the installation of Pig in an Ubuntu environment.

  1. Download Pig from the following link, either from a web browser or from the terminal.
  2. To download from the terminal, use $ wget http://mirror2.shellbot.com/apache/pig/pig-0.15.0/pig-0.15.0-src.tar.gz
  3. After downloading, extract the files and place them in the Hadoop home environment.
  4. Open the bashrc file with the command gedit ~/.bashrc, then add the Pig home variables as shown below:

#pig_home

export PIG_HOME=/home/aejaaz/pig-0.16.0

export PATH=$PATH:$PIG_HOME/bin

#end

5. Reload the bashrc file to verify that the newly added variables do not conflict; to do this, just type the command source ~/.bashrc

6. Now type pig in your terminal; the Grunt shell will open, which confirms that Pig is installed properly on the system. To quit the Grunt shell, just type the command quit.
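Even without HDFS running, Pig can be tried out in local mode; a quick sketch of such a session (output omitted):

$ pig -x local
grunt> quit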

That's all for the day…

Jazakallah khair.