ISM 550 & 580 TASK 2

khairul30
ISM580 – Task 2

Instructions:

Task 2 shows how to create a file, enter some data, and then use WordCount, one of the example programs in the MapReduce module, to demonstrate the data analysis process with MapReduce. Task 2 includes the following steps:

1. Open a terminal and type the following commands:

$ mkdir yourname (this command will make a directory with your name)

$ cd yourname (this command will change into that directory)

$ ls -ltr (this command will list the files in the directory)

$ nano myfile (this command will open the nano text editor on a file named myfile)

2. Now type some text into the text file. The more text you enter, the more interesting the results will be. Try to add some sentences about big data: what you would like to learn about it, or how you would like to use it in your career.

3. When you have finished editing the text file, press CTRL + X, then press Y to save the changes to the file. Press Enter to confirm the file name.
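Steps 1 to 3 can also be approximated without an interactive editor. The following is a minimal sketch, not part of the assignment instructions: the function name make_sample_file is hypothetical, and the two sample sentences are illustrative stand-ins for your own text.

```shell
# Hypothetical helper: create the working directory and write a small
# text file without opening nano. Replace the sample sentences with
# your own text about big data.
# Usage: make_sample_file yourname myfile
make_sample_file() {
  dir="$1"    # directory to create, e.g. yourname
  file="$2"   # file to write inside it, e.g. myfile
  mkdir -p "$dir" || return 1
  cat > "$dir/$file" <<'EOF'
Big data is changing how companies make decisions.
I would like to learn how big data tools process large files.
EOF
  # Print "<lines> <words>" as a quick check that the file was written
  wc -l -w < "$dir/$file" | awk '{print $1, $2}'
}
```

Calling make_sample_file yourname myfile leaves yourname/myfile in the current directory, ready to be uploaded in step 4.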

4. Now type the following commands:

$ hadoop fs -ls (this command will list the contents of your home directory in the HDFS)

$ hadoop fs -mkdir input (this command will create a folder in the HDFS with the name input)

$ hadoop fs -ls (this command will confirm that the input folder now exists)

$ hadoop fs -put myfile input (this command will upload myfile to the input folder in the HDFS)

$ hadoop fs -ls input (this command will list the contents of the input folder)

$ ls -ltr /usr/lib/hadoop-mapreduce/ (this command will list the libraries in MapReduce; check whether hadoop-mapreduce-examples-2.0.0-cdh5.4.0.jar exists)

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar (run without a program name, this command will list the valid example programs contained in the Java archive)

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input/myfile output (this command will tell Hadoop to run the program called “wordcount” from the Java archive against the file myfile and write the results to an HDFS folder called output)
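To see what the wordcount job computes, the same counting can be approximated locally with standard Unix tools. This is a minimal sketch, not the Hadoop implementation: the function name local_wordcount is hypothetical, and it assumes, as the Hadoop example program does, that words are separated by whitespace only, so punctuation stays attached to a word and upper and lower case are counted separately.

```shell
# Hypothetical local approximation of the wordcount job.
# Prints one "word<TAB>count" line per distinct word.
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" |          # "map": put each word on its own line
    sed '/^$/d' |                          # drop the empty line left by leading whitespace
    LC_ALL=C sort |                        # "shuffle": bring identical words together
    uniq -c |                              # "reduce": count each run of identical words
    awk '{ printf "%s\t%s\n", $2, $1 }'    # reorder to word, tab, count
}
```

Running local_wordcount myfile in the directory from step 1 gives a rough preview of the job's result. Note that "data" and "data," are reported as two different words, because the comma is not treated as a delimiter.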

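5. Now type the following commands to inspect the results:

$ hadoop fs -ls output (this command will reveal the 3 files which the process has output)

$ hadoop fs -get output/part-r-00000 (this command will copy the part-r-00000 file from the HDFS to the current local directory)

$ cat part-r-00000 (this command will print the contents of the file on the screen)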

6. Observe the results in the file printed on the screen, and discuss what delimitation is.
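In the wordcount output, each line holds a word and its count separated by a tab character, which is the default key/value delimiter of Hadoop's text output. The following minimal sketch uses that delimiter to rank the words by count, assuming part-r-00000 has been copied to the local directory as in step 5; the function name top_words is hypothetical.

```shell
# Hypothetical helper: show the N most frequent words from a
# tab-delimited "word<TAB>count" file such as part-r-00000.
top_words() {
  file="$1"
  n="${2:-10}"                # default to the top 10 words
  tab="$(printf '\t')"        # the field delimiter used in the output file
  # Sort by field 2 (the count) numerically, largest first;
  # break ties alphabetically on field 1 (the word).
  LC_ALL=C sort -t "$tab" -k2,2nr -k1,1 "$file" | head -n "$n"
}
```

For example, top_words part-r-00000 5 prints the five most frequent words. Without the tab delimiter, sort could not tell where the word ends and the count begins.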

Task:

Write a report (4-6 pages) that includes:

· A cover page and table of contents following APA standards.

· A short research report on the two main components of the Hadoop platform: HDFS and MapReduce.

· Create a file and load data into it; document your understanding of the process and its purpose, along with supporting screenshots.

· Use MapReduce to run the WordCount program and generate the result; describe delimitation and define its purpose as it is used in the WordCount example, along with supporting screenshots.

Rubrics:

2.1. Short research report on the two main components of the Hadoop platform, HDFS and MapReduce, following APA standards and including a cover page.

Weight: 20%

2.2. Create a file and load data into it; document your understanding of the process and its purpose, along with supporting screenshots (follow the instructions). Weight: 20%

2.3. Use MapReduce to run the WordCount program and generate the result; describe delimitation and define its purpose as it is used in the WordCount example, along with supporting screenshots.

(Follow the instructions.) Weight: 20%

2.4. Clarity, writing mechanics, and formatting requirements (follow APA standards)

Weight: 10%