Map reduce


Chapter 4 – Homework - File Processing with Map Reduce

Use either Java or Python to perform the following practice exercises. The data files are organized by chapter, and all of them have been uploaded to Canvas for CS 480/591.

Each exercise involves the following tasks:

1. Modify existing or write new Map Reduce code.

2. If using Java, compile the program as a JAR file.

3. Upload the input file(s) into the Hadoop cluster.

4. Execute the program and display the results.

The deliverables for each task are:

1. Modified Java or Python code

2. Linux and HDFS commands to execute the project

3. Output results in a text file or as otherwise indicated in the exercise

For Exercises 4-1 through 4-3, download the NYSE dataset NYSE.csv using the link above. This is a comma-separated values (CSV) file. Open the file in a text editor such as Notepad and examine its format and structure before uploading it to the cluster.

Exercise 4-1

Write a Map Reduce program to find the maximum closing price (stock_price_close) for each stock.

Hint: A similar solution is provided in the chapter; however, you may need to slightly adjust the mapper to read the values from the input lines properly.
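The following is a minimal Python sketch of one possible approach, not the chapter's solution. The column positions (stock_symbol at index 1, stock_price_close at index 6) are assumptions; check them against the header of NYSE.csv. The function names are hypothetical, and run_local only simulates the map, sort, and reduce phases on one machine.

```python
import csv
from itertools import groupby
from operator import itemgetter

# Assumed column positions; verify against the header row of NYSE.csv.
SYMBOL_COL = 1  # stock_symbol
CLOSE_COL = 6   # stock_price_close


def map_max_close(lines):
    """Yield (symbol, close) for every row that parses cleanly."""
    for row in csv.reader(lines):
        try:
            yield row[SYMBOL_COL], float(row[CLOSE_COL])
        except (IndexError, ValueError):
            # The header row and malformed lines end up here and are skipped.
            continue


def reduce_max(pairs):
    """Yield (key, maximum value); pairs must arrive sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, max(value for _, value in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce without a cluster."""
    return list(reduce_max(sorted(map_max_close(lines))))
```

With Hadoop Streaming, the same two functions could be wrapped in separate mapper and reducer scripts that read from standard input and print tab-separated key and value pairs.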

Exercise 4-2

Modify the Map Reduce Java code to find the highest price for each stock. Consider only records that have a volume greater than 250,000.
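The exercise asks for Java, so the following Python sketch only illustrates the filtering logic. It assumes that "highest price" refers to a stock_price_high column at index 4 and that the volume is an integer at index 7; both positions are assumptions to verify against the file. The filter belongs in the mapper, so that low-volume records are never sent to the reducer.

```python
import csv
from itertools import groupby
from operator import itemgetter

# Assumed column positions; verify against the header row of NYSE.csv.
SYMBOL_COL, HIGH_COL, VOLUME_COL = 1, 4, 7
MIN_VOLUME = 250_000


def map_high_filtered(lines):
    """Yield (symbol, high) only for rows with volume above MIN_VOLUME."""
    for row in csv.reader(lines):
        try:
            symbol = row[SYMBOL_COL]
            high = float(row[HIGH_COL])
            volume = int(row[VOLUME_COL])
        except (IndexError, ValueError):
            continue  # header row or malformed line
        if volume > MIN_VOLUME:  # strictly greater, as the exercise states
            yield symbol, high


def reduce_max(pairs):
    """Yield (key, maximum value); pairs must arrive sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, max(value for _, value in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce without a cluster."""
    return list(reduce_max(sorted(map_high_filtered(lines))))
```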

Exercise 4-3

Modify the Map Reduce Java code to find:

1. Total volume traded for each stock for each month

2. Total volume traded for each stock

These two tasks should be included in a single Map Reduce program.

Hint: Map emits two (key, value) pairs for each record.
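The hint can be sketched as follows in Python (the exercise itself asks for Java). This is an illustration under assumptions: the date is taken to be at index 2 in YYYY-MM-DD form and the volume at index 7, and the key format "symbol,YYYY-MM" is a hypothetical choice. Because the two kinds of keys never collide, a single summing reducer can handle both totals.

```python
import csv
from collections import defaultdict

# Assumed column positions and date format (YYYY-MM-DD); verify in NYSE.csv.
SYMBOL_COL, DATE_COL, VOLUME_COL = 1, 2, 7


def map_volume(lines):
    """Emit two (key, volume) pairs per record, as the hint suggests."""
    for row in csv.reader(lines):
        try:
            symbol, date = row[SYMBOL_COL], row[DATE_COL]
            volume = int(row[VOLUME_COL])
        except (IndexError, ValueError):
            continue  # header row or malformed line
        yield f"{symbol},{date[:7]}", volume  # total per stock per month
        yield symbol, volume                  # total per stock


def reduce_sum(pairs):
    """Sum the values per key; a dict stands in for the shuffle grouping."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)
```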

For Exercises 4-4 and 4-5, download the U.S. Patent dataset named US_Patent.csv from the relevant link posted above on this assignment page. This is a comma-separated values (CSV) file. Open the file in a text editor such as Notepad and examine its format and structure before uploading it to the cluster.

Exercise 4-4

Write a Map Reduce program to find the total number of patents granted each year, counted separately for patents that originated in the United States and patents that originated abroad. Compile and execute the job in the Hadoop cluster.
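One way to read this task is as a count per (year, origin) pair. The Python sketch below follows that reading. The layout of US_Patent.csv is not given here, so the column positions (grant year at index 1, country code at index 4) and the country code "US" are assumptions that must be checked against the actual file.

```python
import csv
from collections import Counter

# Assumed layout; verify against the header row of US_Patent.csv.
YEAR_COL, COUNTRY_COL = 1, 4
US_CODE = "US"  # assumed code for patents originating in the United States


def map_origin(lines):
    """Yield ((year, origin), 1) for every row that parses cleanly."""
    for row in csv.reader(lines):
        try:
            year = int(row[YEAR_COL])
            country = row[COUNTRY_COL].strip()
        except (IndexError, ValueError):
            continue  # header row or malformed line
        origin = "US" if country == US_CODE else "Foreign"
        yield (year, origin), 1


def reduce_count(pairs):
    """Sum the counts per key; a Counter stands in for the shuffle grouping."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)
```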

Exercise 4-5

For the same task as in Exercise 4-4, add a combiner function that performs local aggregation before the reducer. Compile and execute the job in the Hadoop cluster.
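The effect of a combiner can be illustrated with a small local simulation in Python. The sketch assumes each map task has already produced ((year, origin), 1) pairs; the function names are hypothetical. Because addition is associative and commutative, summing partial counts per map task and then summing again in the reducer gives the same totals while sending fewer records through the shuffle.

```python
from collections import Counter


def combine(pairs):
    """Sum the counts emitted by a single map task before the shuffle."""
    partial = Counter()
    for key, count in pairs:
        partial[key] += count
    return list(partial.items())


def reduce_count(pairs):
    """Sum (possibly partial) counts per key across all map tasks."""
    total = Counter()
    for key, count in pairs:
        total[key] += count
    return dict(total)


def run_job(map_outputs, use_combiner):
    """Return (final counts, number of records sent through the shuffle).

    map_outputs holds one list of ((year, origin), 1) pairs per map task.
    """
    shuffled = []
    for task_output in map_outputs:
        shuffled.extend(combine(task_output) if use_combiner else task_output)
    return reduce_count(shuffled), len(shuffled)
```

In the Java API a combiner is registered on the job with setCombinerClass, and Hadoop Streaming accepts a -combiner option. For a pure sum, the reducer logic can usually serve as the combiner as well.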
