Lab#-01 Assignment


Big Data Hadoop Ecosystems

Lab #1 Setup and General Notes

Dr. Gasan Elkhodari

Installing Hadoop VM on your laptop (Windows users)

Hardware Requirements: 64-bit OS, Windows laptop with SSD, 50 GB of free disk space, and at least 8 GB of memory. The Hadoop/Linux sandbox requires at least 8 GB of memory to run correctly.

Windows 10

1. Download the VM Sandbox image (the executable file) from:

https://ucumberlands.box.com/s/kk6a8mcqvupq6durdnxy7dc85squoji9

2. Download VMware Workstation Player (free license for individual use) from: https://www.vmware.com/products/workstation-player/workstation-player-evaluation.html

3. VMware Workstation Player installation: Play the video below and follow the configuration instructions. Do not download the products mentioned in the video; focus only on the configuration steps for VMware Workstation Player.

https://www.youtube.com/watch?v=4XBXJpYPkUk

4. Start the VM

Installing Hadoop VM on your laptop (Mac users)

1. Download the VM Sandbox image (the executable file) from: https://ucumberlands.box.com/s/kk6a8mcqvupq6durdnxy7dc85squoji9

2. Download VirtualBox from: https://www.virtualbox.org/wiki/Downloads

3. Install VirtualBox. Play the video below and follow the configuration instructions. Do not download the products mentioned in the video; focus only on the configuration steps for VirtualBox. https://www.youtube.com/watch?v=BeCtjd86YXo

4. Start the VM

Lab #1 – General Note

This lab uses a virtual machine running the CentOS Linux distribution. This VM has CDH (Cloudera’s Distribution, including Apache Hadoop) installed in pseudo-distributed mode. Pseudo-distributed mode is a method of running Hadoop whereby all Hadoop daemons run on the same machine. It is, essentially, a cluster consisting of a single machine. It works just like a larger Hadoop cluster; the only difference (apart from speed, of course!) is that the block replication factor is set to 1, since there is only a single DataNode available.
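One way to confirm the replication setting described above is to ask the HDFS client for its configured value. The following is a minimal sketch, assuming the `hdfs` command is on the PATH inside the lab VM; the function name `show_replication` is hypothetical and not part of the lab materials.

```shell
#!/usr/bin/env bash
# Sketch: print the configured default block replication factor.
# Assumption: run inside the lab VM, where the hdfs CLI is installed.

show_replication() {
  if command -v hdfs >/dev/null 2>&1; then
    # getconf reads the value from the client-side Hadoop configuration
    hdfs getconf -confKey dfs.replication
  else
    echo "hdfs command not found (run this inside the lab VM)"
  fi
}

show_replication
```

If the VM is configured as described, the printed value should be 1.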

Lab#1 – HDFS Setup

Enable services and set up any data required for the course. You must run this script before starting the Lab.

$ $DEV1/scripts/training_setup_dev1.sh
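If the setup command fails with "No such file or directory", the `DEV1` variable may not be set in the current shell. The check below is a minimal sketch, assuming `DEV1` is normally defined by the VM's login environment; the function name `check_setup_script` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm the setup script is reachable before running it.
# Assumption: the lab VM's login environment defines DEV1.

check_setup_script() {
  if [ -z "${DEV1:-}" ]; then
    echo "DEV1 is not set"
    return 1
  fi
  setup_script="$DEV1/scripts/training_setup_dev1.sh"
  if [ ! -f "$setup_script" ]; then
    echo "missing: $setup_script"
    return 1
  fi
  echo "found: $setup_script"
}

# Report the result without aborting the shell session on failure
check_setup_script || true
```

When the check reports that the script was found, run the setup command exactly as shown in the lab.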


Lab#1 – Access HDFS with Command Line

• Assignment

1) Move the data folder “KB”, located under “/home/training/training_materials/data”, into the HDFS directory /loudacre.

Hints:

• Use the ‘hdfs dfs -mkdir’ command to create a new directory ‘/loudacre’ in the HDFS file system

• Use the ‘hdfs dfs -put’ command to copy the data from the local Linux file system into the HDFS file system

• Use ‘hdfs dfs -cat’ to view the data you just put into HDFS

• Output: View one of the files you just uploaded with ‘hdfs dfs -cat’, take a screenshot, and upload it to the designated assignment folder.
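The hinted steps can be sketched as a short script. This is only an illustration under stated assumptions: `DRY_RUN`, `run`, and `upload_kb` are hypothetical helpers, not part of the lab materials. With `DRY_RUN=1` (the default here) the commands are printed instead of executed, so the sequence can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch of the hinted HDFS steps.
# DRY_RUN, run() and upload_kb() are hypothetical helpers for illustration.

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upload_kb() {
  # Create the target directory in HDFS
  run hdfs dfs -mkdir /loudacre
  # Copy the local KB folder into it; the result is /loudacre/KB
  run hdfs dfs -put /home/training/training_materials/data/KB /loudacre/
  # List the uploaded files to pick one to view
  run hdfs dfs -ls /loudacre/KB
}

upload_kb
```

Set `DRY_RUN=0` inside the lab VM to execute the commands. To view a file, pass one of the paths shown in the listing to `hdfs dfs -cat`. Note that `-put` leaves the local copy in place.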

Example:
