
Big Data Hadoop Ecosystems

Lab #1 Setup and General Notes

Dr. Gasan Elkhodari

Lab #1 – General Note

This lab uses a virtual machine (VM) running the CentOS Linux distribution. The VM has CDH (Cloudera’s Distribution including Apache Hadoop) installed in pseudo-distributed mode. Pseudo-distributed mode is a way of running Hadoop in which all Hadoop daemons run on the same machine; it is, essentially, a cluster consisting of a single machine. It works just like a larger Hadoop cluster. The only difference (apart from speed, of course!) is that the block replication factor is set to 1, since there is only a single DataNode available.
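To confirm the single-machine setup described above, you can ask the HDFS client for its configured replication factor. The snippet below is a minimal sketch. It assumes the `hdfs` command is on the PATH of the VM. `check_pseudo_distributed` is a hypothetical helper name and is not part of the course materials. `hdfs getconf -confKey` prints the effective value of a configuration key.

```shell
#!/bin/sh
# Sketch: check that HDFS is configured for pseudo-distributed mode.
# Assumes the hdfs client is on the PATH (as on the course VM).
# check_pseudo_distributed is a hypothetical helper, not part of the course materials.
check_pseudo_distributed() {
  # getconf prints the effective value of a key from the client configuration
  replication=$(hdfs getconf -confKey dfs.replication) || return 1
  echo "dfs.replication = $replication"
  if [ "$replication" = "1" ]; then
    echo "OK: replication factor is 1 (single DataNode)"
  else
    echo "Unexpected replication factor: $replication" >&2
    return 1
  fi
}
```

After defining the function in a terminal on the VM, run `check_pseudo_distributed`. On a pseudo-distributed installation it is expected to report a replication factor of 1.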

Lab #1 – HDFS Setup

Run the following script to enable services and set up the data required for the course. You must run this script before starting the lab.

$ $DEV1/scripts/training_setup_dev1.sh


Lab #1 – Access HDFS with the Command Line

• Assignment

1) Copy the data folder “KB” from the local directory “/home/training/training_materials/data” into the HDFS directory “/loudacre”.

Hints:
• Use the ‘hdfs dfs -mkdir’ command to create a new directory ‘/loudacre’ in HDFS.
• Use the ‘hdfs dfs -put’ command to copy the data from the local Linux file system into HDFS. (‘-put’ copies the files and leaves the local originals in place.)
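The hints above can be combined into a short sequence of commands. The snippet below is a minimal sketch of one way to complete the assignment. It assumes the `hdfs` client is on the PATH and that the setup script has already been run. `load_kb_into_hdfs` is a hypothetical helper name and is not part of the course materials.

```shell
#!/bin/sh
# Sketch of one way to complete the assignment.
# Assumes the hdfs client is on the PATH and the setup script has been run.
# load_kb_into_hdfs is a hypothetical helper, not part of the course materials.
load_kb_into_hdfs() {
  # Defaults match the paths given in the assignment; both can be overridden.
  src=${1:-/home/training/training_materials/data/KB}
  dest=${2:-/loudacre}

  # -p creates parent directories and does not fail if the directory exists
  hdfs dfs -mkdir -p "$dest" || return 1

  # -put copies the local folder (and its contents) into HDFS;
  # the local files are left in place
  hdfs dfs -put "$src" "$dest/" || return 1

  # List the uploaded folder to confirm the copy succeeded
  hdfs dfs -ls "$dest/$(basename "$src")"
}
```

For example, after pasting the function into a terminal on the VM, run `load_kb_into_hdfs` with no arguments to use the paths from the assignment. If /loudacre/KB already exists, `-put` reports an error instead of overwriting it.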