Lab - Big Data
Big Data Hadoop Ecosystems
Lab #1 Setup and General Notes
Dr. Gasan Elkhodari
Lab #1 – General Note
This lab uses a virtual machine running the CentOS Linux distribution. The VM has CDH
(Cloudera’s Distribution, including Apache Hadoop) installed in pseudo-distributed mode.
Pseudo-distributed mode is a way of running Hadoop in which all Hadoop daemons run on the same
machine. It is, essentially, a cluster consisting of a single machine. It works just like a larger Hadoop
cluster; the only difference (apart from speed, of course!) is that the block replication factor is
set to 1, since there is only a single DataNode available.
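The single-machine setup described above can be checked from a terminal inside the VM. The following is a minimal sketch, not part of the course scripts: the function name check_cluster is hypothetical, and it assumes the hdfs client is on the PATH. It relies only on the standard `hdfs getconf` subcommand.

```shell
#!/bin/sh
# check_cluster is a hypothetical helper name, introduced for illustration only.
check_cluster() {
    if command -v hdfs >/dev/null 2>&1; then
        # Configured block replication factor; the note above says it is set to 1 on this VM.
        hdfs getconf -confKey dfs.replication
        # NameNode host(s); in pseudo-distributed mode this should be the local machine only.
        hdfs getconf -namenodes
    else
        echo "hdfs client not found; run this inside the lab VM"
    fi
}

check_cluster
```

If the first value is larger than 1, the configuration differs from what this lab assumes.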
Lab #1 – HDFS Setup
The following script enables services and sets up the data required for the course. You must run it before starting the lab.
$ $DEV1/scripts/training_setup_dev1.sh
Lab #1 – Access HDFS with Command Line
• Assignment
1) Copy the data folder “KB”, located under “/home/training/training_materials/data”, into the HDFS directory /loudacre
Hints:
• Use the ‘hdfs dfs -mkdir’ command to create the new directory ‘/loudacre’ in HDFS
• Use the ‘hdfs dfs -put’ command to copy the data from the local Linux file system into the HDFS file system
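One possible way to combine the two hinted commands is sketched below. The paths come from the assignment text. The run helper is hypothetical and only there so the script prints each command and runs it when the hdfs client is available. The -p flag on -mkdir is an addition so the script can be re-run without failing when /loudacre already exists.

```shell
#!/bin/sh
# Local source folder and HDFS target directory, as named in the assignment.
SRC="/home/training/training_materials/data/KB"
DEST="/loudacre"

# Hypothetical helper: print the command, then run it only if hdfs is on the PATH.
run() {
    echo "+ $*"
    if command -v hdfs >/dev/null 2>&1; then
        "$@"
    fi
}

# Create the target directory in HDFS (-p: no error if it already exists).
run hdfs dfs -mkdir -p "$DEST"
# Copy the local KB folder into HDFS; the local copy stays in place.
run hdfs dfs -put "$SRC" "$DEST/"
# List the uploaded files to confirm the copy.
run hdfs dfs -ls "$DEST/KB"
```

Note that -put does not overwrite: running it a second time reports that the files already exist.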