week6
Big Data – Hadoop Ecosystems Lab #5
Narender Reddy Kudumula
University of Cumberlands
Data Science & Big Data Analysis (ITS-836)
Prof. Dr. Gasan Elkhodari
10/05/2019
Big Data – Hadoop Ecosystems
Import the accounts table into HDFS:
1) Import the accounts table:
$ sqoop import \
--connect jdbc:mysql://localhost/loudacre \
--username training --password training \
--table accounts \
--target-dir /loudacre/accounts \
--null-non-string '\\N'
2) List the contents of the accounts directory:
$ hdfs dfs -ls /loudacre/accounts
3) Import incremental updates to accounts
As Loudacre adds new accounts to the MySQL accounts table, the account data in HDFS must be updated to match. You can use Sqoop to append these new records.
Run the add_new_accounts.py script to add the latest accounts to MySQL.
$ DEV1/exercises/sqoop/add_new_accounts.py
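Incrementally import and append the newly added accounts to the accounts
directory. Have Sqoop import only the rows whose acct_num is greater than the
last value already imported, that is, the largest account ID currently in HDFS.
Replace <largest_acct_num> with that value: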
$ sqoop import \
--connect jdbc:mysql://localhost/loudacre \
--username training --password training \
--incremental append \
--null-non-string '\\N' \
--table accounts \
--target-dir /loudacre/accounts \
--check-column acct_num \
--last-value <largest_acct_num>
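The <largest_acct_num> placeholder has to be filled in with the highest acct_num already imported. One possible way to find it is sketched below. The helper name max_first_field is hypothetical, and the sketch assumes that acct_num is the first column of each record and that the files use Sqoop's default comma delimiter; adjust the field number if the table layout differs.

```shell
# Hypothetical helper: print the largest numeric value found in the first
# comma-separated field of the lines read from standard input.
# Assumption: acct_num is the first column and fields are comma-delimited.
max_first_field() {
  awk -F',' 'NR == 1 || $1 + 0 > max { max = $1 + 0 } END { if (NR > 0) print max }'
}

# Possible usage against the imported files (requires a running HDFS):
#   hdfs dfs -cat /loudacre/accounts/part-m-* | max_first_field
```

The printed number can then be passed to --last-value in the incremental import.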
4) You should see three new files. Use Hadoop's cat command to view the entire contents of these files.
$ hdfs dfs -cat /loudacre/accounts/part-m-0000[456]
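The [456] at the end of the path is a bracket glob: it matches exactly one character, either 4, 5, or 6, so the command reads three part files at once. The following sketch illustrates the pattern on the local file system only. The temporary directory and empty files are stand-ins created for the demonstration, not the actual HDFS output, and the file numbering is assumed from the pattern above.

```shell
# Local demonstration of the bracket glob; the file names mimic Sqoop part files.
demo_dir=$(mktemp -d)
for i in 0 1 2 3 4 5 6; do
  touch "$demo_dir/part-m-0000$i"
done

# [456] matches a single character that is 4, 5, or 6.
matched=$(cd "$demo_dir" && ls part-m-0000[456])
echo "$matched"

rm -rf "$demo_dir"
```

Only the files ending in 4, 5, and 6 are listed; the four files assumed to come from the initial import (0 through 3) are skipped.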