Big Data R Program

profilePip
wk15-Assignment1.pdf

Week 15 Assignment - 100 points

Objective: Using SQL query in R data frame.

Given the Consumer_Complaints.csv, use sqldf( ) to query the database for the following

reports:

1) Query for all the states where “Movement Mortgage” are present – (columns are

Company & State)

2) Query for all the states where “Movement Mortgage” OR “Hewitt-Capital” are present

– (columns are Company & State). Sort the by Company.

3) Query for the Product of “Consumer Loan” and the various means of Submitting the

complain relating to Consumer Loan (Submission via). Count each mean of submission

(Submission via). (Columns are Product, ‘Submitted via’, ‘Count Submitted Via’

4) Query for company called “ACS Education Services” and its various Company

Response. Count each Company Response. Columns are Company, CompanyResponse

& ‘Count Company Response’.

5) Query for all Product and ‘Submitted via’ and count each of the ‘Submitted via’. Sort by

Product and ‘Submitted via’. Your columns are Product, Submittedvia, & ‘Count

Submitted via’

Example:

> sqldf("select company, state from myData where company =='360 Mortgage'") Company State 1 360 Mortgage MD 2 360 Mortgage CA 3 360 Mortgage FL 4 360 Mortgage AZ 5 360 Mortgage TX 6 360 Mortgage UT 7 360 Mortgage TX

We start by loading the appropriate libraries in R: sqldf package as follows:

> install.packages("sqldf")

> library(sqldf)

Next, we load the csv file and check the statistical properties of the csv

File as follow:

> setwd("C:/RData”) # your working directory

> myData <- read.csv("path to Consumer_complaint.csv") # loading the file

> str(myData) # check the properties of the file

. . . continue from here!

Important Note

You need to rename the column with space to remove the space. For example, you should

rename ‘Submitted via’ to SubmittedVia before you begin your query because R programming

does not like either space or dot in-between words.

Useful Links

https://www.youtube.com/watch?v=dn8OBC1L_Cw

http://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/sql.html

https://jasminedaly.com/tech-short-papers/sqldf_tutorial.html

Additional Note:

Your paper must have cover-page and reference-page.

Due on Sunday, August 18, 2019 at 11:59pm.

There will be penalty for late assignment.

No any screenshot is allowed in this assignment.