Big Data R Program
Week 15 Assignment - 100 points
Objective: Using SQL query in R data frame.
Given the Consumer_Complaints.csv, use sqldf( ) to query the database for the following
reports:
1) Query for all the states where “Movement Mortgage” are present – (columns are
Company & State)
2) Query for all the states where “Movement Mortgage” OR “Hewitt-Capital” are present
– (columns are Company & State). Sort the by Company.
3) Query for the Product of “Consumer Loan” and the various means of Submitting the
complain relating to Consumer Loan (Submission via). Count each mean of submission
(Submission via). (Columns are Product, ‘Submitted via’, ‘Count Submitted Via’
4) Query for company called “ACS Education Services” and its various Company
Response. Count each Company Response. Columns are Company, CompanyResponse
& ‘Count Company Response’.
5) Query for all Product and ‘Submitted via’ and count each of the ‘Submitted via’. Sort by
Product and ‘Submitted via’. Your columns are Product, Submittedvia, & ‘Count
Submitted via’
Example:
> sqldf("select company, state from myData where company =='360 Mortgage'") Company State 1 360 Mortgage MD 2 360 Mortgage CA 3 360 Mortgage FL 4 360 Mortgage AZ 5 360 Mortgage TX 6 360 Mortgage UT 7 360 Mortgage TX
We start by loading the appropriate libraries in R: sqldf package as follows:
> install.packages("sqldf")
> library(sqldf)
Next, we load the csv file and check the statistical properties of the csv
File as follow:
> setwd("C:/RData”) # your working directory
> myData <- read.csv("path to Consumer_complaint.csv") # loading the file
> str(myData) # check the properties of the file
. . . continue from here!
Important Note
You need to rename the column with space to remove the space. For example, you should
rename ‘Submitted via’ to SubmittedVia before you begin your query because R programming
does not like either space or dot in-between words.
Useful Links
https://www.youtube.com/watch?v=dn8OBC1L_Cw
http://dept.stat.lsa.umich.edu/~jerrick/courses/stat701/notes/sql.html
https://jasminedaly.com/tech-short-papers/sqldf_tutorial.html
Additional Note:
Your paper must have cover-page and reference-page.
Due on Sunday, August 18, 2019 at 11:59pm.
There will be penalty for late assignment.
No any screenshot is allowed in this assignment.