ANLY600
In this Module 2 Discussion, we will discuss how to use R to extract information from data by exploring, cleaning, and preprocessing it. The following is a checklist of frequent steps in data preparation; more precisely, these are typical steps in “cleansing” data. The steps include (at least):
| No. | Step | R functions |
|-----|------|-------------|
| 1 | Load and look at the dataset in R | |
| 2 | Identify missing values | |
| 3 | Identify outliers | |
| 4 | Check for overall plausibility and errors (e.g., typos) | |
| 5 | Identify highly correlated variables | |
| 6 | Identify variables with (nearly) no variance | |
| 7 | Identify variables with strange names or values | |
| 8 | Check variable classes (e.g., characters vs. factors) | |
| 9 | Remove/transform some variables (maybe your model does not like categorical variables) | |
| 10 | Rename some variables or values (especially useful when there are many of them) | |
| 11 | Check overall patterns (statistical/numerical summaries, graphical illustrations) | |
| 12 | Center/scale variables | |
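As a rough illustration of what some rows of the checklist can look like in practice, here is a minimal base R sketch for steps 1, 2, 8, and 11. It is only one possible set of functions, not the expected answer. The built-in `airquality` data set and the temporary CSV file are assumptions made so the example runs on its own; in practice you would point `read.csv()` at your own file.

```r
# Assumption: we write the built-in airquality data to a temporary CSV
# so that read.csv() has a file to load. Replace with your own path.
tmp <- tempfile(fileext = ".csv")
write.csv(airquality, tmp, row.names = FALSE)

# Step 1: load and look at the dataset
aq <- read.csv(tmp)
head(aq)       # first six rows
str(aq)        # variable names, classes, and a preview of values
# View(aq)     # spreadsheet-style viewer; interactive sessions only

# Step 2: identify missing values
missing_per_column <- colSums(is.na(aq))   # NA count per variable
complete_rows <- sum(complete.cases(aq))   # rows without any NA

# Step 8: check variable classes
classes <- sapply(aq, class)

# Step 11: numerical summary of every variable
summary(aq)
```

`colSums(is.na())` gives a per-variable count, while `complete.cases()` works row by row, so the two answer slightly different questions about missingness.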
In view of the above steps, please scan through the three examples (Examples 1, 2, and 3) in Chapter 2 of Data Mining and Business Analytics with R and Section 2.4 of Data Mining for Business Analytics: Concepts, Techniques, and Applications in R (both found in this week's Reading & Resources), then fill in the blanks in the table above with the R functions that can handle each step. For example, you may put read.csv() and View() in the first row, as they are ways to carry out that step. You may also consult open resources to find relevant R functions, and each blank can list multiple R functions.
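For the more numerical rows of the checklist (steps 3, 5, 6, and 12), a hedged base R sketch might look like the following. The built-in `mtcars` data set, the 0.9 correlation cutoff, and the 1e-8 variance threshold are all arbitrary choices made for illustration, and packages such as caret offer alternatives that are not shown here.

```r
num <- mtcars   # built-in data set; every column is numeric

# Step 3: flag outliers in one variable with the boxplot rule (1.5 * IQR)
hp_outliers <- boxplot.stats(num$hp)$out

# Step 5: list highly correlated pairs (cutoff of 0.9 is an assumption)
cor_mat <- cor(num)
high <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]]
)

# Step 6: find variables with (nearly) no variance
variances <- sapply(num, var)
low_var <- names(variances)[variances < 1e-8]   # threshold is an assumption

# Step 12: center each column to mean 0 and scale to standard deviation 1
scaled <- scale(num)
```

Using `upper.tri()` keeps each pair only once and drops the diagonal, where every variable correlates perfectly with itself.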
In your responses to other students, suggest changes to their answers that you think would strengthen their analysis, or ask clarifying questions about anything you found confusing.