ANLY600

Will32

In this Module 2 Discussion, we discuss how to use R to obtain information by exploring, cleaning, and preprocessing data. The following is a checklist of steps that come up frequently in data preparation and are typical of data “cleansing.” They include (at least):

No. | Steps | R functions
1 | Load and look at the dataset in R |
2 | Identify missing values |
3 | Identify outliers |
4 | Check for overall plausibility and errors (e.g., typos) |
5 | Identify highly correlated variables |
6 | Identify variables with (nearly) no variance |
7 | Identify variables with strange names or values |
8 | Check variable classes (e.g., characters vs. factors) |
9 | Remove/transform some variables (your model may not accept categorical variables) |
10 | Rename some variables or values (especially useful when there are many of them) |
11 | Check overall patterns (statistical/numerical summaries, graphical illustrations) |
12 | Center/scale variables |
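To make several of the checklist steps concrete, here is a minimal sketch in base R. It is one possible set of functions, not the answer key from the readings. The built-in `airquality` dataset and the new column name `SolarRadiation` are assumptions chosen only for illustration.

```r
# Illustrative only: airquality ships with R and is not one of the datasets
# used in the assigned readings.
df <- airquality

# Identify missing values: number of NAs in each column
na_counts <- colSums(is.na(df))

# Identify outliers: values beyond the boxplot whiskers (1.5 * IQR rule)
ozone_outliers <- boxplot.stats(df$Ozone)$out

# Identify highly correlated variables: pairwise correlations on complete rows
cor_matrix <- cor(df, use = "complete.obs")

# Identify variables with (nearly) no variance: per-column variance, NAs ignored
variances <- sapply(df, var, na.rm = TRUE)

# Check variable names and classes
var_names <- names(df)
var_classes <- sapply(df, class)

# Transform a variable: treat Month as a categorical factor
df$Month <- factor(df$Month)

# Rename a variable (SolarRadiation is a hypothetical new name)
names(df)[names(df) == "Solar.R"] <- "SolarRadiation"

# Check overall patterns: numerical summary of every column
overall <- summary(df)

# Center/scale selected numeric variables (mean 0, standard deviation 1)
scaled <- scale(df[, c("Ozone", "Wind", "Temp")])
```

Graphical checks such as `hist()`, `boxplot()`, or `pairs()` would complement the numerical summary in an interactive session.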

In view of the above steps, scan the three examples (Examples 1, 2, and 3) in Chapter 2 of Data Mining and Business Analytics with R and Section 2.4 of Data Mining for Business Analytics: Concepts, Techniques, and Applications in R (both found in this week's Reading & Resources), then fill in the blanks in the table above with the R functions that can handle each step. For example, you may put read.csv() and View() in the first row, since they load and display a dataset. You may also consult open resources to find relevant R functions, and each blank can hold multiple R functions.
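As a small illustration of the first row, the sketch below loads a CSV file and inspects it. The file is a hypothetical one written to a temporary path from the built-in `iris` data so that the example runs anywhere. The name `dat` is also just an assumption.

```r
# Hypothetical input: write a small CSV to a temporary file so the sketch
# is self-contained. In practice csv_path would point to your own data file.
csv_path <- tempfile(fileext = ".csv")
write.csv(head(iris, 10), csv_path, row.names = FALSE)

# Load the dataset
dat <- read.csv(csv_path)

# Look at the dataset
dim(dat)      # number of rows and columns
head(dat)     # first few rows
str(dat)      # column types and a preview of values
summary(dat)  # per-column numerical summary

# View() opens a spreadsheet-style viewer, so only call it interactively
if (interactive()) View(dat)
```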

In your responses to other students, suggest changes that you think would strengthen their analysis, or ask clarifying questions about anything that was confusing.

5 years ago
15.07.2021