Statistics Data Project Using RStudio

Data Project Guide & Rubric

This project will be graded out of 100 points as described in the attached rubric. Your grade on the data project will count as 10% of your final course grade.

This is not group work. If two individuals submit the same project write-up, both will receive a zero.

Learning Objective

Part 1:

Create graphical representations for data using statistical software. Interpret and present the graphics and measures of central tendency and variability for an approved data set in a business format.

Use the specific data set assigned to you based on the first initial of your last name.

1. Using complete English sentences, describe the data. Make sure you:

a. State the source of the data.

b. Create a list of each variable with a brief description of what the variable measures.

c. State specifically whether each variable is quantitative or qualitative.

Create, present, and interpret the following:

2. Create a bar chart, pie chart, or Pareto graph for a qualitative variable.

3. Create a frequency histogram for a quantitative variable.

4. Create a relative frequency histogram for a different quantitative variable.

5. Find and display the following summary statistics for a quantitative variable:

a. Mean

b. Median

c. Standard deviation

d. Variance

e. Range

f. Q1

g. Q3

h. Interquartile Range

i. n (sample size)

6. Make a boxplot for a quantitative variable.

7. Make a side-by-side boxplot for a different quantitative variable separated by a qualitative variable’s classes.

8. Make a scatter plot of two quantitative variables.

Each of the graphs you create should include a title describing the graph and correctly labeled axes. Each of the graphs should also be followed by at least one complete English sentence describing a feature of the variable that the graph illustrates. For example: “From the histogram we can see that the data is nearly symmetric and centered around $4500.”
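
As a rough illustration of the summary statistics requested in item 5, the sketch below computes all nine values with Python's standard library. The project itself is meant to be carried out in statistical software such as RStudio; Python is used here only to make the arithmetic explicit. The function name summarize and the sales figures are hypothetical and do not come from any assigned data set. Quartiles depend on the interpolation rule: method="inclusive" is assumed here because it should agree with the default of R's quantile() function, so other software or a textbook hand method may give slightly different values for Q1 and Q3.

```python
import statistics


def summarize(values):
    """Return the nine summary statistics from item 5 for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    # method="inclusive" is an assumption; see the note above the code.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),           # sample standard deviation (n - 1)
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "range": max(values) - min(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "n": len(values),
    }


# Hypothetical data, for illustration only.
sales = [4100, 4300, 4500, 4500, 4700, 4900, 5200]
summary = summarize(sales)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Returning the values in one dictionary keeps them together so they can be displayed as a single table, which is how the rubric asks for them to be presented.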

Part 2: (44 points)

Report statistical results from data analysis. Use simple hypothesis tests and confidence intervals to help make decisions in business, economics, and marketing. Explain statistical inference in its most basic form. Use one or more independent variables to predict values of another (the response, or dependent, variable).

a. For one continuous variable in your data set, set up a reasonable claim about the true mean and perform a hypothesis test to evaluate this claim.

b. Use one categorical variable to group a numerical variable and test the difference between means in at least two categories.

c. Select two numerical variables and suggest, with a reason, which one can be used to predict the other. Set up a linear regression model to the best of your knowledge.

d. Write a report for Part 2 of the project.
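
The calculations behind items a, b, and c can be sketched as follows, again in standard-library Python purely for illustration; the project itself is expected to use statistical software such as RStudio. All function names and data values are hypothetical. Two assumptions are worth stating: the two-group comparison uses Welch's unequal-variance t statistic (the guide does not say which two-sample test to use; Welch's is the default of R's t.test()), and the regression is an ordinary least-squares line with a single predictor. The sketch stops at the test statistic and degrees of freedom, because turning those into a p-value needs a t distribution table or statistical software.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t statistic and degrees of freedom for H0: the true mean equals mu0."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / standard_error, n - 1


def welch_t(group_a, group_b):
    """Welch t statistic and approximate df for H0: the two group means are equal."""
    va = statistics.variance(group_a) / len(group_a)
    vb = statistics.variance(group_b) / len(group_b)
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (
        va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1)
    )
    return t, df


def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data, for illustration only.
group_a = [10, 12, 14, 16, 18]
group_b = [20, 22, 24, 26, 28]

t_one, df_one = one_sample_t(group_a, mu0=12)   # item a: claim that the mean is 12
t_two, df_two = welch_t(group_a, group_b)       # item b: compare two categories
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # item c: predict y from x

print(t_one, df_one)
print(t_two, df_two)
print(slope, intercept)
```

In the report for item d, each statistic would be accompanied by the hypotheses, the significance level chosen, and a plain-language conclusion.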

STT 315 – Data Project Rubric – Spring 2019

Part 1:

1. Data description

a. Report states the source of the data set. (1)

b. Report includes a list of variables in the data set. (1)

c. Report includes a verbal description of each variable. (1)

d. Report correctly identifies the type of each variable (quantitative or qualitative). (2)

2. Bar Chart

a. Report includes a bar chart of a qualitative variable. (1)

b. Bar chart correctly represents the variable. (1)

c. Bar chart has a descriptive title. (1)

d. Bar chart has descriptive labels for each category. (1)

e. A short note (at least two sentences) below the graph correctly describes a feature of the variable illustrated by the bar chart. (3)

3. Frequency Histogram

a. Report includes a frequency histogram for a quantitative variable. (1)

b. Frequency histogram has a descriptive title. (1)

c. Frequency histogram has a correctly labeled x-axis. (1)

d. Frequency histogram has a correctly labeled y-axis. (1)

e. A short note (at least two sentences) below the graph correctly describes a feature of the variable illustrated by the frequency histogram. (3)

4. Relative Frequency Histogram

a. Report includes a relative frequency histogram for a different quantitative variable. (1)

b. Relative frequency histogram has a descriptive title. (1)

c. Relative frequency histogram has a correctly labeled x-axis. (1)

d. Relative frequency histogram has a correctly labeled y-axis. (1)

e. A short note (at least two sentences) below the graph correctly describes a feature of the variable illustrated by the relative frequency histogram. (3)

5. Table of Statistics (10 points)

a. Report contains a display of the summary statistics for one of the quantitative variables.

b. Report correctly states the following:

i. Mean

ii. Median

iii. Standard deviation

iv. Variance

v. Range

vi. First quartile (Q1)

vii. Third quartile (Q3)

viii. Interquartile Range (IQR)

ix. Sample size

6. Box Plot

a. Report includes a boxplot for a quantitative variable. (1)

b. Boxplot has a descriptive title. (1)

c. Boxplot has a correctly labeled x-axis (horizontal plot) or y-axis (vertical plot). (1)

d. A short note (at least two sentences) correctly describes a feature of the variable illustrated by the boxplot. (3)

7. Side-by-side Box Plot (7 points)

a. Report includes a side-by-side boxplot for a quantitative variable, separated by a qualitative variable’s categories. (1)

b. Side-by-side boxplot has a descriptive title. (1)

c. Side-by-side boxplot has a correctly labeled x-axis. (1)

d. Side-by-side boxplot has a correctly labeled y-axis. (1)

e. A short note (at least two sentences) correctly describes a feature of the variable illustrated by the side-by-side boxplot. (3)

8. Scatter Plot

a. Report includes a scatter plot of two quantitative variables. (1)

b. Scatter plot has a descriptive title. (1)

c. Scatter plot has a correctly labeled x-axis. (1)

d. Scatter plot has a correctly labeled y-axis. (1)

e. A short note (at least two sentences) correctly describes a feature of the variable illustrated by the scatter plot. (3)

Part 2:

11 points for each of a, b, c, and d.