Statistics Assignment #2 Descriptive Statistics Analysis and Writeup

University of Maryland University College

STAT200 - Assignment #1: Descriptive Statistics Data Analysis Plan

Identifying Information

Student: Mandy Francisco

Class: STAT200 6392

Instructor: Professor Roger Davis

Date: June 2, 2020

Scenario:

From the data set, I chose the record with UniqueID 36. I am 36 years old, married, and the head of the household, which has an annual income of $99,610. There are only two of us in the household, my spouse and me. We both hold bachelor’s degrees and are both employed, so the annual income is the combined income of both of us.

Table 1. Variables Selected for the Analysis

Variable Name in the Data Set | Description (see the data dictionary) | Type of Variable (Qualitative or Quantitative)
Variable 1: “Income” | Annual household income in USD | Quantitative
Variable 2: “Age Head Household” | Age of the head of household | Quantitative
Variable 3: “Family Size” | Number of people in the household | Quantitative
Variable 4: “Food Expenditures” | Total annual household expenditure on food | Quantitative
Variable 5: “Entertainment Expenditures” | Total annual household expenditure on entertainment | Quantitative

Reason(s) for Selecting the Variables and Expected Outcome(s):

1. Variable 1: “Income” - Having more than one employed person in the household increases the total annual income, which gives us more room in our budget. This variable shows the combined annual household income in USD.

2. Variable 2: “Age Head Household” - This variable shows the age of the head of household. I chose age because each age group tends to have its own spending patterns and habits; what counts as a “need” versus a “want” differs between age groups.

3. Variable 3: “Family Size” - This variable shows the number of people in the household. With two people in my household, I expect expenditures to be roughly double those of a one-person household.

4. Variable 4: “Food Expenditures” - This variable shows the household’s annual food expenditure. I expect it to be roughly double that of a one-person household.

5. Variable 5: “Entertainment Expenditures” - This variable shows the household’s annual entertainment expenditure. It should also be roughly double, since there are two of us.

Data Set Description:

Proposed Data Analysis:

Measures of Central Tendency and Dispersion

Complete Table 2. Numerical Summaries of the Selected Variables and briefly explain why you choose those measurements. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.

Table 2. Numerical Summaries of the Selected Variables

Variable Name

Measures of Central Tendency and Dispersion

Rationale for Why Appropriate

Variable 1:

“Income”

· Number of Observations

· Median

· Sample Standard Deviation

I am using median for two reasons:

1. If there are any outliers or the data is not normally distributed, the median is the best measure of central tendency.

2. The variable is quantitative.

I am using sample standard deviation for three reasons:

1. The data is a sample from a larger data set.

2. It is the most commonly used measure of dispersion.

3. The variable is quantitative.

Variable 2: “Age Head Household”

· Number of Observations

· Median

· Sample Standard Deviation

I am using the median for two reasons:

1. Age data can be skewed (asymmetric), and the median is less affected by skewness than the mean.

2. The variable is quantitative.

I am using the sample standard deviation for two reasons:

1. It shows the spread of the data.

2. It is the most commonly used measure of dispersion and is easy to interpret.

Variable 3: “Family Size”

· Number of Observations

· Mean

· Standard Deviation

I chose the mean because:

1. The variable is quantitative.

2. The range is small, with few outliers.

I chose the standard deviation because:

1. The variable is quantitative.

2. It is the most commonly used measure of the dispersion of data.

Variable 4: “Food Expenditures”

· Number of Observations

· Mean

· Standard Deviation

I chose the mean because:

1. The variable is quantitative.

2. The range is small, with few outliers.

3. The mean is a good choice when there are no extreme values.

I chose the standard deviation because:

1. The variable is quantitative.

2. It is the most commonly used measure of the dispersion of data.

Variable 5: “Entertainment Expenditures”

· Number of Observations

· Mean

· Standard Deviation

I am using the mean because:

1. The variable is quantitative.

2. The mean is the best measure of center when the data have few outliers and are approximately normally distributed.

I chose the standard deviation because:

1. It shows the spread of the data.

2. The variable is quantitative.
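The rationales in Table 2 rest on two points: the median resists outliers better than the mean, and the sample standard deviation divides by n - 1 because the data are a sample from a larger population. The Python sketch below illustrates both, assuming a small list of hypothetical incomes invented for illustration (they are not taken from the course data set). The helper names `summarize` and `sample_sd` are also hypothetical; the sketch uses only the standard `statistics` and `math` modules.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: squared deviations divided by n - 1."""
    mean = statistics.mean(values)
    squared = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared / (len(values) - 1))


def summarize(values):
    """Number of observations, mean, median, and sample SD for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sample_sd": statistics.stdev(values),  # stdlib sample SD (n - 1)
    }


# Hypothetical annual household incomes in USD (NOT from the course data set).
incomes = [42000, 51000, 58000, 63000, 70000]
# The same households plus one hypothetical very high earner.
with_outlier = incomes + [400000]

for label, data in (("without outlier", incomes), ("with outlier", with_outlier)):
    s = summarize(data)
    print(f"{label}: n={s['n']}, mean={s['mean']:.0f}, "
          f"median={s['median']:.0f}, sample SD={s['sample_sd']:.0f}")

# The hand-written n - 1 formula agrees with statistics.stdev.
print(math.isclose(sample_sd(incomes), statistics.stdev(incomes)))
```

With these hypothetical values, adding one very high income raises the mean from 56,800 to 114,000 but moves the median only from 58,000 to 60,500, which is why the median is the safer measure of center for a variable like income.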

Graphs and/or Tables

Table 3. Type of Graphs and/or Tables for Selected Variables

Variable Name

Graph and/or Table

Rationale for Why Appropriate

Variable 1:

“Income”

Graph: I will use a histogram to show the shape of the income distribution.

A histogram is one of the best plots for showing the shape of the distribution of quantitative data, including whether it is skewed.

Variable 2: “Age Head Household”

Graph: I will use a box plot to show the distribution of the head of household’s age.

A box plot shows the median, the quartiles, and any outliers, so it makes skewed data easy to see; this fits the choice of the median for age.

Variable 3: “Family Size”

Graph: I will use a pie chart to show the family sizes.

A pie chart shows the proportion of households at each family size, which works well because family size takes only a few whole-number values.

Variable 4: “Food Expenditures”

Graph: I will use a histogram to show the annual food expenditures.

A histogram is well suited to showing the distribution of quantitative data.

Variable 5: “Entertainment Expenditures”

Graph: I will use a histogram to show the annual entertainment expenditures.

A histogram is the best plot for showing the distribution of quantitative data.
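The graphs planned in Table 3 can also be sketched numerically. The Python example below, assuming a hypothetical list of head-of-household ages (not from the course data set), counts observations in equal-width bins, which is what a histogram plots, and computes the five-number summary and the common 1.5 × IQR outlier fences that a box plot uses. The function names are hypothetical, and quartile rules differ between software, so a spreadsheet or statistics package may report slightly different quartiles.

```python
import statistics
from collections import Counter


def histogram_counts(values, start, width):
    """Count observations in equal-width bins [lo, hi) beginning at `start`.

    Empty bins between the first and last occupied bin are kept, because a
    histogram shows gaps as bars of height zero.
    """
    counts = Counter((v - start) // width for v in values)
    first, last = min(counts), max(counts)
    return {
        (start + k * width, start + (k + 1) * width): counts.get(k, 0)
        for k in range(first, last + 1)
    }


def box_plot_summary(values):
    """Five-number summary plus Tukey's 1.5 x IQR outlier fences."""
    # Quartile conventions differ between software packages; this uses the
    # "inclusive" method of statistics.quantiles (Python 3.8+).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "fences": (low_fence, high_fence),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical ages of heads of household (NOT from the course data set).
ages = [24, 29, 31, 33, 35, 36, 38, 41, 44, 47, 52, 79]

# Text histogram: one '#' per household in each 10-year bin.
for (lo, hi), count in histogram_counts(ages, start=20, width=10).items():
    print(f"{lo}-{hi - 1}: {'#' * count}")

print(box_plot_summary(ages))
```

For these hypothetical ages, the 60-69 bin is empty and the age 79 lies above the upper fence of 63.125, so a box plot would draw it as a separate outlier point.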

STAT200: Assignment #1 - Descriptive Statistics Analysis Plan - Template
