stat200 assignment #2


University of Maryland University College

STAT200 - Assignment #1: Descriptive Statistics Data Analysis Plan

Identifying Information

Student (Full Name): Matthieu Hensell

Class: Stats200

Instructor: Dr Sanga

Date: 20201108

Scenario: Please write a few lines describing your scenario and the four variables (in addition to income) you have selected.

Scenario: A 32-year-old married man with a bachelor’s degree has an annual income of $60,000. He is a father of two (2), which makes his family size four (4). This family spends $6,000 annually on food (groceries, takeout, etc.) and $1,200 annually on transportation (gas, car maintenance, public transportation, etc.).

Use Table 1 to report the variables selected for this assignment. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.

Table 1. Variables Selected for the Analysis

Variable Name in the Data Set | Description (See the data dictionary for describing the variables.) | Type of Variable (Qualitative or Quantitative)
Variable 1: “Income” | Annual household income in USD. | Quantitative
Variable 2: SE-MaritalStatus | Marital Status of Head of Household | Qualitative
Variable 3: SE-FamilySize | Total number of people in family (both adults and children) | Quantitative
Variable 4: USD-Food | Total Amount of Annual Expenditure on Food | Quantitative
Variable 5: USD-Transport | Total Amount of Annual Expenditure on Transportation | Quantitative

Reason(s) for Selecting the Variables and Expected Outcome(s):

1. Variable 1: “Income” – This was a default variable, chosen for quantitative data points.

2. Variable 2: “SE Marital Status” - I chose this variable because it is common and easily represented, and it will illustrate an important descriptive dynamic in the data. I predict the outcome of this variable will be a wide dispersion of marital status (married or not married).

3. Variable 3: “SE Family Size” - I chose this variable because family size predicts spending on necessities in day-to-day life. Family size will likely heavily determine the other two spending variables. I predict this variable will have a moderately wide range and a mean family size of four.

4. Variable 4: “USD Food” - I chose this variable because food is an everyday essential, so spending on it should be fairly predictable. I predict the mean spending on food for a family of four will be $6,000 per year.

5. Variable 5: “USD Transportation” - I chose this variable because transportation is another essential spending factor, whether for work or leisure in a day-to-day schedule. I estimate the mean spending on transportation for a family of four will be around $960.00 per year. This variable also depends on whether the family has one or two cars and whether each car is owned outright or carries a lien, along with several other factors. Because this variable is less stable, my prediction may be far off.

Data Set Description:

Proposed Data Analysis:

Measures of Central Tendency and Dispersion

Complete Table 2. Numerical Summaries of the Selected Variables and briefly explain why you choose those measurements. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.

Table 2. Numerical Summaries of the Selected Variables

Variable Name

Measures of Central Tendency and Dispersion

Rationale for Why Appropriate

Variable 1:

“Income”

· Number of Observations

· Median

· Sample Standard Deviation

I am using median for two reasons:

1. If there are any outliers or the data is not normally distributed, the median is the best measure of central tendency.

2. The variable is quantitative.

I am using sample standard deviation for three reasons:

1. The data is a sample from a larger data set.

2. It is the most commonly used measure of dispersion.

3. The variable is quantitative.

Variable 2:

Marital Status

· Number of observations

· Mode

I am using mode because:

1. The variable is qualitative, so a mean or median cannot be calculated.

2. The mode serves as a descriptive feature of the data set.

Variable 3: Family Size

· Number of observations

· Mean

· Standard Deviation

1. I am using the mean because I think the average will be a good representation of the data, as I do not expect large outliers.

2. I am using standard deviation because the variable is quantitative, it is the most commonly used measure of dispersion, and the data is a sample from a larger data set.

Variable 4: Food

· Number of observations

· Standard Deviation

· Mean

1. I am using the mean because I think the average will be a good representation of the data, as I do not expect large outliers.

2. I am using standard deviation because the variable is quantitative, it is the most commonly used measure of dispersion, and the data is a sample from a larger data set.

Variable 5:

Transportation

· Number of observations

· Standard Deviation

· Mean

1. I am using the mean because I think the average will be a good representation of the data, as I do not expect large outliers.

2. I am using standard deviation because the variable is quantitative, it is the most commonly used measure of dispersion, and the data is a sample from a larger data set.
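
The summaries planned in Table 2 could be computed along these lines. This is only a minimal sketch using Python's standard `statistics` module; the sample values and the helper names `quantitative_summary` and `qualitative_summary` are hypothetical and are not taken from the course data set.

```python
import statistics

# Hypothetical sample rows: made-up values for illustration only,
# not taken from the course data set.
income = [60000, 45000, 98000, 52000, 71000]
marital_status = ["Married", "Not Married", "Married", "Married", "Not Married"]
family_size = [4, 2, 5, 3, 4]


def quantitative_summary(values, center="mean"):
    """Return the count, one measure of center, and the sample standard deviation."""
    if center == "median":
        center_value = statistics.median(values)
    else:
        center_value = statistics.mean(values)
    return {
        "n": len(values),
        center: center_value,
        # statistics.stdev is the sample standard deviation (divides by n - 1)
        "sample_sd": statistics.stdev(values),
    }


def qualitative_summary(values):
    """Return the count and the most frequent category (the mode)."""
    return {"n": len(values), "mode": statistics.mode(values)}


income_summary = quantitative_summary(income, center="median")
family_summary = quantitative_summary(family_size, center="mean")
marital_summary = qualitative_summary(marital_status)

print(income_summary)
print(family_summary)
print(marital_summary)
```

The median is requested for income because it resists outliers, while the mean is used for family size, matching the rationale given in the table. Food and transportation would be summarized the same way as family size.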

Graphs and/or Tables

Complete Table 3. Type of Graphs and/or Table for Selected Variables and briefly explain why you choose those graphs and/or tables. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.

Table 3. Type of Graphs and/or Tables for Selected Variables

Variable Name | Graph and/or Table | Rationale for Why Appropriate
Variable 1: “Income” | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the distribution of quantitative data.
Variable 2: Marital Status | Table: I will use a frequency table to show the spread of the qualitative marital status data. | A frequency table suits qualitative data, for which a standard deviation cannot be calculated.
Variable 3: Family Size | Graph: I will use a pie chart to show the distribution of the data. | The pie chart will clearly show the share of each family size, which sets a good visual foundation for understanding the line charts.
Variable 4: Food | Graph: I will use a line chart to show the annual average of money spent on food. | The line chart can show trends in spending that other charts cannot.
Variable 5: Transportation | Graph: I will use a line chart to show the annual average of money spent on gas/maintenance. | The line chart can show trends in spending that other charts cannot.
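
As a rough illustration of the first two rows of Table 3, the sketch below builds a frequency table for a qualitative variable and the bin counts behind a histogram for a quantitative one. It assumes made-up sample values, a hypothetical bin width of $20,000, and hypothetical helper names (`frequency_table`, `histogram_counts`); none of these come from the course data set.

```python
from collections import Counter

# Hypothetical values for illustration only; not from the course data set.
marital_status = ["Married", "Not Married", "Married", "Married", "Not Married"]
income = [60000, 45000, 98000, 52000, 71000]


def frequency_table(values):
    """Return (category, count, relative frequency) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, n / total) for category, n in counts.most_common()]


def histogram_counts(values, bin_width):
    """Count observations per bin; each bin covers [start, start + bin_width)."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))


freq = frequency_table(marital_status)
hist = histogram_counts(income, bin_width=20000)

for category, n, share in freq:
    print(f"{category}: {n} ({share:.0%})")
for start, n in hist.items():
    print(f"${start:,} to ${start + 20000:,}: {n}")
```

The bin counts are the heights a histogram would draw, so the same numbers could be passed to a plotting library if a chart is needed.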
