stat200 assignment #2
University of Maryland University College
STAT200 - Assignment #1: Descriptive Statistics Data Analysis Plan
Identifying Information
Student (Full Name): Matthieu Hensell
Class: Stats200
Instructor: Dr Sanga
Date: 20201108
Scenario: Please write a few lines describing your scenario and the four variables (in addition to income) you have selected.
Scenario: A 32-year-old married man with a bachelor’s degree has an annual income of $60,000. He is a father of two, which makes his family size four. The family spends $6,000 annually on food (groceries, takeout, etc.) and $1,200 annually on transportation (gas, car maintenance, public transportation, etc.).
Use Table 1 to report the variables selected for this assignment. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 1. Variables Selected for the Analysis
Reason(s) for Selecting the Variables and Expected Outcome(s):
1. Variable 1: “Income” – This was a default variable, chosen for quantitative data points.
2. Variable 2: “SE Marital Status” - I chose this variable because it is very common and easily represented, and it will illustrate an important descriptive dynamic in the data. I predict the data will show a mix of both categories (married and not married) rather than being dominated by one.
3. Variable 3: “SE Family Size” - I chose this variable because family size predicts spending on day-to-day necessities. Family size will likely strongly influence the other two spending variables (food and transportation). I predict a moderately wide range and a mean family size of four.
4. Variable 4: “USD Food” - I chose this variable because food is an everyday essential, so spending on it should be very predictable. I predict that mean spending on food for a family of four will be about $6,000 per year.
5. Variable 5: “USD Transportation” - I chose this variable because transportation is another essential spending factor, whether the travel is for work or leisure. I estimate that mean spending on transportation for a family of four will be around $960 per year. This variable also depends on whether the family has one or two cars, whether each car is financed (with a lien on the title) or owned outright, and several other factors. Because it depends on so many factors, this variable is less stable, and my prediction could be well off.
Data Set Description:
Proposed Data Analysis:
Measures of Central Tendency and Dispersion
Complete Table 2. Numerical Summaries of the Selected Variables and briefly explain why you choose those measurements. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 2. Numerical Summaries of the Selected Variables
| Variable Name | Measures of Central Tendency and Dispersion | Rationale for Why Appropriate |
| Variable 1: “Income” | · Number of Observations · Median · Sample Standard Deviation | I am using median for two reasons: 1. If there are any outliers or the data is not normally distributed, the median is the best measure of central tendency. 2. The variable is quantitative. I am using sample standard deviation for three reasons: 1. The data is a sample from a larger data set. 2. It is the most commonly used measure of dispersion. 3. The variable is quantitative. |
| Variable 2: Marital Status | · Number of Observations · Mode | I am using mode for two reasons: 1. The variable is qualitative, so a mean or median cannot be calculated for it. 2. The mode serves as a descriptive feature of the data set. |
| Variable 3: Family Size | · Number of Observations · Mean · Standard Deviation | 1. I am using mean because I expect the average to represent the data well, as I do not expect large outliers. 2. I am using standard deviation because the variable is quantitative, it is the most commonly used measure of dispersion, and the data are a sample from a larger data set. |
| Variable 4: Food | · Number of Observations · Mean · Standard Deviation | 1. I am using mean because I expect the average to represent the data well, as I do not expect large outliers. 2. I am using standard deviation because the variable is quantitative, it is the most commonly used measure of dispersion, and the data are a sample from a larger data set. |
| Variable 5: Transportation | · Number of Observations · Mean · Standard Deviation | 1. I am using mean because I expect the average to represent the data well, as I do not expect large outliers. 2. I am using standard deviation because the variable is quantitative, it is the most commonly used measure of dispersion, and the data are a sample from a larger data set. |
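As a rough illustration of the measures listed in Table 2, the sketch below computes the number of observations, mean, median, sample standard deviation, and mode with Python's standard `statistics` module. The data values and the helper names `summarize_quantitative` and `summarize_qualitative` are made up for this example; they are not taken from the assignment's data set.

```python
import statistics

# Hypothetical sample values, invented purely for illustration.
# They are NOT taken from the assignment's data set.
income = [60000, 45000, 72000, 98000, 51000]
family_size = [4, 2, 3, 5, 4]
marital_status = ["Married", "Not Married", "Married", "Married", "Not Married"]


def summarize_quantitative(values):
    """Return n, mean, median, and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # stdev() divides by n - 1, i.e. it is the *sample* standard deviation.
        "sample_sd": statistics.stdev(values),
    }


def summarize_qualitative(values):
    """Return n and the mode; a mean or median is not meaningful here."""
    return {"n": len(values), "mode": statistics.mode(values)}


income_summary = summarize_quantitative(income)
family_summary = summarize_quantitative(family_size)
marital_summary = summarize_qualitative(marital_status)

print("Income:", income_summary)
print("Family size:", family_summary)
print("Marital status:", marital_summary)
```

The median is reported for income because a few very high incomes can pull the mean upward, while the mean is kept for variables where large outliers are not expected.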
Graphs and/or Tables
Complete Table 3. Type of Graphs and/or Table for Selected Variables and briefly explain why you choose those graphs and/or tables. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 3. Type of Graphs and/or Tables for Selected Variables
| Variable Name | Graph and/or Table | Rationale for Why Appropriate |
| Variable 1: “Income” | Graph: I will use a histogram to show the distribution of the data. | A histogram is one of the best plots for showing the distribution of quantitative data, including whether it is approximately normal. |
| Variable 2: Marital Status | Table: I will use a frequency table to show how the observations are spread across the marital status categories. | A frequency table suits qualitative data, for which a standard deviation cannot be calculated. |
| Variable 3: Family Size | Graph: I will use a pie chart to show the share of families at each family size. | The pie chart will clearly show how the data break down by family size, which sets a good visual foundation for understanding the line charts. |
| Variable 4: Food | Graph: I will use a line chart to show the average annual amount spent on food. | The line chart can show trends in spending that other charts may not reveal. |
| Variable 5: Transportation | Graph: I will use a line chart to show the average annual amount spent on transportation (gas, maintenance). | The line chart can show trends in spending that other charts may not reveal. |
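To make the frequency table and histogram in Table 3 more concrete, here is a minimal sketch using only Python's standard library. The data values, the bin width of 20,000, and the helper names `frequency_table` and `histogram_counts` are assumptions chosen for illustration; they do not come from the assignment's data set.

```python
from collections import Counter

# Hypothetical sample values, invented purely for illustration.
marital_status = ["Married", "Not Married", "Married", "Married", "Not Married"]
income = [60000, 45000, 72000, 98000, 51000]


def frequency_table(values):
    """Map each category to (count, relative frequency)."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}


def histogram_counts(values, bin_width):
    """Count values per bin, keyed by each bin's lower edge."""
    bins = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(bins.items()))


freq = frequency_table(marital_status)
hist = histogram_counts(income, 20000)

# Frequency table for the qualitative variable.
for category, (count, share) in freq.items():
    print(f"{category:<12} {count:>3} {share:>6.1%}")

# Text histogram for the quantitative variable: one '*' per observation.
for lower_edge, count in hist.items():
    print(f"{lower_edge:>7,} to {lower_edge + 20000 - 1:>7,} | {'*' * count}")
```

A plotting library could draw the same counts as bars; the text version is used here only to keep the sketch free of external dependencies.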