Statistics Assignment #2 Descriptive Statistics Analysis and Writeup
University of Maryland University College
STAT200 - Assignment #1: Descriptive Statistics Data Analysis Plan
Identifying Information
Student: Mandy Francisco
Class: STAT200 6392
Instructor: Professor Roger Davis
Date: June 2, 2020
Scenario:
From the dataset, I chose UniqueID# 36. I am married, with an annual household income of $99,610. I am the head of the household and I am 36 years old. There are only two of us in the household: my spouse and me. We both have bachelor’s degrees and are both employed. Our annual household income is the total of both our incomes.
Table 1. Variables Selected for the Analysis
| Variable Name in the Data Set | Description (See the data dictionary for describing the variables.) | Type of Variable (Qualitative or Quantitative) |
| Variable 1: “Income” | Annual household income in USD | Quantitative |
| Variable 2: “Age Head Household” | Head of household’s age | Quantitative |
| Variable 3: “Family Size” | Household family size | Quantitative |
| Variable 4: “Food Expenditures” | Total annual expenditure on food | Quantitative |
| Variable 5: “Entertainment Expenditures” | Total annual expenditure on entertainment | Quantitative |
Reason(s) for Selecting the Variables and Expected Outcome(s):
1. Variable 1: “Income” - Having more than one employed person in the household increases the total annual income, which gives us a bit more to spend in our budget plan. This variable shows the combined annual household income in USD.
2. Variable 2: “Age Head of Household” - I chose this variable because each age group has its own general spending patterns and habits; “needs” and “wants” differ between age groups. This variable shows the age of the head of household.
3. Variable 3: “Family Size” - This variable shows the size of the family in the household. With two people, I expect expenditures to be double those of a one-person household.
4. Variable 4: “Food Expenditures” - This variable shows the household’s annual expenditure on food. I expect it to be double.
5. Variable 5: “Entertainment Expenditures” - This variable shows the entertainment expenditure per household per year. This should double as well, since there are two of us.
Data Set Description:
Proposed Data Analysis:
Measures of Central Tendency and Dispersion
Complete Table 2. Numerical Summaries of the Selected Variables and briefly explain why you chose those measurements. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 2. Numerical Summaries of the Selected Variables
| Variable Name | Measures of Central Tendency and Dispersion | Rationale for Why Appropriate |
| Variable 1: “Income” | · Number of Observations · Median · Sample Standard Deviation | I am using median for two reasons: 1. If there are any outliers or the data is not normally distributed, the median is the best measure of central tendency. 2. The variable is quantitative. I am using sample standard deviation for three reasons: 1. The data is a sample from a larger data set. 2. It is the most commonly used measure of dispersion. 3. The variable is quantitative. |
| Variable 2: “Age Head Household” | · Number of Observations · Median · Sample Standard Deviation | I am using median for two reasons: 1. The median is a better measure than the mean if the age distribution is asymmetric. 2. The variable is quantitative. I am using sample standard deviation for two reasons: 1. It shows the spread of the data. 2. It is the most commonly used measure of dispersion and is straightforward to interpret. |
| Variable 3: “Family Size” | · Number of Observations · Mean · Standard Deviation | I chose mean because: 1. The variable is quantitative. 2. The range is small, with only a few outliers. I chose standard deviation because: 1. The variable is quantitative. 2. It is the most commonly used measure of dispersion. |
| Variable 4: “Annual Food Expenditure” | · Number of Observations · Mean · Standard Deviation | I chose mean because: 1. The variable is quantitative. 2. The range is small, with only a few outliers. 3. The mean is a good choice if there are no extreme values. I chose standard deviation because: 1. The variable is quantitative. 2. It is the most commonly used measure of dispersion. |
| Variable 5: “Annual Entertainment Expenditure” | · Number of Observations · Mean · Standard Deviation | I am using mean because: 1. The variable is quantitative. 2. The mean works best for data that is normally distributed and has few outliers. I chose standard deviation because: 1. It shows the spread of the data. 2. The variable is quantitative. |
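As a rough sketch of how the summaries planned in Table 2 could be computed, the Python snippet below uses only the standard `statistics` module. The numbers in `income` and `family_size` are made-up placeholders for illustration and are not taken from the course data set; the helper name `summarize` is also just an assumption for this example.

```python
import statistics

# Hypothetical values for illustration only; NOT taken from the course data set.
income = [99610, 95432, 97912, 96697, 98717, 102887, 94929, 109038]
family_size = [2, 3, 4, 2, 1, 5, 2, 3]


def summarize(values, center="median"):
    """Return n, the chosen measure of center, and the sample standard deviation."""
    if center == "median":
        center_value = statistics.median(values)
    else:
        center_value = statistics.mean(values)
    return {
        "n": len(values),
        center: center_value,
        # statistics.stdev is the sample standard deviation (divides by n - 1).
        "sample_sd": statistics.stdev(values),
    }


# Median for a variable that may be skewed, mean for one with few outliers.
income_summary = summarize(income, center="median")
family_summary = summarize(family_size, center="mean")
print(income_summary)
print(family_summary)
```

Because the data is a sample from a larger data set, the sketch uses `statistics.stdev` rather than `statistics.pstdev`, which would compute the population standard deviation.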
Graphs and/or Tables
Table 3. Type of Graphs and/or Tables for Selected Variables
| Variable Name | Graph and/or Table | Rationale for Why Appropriate |
| Variable 1: “Income” | Graph: I will use a histogram to show the distribution of the data and whether it is approximately normal. | A histogram is one of the best plots for showing the distribution of quantitative data. |
| Variable 2: “Age Head Household” | Graph: I will use a box plot to show the age distribution. | To show the distribution of the head of household’s age. Box plots work well for data that may be skewed. |
| Variable 3: “Family Size” | Graph: I will use a pie chart to show family size. | A pie chart shows the share of households at each family size. |
| Variable 4: “Food Expenditures” | Graph: I will use a histogram to show the annual food expenditures. | A histogram is well suited to showing the distribution of quantitative data. |
| Variable 5: “Entertainment Expenditures” | Graph: I will use a histogram to show the annual entertainment expenditures. | A histogram is well suited to showing the distribution of quantitative data. |
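To make the graph choices in Table 3 more concrete, the sketch below computes the numbers each graph is built from: bin counts for a histogram, a five-number summary for a box plot, and category shares for a pie chart. It uses only the Python standard library; all data values and function names are hypothetical and chosen for illustration, not drawn from the course data set.

```python
from collections import Counter
import statistics

# Hypothetical values for illustration only; NOT taken from the course data set.
food = [5200, 6100, 7400, 6900, 8300, 9100, 7700, 6400]
age = [36, 42, 51, 29, 47, 58, 33, 40]
family_size = [2, 3, 4, 2, 1, 5, 2, 3]


def histogram_counts(values, bin_width):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum: the values a box plot draws."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


def category_shares(values):
    """Proportion of observations in each category: the slices of a pie chart."""
    counts = Counter(values)
    return {k: counts[k] / len(values) for k in sorted(counts)}


food_bins = histogram_counts(food, bin_width=1000)
age_box = five_number_summary(age)
size_shares = category_shares(family_size)
print(food_bins)
print(age_box)
print(size_shares)
```

Quartile conventions differ between tools, so the Q1 and Q3 values from `statistics.quantiles` may not match those reported by a spreadsheet or a plotting library exactly. To draw the actual graphs, a plotting library such as matplotlib (its `hist`, `boxplot`, and `pie` functions) could be given the same lists.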