University of Maryland University College
STAT200 - Assignment #2: Descriptive Statistics Analysis and Writeup
Identifying Information
Student: Andres Altamiranda
Class: STATS 200 6380
Instructor: Kaan Tanali
Date: September 18, 2019
Introduction:
Table 1. Variables Selected for the Analysis
Variable | Name in Data Set | Description | Type of Variable (Qualitative or Quantitative)
Variable 1 | “Income” | Annual household income (USD) | Quantitative
Variable 2 | “Marital Status” | Marital status of the head of household | Qualitative
Variable 3 | “Education Expenditure” | Total annual household expenditure on education (USD) | Quantitative
Variable 4 | “Age of Head of Household” | Age of the head of household (years) | Quantitative
Variable 5 | “Family Size” | Total number of people in the family (including adults and children) | Quantitative
Data Set Description and Method Used for Analysis:
The data set used in this project was taken from the US Department of Labor’s 2016 Consumer Expenditure Surveys. The sample consists of 30 records containing information about each reporting household and its expenditures. The variables used from this set were annual household income, the marital status of the head of household, annual expenditure on education, the age of the head of household, and total family size. For the quantitative variables, central tendency and dispersion were measured by the median and the range, while for the qualitative variable (marital status) the mode was used; the range is not meaningful for categorical data. There are two exceptions. For income, the median was used as the measure of central tendency but the standard deviation was used to measure dispersion, in keeping with the plan from the first assignment. For the age of the head of household, the mode was reported as the measure of central tendency, alongside the range.
Histograms were used to display continuous quantitative variables, while pie charts and bar charts were used for categorical variables and for discrete quantitative variables (e.g., age). The histograms were plotted in R, and the pie charts and bar charts were generated in Excel.
Results:
Variable 1: Income
Numerical Summary.
Table 2. Descriptive Analysis for Variable 1
Variable | n | Median (USD) | Standard Deviation (USD)
Income | 30 | 96,697 | 5,143.10
Graph and/or Table: Histogram of Income
Description of Findings.
Analysis of the annual household income variable reveals a median income of $96,697, with a standard deviation of $5,143.10. The histogram above shows that two-thirds of the records have an income between $95,000 and $100,000, with most of the remaining records falling above this interval. The data do not appear to be normally distributed: one interval contains an extremely high peak, while the surrounding intervals have nearly equal counts, rather than the symmetric, gradually tapering “bell curve” typical of normally distributed data.
For comparison, a 2017 U.S. Census Bureau report put the 2016 median U.S. household income at only $59,039, well below the sample median of $96,697. This suggests that the sample used in this analysis is not representative of the national population.
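The income summary above combines a median, a standard deviation, and the share of records falling in one histogram interval. A minimal Python sketch of those three calculations, assuming the 30 incomes are available as a plain list. The function name `summarize_income` and the sample values are hypothetical and are not the survey records. `statistics.stdev` computes the sample standard deviation (dividing by n - 1), as does R's `sd()`; which convention produced the $5,143.10 figure is not stated, so that choice is an assumption. The interval is treated as half-open, [low, high), which is also an assumption about how boundary values were counted.

```python
import statistics

def summarize_income(incomes, low=95_000, high=100_000):
    """Return the median, sample standard deviation, and share of values in [low, high)."""
    median = statistics.median(incomes)
    # stdev() divides by n - 1 (sample SD); pstdev() would divide by n instead.
    sd = statistics.stdev(incomes)
    # Count values inside the half-open interval [low, high).
    in_band = sum(1 for x in incomes if low <= x < high)
    return median, sd, in_band / len(incomes)

# Hypothetical incomes (USD) for illustration only; NOT the 30 survey records.
sample = [92_000, 96_000, 97_500, 98_000, 99_000, 104_000]
median, sd, share = summarize_income(sample)
print(f"median = {median:,.0f}, sd = {sd:,.2f}, share in band = {share:.2f}")
```

With an even number of values the median is the average of the two middle values, so it need not equal any single recorded income.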
Variable 2: Marital Status
Numerical Summary.
Table 3. Descriptive Analysis for Variable 2
Variable | n | Mode | Range
Marital Status | 30 | Bimodal: Married, Non-married (15 each) | N/A
Graph and/or Table.
Description of Findings.
The thirty records in this study were evenly divided with regard to marital status: fifteen represented married heads of household and fifteen represented non-married heads of household. The data were therefore bimodal, with each status being one of the modes. The range was not considered, because marital status is a qualitative (binary) variable; its categories have no numerical values, so a “range” between them has no meaning.
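Whether data are bimodal or trimodal depends on ties for the highest frequency, so it helps to report every tied value rather than a single mode. A minimal Python sketch using `collections.Counter`; the helper name `all_modes` is hypothetical. The marital-status list only reproduces the 15/15 split reported above, and the age list is made-up illustration data. The standard library's `statistics.multimode` (Python 3.8+) returns the same tied values, but without their count.

```python
from collections import Counter

def all_modes(values):
    """Return (sorted list of all values tied for the highest count, that count)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top), top

# Reproduces the reported 15/15 split; not the individual survey records.
marital = ["Married"] * 15 + ["Non-married"] * 15
print(all_modes(marital))   # (['Married', 'Non-married'], 15)

# Hypothetical ages for illustration only.
ages = [43, 43, 51, 51, 52, 52, 30, 60]
print(all_modes(ages))      # ([43, 51, 52], 2)
```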
Variable 3: Education Expenditure
Numerical Summary.
Table 4. Descriptive Analysis for Variable 3
Variable | n | Median (USD) | Range (USD)
Education Expenditure | 30 | 273.50 | 792
Graph and/or Table: Histogram of Education Expenditure
Description of Findings.
The education expenditure variable was the only expenditure variable analyzed for this project. The median household expenditure on education was $273.50, with a range of $792. The minimum expenditure in the sample was just $2, while the maximum was $794. Although there are other low values in the data set, the $2 value seems suspiciously low and, in other circumstances, might warrant investigation to determine whether it is a data-entry error.
The histogram above shows that 70% of the households (21 of 30) reported spending between $200 and $500 annually on education. The records in the $600 to $800 intervals are possible outliers; these values should be evaluated and discarded from the data set if they prove to be actual outliers.
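The analysis does not specify how the high values should be evaluated. One common convention, swapped in here purely as an illustration, is Tukey's 1.5 × IQR rule: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged for review. A minimal Python sketch, assuming Python 3.8+ for `statistics.quantiles`; the helper name `tukey_outliers` and the expenditure values are hypothetical. Quartile definitions vary between software packages, so fences computed in R or Excel may differ slightly.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high], (low, high)

# Hypothetical education expenditures (USD); not the survey records.
spend = [2, 150, 220, 250, 260, 280, 300, 320, 350, 400, 480, 794]
flagged, fences = tukey_outliers(spend)
print(flagged, fences)  # [2, 794] (62.5, 542.5)
```

With these made-up values, both the lowest and the highest amounts fall outside the fences. Flagged values are candidates for the kind of evaluation described above; the rule itself does not decide whether a record is an error.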
Variable 4: Age of Head of Household
Numerical Summary.
Table 5. Descriptive Analysis for Variable 4
Variable | n | Mode (years) | Range (years)
Age of Head of Household | 30 | Trimodal: 43, 51, 52 (3 each) | 33
Graph and/or Table: Bar Chart of Age of Head of Household
Description of Findings.
Analysis of the age of the head of household showed that the data are trimodal, with ages 43, 51, and 52 each appearing three times. The range was 33 years, from a low of 27 years to a high of 60 years.
The bar chart above displays the distribution of the ages of the heads of household. While most ages between the minimum and maximum are represented, there is a noticeable gap between 29 and 39 years of age, within which only age 35 appears.
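Gaps such as the one between 29 and 39 can be located mechanically rather than by reading the bar chart. A minimal Python sketch, assuming ages are recorded as whole years; the function name `missing_runs` and the sample ages are hypothetical.

```python
def missing_runs(ages):
    """Return (start, end) runs of whole-year ages absent between the min and max."""
    present = set(ages)
    runs, start = [], None
    for age in range(min(present), max(present) + 1):
        if age not in present:
            if start is None:
                start = age                 # a new gap begins
        elif start is not None:
            runs.append((start, age - 1))   # the gap ends just before this age
            start = None
    # The maximum age is always present, so no gap is left open here.
    return runs

# Hypothetical ages for illustration only; not the survey records.
sample_ages = [27, 28, 29, 35, 39, 40, 43, 43]
print(missing_runs(sample_ages))  # [(30, 34), (36, 38), (41, 42)]
```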
Variable 5: Family Size
Numerical Summary.
Table 6. Descriptive Analysis for Variable 5
Variable | n | Median (people) | Range (people)
Family Size | 30 | 3.5 | 4
Graph and/or Table
Description of Findings.
The variable representing the family size, which included all adults and children in the household, had a median value of 3.5, with a range of 4. The data spanned values from 1 to 5, with 4 being the modal value.
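Because n = 30 is even, the median is the average of the 15th and 16th ordered values, which is why integer-valued data such as family size can have a median of 3.5. The worked form below uses only the values reported above. Two integers between 1 and 5 average to 3.5 only as the pair (3, 4) or (2, 5); the pair (2, 5) would leave no room for any 4s, yet 4 is the mode, so the 15th and 16th ordered values must be 3 and 4. The same median formula applies to the education expenditure median of $273.50.

```latex
\begin{aligned}
\tilde{x} &= \frac{x_{(15)} + x_{(16)}}{2}, &\qquad \text{range} &= x_{(30)} - x_{(1)} \\
\tilde{x}_{\text{family}} &= \frac{3 + 4}{2} = 3.5, &\qquad \text{range}_{\text{family}} &= 5 - 1 = 4
\end{aligned}
```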
Discussion and Conclusion.
Although they were not included in the analysis above, the medians of the four expenditure variables in the data set were $64,339.50 for total household expenditures, $7,817 for food, $105.50 for entertainment, and $273.50 for education. Of the three individual expenditure categories, food had the highest median (as would be expected), while entertainment had the lowest.
Without additional information about what is actually being purchased, it is difficult to determine where the most effective savings could be realized. For instance, the $7,817 median food expenditure could include a significant amount of dining out, which is usually costly compared with cooking at home; if so, some of this expenditure might more properly be classified as entertainment. Entertainment has the lowest median of the three, but it is also perhaps the least essential, since there are ways to entertain oneself without spending money. With the limited information available, cutting entertainment expenditures would seem to be the best choice.