Descriptive Analysis and Writeup - STAT200
University of Maryland University College
STAT200 - Assignment #1: Descriptive Statistics Data Analysis Plan
Score: 42 out of 50
Identifying Information
Student (Full Name): DeNedra Hodge
Class: STAT 200 6366 Introduction to Statistics
Instructor:
Date: 3 June 2018
(You need a plan describing what you would like to learn from the data set available to you and how you will do it.)
Table 1. Variables Selected for the Analysis
| Variable Name in the Data Set | Description (see the data dictionary for variable descriptions) | Type of Variable (Qualitative or Quantitative) |
| Variable 1: “Income” | Annual household income in USD | Quantitative |
| Variable 2: “Expenditures” | Annual household expenditures in USD | Quantitative |
| Variable 3: “Housing” | Annual household housing costs in USD | Quantitative |
| Variable 4: “Electricity” | Annual household electricity costs in USD | Quantitative |
| Variable 5: “Water” | Annual household water costs in USD | Quantitative |
Reason(s) for Selecting the Variables and Expected Outcome(s):
1. Variable 1: “Income” - To show how much income comes into the household annually
2. Variable 2: “Expenditures” - To show the household expenditures paid out annually
3. Variable 3: “Housing” - To show the housing expenses paid out annually
4. Variable 4: “Electricity” - To show the electricity expenses paid out annually
5. Variable 5: “Water” - To show the water expenses paid out annually
Data Set Description:
Proposed Data Analysis:
Measures of Central Tendency and Dispersion
Complete Table 2. Numerical Summaries of the Selected Variables and briefly explain why you chose those measurements. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 2. Numerical Summaries of the Selected Variables
| Variable Name | Measures of Central Tendency and Dispersion | Rationale for Why Appropriate |
| Variable 1: “Income” | Number of Observations; Median; Sample Standard Deviation | I am using the median for two reasons: (1) if there are outliers or the data are not normally distributed, the median is a better measure of central tendency than the mean because extreme values do not pull it up or down; (2) the variable is quantitative. I am using the sample standard deviation for three reasons: (1) the data are a sample drawn from a larger population; (2) it is the most commonly used measure of dispersion; (3) the variable is quantitative. |
| Variable 2: “Expenditures” | Number of Observations; Median; Sample Standard Deviation | I am using the median for two reasons: (1) if there are outliers or the data are not normally distributed, the median is a better measure of central tendency than the mean because extreme values do not pull it up or down; (2) the variable is quantitative. I am using the sample standard deviation for three reasons: (1) the data are a sample drawn from a larger population; (2) it is the most commonly used measure of dispersion; (3) the variable is quantitative. |
| Variable 3: “Housing” | Number of Observations; Median; Sample Standard Deviation | I am using the median for two reasons: (1) if there are outliers or the data are not normally distributed, the median is a better measure of central tendency than the mean because extreme values do not pull it up or down; (2) the variable is quantitative. I am using the sample standard deviation for three reasons: (1) the data are a sample drawn from a larger population; (2) it is the most commonly used measure of dispersion; (3) the variable is quantitative. |
| Variable 4: “Electricity” | Number of Observations; Median; Sample Standard Deviation | I am using the median for two reasons: (1) if there are outliers or the data are not normally distributed, the median is a better measure of central tendency than the mean because extreme values do not pull it up or down; (2) the variable is quantitative. I am using the sample standard deviation for three reasons: (1) the data are a sample drawn from a larger population; (2) it is the most commonly used measure of dispersion; (3) the variable is quantitative. |
| Variable 5: “Water” | Number of Observations; Median; Sample Standard Deviation | I am using the median for two reasons: (1) if there are outliers or the data are not normally distributed, the median is a better measure of central tendency than the mean because extreme values do not pull it up or down; (2) the variable is quantitative. I am using the sample standard deviation for three reasons: (1) the data are a sample drawn from a larger population; (2) it is the most commonly used measure of dispersion; (3) the variable is quantitative. |
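As a rough illustration of the Table 2 rationale, the Python sketch below uses the standard-library `statistics` module on a small set of hypothetical household incomes (made-up values, not the course data set). It shows that a single extreme value pulls the mean much further than the median, and that `statistics.stdev` gives the sample standard deviation (dividing by n - 1), whereas `statistics.pstdev` divides by n. The helper name `summarize` is illustrative only.

```python
import math
import statistics

# Hypothetical annual household incomes in USD (illustrative values only,
# not taken from the assignment's data set). 250000 is a deliberate outlier.
incomes = [42000, 48000, 51000, 55000, 60000, 250000]


def summarize(values):
    """Return the Table 2 measures: number of observations, median, sample SD."""
    n = len(values)
    median = statistics.median(values)
    sample_sd = statistics.stdev(values)  # sample SD: divides by n - 1
    return n, median, sample_sd


n, median, sample_sd = summarize(incomes)
mean = statistics.mean(incomes)

# The same sample SD from the formula s = sqrt(sum((x - mean)^2) / (n - 1)).
manual_sd = math.sqrt(sum((x - mean) ** 2 for x in incomes) / (n - 1))

# The population SD divides by n instead, so it comes out slightly smaller.
population_sd = statistics.pstdev(incomes)

print(f"Number of observations: {n}")
print(f"Median: {median:,.0f}")  # 53,000: average of the two middle values
print(f"Mean: {mean:,.0f}")      # 84,333: pulled upward by the outlier
print(f"Sample SD: {sample_sd:,.0f} (formula check: {manual_sd:,.0f})")
print(f"Population SD: {population_sd:,.0f}")
```

Removing the 250000 value changes the median only from 53,000 to 51,000, while the mean falls sharply, which is the behavior the plan relies on when choosing the median for income and cost data that may be skewed.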
Graphs and/or Tables
Complete Table 3. Type of Graphs and/or Table for Selected Variables and briefly explain why you chose those graphs and/or tables. Note: The information for the required variable, “Income,” has already been completed and can be used as a guide for completing information on the remaining variables.
Table 3. Type of Graphs and/or Tables for Selected Variables
| Variable Name | Graph and/or Table | Rationale for Why Appropriate |
| Variable 1: “Income” | Graph: I will use a histogram to show the shape of the data's distribution (for example, whether it is approximately normal or skewed). | A histogram is one of the best plots for displaying the distribution of quantitative data. |
| Variable 2: “Expenditures” | Graph: I will use a histogram to show the shape of the data's distribution (for example, whether it is approximately normal or skewed). | A histogram is one of the best plots for displaying the distribution of quantitative data. |
| Variable 3: “Housing” | Graph: I will use a histogram to show the shape of the data's distribution (for example, whether it is approximately normal or skewed). | A histogram is one of the best plots for displaying the distribution of quantitative data. |
| Variable 4: “Electricity” | Graph: I will use a histogram to show the shape of the data's distribution (for example, whether it is approximately normal or skewed). | A histogram is one of the best plots for displaying the distribution of quantitative data. |
| Variable 5: “Water” | Graph: I will use a histogram to show the shape of the data's distribution (for example, whether it is approximately normal or skewed). | A histogram is one of the best plots for displaying the distribution of quantitative data. |
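A histogram sorts a quantitative variable into equal-width bins and draws a bar for the count in each bin, which is what makes the shape of the distribution visible. The stdlib-only Python sketch below does that counting and prints a text histogram; the electricity-cost values and the choice of five bins are hypothetical, and `histogram_counts` is an illustrative helper, not part of any library. A plotting tool such as matplotlib's `pyplot.hist` would draw the same bars graphically.

```python
# Hypothetical annual household electricity costs in USD (illustrative only,
# not taken from the assignment's data set).
electricity_costs = [1100, 1250, 1300, 1320, 1400, 1450, 1500, 1600, 1900, 2600]


def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins.

    Each bin is [left edge, right edge); the largest value sits exactly on
    the last right edge, so it is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # If every value is identical, width is 0; put everything in bin 0.
        index = int((v - lo) / width) if width > 0 else 0
        counts[min(index, num_bins - 1)] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


edges, counts = histogram_counts(electricity_costs, num_bins=5)

# Print one row per bin: the bin range followed by a bar of asterisks.
for i, count in enumerate(counts):
    print(f"{edges[i]:>6,.0f} to {edges[i + 1]:>6,.0f} | {'*' * count}")
```

With these made-up values, most observations fall in the two lowest bins and a single value sits far to the right, the kind of right skew a histogram makes easy to spot and a reason not to assume the data are normally distributed.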