Assignment: Is There a Difference? Web Page
Data Analytics and Research:
3.2 Assignment
Problem: Examine the County Complete database. Pick three states in the same area of the country as yours, one of which is your home state. Determine one variable that was not included in your workshop two analysis. Complete the following analysis:
a. Determine the mean, median, mode, standard deviation, and variance for the counties in all three states. How are they different? The same?
b. Assess each of your three variables for normality.
c. Determine a 95% confidence interval for the mean value of counties in each of the three states.
d. Compare the confidence interval of your home state to the actual value for your home county. Is it within the confidence limits you have calculated? If not, what factors could be causing it to be an outlier?
Solution:
I live in Owings Mills, Baltimore County, Maryland. In my previous assignment, I chose the variables “age_under_5” (percentage of the population under age 5 in 2010), “age_under_18” (percentage of the population under age 18 in 2010), and “age_over_65” (percentage of the population over age 65 in 2010) for the analysis.
For this assignment, I chose the variable “female_2010” (percentage of the population that was female in 2010) and selected Florida and New York as the other two states.
For part (a), we are required to compute the mean, median, mode, standard deviation, and variance of the variable across all counties in each of the three states. The formulas for these quantities were written out in detail in the previous assignment. The computed values are shown in the following table:
| female_2010 (%) | Maryland | Florida | New York |
|---|---|---|---|
| Mean | 50.95833 | 48.55373 | 50.35 |
| Median | 51.05 | 50.6 | 50.5 |
| Mode | 52.3 | 51.6 | 50.2 |
| Variance | 2.099928 | 15.31737 | 2.201885 |
| Standard Deviation | 1.449113 | 3.913741 | 1.483875 |
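The five statistics in the table can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the function name `describe` is hypothetical, it assumes the sample (n − 1) formulas for variance and standard deviation (the table does not say which formula was used), and the input list is made-up data, not values from the County Complete database.

```python
import statistics


def describe(values):
    """Summary statistics used in part (a) for one state's counties.

    Assumes the sample (n - 1) formulas for variance and standard deviation.
    """
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # multimode returns every most-frequent value, so ties stay visible
        "mode": statistics.multimode(values),
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


# Hypothetical female_2010 percentages for five counties (illustration only,
# not values taken from the County Complete database).
sample = [50.2, 51.6, 50.2, 52.3, 49.8]
result = describe(sample)
for name, value in result.items():
    print(f"{name}: {value}")
```

Using `multimode` rather than `mode` is a deliberate choice: with county percentages rounded to one decimal place, several values can tie for most frequent, and `multimode` reports all of them.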
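The means for Maryland and New York differ by only about 0.6, whereas Florida's mean is about 2.4 below Maryland's and about 1.8 below New York's. Likewise, the standard deviations for Maryland and New York are almost the same, differing by only about 0.03, whereas Florida's standard deviation is about 2.5 larger than Maryland's and about 2.4 larger than New York's. The medians, by contrast, are close in all three states (50.5 to 51.05). Florida's mean lies about 2 points below its median, which suggests that a few counties with unusually low values pull its mean down and widen its spread.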
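The differences quoted above can be checked directly from the table values. This is a small sketch; the helper name `pairwise_gaps` is hypothetical, and the numbers are copied from the table in part (a).

```python
from itertools import combinations

# Mean and standard deviation of female_2010 per state, copied from the table.
table_values = {
    "Maryland": {"mean": 50.95833, "sd": 1.449113},
    "Florida": {"mean": 48.55373, "sd": 3.913741},
    "New York": {"mean": 50.35, "sd": 1.483875},
}


def pairwise_gaps(table, key):
    """Absolute difference in `key` between every pair of states."""
    return {
        (a, b): abs(table[a][key] - table[b][key])
        for a, b in combinations(table, 2)
    }


for key in ("mean", "sd"):
    for (a, b), gap in pairwise_gaps(table_values, key).items():
        print(f"{key}: {a} vs {b} differ by {gap:.3f}")
```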
For part (b), we are required to assess the distribution of the variable in each of the three states for normality. The conditions for assessing normality were described in detail in the previous assignment. As a first step, we plot a histogram of “female_2010” for each of the three states:
[Figures: histograms of “female_2010” across counties for 1) Maryland, 2) Florida, 3) New York]
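The histograms themselves are not reproduced here. As a minimal text-only sketch of how county values can be binned into a histogram, the hypothetical function `text_histogram` below assumes a bin width of 1 percentage point (an arbitrary choice, not necessarily the one used for the original figures) and is run on made-up values.

```python
import math
from collections import Counter


def text_histogram(values, bin_width=1.0):
    """Return one text line per bin, with one '#' per county in that bin.

    Bins are [left, left + bin_width), starting at the largest multiple of
    bin_width that does not exceed the smallest value.
    """
    start = math.floor(min(values) / bin_width) * bin_width
    counts = Counter(math.floor((x - start) / bin_width) for x in values)
    lines = []
    for k in range(max(counts) + 1):
        left = start + k * bin_width
        bar = "#" * counts[k]  # Counter returns 0 for empty bins
        lines.append(f"{left:.1f} to {left + bin_width:.1f} | {bar}")
    return lines


# Hypothetical county percentages, for illustration only.
for line in text_histogram([48.5, 49.2, 50.1, 50.4, 50.9, 51.3]):
    print(line)
```

The bin width matters: with only a few dozen counties per state, very narrow bins make any distribution look ragged, while very wide bins hide skewness.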
Assessing Normality for each of the three states:
· Maryland
As the histogram shows, the distribution is not symmetric about the mean; the values are merely more concentrated near the mean. The distribution therefore fails most of the properties of a normal distribution and cannot be considered normal.
· Florida
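The histogram does not show the properties of a normal distribution, so the distribution cannot be considered normal. This agrees with the summary statistics: Florida's mean (48.55) lies well below its median (50.6), and its standard deviation is more than twice that of the other two states, which is consistent with a long tail of low values.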
· New York
The histogram is densest around the mean and thins out toward the tails; approximately 68% of the data points lie within 1 standard deviation of the mean, and approximately 95% lie within 2 standard deviations. Hence this distribution can be considered approximately normal.
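The 68% and 95% check used for New York (the empirical rule) can be computed rather than read off a histogram. In this sketch, the function name `empirical_rule_shares` is hypothetical and the input is made-up data; a formal test such as Shapiro-Wilk (`scipy.stats.shapiro`) is another option, not the method used above.

```python
import statistics


def empirical_rule_shares(values):
    """Fractions of values within 1 and 2 sample standard deviations of the mean.

    For approximately normal data these should be close to 0.68 and 0.95.
    """
    m = statistics.mean(values)
    s = statistics.stdev(values)
    n = len(values)
    within_1 = sum(abs(x - m) <= s for x in values) / n
    within_2 = sum(abs(x - m) <= 2 * s for x in values) / n
    return within_1, within_2


# Hypothetical values, for illustration only.
one_sd, two_sd = empirical_rule_shares([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(f"within 1 SD: {one_sd:.0%}, within 2 SD: {two_sd:.0%}")
```

Matching 68% and 95% is necessary but not sufficient for normality; a skewed distribution can come close to both shares, so the check is best read together with the histogram's symmetry.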
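For part (c), we are required to compute 95% confidence intervals for all three states. The interval used here is constructed so that, if the county values are approximately normal, the probability that an individual county's value falls within it is 0.95. Its lower and upper limits are

$$\text{Lower limit} = \mu - z\sigma, \qquad \text{Upper limit} = \mu + z\sigma,$$

where µ is the sample mean, σ is the sample standard deviation, and z is the number of standard deviations from the mean needed to contain 95% of the values of a normal distribution. Here we take z = 1.96. Strictly speaking, µ ± 1.96σ describes the spread of individual counties; a confidence interval for the state's mean county value would replace σ with the standard error σ/√n, where n is the number of counties, and would therefore be much narrower.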
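As a worked check of the formula, substituting Maryland's mean and standard deviation from the table with z = 1.96 gives:

$$50.95833 \pm 1.96 \times 1.449113 = 50.95833 \pm 2.84026 \;\Rightarrow\; [48.11807,\ 53.79859]$$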
95% Confidence intervals for each of the three states are as follows:
· Maryland
µ = 50.95833, σ = 1.449113
Lower limit = 48.11807
Upper limit = 53.79859
· Florida
µ = 48.55373, σ = 3.913741
Lower limit = 40.882798
Upper limit = 56.224662
· New York
µ = 50.35, σ = 1.483875
Lower limit = 47.441605
Upper limit = 53.258395
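The limits above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `value_range` and `mean_ci` are hypothetical names, `value_range` is the µ ± 1.96σ range used in this section, and `mean_ci` is shown only for contrast, as the interval for the mean that uses the standard error σ/√n. The number of counties n is not reported in the text, so `mean_ci` is not applied to the state data here; for small n, a t critical value such as `scipy.stats.t.ppf(0.975, n - 1)` would be more appropriate than 1.96 and can be passed as `crit`.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal distribution

# Sample mean and standard deviation of female_2010 per state, from part (a).
states = {
    "Maryland": (50.95833, 1.449113),
    "Florida": (48.55373, 3.913741),
    "New York": (50.35, 1.483875),
}


def value_range(mean, sd, crit=Z_95):
    """mean ± crit * sd: the range computed in part (c) for individual counties."""
    return mean - crit * sd, mean + crit * sd


def mean_ci(mean, sd, n, crit=Z_95):
    """mean ± crit * sd / sqrt(n): an interval for the state's mean county value."""
    half_width = crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


for state, (mean, sd) in states.items():
    low, high = value_range(mean, sd)
    print(f"{state}: {low:.5f} to {high:.5f}")
```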
Part (d): The value for my home county, Baltimore County, is 52.7. It lies within the 95% interval computed for Maryland (48.11807 to 53.79859), about 1.2 standard deviations above the state mean, so it is not an outlier.
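The part (d) check can be expressed as a short function that reports whether a county falls inside its state's µ ± 1.96σ range and how many standard deviations it lies from the state mean. The name `check_home_county` is hypothetical; the inputs are Baltimore County's value and Maryland's statistics as given in this document.

```python
def check_home_county(value, mean, sd, crit=1.96):
    """Return (inside_range, z_score) for one county against its state's range."""
    low, high = mean - crit * sd, mean + crit * sd
    z_score = (value - mean) / sd  # distance from the mean in standard deviations
    return low <= value <= high, z_score


# Baltimore County's female_2010 value and Maryland's statistics from above.
inside, z_score = check_home_county(52.7, 50.95833, 1.449113)
print(inside, round(z_score, 2))
```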