Statistics
1. The “Q3” tab in the midterm Excel data sheet contains data on the maximum August temperatures for Denver between 1958 and 2017. Calculate the mean and the standard deviation for the entire sample, and then separately for 1958-1987 and 1988-2017. Give the equations used to calculate the values.
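One way to sanity-check the spreadsheet values for question 1 is to recompute them in a few lines of Python. The sketch below assumes the sample (n − 1) form of the standard deviation, which is what Excel's STDEV.S returns. The temperatures are made-up placeholders, not the Q3 data, and the helper name `mean_and_sd` is hypothetical.

```python
import math

def mean_and_sd(values):
    """Return (sample mean, sample standard deviation) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n                       # x_bar = (1/n) * sum(x_i)
    ss = sum((x - mean) ** 2 for x in values)    # sum of squared deviations
    return mean, math.sqrt(ss / (n - 1))         # s = sqrt(ss / (n - 1))

# Placeholder (year, temperature) pairs; replace with the real Q3 column.
records = [(1958, 95.0), (1975, 97.0), (1987, 96.0),
           (1988, 98.0), (2003, 99.0), (2017, 97.0)]

# The three samples the question asks about.
groups = {
    "1958-2017": [t for y, t in records],
    "1958-1987": [t for y, t in records if y <= 1987],
    "1988-2017": [t for y, t in records if y >= 1988],
}

for label, temps in groups.items():
    mean, sd = mean_and_sd(temps)
    print(f"{label}: mean = {mean:.2f}, sd = {sd:.2f}")
```

If the course treats the 60 years as a full population rather than a sample, the divisor would be n instead of n − 1 (Excel's STDEV.P).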
2. Transparency International compiles an index of corruption for over 150 world nations. The 2018 data for these nations are provided in the midterm Excel file. The 24 nations for which oil and gas make up more than 50% of export revenues are separated from the other 127 nations. Note that higher values of the index indicate lower levels of corruption.
i. (4) Find the means and variances of the corruption index for each of the two groups of countries.
ii. (8) Construct a 95% confidence interval for the true difference in means between the two groups.
iii. (8) Using a t-statistic, determine whether the difference in mean corruption index between the two groups of countries is statistically significant at the 95% confidence level (α = 0.05). Give your answer in the form of a classical hypothesis test.
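Parts ii and iii can be cross-checked with a short script. The sketch below assumes the unequal-variance (Welch) form of the two-sample t procedure; the course may instead expect a pooled-variance version. The group sizes 24 and 127 come from the question, but the means, variances, and critical value are placeholders, and the function name `welch_summary` is hypothetical.

```python
import math

def welch_summary(m1, v1, n1, m2, v2, n2, t_crit):
    """Two-sample comparison from summary statistics (Welch form).

    m*, v*, n* are each group's mean, sample variance, and size.
    t_crit is the two-sided critical t value for the chosen confidence
    level and the returned degrees of freedom; look it up in a t table
    or with Excel's T.INV.2T.
    """
    a, b = v1 / n1, v2 / n2
    se = math.sqrt(a + b)                          # standard error of the difference
    diff = m1 - m2
    t_stat = diff / se                             # test statistic for H0: mu1 = mu2
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    ci = (diff - t_crit * se, diff + t_crit * se)  # confidence interval for mu1 - mu2
    return {"diff": diff, "se": se, "t": t_stat, "df": df, "ci": ci}

# Placeholder summary statistics; NOT the real corruption-index values.
result = welch_summary(m1=30.0, v1=150.0, n1=24,
                       m2=45.0, v2=360.0, n2=127,
                       t_crit=2.0)                 # placeholder critical value
print(result)

# Classical decision rule: reject H0 when |t| exceeds the critical value.
print("reject H0" if abs(result["t"]) > 2.0 else "fail to reject H0")
```

The confidence interval and the hypothesis test use the same standard error, so the 95% interval excludes 0 exactly when the two-sided test rejects at α = 0.05.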
3. A “Happiness” statistic for different countries was compiled by the World Values Survey. The "Happiness (net)" statistic was calculated as the percentage of people who rated themselves as either "quite happy" or "very happy" minus the percentage of people who rated themselves as either "not very happy" or "not at all happy". We now want to know whether reading comprehension affects happiness. Conduct a regression analysis for predicting “Happiness” from “Reading Comprehension” by answering the following questions. You may use the regression tool in the Data Analysis ToolPak rather than performing the regression manually in the spreadsheet.
i. (2) State the dependent and independent variables and explain your selection.
ii. (2) Make a scatterplot of the variables and comment on the relationship between X and Y evident from the scatterplot.
iii. (5) Perform a linear regression on the dataset. What is the regression equation? What is the r²? What does the r² mean?
iv. (2) Add a regression line to the scatterplot.
v. (2) Test, at the 0.05 significance level, the null hypothesis that the slope is 0.
vi. (5) Show and comment on the residual plot. Are there any apparent violations of regression assumptions or outliers?
vii. (2) Calculate the expected net happiness of Hungary, which has a reading comprehension index of 470.
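The quantities asked for in parts iii, v, vi, and vii can be reproduced outside the spreadsheet as a check on the ToolPak output. The sketch below is a plain least-squares fit under the usual simple-linear-regression assumptions. The data are made-up placeholders rather than the actual survey values, and the function name `fit_line` is hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)           # unexplained variation
    r2 = 1 - sse / syy                             # share of variation explained
    se_slope = math.sqrt(sse / (n - 2) / sxx)      # standard error of the slope
    return {"slope": slope, "intercept": intercept, "r2": r2,
            "t_slope": slope / se_slope,           # compare with t critical, df = n - 2
            "df": n - 2, "residuals": residuals}

# Placeholder data; NOT the real reading-comprehension or happiness values.
reading = [420.0, 450.0, 480.0, 500.0, 520.0]
happiness = [50.0, 58.0, 60.0, 71.0, 75.0]

fit = fit_line(reading, happiness)
print(f"happiness = {fit['intercept']:.3f} + {fit['slope']:.4f} * reading")
print(f"r2 = {fit['r2']:.3f}, t for slope = {fit['t_slope']:.2f}, df = {fit['df']}")

# Part vii: point prediction at a reading comprehension index of 470.
print("predicted at 470:", fit["intercept"] + fit["slope"] * 470)
```

Plotting `fit["residuals"]` against `reading` gives the residual plot for part vi; a funnel shape or a curved pattern would suggest a violated assumption.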
13. (20) The Q12 worksheet lists a data set for investigating the relationship between body weight and brain weight in a selection of animals.
Show and comment on the scatterplot, discussing the apparent relationship between the two variables.
Conduct a regression analysis (using the Data Analysis ToolPak) for the two variables and report the results. In particular, comment on the strength of the relationship (i.e., the coefficient of determination), the regression coefficients, and their significance, and analyze the residual plot for violations of regression assumptions and for outliers.
Try to improve your initial regression by performing log transformations and/or eliminating outlier(s). After each change, perform a new regression analysis and report the results, making sure to discuss all of the points raised above. Pay particular attention to changes in the coefficient of determination and in the residual plot. After performing 2 regression analyses with transformed variables and/or removed outliers, compare your results and determine the best regression equation. Explain why you chose that particular result.
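The comparison between the raw and log-transformed regressions can be sketched as follows. This example assumes a base-10 log transformation of both variables, which is only one of the options the question allows. The weights are made-up placeholders, not the Q12 data, and the helper names `r_squared` and `log10_all` are hypothetical.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for a simple linear fit of ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)        # squared correlation = r2 for one predictor

def log10_all(values):
    """Base-10 logarithm of every value (all values must be positive)."""
    return [math.log10(v) for v in values]

# Placeholder weights; NOT the real Q12 data.
body_kg = [0.02, 0.3, 3.0, 60.0, 500.0, 3000.0]
brain_g = [0.4, 2.0, 25.0, 1300.0, 450.0, 5000.0]

r2_raw = r_squared(body_kg, brain_g)
r2_log = r_squared(log10_all(body_kg), log10_all(brain_g))
print(f"raw r2 = {r2_raw:.3f}, log-log r2 = {r2_log:.3f}")
```

An r² computed on log-transformed brain weight measures fit on the log scale, so it should be read together with the residual plot rather than as the only basis for choosing the best equation.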