Statistics

1. The “Q3” tab in the midterm Excel data sheet contains data on the maximum August temperatures for Denver between 1958 and 2017. Calculate the mean and the standard deviation for the entire sample, and then separately for 1958-1987 and 1988-2017. Give the equations used to calculate the values.
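
A minimal Python sketch of the two formulas, assuming the Q3 tab has been exported into two parallel lists, `years` and `temps` (hypothetical names). The sample mean is x̄ = (1/n) Σ xᵢ and the sample standard deviation is s = √[Σ (xᵢ − x̄)² / (n − 1)], the same quantities Excel's AVERAGE and STDEV.S return. Using the n − 1 (sample) denominator rather than n is an assumption about what the course expects.

```python
import math


def sample_mean(xs):
    """Sample mean: x_bar = (1/n) * sum of x_i."""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """Sample standard deviation: sqrt(sum((x_i - x_bar)^2) / (n - 1))."""
    x_bar = sample_mean(xs)
    squared_dev = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_dev / (len(xs) - 1))


def summarize_by_period(years, temps, periods):
    """Return {label: (mean, sd)} for each (label, first_year, last_year) period."""
    summary = {}
    for label, first, last in periods:
        # Keep only the temperatures whose year falls inside the period (inclusive).
        subset = [t for y, t in zip(years, temps) if first <= y <= last]
        summary[label] = (sample_mean(subset), sample_sd(subset))
    return summary


# The three periods named in the question; years/temps would come from the Q3 tab.
PERIODS = [
    ("1958-2017", 1958, 2017),
    ("1958-1987", 1958, 1987),
    ("1988-2017", 1988, 2017),
]
```

For example, `summarize_by_period(years, temps, PERIODS)` would return one (mean, standard deviation) pair per period.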

2. Transparency International compiles an index of corruption for over 150 world nations. The data for these nations for 2018 are provided in the midterm Excel file. The 24 nations for which oil and gas make up more than 50% of export revenues are separated from the other 127 nations. Note that higher values in the index indicate lower levels of corruption.

i. (4) Find the means and variances of the corruption index for each of the two groups of countries.

ii. (8) Construct a 95% confidence interval for the true difference in mean corruption index between the two groups.

iii. (8) Using a t-statistic, determine whether the difference in mean corruption between the two groups of countries is statistically significant at the 95% confidence level (α = 0.05). Give your answer in the form of a classical hypothesis test.
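
One way to carry out parts i to iii, sketched in Python under stated assumptions: the two groups are available as lists (hypothetical names `oil_exporters` and `other_nations`), variances use the n − 1 denominator (Excel's VAR.S), and the interval and test use Welch's unequal-variance standard error, SE = √(s₁²/n₁ + s₂²/n₂), with Welch-Satterthwaite degrees of freedom. A pooled-variance t procedure is a common alternative if equal variances are assumed. The two-sided 95% critical value `t_crit` is passed in rather than computed, for example from Excel's `=T.INV.2T(0.05, df)`.

```python
import math


def mean_and_variance(xs):
    """Sample mean and sample variance (n - 1 denominator, like Excel's VAR.S)."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def welch_summary(group1, group2):
    """Difference in means, its Welch standard error, and Welch-Satterthwaite df."""
    m1, v1 = mean_and_variance(group1)
    m2, v2 = mean_and_variance(group2)
    a, b = v1 / len(group1), v2 / len(group2)
    se = math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(group1) - 1) + b ** 2 / (len(group2) - 1))
    return m1 - m2, se, df


def confidence_interval(diff, se, t_crit):
    """Two-sided interval diff -/+ t_crit * se; t_crit is supplied by the caller."""
    return diff - t_crit * se, diff + t_crit * se


def t_test(diff, se, t_crit):
    """Classical test of H0: mu1 = mu2 vs H1: mu1 != mu2; returns (t, decision)."""
    t = diff / se
    decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
    return t, decision


# Example usage (hypothetical inputs):
# diff, se, df = welch_summary(oil_exporters, other_nations)
# t_crit: two-sided 95% value for df, e.g. Excel =T.INV.2T(0.05, df)
# print(confidence_interval(diff, se, t_crit), t_test(diff, se, t_crit))
```

The hypotheses are H₀: μ_oil = μ_other versus H₁: μ_oil ≠ μ_other. With the same standard error and critical value, rejecting H₀ at α = 0.05 corresponds to the 95% confidence interval excluding 0.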

3. A “Happiness” statistic for different countries was compiled by the World Values Survey. The “Happiness (net)” statistic was calculated as the percentage of people who rated themselves as either “quite happy” or “very happy” minus the percentage of people who rated themselves as either “not very happy” or “not at all happy”. Now we want to know whether reading comprehension affects happiness. Conduct a regression analysis for predicting “Happiness” from “Reading Comprehension” by answering the following questions. You can use the Regression tool in Excel's Analysis ToolPak rather than performing the regression manually in the spreadsheet.
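
Written as a formula, with each p a percentage of respondents (the symbol names are introduced here only for illustration), the net happiness statistic is:

```latex
H_{\text{net}} = \left(p_{\text{very happy}} + p_{\text{quite happy}}\right) - \left(p_{\text{not very happy}} + p_{\text{not at all happy}}\right)
```

Because each bracket lies between 0 and 100, H_net can range from −100 to +100.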

i. (2) State the dependent and independent variables and explain your selection.

ii. (2) Make a scatterplot of the variables and comment on the relationship between X and Y evident from the scatterplot.

iii. (5) Perform a linear regression on the dataset. What is the regression equation? What is the r²? What does the r² mean?

iv. (2) Add a regression line to the scatterplot.

v. (2) Test, at the 0.05 level of significance, whether the slope is 0.

vi. (5) Show and comment on the residual plot. Are there any apparent violations of regression assumptions or outliers?

vii. (2) Calculate the expected net happiness of Hungary, which has a reading comprehension index of 470.
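
A compact Python sketch of the calculations behind parts iii, v, vi and vii, assuming the data are in two lists with x = reading comprehension and y = net happiness (list names hypothetical). It uses the least-squares formulas b₁ = Sxy/Sxx and b₀ = ȳ − b₁x̄, r² = 1 − SSE/SST, and the slope test t = b₁/SE(b₁) with n − 2 degrees of freedom; these should agree with the Coefficients, R Square and t Stat entries of the Analysis ToolPak regression output. Flagging residuals larger than two standard errors of the estimate is a rule-of-thumb assumption, not part of the assignment.

```python
import math


def fit_line(xs, ys):
    """Least squares y = b0 + b1*x. Assumes n >= 3 and points not exactly collinear."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((y - y_bar) ** 2 for y in ys)
    s = math.sqrt(sse / (n - 2))  # standard error of the estimate
    se_b1 = s / math.sqrt(sxx)    # standard error of the slope
    return {
        "b0": b0,
        "b1": b1,
        "r2": 1 - sse / sst,
        "t_slope": b1 / se_b1,    # compare |t| with T.INV.2T(0.05, n - 2)
        "df": n - 2,
        "residuals": residuals,   # plot these against x for the residual plot
        # Rule-of-thumb (assumption): |residual| more than 2 standard errors.
        "flagged": [i for i, e in enumerate(residuals) if abs(e) / s > 2],
    }


def predict(fit, x):
    """Point prediction y_hat = b0 + b1 * x."""
    return fit["b0"] + fit["b1"] * x
```

For Hungary, `predict(fit, 470)` evaluates ŷ = b₀ + b₁ · 470. The prediction is only meaningful if 470 lies within the range of reading-comprehension scores in the data.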

13. (20) The Q12 tab of the worksheet lists a data set that you want to investigate for the relationship between body weight and brain weight in a sample of animals.

Show and comment on the scatterplot, discussing the apparent relationship between the two variables.

Conduct a regression analysis (using the Analysis ToolPak) for the two variables and report the results. In particular, comment on the strength of the relationship (i.e., the coefficient of determination), the regression coefficients and their significance, and analyze the residual plot for violations of regression assumptions and for outliers.

Try to improve your initial regression by performing log transformations and/or eliminating outlier(s). After each change, perform a new regression analysis and report the results, making sure to discuss all of the points raised above. Pay particular attention to changes in the coefficient of determination and in the residual plot. After performing two regression analyses with transformed variables and/or removed outliers, compare your results and determine the best regression equation. Explain why you chose that particular result.
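
A sketch of the comparison, assuming the Q12 data are loaded into two lists of positive weights, `body` and `brain` (hypothetical names), and that outliers are removed by row index after inspecting the residual plot. It fits the untransformed model and a log₁₀-log₁₀ model and reports (b₀, b₁, r²) for each; base 10 is an arbitrary choice, and natural logs give the same r². Because r² for log(brain) and for brain measures explained variation on different scales, the residual plots should be compared alongside r² when choosing the final equation.

```python
import math


def line_fit(xs, ys):
    """Least-squares intercept, slope and r^2 for y = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return b0, b1, 1 - sse / sst


def drop_points(xs, ys, indices):
    """Copies of xs and ys with the listed row indices removed (outlier removal)."""
    skip = set(indices)
    kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in skip]
    return [x for x, _ in kept], [y for _, y in kept]


def compare_models(body, brain, outliers=()):
    """Fit raw and log10-log10 models, optionally after dropping outlier rows."""
    body, brain = drop_points(body, brain, outliers)
    log_body = [math.log10(v) for v in body]    # requires all weights > 0
    log_brain = [math.log10(v) for v in brain]
    return {"raw": line_fit(body, brain), "log-log": line_fit(log_body, log_brain)}
```

For example, `compare_models(body, brain, outliers=[i])` refits both models without row `i`.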
