CORRELATION AND SIMPLE REGRESSION


Name ______________________________ Signature ___________________________

QSO 510 Quantitative Analysis for Decision Making

Winter 2012 Mid-term Exam - Instructor: Dr. Derek Kane

Instructions

1. This exam is taken in class. You may use any books you care to, but only your own notes.

2. The exam only requires the use of a hand calculator. You may use notes you have taken on the

3. Answer all questions in the context of the problem. General answers are not expected.

4. You must show all steps including formulas used and all calculations done to arrive at the final answers. Incomplete solutions will receive partial credit.

5. Use at least four significant digits at all intermediate steps. Round off the final answers appropriately. Note: 0.0042 is only two significant digits as leading zeros are not considered significant. Trailing zeros are considered significant.

6. You only need to do four problems. If you do all five problems, indicate which four you want me to grade.

7. You are welcome to ask questions you have on the problems. Please do not ask any questions relating to the solution of any problem.

8. You must work on the exam by yourself.

(For Instructor’s use)

| Problem | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| Points |  |  |  |  |  |  |

Problem 1 (25 points)

You have an assembly line which produces 5kg bags of flour with a standard deviation of 0.5kg.

a) Assuming the distribution of weight is normal, what is the chance any single bag’s weight is between 4.8kg and 5.2kg?

b) If you chose 60 bags at random, what would be the expected average weight of the bags in your sample? What would the standard deviation of the sample average be? What is the shape of this distribution? Give reasons for your answers.

c) If you pulled a sample of 64 bags, what is the chance that you would find the average weight was less than 4.95kg?

Write your answers in the space below and continue on the next page.

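a) We have $\mu = 5$ kg and $\sigma = 0.5$ kg.

Calculate the z-scores:

$z_1 = \frac{4.8 - 5}{0.5} = -0.4$ and $z_2 = \frac{5.2 - 5}{0.5} = 0.4$

Look up the probabilities and finish:

$P(4.8 < X < 5.2) = P(Z < 0.4) - P(Z < -0.4) = 0.6554 - 0.3446 = 0.3108$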

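b) The mean is still 5 kg. The standard deviation of the sample average is

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.5}{\sqrt{60}} = 0.06455$ kg

The shape is Gaussian: the bag weights are assumed to be normal, and because we have more than 30 items in the sample the Central Limit Theorem applies in any case.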

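c) We have $n = 64$, so $\sigma_{\bar{x}} = \frac{0.5}{\sqrt{64}} = 0.0625$ kg.

Calculate the z-score:

$z = \frac{4.95 - 5}{0.0625} = -0.8$

Look up the probability and finish:

$P(\bar{X} < 4.95) = P(Z < -0.8) = 0.2119$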
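The three parts of Problem 1 can be checked numerically. The sketch below is only an illustration: it replaces the printed z-table with the error function from Python's standard library, and the helper name `normal_cdf` is an assumption introduced here, not something from the exam.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 5.0, 0.5            # population mean and standard deviation (kg)

# (a) probability that one bag weighs between 4.8 kg and 5.2 kg
z_low = (4.8 - mu) / sigma      # about -0.4
z_high = (5.2 - mu) / sigma     # about  0.4
p_single = normal_cdf(z_high) - normal_cdf(z_low)

# (b) standard deviation of the sample average for n = 60
se_60 = sigma / math.sqrt(60)

# (c) probability that the average of 64 bags is below 4.95 kg
se_64 = sigma / math.sqrt(64)   # 0.0625
z_mean = (4.95 - mu) / se_64    # about -0.8
p_mean = normal_cdf(z_mean)

print(round(p_single, 4), round(se_60, 5), round(p_mean, 4))
```

The printed values should agree with a z-table lookup to about four decimal places.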

Problem 2 (25 points)

We want to look at US attitudes toward the federal government. Suppose that we sample 1011 US citizens about whether they are satisfied with the size and power of the federal government and find that 293 are satisfied with the size and power of the federal government.

a) Compute the sample proportion and sample standard deviation of people who are satisfied with the federal government’s size and power.

b) Construct a 95% confidence interval for the proportion of people who are satisfied with the federal government’s size and power.

c) If you are working for CNN, how would you explain the confidence interval to your audience? Does the confidence interval seem small enough? If not, how would you make the interval smaller? Provide a clear and complete answer.

Begin your answer below and continue on the next page.

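a) The sample proportion is:

$\hat{p} = \frac{293}{1011} = 0.2898$

The standard deviation of the sample proportion is

$s_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.2898 \times 0.7102}{1011}} = 0.01427$

b) The 95% confidence interval is

$\hat{p} \pm 1.96\, s_{\hat{p}} = 0.2898 \pm 0.0280$, that is, from 0.2618 to 0.3178.

c) We are 95% confident that the true proportion of the population satisfied with the federal government’s size and power is in this range. Make the interval smaller by using a larger sample.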
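As a rough check of Problem 2, the following sketch computes the sample proportion, its standard error, and the 95% interval with the usual normal approximation. The last lines illustrate the "larger sample" remark; the target margin of 0.02 is a hypothetical value chosen for illustration, not part of the exam.

```python
import math

n, satisfied = 1011, 293
p_hat = satisfied / n                        # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of p_hat

z_crit = 1.96                                # two-sided 95% critical value
margin = z_crit * se
lower, upper = p_hat - margin, p_hat + margin

# Hypothetical target: how many respondents for a margin of +/- 0.02?
target_margin = 0.02
n_needed = math.ceil(z_crit ** 2 * p_hat * (1 - p_hat) / target_margin ** 2)

print(round(p_hat, 4), round(se, 5), round(lower, 4), round(upper, 4), n_needed)
```

Because the margin shrinks with the square root of the sample size, roughly halving it requires about four times as many respondents.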

Problem 3 (25 points)

According to a recent Gallup Poll 63% of all US adults are dissatisfied with the size and influence of major corporations. You perform a survey of 100 US citizens who identify themselves as political independents and find that 71 are dissatisfied with the size and influence of major corporations. Perform a hypothesis test to determine whether your survey indicates that political independents have a different view of corporate influence than the general US population.

Begin your answer below and continue on the next page.

Step 1:

H0: $p = 0.63$

Ha: $p \neq 0.63$

Step 2:

z-test because it is a population proportion problem.

Step 3:

$\alpha = 0.05$

Step 4:

Threshold is (95% confidence, two-sided test):

Accept Ha if z-score < -1.96

Accept H0 if -1.96 ≤ z-score ≤ 1.96

Accept Ha if z-score > 1.96

Step 5:

$n = 100$, $\hat{p} = \frac{71}{100} = 0.71$

Step 6:

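Calculate the z-score:

$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.71 - 0.63}{\sqrt{0.63 \times 0.37/100}} = \frac{0.08}{0.04828} = 1.657$

Since $-1.96 \le 1.657 \le 1.96$, accept the null hypothesis, H0.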

Step 7:

We don’t have statistical evidence indicating that political independents differ from the rest of the population.
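The test in Problem 3 can be reproduced with a few lines of Python. This is a sketch under the same normal approximation as the z-test; the p-value is an extra that the exam answer does not report, and `normal_cdf` is a helper name introduced here for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p0 = 0.63                  # hypothesized population proportion (Gallup figure)
n, dissatisfied = 100, 71
p_hat = dissatisfied / n

# Under H0 the standard error uses p0, not p_hat
se0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

z_crit = 1.96              # two-sided test at the 0.05 level
reject_h0 = abs(z) > z_crit
p_value = 2 * (1 - normal_cdf(abs(z)))

print(round(z, 3), reject_h0, round(p_value, 4))
```

The z-score falls inside the band from -1.96 to 1.96, which matches the decision to accept H0.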

Problem 4 (25 points)

The following table shows the life expectancy for 5 countries and the logarithm to the base 10 (log10) of income in these same countries.

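| Log10 income | Life Expectancy | SSXX | SSXY | SSYY |
|---|---|---|---|---|
| 2.88 | 48.5 | 0.813604 | 16.70504 | 342.9904 |
| 3.17 | 53.5 | 0.374544 | 8.27424 | 182.7904 |
| 3.78 | 75.9 | 4E-06 | -0.01776 | 78.8544 |
| 4.40 | 80.3 | 0.381924 | 8.20704 | 176.3584 |
| 4.68 | 76.9 | 0.806404 | 8.87224 | 97.6144 |
| 3.782 | 67.02 | 2.37648 | 42.0408 | 878.608 |

The last row gives the means of the first two columns and the sums of the last three.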

(a) Compute the values of $\bar{x}$, $\bar{y}$, SSXX, SSYY, SSXY.

(b) Determine the regression equation using life expectancy as the dependent variable.

(c) Compute the standard error of the estimate se.

(d) Determine a 95% confidence interval for the life expectancy of a country which has a log10 income of 3.0.

Begin your answer in the space below and continue on the next page

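b) The regression equation is given by:

$b_1 = \frac{SSXY}{SSXX} = \frac{42.0408}{2.37648} = 17.6904$

$b_0 = \bar{y} - b_1 \bar{x} = 67.02 - 17.6904 \times 3.782 = 0.115$

So the equation is

$\hat{y} = 0.115 + 17.69x$, where $x$ is log10 income and $\hat{y}$ is the predicted life expectancy.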
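The sums of squares in the table and the least-squares coefficients can be recomputed directly from the five data points. The sketch below uses only the standard formulas $b_1 = SSXY/SSXX$ and $b_0 = \bar{y} - b_1\bar{x}$; the variable names are chosen here for illustration.

```python
# Data from the Problem 4 table
x = [2.88, 3.17, 3.78, 4.40, 4.68]      # log10 income
y = [48.5, 53.5, 75.9, 80.3, 76.9]      # life expectancy

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and cross-products
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

print(round(x_bar, 3), round(y_bar, 2))
print(round(ss_xx, 5), round(ss_xy, 4), round(ss_yy, 3))
print(round(b0, 4), round(b1, 4))
```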

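c) The standard error is:

$s_e = \sqrt{\frac{SSE}{n - 2}}$, with $SSE = SSYY - b_1 \cdot SSXY = 878.608 - 17.6904 \times 42.0408 = 134.89$

$s_e = \sqrt{\frac{134.89}{3}} = 6.705$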

Parts (c) and (d) are not in the upcoming midterm.

d) The confidence interval is

We have:

x0 = 3.0

n = 5

se = 6.705

t in the 0.025 column for 3 degrees of freedom is 3.1824

Putting it together is:
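One way to put these pieces together is sketched below. It assumes the question asks for the interval for the mean life expectancy at $x_0 = 3.0$, which uses $\hat{y} \pm t \cdot s_e \sqrt{1/n + (x_0 - \bar{x})^2 / SSXX}$. If an interval for a single country's value were intended instead, a 1 would be added under the square root. The summary values are taken from the Problem 4 table and the t value from the line above.

```python
import math

# Summary values from the Problem 4 table
n = 5
x_bar, y_bar = 3.782, 67.02
ss_xx, ss_xy, ss_yy = 2.37648, 42.0408, 878.608

# Regression coefficients
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate
sse = ss_yy - b1 * ss_xy
s_e = math.sqrt(sse / (n - 2))

# 95% interval for the mean response at x0 (assumed interpretation)
x0 = 3.0
t_crit = 3.1824                       # t table, 0.025 column, 3 degrees of freedom
y_hat = b0 + b1 * x0
margin = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_xx)

print(round(s_e, 3), round(y_hat, 2), round(y_hat - margin, 2), round(y_hat + margin, 2))
```

With only five countries the interval is wide, which is what the large t value for 3 degrees of freedom reflects.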

Problem 5 (25 points)

The following is a portion of the data collected to investigate the variables affecting meat consumption. The data includes whether the person is a man (=1) or woman (=0), their age, their average income, their weight, and whether they live in an urban area (=1):

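| Man | age | Income ($1000) | Weight (lbs) | urban | Meat consumption |
|---|---|---|---|---|---|
| 0 | 16 | 67.86 | 145.8 | 0 | 282.8 |
| 0 | 41 | 30.03 | 188.2 | 1 | 260.7 |
| 1 | 45 | -14.43 | 111.2 | 1 | 236.6 |
| 0 | 55 | 72.91 | 144.7 | 1 | 238.5 |
| 0 | 38 | 53.46 | 161.1 | 1 | 278.6 |
| 1 | 68 | 61.19 | 124.0 | 0 | 262.2 |
| 1 | 45 | 80.63 | 237.9 | 1 | 364.5 |
| 0 | 35 | 22.22 | 121.2 | 1 | 221.7 |
| 0 | 60 | 68.32 | 219.6 | 0 | 278.4 |
| 1 | 47 | 81.75 | 190.3 | 1 | 362.0 |
| 0 | 17 | 51.44 | 161.6 | 1 | 286.3 |
| 1 | 48 | 21.21 | 145.8 | 0 | 297.4 |
| 1 | 55 | 40.85 | 119.9 | 1 | 292.6 |
| 1 | 62 | 86.29 | 157.4 | 0 | 315.6 |
| 1 | 65 | 77.00 | 161.6 | 1 | 315.7 |
| 0 | 34 | -18.17 | 188.4 | 1 | 211.8 |
| 1 | 39 | 49.47 | 186.9 | 1 | 322.4 |
| 0 | 20 | 3.18 | 144.1 | 1 | 235.7 |
| 0 | 58 | 14.17 | 158.3 | 1 | 228.3 |
| 0 | 30 | 46.30 | 165.4 | 1 | 279.8 |
| 0 | 20 | 32.66 | 128.6 | 1 | 229.9 |
| 0 | 31 | 32.66 | 121.3 | 1 | 250.3 |
| 1 | 43 | 96.00 | 161.2 | 0 | 327.8 |
| 0 | 41 | 58.27 | 207.9 | 0 | 292.6 |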

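Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.9411 |
| R Square | 0.8857 |
| Adjusted R Square | 0.8796 |
| Standard Error | 14.72 |
| Observations | 100 |

ANOVA

|  | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 157833 | 31566.64 | 145.70 | 1.12E-42 |
| Residual | 94 | 20366 | 216.66 |  |  |
| Total | 99 | 178199 |  |  |  |

|  | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 200.55 | 9.01 | 22.26 | 2.97E-39 | 182.66 | 218.44 |
| Man | 56.91 | 3.07 | 18.53 | 3.82E-33 | 50.82 | 63.01 |
| age | -0.97 | 0.09 | -10.69 | 6.32E-18 | -1.15 | -0.79 |
| income | 0.83 | 0.05 | 15.53 | 1.09E-27 | 0.72 | 0.93 |
| weight | 0.34 | 0.04 | 8.31 | 7.15E-13 | 0.26 | 0.42 |
| urban | 4.41 | 3.26 | 1.35 | 1.80E-01 | -2.07 | 10.88 |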

Answer the following questions based on the Excel output report. Support your answers with numbers from the output report. Use level of significance = 0.05.

a) Write the estimated multiple regression equation. Note: Use actual variable names and numbers. If using symbols, define them before using in the equation.
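Reading the coefficients column of the Excel output, the estimated equation has the form Meat consumption = 200.55 + 56.91 (Man) - 0.97 (age) + 0.83 (income) + 0.34 (weight) + 0.41 + 4.00 (urban) is not what the output says; the urban coefficient is simply 4.41, so the last term is 4.41 (urban). The sketch below encodes that equation as a small function and evaluates it for the first row of the data table; the function name `predict_meat` is hypothetical and used only for illustration.

```python
# Coefficients copied from the Excel output
COEF = {
    "intercept": 200.55,
    "man": 56.91,
    "age": -0.97,
    "income": 0.83,    # income is measured in $1000
    "weight": 0.34,    # weight is measured in lbs
    "urban": 4.41,
}

def predict_meat(man, age, income, weight, urban):
    """Predicted meat consumption from the fitted regression equation."""
    return (COEF["intercept"]
            + COEF["man"] * man
            + COEF["age"] * age
            + COEF["income"] * income
            + COEF["weight"] * weight
            + COEF["urban"] * urban)

# First row of the data table: woman, age 16, income 67.86, weight 145.8, not urban
first_row = predict_meat(man=0, age=16, income=67.86, weight=145.8, urban=0)
print(round(first_row, 2))
```

The prediction for that row can be compared with the observed meat consumption of 282.8 in the table.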

b) Clearly explain the meaning of b1 (the coefficient of Man). Note: Use actual variable names and numbers in answering your question. b1 is the slope is not a sufficient answer.

If the person is a man, they consume an estimated additional 56.91kg of meat, holding the other variables constant.

c) Clearly explain the meaning of b2 (the coefficient of age). Note: Use actual variable names and numbers in answering your question. b2 is the slope is not a sufficient answer.

For every additional year of age, the person eats an estimated 0.97kg less meat, holding the other variables constant.

d) Clearly explain the meaning of b3 (the coefficient of income). Note: Use actual variable names and numbers in answering your question. b3 is the slope is not a sufficient answer.

For every additional $1000 in income, the person eats an estimated additional 0.83kg of meat, holding the other variables constant.

e) Clearly explain the meaning of b4 (the coefficient of weight). Note: Use actual variable names and numbers in answering your question. b4 is the slope is not a sufficient answer.

For every additional pound of weight, the person eats an estimated additional 0.34kg of meat, holding the other variables constant.

f) Clearly explain the meaning of b5 (the coefficient of urban). Note: Use actual variable names and numbers in answering the question. b5 is the slope is not sufficient.

If the person lives in an urban area, they eat an estimated 4.41kg more meat, holding the other variables constant.

g) Is the regression equation significant? Give reasons for your answer. (Hint: The answer to this question requires a test of the hypothesis H0: β1 = β2 = β3 = β4 = β5 = 0 vs. Ha: at least one βj is not equal to zero, where j = 1, ..., 5.)

The regression equation is significant: the F statistic is 145.70 with Significance F = 1.12E-42, which is far below 0.05, so we reject H0.

h) Which variables in the current equation are significant and which are not significant? Give reasons for your answer. (Hint: The answer to this question requires a test of the hypothesis H0: βj = 0 vs. Ha: βj ≠ 0 for j = 1, ..., 5.)

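Every variable but urban is significant. Man, age, income, and weight have p-values (3.82E-33, 6.32E-18, 1.09E-27, and 7.15E-13) below 0.05. Urban has a p-value of 0.180, which is above 0.05, and its 95% confidence interval (-2.07 to 10.88) contains zero.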
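The decisions in parts (g) and (h) amount to comparing values from the Excel output with the 0.05 level of significance. The sketch below does that comparison and also recomputes F and R Square from the ANOVA sums of squares as a consistency check. All numbers are copied from the output; the variable names are chosen here for illustration.

```python
alpha = 0.05

# Overall test (part g): compare Significance F with alpha
significance_f = 1.12e-42
model_significant = significance_f < alpha

# Individual tests (part h): compare each p-value with alpha
p_values = {
    "Man": 3.82e-33,
    "age": 6.32e-18,
    "income": 1.09e-27,
    "weight": 7.15e-13,
    "urban": 1.80e-01,
}
significant = [name for name, p in p_values.items() if p < alpha]
not_significant = [name for name, p in p_values.items() if p >= alpha]

# Consistency check: F = MS regression / MS residual, R^2 = SSR / SST
ss_reg, df_reg = 157833, 5
ss_res, df_res = 20366, 94
f_stat = (ss_reg / df_reg) / (ss_res / df_res)
r_square = ss_reg / (ss_reg + ss_res)

print(model_significant, significant, not_significant)
print(round(f_stat, 2), round(r_square, 4))
```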

i) Is there anything in this model you find questionable?

Begin your answer in the space below and continue on the next page
