BUS 305: SOLUTIONS TO

PRACTICE PROBLEMS EXAM 2

1) B

2) B

3) No, fan pattern (heteroscedasticity)

4) No, nonlinear relationship between X and Y

5) The black line is the regression line because it gets closest to the sample points: it minimizes the total squared vertical distance (error) between the points and the line. The red line has a larger total error; that is, the points lie farther from it overall.
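The "closest line" idea in answer 5 can be checked numerically. The sketch below uses made-up sample points (hypothetical, not the exam data), computes the least-squares line with the usual slope and intercept formulas, and compares its sum of squared errors with that of an arbitrary competing line.

```python
# Hypothetical sample points (NOT the exam data).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(b0, b1):
    """Sum of squared vertical distances from the points to the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

sse_fit = sse(b0, b1)      # plays the role of the "black line"
sse_other = sse(1.0, 1.5)  # an arbitrary competing "red line" (assumed values)
print(sse_fit < sse_other)
```

Any other choice of intercept and slope in `sse(...)` should give a total squared error at least as large as `sse_fit`.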

6) It is reasonable to suppose that costs depend on production volume (producing units directly results in costs). Regression is therefore more appropriate for this data, since regression is appropriate when a cause-and-effect relationship is assumed.

7) C

8) a) r = 0.8;

b) T = 1.31;

c) p = 0.117

d) There is no evidence of a significant correlation between X and Y in the population because we did not reject the null hypothesis H0: ρ = 0.

9) Note: the following are not complete answers to Question 9; they are just enough for you to know whether your short answer addressed the correct things.

a) β1 = population slope, b1 = sample slope. On the exam, you would also want to address what you know (or don’t know) about each of these and how each is found.

b) An outlier can “drag” the regression line toward it. On the exam, also think about how this would affect the quality of your regression model and the predictions.

10) Yes, there appears to be a straight line relationship between the variables. Linear regression appears to be appropriate. The regression output is:

11) a) T = -0.09, p = 0.929, do not reject Ho, conclude there is no evidence of a relationship

b) R² = 0.002 = 0.2%. No, because the value is very close to zero.

c) Correlation = r = -0.0421. No, there is not a strong relationship between these variables. The correlation is nearly 0.
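Parts (b) and (c) are consistent with each other: in simple regression, R-squared is the square of the correlation. A quick check using the value of r reported above:

```python
r = -0.0421          # sample correlation from part (c)
r_squared = r ** 2   # in simple (one-X) regression, R-squared equals r squared
print(round(r_squared, 3))  # 0.002
```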

d) Regression line is Y^ = 1.26 – 0.035X.

Y^ = 1.26 – 0.035(100) = 1.26 – 3.5 = -2.24. No, this does not make sense because you cannot have a negative number of near misses. It is not wise to predict with this model. The R-squared value is extremely low (essentially 0%), which means that there is essentially no linear relationship between near misses and flights in this data. Therefore, predicting misses from flights is meaningless.
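The prediction arithmetic in part (d) can be reproduced in a few lines. The function name `predict` is only illustrative; the intercept and slope are the ones reported above.

```python
b0, b1 = 1.26, -0.035   # intercept and slope from part (d)

def predict(x):
    """Predicted near misses for x flights, using the fitted line."""
    return b0 + b1 * x

y_hat = predict(100)
print(round(y_hat, 2))  # -2.24
```

A negative result is the signal that the fitted line should not be used for prediction here.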

e) b1 = -0.035. As the number of flights increases by 1, we expect the number of near misses to go down by 0.035. Put another way, as flights increase by 1000, we expect the number of near misses to go down by 35. No, this does not make sense: we would assume that as flights increase, so would near misses.

12) a. Multiple regression is a direct extension of simple regression, except that now we have more than one independent (X) variable.

b. Note: the following is not a complete answer; it is just enough for you to know whether your short answer addressed the correct things: Multicollinearity is when the independent variables are highly correlated with one another. On the exam, also indicate how this affects the model, how one can identify if it is present, and what can be done to correct it.

c. Dummy variables are used to incorporate categorical variables into a regression model. A dummy variable is added that is “1” if the person/item has the characteristic and “0” if it does not.
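As a minimal illustration of part (c), the sketch below turns a categorical column into a 0/1 dummy variable. The category values are made up for illustration and do not come from the exam.

```python
# Hypothetical records: the "region" values are invented for illustration.
regions = ["urban", "rural", "urban", "rural", "urban"]

# Dummy variable: 1 if the record has the characteristic (urban), 0 if it does not.
is_urban = [1 if region == "urban" else 0 for region in regions]
print(is_urban)  # [1, 0, 1, 0, 1]
```

The resulting 0/1 column can then be used as an ordinary X variable in the regression.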

13) B

14)

15) a) Since the p-value associated with the F-statistic is very small (note: 2.45E-10 means to move the decimal point 10 places to the LEFT, i.e. 0.000000000245), we would reject the null hypothesis that none of the independent variables (Orig_Price and MSRP) has an effect on price. Therefore, we conclude that at least one of these X variables does have an effect on, or relationship with, price.

b) Orig_Price does affect Price, since p = 1.031E-09 = 0.000000001031 < 0.01, reject Ho: β1 = 0

MSRP does NOT, since p = 0.475 > 0.10, do not reject Ho: β2 = 0
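The scientific-notation p-values in parts (a) and (b) can be expanded and compared against a significance level as follows. This is only a sketch of the conversion; the 0.01 threshold is the one used in part (b).

```python
# P-values as Excel displays them in the regression output.
p_f_test = float("2.45E-10")       # Significance F, part (a)
p_orig_price = float("1.031E-09")  # Orig_Price, part (b)
p_msrp = 0.475                     # MSRP, part (b)

print(f"{p_f_test:.12f}")      # 0.000000000245
print(f"{p_orig_price:.12f}")  # 0.000000001031

alpha = 0.01
print(p_f_test < alpha, p_orig_price < alpha, p_msrp < alpha)  # True True False
```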

c) Regression equation: Y^ = -7.62 + 1.01X1 – 0.08X2; prediction: 65.18

d) MSRP -0.08, Orig_Price 1.01

e) R-squared = 0.866. This is a good model because R-squared is close to 1 (100%), so I would feel fairly confident that my predictions would be accurate in this case.

16) Model 1: The first model run states that MPG is a linear function of: EngineSize, CabSpace, HorsePower, TopSpeed, and Weight. When that model is run, we find:

· R-square = 0.873

· Adjusted r-square = 0.865

· Significant variables: Horsepower, TopSpeed, Weight

· Insignificant variables: EngineSize, CabSpace

Because we have two insignificant variables, take them out.

Model 2: This model states that MPG is a linear function of HorsePower, TopSpeed, and Weight. We find that:

· R-square = 0.873

· Adjusted r-square = 0.868

· Significant variables: Horsepower, TopSpeed, Weight

· Insignificant variables: none

Taking out EngineSize and CabSpace did not change the R-squared value at all, and adjusted R-squared rose slightly (from 0.865 to 0.868). Apparently, EngineSize and CabSpace did not explain any additional variation in MPG, so removing them clearly results in a better model (simpler with no loss of explanatory power). Since all of the remaining independent variables are significant, we take this as the best model (removing any more would surely decrease R-squared).
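The adjusted R-squared values for the two models can be reproduced from R-squared, the number of observations, and the number of X variables. The sketch below uses the standard adjusted R-squared formula; n = 82 is taken from the Model 1 summary output, and it is an assumption here that Model 2 was fit on the same 82 observations.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 82  # observations from the Model 1 output (assumed unchanged for Model 2)

model1 = adjusted_r2(0.873, n, k=5)  # EngineSize, CabSpace, HorsePower, TopSpeed, Weight
model2 = adjusted_r2(0.873, n, k=3)  # HorsePower, TopSpeed, Weight
print(round(model1, 3), round(model2, 3))  # 0.865 0.868
```

With R-squared unchanged, dropping two variables reduces the penalty for model size, which is why the adjusted value goes up.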

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9583
R Square             0.9183
Adjusted R Square    0.9020
Standard Error       4.1442
Observations         7

ANOVA
              df    SS          MS         F         Significance F
Regression    1     965.556     965.556    56.221    0.00067
Residual      5     85.872      17.174
Total         6     1051.429

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     162.7007       6.3854           25.4802    0.0000    146.2865    179.1148
ProdVolume    -1.4570        0.1943           -7.4980    0.0007    -1.9565     -0.9575
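The pieces of this output are tied together by simple arithmetic, which gives a way to sanity-check the table. The sketch below recomputes R Square, F, and the slope's t Stat from the sums of squares, degrees of freedom, coefficient, and standard error printed above; small differences from the printed values are rounding.

```python
# Values copied from the ANOVA and coefficient rows of the ProdVolume output.
ss_regression, ss_residual = 965.556, 85.872
df_regression, df_residual = 1, 5

ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total                     # about 0.9183
ms_residual = ss_residual / df_residual                  # about 17.174
f_stat = (ss_regression / df_regression) / ms_residual   # about 56.22

slope, slope_se = -1.4570, 0.1943
t_stat = slope / slope_se                                # about -7.50

print(round(r_squared, 4), round(f_stat, 2), round(t_stat, 2))
```

With a single X variable, the square of the slope's t Stat is approximately equal to F, which is another quick consistency check.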

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9346
R Square             0.8734
Adjusted R Square    0.8651
Standard Error       3.6750
Observations         82

ANOVA
              df    SS             MS          F          Significance F
Regression    5     7081.047344    1416.209    104.862    1.19E-32
Residual      76    1026.415217    13.50546
Total         81    8107.462561

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     192.812        23.716           8.130     0.000     145.578     240.047
EngineSize    -0.104         0.387            -0.267    0.790     -0.875      0.668
CabSpace      -0.015         0.023            -0.668    0.506     -0.061      0.030
HorsePower    0.393          0.082            4.796     0.000     0.230       0.556
TopSpeed      -1.298         0.246            -5.265    0.000     -1.789      -0.807
Weight        -1.847         0.220            -8.402    0.000     -2.285      -1.409
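The split into significant and insignificant variables in Model 1 follows directly from the P-value column. The sketch below applies a 0.05 cutoff, which is an assumed significance level since the solutions do not state one; with these p-values, a 0.01 or 0.10 cutoff gives the same split.

```python
# P-values copied from the coefficient rows of the Model 1 output.
p_values = {
    "EngineSize": 0.790,
    "CabSpace": 0.506,
    "HorsePower": 0.000,
    "TopSpeed": 0.000,
    "Weight": 0.000,
}
alpha = 0.05  # assumed significance level (not stated in the solutions)

significant = [name for name, p in p_values.items() if p < alpha]
insignificant = [name for name, p in p_values.items() if p >= alpha]
print(significant)    # ['HorsePower', 'TopSpeed', 'Weight']
print(insignificant)  # ['EngineSize', 'CabSpace']
```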