BUS 305: PRACTICE PROBLEMS EXAM 2
Simple Regression Problems
1) Of the following two graphs, indicate which one has a correlation coefficient that is closer to 0.
Scatterplot A Scatterplot B
2) Which of the following describes the relationship between the variables in the graphs in #1 above?
A. positive correlation
B. negative correlation
C. perfect positive correlation
D. perfect negative correlation
E. no correlation
3) If the scatterplot below depicted a set of bivariate data with independent variable X and dependent variable Y, would a regression model be appropriate for this data? Why or why not?
4) If the scatterplot below depicted a set of bivariate data with independent variable X and dependent variable Y, would a regression model be appropriate for this data? Why or why not?
5) Which of the following would represent the regression line for this data set? Why? Explain what characteristic of the line makes it the regression line.
6) Suppose your company is interested in discovering if there is a relationship or correlation between production volume (in number of units) and costs (in $). Which would be more appropriate for this data – to run a correlation analysis or to run a regression analysis? Explain.
7) Suppose your company is interested in discovering if there is a relationship between production volume (in number of units) and costs (in $).
ProdVolume | Cost
20         | 137
25         | 122
28         | 123
30         | 120
35         | 106
40         | 109
45         | 97
Which of the following is the most appropriate statistical analysis to run?
A. ANOVA
B. Multiple linear regression
C. Simple linear regression
D. T-test for the mean of a single population
8) Suppose you run a regression for variables X and Y, and find that r² = 0.64, that the t-statistic for the hypothesis test H0: β1 = 0 is 1.31, and that the p-value for that test is 0.117. Then:
a) r = ______________
b) t-statistic for the hypothesis test H0: ρ = 0 equals (give a number): _____________
c) p-value for the hypothesis test H0: ρ = 0 equals (give a number): _____________
d) What do you conclude about the existence of a significant correlation between X and Y in the population? Explain.
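The link between the slope test and the correlation test in problem 8 can be checked numerically. The sketch below is only an illustration: it borrows the production data from problem 7 (not the unknown data behind problem 8) and computes the t-statistic two ways, once as b1 divided by its standard error and once as r·sqrt(n − 2)/sqrt(1 − r²). The function and variable names are made up for this example.

```python
import math

# Data borrowed from problem 7 purely for illustration.
prod_volume = [20, 25, 28, 30, 35, 40, 45]
cost = [137, 122, 123, 120, 106, 109, 97]


def slope_and_correlation_t(x, y):
    """Return (r, t for H0: beta1 = 0, t for H0: rho = 0) in simple regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    b1 = sxy / sxx                          # sample slope
    r = sxy / math.sqrt(sxx * syy)          # sample correlation

    sse = syy - b1 * sxy                    # residual sum of squares
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    t_slope = b1 / se_b1

    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t_slope, t_corr


r, t_slope, t_corr = slope_and_correlation_t(prod_volume, cost)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
print(f"t (slope) = {t_slope:.4f}, t (correlation) = {t_corr:.4f}")
```

In simple regression the two t-statistics agree, so the two tests share one p-value. Note also that r takes the sign of the slope, so r² alone gives only the magnitude of r.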
9) Provide about one or two sentences to answer each question.
a) In a simple regression model, what is the difference between β1 and b1?
b) Why are outliers problematic in a multiple regression model?
10) Given the following data and scatterplot, determine if a simple linear regression model is appropriate for this data. If so, generate the regression output using StatCrunch or Excel. If not, explain why linear regression is not appropriate.
ProdVolume | Cost
20         | 137
25         | 122
28         | 123
30         | 120
35         | 106
40         | 109
45         | 97
[Scatterplot: Cost versus ProdVolume]
11) When answering questions (a) through (e) below, refer to the following StatCrunch output from a regression model that asserts that the number of near misses per year (Y) of commercial airliners is a linear function of the number of flights per year (X).
(a) Test for a linear relationship between near_misses and num_flights by reading the appropriate values from the output above. Be sure to indicate a test statistic, a p-value, and a conclusion as to whether or not there is a relationship.
(b) What percentage of the variation in the number of near misses is explained by the number of flights? Do you think this is a good regression model?
(c) What is the correlation between misses and flights? Is there a strong relationship between these variables? Explain.
(d) Write the regression line and then use it to calculate the predicted number of near misses if the number of flights is 100. Does this prediction make sense? Explain. Is it wise to make predictions with this model? Why or why not? (Refer to a part of the output to back up your conclusions.)
(e) Interpret the value of b1, the sample slope. Does this value appear to make sense? Explain.
Multiple Regression Problems
12) Provide one or two sentences to answer each of these questions.
a. Briefly explain the difference between multiple and simple regression.
b. What is multicollinearity in a multiple regression model, and why is it problematic?
c. How do you incorporate qualitative/categorical variables into a regression model? Be specific about what kind of variable is added to the model and what values that variable can be.
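As an illustration of the kind of variable referred to in part (c), the sketch below turns a categorical column into 0/1 indicator (dummy) columns. The `transmission` column and the `dummy_code` helper are hypothetical and do not come from the course data set. One level is left out as the baseline so that the indicators are not perfectly collinear with the intercept.

```python
def dummy_code(values, baseline):
    """Encode a categorical column as 0/1 indicator columns.

    A column is created for every level except `baseline`, so a variable
    with k levels yields k - 1 indicators.
    """
    levels = sorted(set(values) - {baseline})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical categorical predictor with three levels.
transmission = ["manual", "automatic", "manual", "cvt"]

dummies = dummy_code(transmission, baseline="manual")
for level, column in dummies.items():
    print(level, column)
```

Each indicator's coefficient is then read as the expected difference in Y between that level and the baseline, holding the other predictors fixed.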
13) Suppose you want to try to estimate the miles per gallon of various car types by using their engine size (number of cylinders), cab space, horsepower, top speed and weight.
Which of the following is the most appropriate statistical analysis to run?
A. ANOVA
B. Multiple linear regression
C. Simple linear regression
D. T-test for the mean of a single population
14) Given the following data set, generate the multiple regression output for the model that states that the MPG of a car is a linear function of EngineSize, CabSpace, HorsePower, TopSpeed, and Weight. Use StatCrunch or Excel. (See the Excel file PracticeExam2data.xlsx to copy the entire data set.)
MAKE/Model         | EngineSize | CabSpace | HorsePower | TopSpeed | Weight | MPG
GM/GeoMetroXF1     | 4          | 89       | 49         | 96       | 17.5   | 65.4
GM/GeoMetro        | 4          | 92       | 55         | 97       | 20     | 56
GM/GeoMetroLSI     | 4          | 92       | 55         | 97       | 20     | 55.9
...
BMW750IL           | 6          | 119      | 295        | 157      | 45     | 16.7
Rolls-RoyceVarious | 8          | 107      | 236        | 130      | 55     | 13.2
15) Use the following Excel output from a multiple regression model to answer questions (a) - (e). The model asserts that the sale price of an item is a function of both the original price and the manufacturer’s suggested retail price (MSRP).
a) What do the F-statistic and its p-value tell you about the overall significance of the model in terms of the effects of Orig_Price and MSRP on the price of an item?
b) Which, if any, of the independent variables appear to affect the sale price (Y)? Indicate any numbers from the table you used to arrive at this conclusion.
c) State the regression equation and use it to predict the value of Y (sale price) corresponding to Original Price = 80 and MSRP = 100.
d) How much can you expect the sale price (Y) to increase as the MSRP increases by 1 unit? As Orig_Price increases by one unit?
e) How good/effective is this model? Are you comfortable using this regression equation to predict prices? Why or why not?
16) Consider the data in the file PracticeExam2data.xls. This data set covers 82 cars and records several characteristics of each. Use it to develop the BEST/most efficient multiple regression model for predicting the miles per gallon (MPG) that a vehicle gets (you may have to run more than one model). Once you have your final model, explain why it is the best model possible, using the discussion points from class.
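MAKE/Model        | EngineSize | CabSpace | HorsePower | TopSpeed | Weight | MPG
GM/GeoMetroXF1    | 4          | 89       | 49         | 96       | 17.5   | 65.4
GM/GeoMetro       | 4          | 92       | 55         | 97       | 20     | 56
GM/GeoMetroLSI    | 4          | 92       | 55         | 97       | 20     | 55.9
SuzukiSwift       | 4          | 92       | 70         | 105      | 20     | 49
DaihatsuCharade   | 4          | 92       | 53         | 96       | 20     | 46.5
GM/GeoSprintTurbo | 4          | 89       | 70         | 105      | 20     | 46.2
GM/GeoSprint      | 4          | 92       | 55         | 97       | 20     | 45.4
HondaCivicCRXHF   | 4          | 50       | 62         | 98       | 22.5   | 59.2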
SUMMARY OUTPUT

Regression Statistics
Multiple R        | 0.931
R Square          | 0.866
Adjusted R Square | 0.854
Standard Error    | 16.991
Observations      | 25

ANOVA
           | df | SS       | MS      | F     | Significance F
Regression | 2  | 41129.41 | 20564.7 | 71.23 | 2.45E-10
Residual   | 22 | 6351.15  | 288.7   |       |
Total      | 24 | 47480.56 |         |       |

           | Coefficients | Standard Error | t Stat | P-value   | Lower 95% | Upper 95%
Intercept  | -7.62        | 31.05          | -0.245 | 0.808     | -72.00    | 56.77
Orig_Price | 1.01         | 0.10           | 10.087 | 1.031E-09 | 0.81      | 1.22
MSRP       | -0.08        | 0.11           | -0.727 | 0.475     | -0.30     | 0.15
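The summary statistics in this output are all derived from the ANOVA sums of squares, which gives a quick way to check that a table has been read correctly. The sketch below recomputes them from the SS and df values above using the standard definitions. The variable names are chosen for this example, and small differences from the printed values would only reflect rounding in the table.

```python
import math

# Values read from the ANOVA table above.
n = 25                     # observations
k = 2                      # predictors (Orig_Price, MSRP)
ss_regression = 41129.41
ss_residual = 6351.15
ss_total = 47480.56

r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

ms_regression = ss_regression / k            # MS = SS / df
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)           # standard error of the estimate

print(f"Multiple R        = {multiple_r:.3f}")
print(f"R Square          = {r_square:.3f}")
print(f"Adjusted R Square = {adj_r_square:.3f}")
print(f"Standard Error    = {std_error:.3f}")
print(f"F                 = {f_stat:.2f}")
```

The recomputed figures should match the Regression Statistics block and the F column, which confirms that the flattened table was reassembled consistently.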