
Stat 423 Section 02 Spring 2020 Name ______________________________________

Exam 3 (100 points) ID Number __________________________

Part I. Workout Problems. Show solution in support of your answers. Unsupported answers will not receive full credit. (61 points)

1. A $2^{6-2}$ fractional factorial involving factors A, B, C, D, E and F is to be run. Practitioners have these two sets of generators in mind:

Design 1 Generators: E=ABD and F=ACD
Design 2 Generators: E=ABCD and F=ABD

a. Consider Design 1. Which treatments in this experiment will have both factors A and B at their high (+) levels? [6 pts]

b. Consider Design 1. Derive its defining relation and determine its resolution. [8 pts]

c. The defining relation for Design 2 is I=CEF=ABDF=ABCDE. Which design (1 or 2) is better? Explain briefly and give at least one reason for your choice. [3 pts]
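
The defining relation quoted for Design 2 can be checked mechanically. The sketch below is an illustration only, not part of the exam: the helper names `multiply` and `defining_relation` are hypothetical, and it assumes the usual rule that a squared letter cancels when effect words are multiplied. It reproduces the stated words I=CEF=ABDF=ABCDE from the Design 2 generators and reports the length of the shortest word.

```python
from itertools import combinations


def multiply(word_a, word_b):
    """Multiply two effect words; letters appearing twice cancel (AB * BC = AC)."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


def defining_relation(generators):
    """Return every word in the defining relation implied by the generators.

    `generators` maps an added factor to its generating word, e.g. {"E": "ABCD"}.
    """
    # A generator E = ABCD contributes the word I = ABCDE.
    base_words = [multiply(factor, word) for factor, word in generators.items()]
    words = set()
    # The full relation holds all products of one or more base words.
    for size in range(1, len(base_words) + 1):
        for subset in combinations(base_words, size):
            product = ""
            for word in subset:
                product = multiply(product, word)
            words.add(product)
    return sorted(words, key=lambda w: (len(w), w))


design2 = {"E": "ABCD", "F": "ABD"}
words = defining_relation(design2)
shortest = min(len(w) for w in words)
print("I = " + " = ".join(words))
print("shortest word length:", shortest)
```

The length of the shortest word is the quantity usually called the design's resolution.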

2. A $2^{4-1}$ fractional factorial was conducted to study the effects of four factors on the bond strength of an integrated circuit mounted on metallized glass substrate. The four factors (and their levels) that engineers identified as potentially important determiners of bond strength are listed in the table below.

Factor                     Levels
A – Adhesive Type          D2A (−) vs. H-1-E (+)
B – Conductor Material     Copper (−) vs. Nickel (+)
C – Cure Time at 90°C      90 min (−) vs. 120 min (+)
D – Deposition Material    Tin (−) vs. Silver (+)

Let $\alpha_i$ = main effect of A, $\beta_j$ = main effect of B, $\chi_k$ = main effect of C, $\delta_l$ = main effect of D, and $\gamma$ = interaction effect. Summary statistics and the results of the Yates algorithm for computing fitted effects are given below.

                                                        Yates Algorithm
Treatment  Replication  Sample Variance $s^2$  Sample Mean $\bar{x}$  Cycle 1  Cycle 2  Cycle 3  Fitted Effect
(1)        5             2.452                 73.48                  157.36   314.54   650.84   81.355
ad         5             4.233                 83.88                  157.18   336.30     7.84    0.980
bd         5             0.647                 81.58                  166.60     4.42     2.92    0.365
ab         5            26.711                 75.60                  169.70     3.42     2.08    0.260
cd         5             0.503                 87.06                   10.40    −0.18    21.76    2.720
ac         5             8.562                 79.54                   −5.98     3.10    −1.00   −0.125
bc         5             1.982                 79.38                   −7.52   −16.38     3.28    0.410
abcd       5             3.977                 90.32                   10.94    18.46    34.84    4.355
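
The cycle columns in the table can be reproduced from the sample means alone. The following is a minimal sketch of the Yates algorithm, assuming the means are listed in the table's row order; the function name `yates` is illustrative. Each cycle writes the pairwise sums first and then the pairwise differences (second minus first), and the last cycle is divided by the number of treatment combinations, 8.

```python
def yates(means):
    """Run the Yates algorithm on means given in Yates standard order.

    Returns the list of cycle columns and the fitted effects.
    """
    column = list(means)
    n = len(column)
    num_cycles = n.bit_length() - 1  # log2(n) for n a power of two
    cycles = []
    for _ in range(num_cycles):
        # Top half: sums of adjacent pairs; bottom half: second minus first.
        sums = [column[i] + column[i + 1] for i in range(0, n, 2)]
        diffs = [column[i + 1] - column[i] for i in range(0, n, 2)]
        column = sums + diffs
        cycles.append(column)
    effects = [value / n for value in column]
    return cycles, effects


labels = ["(1)", "ad", "bd", "ab", "cd", "ac", "bc", "abcd"]
means = [73.48, 83.88, 81.58, 75.60, 87.06, 79.54, 79.38, 90.32]

cycles, effects = yates(means)
for label, c1, c2, c3, eff in zip(labels, *cycles, effects):
    print(f"{label:5s} {c1:8.2f} {c2:8.2f} {c3:8.2f} {eff:8.3f}")
```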

a. The replications and the sample variances of the 8 treatment combinations are given in the 2nd and 3rd columns, respectively, in the table above. Compute 𝑟(0.05) for judging if a fitted effect is statistically significant at the 𝛼 = 0.05 level. Note that the sum of the variances is 49.067. [8 pts]

b. The generator and defining relation were D=ABC and I=ABCD, respectively. If you have no answer in (a), use $r(0.05) = 0.400$.

i. Based on your answer in (a), is the fitted effect 0.980 statistically significant? [2 pts]

Select one: NO YES

ii. What sum of effects does the fitted effect 0.980 estimate? Your answer should be a sum of subscripted/superscripted Greek letters (e.g., $\alpha_2$ plus a superscripted $\gamma_{22}$ interaction term). [4 pts]

3. The diameter 𝑥 of a tree at breast height (in cm, relatively easy to measure) is used to predict the height 𝑦 of a tree (in m, difficult to measure). Summary data on 𝑛 = 36 white spruce trees (in British Columbia) are given below.

$\sum x = 655.1$, $\sum x^2 = 12711.47$, $\sum y = 644.7$, $\sum y^2 = 11824.45$, $\sum xy = 12112.34$,

$S_{xx} = 790.4697$, $SST = S_{yy} = 278.9475$, $\bar{x} = 18.1972$, $\bar{y} = 17.9083$.

a. Do some calculations to show that the least-squares line is $\hat{y} = 9.1468 + 0.4815x$. [10 pts]

b. Compute the sample correlation $r$ between $x$ and $y$. Give a quick interpretation. [6 pts]

Interpretation:

c. Construct an interval with 95% confidence for the height of a new spruce tree with a breast height diameter $x = 19$ cm. Plug in numbers in a formula and do not simplify. Use $n = 36$, $\bar{x} = 18.1972$, $S_{xx} = 790.4697$, $s^2 = MSE = 2.815$. [8 pts]
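
As a numerical check on the summary data for Problem 3, the sketch below recomputes $S_{xx}$, $S_{yy}$, and the least-squares coefficients from the raw sums. It assumes the standard shortcut formulas $S_{xx} = \sum x^2 - (\sum x)^2/n$, $S_{xy} = \sum xy - (\sum x)(\sum y)/n$, $b_1 = S_{xy}/S_{xx}$, and $b_0 = \bar{y} - b_1\bar{x}$; the variable names are illustrative.

```python
# Summary data for the n = 36 white spruce trees (Problem 3).
n = 36
sum_x, sum_x2 = 655.1, 12711.47
sum_y, sum_y2 = 644.7, 11824.45
sum_xy = 12112.34

# Corrected sums of squares and cross products (shortcut formulas).
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares slope and intercept.
b1 = s_xy / s_xx
b0 = sum_y / n - b1 * sum_x / n

print(f"S_xx = {s_xx:.4f}, S_yy = {s_yy:.4f}, S_xy = {s_xy:.4f}")
print(f"fitted line: y_hat = {b0:.4f} + {b1:.4f} x")
```

Rounded to four decimals, the slope and intercept agree with the line stated in part (a), and $S_{xx}$ and $S_{yy}$ agree with the values listed in the summary data.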

Problem 3 (continued).

d. A scatterplot of the data and $SSE$ values for the linear and quadratic model fits are given below. Also, the total sum of squares for either model is $SST = 1824.45$. Which of the two models provides a better description of the data? Explain briefly. In your explanation, use both graphical AND numeric results. [6 pts]

Part II. Multiple Choice. Circle the letter of the correct/best answer. (39 points)

1. Which of the following statements is NOT true?

A. The simple linear regression model is $y = \beta_0 + \beta_1 x + \epsilon$, where $\epsilon$ is a random variable that is normally distributed with mean 0 and variance $\sigma^2$.

B. In simple linear regression, the independent variable 𝑥 is also referred to as the predictor or explanatory variable.

C. The goal of least-squares regression is to find the curve that maximizes the sum of the squared distances between the curve and the data points.

D. A first step in a regression analysis involving two variables is to construct a scatter plot.

2. In fitting $y = \beta_0 + \beta_1 x + \epsilon$ to data, (1.7, 2.5) is a 90% confidence interval for $\beta_1$. What is a 90% confidence interval for the mean change in $y$ when we reduce $x$ by 0.65?

A. (−1.625, −1.105)

B. (1.05, 1.85)

C. (1.105, 1.625)

D. (2.35, 3.15)

3. Which of the following is/are TRUE about the correlation coefficient $r$ between $x$ and $y$?

A. For the simple linear regression, $100\% \times r^2 = R^2$, where $R^2$ is the coefficient of determination (in %).

B. A correlation of $r = -0.87$ is weaker than a correlation of $r = 0.25$.

C. The correlation $r$ is a measure of the strength of the linear relationship between $x$ and $y$.

D. If $r = -0.1$, and we convert $x$ (in inches) to centimeters (1 in = 2.54 cm), then the correlation becomes $2.54 \times (-0.1) = -0.254$.

E. Both (A) and (C).

Model                                                  $SSE$
$y = \beta_0 + \beta_1 x + \epsilon$                   95.703
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$     63.007

[Figure: scatterplot of Height y (vertical axis) versus Breast-Height Diameter x (horizontal axis)]

4. Is $y = \beta_0 \cdot \beta_1^x$ intrinsically linear? If yes, what is the appropriate transformation to obtain a linear model? Recall: $\log(ab) = \log(a) + \log(b)$, $\log(a^b) = b \cdot \log(a)$.

A. No.

B. Yes, $\log(y) = \log(\beta_0) + \log(\beta_1) \cdot x$

C. Yes, $\log(y) = \log(\beta_0) + \beta_1 \cdot \log(x)$

D. Yes, $\log(y) = \log(\beta_0) + \beta_1 \cdot x$

For Problems 5 to 8: A study investigated the effects of $x_1$ = Seal Temperature, $x_2$ = Cooling Bar Temperature, and $x_3$ = % Polyethylene Additive on the seal strength $y$. The three models listed in the table below were fit to the data.

There were $n = 20$ observations, and the total sum of squares (for all 3 models) is $SST = 82.17$ (total df = 19).

5. What is $SSE$ for Model (1)?

A. 30.96 B. 51.21 C. 21.36 D. 60.81

6. What is $R^2_{adj}$ for Model (2)?

A. 49.42% B. 76.66% C. 23.34% D. 84.03%

7. What is the F statistic for testing $H_0: \{\beta_1 = \beta_2 = \cdots = \beta_9 = 0\}$ versus $H_a: \{H_0 \text{ is false}\}$ with Model (3)?

A. 6.59 B. 9.69 C. 3.23 D. 5.36

8. In the fit of Model (2), we get $\hat{\beta}_8 = -0.5$ and $s_{\hat{\beta}_8} = 0.3552$ and find that the P-value is 0.1827 for testing $H_0: \beta_8 = 0$ versus $H_a: \beta_8 \ne 0$. What are the $t$ test statistic and conclusion at the $\alpha = 0.10$ significance level?

A. $t = -1.41$. There is NO significant interaction between $x_1$ and $x_3$.

B. $t = 1.41$. The predictor $x_8$ has NO significant effect on the response $y$.

C. $t = -0.84$. There is NO significant interaction between $x_1$ and $x_3$.

D. $t = -1.41$. There is significant interaction between $x_1$ and $x_3$.

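Model   $R^2$     $R^2_{adj}$   $SSE$      Model equation
(1)     37.68%    25.99%        ?          $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$
(2)     84.03%    ?             13.1231    $y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_8 x_1 x_3 + \epsilon$
(3)     85.57%    72.58%        11.8593    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_7 x_1 x_2 + \beta_8 x_1 x_3 + \beta_9 x_2 x_3 + \epsilon$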
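
The fully specified row of the table, Model (3), gives a way to see how the three columns relate. The sketch below assumes the usual definitions $R^2 = 1 - SSE/SST$ and $R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, with $k$ the number of predictor terms; the function names are illustrative. It recovers the tabulated 85.57% and 72.58% from $SSE = 11.8593$.

```python
def r_squared(sse, sst):
    """Coefficient of multiple determination, as a proportion."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, num_predictors):
    """Adjusted R^2: compares mean squares instead of raw sums of squares."""
    df_error = n - num_predictors - 1
    df_total = n - 1
    return 1 - (sse / df_error) / (sst / df_total)


n, sst = 20, 82.17
sse_model3 = 11.8593
k_model3 = 9  # Model (3) has nine predictor terms, beta_1 through beta_9

r2 = r_squared(sse_model3, sst)
r2_adj = adjusted_r_squared(sse_model3, sst, n, k_model3)
print(f"Model (3): R^2 = {100 * r2:.2f}%, adjusted R^2 = {100 * r2_adj:.2f}%")
```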

9. Which of the following is not true about $2^{p-q}$ fractional factorial studies?

A. The loss of information and ambiguity (confounding) can be held to a minimum by careful planning and wise analysis.

B. A loss of information is usually expected because we are unable to observe responses at all of the $2^p$ factor combinations.

C. If two effects are aliased or confounded together, it means that we can discuss their significance together but not apart from each other.

D. None of the above.

10. A fitted multiple regression model is $\hat{y} = 10 - 4x_1 + 3x_2$. If $x_1$ is decreased by 2, while holding $x_2$ fixed, then we can expect $y$

A. to increase by 8

B. to decrease by 6

C. to increase by 6

D. to decrease by 8

E. to remain the same

11. Suppose that the least-squares line is $\hat{y} = -2.12 + 15.75x$. If the $F$ test statistic for testing $H_0: \beta_1 = 0$ against $H_a: \beta_1 \ne 0$ is $F = 2.1$ (from the ANOVA table), what is the $t$ test statistic for testing the same hypotheses?

A. $t = 1.45$ B. $t = -4.41$ C. $t = -1.45$ D. $t = 4.41$

12. Which of the following statements is true?

A. Model 1 with more predictor terms may not necessarily be better than Model 2 with fewer predictor terms even though Model 1's coefficient of multiple determination $R^2$ is larger.

B. To balance the cost of using more parameters against the gain in the coefficient of multiple determination $R^2$, many statisticians use $R^2_{adj}$ = {the adjusted $R^2$}.

C. An objective of regression analysis is to find a model that is simple (relatively few parameters) and provides a good fit to the data.

D. All of the above.

13. A study investigated the effects of three explanatory variables $x_1$, $x_2$, and $x_3$ on the response $y$. The model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$ provided a good $R^2$ value. Which of the following is NOT appropriate in assessing the (statistical) significance of the relationship between $x_3$ and $y$?

A. a $t$ test of $H_0: \beta_3 = 0$ versus $H_a: \beta_3 \ne 0$

B. a prediction interval

C. a confidence interval for $\beta_3$

D. the sample correlation between $x_3$ and $y$

E. a comparison of $R^2_{adj}$ values for $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$ and $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$