Linear Model
STATGR5205 Midterm - Fall 2018 - October 17
Name:
UNI:
The GR5205 midterm is closed notes and closed book. Calculators are allowed. Tablets,
phones, computers and other equivalent forms of technology are strictly prohibited. Students
are not allowed to communicate with anyone with the exception of the TA and the professor.
Students must include all relevant work in the handwritten problems to receive full credit.
The exam time is 75 minutes and once the exam time has expired, please do not discuss the
midterm until after you leave the room (or log off of Zoom).
By signing below, you are acknowledging that if you do not follow the guidelines, then
you will receive a score of zero on the midterm and potentially face more severe consequences.
Signature: Date:
Problem Points Student’s Score
1 10
2 20
3 10
4.i 15
4.ii 15
5.i 10
5.ii 20
Total 100
Problem 1 [10 pts]
Consider the simple linear regression and regression through the origin models
defined respectively by
(1) $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $i = 1, 2, \ldots, n$, $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$,
and
(2) $Y_i = \beta_1 x_i + \epsilon_i$, $i = 1, 2, \ldots, n$, $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.
Assume that the true data generating process comes from the simple linear regression
model (1). However, suppose that you estimate $\beta_1$ using the regression through the origin
least squares estimator $\hat{\beta}_1 = \sum x_i y_i / \sum x_i^2$. Compute the bias of $\hat{\beta}_1$, i.e., compute
$E[\hat{\beta}_1] - \beta_1$.
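A Monte Carlo sketch can be used to sanity-check a hand-derived bias. The parameter values, design points and replication count below are illustrative assumptions, not part of the exam. The comparison value `theoretical.bias` is what taking expectations term by term in $\sum x_i Y_i / \sum x_i^2$ under model (1) gives, namely $\beta_0 \sum x_i / \sum x_i^2$.

```r
# Monte Carlo sketch: bias of the through-the-origin slope estimator
# when the data actually come from model (1).
# All numeric settings below are assumed for illustration only.
set.seed(5205)
n     <- 25
beta0 <- 2      # assumed true intercept
beta1 <- 1.5    # assumed true slope
sigma <- 1      # assumed error standard deviation
x     <- seq(1, 10, length.out = n)   # fixed design points

# One replication: simulate Y from model (1), then estimate the slope
# with the regression-through-the-origin formula sum(x*y)/sum(x^2).
one.rep <- function() {
  y <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma)
  sum(x * y) / sum(x^2)
}

est <- replicate(20000, one.rep())

empirical.bias   <- mean(est) - beta1
theoretical.bias <- beta0 * sum(x) / sum(x^2)

print(c(empirical = empirical.bias, theoretical = theoretical.bias))
```

The two printed values should agree up to simulation noise. Setting `beta0 <- 0` makes both collapse toward zero, which matches the fact that model (2) is then correctly specified.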
Problem 2 [20 pts]
Consider the least squares estimated simple linear regression model
(3) $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i = \sum_{j=1}^{n} h_{ij} Y_j$,
where $h_{ij}$ are the hat-values.
Consider vectors $\mathbf{1} = (1 \; 1 \; \cdots \; 1)^T$, $\mathbf{x} = (x_1 \; x_2 \; \cdots \; x_n)^T$ and $\mathbf{e} = (e_1 \; e_2 \; \cdots \; e_n)^T$.
Note that e is the vector of residuals based on the least squares estimated model Ŷi defined
in (3). Using properties of the hat-values, prove that any vector in the span of 1 and x is
orthogonal to the residual vector e.
Note: You are not allowed to use the hat-matrix in this computation. The results must be
proved using the scalar form.
Note: For partial credit, you can directly use properties of residuals. For full credit, you
are required to re-prove the residual properties using the hat-values hij.
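The claim can be checked numerically before attempting the proof. The sketch below is not a substitute for the derivation. It uses simulated data (an assumption) and the standard simple-regression hat-value formula $h_{ij} = 1/n + (x_i - \bar{x})(x_j - \bar{x})/S_{xx}$, which is assumed here rather than given in the problem statement. All sums are written in scalar form.

```r
# Numerical check of the hat-value properties behind Problem 2.
# Data are simulated; the h_ij formula is the usual one for simple
# linear regression and is an assumption of this sketch.
set.seed(2)
n   <- 12
x   <- runif(n, 0, 10)
Y   <- 3 - 0.5 * x + rnorm(n)
Sxx <- sum((x - mean(x))^2)

# Table of scalar hat-values h[i, j]
h <- outer(x, x, function(xi, xj) 1 / n + (xi - mean(x)) * (xj - mean(x)) / Sxx)

# Property A: sum_i h_ij = 1 for every j
sum.h  <- sapply(1:n, function(j) sum(h[, j]))
# Property B: sum_i x_i h_ij = x_j for every j
sum.xh <- sapply(1:n, function(j) sum(x * h[, j]))

# Fitted values and residuals in scalar form: Yhat_i = sum_j h_ij Y_j
Y.hat <- sapply(1:n, function(i) sum(h[i, ] * Y))
e     <- Y - Y.hat

# Any vector a*1 + b*x should be orthogonal to e (a, b arbitrary)
a <- 4; b <- -7
v <- a * rep(1, n) + b * x
print(c(sum.e = sum(e), sum.xe = sum(x * e), v.dot.e = sum(v * e)))
```

All three printed quantities should be zero up to floating-point error, and `Y.hat` should match `fitted(lm(Y ~ x))`.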
Problem 3 [10 pts]
Consider the least squares estimated multiple linear regression model
(4) $\hat{Y} = X\hat{\beta} = HY$,
where H is the hat-matrix and X is the design matrix of dimensions $(n \times p)$. Using properties
of the hat-matrix, prove that any vector in the column space of X is orthogonal to the
residual vector e.
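A small numerical illustration of the statement, using an arbitrary simulated design matrix (the dimensions, seed and coefficient vector are assumptions chosen only for the example). It checks the hat-matrix properties that such a proof typically relies on, symmetry, idempotence and $HX = X$, and then the orthogonality itself.

```r
# Numerical illustration for Problem 3 with a simulated design matrix.
set.seed(1)
n <- 20
X <- cbind(1, rnorm(n), runif(n))          # assumed n x 3 design with intercept
Y <- rnorm(n)                              # arbitrary response vector

H <- X %*% solve(t(X) %*% X) %*% t(X)      # hat-matrix
e <- (diag(n) - H) %*% Y                   # residual vector e = (I - H) Y

# Hat-matrix properties
sym.err   <- max(abs(H - t(H)))            # H symmetric
idem.err  <- max(abs(H %*% H - H))         # H idempotent
hx.err    <- max(abs(H %*% X - X))         # H X = X

# Orthogonality: X^T e = 0, hence (X c)^T e = 0 for any c
v         <- X %*% c(2, -1, 0.5)           # arbitrary vector in col(X)
xte.err   <- max(abs(t(X) %*% e))
v.dot.e   <- drop(t(v) %*% e)

print(c(sym.err, idem.err, hx.err, xte.err, v.dot.e))
```

Every printed value should be zero up to rounding error.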
Problem 4 [30 pts]
Consider the following toy dataset displayed in the scatter plot below. Let the predictor
variable be assigned as x, the response as Y and assign n as the sample size. Note that there
are n = 30 cases in this dataset.
Consider testing if the slope statistically differs from -3, i.e., test the null/alternative
pair
$H_0: \beta_1 = -3$ versus $H_A: \beta_1 \neq -3$.
4.i Assuming the null hypothesis ($H_0: \beta_1 = -3$) is true, derive the maximum likelihood
estimator of the parameter $\beta_0$. Note that you are not required to derive the maximum
likelihood estimator for $\sigma^2$ and you are not required to perform the second-derivative
check.
4.ii Using the R code and output displayed on page 9, test if the slope statistically
differs from -3, i.e., test the null/alternative pair:
$H_0: \beta_1 = -3$ versus $H_A: \beta_1 \neq -3$.
To receive full credit, compute both the t-statistic and F-statistic for testing the above
null/alternative pair. Also compute the correct p-value and state the statistical conclusion.
Note: pt(t.calc,28)=0.9763728
R code and Output:
> # Model 1 with Summary and ANOVA #-----------------------------------------------
> summary(lm(Y~x))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 34.7573 1.4638 23.75 < 2e-16 ***
x -2.4985 0.2416 -10.34 4.57e-11 ***
---
Residual standard error: 3.546 on 28 degrees of freedom
Multiple R-squared: 0.7925,Adjusted R-squared: 0.7851
F-statistic: 106.9 on 1 and 28 DF, p-value: 4.574e-11
> anova(lm(Y~x))
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
x 1 1344.46 1344.46 106.92 4.574e-11 ***
Residuals 28 352.07 12.57
> # Model 2 with ANOVA #-----------------------------------------------
> x.new <- -3*x
> anova(lm(Y~x.new))
Response: Y
Df Sum Sq Mean Sq F value Pr(>F)
x.new 1 1344.46 1344.46 106.92 4.574e-11 ***
Residuals 28 352.07 12.57
> # Model 3 with ANOVA #-----------------------------------------------
> Y.new <- Y+3*x
> anova(lm(Y.new~1))
Response: Y.new
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 29 406.23 14.008
> # Model 4 with ANOVA #-----------------------------------------------
> Y.new <- Y-3*x
> anova(lm(Y.new~1))
Response: Y.new
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 29 6863.4 236.67
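One possible way to turn the printed output into the two test statistics is sketched below, with every number copied by hand from the output above. The variable names are hypothetical. The sketch assumes that Model 3 is the relevant reduced model, since under $H_0: \beta_1 = -3$ the model becomes $Y + 3x = \beta_0 + \epsilon$. Because the inputs are rounded, the squared t-statistic and the F-statistic agree only approximately.

```r
# Sketch: t- and F-statistics for H0: beta1 = -3, using values read
# off the printed output (rounded, so results are approximate).
b1    <- -2.4985    # slope estimate from summary(lm(Y ~ x))
se.b1 <- 0.2416     # its standard error
df    <- 28         # residual degrees of freedom of the full model

# t-statistic: (estimate - hypothesized value) / standard error
t.calc <- (b1 - (-3)) / se.b1

# General linear F-test: full model = Model 1, reduced model = Model 3
sse.full    <- 352.07;  df.full    <- 28
sse.reduced <- 406.23;  df.reduced <- 29
f.calc <- ((sse.reduced - sse.full) / (df.reduced - df.full)) /
          (sse.full / df.full)

# Two-sided p-value from the t distribution with 28 df
p.value <- 2 * (1 - pt(abs(t.calc), df))

print(c(t = t.calc, t.squared = t.calc^2, F = f.calc, p = p.value))
```

With these inputs `t.calc` is about 2.08 and `f.calc` about 4.31, and the p-value lands close to 2 * (1 - 0.9763728) from the note in 4.ii.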
9
Problem 5 [30 pts]
Consider the following study examining the effects of different amounts of THC, the major
ingredient in marijuana, injected directly in the brain. The response variable (Y) is locomotor
activity. In this approach, the researchers run an ANCOVA model (or multiple linear
regression model) on the post-injection scores, partialling out pre-injection differences. Such
a procedure would adjust for the fact that much of the variability in post-injection activity
could be accounted for by the variability in pre-injection activity. Note that variables D1
through D4 are indicator variables representing the different dosage levels and $x_i$ is the
continuous variable pre-injection activity. Hint: this is the exact problem statement
from the sample midterm but I am asking different research questions.
control .1 micro g (D1) .5 micro g (D2) 1 micro g (D3) 2 micro g (D4)
X Y X Y X Y X Y X Y
Pre Post Pre Post Pre Post Pre Post Pre Post
4.34 1.30 1.55 0.93 7.18 5.10 6.94 2.29 4.00 1.44
3.50 0.94 10.56 4.44 8.33 4.16 6.10 4.75 4.10 1.11 ...
... ...
... ...
... ...
... ...
... ...
... ...
... ...
... 7.35 2.35 ...
... ...
... ...
... 6.30 4.84 ...
...
1.90 0.93 9.58 4.22 5.54 2.93
n1 = 10 n2 = 10 n3 = 9 n4 = 8 n5 = 10
The statistical model used in our setting is:
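$Y_i = \beta_0 + \beta_1 D_{i1} + \beta_2 D_{i2} + \beta_3 D_{i3} + \beta_4 D_{i4} + \beta_5 x_i + \epsilon_i$,
$i = 1, \ldots, 47$, $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$,
where
$D_{i1} = \begin{cases} 1 & \text{if .1 micro grams} \\ 0 & \text{if otherwise} \end{cases}$, $D_{i2} = \begin{cases} 1 & \text{if .5 micro grams} \\ 0 & \text{if otherwise} \end{cases}$,
$D_{i3} = \begin{cases} 1 & \text{if 1 micro grams} \\ 0 & \text{if otherwise} \end{cases}$, $D_{i4} = \begin{cases} 1 & \text{if 2 micro grams} \\ 0 & \text{if otherwise} \end{cases}$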
and xi is the respondent’s pre-injection locomotor activity.
5.i Run a hypothesis testing procedure to see if the low THC dosage group (.1 micro g)
statistically influences post-locomotor activity, after controlling for all other covariates.
To receive full credit, state the correct null/alternative pair, state the test statistic
and identify the correct P-value. To complete this exercise, use the R code & matrix
computations displayed on Page 13. Assume $\alpha = 0.05$ significance. Don’t worry
about a Bonferroni adjustment.
5.ii Run a hypothesis testing procedure to see if the low THC dosage group (.1 micro g)
has the same impact on post-locomotor activity as the high dosage group (2 micro g).
To receive full credit, state the correct null/alternative pair, compute the test statistic
and identify the correct P-value. To complete this exercise, use the R code & matrix
computations displayed on Page 13. Assume $\alpha = 0.05$ significance. Don’t worry
about a Bonferroni adjustment.
R code, summary output and p-value:
# Full model SSE and Summary #-----------------------------------------------
> full.model <- lm(Y~D1+D2+D3+D4+X,data=THC.Data)
> summary(full.model)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.37384 0.27689 -1.350 0.1844
D1 0.61834 0.32855 1.882 0.0669 .
D2 1.45653 0.35656 4.085 0.0002 ***
D3 0.66599 0.34231 1.946 0.0586 .
D4 0.21998 0.31456 0.699 0.4883
X 0.43466 0.04918 8.838 4.84e-11 ***
---
Residual standard error: 0.7006 on 41 degrees of freedom
Multiple R-squared: 0.8042,Adjusted R-squared: 0.7803
F-statistic: 33.67 on 5 and 41 DF, p-value: 1.704e-13
# P-value #-----------------------------------------------
2*(1-pt(abs(t.calc),41))=0.2220547
Matrix computations:
$$(X^T X)^{-1} X^T Y = \begin{pmatrix} -0.37384 \\ 0.61834 \\ 1.45653 \\ 0.66599 \\ 0.21998 \\ 0.43466 \end{pmatrix}$$

$$(Y - X\hat{\beta})^T (Y - X\hat{\beta}) = 20.12544$$

$$(X^T X)^{-1} = \begin{pmatrix}
0.15620 & -0.06655 & -0.04812 & -0.07224 & -0.09058 & -0.01664 \\
-0.06655 & 0.21991 & 0.13088 & 0.11652 & 0.10561 & -0.00990 \\
-0.04812 & 0.13088 & 0.25900 & 0.12562 & 0.10869 & -0.01536 \\
-0.07224 & 0.11652 & 0.12562 & 0.23871 & 0.10465 & -0.00822 \\
-0.09058 & 0.10561 & 0.10869 & 0.10465 & 0.20158 & -0.00279 \\
-0.01664 & -0.00990 & -0.01536 & -0.00822 & -0.00279 & 0.00493
\end{pmatrix}$$
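The matrix quantities above are enough to test a linear contrast of coefficients by hand. The sketch below is one possible way to do that for the comparison of the D1 and D4 coefficients, assuming the rows and columns of $(X^T X)^{-1}$ follow the coefficient order (Intercept, D1, D2, D3, D4, X) shown in the summary output. Variable names are hypothetical and all numbers are copied from the printed output, so results are approximate.

```r
# Sketch: t-test for H0: beta1 = beta4 (D1 vs D4) from printed values.
b   <- c(-0.37384, 0.61834, 1.45653, 0.66599, 0.21998, 0.43466)
sse <- 20.12544     # (Y - X b)^T (Y - X b)
df  <- 41           # n - p = 47 - 6
mse <- sse / df

# Entries of (X^T X)^{-1} needed for Var(b1 - b4)
c11 <- 0.21991      # D1 diagonal entry
c44 <- 0.20158      # D4 diagonal entry
c14 <- 0.10561      # D1/D4 off-diagonal entry

# Sanity check against the printed standard error of D1 (0.32855)
se.D1 <- sqrt(mse * c11)

# Contrast estimate, its standard error, and the t-statistic
est    <- b[2] - b[5]
se     <- sqrt(mse * (c11 + c44 - 2 * c14))
t.calc <- est / se

# Two-sided p-value on 41 degrees of freedom
p.value <- 2 * (1 - pt(abs(t.calc), df))

print(c(se.D1 = se.D1, estimate = est, se = se, t = t.calc, p = p.value))
```

With these inputs `t.calc` comes out near 1.24 and the p-value near the 0.2220547 printed in the P-value line above, which suggests that line corresponds to this contrast.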
13