STAT305
STAT305/605 Fall 2020 SFU Midterm Due October 23rd 5:00PM PST
Midterm Exam
Honour Code
In taking this midterm exam you are required to affirm your willingness
to abide by the spirit of academic integrity at SFU. Provide your name
and student number if you can affirm this statement: I understand that the
following activities are prohibited and will be considered cheating and agree
to not participate in any of the following activities.
• Looking at or copying from another SFU student’s midterm exam or materials while writing this midterm exam.
• Conferring or conversing with any party regarding the material content of this midterm exam during the course of this midterm exam.
• Having someone else take this midterm exam in my place.
• Distributing the materials of this midterm exam in any way.
• Misrepresenting the considerations that this midterm exam may be done within the usual time limitation. (Students with CAL or other
exceptions have deadlines not specified in this document or on canvas.)
This honour code is an undertaking for students to abide by both individually
and collectively. You must uphold both the spirit and letter of this
honour code.
1. Full Name:
2. Student number:
• This midterm exam is open book. You may consult any text, website or resource while completing this midterm as long as you do
not partake in the activities prohibited by the honour code above. To
be clear: Any content from students, TAs or instructors created before
the beginning of this midterm exam (on Discord, Facebook,
emails, Piazza, Canvas, videos, etc.) may be accessed and
reviewed during the midterm. If you require clarification about these
rules and instructions you may email me or post on Piazza.
• This midterm begins on October 21st at 2:30PM PST, is due on October 23rd at 5:00PM PST, and is to be submitted on Crowdmark. You may
take any amount of time to complete this midterm provided that you
submit it before it is due. This due date constitutes the only time
limitation, and there is no other usual time limitation aside from this.
Students with CAL or other considerations may receive another due
date conforming to their requirements (in which case that due date
supersedes what is written here).
Instructions
Complete the first Problem of this midterm exam, and then complete
only one of the remaining two Problems. This midterm exam
is marked out of 30. It has three Problems, each worth 15
marks. You must complete Problem 1 and also complete one (1) of Problem 2
or Problem 3. If you mistakenly complete all three Problems, your
mark will be your mark for Problem 1 plus the minimum of your marks for
Problems 2 and 3. To complete a Problem, upload your answer to Crowdmark in the
section corresponding to the Problem. If you choose not to complete a Problem, do
not upload anything for it. When numerical answers are required, partial marks will only
be awarded if work is shown. You may provide answers through any method
(screenshot, photograph, export from a Word document, R Markdown, etc.),
provided that an export to PDF is uploaded to Crowdmark. You must also
upload a response to the honour code on Crowdmark (also in PDF format).
~DO~ Problem 1: Which test is right?
For each of the scenarios below, indicate the single most correct answer.
i) Let X = (X1, ..., Xn) be the blood pressure (measured in mmHg)
and let Y = (Y1, ..., Yn) be the cortisol level (measured in mcg/dL)
recorded for n = 79 patients recruited for a study in a hospital (Xi
and Yi are measurements for the same patient). What test is most
appropriate to gather evidence towards the alternative hypothesis that
blood pressure is associated with cortisol level?
A) The two-sample paired t-test with the null hypothesis that the
means of X and Y differ.
B) The test with the null hypothesis that the Pearson correlation
coefficient between X and Y is zero.
C) The test with the null hypothesis that the regression coefficient
is zero in a linear regression with response variable X (blood
pressure) and explanatory variable Y (cortisol level).
(5 points)
ii) Suppose that a treatment is proposed to reduce the time from
the date of infection to the date on which a first negative test
is recorded in people with mild COVID-19 (call this time period the
duration). Suppose that 27 people with mild COVID-19 (the study
population) are administered the treatment and 73 people with mild
COVID-19 are not administered the treatment (the control population).
Both populations are sampled from patients tested at the
same clinic over the same period. Let the durations for the study
sample be X = (X1, X2, ...), and the durations for the control sample
be Y = (Y1, Y2, ...). What test is most appropriate to gather evidence
towards the alternative hypothesis that the treatment reduces
the duration?
A) The one-sided two-sample unpaired t-test with H0: The mean of
X is greater than or equal to the mean of Y.
B) The one-sided two-sample unpaired t-test with the null hypothesis
that the mean of X is less than or equal to the mean of Y.
C) The test against the null hypothesis that Spearman’s rank
correlation coefficient between X and Y is zero.
D) The one-sided two-sample paired t-test against H0: The mean
difference between Xi and Yi is less than or equal to zero.
E) The two-sided two-sample paired t-test with the null hypothesis
that the mean difference between Xi and Yi is zero.
(5 points)
iii) Road vehicle accidents involving ambulances have more detrimental
outcomes than accidents involving other similarly sized vehicles (Ray
and Kupas, 2005). Measures to avoid such accidents are continually
being refined by organizations involved in emergency medical services.
Suppose that a city council is interested in knowing if adoption of such
measures has led to an improvement over the last decade. Suppose
that the ratio between the number of accidents involving ambulances
(the numerator) and the number of kilometers driven by ambulances
(the denominator) has been recorded for each year t over the past
decade (call this ratio r_t, with units of number of accidents per
kilometer year). Which
single one of the following statistical quantities is most relevant for
investigating whether or not measures are leading to improvements?
A) The sample standard deviation of r_t.
B) The sample mean of r_t.
C) The Pearson correlation coefficient ρ between r_t and t.
D) The regression coefficient for t in a linear regression with r_t as
the response variable and t as the explanatory variable.
E) The regression coefficient for r_t in a linear regression with r_t as
the explanatory variable and t as the response variable.
(5 points)
~AND EITHER~ Problem 2: Bayes’ rule
A study was conducted to assess the sensitivity and specificity of four different
human immunodeficiency virus (HIV) serology tests (Koblavi-Dème
et al. 2001). The Determine test, one of the four, was developed by
Abbott Laboratories (an American provider of health care, medical devices
and pharmaceuticals) and was found to have a true negative rate (the true
negative rate is also called specificity) of 99.4% and a true positive rate (the
true positive rate is also called sensitivity) of 100%. The true negative rate
of a test for a disease is the probability that someone without the disease
tests negative. The true positive rate of a test for a disease is the probability
that someone with the disease tests positive. HIV may be transmitted
from an expecting parent to their child by transmission during childbirth
or by transmission to the fetus during pregnancy (throughout, assume that
there’s no other way for a newborn to be infected). Treatment with the drugs
zidovudine or nevirapine has been shown to reduce the rate of these sorts
of transmission of HIV by 38% to 50% in the absence of other intervention
(Koblavi-Dème et al. 2001).
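The definitions of sensitivity and specificity above combine with a prior probability of disease through Bayes' rule. The following R sketch illustrates the arithmetic only; the three input values are hypothetical and are deliberately not the values given in this Problem, and the variable names are made up for the illustration.

```r
# Hypothetical test characteristics and prior; NOT the values from this exam.
sensitivity <- 0.90   # P(test positive | disease)
specificity <- 0.95   # P(test negative | no disease)
prior       <- 0.10   # P(disease) before the test result is known

# Law of total probability: overall chance of a positive test.
p_positive <- sensitivity * prior + (1 - specificity) * (1 - prior)

# Bayes' rule: chance of disease, and of no disease, given a positive test.
p_disease_given_pos    <- sensitivity * prior / p_positive
p_no_disease_given_pos <- 1 - p_disease_given_pos

# Report as a percentage rounded to the nearest tenth of a percent.
print(round(100 * p_no_disease_given_pos, 1))
```

With these hypothetical inputs, a positive result still leaves roughly a one-in-three chance of no disease, because the prior is low relative to the false positive rate.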
a) Suppose that an expecting parent is infected with HIV and they are
treated with zidovudine or nevirapine during pregnancy. Suppose that
after they give birth, a Determine serology test reports a positive test
for HIV. What is the probability that the child does not have HIV?
Round your answer to the nearest tenth of a percent.
(6 points)
b) UNAIDS (an organization established by the United Nations Economic
and Social Council) estimates the prevalence of HIV in Côte d’Ivoire
among people aged 15-49 to be 2.6%. If a Determine serology test reported
a positive test for HIV in someone selected uniformly at random
among all people in Côte d’Ivoire aged 15-49, what is the probability
that the person does not have HIV? Round your answer to the nearest
tenth of a percent.
(4 points)
c) In the USA, according to the Centers for Disease Control (a public
health institute within the United States Department of Health and
Human Services), if someone has a positive serology test for HIV they
are not diagnosed as HIV-positive until a second follow-up test also
yields a positive test result. What is the probability that someone is
incorrectly diagnosed as HIV-positive (i.e., if someone is not infected
with HIV, what is the probability that their first test and also their
second follow-up test are both positive)? Suppose that both tests are
Determine serology tests, and also assume that the test results are
statistically independent. Express your answer in expected number of
events in a million (i.e. something like ‘a 36 in a million chance’ or
‘a one in a million chance’). Also: In one sentence, what is a possible
argument as to why the assumption of independence of the two test
results might be wrong? (Your argument does not have to be sound,
but it must be valid without being tautological).
(3 points)
d) What is the probability that an HIV infected expecting parent transmits
HIV to their child either during childbirth or through transmitting
HIV to the fetus during pregnancy, given that the parent has not
received treatment with the drugs zidovudine or nevirapine, and in
the absence of other intervention, according to the preamble of this
problem (in concordance with Koblavi-Dème et al. 2001)?
(2 points)
~OR~ Problem 3: Some t-tests in varsity sports
Suppose that you are doing a research assistantship in a lab that has developed
a new curriculum for elementary school sports, designed to increase
participation in organized sports in undergraduate university programs, for
students that go on to enroll in university. We define participation throughout
this question as the number of months spent as a member of a varsity
sports team. Your supervisor has recruited two classes of elementary school
students. One class has 23 students and is offered the new curriculum (the
study class). The other class has 19 students and is not offered the new curriculum
(the control class). Each class is taught exclusively by its own
‘home room’ teacher (one teacher for the study class, and one for the
control class) for as long as the new curriculum is provided
to the study class. Both classes are matched for age, and the median age at
which they might be expected to finish undergrad (should they pursue it)
is around 10 years later (Statistics Canada 2010). They are all at the same
elementary school in Vancouver. Students from each class receive follow-up
surveys 10 years after the beginning of the experiment, provided that they
have studied at an undergraduate level at any university during the interval.
Let X = (X1, ..., Xn) be the participation (in months) of the students of
the study class responding to the final survey (with n indicating the number
of people responding to the survey from the study class, with n ≤ 23), and
let Y = (Y1, ..., Ym) be the participation of the students in the control class
(with m ≤ 19). Let X̄ and Ȳ be the sample means of X and Y, respectively.
A two-sided, unpaired two-sample t-test with unequal variances (the Welch
test) would provide evidence against the null hypothesis H0 : X̄ = Ȳ, under
the assumptions of the test.
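As an illustration of how such a Welch test can be run, the sketch below uses R's built-in t.test function with var.equal = FALSE. The two data vectors are hypothetical (they are not the survey results of this Problem), and the names x, y and result are chosen only for the example.

```r
# Hypothetical samples (months of participation); NOT the exam data.
x <- c(12, 0, 30, 24, 8, 36)
y <- c(0, 6, 0, 18, 2)

# var.equal = FALSE requests the Welch (unequal variances) test;
# alternative = "two.sided" is the default, stated here for clarity.
result <- t.test(x, y, alternative = "two.sided", var.equal = FALSE)

print(result$statistic)       # Welch t statistic
print(result$parameter)       # Welch-Satterthwaite degrees of freedom
print(result$p.value)         # two-sided p-value
print(result$p.value < 0.05)  # TRUE means reject H0 at alpha level 0.05
```

The degrees of freedom reported by the Welch test are generally not an integer, since they are estimated from the two sample variances.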
a) Ten years later, you are still collaborating on this project (your former
supervisor is now your colleague and it’s a long project) and you both
have now received the results. Imagine that you find n = 10 and m = 9
and that X and Y are given by the following column vectors:
X = (        Y = (
  45,          10,
   0,           0,
  40,           4,
  41,           0,
   7,           6,
  44,           0,
   0,          46,
  38,           0,
  27,           0
   2         )
)
Note that these column vectors vertically indicate the coordinates of X
(on the left) and Y (on the right). The coordinates of each vector are
provided within round parentheses, separated by commas. Compute
the p-value for a two-sided, unpaired two-sample t-test with unequal
variances against the null hypothesis H0 : X̄ = Ȳ towards the alternative
hypothesis that X̄ ≠ Ȳ. Does the p-value you compute reject
the null hypothesis at an alpha level of 0.05?
(7 points)
b) Suppose that the p-value calculated in question a) of this Problem
rejects the null at an alpha level of 0.05. Your colleagues want to write:
‘Our results indicate that this new curriculum should be deployed across
Canada.’ Provide four criticisms of this statement, with at most two
sentences per criticism. You may criticise the aspects of the statistics
described in the experiment, or the scope of the claim.
(4 points)
c) Socioeconomic background may modulate the extent of post-secondary
attainment for a student due to factors external to the student. Also,
people may return to post-secondary studies at any age. The experiment
described in this Problem seems to ignore these considerations.
How could this experiment be modified to be more socially relevant
and inclusive? You may suggest modifications of the operationalization
of the variable participation, or modification of the recruitment or
the control or detail a matched-paired design (no marks for reference
to new statistical tests or literature review: Valid arguments are required,
sound arguments are not required). Answer in two sentences.
(2 points)
Definitions for part c) of this Problem:
• A young person’s socioeconomic background includes a measure of the income and occupation of their parents, and aspects of the community
and household that they grew up in (Townsend et al. 1988).
• A factor is external to someone if it is something that affects them, but also something that they have no control over.
• Post-secondary studies include the pursuit and completion of college or university degrees, and post-secondary attainment indicates the extent
of such pursuits.
d) Suppose that your colleagues want to provide a one-sample t-test
against the null hypothesis that the mean of the Y values listed in
question a) of this Problem is equal to a given value µ. (They may
want to test if the mean participation of the control class is different
from a Canada-wide mean participation). A t-test with such a low
value of m (here m < 15) is only indicated if the data look normally
distributed. The Shapiro-Wilk hypothesis test for normality provides
a p-value for the null hypothesis that a collection of values Y1, ..., Ym
are normally distributed (Shapiro and Wilk, 1965). This test can be
applied using the R code shapiro.test(Y), where Y is a variable in
R specifying a vector with coordinates Y1, Y2, Y3, .... (Such a variable
may be created in R with the code Y = c(10, 0, 4, ...)). Does
the Shapiro-Wilk test reject the null hypothesis that Y is normally
distributed at alpha level 0.05, for the Y provided in part a) of this
Problem? And so, is the described test for the mean of Y indicated?
(2 points)
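For reference, a call to shapiro.test on a complete vector might look like the following sketch. The vector z is hypothetical and is not the Y of part a); the names z and sw are chosen only for the example.

```r
# Hypothetical data vector; NOT the Y from part a) of this Problem.
z <- c(3, 7, 8, 12, 13, 15, 18, 21, 26)

# shapiro.test returns a list holding the W statistic and the p-value.
sw <- shapiro.test(z)

print(sw$statistic)       # Shapiro-Wilk W statistic
print(sw$p.value)         # p-value for the null hypothesis of normality
print(sw$p.value < 0.05)  # TRUE means reject normality at alpha level 0.05
```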