STAT305/605 Fall 2020 SFU Midterm Due October 23rd 5:00PM PST

Midterm Exam

Honour Code

In taking this midterm exam you are required to affirm your willingness to abide by the spirit of academic integrity at SFU. Provide your name and student number if you can affirm this statement: I understand that the following activities are prohibited and will be considered cheating and agree to not participate in any of the following activities.

• Looking at or copying from another SFU student’s midterm exam or materials while writing this midterm exam.

• Conferring or conversing with any party regarding the material content of this midterm exam during the course of this midterm exam.

• Having someone else take this midterm exam in my place.

• Distributing these midterm exam materials in any way.

• Misrepresenting the considerations that this midterm exam may be done within the usual time limitation. (Students with CAL or other exceptions have deadlines not specified in this document or on Canvas.)

This honour code is an undertaking for students to abide by both individually and collectively. You must uphold both the spirit and letter of this honour code.

1. Full Name:

2. Student number:

• This midterm exam is open book. You may consult any text, website or resource during the completion of this midterm as long as you do not partake in the activities prohibited by the honour code above. To be clear: Any content from students, TAs or instructors created before the beginning of this midterm exam (on Discord, Facebook, emails, Piazza, Canvas, videos, etc.) is allowed to be accessed and reviewed during the midterm. If you require clarification about these rules and instructions you may email me or post on Piazza.

• This midterm begins on October 21st at 2:30PM PST, is due on October 23rd at 5:00PM PST, and is to be submitted on Crowdmark. You may take any amount of time to complete this midterm provided that you submit it before it is due. This due date constitutes the only time limitation, and there is no other usual time limitation aside from this. Students with CAL or other considerations may receive another due date conforming to their requirements (in which case that due date supersedes what is written here).

Instructions

Complete the first Problem of this midterm exam, and then complete only one of the remaining two Problems. This midterm exam is marked out of 30. This midterm exam has three Problems, each worth 15 marks. You must complete Problem 1 and also complete one (1) of Problem 2 or Problem 3. If you incorrectly complete all three Problems, your mark will be your mark for Problem 1 plus the minimum of your marks for Problems 2 and 3. To complete a Problem, upload to Crowdmark in the section corresponding to the Problem. If you choose not to complete a Problem, do not upload. When numerical answers are required, partial marks will only be awarded if work is shown. You may provide answers through any method (screenshot, photograph, export from a Word document, R Markdown, etc.), provided that an export to PDF is uploaded to Crowdmark. You must also upload a response to the honour code on Crowdmark (also in PDF format).

~DO~ Problem 1: Which test is right?

For each of the scenarios below, indicate the single most correct answer.

i) Let X = (X1, ..., Xn) be the blood pressure (measured in mmHg) and let Y = (Y1, ..., Yn) be the cortisol level (measured in mcg/dL) recorded for n = 79 patients recruited for a study in a hospital (Xi and Yi are measurements for the same patient). What test is most appropriate to gather evidence towards the alternative hypothesis that blood pressure is associated with cortisol level?

A) The two-sample paired t-test with the null hypothesis that the means of X and Y differ.

B) The test with the null hypothesis that the Pearson correlation coefficient between X and Y is zero.

C) The test with the null hypothesis that the regression coefficient is zero in a linear regression with response variable X (blood pressure) and explanatory variable Y (cortisol level).

(5 points)

ii) Suppose that a treatment is proposed to reduce the duration from the date of infection to the time at which a first negative test is recorded in people with mild COVID-19 (call this time period the duration). Suppose that 27 people with mild COVID-19 (the study population) are administered the treatment and 73 people with mild COVID-19 are not administered the treatment (the control population). Both populations are sampled from patients tested at the same clinic over the same period. Let the durations for the study sample be X = (X1, X2, ...), and the durations for the control sample be Y = (Y1, Y2, ...). What test is most appropriate to gather evidence towards the alternative hypothesis that the treatment reduces the duration?

A) The one-sided two-sample unpaired t-test with H0: The mean of X is greater than or equal to the mean of Y.

B) The one-sided two-sample unpaired t-test with the null hypothesis that the mean of X is less than or equal to the mean of Y.

C) The test against the null hypothesis that the Spearman’s ranked correlation coefficient between X and Y is zero.

D) The one-sided two-sample paired t-test against H0: The mean difference between Xi and Yi is less than or equal to zero.

E) The two-sided two-sample paired t-test with the null hypothesis that the mean difference between Xi and Yi is zero.

(5 points)

iii) Road vehicle accidents involving ambulances have more detrimental outcomes than accidents involving other similarly sized vehicles (Ray and Kupas, 2005). Measures to avoid such accidents are continually being refined by organizations involved in emergency medical services. Suppose that a city council is interested in knowing if adoption of such measures has led to an improvement over the last decade. Suppose that the ratio between the number of accidents involving ambulances (the numerator) and the number of kilometers driven by ambulances (the denominator) has been recorded (r_t, with units of number of accidents per kilometer year) for each year t over the past decade. Which single one of the following statistical quantities is most relevant for investigating whether or not measures are leading to improvements?

A) The sample standard deviation of r_t.

B) The sample mean of r_t.

C) The Pearson correlation coefficient ρ between r_t and t.

D) The regression coefficient for t in a linear regression with r_t as the response variable and t as the explanatory variable.

E) The regression coefficient for r_t in a linear regression with r_t as the explanatory variable and t as the response variable.

(5 points)

~AND EITHER~ Problem 2: Bayes’ rule

A study was conducted to assess the sensitivity and specificity of four different human immunodeficiency virus (HIV) serology tests (Koblavi-Dème et al. 2001). The Determine test was among the four; it was developed by Abbott Laboratories (an American provider of health care, medical devices and pharmaceuticals) and was found to have a true negative rate (the true negative rate is also called specificity) of 99.4% and a true positive rate (the true positive rate is also called sensitivity) of 100%. The true negative rate of a test for a disease is the probability that someone without the disease tests negative. The true positive rate of a test for a disease is the probability that someone with the disease tests positive. HIV may be transmitted from an expecting parent to their child by transmission during childbirth or by transmission to the fetus during pregnancy (throughout, assume that there’s no other way for a newborn to be infected). Treatment by the drugs zidovudine or nevirapine has been shown to reduce the rate of these sorts of transmission of HIV by 38% to 50% in the absence of other intervention (Koblavi-Dème et al. 2001).
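
For illustration only, the following R sketch shows how Bayes' rule combines a prior probability of having a disease with a test's true positive rate and true negative rate, as defined above, to give the probability that someone who tests positive does not have the disease. The function name and all numeric inputs are hypothetical; they are not the values of this Problem.

```r
# Probability of NOT having the disease given a positive test (Bayes' rule).
# prior:       probability of having the disease before testing
# sensitivity: true positive rate, P(positive | disease)
# specificity: true negative rate, P(negative | no disease)
prob_no_disease_given_positive <- function(prior, sensitivity, specificity) {
  true_pos  <- prior * sensitivity              # P(disease and positive)
  false_pos <- (1 - prior) * (1 - specificity)  # P(no disease and positive)
  false_pos / (true_pos + false_pos)            # P(no disease | positive)
}

# Hypothetical inputs (NOT the values of this Problem).
p <- prob_no_disease_given_positive(prior = 0.10, sensitivity = 0.90, specificity = 0.95)
round(100 * p, 1)  # percentage, rounded to the nearest tenth of a percent
```

With these hypothetical inputs the result is 33.3%.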

a) Suppose that an expecting parent is infected with HIV and they are treated with zidovudine or nevirapine during pregnancy. Suppose that after they give birth, a Determine serology test reports a positive test for HIV. What is the probability that the child does not have HIV? Round your answer to the nearest tenth of a percent.

(6 points)

b) UNAIDS (an organization established by the United Nations Economic and Social Council) estimates the prevalence of HIV in Côte d’Ivoire among people aged 15-49 to be 2.6%. If a Determine serology test reported a positive test for HIV in someone selected uniformly at random among all people in Côte d’Ivoire aged 15-49, what is the probability that the person does not have HIV? Round your answer to the nearest tenth of a percent.

(4 points)

c) In the USA, according to the Centers for Disease Control (a public health institute within the United States Department of Health and Human Services), if someone has a positive serology test for HIV they are not diagnosed as HIV-positive until a second follow-up test also yields a positive test result. What is the probability that someone is incorrectly diagnosed as HIV-positive (i.e., if someone is not infected with HIV, what is the probability that their first test and also their second follow-up test are both positive)? Suppose that both tests are Determine serology tests, and also assume that the test results are statistically independent. Express your answer in expected number of events in a million (i.e. something like ‘a 36 in a million chance’ or ‘a one in a million chance’). Also: In one sentence, what is a possible argument as to why the assumption of independence of the two test results might be wrong? (Your argument does not have to be sound, but it must be valid without being tautological).

(3 points)

d) What is the probability that an HIV infected expecting parent transmits HIV to their child either during childbirth or through transmitting HIV to the fetus during pregnancy, given that the parent has not received treatment with the drugs zidovudine or nevirapine, and in the absence of other intervention, according to the preamble of this problem (in concordance with Koblavi-Dème et al. 2001)?

(2 points)

~OR~ Problem 3: Some t-tests in varsity sports

Suppose that you are doing a research assistantship in a lab that has developed a new curriculum for elementary school sports, designed to increase participation in organized sports in undergraduate university programs, for students that go on to enroll in university. We define participation throughout this question as the number of months spent as a member of a varsity sports team. Your supervisor has recruited two classes of elementary school students. One class has 23 students and is offered the new curriculum (the study class). The other class has 19 students and is not offered the new curriculum (the control class). The classes are each taught exclusively by one of two ‘home room’ teachers (one teacher for the study class, and one for the control class) throughout the duration of the provision of the new curriculum to the study class. Both classes are matched for age, and the median age at which they might be expected to finish undergrad (should they pursue it) is around 10 years later (Statistics Canada 2010). They are all at the same elementary school in Vancouver. Students from each class receive follow-up surveys 10 years after the beginning of the experiment, provided that they have studied at an undergraduate level at any university during the interval.

Let X = (X1, ..., Xn) be the participation (in months) of the students of the study class responding to the final survey (with n indicating the number of people responding to the survey from the study class, with n ≤ 23), and let Y = (Y1, ..., Ym) be the participation of the students in the control class (with m ≤ 19). Let X̄ and Ȳ be the sample means of X and Y, respectively. A two-sided, unpaired two-sample t-test with unequal variances (the Welch test) would provide evidence against the null hypothesis H0: X̄ = Ȳ, under the assumptions of the test.
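
A minimal R sketch of the Welch test described above, using made-up vectors x and y (hypothetical data, not the data of this Problem). In R, t.test() with var.equal = FALSE performs the unequal-variances test; the hand computation of the test statistic is included only to show what the function computes.

```r
# Hypothetical participation values in months (NOT the exam data).
x <- c(12, 15, 9, 20, 14, 11)
y <- c(8, 10, 7, 13, 9)

# Two-sided, unpaired two-sample t-test with unequal variances (Welch test).
res <- t.test(x, y, alternative = "two.sided", paired = FALSE, var.equal = FALSE)

# The same test statistic computed by hand from the sample means and variances.
t_manual <- (mean(x) - mean(y)) / sqrt(var(x) / length(x) + var(y) / length(y))

res$p.value          # p-value of the Welch test
res$p.value < 0.05   # TRUE if the null hypothesis is rejected at alpha level 0.05
```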

a) Ten years later, you are still collaborating on this project (your former supervisor is now your colleague and it’s a long project) and you both have now received the results. Imagine that you find n = 10 and m = 9 and that X and Y are given by the following column vectors:

X = (      Y = (
  45,        10,
   0,         0,
  40,         4,
  41,         0,
   7,         6,
  44,         0,
   0,        46,
  38,         0,
  27,         0
   2       )
)

Note that these column vectors vertically indicate the coordinates of X (on the left) and Y (on the right). The coordinates of each vector are provided within round parentheses, separated by commas. Compute the p-value for a two-sided, unpaired two-sample t-test with unequal variances against the null hypothesis H0: X̄ = Ȳ towards the alternative hypothesis that X̄ ≠ Ȳ. Does the p-value you compute reject the null hypothesis at an alpha level of 0.05?

(7 points)

b) Suppose that the p-value calculated in question a) of this Problem rejects the null at an alpha level of 0.05. Your colleagues want to write: ‘Our results indicate that this new curriculum should be deployed across Canada.’ Provide four criticisms of this statement, with at most two sentences per criticism. You may criticise the aspects of the statistics described in the experiment, or the scope of the claim.

(4 points)

c) Socioeconomic background may modulate the extent of post-secondary attainment for a student due to factors external to the student. Also, people may return to post-secondary studies at any age. The experiment described in this Problem seems to ignore these considerations. How could this experiment be modified to be more socially relevant and inclusive? You may suggest modifications of the operationalization of the variable participation, or modification of the recruitment or the control or detail a matched-paired design (no marks for reference to new statistical tests or literature review: Valid arguments are required, sound arguments are not required). Answer in two sentences.

(2 points)

Definitions for part c) of this Problem:

• A young person’s socioeconomic background includes a measure of the income and occupation of their parents, and aspects of the community and household that they grew up in (Townsend et al. 1988).

• A factor is external to someone if it is something that affects them, but also something that they have no control over.

• Post-secondary studies include pursuit and completion of college or university degrees, and post-secondary attainment indicates the extent of such pursuits.

d) Suppose that your colleagues want to provide a one-sample t-test against the null hypothesis that the mean of the Y values listed in question a) of this Problem is equal to a given value µ. (They may want to test if the mean participation of the control class is different from a Canada-wide mean participation). A t-test with such a low value of m (m < 15) is only indicated if the data look normally distributed. The Shapiro-Wilk hypothesis test for normality provides a p-value for the null hypothesis that a collection of values Y1, ..., Ym are normally distributed (Shapiro and Wilk, 1965). This test can be applied using the R code shapiro.test(Y), where Y is a variable in R specifying a vector with coordinates Y1, Y2, Y3, .... (Such a variable may be created in R with the code Y = c(10, 0, 4, ...)). Does the Shapiro-Wilk test reject the null hypothesis that Y is normally distributed at alpha level 0.05, for the Y provided in part a) of this Problem? And so, is the described test for the mean of Y indicated?

(2 points)
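
The workflow described in part d) can be sketched in R as follows, using a made-up vector (hypothetical values, not the Y of part a)). The variable names y, sw and reject_normality are illustrative assumptions.

```r
# Hypothetical values (NOT the exam data).
y <- c(3.1, 4.7, 5.0, 2.8, 6.2, 4.4, 5.5, 3.9)

# Shapiro-Wilk test; the null hypothesis is that y is normally distributed.
sw <- shapiro.test(y)

sw$p.value                              # p-value of the test
reject_normality <- sw$p.value < 0.05   # TRUE if normality is rejected at alpha level 0.05
reject_normality
```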
