LAB 3 ASSIGNMENT

INFERENCES FOR TWO-SAMPLE PROBLEMS

In this lab assignment you will learn how to examine data produced in a simple statistical experiment. In particular, you will examine the experiment design and apply graphical, numerical, and inferential tools available in StatCrunch to compare two distributions produced by the study. Before you start working on the assignment, you should review the course material about designing experiments and comparing two population means.

Caffeine Dependence Experiment

Caffeine is the world's most widely consumed mood-altering substance. In North America, about 90% of adults consume caffeine daily. Coffee is the leading dietary source of caffeine among adults in Canada, while soft drinks represent the largest source of caffeine for children. People who consume large amounts of caffeine each day may experience physical withdrawal symptoms if they stop taking in their usual amounts of the substance.

In this lab assignment you will follow an experiment on caffeine dependency conducted on a group of volunteers by researchers from the Johns Hopkins University School of Medicine in Baltimore and summarized in the paper “Caffeine Dependence Syndrome. Evidence from Case Histories and Experimental Evaluations”, JAMA (The Journal of the American Medical Association), Vol. 272, No. 13, 1994.

The researchers recruited twenty-seven volunteers who believed they were psychologically or physically addicted to caffeine but otherwise were in good health. Of these twenty-seven volunteers, sixteen were diagnosed as caffeine dependent based on some general substance dependence criteria. Of the sixteen subjects who were diagnosed as caffeine dependent, eleven agreed to participate in a study to evaluate their caffeine dependency. Before the experiment was conducted, daily caffeine intake measurements for each subject were obtained based on food diaries of the participants.

The experiment was conducted over two 2-day periods that occurred exactly one week apart. During one of the 2-day periods, the subjects were given a set of capsules containing the amount of caffeine normally ingested by the subject in one day. During the other study period, the subjects were given placebos. The order in which each subject received the two types of capsules was randomized. At the end of each 2-day study period, subjects were evaluated in three areas: depression symptoms, fatigue, and vigor. The experimenters were blinded to whether the subject was receiving the caffeine pills or the caffeine-free pills.
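The random ordering of the two capsule types can be illustrated with a short Python sketch. The source does not say how the researchers actually carried out the randomization, so the independent coin flip per subject, the function name assign_orders, and the seed value are all assumptions made for illustration only.

```python
import random


def assign_orders(subject_ids, seed=None):
    """Randomly choose which capsule type each subject receives first."""
    rng = random.Random(seed)  # a seed is used only to make the illustration repeatable
    orders = {}
    for subject in subject_ids:
        # Every subject receives both treatments; only the order is random.
        first = rng.choice(["caffeine", "placebo"])
        second = "placebo" if first == "caffeine" else "caffeine"
        orders[subject] = (first, second)
    return orders


# Eleven subjects numbered 1 to 11, as in the SUBJECT variable.
orders = assign_orders(range(1, 12), seed=151)
for subject, (period1, period2) in orders.items():
    print(subject, period1, period2)
```

Because each subject serves as his or her own control, the randomization decides only the order of the two periods, not which subjects receive caffeine.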

The data are available in the StatCrunch file lab4.txt located on the STAT 151 Laboratories website at http://www.stat.ualberta.ca/statslabs/stat151/index.htm (click the Stat 151 link, then Data for Lab 4). The data are not to be printed in your submission. The following is a description of the variables in the data file:

Variable Name   Description of Variable

SUBJECT         Subject number (a whole number from 1 to 11)
DEPR-CAF        Depression score during caffeine period
DEPR-NC         Depression score during no-caffeine period
FATIGUE-CAF     Fatigue score during caffeine period
FATIGUE-NC      Fatigue score during no-caffeine period
VIGOR-CAF       Vigor score during caffeine period
VIGOR-NC        Vigor score during no-caffeine period
SMOKER          Smoker status (Y if smoked cigarettes daily, N otherwise)
CAFFEINE        Daily intake of caffeine (in mg)

Use the data provided to answer the following questions:

1. First you will examine the experiment design.

(a) Can we treat the 11 subjects who agreed to participate in the study as a random sample from the population of all caffeine dependent individuals? Explain why or why not. Can you generalize the results of the study to the population of all caffeine dependent individuals? Explain briefly.

(b) Why were both the subjects and the experimenters interviewing the subjects blinded to whether the subject was receiving the caffeine pills or the caffeine-free pills?

(c) Why was the order in which the two series of capsules were taken randomized?

(d) Why were the two study periods held one week apart instead of using two consecutive 2-day periods?

2. Now you will use inferential tools in StatCrunch to compare the levels of depression for the caffeine and no-caffeine periods.

(a) Do the data give evidence that being deprived of caffeine raises depression scores? Use an appropriate test to answer the question. In particular, state the hypotheses, report the value of the test statistic from the output, specify the distribution of the test statistic under the null hypothesis, provide the p-value, and state your conclusion.

(b) Also obtain a 95% confidence interval for the mean increase in the depression scores between the no-caffeine and caffeine periods. Is the confidence interval consistent with the outcome of the test in part (a)? Explain briefly.

(c) What assumptions must be satisfied to justify the procedures you used in (a) and (b)? Are the assumptions met in this case? Obtain the appropriate plot to verify the assumption and paste it into your report. What is the chief threat to the validity of the results obtained in parts (a) and (b)?
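As a rough illustration of the arithmetic behind a matched-pairs comparison like the one in Question 2, the Python sketch below computes the paired differences, the t statistic with its degrees of freedom, a two-sided 95% confidence interval, and the points of a normal quantile plot. The scores are invented for five hypothetical subjects and are not the lab data. The function names and the plotting positions (i - 0.5)/n are assumptions, and StatCrunch may use slightly different conventions. The p-value is left to the software output because the Python standard library has no t distribution.

```python
import math
import statistics

# Hypothetical scores for five made-up subjects; these are NOT the lab data.
caf = [5, 7, 3, 4, 6]        # score during the caffeine period
no_caf = [9, 8, 6, 4, 10]    # score during the no-caffeine period


def paired_t(with_caffeine, without_caffeine):
    """Return (differences, mean, sd, t statistic, df) for the paired differences."""
    # Difference = no-caffeine minus caffeine, so a positive value is an increase.
    diffs = [nc - c for c, nc in zip(with_caffeine, without_caffeine)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD (divisor n - 1)
    t_stat = mean_d / (sd_d / math.sqrt(n))
    return diffs, mean_d, sd_d, t_stat, n - 1


def paired_ci(mean_d, sd_d, n, t_star):
    """Two-sided interval: mean difference +/- t* times the standard error."""
    margin = t_star * sd_d / math.sqrt(n)
    return mean_d - margin, mean_d + margin


def normal_quantile_points(values):
    """Return (standard normal quantile, ordered value) pairs for a normal quantile plot."""
    n = len(values)
    z = statistics.NormalDist()
    return [(z.inv_cdf((i - 0.5) / n), v)
            for i, v in enumerate(sorted(values), start=1)]


diffs, mean_d, sd_d, t_stat, df = paired_t(caf, no_caf)
# 2.776 is the tabled t critical value for 95% confidence with 4 degrees of freedom.
low, high = paired_ci(mean_d, sd_d, len(diffs), t_star=2.776)
points = normal_quantile_points(diffs)

print("t =", round(t_stat, 3), "df =", df)
print("95% CI:", (round(low, 3), round(high, 3)))
```

With the eleven subjects in the lab data the degrees of freedom would be 10, and the corresponding tabled critical value for a 95% interval is 2.228. Points from normal_quantile_points that fall close to a straight line are consistent with approximately normal differences.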

3. Now you will compare the changes in depression levels for smokers and non-smokers.

(a) Use an appropriate test in StatCrunch to see whether the change in the depression scores is different for smokers and non-smokers. State the null and alternative hypotheses, specify the distribution of the test statistic under the null hypothesis, report the value of the test statistic, and the p-value of the test. What is your conclusion?

(b) Explain the choice of your test in (a) and specify the assumptions necessary to apply the test. You do not need to verify the assumptions.

(c) Obtain a 95% confidence interval for the mean difference in the depression scores for non-smokers and smokers. Interpret the 95% confidence interval. Is the confidence interval consistent with the test in part (a)?
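For comparing two independent groups, as in Question 3, one common option is the two-sample t statistic without pooled variances. The Python sketch below shows that calculation on invented changes in score for hypothetical smokers and non-smokers, not the lab data. The unpooled (Welch) form with Satterthwaite degrees of freedom is an assumption here. StatCrunch also offers a pooled version, and the assignment does not say which one is expected.

```python
import math
import statistics

# Hypothetical changes in depression score (no-caffeine minus caffeine);
# made-up values for illustration, NOT the lab data.
smoker_changes = [2, 5, 3]
nonsmoker_changes = [4, 1, 6, 3, 2]


def welch_t(sample1, sample2):
    """Return (t statistic, approximate df, standard error) for mean1 - mean2."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1)   # sample variances (divisor n - 1)
    v2 = statistics.variance(sample2)
    se = math.sqrt(v1 / n1 + v2 / n2)   # unpooled standard error
    t_stat = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t_stat, df, se


# Difference taken as non-smokers minus smokers.
t_stat, df, se = welch_t(nonsmoker_changes, smoker_changes)
print("t =", round(t_stat, 3), "df =", round(df, 2), "SE =", round(se, 3))
```

A confidence interval for the difference in means uses the same standard error, multiplied by the t critical value for the reported degrees of freedom, which is easiest to read from the software output.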

4. You have studied the change in depression scores exhibited by caffeine-dependent individuals when they are deprived of caffeine. Now you will explore changes in fatigue and vigor.

(a) Do the data give evidence that being deprived of caffeine raises fatigue scores? Use an appropriate test to answer the question. In particular, state the hypotheses, report the value of the test statistic and p-value from the output, and state your conclusion.

(b) Do the data give evidence that being deprived of caffeine lowers vigor scores? Use an appropriate test to answer the question. In particular, state the hypotheses, report the value of the test statistic and the p-value from the output, and state your conclusion.

5. Briefly summarize your findings in Questions 1-4 in the form of a brief report. In particular, indicate which of the three withdrawal symptoms (more depressive mood, more fatigue, less vigor) seems to be the most intense. Refer to the plots and inferences in your summary.

LAB 3 ASSIGNMENT: MARKING SCHEME

Proper Header and appearance: 10 marks

Question 1

(a) Random sample or not: 2 marks

Generalizations: 2 marks

(b) Double-blind study discussion: 2 marks

(c) Order of administration of pills: 2 marks

(d) Timing of the two study periods: 2 marks

Question 2

(a) Hypotheses: 3 marks

Value of the test statistic: 2 marks

Null Distribution: 2 marks

P-value: 2 marks

Conclusion: 2 marks

(b) 95% confidence interval: 4 marks

Comparison of the interval with the test: 2 marks

(c) Specifying assumptions (in general): 2 marks

Normality assumption for the data: 2 marks

Plot to verify the assumption of normality: 3 marks

SRS assumption for the data: 2 marks

Chief threat to validity: 2 marks

Question 3

(a) Hypotheses: 3 marks

Value of the test statistic: 2 marks

Null distribution: 2 marks

P-value: 2 marks

Conclusions: 2 marks

(b) Choice of the test: 2 marks

Assumption of the two-sample t test: 2 marks

(c) Confidence interval: 4 marks

Consistency of the confidence interval with the test: 2 marks

Question 4

(a) Hypotheses: 3 marks

Value of the test statistic: 2 marks

P-value: 2 marks

Conclusions: 2 marks

(b) Hypotheses: 3 marks

Value of the test statistic: 2 marks

P-value: 2 marks

Conclusions: 2 marks

Question 5

Brief summary (including the answer to the question): 5 marks

TOTAL = 92 marks