psych case study & stat hw assignment due 12/1/12 by 4:30pm
CHAPTER 5
Hypothesis Tests with Means of Samples
Chapter Outline
• The Distribution of Means
• Hypothesis Testing with a Distribution of Means: The Z Test
• Controversy: Marginal Significance
• Hypothesis Tests About Means of Samples (Z Tests) and Standard Errors in Research Articles
• Advanced Topic: Estimation, Standard Errors, and Confidence Intervals
• Advanced Topic Controversy: Confidence Intervals versus Significance Tests
• Advanced Topic: Confidence Intervals in Research Articles
• Summary
• Key Terms
• Example Worked-Out Problems
• Practice Problems
• Chapter Notes
In Chapter 4, we introduced the basic logic of hypothesis testing. The studies we used as examples had a sample of a single individual. As we noted, however, in actual practice, psychology research almost always involves a sample of many individuals. In this chapter, we build on what you have learned so far and consider hypothesis testing with a sample of more than one individual. For example, a social psychologist is interested in the potential effect of perceptions of people's personality on perceptions of their physical attractiveness. The researcher's theory predicts that, if you are told that a person has positive personality qualities (such as kindness, warmth, a sense of humor, and intelligence), you will rate that person as more attractive than if no mention had been made of the person's personality qualities. From extensive previous research (in which no mention was made of personality qualities),
the researcher has established the population mean and standard deviation of the attractiveness rating of a photo of a particular person. The researcher then recruits a sample of 64 individuals to rate the attractiveness of the person in the photograph. However, prior to rating the person, each individual is told that the person whose photograph they are going to rate has many positive personality qualities. In this chapter, you will learn how to test hypotheses in situations such as those presented in this example, situations in which the population has a known mean and standard deviation and in which a sample has more than one individual. Mainly, this requires examining in some detail a new kind of distribution, called a "distribution of means." (We will return to this example later in the chapter.)
The Distribution of Means
Hypothesis testing in the usual research situation, where you are studying a sample of many individuals, is exactly the same as you learned in Chapter 4, with an important exception. When you have more than one person in your sample, there is a special problem with Step 2, determining the characteristics of the comparison distribution. In each of our examples so far, the comparison distribution has been a distribution of individual scores (such as the population of ages when individual babies start walking). A distribution of individual scores has been the correct comparison distribution because we have used examples with a sample of one individual. That is, there has been consistency between the type of sample score we have been dealing with (a score from one individual) and the comparison distribution (a distribution of individual scores).
Now, consider the situation when you have a sample of, say, 64 individuals (as in the attractiveness rating example). You now have a group of 64 scores (an attractiveness rating from each of the 64 people in the study). As you will recall from Chapter 2, the mean is a very useful representative value of a group of scores. Thus, the score you care about when there is more than one individual in your sample is the mean of the group of scores. In this example, you would focus on the mean of the 64 individuals' scores. If you were to compare the mean of this sample of 64 individuals' scores to a distribution of a population of individual scores, this would be a mismatch, like comparing apples to oranges. Instead, when you are interested in the mean of a sample of 64 scores, you need a comparison distribution that is a distribution of means of samples of 64 scores. We call such a comparison distribution a distribution of means. So, the scores in a distribution of means are means, not scores of individuals.
A distribution of means is a distribution of the means of each of lots and lots of samples of the same size, with each sample randomly taken from the same population of individuals. (Statisticians also call this distribution of means a sampling distribution of the mean. In this book, however, we use the term distribution of means to keep it clear that we are talking about populations of means, not samples or some kind of distribution of samples.)
The distribution of means is the correct comparison distribution when there is more than one person in a sample. Thus, in most research situations, determining the characteristics of a distribution of means is necessary for Step 2 of the hypothesis-testing procedure, determining the characteristics of the comparison distribution.
distribution of means: distribution of means of samples of a given size from a population (also called a sampling distribution of the mean); comparison distribution when testing hypotheses involving a single sample of more than one individual.
Building a Distribution of Means
To help you understand the idea of a distribution of means, we consider how you could build up such a distribution from an ordinary population distribution of individual scores. Suppose our population of individual scores was of the grade levels of the 90,000 elementary and junior-high schoolchildren in a particular region. Suppose further (to keep the example simple) that there are exactly 10,000 children at each grade level, from first through ninth grade. This population distribution would be rectangular, with a mean of 5, a variance of 6.67, and a standard deviation of 2.58 (see Figure 5-1).
Figure 5-1 Distribution of grade levels among 90,000 schoolchildren (fictional data).
Figure 5-2 Distribution of the means of three randomly taken samples of two schoolchildren's grade levels each from a population of grade levels of 90,000 schoolchildren (fictional data). $M = 4.67$; $SD^2 = .39$; $SD = .62$.
Next, suppose that you wrote each child's grade level on a table tennis ball and put all 90,000 balls into a giant tub. The tub would have 10,000 balls with a 1 on them, 10,000 with a 2 on them, and so forth. You stir up the balls in the tub and then take two of them out. You have taken a random sample of two balls. Suppose one ball has a 2 on it and the other has a 9 on it. The mean grade level of this sample of two children's grade levels is 5.5, the average of 2 and 9. Now you put the balls back, mix up all the balls, and select two balls again. Maybe this time you get two 4s, making the mean of your second sample 4. Then you try again; this time you get a 2
and a 7, making your mean 4.5. So far you have three means: 5.5, 4, and 4.5.
Each of these three numbers is a mean of a sample of grade levels of two schoolchildren. And these three means can be thought of as a small distribution in its own right. The mean of this little distribution of means is 4.67 (the sum of 5.5, 4, and 4.5, divided by 3). The variance of this distribution of means is .39 (the variance of 5.5, 4, and 4.5). The standard deviation of this distribution of means is .62 (the square root of .39). A histogram of this distribution of three means is shown in Figure 5-2.
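The figures for this little distribution of three means can be checked with a few lines of Python. This is only a sketch: it assumes the variance is figured by dividing by the number of means (which is what reproduces the .39 in the text), and the variable names are illustrative.

```python
from statistics import mean, pvariance, pstdev

# The three sample means from the table tennis ball example.
sample_means = [5.5, 4.0, 4.5]

m = mean(sample_means)          # sum of the three means divided by 3
var = pvariance(sample_means)   # sum of squared deviations divided by 3
sd = pstdev(sample_means)       # square root of that variance

print(round(m, 2), round(var, 2), round(sd, 2))
```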
Suppose you continued selecting samples of two balls and taking the mean of the
numbers on each pair of balls. The histogram of means would continue to grow. Figure 5-3 shows examples of distributions of means varying from a sample with just 50 means, up to a sample with 1,000 means (with each mean being of a sample of two randomly drawn balls). (We actually made the histograms shown in Figure 5-3 using a computer to make the random selections instead of using 90,000 table tennis balls
and a giant tub.)
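The computer version of the tub can be sketched roughly as follows. This is not the program used for Figure 5-3; it is a minimal illustration that assumes each ball is drawn independently (a close approximation when the tub holds 90,000 balls), and the function name `simulate_means` and the fixed seed are hypothetical choices.

```python
import random
from statistics import mean, pvariance

def simulate_means(n_samples, sample_size=2, seed=0):
    """Draw n_samples random samples of grade levels 1-9 and return their means."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    grades = range(1, 10)       # equal numbers of children in grades 1 through 9
    means = []
    for _ in range(n_samples):
        sample = rng.choices(grades, k=sample_size)  # independent draws
        means.append(sum(sample) / sample_size)
    return means

means = simulate_means(1000)
# The mean of the simulated means should land near 5, and their
# variance should be clearly smaller than the population's 6.67.
print(round(mean(means), 2), round(pvariance(means), 2))
```

Calling `simulate_means` with 50, 200, 400, 600, 800, and 1,000 samples gives the raw material for histograms like those in Figure 5-3.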
As you can imagine, the method we just described is not a practical way of determining the characteristics of a distribution of means. Fortunately, however, you can figure out the characteristics of a distribution of means directly, using some simple rules, without taking even one sample. The only information you need is (a) the characteristics of the distribution of the population of individuals and (b) the number of scores in each sample. (Don't worry for now about how you could know the characteristics of the population of individuals.) The laborious method of building up a distribution of means in the way we have just considered and the concise method you will learn shortly give the same result. We have had you think of the process in terms of the painstaking method only because it helps you understand the idea of a distribution of means.
Before moving on to later chapters, be sure you fully understand the idea of a distribution of means (and why it is the correct comparison distribution when a sample contains more than one individual). You may need to go through this chapter a couple of times to achieve full understanding of this crucial concept.
Figure 5-3 Histograms of means of two grade levels randomly selected from a large group of students with equal numbers of grades 1 through 9. Histograms are shown for 50 such means, 200 such means, 400 such means, 600 such means, 800 such means, and 1,000 such means. Notice that the histograms become increasingly like a normal curve as the number of means increases.
Determining the Characteristics of a Distribution of Means
Recall that Step 2 of hypothesis testing involves determining the characteristics of the comparison distribution. The three key characteristics of the comparison distribution that you need to determine are:
Its mean.
Its spread (which you can measure using the variance and standard deviation).
Its shape.
Notice three things about the distribution of means we built in our example, as shown in Figure 5-3:
1. The mean of the distribution of means is about the same as the mean of the original population of individuals (both are 5).
2. The spread of the distribution of means is less than the spread of the distribution of the population of individuals.
3. The shape of the distribution of means is approximately normal.
The first two observations, regarding the mean and the spread, are true for all distributions of means. The third, regarding the shape, is true for most distributions of means. These three observations, in fact, illustrate three basic rules you can use to find the mean, the spread (that is, variance and standard deviation), and the shape of any distribution of means without having to write on plastic balls and take endless samples.
mean of a distribution of means: the mean of a distribution of means of samples of a given size from a population; the same as the mean of the population of individuals.
Now let's look at the three rules more closely. The first is for the mean of a distribution of means.
Rule 1: The mean of a distribution of means is the same as the mean of the population of individuals. Stated as a formula,
$$\mu_M = \mu \tag{5-1}$$
The mean of a distribution of means is equal to the mean of the population of individuals.
$\mu_M$ is the mean of the distribution of means (it uses a Greek letter because the distribution of means is also a kind of population). $\mu$ is the mean of the population of individuals.
Each sample is based on randomly selected individuals from the population of individuals. Thus, the mean of a sample will sometimes be higher and sometimes lower than the mean of the whole population of individuals. However, because the selection process is random and we are taking a very large number of samples, eventually the high means and the low means perfectly balance each other out.
In Figure 5-3, as the number of sample means in the distributions of means increases, the mean of the distribution of means becomes more similar to the mean of the population of individuals, which in this example was 5. It can be proven mathematically that, if you took an infinite number of samples, the mean of the distribution of means of these samples would come out to be exactly the same as the mean of the distribution of individuals.
The second rule is about spread. Rule 2a is for the variance of a distribution of
means.
Rule 2a: The variance of a distribution of means is the variance of the population of individuals divided by the number of individuals in each sample.
A distribution of means will be less spread out than the distribution of individuals from which the samples are taken. If you are taking a sample of two scores, it is less likely that both scores will be extreme. Further, for a particular random sample to have an extreme mean, the two extreme scores would both have to be extreme in the same direction (both very high or both very low). Thus, having more than a single score in each sample has a moderating effect on the mean of such samples. In any one sample, the extremes tend to be balanced out by a middle score or by an extreme in the opposite direction. This makes each sample mean tend toward the middle and away from extreme values. With fewer extreme means, the variance of the means is
less than the variance of the population of individuals.
Consider again our example. There were plenty of 1s and 9s in the population, making a fair amount of spread. That is, about a ninth of the time, if you were taking samples of single scores, you would get a 1 and about a ninth of the time you would get a 9. If you are taking samples of two at a time, you would get a sample with a mean of 1 (that is, in which both balls were 1s) or a mean of 9 (both balls 9s) much less often. Getting two balls that average out to a middle value such as 5 is much more likely. (This is because several combinations could give this result: 1 and 9, 2 and 8, 3 and 7, 4 and 6, or two 5s.)
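To make "much less often" concrete, the possible samples of two can simply be counted. The sketch below assumes the two balls are drawn independently, so that all 81 ordered pairs of grade levels are equally likely (a close approximation for a tub of 90,000 balls); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

grades = range(1, 10)
pairs = list(product(grades, repeat=2))   # all ordered pairs: (1, 1), (1, 2), ..., (9, 9)

# Count how many pairs give each possible mean.
counts = Counter((a + b) / 2 for a, b in pairs)

p_mean_1 = Fraction(counts[1.0], len(pairs))  # only the pair (1, 1)
p_mean_5 = Fraction(counts[5.0], len(pairs))  # 1 and 9, 2 and 8, ..., 9 and 1

print(p_mean_1, p_mean_5)
```

Under this assumption, a sample mean of 1 has the same chance as any one specific ordered pair, whereas a mean of 5 can arise from nine different ordered pairs.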
The more individuals in each sample, the less spread out will be the means of the samples. This is because, the more scores in each sample, the rarer it will be for extremes in any particular sample not to be balanced out by middle scores or extremes in the other direction. In terms of the table tennis balls in our example, we rarely got a mean of 1 when taking samples of two balls at a time. If we were taking three balls at a time, getting a sample with a mean of 1 (all three balls would have to be 1s) is even less likely. Getting middle values for the means becomes even more likely.
Using samples of two balls at a time, the variance of the distribution of means came out to about 3.34. This is half of the variance of the population of individuals, which was 6.67. If we had built up a distribution of means using samples of three balls each, the variance of the distribution of means would have been 2.22. This is one-third of the variance of our population of individuals. Had we randomly selected five balls for each sample, the variance of the distribution of means would have been one-fifth of the variance of the population of individuals.
$\mu_M$: mean of a distribution of means.
variance of a distribution of means: variance of the population divided by the number of scores in each sample.
These examples follow a general rule—our Rule 2a for the distribution of means: the variance of a distribution of means is the variance of the population of individuals divided by the number of individuals in each of the samples. This rule
holds in all situations and can be proven mathematically.
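For the grade-level example, Rule 2a can be checked exactly by listing every possible sample rather than drawing random ones. This is an illustrative sketch, not a proof: it assumes independent draws (so every ordered sample is equally likely), and `variance_of_means` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def variance_of_means(sample_size, grades=range(1, 10)):
    """Exact variance of the means of all equally likely samples of a given size."""
    samples = list(product(grades, repeat=sample_size))
    means = [Fraction(sum(s), sample_size) for s in samples]
    mu_m = sum(means) / len(means)                       # mean of the means
    return sum((m - mu_m) ** 2 for m in means) / len(means)

population_variance = variance_of_means(1)   # samples of one score = the population itself

for n in (2, 3, 5):
    # Each exact variance equals the population variance divided by n.
    print(n, float(variance_of_means(n)), float(population_variance / n))
```

The exact population variance is 20/3, or about 6.67, so the exact value for samples of two is about 3.33; the 3.34 in the text comes from halving the rounded 6.67.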
Here is Rule 2a stated as a formula:
$$\sigma_M^2 = \frac{\sigma^2}{N} \tag{5-2}$$
The variance of a distribution of means is the variance of the population of individuals divided by the number of individuals in each sample.
$\sigma_M^2$ is the variance of the distribution of means (it uses a Greek letter because the distribution of means is also a kind of population). $\sigma^2$ is the variance of the population of individuals, and $N$ is the number of individuals in each sample.
In our example, the variance of the population of individual children's grade levels was 6.67, and there were two children's grade levels in each sample. Thus,
$$\sigma_M^2 = \frac{\sigma^2}{N} = \frac{6.67}{2} = 3.34$$
When you figure the variance of a distribution of means ($\sigma_M^2$), be sure to divide the population variance ($\sigma^2$) by the number of individuals in each sample. In many of the examples, you are told the population standard deviation ($\sigma$), which you will first have to square to find the population variance ($\sigma^2$); then you can use Formula 5-2 to find the variance of the distribution of means ($\sigma_M^2$).
To use a different example, suppose a population had a variance of 400 and you wanted to know the variance of a distribution of means of 25 individuals each:
$$\sigma_M^2 = \frac{\sigma^2}{N} = \frac{400}{25} = 16$$
The second rule also tells us about the standard deviation of a distribution of means.
Rule 2b: The standard deviation of a distribution of means is the square root of the variance of the distribution of means. Stated as a formula,
$$\sigma_M = \sqrt{\sigma_M^2} = \sqrt{\frac{\sigma^2}{N}} \tag{5-3}$$
The standard deviation of a distribution of means is the square root of the variance of the distribution of means and also the square root of the result of dividing the variance of the population of individuals by the number of individuals in each sample.
$\sigma_M$ is the standard deviation of the distribution of means.
The standard deviation of the distribution of means also has a special name of its own, the standard error of the mean (SEM), or the standard error (SE), for short. (Thus, $\sigma_M$ also stands for the standard error.) It has this name because it tells you how much the means of samples are typically "in error" as estimates of the mean of the population of individuals. That is, it tells you how much the various means in the distribution of means deviate from the mean of the population. We have more to say about the standard error later in the chapter.
Finally, the third rule for finding the characteristics of a distribution of means focuses on its shape.
Rule 3: The shape of a distribution of means is approximately normal if either (a) each sample is of 30 or more individuals or (b) the distribution of the population of individuals is normal. Whatever the shape of the distribution of the population of individuals, the distribution of means tends to be unimodal and symmetrical. In the grade-level example, the population distribution was rectangular. (It had an equal number at each value.) However, the shape of the distribution of 1,000 sample means (see Figure 5-3) was roughly that of a bell: unimodal and symmetrical. Had we taken many more than 1,000 samples, the shape would have been even more clearly unimodal and symmetrical.
$\sigma_M^2$: variance of a distribution of means.
standard deviation of a distribution of means: square root of the variance of a distribution of means; also called standard error of the mean (SEM) and standard error (SE).
$\sigma_M$: standard deviation of a distribution of means.
standard error of the mean (SEM): same as standard deviation of a distribution of means; also called standard error (SE).
standard error (SE): same as standard deviation of a distribution of means; also called standard error of the mean (SEM).
A distribution of means tends to be unimodal because of the same basic process of extremes balancing each other out that we noted in the discussion of the variance: middle scores for means are more likely, and extreme means are less likely. A distribution of means tends to be symmetrical because a lack of symmetry (skew) is caused by extremes. With fewer extremes, there is less asymmetry. In our grade-level example, the distribution of means we built up also came out so clearly symmetrical because the population distribution of individual grade levels was symmetrical. Had the population distribution of individuals been skewed to one side, the distribution of means would have still been skewed, but not as much.
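The "skewed, but not as much" point can be illustrated with a small made-up population. Everything in this sketch is an assumption for illustration only: the proportions in `population` are invented, draws are treated as independent, and skew is measured with the usual third-moment skewness statistic, which this chapter does not define.

```python
from itertools import product
from math import prod

# Hypothetical skewed population: score -> proportion of individuals with that score.
population = {1: 0.40, 2: 0.30, 3: 0.15, 4: 0.10, 5: 0.05}

def skewness_of_means(sample_size):
    """Skewness of the distribution of means, found by listing every possible sample."""
    outcomes = []
    for sample in product(population, repeat=sample_size):
        probability = prod(population[x] for x in sample)
        outcomes.append((sum(sample) / sample_size, probability))
    mu_m = sum(m * p for m, p in outcomes)
    variance = sum((m - mu_m) ** 2 * p for m, p in outcomes)
    third_moment = sum((m - mu_m) ** 3 * p for m, p in outcomes)
    return third_moment / variance ** 1.5

skew_1 = skewness_of_means(1)   # skew of the population of individuals
skew_4 = skewness_of_means(4)   # skew of the distribution of means, samples of four

print(round(skew_1, 3), round(skew_4, 3))
```

With samples of four, the skew of the distribution of means is still in the same direction as the population's skew but only half as large.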
The more individuals in each sample, the closer the distribution of means will be to a normal curve. Although the distribution of means will rarely be an exactly normal curve, with samples of 30 or more individuals (even with a nonnormal population of individuals), the approximation of the distribution of means to a normal curve is very close and the percentages in the normal curve table will be extremely accurate. (That is, samples that are larger than 30 make for even slightly better approximations, but for most practical research purposes, the approximation with 30 is quite good enough.) Finally, whenever the population distribution of individuals is normal, the distribution of means will be normal, regardless of the number of individuals in each sample.
Summary of Rules and Formulas for Determining the Characteristics of a Distribution of Means
Rule 1: The mean of a distribution of means is the same as the mean of the population of individuals:
$$\mu_M = \mu$$
Rule 2a: The variance of a distribution of means is the variance of the population of individuals divided by the number of individuals in each sample:
$$\sigma_M^2 = \frac{\sigma^2}{N}$$
Rule 2b: The standard deviation of a distribution of means is the square root of the variance of the distribution of means:
$$\sigma_M = \sqrt{\sigma_M^2} = \sqrt{\frac{\sigma^2}{N}}$$
Rule 3: The shape of a distribution of means is approximately normal if either (a) each sample is of 30 or more individuals or (b) the distribution of the population of individuals is normal.
Figure 5-4 shows these three rules graphically.
These three rules are based on the central limit theorem, a fundamental principle in mathematical statistics we mentioned in Chapter 3. A notable strength of the central limit theorem is that it provides the key characteristics (central tendency, variability, and shape) of a distribution of means for a population with a distribution of any shape.
Figure 5-4 Comparing the distribution of the population of individuals (upper curve) and the distribution of means (lower curve). Labels on the distribution of means: same mean; less variance; normal if population is normal, or regardless of population shape if samples each contain 30 or more scores.
Example of Determining the Characteristics of a Distribution of Means
Think back to the example from the start of the chapter in which students rated the attractiveness of a person in a photograph. Consider the population of students' ratings of the person in the photograph (when students are told nothing about the personality characteristics of the person in the photograph). Suppose the distribution is approximately normal with a mean of 200 and a standard deviation of 48. What will be the characteristics of the distribution of means for samples of 64 students?
Rule 1: The mean of a distribution of means is the same as the mean of the population of individuals. The mean of the population is 200. Thus, the mean of the distribution of means will also be 200. That is, $\mu_M = \mu = 200$.
Rule 2a: The variance of a distribution of means is the variance of the population of individuals divided by the number of individuals in each sample. The standard deviation of the population of individuals is 48; thus, the variance of the population of individuals is $48^2$, which is 2,304. The variance of the distribution of means is therefore 2,304 divided by 64 (the size of the sample). This comes out to 36. That is, $\sigma_M^2 = \sigma^2/N = 2{,}304/64 = 36$.
Rule 2b: The standard deviation of a distribution of means is the square root of the variance of the distribution of means. The standard deviation of the distribution of means is the square root of 36, which is 6. That is, $\sigma_M = \sqrt{\sigma_M^2} = \sqrt{36} = 6$.
Rule 3: The shape of a distribution of means is approximately normal if either (a) each sample is of 30 or more individuals or (b) the distribution of the population of individuals is normal. Our situation meets both of these conditions: the sample of 64 students is more than 30, and the population of individuals follows a normal distribution. Thus, the distribution of means will follow a normal curve. (It would have been enough even if only one of the two conditions had been met.)
Figure 5-5 Three kinds of distributions: (a) the distribution of a population of individuals, (b) the distribution of a particular sample taken from that population, and (c) the distribution of means.
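The three rules applied in this example can be collected into one small helper. This is a sketch only; the function name `distribution_of_means`, its parameter names, and the dictionary it returns are illustrative choices and not notation from the chapter.

```python
from math import sqrt

def distribution_of_means(mu, sigma, n, population_is_normal=False):
    """Apply Rules 1, 2a, 2b, and 3 to a population mean, SD, and sample size."""
    variance_m = sigma ** 2 / n                  # Rule 2a: population variance divided by N
    return {
        "mean": mu,                              # Rule 1: same mean as the population
        "variance": variance_m,
        "sd": sqrt(variance_m),                  # Rule 2b: the standard error
        "approximately_normal": population_is_normal or n >= 30,  # Rule 3
    }

# Attractiveness ratings: population mean 200, SD 48, samples of 64 students.
result = distribution_of_means(mu=200, sigma=48, n=64, population_is_normal=True)
print(result)
```

Note that the helper takes the population standard deviation and squares it, mirroring the step of converting 48 to 2,304 before dividing by the sample size.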
Review of the Three Kinds of Distributions
We have considered three kinds of distributions: (1) the distribution of a population of individuals, (2) the distribution of a particular sample of individuals from that population, and (3) the distribution of means. Figure 5-5 shows these three kinds of distributions graphically and Table 5-1 describes them.
Table 5-1 Comparison of Three Types of Distributions

| | Population's Distribution | Particular Sample's Distribution | Distribution of Means |
|---|---|---|---|
| Content | Scores of all individuals in the population | Scores of the individuals in a single sample | Means of samples randomly taken from the population |
| Shape | Could be any shape; often normal | Could be any shape | Approximately normal if samples have $\geq 30$ individuals in each or if population is normal |
| Mean | $\mu$ | $M = (\sum X)/N$ | $\mu_M = \mu$ |
| Variance | $\sigma^2$ | $SD^2 = [\sum (X - M)^2]/N$ | $\sigma_M^2 = \sigma^2/N$ |
| Standard deviation | $\sigma$ | $SD = \sqrt{SD^2}$ | $\sigma_M = \sqrt{\sigma_M^2}$ |

Be sure you fully understand the different types of distribution shown in Table 5-1 before you move on to later chapters. To check your understanding, cover up portions of the table and then try to recall the hidden information.
1. What is a distribution of means?
2. Explain how you could create a distribution of means by taking a large number of samples of four individuals each.
3. (a) Why is the mean of the distribution of means the same as the mean of the
population of individuals? (b) Why is the variance of a distribution of means
smaller than the variance of the distribution of the population of individuals?
4. Write the formula for the variance of the distribution of means, and define
each of the symbols.
5. (a) What is the standard error? (b) Why does it have this name?
6. A population of individuals that follows a normal curve has a mean of 60 and
a standard deviation of 10. What are the characteristics of a distribution of means from this population for samples of four each?
Answers
1. A distribution of means is a distribution of the means of a very large number of samples of the same size randomly taken from a population of individuals.
2. Take a random sample of four from the population and figure its mean. Do this a very large number of times. Make a distribution of these means.
3. (a) With randomly taken samples, some will have higher means and some lower means than those of the population of individuals; these should balance out. (b) You are less likely to get a sample of several scores with an extreme mean than you are to get a single extreme score. This is because in any random sample it is highly unlikely to get several extremes in the same direction; extreme scores tend to be balanced out by middle scores or extremes in the opposite direction. Thus, with fewer extreme scores and more middle scores, there is less variance.
4. The formula for the variance of the distribution of means is $\sigma_M^2 = \sigma^2/N$. $\sigma_M^2$ is the variance of the distribution of means; $\sigma^2$ is the variance of the population of individuals; $N$ is the number of individuals in your sample.
5. (a) The standard error is the standard deviation of the distribution of means. (b) It has this name because it tells you about how much means of samples typically (standardly) differ from the population mean, and thus the typical amount that the means of samples are in error as estimates of the population mean.
6. The characteristics of a distribution of means of samples of four are as follows: $\mu_M = \mu = 60$; $\sigma_M^2 = \sigma^2/N = 10^2/4 = 25$; $\sigma_M = 5$; shape = normal.
Hypothesis Testing with a Distribution of Means: The Z Test
Now we are ready to turn to hypothesis testing when there is more than one individual in the study's sample. The hypothesis testing procedure you will learn is called a Z test.
The Distribution of Means as the Comparison Distribution in Hypothesis Testing
In the usual research situation, a psychologist studies a sample of more than one person. In this situation, the distribution of means is the comparison distribution. It is the distribution whose characteristics need to be determined in Step 2 of the hypothesis-testing process. The distribution of means is the distribution to which you compare your sample's mean to see how likely it is that you could have selected a sample with a mean that is extreme if the null hypothesis were true.
BOX 5-1
More About Polls: Sampling Errors and Errors in Thinking About Samples
If you think back to Box 3-3 on surveys and the Gallup poll, you will recall that we left two important questions unanswered about fine print included with the results of a poll, saying something like, "From a telephone poll of 1,000 American adults taken on June 4 and 5. Sampling error ±3%." First, you might wonder how such small numbers, like 1,000 (but rarely much less), can be used to predict the opinion of the entire U.S. population. Second, after working through the material in this chapter on the standard deviation of the distribution of means, you may wonder what the term sampling error means when a sample is not randomly sampled but rather selected by the complicated probability method used for polls.
Regarding sample size, you know from this chapter that large sample sizes, like 1,000, greatly reduce the standard deviation of the distribution of means. That is, the curve becomes very high and narrow, gathered all around the population mean. The mean of any sample of that size is very close to being the population mean.
When a sample is only a small part of a very large population, the sample's absolute size is the only determiner of accuracy. This absolute size determines the impact of the random errors of measurement and selection. What remains important is reducing bias or systematic error, which can be done only by careful planning.
As for the term sampling error, it is worked out according to past experience with the sampling procedures used. It is given in tables for different sample sizes (usually below 1,000, because that is where error increases dramatically).
So the number of people polled is not very important (provided that it is at least 1,000 or so), but what matters very much are the methods of sampling and estimating error, which will not be reported in the detail necessary to judge whether the results are reliable. The reputation of the organization doing the survey is probably the best criterion. If the sampling and error-estimating approach is not revealed at all, be cautious. For more information about how polls are conducted, go to http://media.gallup.com/PDF/FAQ/HowArePolls.pdf (note that sampling error is referred to as "margin of error" on the Web site).
Figuring the Z Score of a Sample's Mean on the Distribution of Means
There can be some confusion in figuring the location of your sample on the comparison distribution in hypothesis testing with a sample of more than one. In this situation, you are finding a Z score of your sample's mean on a distribution of means. (Before, you were finding the Z score of a single individual on a distribution of a population of single individuals.) The method of changing the sample's mean to a Z score is the same as the usual way of changing a raw score to a Z score. However, you have to be careful not to get mixed up because more than one mean is involved. It is important to remember that you are treating the sample mean like a single score. Recall that the ordinary formula (from Chapter 3) for changing a raw score to a Z score is $Z = (X - M)/SD$. In the present situation, you are actually using the following formula:
$$Z = \frac{M - \mu_M}{\sigma_M} \tag{5-4}$$
The Z score for the sample's mean on the distribution of means is the sample's mean minus the mean of the distribution of means, divided by the standard deviation of the distribution of means.
For example, suppose your sample's mean is 18 and the distribution of means has a mean of 10 and a standard deviation of 4. The Z score of this sample mean is +2. Using the formula,

$$Z = \frac{M - \mu_M}{\sigma_M} = \frac{18 - 10}{4} = \frac{8}{4} = 2$$
Figure 5-6 Z score for the mean of a particular sample on the distribution of means. (Axes: raw scores and Z scores.)
This is shown in Figure 5-6.
The hypothesis test you are learning in this chapter is called a Z test, because
you figure the Z score for your sample's mean.
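Formula (5-4) can be sketched in a few lines of Python. This is only an illustration: the function name `z_for_sample_mean` and its parameter names are assumptions made for this sketch, and the numbers are those of the example above (a sample mean of 18 on a distribution of means with a mean of 10 and a standard deviation of 4).

```python
def z_for_sample_mean(sample_mean, mean_of_means, sd_of_means):
    """Z score of a sample's mean on the distribution of means (formula 5-4)."""
    # The sample mean is treated like a single score on the comparison distribution.
    return (sample_mean - mean_of_means) / sd_of_means


z = z_for_sample_mean(sample_mean=18, mean_of_means=10, sd_of_means=4)
print(z)  # 2.0
```

Note that the divisor is the standard deviation of the distribution of means, not the standard deviation of the population of individuals.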
Example
Let's return again to our example in which a social psychologist is interested in whether being told a person has positive personality qualities increases ratings of the physical attractiveness of that person. The psychologist asks 64 randomly selected students to rate the attractiveness of a particular person in a photograph. Prior to rating the attractiveness of the person, each student is told that the person has positive personality qualities (kindness, warmth, a sense of humor, and intelligence). On a scale of 0 (the lowest possible attractiveness) to 400 (the highest possible attractiveness), the mean attractiveness rating given by the 64 students is 220. From previous research, the psychologist knows that the attractiveness ratings of the person in the photograph (when no mention is made of the person's positive personality qualities) have a mean of 200 and a standard deviation of 48, and they follow an approximately normal distribution. This distribution is shown in Figure 5-7a.⁴
Z test: hypothesis-testing procedure in which there is a single sample and the population variance is known.
Now let's carry out the Z test by following the five steps of hypothesis testing you learned in Chapter 4:
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. The two populations are these:
Population 1: Students who are told that the person has positive personality qualities.
Population 2: Students in general (who are told nothing about the person's personality qualities).
(As in Chapter 4, Population 2 is the population for the comparison distribution, which is the distribution that shows the population situation if the null hypothesis is true.)
Figure 5-7 For the fictional study of positive personality qualities and ratings of physical attractiveness, (a) the distribution of the population of individuals ($\mu = 200$, $\sigma^2 = 2{,}304$, $\sigma = 48$), (b) the distribution of means (the comparison distribution), and (c) the sample's distribution ($M = 220$, $N = 64$). The shaded area in the distribution of means is the rejection region, that is, the area in which the null hypothesis will be rejected if the study's sample mean turns out to be in that area.
The research hypothesis is that the population of students who are told that the person has positive personality qualities will on the average give higher attractiveness scores for that person than the population of students who are told nothing about the person's personality qualities: $\mu_1 > \mu_2$. The null hypothesis is that Population 1's scores will not on the average be higher than Population 2's: $\mu_1 \le \mu_2$. Note that these are directional hypotheses. The researcher wants to know if being told that the person has positive personality qualities will increase attractiveness scores; a result in the opposite direction would not be relevant to the theory the researcher is testing.
❷ Determine the characteristics of the comparison distribution. The result of the study will be a mean of a sample of 64 individuals (students in this case). Thus, the comparison distribution has to be the distribution of means of samples of 64 individuals each. This comparison distribution will have a mean of 200 (the same as the population mean). That is, as we saw earlier in the chapter, $\mu_M = \mu = 200$. Its variance will be the population variance divided by the number of individuals in the sample. The population variance, $\sigma^2$, is 2,304 (the population standard deviation of 48 squared); the sample size is 64. Thus, the variance of the distribution of means, $\sigma_M^2$, will be 2,304/64, or 36. The standard deviation of the distribution of means, $\sigma_M$, is the square root of 36, or 6. Finally, because there are more than 30 individuals in the sample, the shape of the distribution of means will be approximately normal. Figure 5-7b shows this distribution of means.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. Let's assume the researcher decides to use the standard 5% significance level. As we noted in Step ❶, the researcher is making a directional prediction. Hence, the researcher will reject the null hypothesis if the result is in the top 5% of the comparison distribution. The comparison distribution (the distribution of means) is a normal curve. Thus, the top 5% can be found from the normal curve table. It starts at a Z of +1.64. This top 5% is shown as the shaded area in Figure 5-7b.
❹ Determine your sample's score on the comparison distribution. The result of the (fictional) study is that the 64 students told that the person has positive personality qualities gave a mean attractiveness rating of 220. (This sample's distribution is shown in Figure 5-7c.) A mean of 220 is 3.33 standard deviations above the mean of the distribution of means:

$$Z = \frac{M - \mu_M}{\sigma_M} = \frac{220 - 200}{6} = \frac{20}{6} = 3.33$$
❺ Decide whether to reject the null hypothesis. We set the minimum Z score to reject the null hypothesis to +1.64. The Z score of the sample's mean is +3.33. Thus, the social psychologist can reject the null hypothesis and conclude that the research hypothesis is supported. To put this another way, the result of the Z test is statistically significant at the p < .05 level. You can see this in Figure 5-7b. Note how extreme the sample's mean is on the distribution of means (the distribution that would apply if the null hypothesis were true). The final conclusion is that, among students, being told that a person has positive personality qualities does increase the attractiveness ratings of that person. (Results of actual studies show this effect, as well as showing that if you have heard negative information about a person, you then rate them as less physically attractive; e.g., Lewandowski, Aron, & Gee, 2007.)
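The arithmetic of Steps ❷ through ❺ for this one-tailed example can be sketched in Python. The function name `z_test_one_tailed` and its parameters are assumptions made for illustration; the cutoff of 1.64 is the normal curve table value used in the text, and the sketch only covers a prediction that the sample mean will be higher than the population mean.

```python
import math


def z_test_one_tailed(sample_mean, n, pop_mean, pop_sd, cutoff_z=1.64):
    """One-tailed Z test (predicted direction: higher than the population mean)."""
    # Step 2: the variance of the distribution of means is the population
    # variance divided by N; its square root is the standard deviation.
    sd_of_means = math.sqrt(pop_sd ** 2 / n)
    # Step 4: Z score of the sample's mean on the distribution of means.
    z = (sample_mean - pop_mean) / sd_of_means
    # Step 5: reject the null hypothesis if Z is at or beyond the cutoff.
    return sd_of_means, z, z >= cutoff_z


sd_of_means, z, reject = z_test_one_tailed(sample_mean=220, n=64,
                                           pop_mean=200, pop_sd=48)
print(sd_of_means, round(z, 2), reject)  # 6.0 3.33 True
```

The population standard deviation (48) enters only through the standard deviation of the distribution of means (6); the Z score is never figured directly from 48.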
A Second Example
Suppose a researcher wants to test the effect of a communication skills seminar on students' use of verbal fillers during a presentation. Verbal fillers are words such as "um," "uh," and "you know," that people commonly use in conversations and when giving presentations. The researcher conducts a study in which 25 students attend a communication skills seminar and then give a half-hour presentation on a topic of their choice. The presentations are tape-recorded and a research assistant later counts the number of verbal fillers used by each student during his or her presentation. In this fictional example, we assume that the researcher knows from previous studies that students typically use a mean of 53 verbal fillers during a half-hour presentation of this kind, with a standard deviation of 7, and the distribution of verbal fillers follows a normal curve (see Figure 5-8a). The 25 students who take the communication skills seminar use a mean of 48 verbal fillers. The researcher wants to carry out the Z test using the 1% significance level, and an effect in either direction would be important (that is, the researcher is interested in whether the communication skills seminar could increase or decrease the use of verbal fillers).
Figure 5-8 For the fictional study of the use of verbal fillers in a presentation, (a) the distribution of the population of individuals ($\mu = 53$, $\sigma^2 = 49$), (b) the distribution of means (the comparison distribution; raw scores of 48.8, 50.2, 51.6, 53, 54.4, and 55.8 marked on the axis), and (c) the sample's distribution ($M = 48$, $N = 25$). The shaded areas in the distribution of means are the rejection regions, that is, the areas in which the null hypothesis will be rejected if the study's sample mean turns out to be in that area.
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. The two populations are:
Population 1: Students who attend a communication skills seminar.
Population 2: Students in general (who do not attend a communication skills seminar).
The research hypothesis is that the population of students who attend a communication skills seminar will use a different number of verbal fillers during a presentation than students in general: $\mu_1 \ne \mu_2$. The null hypothesis is that Population 1's scores are on the average the same as Population 2's: $\mu_1 = \mu_2$.
❷ Determine the characteristics of the comparison distribution. This comparison distribution is a distribution of means. It has a mean of 53 (the same as the population mean). Its variance is the population variance divided by 25, the number of individuals in the sample: $\sigma_M^2 = \sigma^2/N = 7^2/25 = 49/25 = 1.96$; $\sigma_M = \sqrt{1.96} = 1.40$. Its shape is normal, since the population of individual verbal filler scores is normally distributed. (Figure 5-8b shows the comparison distribution.)
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. This is a two-tailed test (the researcher is interested in an effect in either direction) at the overall 1% significance level. Based on the normal curve table for the top and bottom .5%, the cutoffs are +2.57 and -2.57 (see tiny shaded areas in Figure 5-8b).
❹ Determine your sample's score on the comparison distribution. The sample's mean was 48 (see Figure 5-8c). This comes out to a Z of -3.57 on the comparison distribution: $Z = (M - \mu_M)/\sigma_M = (48 - 53)/1.40 = -5/1.40 = -3.57$.
❺ Decide whether to reject the null hypothesis. The Z score of the sample's mean is -3.57, which is more extreme than the cutoffs of ±2.57. Thus, the researcher can reject the null hypothesis and conclude that the research hypothesis is supported. To put this another way, the result of the Z test is statistically significant at the p < .01 level. You can see this in Figure 5-8b. Note how extreme the sample's mean is on the distribution of means (the distribution that would apply if the null hypothesis were true). The final conclusion is that students' use of verbal fillers during a presentation decreases after attending a communication skills seminar.
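For a two-tailed test like this one, the cutoffs can also be figured from the normal curve directly instead of being looked up in a table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the function name `two_tailed_z_test` and its parameters are illustrative assumptions. The computed cutoff is about 2.58, whereas the chapter uses the table value of 2.57; the decision is the same either way.

```python
import math
from statistics import NormalDist


def two_tailed_z_test(sample_mean, n, pop_mean, pop_sd, alpha):
    """Two-tailed Z test with a known population mean and standard deviation."""
    # Standard deviation of the distribution of means.
    sd_of_means = math.sqrt(pop_sd ** 2 / n)
    # Z score of the sample's mean on the distribution of means.
    z = (sample_mean - pop_mean) / sd_of_means
    # Split alpha across both tails: the positive cutoff leaves alpha/2 above it.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject if the sample's Z is more extreme than the cutoff in either direction.
    return z, cutoff, abs(z) > cutoff


z, cutoff, reject = two_tailed_z_test(sample_mean=48, n=25,
                                      pop_mean=53, pop_sd=7, alpha=0.01)
print(round(z, 2), round(cutoff, 2), reject)  # -3.57 2.58 True
```

Comparing the absolute value of Z with a single positive cutoff is just a compact way of checking both rejection regions at once.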
When you next give a presentation, ask a friend to count the number of times you use verbal fillers (such as "um," "uh," and "you know"), or tape-record the presentation and count your own verbal fillers. Verbal fillers can be distracting to listeners and may adversely affect the quality of a presentation. Communications specialists recommend replacing verbal fillers with brief pauses (that allow you to gather your thoughts).
How are you doing?
1. How is hypothesis testing with a sample of more than one person different
from hypothesis testing with a sample of a single person?
2. How do you find the Z score for the sample's mean on the distribution of
means?
3. A researcher predicts that showing a certain film will change people's attitudes toward alcohol. The researcher then randomly selects 36 people, shows them the film, and gives them an attitude questionnaire. The mean score on the attitude test for these 36 people is 70. The score for people in general on this test is 75, with a standard deviation of 12. Using the five steps of hypothesis testing and the 5% significance level, carry out a Z test to see if viewing the film changes people's attitudes toward alcohol.
Answers
1. In hypothesis testing with a sample of more than one person, the comparison distribution is a distribution of means.
2. You use the usual formula for changing a raw score to a Z score, but using the mean and standard deviation of the distribution of means. The formula is $Z = (M - \mu_M)/\sigma_M$.
3. ❶ Restate the question as a research hypothesis and a null hypothesis about the populations. The two populations are:
Population 1: People shown the film.
Population 2: People in general (who are not shown the film).
The research hypothesis is that the mean attitude of the population shown the film is different from the mean attitude of the population of people in general: $\mu_1 \ne \mu_2$. The null hypothesis is that the populations have the same mean attitude score: $\mu_1 = \mu_2$.
❷ Determine the characteristics of the comparison distribution. $\mu_M = \mu = 75$; $\sigma_M^2 = \sigma^2/N = 12^2/36 = 4$; $\sigma_M = 2$; shape is normal.
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. Two-tailed cutoffs, 5% significance level, are +1.96 and -1.96.
❹ Determine your sample's score on the comparison distribution. $Z = (M - \mu_M)/\sigma_M = (70 - 75)/2 = -2.50$.
❺ Decide whether to reject the null hypothesis. The Z score of the sample's mean is -2.50, which is more extreme than -1.96; reject the null hypothesis. Seeing the film does change attitudes.
Controversy: Marginal Significance
A long-standing controversy regarding significance testing is what to do when a result does not make the cutoff value at the usual 5% level but comes very close (say, p < .10). This is often called "marginal significance," "approaching significance," or a "near significant trend." A couple of years ago, the controversy was spotlighted on an email listserv for social and personality psychologists. The discussion began when the following note was posted by Todd Nelson (California State University at Stanislaus):
Throughout my ph.d. training ... it was common parlance to refer to ... P-values of between .05 and .10 as "marginally significant." It was a very common term in all the major social psychological journals ... and among my professors.
The other day, in a thesis defense I was chairing, I was dumfounded when the other thesis committee members strongly objected to the term "marginally significant" in the student's results section ... both saying they had NEVER heard of the term (!).
Wanting to make sure I was still on planet Earth, I consulted several statistics and research textbooks in my office, and found a few that referred to "marginal significance" and several articles by noted statisticians who make the case for discussing results in the .05-.10 range ...
[H]ave you heard of this term? Do you use it and teach it? If so, why? If not, what is your objection?
Almost immediately there were more than 100 responses! First, it quickly became clear that calling results that are close "marginally significant" is indeed a common practice in many areas of psychology. As Frank LoSchiavo (Ohio University) put it, "It sounds like it is the other committee members who are not on Earth."
But it also became clear that while it may be fairly common, many think it is a bad idea. Charles Stangor (University of Maryland) called it a "completely bogus concept used to make us poor scientists feel better when our results are close but no cigar." Tricia Yurak (Rowan University) called it "a fudge term that I won't use" adding that "even if I get a p value of .07, I will report it as not significant." Richard St. Jean (University of Prince Edward Island) recalls "one of my stats professors said calling a finding marginally significant is like calling a woman marginally pregnant! The principle is that it is an all or none decision ..." David Washburn (Georgia State University) explains the logic: "One decides in advance or by convention to call p < .05 effects 'significant' ... [T]erms like 'marginally significant' are counterfactuals, like saying you 'almost did' something that you didn't do." As several of the email posts noted, the argument was spelled out in some detail in an article by Chet Insko (2003; University of North Carolina):
... null-hypothesis testing depends on traditional two-valued logic. Thus, one either rejects or fails to reject the null hypothesis, and rejection of the null hypothesis allows for acceptance of the logical contradictory of the null hypothesis, the research hypothesis. The crucial point here is that deductive logic is two valued; for example, Socrates is or is not mortal.... A logician would not, for example, conclude that Socrates is marginally mortal. Since null-hypothesis testing depends on logic, only two-valued distinctions can be made, and this, of course, requires a single cut point to differentiate significant test statistics from nonsignificant test statistics. (p. 1331)
However, the majority opinion among those responding was expressed by Warren Thorngate (Carleton University): " ... people who adhere to the '0.05 or nothing!' philosophy either need to be reeducated or enter therapy." Phoebe Ellsworth (University of Michigan) added that "to act as though there is a gulf between .05 and .06 is not maintaining high standards; it is idiocy." The point here is that .05 is an arbitrary convention. Indeed, many quoted a comment in an influential article by Rosnow and Rosenthal (1989) "... surely God loves the .06 nearly as much as the .05" (p. 1277). In addition, many noted that it is more important to emphasize the size of the effect and the power of the study, issues we consider in the next chapter. Also, several mentioned that such "near significant results" may be appropriate particularly when there are related results that are clearly significant (for example, if a study of the effects on stress is significant when using a questionnaire measure and is also near significant when using a physiological measure). Finally, quite a few people emphasized that the acceptability of reporting such results varies considerably among different specialty areas of psychology.
Hypothesis Tests About Means of Samples (Z Tests) and Standard Errors in Research Articles
As we have noted several times, research in which there is a known population mean and standard deviation is quite rare in psychology. Thus, you will not often see a Z test in a research article. We have asked you to learn about this situation mainly as a building block for understanding hypothesis testing in more common research situations. Still, Z tests do show up now and then.
Here is an example. As part of a larger study, Wiseman (1997) gave a loneliness test to a group of college students in Israel. As a first step in examining the results, Wiseman checked that the average score on the loneliness test was not significantly different from a known population distribution based on a large U.S. study of university students that had been conducted earlier by Russell and colleagues (1980). Wiseman reported:
. . . [T]he mean loneliness scores of the current Israeli sample were similar to those of Russell et al.'s (1980) university sample for both males (Israeli: M = 38.74, SD = 9.30; Russell: M = 37.06, SD = 10.91; z = 1.09, NS) and females (Israeli: M = 36.39, SD = 8.87; Russell: M = 36.06, SD = 10.11; z = .25, NS). (p. 291)
In this example, the researcher gives the standard deviation for both the sample studied (the Israeli group) and the population (the data from Russell). However, in the steps of figuring each Z (the sample's score on the distribution of means), the researcher would have used the standard deviation only of the population. Notice also that the researcher took the nonsignificance of the difference as support for the sample means being "similar" to the population means. However, the researcher was very careful not to claim that these results showed there was "no difference."
Of the topics we have covered in this chapter, the one you are most likely to see in a research article is the standard deviation of the distribution of means, used to describe the amount of variation that might be expected among means of samples of a given size from this population. In this context, it is usually called the standard error (SE) or standard error of the mean (SEM). Standard errors are typically shown in research articles as the lines that go above (and sometimes also below) the tops of the bars in a bar graph; these lines are called error bars. For example, Stankiewicz and colleagues (2006) examined how limitations in human perception and memory (and other factors) affect people's ability to find their way in indoor spaces. In one of their experiments, eight students used a computer keyboard to move through a virtual indoor space of corridors and hallways shown on a computer monitor. The researchers calculated how efficiently students moved through the space, with efficiency ranging from 0 (extremely inefficient) to 1 (extremely efficient). The researchers compared the efficiency of moving through the space when students had a limited view of the space versus when they had a clear (or unlimited) view of the space. Their results, shown in Figure 5-9, include error bars.
Figure 5-9 The mean navigation efficiency when navigating in the unlimited and limited viewing condition in Experiment 2. In the limited-view condition, visual information was available as far as the next intersection (further details were obscured by "fog"). In the unlimited-view condition, visual information was available to the end of the corridor. Error bars represent 1 standard error of the mean. (Bar graph; vertical axis from 0.0 to 1.0; horizontal axis, Viewing Condition: Limited View and Unlimited View.)
Source: Stankiewicz, B. J., Legge, G. E., Mansfield, J. S., & Schlicht, E. J. (2006). Lost in virtual space: Studies in human and ideal spatial navigation. Journal of Experimental Psychology: Human Perception and Performance, 32, 688-704. Copyright © 2006 by the American Psychological Association.
Error bars on graphs are common in psychology research articles, particularly in the more experimental areas such as perception and cognitive neuroscience.
Advanced Topic: Estimation, Standard Errors, and Confidence Intervals
Hypothesis testing is our main focus in this book. However, there is another kind of statistical question related to the distribution of means that is also important in psychology: estimating the population mean based on the scores in a sample. Traditionally, this has been very important in survey research. In recent years it is also becoming important in experimental research (e.g., Wilkinson and Task Force on Statistical Inference, 1999) and can even serve as an alternative approach to hypothesis testing.
Estimating the Population Mean When It Is Unknown
When the population mean is unknown, the best estimate of the population mean is the sample mean. In the study of students who were told about a person's positive personality qualities, the mean attractiveness rating given to that person by the sample of 64 students was 220. Thus, 220 is the best estimate of the mean attractiveness rating that would be given by the unknown population of students who would ever be told about a person's positive personality qualities.
How accurate is the sample mean as an estimate of the population mean? A way to get at this question is to ask, "How much do means of samples from a population vary?" Fortunately we have already thought about this question when considering the distribution of means. The variation in means of samples from a population is the variation in the distribution of means. The standard deviation of this distribution of means, the standard error of the mean, is thus a measure of how much the means of samples vary from the overall population mean. (As we noted earlier, because researchers are often interested in using a mean of a sample to estimate the population mean, this variation in the distribution of means is thought of as "error," and we give the name "standard error of the mean" to the standard deviation of a distribution of means.)
In our example, the accuracy of our estimate of 220 for the mean of the population of students who are told about the person's positive personality qualities is the standard error, which we figured earlier to be 6.
Range of Possible Means Likely to Include the Population Mean
You can also estimate the range of possible means that are likely to include the population mean. Consider our estimate of 220 with a standard error of 6. Now follow this closely: suppose you took a mean from our distribution of means; it is 34% likely you would get a mean between 220 (the mean of the distribution of means) and 226 (one standard error above 220). This is because the distribution of means is a normal curve. Thus, the standard error is 1 standard deviation on that curve, and 34% of a normal curve is between the mean and 1 standard deviation above the mean. From this reasoning, we could also figure that another 34% should be between 220 and 214 (1 standard error below 220). Putting this together, we have a region from 214 to 226 that we are 68% confident should include the population mean if our sample was randomly taken from this population. (See Figure 5-10a.)
Figure 5-10 A distribution of means and the (a) 68%, (b) 95%, and (c) 99% confidence intervals for students rating the physical attractiveness of a person after being told that the person has positive personality qualities (fictional data). (Raw scores marked on the axis: 204.58, 208.24, 214, 220, 226, 231.76, and 235.42; Z scores from -2 to +2.)
This is an example of a confidence interval (usually abbreviated CI). We would call it the "68% confidence interval." The upper and lower ends of a confidence interval are called confidence limits. In this example, the confidence limits for the 68% confidence interval are 214 and 226 (see Figure 5-10a).
Let's review the logic: based on our knowledge of a sample's mean, we are trying to estimate the mean of the population that sample came from. Our best estimate of the population mean has to be our sample mean. What we don't know is how good an estimate it is. If sample means from that population could vary a lot, then we cannot be very confident that our estimate is close to the true population mean. But if the sample means are likely all to be very close to the true population mean, we can assume our estimate is pretty close. To get a sense of how accurate our estimate is, we can use our knowledge of the normal curve to estimate the range of possible means that are likely to include the population mean. This estimate of the range of means is called a confidence interval.
confidence interval (CI): roughly speaking, the range of scores (that is, the scores between an upper and lower value) that is likely to include the true population mean; more precisely, the range of possible population means from which it is not highly unlikely that you could have obtained your sample mean.
confidence limit: upper or lower value of a confidence interval.
The 95% and 99% Confidence Intervals
Normally, you would want to be more than 68% confident about your estimates. Thus, when figuring confidence intervals, psychologists use 95% or even 99% confidence intervals. These are figured based on the distribution of means for the area that includes the middle 95% or middle 99%. For the 95% confidence interval, you want the area in a normal curve on each side between the mean and the Z score that includes 47.5% (47.5% plus 47.5% adds up to 95%). The normal curve table shows this to be 1.96. Thus, in terms of Z scores, the 95% confidence interval is from -1.96 to +1.96 on the distribution of means. Changing these Z scores to raw scores for the attractiveness ratings example gives an interval of 208.24 to 231.76 (see Figure 5-10b). That is, for the lower confidence limit, (-1.96)(6) + 220 = -11.76 + 220 = 208.24; for the upper confidence limit, (1.96)(6) + 220 = 11.76 + 220 = 231.76. In sum, based on the sample of 64 students who were told about the person's positive personality qualities, you can be 95% confident that the true population mean for such students is between 208.24 and 231.76 (see Figure 5-10b).
For a 99% confidence interval, you use the Z scores for the middle 99% of the normal curve (the part that includes 49.5% above and below the mean). This comes out to ±2.57. Changing this to raw scores, the 99% confidence interval is from
204.58 to 235.42 (see Figure 5-10c).
Notice in Figure 5-10 that the greater the confidence is, the broader is the confidence interval. In our example, you could be 68% confident that the true population mean is between 214 and 226; but you could be 95% confident that it is between 208.24 and 231.76 and 99% confident it is between 204.58 and 235.42. This is a general principle. It makes sense that you need a wider range of possibility to be more sure you are right.
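The widening of the interval as confidence increases can be seen in a short Python sketch for the attractiveness example (sample mean 220, standard error 6). The function name `confidence_interval` is an assumption for illustration, and the Z values here are computed with `NormalDist` from the standard `statistics` module rather than read from the normal curve table. Because the chapter's figures rest on rounded values (34% on each side for 68%, and 2.57 for 99%), the 68% and 99% limits from this sketch differ from the text by a few hundredths, while the 95% limits match.

```python
from statistics import NormalDist


def confidence_interval(sample_mean, standard_error, confidence):
    """Lower and upper confidence limits around a sample mean."""
    # Z score that leaves the middle `confidence` proportion of the normal curve.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


for level in (0.68, 0.95, 0.99):
    lower, upper = confidence_interval(220, 6, level)
    print(f"{level:.0%} confidence interval: {lower:.2f} to {upper:.2f}")
```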
Steps for Figuring Confidence Limits
There are two steps for figuring confidence limits. These steps assume that the distribution of means is approximately a normal distribution.
❶ Figure the standard error. That is, find the standard deviation of the distribution of means in the usual way:

$$\sigma_M = \sqrt{\sigma_M^2} = \sqrt{\frac{\sigma^2}{N}}$$

❷ For the 95% confidence interval, figure the raw scores for 1.96 standard errors above and below the sample mean; for the 99% confidence interval, figure the raw scores for 2.57 standard errors above and below the sample mean. To figure these raw scores, first multiply 1.96 or 2.57 by the standard error, then add this to the mean for the upper limit and subtract this from the mean for the lower limit.
In terms of the overall figuring, once you know the standard error, the upper limit of the 95% confidence interval is equal to the sample mean plus 1.96 multiplied by the standard error of the mean: upper limit = $M + (1.96)(\sigma_M)$; the lower limit is the sample mean minus 1.96 multiplied by the standard error: lower limit = $M - (1.96)(\sigma_M)$. For the 99% CI, the computation for the upper limit is $M + (2.57)(\sigma_M)$; the lower 99% CI limit is $M - (2.57)(\sigma_M)$.
95% confidence interval: confidence interval in which, roughly speaking, there is a 95% chance that the population mean falls within this interval.
99% confidence interval: confidence interval in which, roughly speaking, there is a 99% chance that the population mean falls within this interval.
Example. Let's find the 99% confidence interval for the verbal fillers example from earlier in the chapter. Recall that, in that example, the number of verbal fillers used by students in the general population (that is, students who had not attended a communication skills seminar) was normally distributed with a mean of 53 and a standard deviation of 7. The 25 students who attended a communication skills seminar used a mean of 48 fillers.
Hypothesis Tests with Means of Samples
159
Note that you are figuring the confidence interval based on the mean of your sample, not based on the mean of the known population. So, in the current example you are figuring the confidence interval around a mean of 48, which was the mean number of fillers used by the 25 students who attended the seminar. The mean number of fillers used by the population of students in general is known (53); so there is no need to figure any kind of confidence interval based on that mean.
❶ Figure the standard error. $\sigma_M = \sqrt{\sigma^2 / N} = \sqrt{7^2 / 25} = \sqrt{1.96} = 1.40$.
❷ For the 95% confidence interval, figure the raw scores for 1.96 standard errors above and below the sample mean; for the 99% confidence interval, figure the raw scores for 2.57 standard errors above and below the sample mean. You want the 99% confidence interval. Thus, first multiply 2.57 by 1.40 to get 3.60, which is how far the confidence limit is from the mean. For the upper confidence limit, add this distance to the sample mean: 48 + 3.60 = 51.60. In terms of the overall calculations, upper limit $= M + (2.57)(\sigma_M) = 48 + (2.57)(1.40) = 51.60$. For the lower confidence limit, subtract 3.60 (the result of multiplying 2.57 by 1.40) from 48, the mean of the sample, which gives 44.40. In terms of the overall calculations, lower limit $= M - (2.57)(\sigma_M) = 48 - (2.57)(1.40) = 44.40$.
Thus, based on this sample of 25 students, you can be 99% confident that an interval from 44.40 to 51.60 includes the true population mean.
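The two steps of this example can also be sketched in a few lines of Python. This is only an illustration of the figuring above: the function name `confidence_interval` and its parameter names are made up for this sketch, and the Z value of 2.57 is the one used in the text.

```python
import math

def confidence_interval(sample_mean, population_sd, n, z):
    """Return (lower, upper) confidence limits around a sample mean.

    Step 1: standard error = sqrt(population variance / N).
    Step 2: limits = sample mean -/+ z standard errors.
    """
    standard_error = math.sqrt(population_sd ** 2 / n)
    distance = z * standard_error
    return sample_mean - distance, sample_mean + distance

# Verbal fillers example: M = 48, sigma = 7, N = 25, 99% interval (Z = 2.57)
lower, upper = confidence_interval(48, 7, 25, 2.57)
print(f"99% confidence interval: {lower:.2f} to {upper:.2f}")
```

Run as written, this prints an interval of 44.40 to 51.60, matching the hand figuring. Passing 1.96 instead of 2.57 gives the narrower 95% interval.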
The Subtle Logic of Confidence Intervals
If you understand the preceding explanation, it will be sufficient for practical purposes in working with confidence intervals. Basically, confidence intervals tell you the range of means that you can be pretty sure include the true population mean. However, if you want to think deeply about the situation, there is a subtle issue about precisely what these numbers mean. (What follows is kind of an "advanced, advanced" topic
section!)
Strictly speaking, consider what we are figuring to be, say, a 95% confidence interval. We are figuring it as the range of means that are 95% likely to come from a population with a true mean that happens to be the mean of our sample. However, what we really want to know is the range that is 95% likely to include the true population mean. This we cannot know. That is, we are figuring one thing and really want to know another. Read this paragraph again and think about it. It is easy to miss this subtle but logically significant twist.
The way this awkward situation has been dealt with traditionally by most researchers is just to ignore this subtlety. Researchers who are more sophisticated statistically are comfortable with the situation by emphasizing that what we actually say is that we are 95% confident that the true mean is in this range. That is, by using this language, we are acknowledging that what we are doing is really a bit backward, but it is the best we can do! "Confidence" is closer to the subjective interpretation of probability we discussed in Chapter 3. In fact, "confidence" is meant to be a slightly vaguer term than probability. Nevertheless, it is not completely vague. What we are doing does have a solid basis.
Here is how to understand this: suppose in our attractiveness ratings example, the true mean was indeed 220. As we have seen, this would give a 95% confidence interval of 208.24 to 231.76, as shown in Figure 5-11a. We don't know what the true population mean is. But suppose the true population mean was 208.24. The range that includes 95% of sample means from this population's distribution of means would be from 196.48 to 220.00, as shown in Figure 5-11b. (You can work this out yourself following the two steps for figuring confidence limits.) Similarly, for the population with a mean of 231.76, the range that includes 95% of sample means from this population's distribution of means is from 220.00 to 243.52, as shown in Figure 5-11c.
Figure 5-11 (a) 95% confidence interval (208.24 to 231.76) based on sample mean of 220 for students rating the physical attractiveness of a person after being told that the person has positive personality qualities (fictional data); (b) range including 95% of sample means (196.48 to 220), based on distribution of means shown above it with $\mu_M$ = lower limit of the 95% confidence interval for M = 220; (c) range including 95% of sample means (220 to 243.52), based on distribution of means above it with $\mu_M$ = upper limit of the 95% confidence interval for M = 220.
What this shows is a general principle: for a 95% confidence interval, the lower confidence limit is the lowest possible population mean that would have a 95% probability of including our sample mean; the upper confidence limit is the highest possible population mean that would have a 95% probability of including our sample mean.
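The ranges in the attractiveness example can be checked with a short Python sketch. The standard error of 6 used here is an assumption: it is inferred from the interval in the text (231.76 - 220 = 11.76 = 1.96 × 6) rather than stated in this passage, and the helper name `range_95` is invented for the illustration.

```python
Z_95 = 1.96
STANDARD_ERROR = 6.0  # assumption: inferred from 231.76 - 220 = 11.76 = 1.96 * 6

def range_95(center):
    """Range covering the middle 95% of a distribution of means centered at `center`."""
    half_width = Z_95 * STANDARD_ERROR
    return center - half_width, center + half_width

sample_mean = 220
lower_limit, upper_limit = range_95(sample_mean)  # the 95% confidence limits

# A population whose mean sits at either confidence limit has a 95% range
# that just reaches the sample mean of 220.
for label, center in [("sample mean", sample_mean),
                      ("lower limit", lower_limit),
                      ("upper limit", upper_limit)]:
    low, high = range_95(center)
    print(f"{label} {center:.2f}: {low:.2f} to {high:.2f}")
```

The printed ranges are 208.24 to 231.76, 196.48 to 220.00, and 220.00 to 243.52, the same three ranges described for Figure 5-11.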
This convoluted logic is a bit like the double-negative logic behind hypothesis testing. This is no accident, since both are making inferences from samples to populations using the same information.
Confidence Intervals and Hypothesis Testing
A practical implication of the link of confidence intervals and hypothesis testing is that you can use confidence intervals to do hypothesis testing! If the confidence interval does not include the mean of the null hypothesis distribution, then the result is significant. For example, in the attractiveness ratings study, the 95% confidence interval for those who were told that the person has positive personality qualities was from 208.24 to 231.76. However, the population that was told nothing about the person's personality qualities had a mean of 200. This population mean is outside the range of the confidence interval. Thus, if you are 95% confident that the true range is 208.24 to 231.76 and the population mean for those who were told nothing about the person's personality qualities is not in this range, you are 95% confident that that population is not the same as the one your sample came from.
Another way to understand this is in terms of the idea that the confidence limits are the points at which a more extreme true population would not include your sample mean 95% of the time. The population mean for those who were told nothing about the person's personality qualities was 200. If this were the true mean also for the group that was told about the person's positive personality qualities, 95% of the time it would not produce a sample mean as high as the one we got.
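The decision rule described in this section (reject the null hypothesis when the known population mean falls outside the confidence interval) is simple enough to write as a one-line check. The function name `rejects_null` is hypothetical; the numbers are the ones given in the text.

```python
def rejects_null(null_mean, ci_lower, ci_upper):
    """True when the null-hypothesis mean lies outside the confidence interval."""
    return not (ci_lower <= null_mean <= ci_upper)

# Attractiveness ratings: 95% CI is 208.24 to 231.76, known population mean is 200
print(rejects_null(200, 208.24, 231.76))  # True: 200 is outside the interval
```

A 95% interval used this way corresponds to a two-tailed test at the .05 level, and a 99% interval to a two-tailed test at the .01 level.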
How are you doing?
1. (a) What is the best estimate of a population mean? (b) Why?
2. (a) What number is used to indicate the accuracy of an estimate of the population mean? (b) Why?
3. What is a 95% confidence interval?
4. A researcher predicts that showing a certain film will change people's attitudes toward alcohol. The researcher then randomly selects 36 people, shows them the film, and gives them an attitude questionnaire. The mean score on the attitude test for these 36 people is 70. The score on this test for people in the general population (who do not see the film) is 75, with a standard deviation of 12. (a) Find the best estimate of the mean of people who see the film and (b) its 95% confidence interval. (c) Compare this result to the conclusion you drew for this same situation when you used this example in the How are you doing? section for hypothesis testing with a distribution of means.
5. (a) Why is it wrong to say that the 95% confidence interval is the region in which there is a 95% probability of finding the true population mean? (b) What is the basis for our 95% confidence?
Answers
1. (a) The best estimate of a population mean is the sample mean. (b) It is more likely to have come from a population with the same mean than from any other population.
2. (a) The standard error (or standard deviation of the distribution of means) is used to indicate the accuracy of an estimate of the population mean. (b) The standard error (or standard deviation of the distribution of means) is roughly the average amount that means vary from the mean of the distribution of means.
3. The range of values that you are 95% confident includes the population mean, estimated based on the scores in a sample.
4. (a) The best estimate is the sample mean: 70. (b) Standard error ($\sigma_M$) is $\sqrt{\sigma^2 / N} = \sqrt{12^2 / 36} = 2$. The lower confidence limit $= M - (1.96)(\sigma_M) = 70 - (1.96)(2) = 70 - 3.92 = 66.08$; the upper confidence limit $= M + (1.96)(\sigma_M) = 70 + 3.92 = 73.92$. The 95% confidence interval is from 66.08 to 73.92. (c) The 95% confidence interval does not include the mean of the general population (which was 75). Thus, you can reject the null hypothesis that the two populations are the same. This is the same conclusion as when using this example for hypothesis testing.
5. (a) It is wrong to say that the 95% confidence interval is the region in which there is a 95% probability of finding the true population mean because you do not know the true population mean; so you have no way of knowing for sure what to start with when figuring 95% probability. (b) The lower confidence limit is the point at which a true population any lower would not have a 95% probability of including a sample with our mean; similarly, the upper confidence limit is the point at which a true population any higher would not have a 95% probability of including a sample with our mean.
Advanced Topic Controversy: Confidence Intervals versus Significance Tests
You may recall from Chapter 4 that for a number of years there has been a lively debate among psychologists about significance testing. Among the major issues in that debate is a proposal that psychologists should use confidence intervals instead of significance tests.
Those who favor replacing significance tests with confidence intervals (e.g., Cohen, 1994; Hunter, 1997; Schmidt, 1996) cite several major advantages. First, as we noted above, confidence intervals contain all the key information in a significance test,⁵ but also give additional information: the estimation of the range of values that you can be quite confident include the true population mean. A second advantage is that they focus attention on the estimation of effects instead of on hypothesis testing. Some researchers argue that the goal of science is to provide numeric estimates of effects (and the accuracy of those estimates), not just decisions as to whether an effect is different from zero.
Confidence intervals are particularly valuable when the results are not significant (Frick, 1995). This is because knowing the confidence interval gives you an idea of just how far from no effect you can be confident that the true mean is to be found. If the results are not significant and the entire confidence interval is near to no effect, you can feel confident that, even if there is some true effect, it is probably small. However, if the results are not significant and the confidence interval, while including no effect, also spreads out to include means far from no effect, it would tell us that the study is really very inconclusive: it is possible that there is little or no effect, but it is also possible that there is a substantial effect.
A third advantage claimed by proponents of confidence intervals over significance testing is that researchers are less likely to misuse them. As we noted in Chapter 4, a common error in the use of significance tests is to conclude that a nonsignificant result means there is no effect. With confidence intervals, it is harder to fall into this kind of error.
In light of these various advantages, the most recent Publication Manual of the American Psychological Association (2001) takes the position that "The use of confidence intervals is . . . strongly recommended" (p. 22).
However, it is still relatively uncommon to find confidence intervals in many types of psychology research articles. In part, this is probably due to tradition and to most psychologists having been trained with significance tests and having become used to them.
Other researchers (e.g., Abelson, 1997; Harris, 1997; Nickerson, 2000) emphasize two reasons for not abandoning significance testing in favor of confidence intervals. First, for some advanced statistical procedures, it is possible to do significance testing but not to figure confidence intervals. Second, just as it is possible to make mistakes with significance tests, it is also possible to make other kinds of mistakes with confidence intervals, especially since most research psychologists are less experienced in using them.
Whatever the outcome of this controversy about confidence intervals, it is valuable to understand them, since you will run into them occasionally when reading research literature, and you are likely to see them more often in the future. On the other hand, they now appear only occasionally, and there is no sign that they are likely to replace significance testing any time soon. For this reason (and to keep the amount of material to be learned manageable), we have made confidence intervals an advanced topic and decided not to emphasize them in subsequent chapters of this book, which are mainly on significance testing in various types of research situations.
Advanced Topic: Confidence Intervals in Research Articles
As we noted, confidence intervals (usually abbreviated as CI), while far from standard, are sometimes reported in research articles. For example, consider a study by Christakis and Fowler (2007). They studied more than 12,000 people over a 32-year period to examine whether people's chances of becoming obese are related to whether they have friends and family who become obese. They reported that "A person's chance of becoming obese increased by 57% (95% confidence interval [CI], 6 to 123) if he or she had a friend who became obese in a given interval" (p. 370). This means that we can be 95% confident that the true increase in obesity risk was between 6% and 123%. As another example, an organizational psychologist might explain that the average number of overtime hours per week worked in a particular industry is 3.7 with a 95% confidence interval of 2.5 to 4.9. This would tell you that the true average number of overtime hours is probably somewhere between 2.5 and 4.9.
A shortcut that many researchers find helpful in reading research articles that give standard errors but not confidence intervals is that the 95% confidence interval is approximately 2 standard errors in both directions (it is exactly 1.96 SEs) and the 99% confidence interval is approximately 2.5 standard errors in both directions (it is exactly 2.57 SEs).
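The multipliers in this shortcut come from the normal curve, and they can be looked up with Python's standard library instead of a table. The sketch below uses `statistics.NormalDist` (available in Python 3.8 and later); the helper name `two_tailed_z` is made up for this illustration.

```python
from statistics import NormalDist

def two_tailed_z(confidence):
    """Z score that leaves (1 - confidence) / 2 in each tail of the normal curve."""
    tail_area = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail_area)

for confidence in (0.95, 0.99):
    print(f"{confidence:.0%} interval: +/- {two_tailed_z(confidence):.3f} standard errors")
```

To three decimals the values are 1.960 and 2.576, so the 1.96 and 2.57 used in this chapter are themselves rounded figures, and the "2 and 2.5 standard errors" shortcut is a further rounding of those.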
Summary
1. When studying a sample of more than one individual, the comparison distribution in the hypothesis-testing process is a distribution of means. It can be thought of as what would result from (a) taking a very large number of samples, each of the same number of scores taken randomly from the population of individuals, and then (b) making a distribution of the means of these samples.
2. The distribution of means has the same mean as the corresponding population of individuals. However, it has a smaller variance because the means of samples are less likely to be extreme than individual scores. (In any one sample, extreme scores are likely to be balanced by middle scores or extreme scores in the other direction.) Specifically, the variance of the distribution of means is the variance of the population of individuals divided by the number of individuals in each sample. Its standard deviation is the square root of its variance. The shape of the distribution of means approximates a normal curve if either (a) the samples are each of 30 or more scores or (b) the population of individuals follows a normal curve.
3. Hypothesis tests with a single sample of more than one individual and a known population are called Z tests and are done the same way as the hypothesis tests of Chapter 4 (where the studies were of a single individual compared to a population of individuals). The main exception is that the comparison distribution in a hypothesis test with a single sample of more than one individual and a known population is a distribution of means.
4. There is some controversy about the use of terms such as marginal significance, approaching significance, and near significant trend to describe results that come close to the significance cutoff value. Critics of these terms note that hypothesis testing is an all or nothing decision. However, other researchers advocate for greater flexibility and point out that the .05 and .01 significance levels are arbitrary conventions.
5. The kind of hypothesis testing described in this chapter (the Z test) is rarely used in research practice; you have learned it as a stepping-stone. The standard deviation of the distribution of means (the standard error) is commonly used to describe the expected variability of means, particularly in bar graphs in which the standard error may be shown as the length of a line above (and sometimes below) the top of each bar.
6. ADVANCED TOPIC: The sample mean is the best estimate for the population mean when the population mean is unknown. The accuracy of the estimate is the standard deviation of the distribution of means (also known as the standard error), which tells you roughly the amount by which means vary. Based on the distribution of means, you can figure the range of possible means that are likely to include the population mean. If we assume the distribution of means follows a normal curve, the 95% confidence interval includes the range from 1.96 standard deviations below the sample mean (the lower confidence limit) to 1.96 standard deviations above the sample mean (the upper confidence limit). Strictly speaking, the 95% confidence interval around a sample mean is the range in which the lower limit is the mean of the lowest population that would have a 95% probability of including a sample with this sample mean, and the upper limit is the corresponding mean of the highest population. The 99% confidence interval includes the range from 2.57 standard deviations below the sample mean (the lower confidence limit) to 2.57 standard deviations above the sample mean (the upper confidence limit).
7. ADVANCED TOPIC: An aspect of the ongoing controversy about significance tests is whether researchers should replace them with confidence intervals. Proponents of confidence intervals argue that they provide additional information, put the focus on estimation, and reduce misuses common with significance tests. Confidence intervals have become more common in recent years in psychology research articles, but they are still relatively unusual, in part due to tradition and unfamiliarity with them. In addition, opponents of relying exclusively on confidence intervals argue that they cannot be used in some advanced procedures, estimation is not always the goal, and they can have misuses of their own. When confidence intervals are reported in research articles, it is usually with the abbreviation CI.
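The characteristics of the distribution of means listed in summary points 1 and 2 can be checked with a small simulation. This is a sketch under stated assumptions: the population (normal, mean 100, standard deviation 15), the sample size of 25, and the number of samples are all made-up values chosen for illustration, not figures from the chapter.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same result each run

POP_MEAN, POP_SD = 100, 15  # hypothetical population, not from the text
N = 25                      # number of scores in each sample
NUM_SAMPLES = 20_000        # "a very large number of samples"

# (a) take many random samples of N scores; (b) collect the mean of each one
sample_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print(f"mean of means:     {mean_of_means:.2f} (population mean = {POP_MEAN})")
print(f"variance of means: {variance_of_means:.2f} (population variance / N = {POP_SD ** 2 / N:.2f})")
```

The simulated mean of means should land close to 100 and the simulated variance close to 225 / 25 = 9, which is what the rules for the distribution of means predict.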
Key Terms
distribution of means (p. 138)
mean of a distribution of means (p. 140)
variance of a distribution of means (p. 141)
standard deviation of a distribution of means (p. 142)
$\sigma_M$ (p. 142)
standard error of the mean (SEM) (p. 142)
standard error (SE) (p. 142)
Z test (p. 148)
confidence interval (CI) (p. 157)
confidence limit (p. 157)
95% confidence interval (p. 158)
99% confidence interval (p. 158)
Example Worked-Out Problems
Figuring the Standard Deviation of the Distribution of Means
Find the standard deviation of the distribution of means for a population with $\sigma = 13$ and a sample size of 20.
Answer
Using Rules 2a and 2b for the characteristics of a distribution of means: The variance of a distribution of means is the variance of the population of individuals divided by the number of individuals in each sample. The standard deviation of a distribution of means is the square root of the variance of the distribution of means. The variance of the population is 169 (that is, 13 squared is 169); dividing this by 20 gives a variance of the distribution of means of 8.45. The square root of this, 2.91, is the standard deviation of the distribution of means.
Using the formula, $\sigma_M = \sqrt{\sigma^2 / N} = \sqrt{13^2 / 20} = \sqrt{169 / 20} = 2.91$.
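The same two rules can be written as a short Python function, shown here as a sketch rather than a required method. The function name `standard_error` is made up for the illustration, and the second call, with a sample size of 80, is an added case that is not part of the worked problem.

```python
import math

def standard_error(population_sd, n):
    """Standard deviation of the distribution of means."""
    variance_of_means = population_sd ** 2 / n  # Rule 2a: population variance / N
    return math.sqrt(variance_of_means)         # Rule 2b: square root of that variance

print(f"{standard_error(13, 20):.2f}")  # 2.91, as in the worked problem
print(f"{standard_error(13, 80):.2f}")  # 1.45: four times the sample size halves it
```

Because N sits under the square root, quadrupling the sample size only halves the standard error.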
Hypothesis Testing with a Sample of More Than One: The Z Test
A sample of 75 was given an experimental treatment and had a mean of 16 on a particular measure. The general population of individuals has a mean of 15 on this measure and a standard deviation of 5. Carry out a Z test using the five steps of hypothesis testing with a two-tailed test at the .05 significance level, and make a drawing of the distributions involved.
Answer
❶ Restate the question as a research hypothesis and a null hypothesis about the populations. The two populations are:
Population 1: Those given the experimental treatment.
Population 2: People in the general population (who are not given the experimental treatment).
The research hypothesis is that the population given the experimental treatment will have a different mean on the particular measure from the mean of people in the general population (who are not given the experimental treatment): $\mu_1 \neq \mu_2$. The null hypothesis is that the populations have the same mean score on this measure: $\mu_1 = \mu_2$.
❷ Determine the characteristics of the comparison distribution. $\mu_M = \mu = 15$; $\sigma_M = \sqrt{\sigma^2 / N} = \sqrt{5^2 / 75} = .57$; shape is normal (sample size is greater than 30).
❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. Two-tailed cutoffs, 5% significance level, are +1.96 and -1.96.
❹ Determine your sample's score on the comparison distribution. Using the formula $Z = (M - \mu_M)/\sigma_M$, $Z = (16 - 15)/.57 = 1/.57 = 1.75$.
❺ Decide whether to reject the null hypothesis. The sample's Z score of 1.75 is not more extreme than the cutoffs of $\pm 1.96$; do not reject the null hypothesis. Results are inconclusive. The distributions involved are shown in Figure 5-12.
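Steps ❷ through ❺ of this problem can be sketched in Python as a check on the hand figuring. The function name `z_test` and its parameters are invented for this illustration, and the default cutoff of 1.96 is the two-tailed .05 cutoff from the problem.

```python
import math

def z_test(sample_mean, population_mean, population_sd, n, cutoff=1.96):
    """Return (Z, reject_null) for a two-tailed Z test with a known population."""
    standard_error = math.sqrt(population_sd ** 2 / n)    # Step 2: comparison distribution
    z = (sample_mean - population_mean) / standard_error  # Step 4: sample's Z score
    return z, abs(z) > cutoff                             # Step 5: compare with cutoff

z, reject = z_test(16, 15, 5, 75)
print(f"Z = {z:.2f}, reject null: {reject}")
```

Carried out without rounding the standard error, the Z score comes to about 1.73 rather than the 1.75 obtained by hand with .57. Either value is less extreme than 1.96, so the decision not to reject the null hypothesis is the same.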
Figure 5-12 Answer to the hypothesis-testing problem in Example Worked-Out Problems: (a) the distribution of the population of individuals ($\mu = 15$, $\sigma^2 = 25$, $\sigma = 5$), (b) the distribution of means (the comparison distribution), and (c) the sample's distribution ($M = 16$).
Outline for Writing Essays for Hypothesis-Testing Problems Involving a Single Sample of More Than One and a Known Population (Z Test)
1. Describe the core logic of hypothesis testing in this situation. Be sure to explain the meaning of the research hypothesis and the null hypothesis in this situation where we focus on the mean of a sample and compare it to a known population mean. Explain the concept of support being provided for the research hypothesis when the study results allow the null hypothesis to be rejected.
2. Explain the concept of the comparison distribution. Be sure to mention that, with a sample of more than one, the comparison distribution is a distribution of means because the information from the study is a mean of a sample. Mention that the distribution of means has the same mean as the population mean because there is no reason for random samples in the long run to have a different mean; the distribution of means has a smaller variance (the variance of the population divided by the number in each sample) because it is harder to get extreme means than extreme individual cases by chance, and the larger the samples are, the rarer it is to get extreme means.
3. Describe the logic and process for determining (using the normal curve) the cutoff sample score(s) on the comparison distribution at which the null hypothesis should be rejected.
4. Describe why and how you figure the Z score of the sample mean on the comparison distribution.
5. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing process are compared. Explain the meaning of the result of this comparison with regard to the specific research and null hypotheses being tested.
Advanced Topic: Finding Confidence Intervals
Find the 99% confidence interval for the sample mean in the study just described.
Answer
❶ Figure the standard error. The standard error is the standard deviation of the distribution of means. In the preceding problem, it was .57.
❷ For the 95% confidence interval, figure the raw scores for 1.96 standard errors above and below the sample mean; for the 99% confidence interval, figure the raw scores for 2.57 standard errors above and below the sample mean. For the 99% confidence interval, upper limit $= M + (2.57)(\sigma_M) = 16 + (2.57)(.57) = 16 + 1.46 = 17.46$; lower limit $= M - (2.57)(\sigma_M) = 16 - (2.57)(.57) = 16 - 1.46 = 14.54$. Thus, the 99% confidence interval is from 14.54 to 17.46.
Advanced Topic: Outline for Writing Essays for Finding Confidence Intervals
1. Explain that a confidence interval is an estimate (based on your sample's mean and the standard deviation of the distribution of means) of the range of values that is likely to include the true population mean for the group studied (Population 1). Be sure to mention that the 95% (or 99%) confidence interval is the range of values you are 95% (or 99%) confident include the true population mean.
2. Explain that the first step in figuring a confidence interval is to estimate the population mean (for which the best estimate is the sample mean), and figure the standard deviation of the distribution of means.
3. Mention that you next find the Z scores that go with the confidence interval that you want.
4. Describe how to change the Z scores to raw scores to find the confidence interval.
Practice Problems
These problems involve figuring. Most real-life statistics problems are done on a computer with special statistical software. Even if you have such software, do these problems by hand to ingrain the method in your mind.
All data are fictional unless an actual citation is given.
Set I (for Answers to Set I Problems, see pp. 677-678)
1. Why is the standard deviation of the distribution of means generally smaller than
the standard deviation of the distribution of the population of individuals?
2. For a population that has a standard deviation of 10, figure the standard deviation of the distribution of means for samples of size (a) 2, (b) 3, (c) 4, and (d) 9.
3. For a population that has a standard deviation of 20, figure the standard deviation of the distribution of means for samples of size (a) 2, (b) 3, (c) 4, and (d) 9.
4. ADVANCED TOPIC: Figure the 95% confidence interval (that is, the lower and upper confidence limits) for each part of problem 2. Assume that in each case the researcher's sample has a mean of 100 and that the population of individuals is known to follow a normal curve.
5. ADVANCED TOPIC: Figure the 99% confidence interval (that is, the lower and upper confidence limits) for each part of problem 3. Assume that in each case the researcher's sample has a mean of 10 and that the population of individuals is known to follow a normal curve.
6. For each of the following samples that were given an experimental treatment, test whether they are different from the general population: (a) a sample of 10 with a mean of 44, (b) a sample of 1 with a mean of 48. The general population of individuals has a mean of 40, a standard deviation of 6, and follows a normal curve. For each sample, carry out a Z test using the five steps of hypothesis testing with a two-tailed test at the .05 significance level, and make a drawing of the distributions involved. (c) ADVANCED TOPIC: Figure the 95% confidence interval for parts (a) and (b).
7. For each of the following samples that were given an experimental treatment, test whether they scored significantly higher than the general population: (a) a sample of 100 with a mean of 82, (b) a sample of 10 with a mean of 84. The general population of individuals has a mean of 81, a standard deviation of 8, and follows a normal curve. For each sample, carry out a Z test using the five steps of hypothesis testing with a one-tailed test at the .01 significance level, and make a drawing of the distributions involved. (c) ADVANCED TOPIC: Figure the 99% confidence interval for parts (a) and (b).
8. Twenty-five women between the ages of 70 and 80 were randomly selected from the general population of women their age to take part in a special program to decrease reaction time (speed). After the course, the women had an average reaction time of 1.5 seconds. Assume that the mean reaction time for the general population of women of this age group is 1.8, with a standard deviation of .5 seconds. (Also assume that the population is approximately normal.) What should you conclude about the effectiveness of the course? (a) Carry out a Z test using the five steps of hypothesis testing (use the .01 level). (b) Make a drawing of the distributions involved. (c) Explain your answer to someone who is familiar with the general logic of hypothesis testing, the normal curve, Z scores, and probability, but not with the idea of a distribution of means. (d) ADVANCED TOPIC: Figure the 99% confidence interval and explain your answer to someone who is familiar with the general logic of hypothesis testing, the normal curve, Z scores, probability, and the idea of a distribution of means, but has not heard of confidence intervals.
9. A large number of people were shown a particular film of an automobile collision between a moving car and a stopped car. Each person then filled out a questionnaire about how likely it was that the driver of the moving car was at fault, on a scale from 0 = not at fault to 10 = completely at fault. The distribution of ratings under ordinary conditions follows a normal curve with $\mu = 5.5$ and $\sigma = .8$. Sixteen randomly selected individuals are tested in a condition in which the wording of the question is changed so that the question asks, "How likely is it that the driver of the car who crashed into the other was at fault?" (The difference is that in this changed condition, instead of describing the event in a neutral way, the question uses the phrase "crashed into.") Using the changed instruction, these 16 research participants gave a mean at-fault rating of 5.9. Did the changed instructions significantly increase the rating of being at fault? (a) Carry out a Z test using the five steps of hypothesis testing (use the .05 level). (b) Make a drawing of the distributions involved. (c) Explain your answer to someone who has never taken statistics. (d) ADVANCED TOPIC: Figure the 95% confidence interval.
10. Lee and colleagues (2000) tested a theory of the role of distinctiveness in face perception. In their study, participants indicated whether they recognized each of 48 faces of male celebrities when they were shown rapidly on a computer screen. A third of the faces were shown in caricature form, in which facial features were electronically modified so that distinctive features were exaggerated; a third were shown in veridical form, in which the faces were not modified at all; and a third were shown in anticaricature form, in which facial features were modified to be slightly more like the average of the celebrities' faces. The average percentage correct across their participants is shown in Figure 5-13. Explain the meaning of the error bars in this figure to a person who understands mean, standard deviation, and variance, but nothing else about statistics.
11. ADVANCED TOPIC: Anderson and colleagues (2000) studied the rate of HIV testing among adults in the United States and reported one of their findings as follows: "Responses from the NHIS [National Health Interview Survey] indicate that by 1995, 39.7% of adults (95% CI = 38.8%, 40.5%) had been tested at least once" (p. 1090). Explain what "(95% CI = 38.8%, 40.5%)" means to a person who understands hypothesis testing with the mean of a sample of more than one but who has never heard of confidence intervals.
Figure 5-13 Identification accuracy as a function of image type (anticaricature, veridical, caricature). Standard error bars are shown.
Source: Lee, K., Byatt, G., & Rhodes, G. (2000). Caricature effects, distinctiveness, and identification: Testing the face-space framework. Psychological Science, 11, 381. Copyright © 2000 by Blackwell Publishing. Reprinted by permission of Blackwell Publishers Journals.
Set II
12. Under what conditions is it reasonable to assume that a distribution of means
will follow a normal curve?
13. Indicate the mean and the standard deviation of the distribution of means for each of the following situations.
Population
Sample Size
Mean
Variance
N
(a)
(b)
(c)
(d)
10
(e)
10
(f)
(g)
14. Figure the standard deviation of the distribution of means for a population with a
standard deviation of 20 and sample sizes of (a) 10, (b) 11, (c) 100, and (d) 101.
15. ADVANCED TOPIC: Figure the 95% confidence interval (that is, the lower and
upper confidence limits) for each part of problem 13. Assume that in each case the researcher's sample has a mean of 80 and the population of individuals is known to follow a normal curve.
16. ADVANCED TOPIC: Figure the 99% confidence interval (that is, the lower and upper confidence limits) for each part of problem 14. Assume that in each case the researcher's sample has a mean of 50 and that the population of individuals is known to follow a normal curve.
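As a rough check on confidence limits such as those in problem 16, the sketch below figures an interval around a sample mean. It assumes a known population standard deviation and a normal distribution of means. The function name `confidence_interval` is hypothetical, and the cutoff Z score is computed with Python's `statistics.NormalDist` rather than read from a normal curve table, so the limits may differ slightly from ones figured with a rounded table value.

```python
import math
from statistics import NormalDist

def confidence_interval(sample_mean, population_sd, sample_size, level=0.95):
    """Lower and upper confidence limits for a population mean (known sigma)."""
    standard_error = population_sd / math.sqrt(sample_size)
    # Z score that leaves (1 - level) / 2 in each tail of the normal curve.
    z_cutoff = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_cutoff * standard_error
    return sample_mean - margin, sample_mean + margin

# Sample mean of 50, population standard deviation of 20, sample of 100.
lower, upper = confidence_interval(50, 20, 100, level=0.99)
print(round(lower, 2), round(upper, 2))
```

With 100 individuals the standard error is 2, so the 99% limits fall a little more than 5 points on either side of the sample mean.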
17. For each of the following studies, the samples were given an experimental treatment and the researchers compared their results to the general population. (Assume all populations are normally distributed.) For each, carry out a Z test using the five steps of hypothesis testing for a two-tailed test, and make a drawing of the distributions involved. ADVANCED TOPIC: Figure the 95% confidence interval for each study.
Population (μ, σ)    Sample Size (N)    Sample Mean    Significance Level
(a)  36  16  38  .05
(b)  36  16  38  .05
(c)  36  16  38  .05
(d)  36  16  38  .01
(e)  34  16  38  .01
18. For each of the following studies, the samples were given an experimental treatment and the researchers compared their results to the general population. For each, carry out a Z test using the five steps of hypothesis testing for a two-tailed test at the .01 level, and make a drawing of the distributions involved. ADVANCED TOPIC: Figure the 99% confidence interval for each study.
Population (μ, σ)    Sample Size (N)    Sample Mean
(a)
(b)
(c)  12
(d)  14  100  12
19. A researcher is interested in whether people are able to identify emotions correctly when they are extremely tired. It is known that, using a particular method of measurement, the accuracy ratings of people in the general population (who are not extremely tired) are normally distributed with a mean of 82 and a variance of 20. In the present study, however, the researcher arranges to test 50 people who had no sleep the previous night. The mean accuracy for these 50 individuals was 78. Using the .05 level, what should the researcher conclude? (a) Carry out a Z test using the five steps of hypothesis testing. (b) Make a drawing of the distributions involved. (c) Explain your answer to someone who knows about hypothesis testing with a sample of a single individual but who knows nothing about hypothesis testing with a sample of more than one individual. (d) ADVANCED TOPIC: Figure the 95% confidence interval and explain your answer to someone who is familiar with the general logic of hypothesis testing, the normal curve, Z scores, probability, and the idea of a distribution of means, but who has not heard of confidence intervals.
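The arithmetic behind the Z test in problem 19 can be sketched as follows. This is a minimal illustration, not a substitute for working through the five steps: it assumes a two-tailed test with a known population mean and variance, the function name `z_test` is our own, and the cutoff is computed with `statistics.NormalDist` instead of being looked up in a normal curve table.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, population_mean, population_variance,
           sample_size, alpha=0.05):
    """Two-tailed Z test of a sample mean against a known population."""
    # Standard deviation of the distribution of means.
    standard_error = math.sqrt(population_variance / sample_size)
    # Z score of the sample mean on the distribution of means.
    z = (sample_mean - population_mean) / standard_error
    # Cutoff Z score that leaves alpha / 2 in each tail.
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    return z, cutoff, abs(z) > cutoff

# Problem 19: population mean 82, variance 20; 50 people with a mean of 78.
z, cutoff, reject_null = z_test(78, 82, 20, 50)
print(round(z, 2), round(cutoff, 2), reject_null)
```

Here the sample mean falls more than 6 standard deviations below the mean of the distribution of means, far beyond the cutoff, so the null hypothesis would be rejected.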
20. A psychologist is interested in the conditions that affect the number of dreams per month that people report in which they are alone. We will assume that based on extensive previous research, it is known that in the general population the number of such dreams per month follows a normal curve, with µ = 5 and σ = 4. The researcher wants to test the prediction that the number of such dreams will be greater among people who have recently experienced a traumatic event. Thus, the psychologist studies 36 individuals who have recently experienced a traumatic event, having them keep a record of their dreams for a month. Their mean number of alone dreams is 8. Should you conclude that people who have recently had a traumatic experience have a significantly different number of dreams in which they are alone? (a) Carry out a Z test using the five steps of hypothesis testing (use the .05 level). (b) Make a drawing of the distributions involved. (c) Explain your answer to a person who has never had a course in statistics. (d) ADVANCED TOPIC: Figure the 95% confidence interval.
21. A government-sponsored telephone counseling service for adolescents tested whether the length of calls would be affected by a special telephone system that had a better sound quality. Over the past several years, the lengths of telephone calls (in minutes) were normally distributed with µ = 18 and σ = 8. They arranged to have the special phone system loaned to them for one day. On that day, the mean length of the 46 calls they received was 21 minutes. Test whether the length of calls has changed using the 5% significance level. (a) Carry out a Z test using the five steps of hypothesis testing. (b) Make a drawing of the distributions involved. (c) Explain your answer to someone who knows about hypothesis testing with a sample of a single individual but who knows nothing about hypothesis testing with samples of more than one individual. (d) ADVANCED TOPIC: Figure the 95% confidence interval.
[Figure 5-14 image: Task Made More Difficult (Neither, Walk, Memory, Both) on the horizontal axis; legend: Young, Old.]
Figure 5-14 Dual-task costs in memory as a function of age group and difficulty condition. Error bars represent ±1 SEM.
Source: Li, K. Z. H., Lindenberger, U., Freund, A. M., & Baltes, P. B. (2001). Walking while memorizing: Age-related differences in compensatory behavior. Psychological Science, 12, 230-237. Copyright © 2001 by Blackwell Publishing. Reprinted by permission of Blackwell Publishers Journals.
22. Li and colleagues (2001) compared older (aged 60 to 75) and younger (aged 20
to 30) adults on the impact on memory of making some aspect of what they were doing more difficult. Figure 5-14 shows some of their results. In the figure caption they note that "Error bars represent ±1 SEM" (standard error of the mean). Explain the meaning of this statement, using one of the error bars as an example, to a person who understands mean and standard deviation, but knows
nothing else about statistics.
23. ADVANCED TOPIC: Perna and colleagues (2003) tested whether a stress management intervention could reduce injury and illness among college athletes. In their study, 34 college athletes were randomly assigned to be in one of two groups: (1) a stress management intervention group: this group received a cognitive behavioral stress management (CBSM) intervention during preseason training; (2) a control group: this group did not receive the intervention. At the end of the season, for each athlete, the researchers recorded the number of health center visits (including visits to the athletic training center) and the number of days of illness or injury during the season. The results are shown in Figure 5-15. In the figure caption, the researchers note that the figure shows the "Mean (+SE)". This tells you that the line above the top of each bar represents the standard error. Explain what this means, using one of the error bars as an example, to a person who understands mean and standard deviation, but knows nothing else about statistics.
[Figure 5-15 image: Office Visits and Days Out (ill/injured) on the horizontal axis; legend: CBSM, Control.]
Figure 5-15 Mean (+SE) number of accumulated days injured or ill and athletic training room and health center office visits for cognitive behavioral stress management (CBSM) (n = 18) and control groups (n = 16) from study entry to season's end.
Source: Perna, F. M., Antoni, M. H., Baum, A., Gordon, P., & Schneiderman, N. (2003). Cognitive behavior stress management effects on injury and illness among competitive athletes: A randomized clinical trial. Annals of Behavioral Medicine, 25, 66-73. Copyright © Lawrence Erlbaum Associates, Inc. Reprinted with permission.
24. Cut up 90 small slips of paper, and write each number from 1 to 9 on 10 slips each. Put the slips in a large bowl and mix them up. (a) Take out a slip, write down the number on it, and put it back. Do this 20 times. Make a histogram, and figure the mean and the variance of the result. You should get an approximately rectangular distribution. (b) Take two slips out, figure out their mean, write it down, and put the slips back.⁶ Repeat this process 20 times. Make a histogram; then figure the mean and the variance of this distribution of means. The variance should be about half of the variance of the distribution of individual scores. (c) Repeat the process again, this time taking three slips at a time. Again, make a histogram and figure the mean and the variance of the distribution of means. The distribution of means of three slips each should have a variance of about a third of the distribution of individual scores. Also note that as the sample size increases, your distributions get closer to normal. (Had you begun with a normally distributed distribution of slips, your distributions of means would have been fairly close to normal regardless of the number of slips in each sample.)
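If you would like to see the pattern in problem 24 without cutting up paper, the exercise can be approximated on a computer. The sketch below is only an illustration under our own assumptions: it draws slips with replacement using Python's `random` module, it uses 20,000 repetitions instead of the 20 in the exercise so that the variances settle close to their expected values, and the names `BOWL`, `simulate_means`, and `summarize` are hypothetical.

```python
import random
import statistics

# The bowl from the exercise: the numbers 1 through 9, ten slips of each.
BOWL = [number for number in range(1, 10) for _ in range(10)]

def simulate_means(sample_size, repetitions, rng):
    """Draw samples of slips (with replacement) and return each sample's mean."""
    return [
        sum(rng.choices(BOWL, k=sample_size)) / sample_size
        for _ in range(repetitions)
    ]

def summarize(sample_size, repetitions=20_000, seed=0):
    """Mean and variance of a simulated distribution of means."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    means = simulate_means(sample_size, repetitions, rng)
    return statistics.mean(means), statistics.pvariance(means)

# Variance of the individual slips in the bowl (20/3, about 6.67).
population_variance = statistics.pvariance(BOWL)

for n in (1, 2, 3):
    mean_of_means, variance_of_means = summarize(n)
    # Last column: the variance expected for samples of this size.
    print(n, round(mean_of_means, 2), round(variance_of_means, 2),
          round(population_variance / n, 2))
```

The mean of each distribution of means should stay near 5, while the variance should come out near the population variance divided by the number of slips in each sample.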
2. We have ignored the fact that a normal curve is a smooth theoretical distribution, while in most real-life distributions, scores are only at specific numbers, such as a child being in a particular grade and not in a fraction of a grade. So one difference between our example distribution of means and a normal curve is that the normal curve is smooth. However, in psychology research, even when our measurements are at specific numbers, we usually assume that the underlying thing being measured is continuous.
3. We have already considered this principle of a distribution of means tending toward a normal curve in Chapter 3. Though we had not yet discussed the distribution of means, we still used this principle to explain why the distribution of so many things in nature follows a normal curve. In that chapter, we explained it as the various influences balancing each other out, to make an averaged influence come out with most of the scores near the center and a few at either extreme. Now we have made the same point using the terminology of a distribution of means. Think of any distribution of individual scores in nature as a situation in which each score is actually an average of a random set of influences on that individual. Consider the distribution of weights of pebbles. Each pebble's weight is a kind of average of all the different forces that went into making the pebble have a particular weight.
4. This fictional study would be much better if the researchers also had another group of students who were randomly assigned to rate the attractiveness of the person after being told nothing about the person's personality qualities. Relying on the attractiveness ratings for a known population of students is a bit hazardous because the circumstances in the experiment might be somewhat different from that of the usual situation in which students rated the attractiveness of the person. However, we have taken liberties with this example to help introduce the hypothesis-testing process to you one step at a time. In this example and the others in this chapter, we use situations in which a single sample is contrasted with a "known" population. Starting in Chapter 7, we extend the hypothesis-testing procedure to more realistic research situations, those involving more than one group of participants and those involving populations whose characteristics are not known.
5. Some proponents of confidence intervals over significance testing argue that we should ignore the link with hypothesis testing altogether. This is the most radical antisignificance-test position. That is, these psychologists argue that our entire focus should be on estimation, and significance testing of any kind should be irrelevant. In Chapter 6, we will discuss the rationale for their position, along with the counterarguments.
6. Technically, when taking the samples of two slips, this should be done by taking one, writing it down, putting it back, then taking the next, writing it down, and putting it back. You would consider these two scores as one sample for which you figure a mean. The same applies for samples of three slips. This is called sampling with replacement. However, with 90 slips in the bowl, taking two or three slips at a time and putting them back will be a close enough approximation for this exercise and will save you some time.