
4 Applying z to Groups


Chapter Learning Objectives: After reading this chapter, you should be able to do the following:

1. Describe the distribution of sample means.

2. Explain the central limit theorem.

3. Analyze the relationship between sample size and confidence in normality.

4. Calculate and explain z test results.

5. Explain statistical significance.

6. Calculate and explain confidence intervals.

7. Explain how decision errors can affect statistical analysis.

8. Calculate the z test using Excel.

tan82773_04_ch04_091-120.indd 91 3/3/16 2:30 PM

© 2016 Bridgepoint Education, Inc. All rights reserved. Not for resale or redistribution.


Introduction

As we noted at the end of Chapter 3, researchers are generally more interested in groups than in individuals. Individuals can be highly variable, and what occurs with one is not necessarily a good indicator of what to expect from someone else. What occurs in groups, on the other hand, can be very helpful in understanding the nature of the entire population. A Google search indicates that the suicide rate is higher among dentists than it is among those of many other professions. If we wanted to experiment with some therapy designed to relieve depressive symptoms among dentists, we would be more confident observing how a group of 50 dentists responds than in examining results from just one. This chapter will use the material from the first three chapters to begin analyzing people in groups.

Noting that many of the characteristics that interest behavioral scientists are normally distributed in a population implies that some characteristics are not. Since samples can never exactly emulate their populations, it may not be clear in the midst of a particular study when data are normally distributed. This uncertainty potentially poses a problem: we may wish to use the z transformation and Table B.1 of Appendix B in our analysis, but Table B.1 is based on the normality assumption. If the data are not normal, where does that leave the related analysis?

4.1 Distribution of Sample Means

What options do researchers have if they are suspicious about data normality? One important answer is the distribution of sample means, so named because the scores that constitute the distribution are the means of samples rather than individual scores.

Note that the descriptor population means all possible members of a defined group. Recall that the frequency distribution, the bell-shaped curve representing the population, was a figure based on the individual measures sampled one subject at a time. In discussing the frequency distribution, we assumed that we would measure each individual on some trait, and then plot each individual score. Instead of selecting each individual in a population one at a time, suppose a researcher

1. selects a group with a specified size;

2. calculates the sample mean (M) for each group;

3. plots the value of M (rather than the value of each score) in a frequency distribution; and

4. continues doing this until the population is exhausted.


How would plotting group scores rather than individual ones affect the distribution? Would the end result still be a population? The answer to the second question is yes: because every member is included, it is still a population. Whether a population is measured individually or as members of a group is incidental, as long as all are included.

Perhaps researchers are interested in language development among young children and wish to measure mean length of utterance (MLU) in a county population. Whether the researchers measure and plot MLU for each child in a county’s Head Start program or plot the mean MLU for every group of 25 in the program, the result is population data for Head Start learners for that county.

The Central Limit Theorem

The answer to the question “how would the distribution be affected?” is a little more involved, but it is important to nearly everything we do in statistical analysis. It involves what is called the central limit theorem:

If a population is sampled an infinite number of times using sample size n and the mean (M) of each sample is determined, then the multiple M measures will take on the characteristics of a normal distribution, whether or not the original population of individuals is normal.

Take a minute to absorb this. A population of an infinite number of sample means drawn from one population will reflect a normal distribution whatever the nature of the original distribution. A healthy skepticism prompts at least two questions: 1) How would we prove whether this is true since no one can gather an infinite number of samples? and 2) Why does sampling in groups rather than as individuals affect normality?
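One informal way to approach the first question is simulation: a computer cannot draw an infinite number of samples, but it can draw thousands. The sketch below is an illustration only. The exponential population, the sample size of 30, the 5,000 repetitions, the seed, and the helper name `skewness` are all assumptions chosen for the demonstration, not values from this chapter.

```python
import random
import statistics

def skewness(values):
    """Skewness of a set of values: 0 for a symmetric shape such as the normal curve."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

random.seed(4)  # fixed seed so the run is repeatable (arbitrary choice)

# Assumed population: strongly right-skewed (exponential), clearly not normal.
individual_scores = [random.expovariate(1.0) for _ in range(5000)]

# Draw 5,000 samples of n = 30 and keep only each sample's mean (M).
n = 30
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(5000)
]

skew_individuals = skewness(individual_scores)
skew_means = skewness(sample_means)
print(f"skewness of individual scores: {skew_individuals:.2f}")
print(f"skewness of sample means:      {skew_means:.2f}")
```

Under these assumptions, the skewness of the sample means should come out much closer to 0 than the skewness of the individual scores, which is the pattern the central limit theorem describes.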

Although prove is too strong a word, we can at least provide evidence for the effect of the central limit theorem using an example. Perhaps a psychologist is working with 10 people on their resistance to change, their level of dogmatism. Technically, because 10 constitutes the entire group, the population is N = 10. Recall that N refers to the number in a population. Even with a small population we cannot have an infinite number of samples, of course, but for the sake of the illustration we will assume that

• dogmatism scores are available for each of the 10 people;

• the data are interval scale;

• the scores range from 1 to 10; and

• each person receives a different score.

So with N = 10, the scores are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Figure 4.1 depicts a frequency distribution of those 10 scores.

The distribution in Figure 4.1 is not normal. With R = 10 − 1 = 9 and s = 3.028 (a calculation worth checking), the distribution is extremely platykurtic (i.e., flatter than normal); the range is less than 3 times the value of the standard deviation rather than the approximately 6 times associated with normal distributions. There is either no mode or there are 10 modes, neither of which suggests normality. We can illustrate the workings of the central limit theorem with a procedure Diekhoff (1992) used. We will use samples of n = 2, and make the example manageable by using one sample for each possible combination of scores in samples of n = 2 from the population, rather than an infinite number of samples.
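For readers who want to check the range and standard deviation quoted above, here is a minimal sketch using Python's standard `statistics` module. The variable names are illustrative; `statistics.stdev` uses the n − 1 formula, which is what the value s = 3.028 reflects.

```python
import statistics

scores = list(range(1, 11))              # the 10 dogmatism scores, 1 through 10

score_range = max(scores) - min(scores)  # R = 10 - 1
s = statistics.stdev(scores)             # sample standard deviation (n - 1 formula)

print(score_range)                # 9
print(round(s, 3))                # 3.028
print(round(score_range / s, 2))  # 2.97, i.e. the range is just under 3 standard deviations
```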


Table 4.1 lists all the possible combinations of two scores from values 1–10. Ninety combinations of the 10 dogmatism scores are possible. The larger the sample size, the more readily it demonstrates the tendency toward normality, but all combinations of (for example) three scores would result in a very large table.

Table 4.1: All possible combinations of the integers 1–10

1, 2    2, 1    3, 1    4, 1    5, 1    6, 1    7, 1    8, 1    9, 1    10, 1
1, 3    2, 3    3, 2    4, 2    5, 2    6, 2    7, 2    8, 2    9, 2    10, 2
1, 4    2, 4    3, 4    4, 3    5, 3    6, 3    7, 3    8, 3    9, 3    10, 3
1, 5    2, 5    3, 5    4, 5    5, 4    6, 4    7, 4    8, 4    9, 4    10, 4
1, 6    2, 6    3, 6    4, 6    5, 6    6, 5    7, 5    8, 5    9, 5    10, 5
1, 7    2, 7    3, 7    4, 7    5, 7    6, 7    7, 6    8, 6    9, 6    10, 6
1, 8    2, 8    3, 8    4, 8    5, 8    6, 8    7, 8    8, 7    9, 7    10, 7
1, 9    2, 9    3, 9    4, 9    5, 9    6, 9    7, 9    8, 9    9, 8    10, 8
1, 10   2, 10   3, 10   4, 10   5, 10   6, 10   7, 10   8, 10   9, 10   10, 9

Figure 4.1: A frequency distribution for the scores 1 through 10: Each score occurring once

A frequency distribution of ten scores, each with a different value. This type of distribution, which is not normal, is highly platykurtic.

[Figure: x-axis: Score Values (1 to 10); y-axis: Score Frequency (1 to 10)]


For each possible pair of scores, if we calculate a mean and plot the value in a frequency distribution as a test of the central limit theorem, the result is Figure 4.2. Because the entire distribution is based on sample means, Figure 4.2 is a distribution of sample means. Strictly speaking, this distribution is not normal, but although based on precisely the same data, Figure 4.2’s distribution is a good deal more normal than the distribution in Figure 4.1.

Figure 4.2: A frequency distribution of the means of all possible pairs of scores 1 through 10

A distribution of the means of each possible pair of scores with values between 1 and 10. This distribution is not normal, but has more normality than the distribution shown in Figure 4.1.

[Figure: x-axis: Sample Means (1.5 to 9.5 in steps of 0.5); y-axis: Score Frequency (1 to 10)]

Mean of the Distribution of Sample Means

The symbol used for a population mean to this point, µ, is actually the symbol for a population mean formed from one score at a time. To distinguish between the mean of the population of individual scores and the mean of the population of sample means, we’ll subscript µ with an M: µM. This symbol indicates a population mean (µ) based on sample means (M).

With a distribution of just 90 sample means, Figure 4.2 shows nothing like an infinite number, of course, but it is instructive nevertheless. The mean of the scores 1 through 10 is 5.5 (µ = 5.5). Study Figure 4.2 for a moment. What is the mean of that distribution? The mean of our distribution of sample means is also 5.5: µM = 5.5. The point is this: When the same data are used to create two distributions (one a population based on individual scores and the other a distribution of sample means), the two population means will have the same value, or, symbolically stated: µ = µM.

Describing the distribution as “normal” is a stretch, but Figure 4.2 is certainly more normal than Figure 4.1. For one thing, rather than the perfectly flat distribution that occurs when all the scores have the same frequency, some scores have greater frequency than others. The sample means near the middle of the distribution in Figure 4.2 occur more frequently than the sample means at either the extreme right or left.

Try It!: #1 Why is there less variability in the distribution of sample means than in a distribution of individual scores?

Why are extreme scores less likely than scores near the middle of the distribution? It is because many combinations of scores can produce the mean values in the middle of the distribution, but comparatively few combinations can produce the values in the tails of the distribution. With repetitive sampling, the mean scores that can be produced by multiple combinations increase in frequency and the more extreme scores occur only occasionally, which the next section illustrates.
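The counting argument above can be checked directly. The following sketch enumerates the 90 ordered pairs listed in Table 4.1, computes the mean of each pair, and tallies how often each mean occurs. It uses only the standard library; the variable names are illustrative.

```python
from collections import Counter
from itertools import permutations
import statistics

scores = list(range(1, 11))

# Every ordered pair of two different scores, as in Table 4.1: 10 * 9 = 90 pairs.
pairs = list(permutations(scores, 2))
sample_means = [(a + b) / 2 for a, b in pairs]

# Tally how many pairs produce each sample mean and print a text histogram.
frequency = Counter(sample_means)
for m in sorted(frequency):
    print(f"{m:>4}: {'#' * frequency[m]}")

mu = statistics.fmean(scores)          # mean of the individual scores
mu_M = statistics.fmean(sample_means)  # mean of the distribution of sample means
print(mu, mu_M)                        # 5.5 5.5
```

Ten pairs produce the middle value M = 5.5, while only two pairs each produce the extremes M = 1.5 and M = 9.5, and the two means agree (µ = µM = 5.5).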

Variability in the Distribution of Sample Means

In the original distribution of 10 scores (Figure 4.1), what is the probability that someone could randomly select one score (x) that happens to have a value of 1? Because there are 10 scores, and just one score of 1, the probability is p = 1/10 = 0.1, right? By the same token, what is the probability of selecting x = 10? It is the same, p = 0.1.

Moving to the distribution based on 90 scores (Figure 4.2), what is the probability of selecting a sample of n = 2 that will have M = 1.0? Is there any probability of selecting two scores out of the 10 that will have M = 1.0? Because there is only one value of 1, there is no way to select two values with M = 1.0. As soon as a score of 1 is averaged with any other score in the group, all of which are greater than one, the result is M > 1. That is why the lowest possible mean score in Figure 4.2 is 1.5, which can only occur when 1 and 2 are in the same sample.

The same thing occurs in the upper end of the distribution. The probability of selecting a group of n = 2 with M = 10 is also zero (p = 0) because all other scores have lower values than 10. For the 90 possible combinations, the highest possible mean score is 9.5, which can occur only when the 10 and the 9 happen to be in the same sample.

The point is that variability in group scores is always less than the variability in individual scores. A related point is that the impact of the most extreme scores in a distribution diminishes when they are included in samples with less extreme scores. Applied, these principles mean that a researcher examining, for example, problem-solving ability among a group of subjects can afford to be less concerned about the impact of one extremely low or one extremely high score as the size of the group increases. Larger group sizes minimize the effect of extreme scores.
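A small numerical sketch may make the last point concrete. The scores here are hypothetical (a typical score of 50 and a single extreme score of 100 are assumptions for illustration, not data from the chapter); the function simply shows how far one extreme score pulls the group mean at different group sizes.

```python
import statistics

def mean_with_outlier(n, typical=50, extreme=100):
    """Mean of a group of n scores in which exactly one member has an extreme score.

    The default scores are hypothetical values chosen only for illustration.
    """
    group = [typical] * (n - 1) + [extreme]
    return statistics.fmean(group)

for n in (2, 10, 50, 250):
    print(n, mean_with_outlier(n))
```

With these assumed values, the group mean is 75 when n = 2 but only about 50.2 when n = 250, so the same extreme score matters less and less as the group grows.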

Standard Error of the Mean

Recall that sigma, σ, indicates a population’s standard deviation. Specifically, σ indicates a standard deviation from a population of individual scores. The symbol for the standard deviation in a distribution of sample means is σM and as the subscript M suggests, it measures variability among the sample means. The formal name for σM is the standard error of the mean.

In the language of statistics, error as in standard error of the mean refers to unexplained variability. As we move through the different procedures, we will calculate other standard errors, which all have this in common: all are measures of unexplained data variability.


Earlier, we noted that whether charting the distribution of individual scores or the distribution of sample means, the means of the two distributions will always be equal: µ = µM. Is it logical to expect the same from the measures of variability; in other words, will σ = σM? The fact that the distribution of individual scores always has more variability than the distribution of sample means answers this question. Symbolically speaking, σ > σM, something that the dogmatism data show.

The standard deviation of the 10 original scores (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) is σ = 2.872. Note that this instance and the calculation of the standard error of the mean just below deal with populations and the formula must, therefore, employ N, rather than n − 1. Elsewhere in the book, however, the formula will always be n − 1.

The standard error of the mean can be calculated by taking the standard deviation of the mean scores of each of those 90 samples which constituted the distribution of sample means. The calculation is a little laborious, and happily, not a pattern that must be followed later, but the value is σM = 1.915.

So, as predicted, σ has a larger value than σM. The smaller value for σM reflects the way less extreme scores moderate the more extreme scores when they occur in the same sample.
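The "laborious" calculation is easy to hand to a computer. This sketch recomputes both values from the dogmatism scores, using `statistics.pstdev`, which applies the population formula (dividing by N) as the text requires here. The variable names are illustrative.

```python
from itertools import permutations
import statistics

scores = list(range(1, 11))

# Means of the 90 ordered pairs that make up the distribution of sample means.
sample_means = [(a + b) / 2 for a, b in permutations(scores, 2)]

sigma = statistics.pstdev(scores)          # population SD of the individual scores
sigma_M = statistics.pstdev(sample_means)  # population SD of the 90 sample means

print(round(sigma, 3))    # 2.872
print(round(sigma_M, 3))  # 1.915
```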

Sampling Error

Although the standard error of the mean does not per se refer to a mistake, another kind of error, sampling error, does. In inferential statistics, samples are important for what they reveal about populations. However, information from the sample is helpful for drawing inferences only when the sample accurately represents the population. The degree to which the sample does not represent the population is the degree of sampling error.

Samples reflect the population with the greatest fidelity when two prerequisites are met: 1) the sample must be relatively large, and 2) the sample must be based on random selection.

The safety of large samples is explained by the law of large numbers. According to this mathematical principle, as a proportion of the whole, errors diminish as the number of data points increases. The potential for serious sampling error diminishes as the size of the sample grows. The text earlier referred to this principle in noting that the distorting effect of extreme scores diminishes as sample size grows.

Random selection refers to a situation where every member of the population has an equal probability of being selected. Random selection contrasts with what are called convenience samples, samples that are used intact because they are handy. A sociology professor who uses the students in a particular section of his class is relying on a convenience sample, which is one type of nonrandom sample.

A random sample of n = 5 could be created from the 10 people being treated for dogmatic behavior by assigning each person a number, placing the 10 numbers into a paper bag, shaking the bag well, and without looking, drawing out five numbers.

The result would be a randomly selected sample. When randomly selected, samples differ from populations only by chance. They will differ, of course, but the differences are less and less important as sample size grows.
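The paper-bag procedure has a direct software counterpart. As a minimal sketch, Python's `random.sample` draws without replacement and gives every member the same chance of selection; the numbering of the 10 people is the same device the text describes, and the variable names are illustrative.

```python
import random

people = list(range(1, 11))   # one number per person in the population of 10

# Draw five numbers without replacement; each person is equally likely to be chosen.
sample = random.sample(people, k=5)
print(sorted(sample))
```

Because the draw is random, repeated runs will usually produce different samples, just as repeated draws from the bag would.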


If the sample should fail to capture some important characteristic of the population other than its size, the problem is sampling error. The important characteristic might be the mean, for example, and when M ≠ µ, sampling error has occurred. In fact, some sampling error always occurs because a sample can never exactly duplicate all the descriptive characteristics of the population, but sampling error will usually be minor if samples are relatively large and randomly selected.

Statistical analysis procedures tolerate minor, random sampling error, but systematic sampling error is another matter. Systematic sampling error occurs when the same mistake is made time after time.

In 1936, the publishers of Literary Digest, a prominent publication of the time, decided to predict the outcome of that year’s presidential election in the United States. To ensure that sample size would not pose a problem, they sent out millions of postcards to registered voters. Literary Digest seemed to at least have met the requirement for a relatively large sample, because the Harris and Gallup polling organizations typically obtain very accurate results with a few thousand, and sometimes just a few hundred, responses. Unfortunately, the publishers decided to rely on telephone books and automobile registrations to locate poll recipients. Consider the historical setting. At the height of the Great Depression, two indicators of relative prosperity identified voters: a telephone in the home and a currently registered car. The study proved disastrous for the magazine’s reputation. Poll results indicated that Alf Landon would win, but of course Franklin Roosevelt was elected in a landslide to a second term, carrying every state in the union except Maine and Vermont.

The problem was systematic sampling error. The voters were consistently and nonrandomly selected from groups not representative of the entire population. If they had been randomly selected, chances are that with the large sample size, the study would have predicted the election results accurately, but the sample size alone was not enough to compensate for the error.

4.2 The z Test

The distribution of sample means is a distribution based not on individual scores but on the means of samples of the same size repeatedly drawn from a population. The central limit theorem assures that such a distribution will be normal. Consequently, if the z score formula from Chapter 3 is adjusted to accommodate groups rather than individual scores, Table B.1 answers all the same questions about groups that it did about individuals in Chapter 3.


Recall that the z score formula (3.1) had the following form:

$$z = \frac{x - M}{s}$$

If the following substitutions are made:

• M for x, so that the focus is on a sample mean rather than on an individual score;

• µM for M to shift from the sample mean to the mean of the distribution of sample means;

• σM for σ so that the measure of variability is for the distribution rather than the sample;

then the result is the z test:

Formula 4.1

$$z = \frac{M - \mu_M}{\sigma_M}$$

The z test produces a z value for groups rather than individual scores. Just as it did for individual scores, z indicates how distant a particular sample mean is from the mean of the distribution of sample means.

Note the similarities in Formulas 3.1 and 4.1:

• Both formulas produce values of z.

• Both numerators call for subtractions that result in difference scores.

• Both denominators measure data variability.

Calculating the z Test

When calculating z scores, as shown in Chapter 3, everything that is needed (x, M, and s) can be determined from the sample:

$$z = \frac{x - M}{s}$$

Values needed for the z test, however, are often not as easy to determine. Because µM = µ, one of those two parameters must be provided. The standard error of the mean (σM) can also present a problem. No one wishing to complete a z test is going to have the mean scores for that infinite number of samples that make up the distribution of sample means. So calculating the standard deviation of those means, which is what the standard error of the mean represents, is not an option. It is possible, however, to determine the value of the standard error of the mean without having to calculate a standard deviation for an indeterminate number of scores. If a researcher does not have σM (the standard error of the mean is . . .) but does know the population standard deviation (σ), σM can be found as follows:

Formula 4.2

$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$


where

σM = the standard error of the mean

σ = the population standard deviation

N = the number in the group

So for a group of 100 with σ = 15, σM is

$$\sigma_M = \frac{\sigma}{\sqrt{N}} = \frac{15}{\sqrt{100}} = \frac{15}{10} = 1.5$$

This approach affords only a partial solution to the standard error of the mean, however, because it still requires at least σ.
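As a minimal sketch of Formula 4.2, the function below computes the standard error of the mean from σ and the group size. The function name `standard_error` is illustrative, and the group sizes of 25 and 400 are added assumptions that show how σM shrinks as the group grows; only the case of 100 comes from the worked example.

```python
import math

def standard_error(sigma, n):
    """Formula 4.2: the population standard deviation divided by the square root of N."""
    return sigma / math.sqrt(n)

for n in (25, 100, 400):
    print(n, standard_error(15, n))   # 3.0, 1.5, 0.75
```

Quadrupling the group size halves the standard error, because N enters the formula under a square root.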

Chapter 5 will explain a way around the problem of determining σM, but in the meantime, consider the following example. A marriage and family counselor has access to some national data on the frequency of negative verbal comments exchanged between couples in troubled marriages. The counselor finds the following:

• Couples in troubled marriages tend to have 11 negative exchanges per week, with a standard deviation of 4.755.

• A study of 45 couples who have filed for divorce in the area where the counselor has her practice reveals that the mean number of negative comments per week is 12.865.

• Given the national data, the counselor wants to know the probability that a randomly selected group of couples from that population will have as many negative exchanges as the counselor’s clients, or more.

Although the counselor’s question is about groups rather than individuals, the problem is much like a z score problem. The counselor knows the following: µ and (because µM = µ) µM = 11.0, σ = 4.755, N = 45, and M = 12.865.

The standard error of the mean is

$$\sigma_M = \frac{\sigma}{\sqrt{N}} = \frac{4.755}{\sqrt{45}} = 0.709$$

And z is

$$z = \frac{M - \mu_M}{\sigma_M} = \frac{12.865 - 11.0}{0.709} = 2.630$$

Comparing M to µM indicates that the counselor’s group has a higher number of negative verbal exchanges per week than the number nationally among couples with troubled marriages: 12.865 is a higher value than 11.0. In the following section, we will discuss what else can be determined from the analysis.
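The two steps of the counselor's calculation (Formula 4.2 followed by Formula 4.1) can be combined in a few lines. This is a sketch rather than a prescribed procedure; the function name `z_test` and its parameter names are illustrative.

```python
import math

def z_test(sample_mean, mu_M, sigma, n):
    """Formula 4.1, with the standard error of the mean taken from Formula 4.2."""
    sigma_M = sigma / math.sqrt(n)
    return (sample_mean - mu_M) / sigma_M

z = z_test(sample_mean=12.865, mu_M=11.0, sigma=4.755, n=45)
print(round(z, 2))   # 2.63
```

Because the code does not round σM to 0.709 before dividing, its unrounded result differs very slightly from the hand calculation's 2.630, but both round to 2.63.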


Note some important differences between this z and those calculated in Chapter 3.

• The difference between the mean of the population (µM = 11.0) and the sample mean (M = 12.865) is really quite modest, but the z value (z = 2.630) is comparatively extreme. Recall that z = ±2 includes approximately 95% of the distribution, and z = 2.630 is substantially beyond that.

• The reason for the rather large value of z is the quite small standard error of the mean, 0.709. That value reminds us that variability in populations based on groups is small compared to variability based on individual scores, and it does not take much of a difference between the sample mean (M) and the mean of the distribution of sample means (µM) to produce an extreme value of z.

Interpreting the Value from the z Test

The result from the marriage and family counselor’s group in the previous section is a value of z just like those that were calculated in Chapter 3, except that the value indicates how much a sample mean (M) differs from the mean of a population of samples (µM), instead of how an individual (x) differs from either a sample mean (M) or a population mean (µ). The Table B.1 value indicates that 0.4957 out of 0.5 occurs between a value for z = 2.63 and the mean of the distribution. So among the population of groups of couples with troubled marriages, 49.57% will have negative verbal exchanges somewhere between the level of this group (12.865 per week) and the mean of the national population, 11.0 per week. But the question concerns the probability that a group of clients selected at random would have 12.865 negative comments per week, or more. Because 50% of the distribution lies above the mean and 49.57% lies between the mean and 12.865, just 0.43% (50% − 49.57%) will have 12.865 negative comments per week or more. Stated as a probability, p = 0.0043 that a group of individuals in troubled marriages will have 12.865 negative exchanges per week or more.

Figure 4.3 depicts this result.

Figure 4.3: The probability of selecting a sample with M = 12.865 or higher from a population with µM = 11.0

The probability of selecting a sample with M = 12.865 or higher is indicated by determining the z equivalent of a sample with M = 12.865 and then determining the proportion of the distribution at that point and higher in the population. The proportion is indicated in red.

[Figure: normal curve; x-axis: z value (−3 to +3), with z = 2.63 marked; areas labeled p = 0.5 (below the mean), p = 0.4957 (between the mean and z = 2.63), and p = 0.0043 (above z = 2.63)]

Another z Test

To evaluate the impact of group therapy on a group of juvenile offenders, a psychologist constructs a study in which the level of social alienation among juvenile offenders attending court-mandated counseling sessions is compared to the level of social alienation measured in a national sample. For a group of 15 juveniles who are the psychologist’s clients, social alienation (SA) after 6 months of counseling has M = 13.554. Nationally, SA is μ = 14.500 with σ = 2.734. What percentage of all randomly selected groups of juvenile offenders will have mean levels of SA 13.554 or lower? The psychologist knows that µ (and, therefore, µM) = 14.500, σ = 2.734, N = 15, and M = 13.554. First, calculate the standard error of the mean:

$$\sigma_M = \frac{\sigma}{\sqrt{N}} = \frac{2.734}{\sqrt{15}} = 0.706$$

Then determine the value of z:

$$z = \frac{M - \mu_M}{\sigma_M} = \frac{13.554 - 14.500}{0.706} = -1.340$$

The table value for z = −1.340 is 0.4099.

The researcher wants to know what percentage of all juvenile offender groups will have mean SA scores 13.554 or lower. Since the national mean SA for juvenile offenders is 14.500, we know that 50% of all groups will be 14.500 or lower. The proportion with 13.554 or lower is 0.5 minus the table value for z = −1.340: 0.5 − 0.4099 = 0.0901. Multiplying that proportion by 100 will indicate the percentage at or below that point: 0.0901 × 100 = 9.01%.
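The table lookups in both z test examples can be reproduced with the standard normal distribution in Python's `statistics` module (`NormalDist`, available in Python 3.8 and later). This is offered as a cross-check on Table B.1, not as the chapter's method; the function names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def proportion_at_or_below(z):
    """Proportion of the distribution of sample means at or below z."""
    return standard_normal.cdf(z)

def proportion_at_or_above(z):
    """Proportion of the distribution of sample means at or above z."""
    return 1 - standard_normal.cdf(z)

p_juveniles = proportion_at_or_below(-1.340)  # juvenile offender example
p_couples = proportion_at_or_above(2.63)      # marriage counselor example

print(round(p_juveniles, 4))  # 0.0901
print(round(p_couples, 4))    # 0.0043
```

The function returns the whole area below a point directly, so the "0.5 minus the table value" step used with Table B.1 is not needed here.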

It looks as though the psychologist’s therapy sessions are reasonably effective. This group manifests a level of social alienation considerably lower than what is represented in the national population of juvenile offenders. Is the result just a chance outcome? How could the researcher know?

4.3 Statistical Significance

Like the z score problems in Chapter 3, the z test is a ratio of the difference (M − μM in the numerator) compared to data variability (σM in the denominator). A large ratio indicates that the score (in the z score problem) or the sample mean (in the case of the z test) is quite distant from the mean to which it is compared.

With increasing values of z, is there a point at which the sample mean (M) becomes so differ- ent from the mean of the distribution of sample means (μM) that a researcher should conclude that the sample mean is more characteristic of some distribution other than the one to which it is compared? In the z test, when the sample is more characteristic of some other population rather than the one to which it is compared, it is statistically significant. To put it another way, a statistically significant result is one that is extreme enough—sufficiently distant from that to which it is compared—that it is unlikely to have occurred by chance.

In the first z test problem, we proceeded as though the sample of those who had filed for divorce was a subgroup of all couples with troubled marriages. What if the sample is actually more characteristic of some other distribution, say, a population of couples for whom divorce is imminent? Can large values of z reflect the fact that the sample actually represents a population different from the one to which it was compared?


Consider another example before we answer this question. Those in a college honors program are probably adults. If researchers are interested in studying intelligence, would it be reasonable to expect that the members of this group represent what is characteristic of all adults? From the standpoint of age (and in the absence of child prodigies), those honors students are probably all adults, but in terms of intelligence, they probably are not typical. Perhaps they are more representative of the population of intellectually gifted adults than of adults in general.

The individuals in every sample belong to many different populations. The couples on the verge of divorce belong to

• the population of married people;

• the population of adults;

• the population of adults in the particular state;

• the population of adults in the particular county;

• the population of couples with troubled marriages, and so on.

One of the questions the z test helps answer is whether a particular sample is most characteristic of the population to which it is compared, or whether the sample is more like some other population. The magnitude of the z value is the key to the answer.

Statistical Significance and Probability

In the case of the z test, an outcome is statistically significant when it meets these conditions:

• It is so unlike the population to which it is compared that statistically, at least, it represents some other population.

• The random selection of a sample from the population with the particular value of μM would almost always result in a less extreme difference between M and µM.

So, at what point is an outcome nonrandom? Fisher (1925), who created the term statistically significant, made the answer a matter of probability. If the probability that an outcome (in our case, the value of z) occurred by chance is p = 0.05 or less, the outcome is probably not random; it is statistically significant.

Although p = 0.05 is probably the most common, other probability levels have also been used to indicate statistical significance. Reviewing journal articles indicates statistical testing done at p = 0.01, p = 0.001, and occasionally, even p = 0.1. It is up to the person doing the analysis to state the level chosen to indicate statistical significance (before conducting the test, by the way). Because the z test and the z score table let us calculate the probability of an occurrence (in addition to the percentage of the population above a point, below a point, and between points), we can also use the table to determine whether an outcome is statistically significant. In the first z test we completed, we compared the mean number of negative verbal exchanges in a sample of couples on the verge of divorce to the mean level of negative exchanges among those identified as the population of couples with “troubled” marriages and found that z = 2.630. The table value indicates that the probability of randomly selecting a sample of couples that would have M = 12.865 or more negative verbal exchanges per week was p = 0.0043. At less than p = 0.05, that outcome is unlikely to have occurred by chance. It is statistically significant.


The second z test dealt with how the group of juveniles in therapy compared to a national population of juvenile offenders. For that problem, z = −1.340, and the table value for that z was 0.4099. We determined that the probability of a group scoring M = 13.554 or lower was p = 0.0901 (shown in Figure 4.4).
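The two table lookups described so far can be reproduced without Table B.1. The following is a minimal Python sketch, assuming the standard library's `statistics.NormalDist` as a stand-in for the table; the function name `tail_probability` is a hypothetical helper, not something from the chapter. Like the passage, it compares the area in one tail with 0.05.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def tail_probability(z: float) -> float:
    """Probability of a result at least as extreme as z, in the direction of z."""
    if z >= 0:
        return 1.0 - STANDARD_NORMAL.cdf(z)  # area above a positive z
    return STANDARD_NORMAL.cdf(z)            # area below a negative z

p_couples = tail_probability(2.630)     # couples on the verge of divorce
p_juveniles = tail_probability(-1.340)  # juveniles in group therapy

# Print each probability and whether it meets the p = 0.05 criterion
print(round(p_couples, 4), p_couples <= 0.05)
print(round(p_juveniles, 4), p_juveniles <= 0.05)
```

The first probability falls below 0.05 and the second does not, which matches the conclusions drawn from the table values.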

Figure 4.4: The probability of selecting a sample with social alienation scores of M = 13.554 or lower from a population with mean SA of µM = 14.5

Visual representation of the probability of selecting a random sample of individuals with social alienation scores of 13.554 or lower, when the population mean is 14.5.

[Figure: normal curve over a z value axis running from −3 to +3. Labels as printed: z = 1.340, p = 0.0901, p = 0.4090, p = 0.5.]

Try It!: #2 What does the term statistically significant mean?

Determining Significance Without the Table

Remember that ±z = 1.0 includes about 68% of the z distribution, so the probability of randomly selecting an outcome that occurs in the ±z = 1.0 area is p = 0.68. Nothing in that region is going to be statistically significant because those z values indicate results that are characteristic of the distribution as a whole. The uncharacteristic events are the significant ones, and Fisher’s standard of p = 0.05 indicates that the key is a z value that excludes only the most extreme 5% of the distribution.

Recall that normal distributions are symmetrical. That 5% exclusion means that the most extreme 2.5% of outcomes in the lower tail and the most extreme 2.5% of outcomes in the upper tail are the regions that include statistically significant outcomes. Because Table B.1 provides proportions for only the upper half of the distribution, the z value, which includes all but the extreme 2.5% of outcomes, will be the point at which results become statistically significant. If 2.5% needs to be the percentage excluded, 47.5% is the percentage included. As a proportion, 47.5% is expressed as 0.475.

• From Table B.1, find the z value which includes 0.475 from that point to the mean of the distribution.

• Because z = 1.96 includes 0.475 of the distribution, ± that value will include 0.95 of the distribution (2 × 0.475 = 0.95).

• Any time a z test produces a z of ±1.96 or beyond, the result is statistically significant at p = 0.05.
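The lookup in the steps above runs the table in reverse: start from a proportion and find the z that bounds it. A short Python sketch of the same idea follows, assuming `statistics.NormalDist.inv_cdf` in place of Table B.1; `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

def critical_z(alpha: float) -> float:
    """Two-tailed critical value: the z that leaves alpha / 2 in each tail."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

# Common significance levels mentioned in the chapter
for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"p = {alpha}: significant when z reaches ±{critical_z(alpha):.3f}")
```

For p = 0.05 the function returns the familiar 1.96. For p = 0.01 it returns about 2.576, which tables usually round to 2.58.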

For example, if the registrar at a university had a group of students applying for admission to a graduate program, and the admissions test scores for that group resulted in a value of, say, z = 1.98 compared to all applicants, the registrar would know straight away that those students have scores significantly greater than those to whom they were compared. Such a difference is not likely to be an artifact of sampling variability.


Apply It! Confidence in the Claim

A parent is looking at private high schools for his child. A particular high school claims that last year, its students scored above average on the math and verbal sections of the SAT. The parent, who knows something about statistical analysis, decides to test this claim.

The parent finds the nationwide results for last year’s SAT scores. The mean math SAT score was 500, with a standard deviation of 100. The mean verbal score was also 500, with a standard deviation of 100. The parent asks to see the high school’s study. According to the study, the high school looked at SAT scores from a random sample of 40 students for that same year. The mean math score was 515 and the mean verbal score was 510.

The parent intends to test whether the high school scores represent a significantly different population from the national scores. The z test will provide an answer. If the value of z could occur by chance with a probability of p = 0.05 or less, the parent will view the scores from the particular high school as probably not a random outcome, that is, not merely an artifact of sampling variability. Since the math scores were the most different from the population scores, the parent examines them first.

Math Scores

µ, and therefore, µM = 500
σ = 100
N = 40
M = 515

Calculate the standard error of the mean:

$$\sigma_M = \frac{\sigma}{\sqrt{N}} = \frac{100}{\sqrt{40}} = 15.811$$

Then determine the value of z:

$$z = \frac{M - \mu_M}{\sigma_M} = \frac{515 - 500}{15.811} = 0.949$$

The table value for z = 0.949 is 0.3289.

The probability of a sample mean of 515 or higher is 0.5 − 0.3289 = 0.1711, which is substantially greater than the 0.05 that indicates a statistically significant result. These particular high-schoolers did score higher on the SAT-math than the national population (the difference between the sample and population means answers that question), but they did not score significantly higher. This much difference could have occurred by chance. The parent’s best explanation for the difference is sampling variability rather than any systematic advantage of this high school’s students over students nationally. Since the math scores manifested the greater difference, the parent finds little use in calculating the outcome for the verbal scores. There too, the result will not be statistically significant.
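The parent’s arithmetic can be checked with a short Python sketch. It assumes `statistics.NormalDist` in place of the z table, and `z_test` is a hypothetical helper name. Because the code does not round z before the lookup, the math probability comes out near 0.171 rather than exactly the table-based 0.1711.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (standard error, z, upper-tail probability) for a one-sample z test."""
    standard_error = pop_sd / sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_upper = 1.0 - NormalDist().cdf(z)
    return standard_error, z, p_upper

# National SAT norms: mean 500, standard deviation 100; school sample of 40
math_se, math_z, math_p = z_test(515, 500, 100, 40)
verbal_se, verbal_z, verbal_p = z_test(510, 500, 100, 40)

print(f"math:   z = {math_z:.3f}, p = {math_p:.4f}")
print(f"verbal: z = {verbal_z:.3f}, p = {verbal_p:.4f}")
```

The verbal z is smaller than the math z, so its probability is larger still, which confirms the parent’s decision not to bother with the second calculation.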

Jack Hollingsworth/Thinkstock

Apply It! boxes written by Shawn Murphy


Another View of Significance

The z = ±1.96 indicator for statistical significance assumes that the most extreme 5% of outcomes are statistically significant. There are alternatives. A p = 0.01 standard excludes the most extreme 1% of outcomes and p = 0.001 excludes the most extreme 0.1%. All of these standards for statistical significance are somewhat arbitrary. Fisher selected a point on the distribution and said essentially, “Anything beyond this level of probability is unlikely to have occurred by chance.” Not everyone agrees such a standard must exist. Dr. William Smith, who chaired the department of statistics at Texas A&M University during the author’s student days, took the position that what is “significant” depends upon circumstances. His approach was to calculate the probability that an event could occur by chance, and then let consumers make their own decision about whether such an outcome is significant.

Dr. Smith is in good company. Rather than indicating that a result is or is not statistically significant, many of the statistical packages that professionals use today simply determine the probability that an event could have occurred by chance, and leave it to the person doing the analysis to interpret the outcome. Fisher’s position fills the need to have a standard for statistical significance in the absence of some other criterion, but the researcher may at times have a better sense than p = 0.05 of what is worthy of note.

Sampling Error as an Explanation of Difference

Virtually every z test will have some difference between M and µM, which means that z will have some value other than 0. When the differences fall short of statistical significance (|z| < 1.96), how can the researcher explain them? The answer could be sampling error. Because no sample can exactly emulate the population, most samples in the distribution of sample means will have a sample mean different from the population mean. In the SAT example, those from the particular high school the parent was examining scored better than the national population, but the difference was not large enough to be statistically significant. Such a difference might reflect the fact that those selected for the sample group just happened to be generally above the mean of the distribution. In the example concerning the number of negative exchanges in marriage, some of the difference may also be due to sampling error, but that factor alone is insufficient to explain the difference between M and µM.

More Confidence in the Sample

The foregoing discussion underscores the importance of having confidence in the sample from the start. Even though samples can never mirror populations exactly, large, randomly selected samples minimize sampling error. However, it can be difficult to define large.

Consider the following example. An instructor wants to gauge the impact that a service learning course has on students’ attitudes toward community service. The university research office has surveyed students’ interest in service learning and from the scores on the instrument has determined a standard deviation of 8.294. While recognizing that the sample will not have the same characteristics as the population, the instructor decides that the results will still be very helpful if the variability in the sample data differs from university-wide data by no more than 2 points. The instructor wishes to be 0.95 confident of the result.


Those two criteria, the amount of variation from the population and the required level of confidence, provide the parameters for determining minimum sample size (Sprinthall, 2000). Sprinthall’s formula follows:

Formula 4.3

$$n = \left( \frac{(z)(\sigma)}{\text{variation from } \sigma} \right)^2$$

where

n = the required sample size

z = the value of z that corresponds to how certain of the result a researcher wishes to be. Because ±z = 1.96 includes the middle 95% of the distribution, using that value in the formula provides p = 0.95 that the sample emulates the population. If 0.99 certainty is required, z = 2.58.

σ = the standard deviation of the population. If the population standard deviation is unavailable, a sample standard deviation (s) can be substituted, although the estimate will lose some precision.

variation from σ = the amount a researcher is willing to allow the sample standard deviation, s, to vary from the population standard deviation, σ.

Using this formula in our example results in σ = 8.294 and z = 1.96, so

$$n = \left( \frac{(1.96)(8.294)}{2.0} \right)^2 = \text{approximately } 66$$

With p = 0.95, which is the level of confidence required, a random sample of at least 66 people will provide a sample whose characteristics are within 2 points of the population standard deviation of students’ interest in service learning.

Changing the conditions can dramatically affect the required sample size. If the instructor needs to be within one point of the population standard deviation and desires p = 0.99, note the impact on the result:

$$n = \left( \frac{(2.58)(8.294)}{1.0} \right)^2 = \text{approximately } 458 \text{ people}$$

The example indicates the impact on sample size of requiring both less variation from the population and more certainty. It indicates the constant tension between how certain and how precise we need to be on the one hand and the size of the needed sample on the other, and the results can be dramatic. Increasing the level of certainty or requiring less error both require larger sample sizes, but at least Formula 4.3 can help strike a balance between the two. Samples that are very large can be time-consuming and expensive for researchers. Samples that are very small may not reflect the essential characteristics of the population, making generalizing the results a problem.
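Formula 4.3 is simple enough to express as a small function, which makes it easy to try other combinations of certainty and allowed variation. The sketch below is illustrative only; the function and parameter names (`required_sample_size`, `tolerance`) are assumptions, not notation from Sprinthall.

```python
def required_sample_size(z: float, sigma: float, tolerance: float) -> float:
    """Formula 4.3: n = ((z * sigma) / tolerance) ** 2, returned unrounded."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    return ((z * sigma) / tolerance) ** 2

# Service learning example: sigma = 8.294
n_95 = required_sample_size(z=1.96, sigma=8.294, tolerance=2.0)
n_99 = required_sample_size(z=2.58, sigma=8.294, tolerance=1.0)

print(round(n_95), round(n_99))
```

The function returns the unrounded value so that the caller decides how to round. The text rounds to the nearest whole number; rounding up with `math.ceil` would be the more conservative choice.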

Try It!: #3 If a result is not statistically significant, how is the difference between M and µM explained?


Decision Errors

Statistical significance is based on the probability that an event could occur by chance, and interpreting outcomes based on probabilities carries a risk. Consider the following possibilities related to the chapter’s examples:

• Is it not possible, however unlikely, that a researcher could accidentally sample the couples in the distribution who have the most negative exchanges? Maybe they do not belong to a distinct population at all. Maybe these couples are just from the most extreme portion of the population of all married couples.

• On the other hand, is it not also possible that those juvenile offenders in the group-therapy study belonged to a population of juvenile offenders with significantly lower social alienation scores, but because they actually had extremely high SA scores before the therapy, their improvement brought them to just slightly lower, rather than significantly lower than the national mean?

Because statistical decisions are based on probabilities rather than certainties, any of them can result in a decision error. There are two types of decision errors, and they are mutually exclusive: a given analysis risks one or the other, but never both.

Type I Errors

Type I errors in statistical testing occur when an outcome is initially determined to be statistically significant, but further research and testing indicate that it is not. In other words, the first, errant conclusion is an anomaly that fails to hold up under further scrutiny. The probability of this error is defined by the level at which the testing occurs. If the criterion for statistical significance is p = 0.05, and the result is deemed statistically significant, the probability of a type I error is 0.05. Because type I error is also called alpha (α) error, the significance level of a test is sometimes noted in terms of the risk of alpha error, α = 0.05, rather than p = 0.05. It means the same thing, except that the author has chosen to indicate the probability of type I error rather than referring directly to the criterion for statistical significance. At p = 0.05, or α = 0.05, for every 100 times someone concludes that a result is statistically significant, a type I error will occur an average of 5 times.

Although those most extreme outcomes are the least likely to occur, that most extreme 5% of the distribution is still part of the distribution in question. Outcomes in that area of the distribution hold the greatest potential for a type I error. Two other items may be noted regarding type I error:

• The only time a type I error is possible is when a result is deemed statistically significant. If there is no statistically significant outcome, the probability of a type I (alpha) error is zero.

• In a particular significant finding, a researcher has no way to know whether a type I error has occurred. Gathering new data and repeating the analysis is the only way to check, which is why replication studies are so important.
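One way to see what the 0.05 criterion controls is to simulate it: draw many samples that genuinely belong to the comparison population and count how often the z test still reaches ±1.96. The sketch below is an illustration under stated assumptions. It borrows the SAT norms (mean 500, standard deviation 100, samples of 40) from the earlier example, assumes normally distributed scores, and uses an arbitrary trial count and seed; `type_one_error_rate` is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import fmean

def type_one_error_rate(pop_mean, pop_sd, n, trials, critical_z=1.96, seed=0):
    """Share of samples drawn from the comparison population itself
    whose z test nonetheless comes out statistically significant."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    standard_error = pop_sd / sqrt(n)
    false_alarms = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        z = (fmean(sample) - pop_mean) / standard_error
        if abs(z) >= critical_z:
            false_alarms += 1
    return false_alarms / trials

rate = type_one_error_rate(pop_mean=500, pop_sd=100, n=40, trials=10_000)
print(f"{rate:.3f}")
```

Every significant result in this simulation is a type I error by construction, because no sample came from a different population. The proportion should land close to 0.05, and tightening the criterion to 2.58 should bring it close to 0.01.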

Type II Errors

In a z test, type II errors occur when the sample is actually characteristic of some population other than the distribution of sample means to which it was compared, but the statistical testing (|z| < 1.96) suggests no significant difference. This type of decision error is also called a beta (β) error.

Type II errors would present little problem if the populations involved were completely separate, but they often have important similarities. The population of all high school students taking the SAT probably bears a number of similarities to the population of high-achieving students. The more the populations overlap, the more likely decision errors become.

Although the level at which the statistical test is conducted (often p = 0.05) defines the likelihood of a type I error, the probability of a type II error is more elusive, and in fact we never know the exact probability of committing this error, although some statistical tests are more prone to it than others. Note also:

• The only time a type II error can occur is when a result is determined not to be statistically significant.

• In a particular analysis where the result appears to be not significant, a researcher has no way to know whether a decision has resulted in a type II error.

Decision Errors and Power

Is one error more damaging than the other? Do analysts have a preference for one type of error? The answer, of course, depends upon circumstances and especially on the impact that a decision error has on the people involved.

Perhaps a committee is evaluating certification programs for mental health professionals, and it deems the program at University A to be significantly better than the competing programs. If the result is that the graduates from University A receive preferential hiring, but the difference among programs is not statistically significant after all, the study has made a type I error.

On the other hand, perhaps a client has a serious illness and comes to a health professional for a diagnosis. If the health professional fails to recognize that the client is not healthy and misses the condition that is affecting the client’s well-being, a type II error has occurred.

In short, which kind of error is more serious depends upon circumstances, but statisticians may have their own preference. Power in statistical testing is described in terms of the likelihood of a type II error. The most powerful tests are associated with the fewest beta errors. The power of a statistical test is symbolically indicated as 1 − β.
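In practice β is unknown because the mean of the other population is unknown. If we are willing to assume a value for that mean, however, 1 − β can be worked out for a z test. The sketch below is a what-if calculation and not part of the chapter’s method: it supposes, purely for illustration, that the high school in the SAT example really did come from a population with a mean of 515. `z_test_power` and its parameters are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(true_mean, pop_mean, pop_sd, n, critical_z=1.96):
    """Power (1 - beta) of a two-tailed z test IF the sample's real
    population mean were true_mean. true_mean is an assumed value."""
    standard_error = pop_sd / sqrt(n)
    shift = (true_mean - pop_mean) / standard_error
    normal = NormalDist()
    # Probability that the observed z lands beyond either critical value
    upper = 1.0 - normal.cdf(critical_z - shift)
    lower = normal.cdf(-critical_z - shift)
    return upper + lower

power_n40 = z_test_power(true_mean=515, pop_mean=500, pop_sd=100, n=40)
power_n400 = z_test_power(true_mean=515, pop_mean=500, pop_sd=100, n=400)

print(f"n = 40:  power = {power_n40:.3f}, beta = {1 - power_n40:.3f}")
print(f"n = 400: power = {power_n400:.3f}, beta = {1 - power_n400:.3f}")
```

Under this assumption, a sample of 40 would detect the 15-point difference only a small fraction of the time, so a type II error would be the likely outcome. A much larger sample raises the power considerably.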

Which Type of Errors to Use

Although type I and type II errors cannot both occur in the same analysis, the probability of one affects the likelihood of the other. Mental-health professionals ordinarily must pass some sort of licensing requirement, perhaps in the form of a test. Like most professional licensing tests, it is most likely a good, but certainly not perfect, indicator of who is competent. Figure 4.5 shows the possible decision errors.

Try It!: #4 An analysis results in a finding that is statistically significant at p = 0.05. What is the probability of a type II error?


If type I error is thought to be the greater problem, the licensing body might simply raise the required test score. This tactic would probably reduce the number of incompetent people who are licensed. The companion problem it creates, however, is excluding more professionals who actually are competent but who, because they do not test well, fail to demonstrate their competence on the required measure. That exclusion is a type II error. This inherent connection between the two kinds of decision errors is why someone has to make a decision about which is the more damaging.

In testing professionals like airline pilots and surgeons, the decision is straightforward. Usually it favors excluding some who are competent (therefore committing a type II error) rather than risk licensing some who are not competent (committing a type I error). The potential cost to the well-being of others is too great to do otherwise. In other circumstances, the greatest good is less clear.

4.4 The Confidence Interval

When the results from a z test are statistically significant, the sample best represents a population other than the one to which it was compared. In these instances, the mean of the sample (M) is in fact an estimate of the value of that other population mean. Because it is a discrete value, M is called a point estimate of the population mean µM. For a variety of reasons related to the nature of the sample and the way it was selected, M might not be a very accurate point estimate of that other population mean, however. The confidence interval (CI) provides a way to determine how precisely M estimates µM.

Figure 4.5: Types of decision errors by outcome

The flowchart depicts the path of each type of decision error.

[Flowchart: Decision Errors branches into two outcomes. Type I Error: test results suggest competence, but the individual actually lacks the skills/knowledge. Type II Error: test results suggest not competent, but the individual actually is competent.]

When a z value is significant, the result indicates that the sample represents a population other than the one to which it was compared. The confidence interval allows us to estimate the mean of that other population, or at least a range of values within which µM is likely to occur. If the z test is not significant, we have no need for the confidence interval because our analysis indicates that the sample belongs to the population to which it was compared. Calculating the confidence interval requires values from the z test. The formula is

Formula 4.4

$$CI = \pm z(\sigma_M) + M$$

where

CI = the interval within which the population mean is expected to occur

z = the table value that reflects the level at which the z testing was conducted. For p = 0.05, z = 1.96.

σM = the value of the standard error of the mean from the z test

M = the value of the sample mean

For the study of couples engaged in negative verbal exchanges, the result was significant, indicating that the sample probably belongs to some distribution of sample means other than the one to which it was compared. The confidence interval will establish a range of values within which the mean for that other population probably occurs.

$$
\begin{aligned}
CI &= \pm z(\sigma_M) + M \\
   &= \pm 1.96(0.709) + 12.865 \\
   &= \pm 1.390 + 12.865 = 11.475,\ 14.255
\end{aligned}
$$

With a probability of 0.95, the population which the sample represents has a mean (µM) value somewhere between 11.475 and 14.255.

Note that the level of probability is one of the factors affecting the size of the confidence interval. If we wish to be more certain of capturing the population mean, we can use a 0.99 confidence interval instead of 0.95, and substitute z = 2.58 for z = 1.96 in the formula. Recalculating the confidence interval for p = 0.99,

$$
\begin{aligned}
CI &= \pm z(\sigma_M) + M \\
   &= \pm 2.58(0.709) + 12.865 \\
   &= \pm 1.829 + 12.865 = 11.036,\ 14.694
\end{aligned}
$$

A greater level of certainty of capturing the true mean of the distribution represented by the sample requires a wider confidence interval.

The other factor that affects the width of the confidence interval is the standard error of the mean, σM, which measures the amount of variability in the distribution of sample means. More data variability translates into a larger standard error of the mean, which makes the confidence interval larger.
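Both factors, the level of confidence and the standard error of the mean, appear directly in Formula 4.4, so a short function makes their effect easy to see. The sketch below assumes `statistics.NormalDist.inv_cdf` to obtain z from the confidence level instead of a table; `confidence_interval` is a hypothetical helper name. Because it uses the unrounded z of about 2.576 rather than 2.58, its 0.99 limits differ from the hand calculation in the third decimal place.

```python
from statistics import NormalDist

def confidence_interval(sample_mean, standard_error, confidence=0.95):
    """Formula 4.4: M plus and minus z times the standard error of the mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 0.95
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Couples example: M = 12.865, standard error of the mean = 0.709
low_95, high_95 = confidence_interval(12.865, 0.709, confidence=0.95)
low_99, high_99 = confidence_interval(12.865, 0.709, confidence=0.99)

print(f"0.95 CI: {low_95:.3f} to {high_95:.3f}")
print(f"0.99 CI: {low_99:.3f} to {high_99:.3f}")
```

Passing a larger standard error to the same function widens the interval in the same way that raising the confidence level does.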

Try It!: #5 What does the confidence interval for z determine?


Researchers have no need for a confidence interval unless the z test results are statistically significant. The reason can be illustrated by completing a confidence interval for the social alienation (SA) problem. Recall that σM = 0.706 and M = 13.554 for that example.

$$
\begin{aligned}
CI &= \pm z(\sigma_M) + M \\
   &= \pm 1.96(0.706) + 13.554 \\
   &= \pm 1.384 + 13.554 = 12.170,\ 14.938
\end{aligned}
$$

Note that this confidence interval includes within its range the value of the original population, 14.500. That is because with a nonsignificant z value, a researcher would conclude that the population that the sample represented is likely the same population to which it was compared. A nonsignificant z test value will always produce a confidence interval that includes the original population mean.
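The reason this always holds can be shown in one line of algebra, provided the confidence interval uses the same z criterion as the test (1.96 here). Rearranging the definition of z:

$$
|z| < 1.96
\iff \left| \frac{M - \mu_M}{\sigma_M} \right| < 1.96
\iff M - 1.96\,\sigma_M < \mu_M < M + 1.96\,\sigma_M
$$

The left side says the z test is not significant; the right side says the original population mean lies inside the confidence interval. For the social alienation problem, |−1.340| < 1.96, and correspondingly 12.170 < 14.500 < 14.938.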

Apply It! How Long is Too Long?

A psychologist working with the military has studied post-traumatic stress disorder (PTSD) and is convinced of the relationship between length of deployment and the incidence of PTSD. The currently used scale for PTSD has been normed on service personnel and yields μ = 54.375 with σ = 6.037 for all who have been deployed up to 60 days. Using data for a sample of 50 individuals who have all been deployed for at least 4 months, the psychologist has determined that M = 56.989. The psychologist wants to know whether this particular sample still represents the population of service personnel deployed up to 60 days. Anxious about a type I error, or what is sometimes called a false-positive error, the psychologist decides to test at α = 0.01.

µ, and therefore µM = 54.375
σ = 6.037
n = 50
The sample mean, M = 56.989

First, using σ and n, the psychologist must calculate the standard error of the mean:

$$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{6.037}{\sqrt{50}} = 0.854$$

Then to determine z:

$$z = \frac{M - \mu_M}{\sigma_M} = \frac{56.989 - 54.375}{0.854} = 3.061$$

Recall that the standard for statistical significance at α = 0.01 is an absolute value of z of 2.58. We say “absolute value” because at this point, the issue is not whether the sample manifests significantly more PTSD or significantly less; the question is: does the sample represent the population? Had the value of z been −3.061, the psychologist would draw the same conclusion about whether the sample represents the population to which it was compared: that is, that it does not. In the case of a negative z value for the sample compared to the population, the psychologist would have concluded that for some reason, this sample belongs to a population with significantly lower PTSD than the one to which it was actually compared. As it is, however, the sample better represents some population with a mean PTSD score significantly higher than the population to which the sample was compared.

As we noted earlier in the chapter, z values are ratios of the difference to variability. In the case of the z test, it is a ratio of the difference between sample and population means to the standard error of the mean, which is the relevant measure of data variability in the z test. When that ratio is 1.96 or larger, the result is significant at α = 0.05. When the ratio is 2.58 or larger, the result is significant at α = 0.01. Large ratios are the result either of larger differences between sample and population means, or of comparatively small measures of data variability. In turn, small measures of the standard error of the mean occur when data have little variability (small standard deviation values), when samples are large, or both. Although the difference between the means of the sample and the population does not appear to be very large, the result is statistically significant because of a relatively small standard error of the mean.
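The role of sample size in that ratio can be made concrete by holding the PTSD means and standard deviation fixed and varying only n. In the sketch below, only n = 50 comes from the example; the other sample sizes are hypothetical values chosen for illustration, and `z_for_sample_size` is an assumed helper name. The code does not round the standard error first, so its z for n = 50 differs from the hand-calculated 3.061 in the third decimal place.

```python
from math import sqrt

def z_for_sample_size(sample_mean, pop_mean, pop_sd, n):
    """z as the ratio of the mean difference to the standard error of the mean."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))

# Same 2.614-point difference at several sample sizes (only n = 50 is from the text)
z_by_n = {n: z_for_sample_size(56.989, 54.375, 6.037, n) for n in (10, 25, 50, 100)}

for n, z in z_by_n.items():
    print(f"n = {n:3d}: z = {z:.3f}, significant at 0.01: {abs(z) >= 2.58}")
```

With the smaller hypothetical samples the identical difference in means would not reach the α = 0.01 criterion, because the standard error of the mean is larger.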

Since the data indicate that this sample represents a population of service personnel with a mean measure of PTSD larger than the 54.375, the psychologist can calculate the confidence interval to estimate a range within which that new population mean is likely to fall. The appropriate value of z for the CI will be the same as the one used in the z test, 2.58.

$$
\begin{aligned}
CI &= \pm z(\sigma_M) + M \\
   &= \pm 2.58(0.854) + 56.989 \\
   &= \pm 2.203 + 56.989 = 54.786,\ 59.192
\end{aligned}
$$

With p = 0.99 confidence, the mean level of PTSD for those deployed to a combat zone for 4+ months is somewhere from 54.786 to 59.192. In terms of PTSD, these personnel are likely a different population than those deployed up to 2 months.

To solidify our understanding of the z test, statistical significance, and confidence intervals, consider one more example. The population mean (μ) for intelligence scores is 100, with a standard deviation (σ) of 15. A group of recruits who hope to work as intelligence analysts in a security agency have the following intelligence scores: 105, 100, 110, 120, 105, 110, 115, and 125. Are they characteristic of the population as a whole? If not, with 0.95 confidence, what is the mean of the population that they do represent? The intelligence scores for the 8 subjects are shown graphically in Figure 4.6.

The sample mean is 111.250.

Figure 4.6: Intelligence scores for recruit population

The bar graph depicts the intelligence scores for each member of a group of recruits seeking employment as intelligence analysts.

[Bar graph: recruits 1 through 8 on the horizontal axis; vertical axis from 0 to 140. Bar values: 105, 100, 110, 120, 105, 110, 115, 125.]


The standard error of the mean is calculated from

$$\sigma_M = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{8}} = 5.303$$

For the z test, we have

$$z = \frac{M - \mu_M}{\sigma_M} = \frac{111.250 - 100}{5.303} = 2.121$$

Depicted in Figure 4.7, this result is in the extreme upper tail of the distribution, beyond the middle 95%. By the usual standard, the group of eight recruits has intelligence scores that are statistically significant, compared to the population of all adults.

Since the conclusion is that the result was statistically significant, the only decision error possible is a type I error. It is possible that the sample does belong to the population of all adults, rather than some separate population with a higher mean intelligence, but the probability of such an occurrence is just p = 0.05.

Although in any significant finding there is a probability of type I error, note that it is comparatively low. The p = 0.05 suggests that, in circumstances such as this one, the finding of statistical significance will fail to hold up just one time in 20. The greater probability is that the sample belongs to a separate population. The confidence interval will help us bracket the mean of that other population:

CI.95 5 6z(σM) 1 M 5 (1/21.96 3 5.303) 1 111.250 5 100.856, 121.644

With 0.95 confidence, the mean of that alternate population is somewhere between 100.856 and 121.644. The lower bound of that confidence interval suggests that the alternate popula- tion might have a mean quite similar to the population of all adults. The fact that the lower value is quite similar to the mean of the original population reflects the fact that the z value was only modestly greater than the limit for significance, as well as the fact that the sample size was very small.
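The arithmetic in this example can be checked with a few lines of Python. The following is only a minimal sketch and not part of the original example; the variable names are illustrative assumptions, and 1.96 is the two-tailed critical value for p = 0.05 used throughout the chapter.

```python
from math import sqrt

# Intelligence scores for the eight recruits (from the example above)
scores = [105, 100, 110, 120, 105, 110, 115, 125]

mu = 100        # population mean
sigma = 15      # population standard deviation
z_crit = 1.96   # critical z bounding the middle 95% of the distribution

n = len(scores)
sample_mean = sum(scores) / n          # M
std_error = sigma / sqrt(n)            # sigma_M = sigma / sqrt(n)
z = (sample_mean - mu) / std_error     # z = (M - mu_M) / sigma_M

# 0.95 confidence interval: M plus or minus z_crit * sigma_M
ci_lower = sample_mean - z_crit * std_error
ci_upper = sample_mean + z_crit * std_error

# The result is significant when z falls outside the middle 95%
significant = abs(z) >= z_crit

print(f"M = {sample_mean:.3f}, sigma_M = {std_error:.3f}, z = {z:.3f}")
print(f"CI.95 = {ci_lower:.3f}, {ci_upper:.3f}; significant: {significant}")
```

Changing `scores`, `mu`, or `sigma` lets the same sketch be reused for other one-sample z tests, provided the population standard deviation is known.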

Figure 4.7: Intelligence z scores compared to the distribution curve

A frequency distribution of the intelligence scores shows that the recruits’ scores are higher than the population of all adults, a statistically significant result that may represent a type I error.

[Normal curve with z scores from -3 to 3 on the horizontal axis; the critical values z = -1.96 and z = 1.96 and the calculated value z = 2.121 are marked.]

Apply It! boxes written by Shawn Murphy


4.5 The z Test Using Excel

Although Excel’s Data Analysis includes an option for a z test, it is a different z test than the one we perform here. To complete our test, we must program some formulas. We will use the following sample data.

A social worker’s caseload includes eight people with the following annual incomes (in thousands):

13.5, 18, 22.375, 25.240, 26, 29.331, 30, 30.

If all social workers’ clients have an average annual income of 19.500 and a standard deviation of 4.525, are this particular social worker’s clients significantly different? We will use Excel to perform the work as follows:

1. Enter the income data into a spreadsheet in cells A1–A8.
2. Enter the formula =average(A1:A8) in cell A9 to have Excel calculate the mean.
3. In cell A11, determine the standard error of the mean by dividing the population standard deviation by the square root of the number of scores in the sample. The commands in Excel are =4.525/sqrt(8).
4. Determine the z value in cell A13 by entering the commands =(A9-19.5)/A11. The part in parentheses is the numerator in the z ratio: M (cell A9) − µM.

Figure 4.8 shows a screenshot of the display just before you press Enter.

The result is z = 3.004. Testing at p = 0.05 (for which z = 1.96), these eight people have significantly different incomes than the population of all social workers’ clients.

Figure 4.8: Calculating a z test in Excel

Use Excel to perform a z test by entering the formulas identified.

Source: Microsoft Excel. Used with permission from Microsoft.
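For readers who want to cross-check the spreadsheet result outside Excel, the same steps can be sketched in Python. This is an illustrative assumption about how one might mirror the cells, not part of the Excel procedure itself; the variable names are hypothetical, and the comments indicate which cell each line corresponds to.

```python
from math import sqrt

# Annual incomes (in thousands) for the social worker's eight clients
incomes = [13.5, 18, 22.375, 25.240, 26, 29.331, 30, 30]

pop_mean = 19.5   # mean income for all social workers' clients
pop_sd = 4.525    # population standard deviation

sample_mean = sum(incomes) / len(incomes)    # cell A9:  =average(A1:A8)
std_error = pop_sd / sqrt(len(incomes))      # cell A11: =4.525/sqrt(8)
z = (sample_mean - pop_mean) / std_error     # cell A13: =(A9-19.5)/A11

print(f"M = {sample_mean:.3f}, sigma_M = {std_error:.3f}, z = {z:.3f}")
```

Because the calculated z exceeds 1.96, the sketch leads to the same decision as the spreadsheet.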


Summary and Resources

Chapter Summary

While the z test is not common in statistical testing, it is not unknown. Its particular value is as an introduction to formal statistical testing: although it is an uncomplicated test, it addresses many of the same issues that more advanced tests do. Generally speaking, behavioral researchers are much more interested in analyzing the performance of groups than of single individuals. We have many reasons to wonder whether this or that group truly represents the population to which it is being compared. The z test provides a mechanism for evaluating program quality (is this program as good as the rest?), therapeutic techniques (is this intervention as effective as others?), improvement strategies, and indeed any situation where we wish to compare one group for whom we have data to an identified population.

The z test is based on the distribution of sample means (Objective 1), a population of the means of samples rather than of individuals’ scores. The central limit theorem indicates that such a distribution will be normal even if the distribution of individual scores is not (Objective 2). The normality allows the use of the z table to analyze how groups compare to populations (Objectives 4 and 6). Because the sample data researchers analyze sometimes do not fit well with the population presumed to be the source, the z test provides a way to determine whether the sample belongs to some other population, an outcome related to the concept of statistical significance (Objective 5).
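The central limit theorem can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than anything from the chapter: it draws repeated samples from a strongly skewed (exponential) population, with the sample size, number of samples, and seed all chosen arbitrarily, and then checks whether the sample means behave approximately as a normal distribution with standard error σ/√n would.

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(4)       # fixed seed so the run is repeatable

n = 30               # size of each sample (arbitrary choice)
num_samples = 5000   # number of samples drawn (arbitrary choice)

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed
pop_mean, pop_sd = 1.0, 1.0

# Build the (approximate) distribution of sample means
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = sum(sample_means) / num_samples   # should be near pop_mean
sd_of_means = pstdev(sample_means)                # should be near sigma / sqrt(n)
expected_se = pop_sd / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal distribution this share is about 0.95
share_within = sum(
    abs(m - pop_mean) <= 1.96 * expected_se for m in sample_means
) / num_samples

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {expected_se:.3f})")
print(f"share within 1.96 standard errors: {share_within:.3f}")
```

A finite simulation only approximates the theoretical distribution of sample means, so the printed values should be close to, not identical with, the theoretical ones.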

When the sample is determined to represent some other population, the sample mean is a point estimate of the value of that other µM, but only an estimate. The confidence interval provides a range of values within which the mean of that other population will occur with a specified probability (Objective 6). In doing so, the confidence interval indicates the precision with which M estimates µM.

Inferential statistical analysis involves the risk of making an incorrect decision. Occasionally, results that appear significant in one test will not hold up when the study is repeated with new data. On the other hand, further analysis sometimes overturns a nonsignificant finding. These type I and type II errors, respectively, remind us that statistical decisions are based on probabilities rather than certainties (Objective 7).
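The idea that a significant result can arise by chance alone can be made concrete with a simulation. In the sketch below, which is an illustration under stated assumptions rather than a procedure from the chapter, every sample really is drawn from the comparison population (µ = 100, σ = 15, n = 8, echoing the recruit example), so any significant z is a type I error. The number of trials and the seed are arbitrary choices.

```python
import random
from math import sqrt

random.seed(1)    # fixed seed so the run is repeatable

mu, sigma = 100, 15   # population parameters, as in the intelligence example
n = 8                 # sample size
z_crit = 1.96         # critical value for testing at p = 0.05
trials = 20000        # number of simulated studies (arbitrary choice)

std_error = sigma / sqrt(n)

false_alarms = 0
for _ in range(trials):
    # Each sample truly comes from the population being tested against
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    z = (sum(sample) / n - mu) / std_error
    if abs(z) >= z_crit:
        false_alarms += 1     # significant by chance alone: a type I error

type_i_rate = false_alarms / trials
print(f"share of significant results when no real difference exists: {type_i_rate:.3f}")
```

The share of chance findings should land near 0.05, about one study in 20, which is the risk accepted by testing at p = 0.05.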

Small samples, no matter how carefully selected, cannot mirror all the relevant characteristics of complex populations, and populations involving people are invariably complex. For this reason, a procedure to determine the size of the sample needed to emulate the important characteristics of the population has some utility (Objective 3). Formula 4.3 meets that need.

Statistical significance is a very important concept in educational analysis. When new programs or strategies are instituted, we often look for ways to determine whether the program makes a difference. The z test helps answer some of these questions. Specifically, it indicates whether a result is likely to have occurred by chance, whether the outcome is random. As important as the z test is as an introduction, it has limitations in that it requires access to two parameters, µM and σM. Although a researcher can usually determine population means, the standard error of the mean sometimes is not accessible. The t tests Chapter 5 discusses provide a way around this difficulty.

This summary offers a good barometer of your grasp of Chapters 1–4. Although some of the material has probably been familiar, many of the ideas are likely new. If this review makes sense, that is excellent. If some areas seem difficult, take some time to go back to the relevant sections and review. Statistical analysis is incremental, as we have stressed before, so it is important to understand what has been presented before continuing. Working the examples below, repeatedly if needed, will help.

Key Terms

central limit theorem Holds that if a population is sampled an infinite number of times using sample size n and the mean of each sample determined, the multiple means (Ms) will take on the characteristics of a normal distribution whether or not the original population of individuals was normal.

confidence interval ($CI = \pm z(\sigma_M) + M$) Provides a way to determine how precisely M estimates µM.

decision errors Incorrect conclusions that can result from any statistical decision based on a probability. There are two types of decision errors: type I errors and type II errors.

distribution of sample means A population based on the means of all possible samples of equal size drawn from the particular population, rather than on the values of the individual measures which make up the population.

law of large numbers Mathematical principle that errors diminish as the number of data points increases.

power Refers to the likelihood that a test will detect statistical significance when it is present. Because error is often reduced in a factorial ANOVA, reducing the value of MSwith and making a significant F more likely, it is often a more powerful test than a one-way ANOVA.

random selection Selection of a sample from a population where every member has an equal probability of being selected.

sampling error Reflected in the degree to which the characteristics of the sample, such as the mean and standard deviation, vary from those of the population.

standard error of the mean Standard deviation of the sample means (σM).

statistically significant An outcome so unlike the population to which it is compared that it can be presumed to reflect some other population. Put another way, the value of M is distant enough from µM that it probably was not randomly selected from the distribution of sample means. If the probability that an outcome occurred by chance is p = 0.05 or less, the outcome is statistically significant.

systematic sampling error Occurs because the same mistake in selecting a sample of a population is made over and over.

type I errors, or alpha (α) errors Decision errors made when a result is judged to be statistically significant, but further research and testing would show that it is not.

type II errors, or beta (β) errors Decision errors that occur when the sample is characteristic of some population other than the distribution of sample means to which it was compared, but the statistical testing suggests no significant difference.

z test Indicates how distant a sample mean is from the mean of the distribution of sample means in units of the standard error of the mean. When the value of z is 1.96 or greater, there is a probability of p = 0.05 or less that the sample belongs to the population.

Review Questions

Answers to the odd-numbered questions are provided in Appendix A.

1. If all the psychologists working at a state mental hospital have an average age of 47.5 years, what will be the value of µM created from such a population?

2. A researcher calculates the standard deviation of the psychologists’ ages in Review Question 1. If the researcher calculates the standard error of the mean for the distribution of sample means for the same data, which will have the greater value? Why is there a difference?

a. For which population is the standard error of the mean the variability measure?

b. What is the equivalent of the standard error of the mean for a population based on individual scores?

3. The assistant vice president for personnel at a college has job-performance scores for all clerical staff, with a mean value of 32.956 and a standard error of the mean of 5.924.

a. What is the probability of randomly selecting a sample with a job-performance mean of 35.0 or higher?

b. If a group with M = 35.0 is selected, is it significantly different from the population?

c. If a group is significantly different from a population, which type of decision error is possible?

4. The clerical staff in a large law office scores the following on job performance:

25, 37, 38, 43, 44, 48, 51

If the mean level of performance for all clerical staff is 33.255, with a standard error of the mean of 3.248, are those in the law office characteristic of that population? Test at p = 0.05.

5. What is the relationship between the level of probability for a statistical test and the probability of type I error in the event of a significant finding? When a result is statistically significant, what is the probability of type II error?

6. The standard deviation for a major intelligence test is σ = 15.0. If, in a given year, the test is administered to 347 people, what is the value of the standard error of the mean?

7. An exclusive graduate program requires GRE Quantitative scores of 500 or better. This year’s entering class has n = 16 and M = 625.

a. Are they characteristic of a national population of graduate students for whom µ = 500 with σ = 100?

b. What is the probability that a group of 16 applicants selected at random would have σ = 100 and M = 525 or better?

8. A researcher wishes to gather a sample of people who have intelligence scores that differ from the national standard deviation of 15 by no more than 3 points, with 0.95 confidence.

a. How large must the sample be?
b. What will happen to the required sample size if the researcher insists on being 0.99 confident?
c. Explain the answer to 8b.
d. How large must the sample be if it is to vary from the national standard deviation by no more than 2 points?

9. A group of social workers take a measure of optimism and score as follows:

11, 14, 14, 16, 19, 20, 22, 23, 27, 30

If the population standard deviation is 4.554,

a. What is the value of the standard error of the mean?
b. What is the z value for a z test with this group if µM = 26.0?
c. If 26.0 is the mean for all employed adults, is this group of social workers significantly different?
d. Use Excel to complete this problem. Refer to Figure 4.8 for help.

10. If a z test result is not significant, why will a confidence interval for the population mean contain the value of the population mean to which the sample was compared?

11. What factors will reduce the size of a confidence interval?

12. If someone is testing at p = 0.01 and the result is statistically significant, what is the probability of a type I error? What is the probability of a type II error?

Answers to Try It! Questions

1. The distribution of sample means has less variability than a distribution of individual scores because sample means moderate the effect of extreme scores. The larger the sample, the more extreme scores are minimized as factors in data variability.

2. “Statistically significant” means that the calculated value, z in this case, is large enough that it is unlikely to have occurred by chance; it is probably not a random outcome.


3. When the difference between M and µM in a z test is not significant, the difference is attributed to random variability; the value of M is one of the possible values of samples drawn at random from the distribution of sample means.

4. A type II, or beta, error can occur only when a result is determined not statistically significant. When the result is significant, the probability of a type II error is zero (β = 0).

5. Calculated only for a statistically significant result, a confidence interval indicates a range of values within which the mean of the population that the sample probably does represent is likely to occur.
