5 t Tests
Chapter Learning Objectives
After reading this chapter, you should be able to do the following:
1. Explain the advantage of the one-sample t test over the z test.
2. Compare the one-sample t test to the independent t test.
3. Distinguish between one-sample and one-tailed t tests.
4. Explain hypothesis testing in statistical analysis.
5. Determine practical significance.
6. Construct a confidence interval for the difference between the means.
7. Discuss research applications for the t tests.
Introduction
The z test (Chapter 4) involves more than just an expansion of the z score from individuals to groups. The z test also introduced us to statistical significance. Those who work with quantitative data need to be able to distinguish between outcomes that probably occurred by chance and those that are likely to emerge each time the data are gathered and analyzed. For example, suppose data indicate that a group of clients, each one grieving the loss of a loved one, becomes more positive and peaceful with time. A therapist needs to know whether this would have happened anyway with the passage of time, or whether it has something to do with the treatment the therapist provided. The z test answers such questions.
The z test has important limitations, however. Its greatest difficulty is the requirement of a value for the population standard error of the mean, σM. That value is rarely at hand, and it cannot be calculated when the researcher lacks access to population data, including the population standard deviation.
The z test’s second limitation is that it allows only one type of comparison: a sample to a population. What if two therapists want to compare their respective groups of grieving clients to see if one type of grief counseling is better than the other? The z test does not allow that comparison, which is where William Sealy Gosset comes in.
Gosset worked for Guinness Brewing during the early part of the 20th century. Part of his responsibility was quality control, and he studied ways to make sure that day-to-day brewing remained consistent with Guinness’s standards. A man of remarkable ability, Gosset devised procedures for quantifying product quality and then testing the consistency of the quality over time. To help him in his analyses, he developed the t tests.
Gosset recognized that his t tests had application well beyond studying the quality of beer and wanted to publish information about his tests so that others could benefit. A no-publishing policy at Guinness, instituted after a previous employee published what the company considered trade secrets, presented a roadblock. Believing that his research would not compromise Guinness, Gosset published anyway under the pseudonym “Student.” Traditional statistics textbooks still contain references to “Student’s t.”
The pseudonym Gosset selected suggests his unpretentious nature. As his career progressed, he became associated with Karl Pearson and Ronald Fisher, who also made remarkable contributions to statistical analysis. These were men of substantial ego in a running professional and personal battle with each other, but Gosset maintained good relationships with both.
5.1 Estimating the Standard Error of the Mean
The z test statistic requires two parameter (population) values: the mean (µM) and the standard error of the mean (σM):
$$z = \frac{M - \mu_M}{\sigma_M}$$
Researchers usually have access to the population mean. Gosset’s problem was how to work around the more elusive population standard error of the mean, σM, that the z test also requires.
Although σM can be calculated by dividing the population standard deviation by the square root of the sample size (σM = σ/√n), that offers small consolation if the population standard deviation (σ) is unavailable. How can researchers proceed without a value for sigma?
The sample mean, M, is an estimate of the population mean, μ. Likewise, the sample standard deviation, s, is an estimate of the population standard deviation, σ. Gosset reasoned that statistics from samples can provide quite accurate estimates of their equivalent population parameters, particularly when the samples are fairly large and randomly selected.
Samples can never exactly replicate the characteristics of populations, however, and the difference is called “sampling error.” To adjust for sampling error (the difference between μ and M or between σ and s), Gosset proposed a correction. Part of the genius of his solution is that the correction is graduated: the adjustment is greatest with small samples, when the risk of sampling error is greatest.
To determine the standard error of the mean, Gosset used the sample standard deviation (s) as an estimate of the population standard deviation (σ). So instead of σM = σ/√n, he estimated the standard error of the mean, SEM, this way:
Formula 5.1
$$SE_M = \frac{s}{\sqrt{n}}$$
where
s = the standard deviation of the sample
n = the number of scores in the sample
So, perhaps a researcher examining depression among the aged finds the following depression scores: 13, 17, 16, 17, 11, 13, 18, 15, 15, and 12. Rather than trying to determine the standard deviation (σ) for depression in this population, the researcher can calculate the sample standard deviation (s) and divide it by the square root of 10 to determine the estimated standard error of the mean. That value is 2.359/√10 = 0.746.
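The depression-score calculation can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are our own, and `statistics.stdev` is used because it divides by n − 1, which gives the sample standard deviation s.

```python
import math
import statistics

# Depression scores from the example above.
scores = [13, 17, 16, 17, 11, 13, 18, 15, 15, 12]

# statistics.stdev uses the n - 1 denominator, i.e., the sample standard deviation s.
s = statistics.stdev(scores)

# Formula 5.1: SEM = s / sqrt(n)
sem = s / math.sqrt(len(scores))

print(round(s, 3), round(sem, 3))  # 2.359 0.746
```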
5.2 One-Sample t Test
Using the estimated standard error of the mean (SEM) in the place of σM changes the test statistic for the one-sample t test to
Formula 5.2
$$t = \frac{M - \mu_M}{SE_M}$$
where
t = the calculated value of t
M = the sample mean
µM = the mean of the distribution of sample means
SEM = the estimated standard error of the mean
The numerator of the one-sample t test is the same as the numerator for the z test because both tests answer the same question: Is the sample characteristic of the population to which it is compared? The t test, however, frees researchers from needing to know the population standard error of the mean. The price of relying on statistics from samples instead of population values is the risk of sampling error.
Besides substituting the estimated standard error of the mean (SEM) for the population standard error of the mean (σM), another difference between the z and t tests is that there is just one z distribution but many t distributions. Because there is just one z distribution, one value, called a critical value, indicates statistical significance. In the z test, that value is z = 1.96 for a two-tailed test at p = 0.05 (Table B.1). With many t distributions, each with its own characteristics, a different critical value indicates statistical significance for each distribution. Different t distributions are defined by their degrees of freedom (df). For the one-sample t test, the degrees of freedom are the number of scores in the sample minus one: df = n − 1. So if the sample size (n) = 10, then df = 9; if n = 30, df = 29, and so on.
The risk of sampling error is greatest when samples are smallest because, of course, small samples have a hard time emulating population characteristics. So Gosset used df, which directly correlates with sample size, as an index to the critical values that indicate statistical significance. Small samples (and therefore small df) require a larger value of t to indicate statistical significance than when sample sizes (and df) are larger. As df increases, the critical values required for significance decline until ultimately, with a sample of infinite size (which of course is possible only in theory), the critical value for t matches that for z, 1.96.
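The way critical values shrink as df grows can be seen in a few rows of Table 5.1. The sketch below hard-codes an excerpt of the two-tailed p = 0.05 column; the dictionary name and the choice of rows are ours, for illustration only.

```python
import math

# Excerpt of Table 5.1: two-tailed critical values of t at p = 0.05.
# The dictionary name and the selection of rows are illustrative choices.
CRITICAL_T_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 9: 2.262,
    28: 2.048, 29: 2.045, math.inf: 1.960,
}

# Change in the critical value between neighboring df, small versus large samples.
drop_small_df = CRITICAL_T_05[2] - CRITICAL_T_05[3]
drop_large_df = CRITICAL_T_05[28] - CRITICAL_T_05[29]

print(round(drop_small_df, 3))  # 1.121
print(round(drop_large_df, 3))  # 0.003

# No critical value in the excerpt falls below the z value of 1.96.
print(all(value >= 1.960 for value in CRITICAL_T_05.values()))  # True
```

The adjustment for sampling error is large when df is tiny and nearly vanishes by df = 29.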
Calculating the One-Sample t
The steps for completing the one-sample t follow:
1. Calculate the sample mean, M (Formula 1.1), and the sample standard deviation, s (Formula 1.3).
2. Calculate the estimated standard error of the mean, SEM (Formula 5.1).
3. Calculate the value of t (Formula 5.2).
4. Compare the calculated t to the critical value for t for the appropriate df.
Consider a psychologist working for the police department in a major city who is interested in job stress among law enforcement officers. The psychologist wants to know if job stress is significantly different among police officers than it is in the general population, where stress has a mean of 27.353. The psychologist administers the tool test initiative (“too-tite,” for short) to 10 randomly selected police officers. Their too-tite scores are as follows:
22, 26, 29, 29, 33, 35, 36, 38, 40, 42
First, verify that for this sample M = 33.0 and s = 6.412. Remember that the mean of a population based on individual values equals the mean of the distribution of sample means, or µ = µM (Chapter 4). So, because µ = 27.353, µM = 27.353. Next, estimate the standard error of the mean: SEM = s / √n (Formula 5.1), so SEM = 6.412 / √10 = 2.028. Then calculate the value of t: t = (M − µM) / SEM = (33.0 − 27.353) / 2.028 = 2.785. Finally, the value of the calculated t must be compared to the critical value for t, listed in Table 5.1 (Table B.2 in Appendix B). The researcher’s test happens to be a “two-tailed” test (explained later in the chapter), so consult the columns under “Two-tailed tests.”
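The first three steps can be sketched in Python with the standard library. The function name `one_sample_t` is our own, not part of any package, and the sketch assumes the sample standard deviation with the n − 1 denominator.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return (t, df) for a one-sample t test against the population mean mu."""
    n = len(sample)
    m = statistics.mean(sample)                     # step 1: sample mean M
    sem = statistics.stdev(sample) / math.sqrt(n)   # step 2: SEM = s / sqrt(n)
    return (m - mu) / sem, n - 1                    # step 3: t, with df = n - 1

too_tite = [22, 26, 29, 29, 33, 35, 36, 38, 40, 42]
t, df = one_sample_t(too_tite, 27.353)

print(round(t, 3), df)  # 2.785 9
```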
Earlier, we noted that the risk of sampling error is greatest when samples are smallest. That reality is reflected in the critical values in the table. Note how dramatic the change is from, say, 2 to 3 degrees of freedom compared to the change from 29 to 30.
The first column in the table is for degrees of freedom. In our example, n = 10, so df = 9. The second column lists the critical values of t when the test is conducted at p = 0.05. The p value indicates the probability that a significant finding will result in a type I error. As we noted in Chapter 4, the probability of a type I or alpha error (α) is always the same as the probability level (p) at which the test is conducted. The 0.05 level is the traditional level at which statistical tests are conducted. For example, a research article that does not state the alpha level usually assumes α = 0.05.
The third column has the critical values of t when testing at p (or α) = 0.01. For p = 0.05, the critical value for df = 9 is 2.262. So as not to confuse the critical value of t with the calculated value from the t test, it is helpful to write the critical value and note its df and the level of probability for the test as follows:
t0.05(9) = 2.262
The subscripts to t indicate that the test was conducted at p = 0.05 with 9 degrees of freedom.
Note that the critical value for testing at p = 0.01 with df = 9 is 3.250. The greater the critical value, the less likely that such a calculated value of t would occur by chance, which means that the likelihood of a type I error is reduced with the more stringent p or alpha level. The companion problem, as we noted in Chapter 4, is that the likelihood of a type II error increases correspondingly. This relationship between type I and type II errors places the researcher on the horns of a dilemma: Which error poses the greater risk, type I or type II?
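The effect of a more stringent alpha level shows up when the police officers' t is compared with both critical values for df = 9. The following is a small sketch with the table values typed in by hand; the variable names are ours.

```python
t_calculated = 2.785  # police officers' job-stress example

# Two-tailed critical values for df = 9 from Table 5.1, keyed by alpha level.
critical_values = {0.05: 2.262, 0.01: 3.250}

decisions = {}
for alpha, t_crit in critical_values.items():
    # In a two-tailed test only the absolute value of t matters.
    decisions[alpha] = abs(t_calculated) >= t_crit
    print(alpha, t_crit, decisions[alpha])

# 0.05 2.262 True
# 0.01 3.25 False
```

The same data are significant at p = 0.05 but not at p = 0.01, which is the trade-off between type I and type II errors in miniature.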
In a circumstance where the question is whether, for example, psychologists’ analytical abilities are characteristic of the analytical abilities of all adults, failing to find that the psychologists are unique when actually they are is a type II error. However, the type II error probably poses little risk, except perhaps for those attempting to identify the characteristics of effective psychologists. On the other hand, failing to identify that violent offenders have significantly greater levels of social alienation than what prevails in the general population may be much more important. If social alienation leads to the commission of violent crimes, and a researcher sets an alpha level that is so stringent (0.01 or 0.001) that it does not identify those with the greatest level of alienation as belonging to a distinct population, perhaps there is no opportunity for intervention. In this latter situation, the researcher would probably prefer a type I to a type II error.
Try It!: #1 In terms of degrees of freedom, which t distributions are accompanied by the highest critical values for t?
Table 5.1: t distribution critical values
Critical t value
df   Two-tailed p = 0.05   Two-tailed p = 0.01   One-tailed p = 0.05   One-tailed p = 0.01
1 12.706 63.657 6.314 31.821
2 4.303 9.925 2.920 6.965
3 3.182 5.841 2.353 4.541
4 2.776 4.604 2.132 3.747
5 2.571 4.032 2.015 3.365
6 2.447 3.707 1.943 3.143
7 2.365 3.499 1.895 2.998
8 2.306 3.355 1.860 2.896
9 2.262 3.250 1.833 2.821
10 2.228 3.169 1.812 2.764
11 2.201 3.106 1.796 2.718
12 2.179 3.055 1.782 2.681
13 2.160 3.012 1.771 2.650
14 2.145 2.977 1.761 2.624
15 2.131 2.947 1.753 2.602
16 2.120 2.921 1.746 2.583
17 2.110 2.898 1.740 2.567
18 2.101 2.878 1.734 2.552
19 2.093 2.861 1.729 2.539
20 2.086 2.845 1.725 2.528
21 2.080 2.831 1.721 2.518
22 2.074 2.819 1.717 2.508
23 2.069 2.807 1.714 2.500
24 2.064 2.797 1.711 2.492
25 2.060 2.787 1.708 2.485
26 2.056 2.779 1.706 2.479
27 2.052 2.771 1.703 2.473
28 2.048 2.763 1.701 2.467
29 2.045 2.756 1.699 2.462
30 2.042 2.750 1.697 2.457
∞ 1.960 2.576 1.645 2.326
Source: Critical values of the t distribution. Retrieved from shazam.econ.ubc.ca/intro/critval.htm
Interpreting t Test Results
If the outcomes from the police officers’ job-stress level test were z test results, the only question would be whether the calculated value is equal to or greater than 1.96. If so, the result is significant. With t, the question is still whether the calculated value is equal to or greater than a table value, but that value changes depending upon df. In the job-stress level example, the calculated t = 2.785. From the table of critical values, the value for p = 0.05 and df = 9 (t0.05(9)) is 2.262. Because the calculated value is larger than the table value, the result is statistically significant, which, in our example, indicates that police officers have significantly greater stress than people in the general population. Figure 5.1 illustrates this result, where the plus- and minus-1.96 values associated with statistical significance for a z test are replaced with the plus- and minus-2.262 for a t test with 9 degrees of freedom. The difference in job stress between law enforcement officers and the general public probably did not occur by chance.
According to these data, law enforcement officers represent a population with job stress higher than the general public’s. Such a finding alerts decision-makers to a potential problem and allows those involved to design services for law enforcement personnel that can help remediate stress-related problems.
Another Example
Pursuing stress findings, the psychologist decides to compare social workers’ levels of stress to the job stress of the general public. For 12 randomly selected social workers, the stress scores are as follows:
19, 21, 22, 25, 27, 27, 32, 33, 35, 35, 36, 37
1. Verify that M = 29.083 and s = 6.374; µM is still 27.353.
2. Calculate the standard error: SEM = s/√n = 6.374 / √12 = 1.840.
3. Next, find the value of t:
$$t = \frac{M - \mu_M}{SE_M} = \frac{29.083 - 27.353}{1.840} = 0.940$$
4. Find the critical value: t0.05(11) = 2.201.
Figure 5.1: Statistical significance in a t test with 9 degrees of freedom
[Figure: t distribution with critical values marked at −2.262 and +2.262]
Because the sample size (and therefore the df) has changed, the critical value is different for this problem than it was for the example about stress among police officers. At p = 0.05 and df = 11, the difference between social workers and the general population is not statistically significant; the calculated value of t is less than the critical value from the table. Although the critical value declines as sample size (and so df) increases, t = 0.940 is never going to be statistically significant, as scanning down the table of critical values verifies. At p = 0.05, no critical value is as small as 0.940, and the critical values at p = 0.01 are larger yet. Even with a sample of infinite size, the lowest critical value when testing at p = 0.05 is 1.96, the same as the critical value for the z test. In terms of stress, these data give no evidence that social workers differ from people in the general population. In contrast to law enforcement officers, probably no special program or action needs to address the stress-related problems of this particular group.
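As a check on the social workers example, the short sketch below recomputes t and compares it both with the critical value for df = 11 and with the smallest two-tailed critical value in the table. The variable names are ours, and the critical values are typed in from Table 5.1.

```python
import math
import statistics

social_workers = [19, 21, 22, 25, 27, 27, 32, 33, 35, 35, 36, 37]
mu = 27.353  # general-population mean from the example

sem = statistics.stdev(social_workers) / math.sqrt(len(social_workers))
t = (statistics.mean(social_workers) - mu) / sem

critical_df11 = 2.201      # t0.05(11), two-tailed
critical_infinite = 1.960  # smallest two-tailed critical value at p = 0.05

print(round(t, 3))                  # 0.94
print(abs(t) >= critical_df11)      # False
print(abs(t) >= critical_infinite)  # False
```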
The One-Tailed Test
The z and t significance tests to this point have been what are called “two-tailed tests.” That description means that a significant result can occur when the sample mean is either significantly less than the population mean or significantly greater. To put it differently, in a two-tailed test, whether the calculated values of z or t are positive (placing the result in the upper tail of the distribution) or negative (in the lower tail of the distribution) does not matter. The only issue is whether the absolute calculated value (the value without regard to the sign) is as large as or larger than the critical value.
In the first t test example, if law enforcement officers’ too-tite scores had been the same distance below the mean for the population as they were above (that is, if M = 21.706 instead of 33.0 and the SEM had remained the same at 2.028), the t value would have been
$$t = \frac{M - \mu_M}{SE_M} = \frac{21.706 - 27.353}{2.028} = -2.785$$
While we would interpret this result differently (law enforcement officers are significantly less, rather than more, stressed than the general population), the result would have been statistically significant nevertheless. In a two-tailed test, the sign of t does not matter. But in the original problem, the police psychologist might have framed the question this way: Are police officers significantly more stressed than the general population?
With that wording, the analysis becomes a one-tailed test. Language such as more stressed or less stressed indicates a more precise prediction about how the sample is expected to differ from the population, rather than just asking whether the sample is different from the population. Specifying the direction of the difference (more or less) indicates the tail of the distribution in which the difference is expected to occur.
The Statistically Significant Region in the One-Tailed Test
To restate the concept, the way the question is framed determines whether the result can occur in either, or just one, tail of the t distribution. In a two-tailed test conducted at p = 0.05 (for example, “Is the sample significantly different from the population?”), the following apply:
• A statistically significant result can occur in either the highest 2.5% of the distribution or the lowest 2.5% of the distribution.
• The sign of the calculated value of t does not matter.
Try It!: #2 If the degrees of freedom for a one-sample t test are df = 14, what is the associated sample size?
Apply It! Testing a Pilot Program
An innovative teacher has implemented meditation (she calls it “quiet time”) in her classroom to relieve high stress, increase test scores, and improve behavior among students who participate. The school principal wants to determine whether the quiet-time students are getting better grades than the population of all students at the school. Note that when the question is whether the quiet-time students are doing better, not just performing differently, the comparison becomes a one-tailed test. Because of widespread skepticism about the program from school board members, the principal chooses a conservative p = 0.01 criterion for statistical significance.
During the 2010 school year, 10% of the student population was chosen at random to participate in the quiet-time program. That year, the grade point average for the student population as a whole was 2.53. The GPAs for 12 randomly selected students who had been involved in the quiet-time program were as follows:
2.49, 2.85, 3.18, 2.82, 2.89, 2.50, 3.26, 3.20, 2.48, 2.93, 2.67, 2.47
For these 12 students, M = 2.812 and s = 0.295.
SEM = s / √n = 0.295 / √12 = 0.085
Note that because µ = 2.530, µM = 2.530.
$$t = \frac{M - \mu_M}{SE_M} = \frac{2.812 - 2.530}{0.085} = 3.318$$
To compare the calculated t to the critical value for t, consult the columns in Table 5.1 for one-tailed tests (the final two columns in the table). For this analysis, with p = 0.01 and df = 11, the critical value is t0.01(11) = 2.718.
At the p = 0.01 level and df = 11, the results are statistically significant because 3.318 > 2.718. Therefore, at p = 0.01, students in the quiet-time program had significantly higher GPAs than those of the general middle-school student population.
Using statistics, the principal was able to present quantitative evidence that the quiet-time program is associated with higher student GPAs. To decrease the possibility of type I error, the principal selected p = 0.01 rather than 0.05. Perhaps this analysis, as well as other tests measuring stress and behavioral issues, can enable the principal to raise additional funds to expand the pilot program.
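Readers who repeat this calculation in software may see a slightly different t, because the hand calculation rounds M, s, and SEM to three decimals before dividing. The sketch below (variable names are ours) computes t both ways; under either approach the value exceeds the one-tailed critical value of 2.718.

```python
import math
import statistics

gpas = [2.49, 2.85, 3.18, 2.82, 2.89, 2.50, 3.26, 3.20, 2.48, 2.93, 2.67, 2.47]
mu = 2.53
critical = 2.718  # one-tailed, p = 0.01, df = 11 (Table 5.1)

# Full precision throughout.
m = statistics.mean(gpas)
s = statistics.stdev(gpas)
t_full = (m - mu) / (s / math.sqrt(len(gpas)))

# Rounding M, s, and SEM to three decimals first, as in the hand calculation.
sem_rounded = round(round(s, 3) / math.sqrt(len(gpas)), 3)
t_rounded = (round(m, 3) - mu) / sem_rounded

print(round(t_full, 2), round(t_rounded, 3))
print(t_full >= critical, t_rounded >= critical)  # True True
```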
Apply It! boxes written by Shawn Murphy
Figure 5.2 illustrates a two-tailed test.
Figure 5.2: Statistically significant regions in a two-tailed test
[Figure: distribution with rejection regions in the lowest 2.5% and the highest 2.5%]
In a one-tailed test conducted at p = 0.05 (for example, “Is the sample significantly greater than the population?”), the following apply:
• A statistically significant result can occur in only the highest 5% of the distribution.
• The sign of the calculated value of t is all important.
In the t test example comparing the police officers to the general public, the question “Are police officers significantly more stressed than the general public?” would have been a one-tailed test, with a rejection region resembling the distribution in Figure 5.3.
Figure 5.3: Statistically significant regions in a one-tailed test where the sample is predicted to be higher than the population
[Figure: distribution with a single rejection region in the highest 5%]
Critical Values in the One-Tailed Test
Because the rejection region is divided between both tails of the distribution in two-tailed tests and placed entirely in one tail in one-tailed tests, the point at which a result becomes statistically significant differs in one- and two-tailed tests, which means that they have different critical values. When testing at p = 0.05, the critical value for a two-tailed test is the point that includes 0.475 of the 0.5 in each half of the distribution. The equivalent point for a one-tailed test includes 0.45 of the 0.5 in one half of the distribution. The z value that includes 0.45 of the distribution is z = 1.65 (actually z = 1.645, which the table does not display). Recall that the z value that includes 0.475 of the distribution is z = 1.96. In the police officer example, where n = 10 (df = 9), the critical value for a two-tailed test (t0.05(9)) is 2.262. For the equivalent one-tailed test, the value is 1.833. Table 5.1 shows the critical values for one-tailed tests.
The less-demanding critical value makes it “easier” to find statistical significance with a one-tailed test, as long as the difference is in the direction predicted; less extreme values of t can be statistically significant in one-tailed tests. The downside to one-tailed tests is that the opposite tail contains no rejection region. If something unexpected occurs and the value of t or z has the “wrong” (the unexpected) sign, there is no way to find statistical significance, no matter how extreme the value. To protect against having to ignore the unexpected outcome, some researchers and analysts stay clear of one-tailed tests.
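The decision rules for the two kinds of test can be written as one small function. This is a sketch only: `is_significant` and its parameters are hypothetical names, the critical values come from Table 5.1 for df = 9, and t = 2.0 is a made-up value chosen to fall between the two critical values.

```python
def is_significant(t, t_crit, tails=2, predicted="greater"):
    """Decide significance from a calculated t and a positive critical value.

    tails=2: the sign of t is ignored.
    tails=1: t must fall in the predicted tail ("greater" or "less").
    """
    if tails == 2:
        return abs(t) >= t_crit
    if predicted == "greater":
        return t >= t_crit
    return t <= -t_crit

# Critical values for df = 9 at p = 0.05 (Table 5.1).
TWO_TAILED, ONE_TAILED = 2.262, 1.833

print(is_significant(2.0, TWO_TAILED))              # False
print(is_significant(2.0, ONE_TAILED, tails=1))     # True: easier to reach
print(is_significant(-2.785, TWO_TAILED))           # True: sign ignored
print(is_significant(-2.785, ONE_TAILED, tails=1))  # False: wrong tail
```

The last line shows the downside described above: an extreme t with the unexpected sign cannot be significant in a one-tailed test.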
5.3 Hypothesis Testing
In statistical testing, the calculated value either is, or is not, significant. Those two possibilities are represented in predictions called the null hypothesis and the alternate hypothesis. The null hypothesis predicts that the result will not be statistically significant: that there is no (null) difference between the population that the sample represents and the population to which it is compared. The alternate hypothesis predicts that the result will be statistically significant: that there is a difference.
• The null hypothesis for a z test, or for a one-sample t test, is written as follows: H0: µ1 = µ2. Symbolically, the null hypothesis (H0) states that the mean of the population from which the sample was drawn (µ1) has the same value as the mean of the population to which the sample is compared (µ2).
• The alternate hypothesis (HA) for two-tailed tests (we will address one-tailed tests below) is stated this way: HA: µ1 ≠ µ2. The mean of the population from which the sample was drawn (µ1) does not have the same value as the mean of the population to which the sample is compared (µ2).
In hypothesis testing, results are always stated in terms of the null hypothesis. When the z or t value is statistically significant, one “rejects the null hypothesis,” which means that the sample probably belongs to a population other than the one to which it was compared. When a result is not statistically significant, one “fails to reject the null hypothesis.” Although we might be tempted to say “accept the null hypothesis,” it would be extremely difficult to prove that µ1 = µ2. A calculated value that is not great enough to reject the null hypothesis does not necessarily mean that µ1 = µ2. We know only that there is not enough of a difference to let us reject H0.
Try It!: #3 A researcher asks whether intelligence scores are significantly different for math majors and journalism majors. Is the researcher calling for a one-tailed or a two-tailed test?
The Null and Alternate Hypotheses in a One-Tailed Test
In one-tailed tests, the null hypothesis is the same as it is for two-tailed tests, H0: µ1 = µ2. The alternate hypothesis, however, reflects the “greater-than” or “less-than” prediction. If the psychologist researching job stress had predicted higher job stress for the law enforcement officers than for the general population, the alternate hypothesis would have been HA: µ1 > µ2, i.e., the mean of the population represented by the sample is greater than the mean of the population to which it is compared. In terms of the example, the population the law enforcement officers represent is expected to have a higher mean stress level than the general population. Note that the first population mean, µ1, represents the mean of the population from which the sample was drawn. Placing the mean of the population from which the sample was drawn first in the hypothesis matches the way the z and t tests are written, with M preceding µM in the numerator of the test statistic (M − µM). If the researcher predicts the sample to belong to a population where the mean is lower than that of the population to which it is compared, the alternate hypothesis is HA: µ1 < µ2: the mean of the population represented by the sample is less than the mean of the population to which it is compared.
Another Example
A hospital administrator reads a study indicating that compassion for patients is higher among nursing assistants than among health-care professionals generally and wishes to test that finding at San Juan General Hospital. The mean for all health professionals on the Index of CAreful Recuperative Effort (the I-CARE) is 19.600. For 10 nursing assistants, I-CARE results are
15, 17, 20, 20, 23, 23, 23, 25, 27, 28
Do nursing assistants have more compassion than other health-care professionals? Is the difference statistically significant?
The null hypothesis for this analysis is H0: µ1 = µ2.
To test whether nursing assistants have more compassion for patients than the population of health-care professionals does, the alternate hypothesis is HA: µ1 > µ2.
To complete the test, follow these steps:
1. Verify that for nursing assistants, the I-CARE M = 22.1 and s = 4.149.
2. Note that SEM = s / √n = 4.149 / √10 = 1.312.
3. Calculate
$$t = \frac{M - \mu_M}{SE_M} = \frac{22.1 - 19.600}{1.312} = 1.905$$
4. In the column for one-tailed tests, note that the critical value is t0.05(9) = 1.833.
• With the entire rejection region in one tail of the distribution, the calculated value of t is more extreme than the critical value from the table.
• The results are statistically significant.
• Reject H0.
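The I-CARE test can be sketched end to end, with the outcome stated in terms of the null hypothesis. The function name `one_tailed_decision` is hypothetical, and the critical value is typed in from Table 5.1. Because the code does not round intermediate values, its t differs from the hand calculation in the third decimal place, so the output is rounded to two decimals.

```python
import math
import statistics

def one_tailed_decision(sample, mu, t_crit):
    """Test HA: mu1 > mu2 and report the decision about H0."""
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    t = (statistics.mean(sample) - mu) / sem
    # Only a t in the predicted (upper) tail can lead to rejection.
    decision = "reject H0" if t >= t_crit else "fail to reject H0"
    return t, decision

i_care = [15, 17, 20, 20, 23, 23, 23, 25, 27, 28]
t, decision = one_tailed_decision(i_care, 19.600, 1.833)  # one-tailed t0.05(9)

print(round(t, 2), decision)  # 1.91 reject H0
```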
5.4 The Independent t Test
Gosset’s one-sample t test represents an important development to anyone who needs to make a judgment about whether a particular sample represents a specified population. It relieves the researcher of the need to have the standard error of the mean parameter (σM) for the entire population, a value which may not be accessible.
But if the question is not “Does the sample belong to the population?” but rather “Do two samples belong to different populations?”, the z and one-sample t tests offer no way to answer. Another approach is required. Suppose a psychologist working with substance abusers divides them into two samples, each receiving a different kind of therapy, and wishes to know which is better. Questions that compare two samples to each other are the domain of the independent t test. In this test, Gosset designed a procedure where everything needed to complete the test can be calculated from the sample data. No population parameters are necessary.
The word independent in the term independent t test refers to the nature of the samples. They must involve separate groups; subjects who are in one group cannot also be in the other. (In other words, this procedure does not allow researchers to measure a group and then apply a different treatment and measure them again; such a procedure describes what is called the before/after t test.)
A Population Based on Difference Scores
The z test and the one-sample t test are based on the distribution of sample means. A frequency distribution based on the means of samples makes it possible to ask questions about whether a particular sample is likely to have been drawn from a particular population.
The independent t test is based on a population created from “difference scores.” Rather than sampling one group at a time and then plotting the sample mean to create the distribution of sample means, this population is created by
• selecting two samples of the same size,
• computing the sample mean for each,
• subtracting the second mean from the first (M1 − M2), and
• repeating this with enough randomly selected pairs of samples until the result is a distribution of difference scores.
Try It!: #4 Had the I-CARE problem been a two-tailed test, (a) what would the alternate hypothesis have been, and (b) what would the correct decision have been?
Try It!: #5 If the t value in the I-CARE problem were −1.905 instead of +1.905, would the statistical decision change?
When the first sample mean has a higher value than the second sample mean (M1 > M2), the difference will be positive and plotted in the right half of the distribution. When the second mean has the higher value (M1 < M2), the difference is plotted in the left half of the distribution, and so on. The mean for this distribution of difference scores is symbolized as µM1−M2, with the subscript M1 − M2 reminding us that this parameter is the mean of all differences between pairs of sample means. Although the means of pairs of samples usually have some difference, over many pairs of samples, the positive differences counter the negative differences and the mean of all those M1 − M2 differences will be 0: µM1−M2 = 0.
In the z distribution, σ measured variability. In the distribution of sample means, σM, the standard error of the mean, measured variability. In the distribution of differences, variability in the difference scores is measured with the standard error of the difference, σM1−M2. In practice, however, the standard error of the difference is estimated, just as the standard error of the mean was estimated for the one-sample t test.
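A small simulation can make the distribution of difference scores concrete. The sketch below is illustrative only: the population (10,000 scores, roughly normal with mean 50 and standard deviation 10), the sample size of 10, and the 5,000 repetitions are all assumptions of ours, not values from the chapter.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 scores, roughly normal, mean 50, SD 10.
population = [random.gauss(50, 10) for _ in range(10_000)]

def mean_difference(pop, n):
    """Draw two samples of size n from the same population; return M1 - M2."""
    m1 = statistics.mean(random.sample(pop, n))
    m2 = statistics.mean(random.sample(pop, n))
    return m1 - m2

# Repeat with many pairs of samples to build the distribution of differences.
differences = [mean_difference(population, 10) for _ in range(5_000)]

print(round(statistics.mean(differences), 2))   # close to 0
print(round(statistics.stdev(differences), 2))  # empirical standard error of the difference
```

Because both samples in each pair come from the same population, positive and negative differences cancel, and the mean of the simulated differences lands near 0.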
Using the Distribution of Difference Scores to Explain Results
As we noted earlier, the mean of the distribution of difference scores for any population is always 0. That distribution indicates how much M1 − M2 difference can be expected to occur by chance when two samples are selected from the same population. When differences between sample means exceed expectations (when the value of t in an independent t test is significant), the samples probably represent different populations.
Perhaps a researcher selects two random samples of subjects from the population of all clerical workers in a particular city. If the researcher measures subjects’ analytical ability, the result would probably show some differences between the two groups, but they would likely be minor, as expected of two samples drawn from the same population. If a facilitator presents analytical techniques in a series of weekly seminars to those in one sample, and then the researcher measures analytical ability again after 60 days, perhaps the difference between the two groups grows too great to be explained by sampling variability. The evidence would be a statistically significant t value.
The Hypotheses in the Independent t Test
The null and alternate hypotheses for independent t tests look the same as for the one-sample t test: H0: µ1 = µ2 and HA: µ1 ≠ µ2. The distinction is that they reflect two samples rather than just one. The null hypothesis predicts that the mean of the population from which sample one was drawn has the same value as the mean of the population from which sample two was drawn. In other words, the samples come from the same population.
The alternate hypothesis maintains that the samples were drawn from different populations. If the independent t test is one-tailed, the alternate hypothesis indicates the direction of the predicted difference, just as with the one-sample t test: either HA: µ1 < µ2 or HA: µ1 > µ2.
The test statistic for the independent t test is
Formula 5.3
$$t = \frac{M_1 - M_2}{SE_{M_1 - M_2}}$$
The numerator in the test statistic for the independent t test is actually (M1 − M2) − µM1−M2: the mean of the first sample minus the mean of the second sample, minus the mean of the distribution of differences. But because µM1−M2 is always 0, that part of the term is usually omitted, leaving us with M1 − M2.
For the denominator of the test statistic, Gosset provided an estimated standard error of the difference (SEM1−M2). It takes the place of the population standard error of the difference, σM1−M2, which would be extremely difficult to calculate. That parameter would require calculating the standard deviation (σ) of all differences between all possible pairs of sample means in the distribution of differences! Clearly, that value is better estimated than calculated directly.
Before actually using the independent t test, note the harmony in the tests covered to this point:
1. The numerator for each procedure, from the z score to the independent t, involves a difference score.
• For the z score, the numerator is x − M.
• For the z test, the numerator is M − µM.
• For the one-sample t test, the numerator is also M − µM.
• Similarly, for the independent t test, the numerator is M1 − M2.
2. The denominator for each involves a measure of data variability.
• For the z score, the denominator is s.
• For the z test, the denominator is σM.
• For the one-sample t test, the denominator is SEM.
• The independent t also involves a measure of data variability, SEM1−M2.
The Variability Estimate
Because this test includes data from two samples, the variability statistic, SEM1−M2, must include the variance within both samples, making the estimated standard error of the difference a “measure of pooled variance.” When the number in both samples is equal, the formula for the estimated standard error of the difference is as follows:
Formula 5.4
$$SE_{M_1 - M_2} = \sqrt{(SE_{M_1})^2 + (SE_{M_2})^2}$$
where
$$SE_M = \frac{s}{\sqrt{n}}$$
(SEM1)² = the square of the estimated standard error of the mean for sample 1
(SEM2)² = the square of the estimated standard error of the mean for sample 2
An Independent t Test Example
Suppose that a sports psychologist is interested in athletes’ willingness to take risks as they participate in their sport. With research to suggest that having a winning record prompts athletes to repeat what has been successful in the past and to be inclined toward conventional play, the psychologist reasons that risk-taking will be lower in athletes with winning records than in those with losing records. Using the Demonstration Assessment for Ready Evaluation (DARE), the psychologist collects data from two teams of eight volleyball players, the first with a winning record, the second with a losing record. Note that the researcher’s hypothesis holds that risk-taking will be higher in the team with the losing record. As we noted earlier, making such a prediction has implications for the alternate hypothesis and the way the results are interpreted. Table 5.2 shows DARE scores for members of the two teams.
Table 5.2: DARE scores
Record Scores
Winning 37, 51, 57, 60, 66, 62, 68, 69
Losing 34, 37, 44, 47, 51, 54, 57, 59
Because the researcher predicts that risk-taking will be higher in the team with the losing record, if that team represents the μ2 population, the hypotheses for this problem are:
• H0: µ1 = µ2
• HA: µ1 < µ2
The steps to completing the independent t test follow:
1. Calculate the means (M) and standard deviations (s) for each group.
2. Calculate the standard error of the mean (SEM) for each group.
3. Calculate the standard error of the difference (SEM1−M2).
4. Calculate the value of t.
5. Compare the value of t to the critical value of t.
For the independent t test, degrees of freedom are n1 + n2 − 2, or the number in Group 1 plus the number in Group 2 minus 2. This calculation allows the degrees of freedom for the entire problem to reflect the degrees of freedom in each of the component groups.
The results follow. Note that the subscripts to M, s, and SEM indicate the group that the statistic represents: the team with the winning record (1) or the team with the losing record (2).
1. First, we calculate the mean and standard deviation of each sample:
$$M_1 = 58.75,\quad s_1 = 10.634$$
$$M_2 = 47.875,\quad s_2 = 9.109$$
2. Then, using the standard deviation for each sample, we calculate the associated standard error of the mean:
$$SE_{M_1} = \frac{s_1}{\sqrt{n_1}} = \frac{10.634}{\sqrt{8}} = 3.760$$
$$SE_{M_2} = \frac{s_2}{\sqrt{n_2}} = \frac{9.109}{\sqrt{8}} = 3.221$$
3. Knowing the standard error of the mean for each group allows us to estimate the standard error of the difference:
$$SE_{M_1 - M_2} = \sqrt{(SE_{M_1})^2 + (SE_{M_2})^2} = \sqrt{3.760^2 + 3.221^2} = 4.951$$
4. With that value, we can calculate the independent t:
$$t = \frac{M_1 - M_2}{SE_{M_1 - M_2}} = \frac{58.75 - 47.875}{4.951} = 2.197$$
determine the number of degrees of freedom:
$$df = 8 + 8 - 2 = 14$$
and compare it to the critical value of t with 14 degrees of freedom:
$$t_{0.05(14)} = 1.761$$
The critical value of t is from the portion of the table for one-tailed tests because the prediction was that the winning team would have a significantly lower (rather than just a different) level of risk-taking than the losing team. The calculated t value exceeds the table value. Was the result statistically significant?
The researcher hypothesized that risk-taking would be lower among the more successful players (Group 1). A negative value of t would have reflected that result, but as it turned out, the winning group had the higher risk-taking scores. With a one-tailed test and the stated hypothesis, the entire 5% rejection region was in the negative tail of the t distribution.
No positive value of t, no matter how extreme, can be significant when a one-tailed test is based on the prediction of a negative t. This problem illustrates the gamble researchers take with one-tailed tests. If the result is in the direction opposite the predicted one, the magnitude of the value is unimportant. The proper statistical decision is to fail to reject the null hypothesis.
Try It!: #6 Because the risk-taking problem produced a result that was not in the tail predicted and so was not statistically significant, is it appropriate to just run the test again as a two-tailed test?
Hemera/Thinkstock If a sports psychologist hypothesizes that winning and losing athletes have significantly different levels of risk-taking, what is the alternate hypothesis?
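The risk-taking example can be retraced in a few lines of Python. This is a minimal sketch using only the standard library. The function names (sem, independent_t_equal_n, one_tailed_decision) and the predicted_sign argument are hypothetical conveniences introduced here, and the critical value 1.761 is simply copied from the table value quoted in the example.

```python
import math
import statistics

winning = [37, 51, 57, 60, 66, 62, 68, 69]  # Group 1, Table 5.2
losing = [34, 37, 44, 47, 51, 54, 57, 59]   # Group 2, Table 5.2


def sem(scores):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))


def independent_t_equal_n(group1, group2):
    """Independent t for equal group sizes (Formulas 5.3 and 5.4)."""
    se_diff = math.sqrt(sem(group1) ** 2 + sem(group2) ** 2)
    t = (statistics.mean(group1) - statistics.mean(group2)) / se_diff
    df = len(group1) + len(group2) - 2
    return t, df


def one_tailed_decision(t, critical, predicted_sign):
    """predicted_sign is -1 for HA: mu1 < mu2 and +1 for HA: mu1 > mu2.

    Only a t value in the predicted tail can be significant.
    """
    in_predicted_tail = t * predicted_sign >= critical
    return "reject H0" if in_predicted_tail else "fail to reject H0"


t, df = independent_t_equal_n(winning, losing)
decision = one_tailed_decision(t, critical=1.761, predicted_sign=-1)
print(round(t, 3), df, decision)
```

Because the alternate hypothesis predicted a negative t, the sketch treats the positive calculated value as falling outside the rejection region, whatever its size, which mirrors the decision to fail to reject the null hypothesis.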
Apply It! The Science of Athletic Performance
Sports science is a discipline that studies the application of scientific principles and techniques with the aim of improving sporting performance. A sports scientist is interested in determining if a certain training method will change the 100-yard-dash times of sprinters. Does training with a parachute on a runner’s back (a small one, opened up behind the runner, creating a drag) affect sprint times? The obvious conclusion would be that it improves times, but the scientist is concerned that perhaps it alters the way sprinters run in a way that leads to poorer running mechanics and slower times.
The sports scientist tests his theory on high school sprinters. No population data are available, so the scientist employs an independent, two-tailed t test, with p = 0.05. He uses a two-tailed test because his interest is only determining if this training has an effect. Whether the parachute can lower or raise the average sprint times is unknown.
Sixty high school sprinters are chosen at random from throughout the country. During the track season, 30 sprinters are randomly selected from the original 60 to incorporate parachutes into their normal training (Group 1). The remaining 30 (Group 2) do not. The null hypothesis predicts that the mean of the population represented by the first group is the same as the mean of the population of the second group:
H0: µ1 = µ2
At the end of the season, the researcher records the final 100-yard sprint times for each of the 30 runners in the two groups. The means, standard deviations, and standard errors of the mean for the two groups are listed in Table 5.3.
Table 5.3: Sprinter data for parachute experiment
Group 1 Group 2
Mean, M 12.07 seconds 12.21 seconds
Standard deviations, s 0.97 seconds 0.94 seconds
Standard errors of the mean, SEM s1/√n1 = 0.97/√30 = 0.177 s2/√n2 = 0.94/√30 = 0.172
A. Green/Corbis
The two samples are independent, and participants in each group were randomly selected. As the standard errors of the mean reflect, the two samples have similar variability. This similarity is important because one of the assumptions for an independent t test is that both groups have similar variability. The measure of pooled variance is calculated first.
$$SE_{M_1 - M_2} = \sqrt{(SE_{M_1})^2 + (SE_{M_2})^2} = \sqrt{0.177^2 + 0.172^2} = 0.247$$
$$t = \frac{M_1 - M_2}{SE_{M_1 - M_2}} = \frac{12.07 - 12.21}{0.247} = -0.567$$
With df = 30 + 30 − 2 = 58, the critical value is t0.05(58) ≈ 2.002 (two-tailed test).
In this instance, the difference is not statistically significant. The results indicate that training with parachutes had no significant effect on the 100-yard-dash times of high school sprinters. Note that because it was a two-tailed test (i.e., was there an effect?), the sign of t is incidental.
Apply It! boxes written by Shawn Murphy
Another Independent t Test Example
A sociologist wants to know whether optimism about the future differs between those who are employed and those who are unemployed. Data are collected using the Upward Scientific Differential (the Upside) and are listed in Table 5.4.
Table 5.4: Upside scores for unemployed and employed
Employment status Scores
Unemployed 5, 7, 7, 8, 11, 14, 15
Employed 7, 10, 12, 15, 15, 16, 17
1. First, we calculate the mean and standard deviation of each sample:
$$M_1 = 9.571,\quad s_1 = 3.823$$
$$M_2 = 13.143,\quad s_2 = 3.625$$
2. Then, using the standard deviation for each sample, we calculate the associated standard error of the mean:
$$SE_{M_1} = \frac{s_1}{\sqrt{n_1}} = \frac{3.823}{\sqrt{7}} = 1.445$$
$$SE_{M_2} = \frac{s_2}{\sqrt{n_2}} = \frac{3.625}{\sqrt{7}} = 1.370$$
3. Knowing the standard error of the mean for each group allows us to estimate the standard error of the difference:
$$SE_{M_1 - M_2} = \sqrt{(SE_{M_1})^2 + (SE_{M_2})^2} = \sqrt{1.445^2 + 1.370^2} = 1.991$$
4. Using that value, we can calculate the independent t:
$$t = \frac{M_1 - M_2}{SE_{M_1 - M_2}} = \frac{9.571 - 13.143}{1.991} = -1.794$$
determine the number of degrees of freedom:
$$df = 7 + 7 - 2 = 12$$
and compare it to the critical value of t with 12 degrees of freedom:
$$t_{0.05(12)} = 2.179 \text{ (two-tailed test)}$$
A calculated t value of −1.794 is not a large enough difference to be statistically significant. Once again, the proper decision is to fail to reject the null hypothesis. In the distribution of difference scores on which the independent t test is based, this much difference between the two groups could occur because of sampling error, as the distribution of differences in Figure 5.4 suggests. Note that the area for which we would reject the null hypothesis is outside the area from −2.179 to +2.179. Both groups probably belong to the same optimism population.
Try It!: #7 Because the question in the Upside example concerns whether optimism about the future differs between the two groups, is it a one-tailed or a two-tailed test?
Standard Error of the Difference for Unequal Samples
Formula 5.4, to determine the standard error of the difference, is based on the assumption that the two samples have the same numbers of subjects. The fact that the formula treats the two standard errors of the mean values the same bears this out. When the sample sizes differ, the t test statistic looks the same, but the standard error of the difference has to accommodate the unequal sample sizes. Formula 5.4 must be adjusted for n values that are different in each group:
Formula 5.5
$$SE_{M_1 - M_2} = \sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}$$
Although this formula looks complicated, it simply involves the number of scores in each group and the square of the standard deviation (or variance) for each group. Note that the adjustment for unequal sample sizes is made in both the first and second bracketed terms. Be careful with the order of mathematical operations; for example, remember to calculate within the parentheses first.
The steps for computing SEM1−M2 for unequal sample sizes follow:
1. Determine each group’s variance (s²). As the notation suggests, the variance is the square of the standard deviation. If your calculator lacks a standard deviation function and you determine the variance longhand, omit the final step, calculating the square root; the result is the variance. If your calculator does have a standard deviation function, square the result to find the variance. To calculate the variance in Excel, use the VAR function.
2. Take the Group 1 number minus 1 (n1 − 1) and multiply the result by the variance for Group 1 (s1²).
3. Repeat Step 2 for Group 2.
4. Add the results of Steps 2 and 3 together, and divide their sum by the number in the two groups minus 2 (n1 + n2 − 2). Note that this is the degrees of freedom for the problem.
5. Add 1 / n1 and 1 / n2.
6. Multiply the result of Step 4 by the result of Step 5.
7. Find the square root of the result of Step 6.
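The seven steps translate almost line for line into code. The Python sketch below is one possible rendering, with a hypothetical function name (se_difference_pooled) and comments keyed to the step numbers. As a check, it is applied to the equal-sized Upside groups from Table 5.4, where Formula 5.5 should agree with the 1.991 obtained from Formula 5.4.

```python
import math
import statistics


def se_difference_pooled(group1, group2):
    """Estimated standard error of the difference (Formula 5.5)."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)   # Step 1: sample variance, s1 squared
    var2 = statistics.variance(group2)   # Step 1: sample variance, s2 squared
    weighted1 = (n1 - 1) * var1          # Step 2
    weighted2 = (n2 - 1) * var2          # Step 3
    pooled_variance = (weighted1 + weighted2) / (n1 + n2 - 2)  # Step 4
    size_term = 1 / n1 + 1 / n2          # Step 5
    return math.sqrt(pooled_variance * size_term)  # Steps 6 and 7


unemployed = [5, 7, 7, 8, 11, 14, 15]   # Table 5.4
employed = [7, 10, 12, 15, 15, 16, 17]  # Table 5.4

se_equal_n = se_difference_pooled(unemployed, employed)
print(round(se_equal_n, 3))
```

With equal group sizes the pooled formula reduces algebraically to Formula 5.4, so the two approaches differ only when n1 and n2 differ.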
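Figure 5.4: A nonsignificant difference
[Figure: distribution of differences on a horizontal axis running from −3 to +3, with the critical values −2.179 and +2.179 marked and the calculated t of −1.794 falling between them.]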
The Independent t Test with Unequal Sample Sizes
Now, return to the problem of the level of optimism among the employed versus the unemployed. Suppose that the unemployed person who scored 15 on the Upside is no longer willing to be involved in the study and insists that his data be deleted. The remaining data, in that case, are listed in Table 5.5.
Table 5.5: New set of Upside scores
Status Scores
Unemployed 5, 7, 7, 8, 11, 14
Employed 7, 10, 12, 15, 15, 16, 17
From these data, we find M1 = 8.667 and s1 = 3.266, which makes s1² = 10.667.
Group 2 stays unchanged: M2 = 13.143, s2 = 3.625, and s2² = 13.143. The standard error of the difference, then, is
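$$SE_{M_1 - M_2} = \sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}$$
$$= \sqrt{\left[\frac{(6 - 1)(10.667) + (7 - 1)(13.143)}{6 + 7 - 2}\right]\left[\frac{1}{6} + \frac{1}{7}\right]} = 1.929$$
and
$$t = \frac{M_1 - M_2}{SE_{M_1 - M_2}} = \frac{8.667 - 13.143}{1.929} = -2.321$$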
The critical value for the two-tailed test is t0.05(11) = 2.201.
Other things equal, eliminating a score reduces the potential for a significant result because the critical value (the value that calculated t must meet or exceed) becomes slightly higher with fewer scores. In this case, however, removing the score makes the result significant. Why? There are two reasons. First, in this particular problem, taking out the highest score in the lower-scoring group increases the difference between means; the M1 − M2 difference becomes greater. Second, eliminating the highest score from the group also decreases the variability within the group. Less within-group variability results in a smaller value of s1², which in turn reduces the value of the standard error of the difference, SEM1−M2. A smaller denominator in the t ratio and a greater M1 − M2 distance create a larger value of t. The increase is great enough that even with a larger critical value to meet (because df = 11 rather than 12), the calculated t value exceeds the table value and is statistically significant at p = 0.05.
How much difference did using Formula 5.5 instead of 5.4 make? If SEM1−M2 had been calculated using the formula for equal sample sizes, what would its value have been?
$$SE_{M_1} = \frac{s_1}{\sqrt{n_1}} = \frac{3.266}{\sqrt{6}} = 1.333$$
$$SE_{M_2} = \frac{s_2}{\sqrt{n_2}} = \frac{3.625}{\sqrt{7}} = 1.370$$
(SEM2 is the same as before, of course.)
$$SE_{M_1 - M_2} = \sqrt{(SE_{M_1})^2 + (SE_{M_2})^2} = \sqrt{1.333^2 + 1.370^2} = 1.912$$
The adjustment is not very large compared to the SEM1−M2 = 1.929 from the unequal sample sizes formula, but the adjusted value is certainly more accurate. The difference is more apparent as sample sizes become more disparate.
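The comparison between the two formulas can be checked directly. The following Python sketch is illustrative only, with hypothetical function names (se_formula_5_4, se_formula_5_5). It computes both versions of the standard error of the difference for the Table 5.5 data, along with the t value that results from the unequal-sample formula.

```python
import math
import statistics

unemployed = [5, 7, 7, 8, 11, 14]       # Table 5.5, n1 = 6
employed = [7, 10, 12, 15, 15, 16, 17]  # Table 5.5, n2 = 7


def se_formula_5_4(group1, group2):
    """Equal-n formula: square root of the summed squared standard errors."""
    sem1_sq = statistics.variance(group1) / len(group1)
    sem2_sq = statistics.variance(group2) / len(group2)
    return math.sqrt(sem1_sq + sem2_sq)


def se_formula_5_5(group1, group2):
    """Unequal-n formula: pooled variance times (1/n1 + 1/n2), then square root."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * statistics.variance(group1)
              + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    return math.sqrt(pooled * (1 / n1 + 1 / n2))


se_unadjusted = se_formula_5_4(unemployed, employed)
se_adjusted = se_formula_5_5(unemployed, employed)
t_adjusted = (statistics.mean(unemployed) - statistics.mean(employed)) / se_adjusted

print(round(se_unadjusted, 3), round(se_adjusted, 3), round(t_adjusted, 3))
```

Trying the same functions on groups of very different sizes is a quick way to see the gap between the two estimates widen.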
Assumptions Associated with the Independent t Test
Each statistical test is based on certain assumptions or conditions. Those associated with the independent t include the following:
1. The samples are independent.
2. The participants in each group are randomly selected.
3. The measures of the dependent variable must be at least interval scale.
4. The two samples have equivalent variability.
The chapter discussed the first three assumptions earlier. The technical term for the equivalent variability in the final assumption is homogeneity of variance. The term does not refer to exact equality of variability which, of course, will rarely be the case. Homogeneity in this case means that the variability is similar, not identical. Although there are tests to indicate the point at which homogeneity is violated, Excel does not provide one. We will say that when measures of variability (R, s, or s²) are quite different from one group to the other, a researcher should not assume homogeneity of variance. In that case, when completing a problem with Excel, the researcher makes an adjustment.
Although Conditions 1 and 3 are fairly strict requirements, the t test is robust in the face of violations to Conditions 2 and 4.
Running an Independent t Test Using Excel
An expert is interested in analyzing the effect of a mild stimulant on participants’ recall ability. The expert randomly selects 20 volunteers from a larger group and randomly assigns them to two groups. Those in the first group receive a placebo; those in the second group receive a mild stimulant. Members of both groups are given a complex scenario from which they are required to retrieve details. The number of details that the volunteers in each group can correctly retrieve is listed in Table 5.6:
Table 5.6: Recall ability experimental data
Group Scores
Placebo 2, 5, 2, 4, 7, 1, 2, 3, 4, 5
Stimulant 6, 6, 7, 10, 12, 9, 6, 5, 5, 7
Are the retrieval differences statistically significant? We will use Excel to answer the question:
1. Begin a data set in Excel by naming the variables “placebo” (cell A1) and “stimulant” (cell B1).
2. Enter the scores in their respective columns.
3. Click the Data tab, and then click Data Analysis. Because the ranges for the scores in the two samples are fairly similar, 6 and 7 respectively, we can safely assume equal variances.
4. Select t Test: Two-Sample Assuming Equal Variances.
5. Click OK.
6. Excel names the Group 1 data that we put in column A “variable 1.” Indicate that the range for the data in this group is A2:A11. Indicate that the range for “variable 2,” the data for Group 2, is B2:B11.
7. Indicate that the “hypothesized mean difference” is 0 by entering 0 in the box. This reflects the null hypothesis, µ1 = µ2.
8. Select a range for the output (say, C15) and click OK. Expand column C so that all the output shows. The results appear in Figure 5.5.
Figure 5.5: The independent t test in Excel
Source: Microsoft Excel. Used with permission from Microsoft.
The output provides descriptive statistics, including the means and variances for each group. Rather than indicate whether the result is or is not statistically significant, Excel calculates the probability that the calculated value of t could have occurred by chance if both samples came from the same population. The program produces results for both one-tailed and two-tailed tests.
Whenever p is 0.05 or less, the result is statistically significant. The lower the p value in the result, the less likely it is that the difference between the groups occurred by chance. The data indicate that both the one-tailed and two-tailed tests have p values lower than 0.05. Either test would produce a statistically significant outcome.
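For readers without Excel, the core of the same analysis can be approximated in Python. This is a rough sketch, not a reproduction of Excel's routine: the function name two_sample_equal_variance is hypothetical, and the dictionary keys only loosely mirror the labels in the Excel output. Python's standard library has no t distribution, so the sketch stops at the t statistic and degrees of freedom and does not compute p values.

```python
import math
import statistics

placebo = [2, 5, 2, 4, 7, 1, 2, 3, 4, 5]      # Table 5.6
stimulant = [6, 6, 7, 10, 12, 9, 6, 5, 5, 7]  # Table 5.6


def two_sample_equal_variance(var1_data, var2_data, hypothesized_diff=0.0):
    """Pooled-variance independent t test (no p value is computed)."""
    n1, n2 = len(var1_data), len(var2_data)
    mean1, mean2 = statistics.mean(var1_data), statistics.mean(var2_data)
    variance1 = statistics.variance(var1_data)
    variance2 = statistics.variance(var2_data)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance1 + (n2 - 1) * variance2) / df
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2 - hypothesized_diff) / se_diff
    return {
        "Mean": (mean1, mean2),
        "Variance": (variance1, variance2),
        "Observations": (n1, n2),
        "Pooled Variance": pooled,
        "df": df,
        "t Stat": t_stat,
    }


result = two_sample_equal_variance(placebo, stimulant)
for label, value in result.items():
    print(label, value)
```

The resulting t statistic can then be compared with a table value for 18 degrees of freedom, which is the manual counterpart of reading the p values from the Excel output.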
5.5 The Confidence Interval of the Difference
When an independent t test result is statistically significant, it means that the two samples probably represent different populations, or, to put it another way, they belong to populations with different values for their means (µ1 − µ2 ≠ 0). In such situations, the means of the samples are estimates of their respective population means. The confidence interval, called the confidence interval of the difference for the independent t test, allows us to estimate the difference between the means of those populations. The confidence interval of the difference uses this formula:
Formula 5.6
$$CI_{0.95} = \pm t\,(SE_{M_1 - M_2}) + |M_1 - M_2|$$
where
CI0.95 = the 0.95 confidence interval, used when the t test is conducted at p = 0.05. It indicates that there is a 1 − 0.05 = 0.95 probability of capturing the true value of the difference between the means of the populations represented by the samples.
The ±t means that the table value is used twice, once as a positive value times the standard error of the difference and a second time as a negative value.
The value of t used is from the t table (rather than the calculated t value), given the df for the problem and the p value at which the test was conducted. The t value for df = 18 (the Excel problem) and p = 0.05 is 2.101.
SEM1−M2 = the standard error of the difference, the denominator in the t ratio.
|M1 − M2| = the absolute (non-negative) value of the difference between the sample means, as the vertical bars around the difference indicate.
A confidence interval for the Excel problem begins with calculating SEM1−M2, since Excel does not show it with the other results:
1. Starting from the variances in the output, s = √s²:
$$s_1^2 = 3.389,\quad s_1 = \sqrt{3.389} = 1.841$$
$$s_2^2 = 5.344,\quad s_2 = \sqrt{5.344} = 2.312$$
$$SE_{M_1} = \frac{1.841}{\sqrt{10}} = 0.582$$
$$SE_{M_2} = \frac{2.312}{\sqrt{10}} = 0.731$$
$$SE_{M_1 - M_2} = \sqrt{(SE_{M_1})^2 + (SE_{M_2})^2} = \sqrt{0.582^2 + 0.731^2} = 0.934$$
2. Now, the confidence interval of the difference:
$$CI_{0.95} = \pm t\,(SE_{M_1 - M_2}) + |M_1 - M_2|$$
$$CI_{0.95} = +2.101(0.934) + 3.8 \quad\text{and}\quad -2.101(0.934) + 3.8$$
$$CI_{0.95} = 5.762,\ 1.838$$
The absolute value of the M1 2 M2 difference is just the difference between the means, a value that is never negative.
The ±t (SEM1−M2) results, added to the difference between the means, provide the range of values within which the difference between the related population means will occur 95% of the time. Stated formally, the result is: With 0.95 confidence, the difference in the population means represented by these samples is somewhere between 1.838 and 5.762.
The statistically significant t test value indicated that the two samples probably belong to distinct populations. The confidence interval of the difference estimates how much difference is between those respective population means.
The size of the confidence interval depends upon three things:
1. the difference between the sample means
2. the critical value of t (which depends upon degrees of freedom and therefore sample size)
3. the size of the standard error of the difference for the problem
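Formula 5.6 and the three quantities just listed fit naturally into a small function. The sketch below is a minimal illustration with a hypothetical function name (confidence_interval_of_difference). The inputs are the rounded values from the Excel problem: group means of 3.5 and 7.3 computed from Table 5.6, a standard error of the difference of 0.934, and the table value t = 2.101.

```python
def confidence_interval_of_difference(mean1, mean2, se_diff, t_critical):
    """Return (lower, upper) limits per Formula 5.6.

    The interval is centered on |M1 - M2| and extends t_critical * se_diff
    in each direction.
    """
    center = abs(mean1 - mean2)
    margin = t_critical * se_diff
    return center - margin, center + margin


# Rounded values from the recall-ability (Excel) problem.
lower, upper = confidence_interval_of_difference(
    mean1=3.5, mean2=7.3, se_diff=0.934, t_critical=2.101
)
print(round(lower, 3), round(upper, 3))
```

Changing any one argument while holding the others fixed shows how each of the three factors moves or widens the interval.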
5.6 Determining a Result’s Practical Significance
Results that are statistically significant are not always important. Remember that significance indicates that a result is not likely to have occurred by chance, a non-random (but not necessarily important) occurrence. How do we know if an outcome has practical relevance? Researchers calculate effect sizes to gauge how important the outcome is.
When a result is significant, calculate an effect size to indicate the outcome’s importance. One very useful effect-size calculation for the independent t test is omega-squared (ω²). For the recall ability (Excel) problem in Figure 5.5, omega-squared is
Formula 5.7
$$\omega^2 = \frac{t^2 - 1}{t^2 + n_1 + n_2 - 1}$$
where
t = the calculated value of t from the problem, −4.066
n1 = the number in the first group, 10
n2 = the number in the second group, 10
$$\omega^2 = \frac{t^2 - 1}{t^2 + n_1 + n_2 - 1} = \frac{(-4.066)^2 - 1}{(-4.066)^2 + 10 + 10 - 1} = \frac{15.532}{35.532} = 0.437$$
In Excel, the t test indicated that subjects’ use of a stimulant versus a placebo had a statistically significant effect on recall ability, so we know that the difference between the two groups is probably not random. The omega-squared value indicates how much of the difference between the two groups’ recall ability can be attributed to the treatment.
The ω² = 0.437 result suggests that about 44% of the variance between these two groups can be explained by whether the subjects used a stimulant or a placebo. With nearly half of the variance between groups explained by the treatment, researchers would judge this an important result. Assume for the moment that this had been a one-tailed test because the researcher asked, “Will those receiving a mild stimulant retrieve significantly more items than those receiving the placebo?” Further, assume that the calculated value of t = 1.734, which would be a statistically significant value for p = 0.05 and 18 degrees of freedom. The corresponding value is ω² = (1.734² − 1)/(1.734² + 10 + 10 − 1) = 0.091. While that result is statistically significant (not random), we might question its practical importance. The use of the stimulant versus the placebo explains just 9% of the variance. The other 91% of the variance (the difference between the two groups) is unexplained.
Returning to the original Excel problem, we might wonder about the unexplained 56% of the variance. Any study will have differences between the groups that are unrelated to the treatment. Even randomly selected groups are unlikely to have identical levels of recall ability to begin with, so some of the difference between the groups pre-existed the treatment. Omega squared tells us how much of the difference we can attribute to the treatment. In this case, about 44% of the difference results from the treatment, which, if the data were not contrived to begin with, would be noteworthy. In situations where t is not significant, there is no need to calculate an effect size, because any difference between means is attributed to sampling error rather than the treatment.
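Because omega-squared depends only on t and the two group sizes, both values discussed in this section can be reproduced with a one-line function. The Python sketch below is an illustration with a hypothetical function name (omega_squared). It applies Formula 5.7 to the calculated t of −4.066 and to the hypothetical t of 1.734.

```python
def omega_squared(t, n1, n2):
    """Effect size for the independent t test (Formula 5.7)."""
    t_squared = t ** 2
    return (t_squared - 1) / (t_squared + n1 + n2 - 1)


effect_excel_problem = omega_squared(-4.066, 10, 10)  # recall-ability problem
effect_borderline = omega_squared(1.734, 10, 10)      # hypothetical one-tailed case

print(round(effect_excel_problem, 3), round(effect_borderline, 3))
```

Since t is squared, its sign has no bearing on the effect size; only its magnitude and the group sizes matter.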
5.7 Writing Up Statistics
The independent t test is often used in social and behavioral science research. The particular language researchers use varies, but the question is whether two groups that initially represented the same population differ sufficiently as a result of the independent variable that after the treatment, they represent different populations.
Interested in the organization of sensorimotor cortices, Longo, Long, and Haggard (2012) studied the tendency of people with a missing limb to have vivid experiences of the absent limb and to consistently overestimate its size. With the presence or absence of the limb as the independent variable, they used t tests to compare the size of the remaining limb, a hand and fingers, for example, to the size of the missing limb. Whether the limb was absent at birth or lost sometime later, t test results showed subjects consistently overestimated the length of fingers and underestimated the distance between knuckles.
Bear and Babcock (2012), meanwhile, explored gender differences in negotiation. Using gender as the independent variable, they studied the success of women versus men in different types of negotiations and found that when subjects viewed the topic as masculine, men outperformed women. When subjects viewed the negotiated topic as feminine, no significant differences resulted.
5.8 Statistical Tests, Variables, and Research Design
In any t test, the question that drives the analysis is whether the independent variable (IV) has prompted sufficient change in the groups that they no longer represent a common population. In such questions, the t test is the analytical component of a research design, a formal plan for testing whether, in the case of the t test, two samples remain elements of the same population. Recall from Chapter 1 that a research design is something of a “blueprint” for the researchers to follow as they complete their study. Different research designs pursue different kinds of research questions, but when researchers want to know whether an independent variable has had so much impact that two samples no longer belong to the same parent population, they often use a t test analysis.
The IV is the treatment used in a research problem. Because different groups receive different levels of the treatment, the IV defines the groups involved. In the recall-ability example, the IV was whether participants were administered the stimulant or the placebo. The dependent variable (DV) is the measure that is the subject of analysis. In the recall example, the DV was the subject’s score, recall ability, which represented the number of items each participant remembered.
In an independent t test, the IV always involves a nominal scale variable. The DV is interval or ratio scale. The following questions reflect these requirements; each could be answered with an independent t test:
• Do clinically depressed people (IV) consume more caffeine (DV) than nonclinically depressed people?
• Does verbal praise (IV) prompt increased levels of verbal interaction (DV) among seminar participants?
• Do prison psychologists and school psychologists (occupation, IV) differ in salary (DV)?
• Do those who do or do not receive flu shots (IV) miss significantly different numbers of work days (DV)?
• Are verbal ability scores (DV) different for social science majors than for communications majors (major, IV)?
The way the groups are created also relates to research design. Although we commonly hear references to “an experiment,” truly experimental research involves the random selection of participants and also their random assignment to groups. Although participants can be randomly selected, clinical depression cannot be randomly assigned, so the first question in the list above could never be a true experiment. The second and fourth questions in the list could be experimental designs if a group of participants is randomly selected and then half are randomly assigned to each condition. The third and last questions, like the first, belong to designs that are nonexperimental because the independent variable cannot be assigned, even if those in the two groups were randomly selected.
Although the independent t test allows just one IV at a time, later we will encounter procedures that allow for multiple IVs. A quasi-experimental design is one for which one of the IVs is assigned but another cannot be. If we are asking whether female and male students differed in recall ability after receiving either a stimulant or a placebo, the experiment would be quasi-experimental. Although the stimulant can be randomly assigned, gender is beyond the researcher’s control.
Summary and Resources
Chapter Summary
Gosset’s tests help to liberate the researcher. Using the one-sample test, researchers can compare samples to populations without needing to determine σM, a value which is often not available (Objective 1). The independent t test requires none of the population values and allows the researcher to examine two independent samples for significant differences. The independent t is widely used in research (Objectives 2 and 7).
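As a reminder of how the one-sample t test sidesteps σM, the following minimal sketch estimates the standard error of the mean from the sample standard deviation (SEM = s / √n) and then computes t = (M − µ) / SEM. The function name `one_sample_t` and the scores are hypothetical and chosen only for illustration; they are not data from the chapter.

```python
import math
import statistics


def one_sample_t(scores, mu):
    """Return (t, df, sem) for a one-sample t test against population mean mu."""
    n = len(scores)
    sample_mean = statistics.mean(scores)
    s = statistics.stdev(scores)   # sample standard deviation (n - 1 in the denominator)
    sem = s / math.sqrt(n)         # estimated standard error of the mean
    t = (sample_mean - mu) / sem
    return t, n - 1, sem


# Hypothetical scores, for illustration only.
scores = [12, 15, 11, 14, 13, 16, 10, 13]
t, df, sem = one_sample_t(scores, mu=12.0)
print(f"t({df}) = {t:.3f}, SEM = {sem:.3f}")
```

The calculated t is then compared with the critical value from the t table for the same degrees of freedom.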
In hypothesis testing, two predictions are relevant to the t tests. In the case of the one-sample test, either the sample represents the population to which it is compared, or it does not. In the case of the independent t, either both samples belong to populations with the same mean, or they do not. The hypotheses simplify the way we report statistical results, although the use of one-tailed versus two-tailed tests adds another element. Recall that one-tailed tests provide an alternate hypothesis that is directional. It predicts how the mean of the population represented by the first group will differ from the mean of the population represented by the second, instead of simply predicting a difference.
monkeybusinessimages/iStock/Thinkstock
Experimental research is research in which members of the groups in the study are randomly selected and the treatment is randomly assigned. For example, experimental research would study the correlation between flu shots and workdays missed by randomly selecting a group of participants and randomly assigning half to receive flu shots.
Be careful not to confuse the type of t test with the type of hypothesis. The one-sample t test compares the sample to the population. The one-tailed t test makes a prediction about how the first group differs from the second (Objectives 3 and 4).
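The difference between one-tailed and two-tailed decisions can be sketched as a small decision rule. This is an illustration under stated assumptions rather than a procedure taken from the chapter: the function name `is_significant` and the `predicted_sign` argument are hypothetical, and the critical values shown are the standard t-table entries for df = 9 at the 0.05 level (2.262 two-tailed, 1.833 one-tailed).

```python
def is_significant(t, critical_value, tails="two", predicted_sign=1):
    """Return True when the calculated t falls in the rejection region.

    tails="two": the rejection region is split between both tails, so only
    the size of t matters.
    tails="one": the whole rejection region sits in the predicted tail;
    predicted_sign is +1 when the hypothesis predicts mu1 > mu2 and -1
    when it predicts mu1 < mu2.
    """
    if tails == "two":
        return abs(t) >= critical_value
    return predicted_sign * t >= critical_value


# Standard table values for df = 9 at the 0.05 level.
TWO_TAILED_CRITICAL = 2.262
ONE_TAILED_CRITICAL = 1.833

t = 2.0  # hypothetical calculated value
print(is_significant(t, TWO_TAILED_CRITICAL, tails="two"))
print(is_significant(t, ONE_TAILED_CRITICAL, tails="one", predicted_sign=1))
print(is_significant(-t, ONE_TAILED_CRITICAL, tails="one", predicted_sign=1))
```

The same calculated t can be significant under a one-tailed hypothesis and not under a two-tailed one, which is why the hypothesis must be fixed before the data are analyzed.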
When an independent t test result is significant, it indicates that the samples probably represent populations with different means. The confidence interval of the difference estimates the difference between the means of those two populations. As such, it is an “interval estimate,” in contrast with the “point estimate” that the sample means in the t test provide (Objective 6).
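The contrast between the point estimate and the interval estimate can be sketched as follows. The sketch assumes a pooled-variance standard error of the difference, which may be written differently from the formula used earlier in the chapter; the function name `ci_of_difference` and both data sets are hypothetical. The critical value 2.228 is the standard two-tailed table entry for df = 10 at the 0.05 level.

```python
import math
import statistics


def ci_of_difference(group1, group2, t_critical):
    """Return (difference, lower, upper) for the difference between two means.

    Assumes independent groups and a pooled-variance standard error.
    """
    n1, n2 = len(group1), len(group2)
    diff = statistics.mean(group1) - statistics.mean(group2)  # point estimate
    var1 = statistics.variance(group1)  # sample variances (n - 1 denominators)
    var2 = statistics.variance(group2)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of the difference
    margin = t_critical * se_diff
    return diff, diff - margin, diff + margin


# Hypothetical data, for illustration only; df = 6 + 6 - 2 = 10.
group1 = [8, 10, 12, 10, 9, 11]
group2 = [6, 8, 7, 9, 7, 5]
diff, lower, upper = ci_of_difference(group1, group2, t_critical=2.228)
print(f"point estimate = {diff:.2f}, 0.95 interval = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes zero is consistent with a significant two-tailed t at the same level.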
Effect-size calculations (and other procedures you will encounter later in the book) keep the researcher grounded with the reminder that a significant result only indicates that the result is not likely to be a random occurrence. Omega-squared (ω²) estimates how much an independent variable affects a dependent variable (Objective 5).
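A minimal sketch of the effect-size calculation, assuming the common form of omega-squared for an independent t test, ω² = (t² − 1) / (t² + n1 + n2 − 1). The function name `omega_squared` and the input values are hypothetical, and the chapter's own notation for the formula may differ.

```python
def omega_squared(t, n1, n2):
    """Estimate the proportion of DV variance attributable to the IV.

    Uses omega^2 = (t^2 - 1) / (t^2 + n1 + n2 - 1), a common form for the
    independent t test.
    """
    t_sq = t ** 2
    return (t_sq - 1) / (t_sq + n1 + n2 - 1)


# Hypothetical result: t = 3.0 with 10 participants in each group.
effect = omega_squared(3.0, n1=10, n2=10)
print(f"omega-squared = {effect:.3f}")
```

Under these assumed values, roughly 29% of the variability in the dependent variable would be attributed to the independent variable.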
As Gosset’s tests liberated us from the need for parameters, Fisher’s analysis of variance (ANOVA) test is going to relieve us of the restriction to two groups. Chapter 6 discusses analysis of variance. It is an important development in statistical analysis, but the logic involved is consistent with that of the t test.
Key Terms
alternate hypothesis The hypothesis that predicts that two samples belong to populations with distinct means.
confidence interval of the difference Estimates, with a specified level of probability, the difference between the means of the two populations represented by the samples in the t test.
critical value A value from a table that indicates the point at which a calculated test result is statistically significant.
distribution of difference scores The theoretical population upon which the independent t test is based.
effect sizes Indicators of the practical importance of a statistically significant outcome.
experimental research A type of research in which the group members in a study are randomly selected and the treatment is randomly assigned.
homogeneity of variance The variance that occurs when multiple groups manifest similar variability.
independent t test A test to determine whether two samples likely belong to populations with identical means.
nonexperimental research A type of research which does not perform random selection or random assignment.
null hypothesis The hypothesis that predicts that two samples belong to populations with identical means for the independent t test.
omega-squared A procedure for determining how much of the difference between groups can be attributed to the independent variable.
one-sample t test A test of whether a sample is likely to belong to a specified population.
one-tailed test When a t test is based on an initial hypothesis predicting the direction of the difference between population means, μ1 > μ2, for example, the test is a one-tailed test; a significant result can occur in only one tail of the distribution.
quasi-experimental research Research in which some variables are randomly assigned, some are not.
standard error of the differences The measure of variability in the distribution of difference scores, symbolized SE_{M1−M2}.
two-tailed test When a t test is based on the assumption that the sample comes from a population that has either a significantly larger or smaller value than the population to which it is compared, it is a two-tailed test; a significant result can occur in either tail of the distribution.
Review Questions
Answers to the odd-numbered questions are provided in Appendix A.
1. A group of clients have anxiety scores as follows:
67, 55, 88, 74, 69, 81, 72, 70
a. What is the value of SEM?
b. Is this group representative of the population of all who are treated for anxiety disorders for whom µ = 66.0?
2. Applicants to a graduate program in psychology have the following scores on the Graduate Record Exam, Quantitative portion:
375, 400, 425, 425, 490, 500, 510, 530
Do they represent the national population for whom µ = 500?
3. A second therapist working with those who have anxiety-related disorders has clients with the following scores:
52, 58, 64, 67, 67, 69, 70, 71
Is this group significantly different from the group in Review Question 1?
4. The test statistics for the one-sample t test and the independent t test both include measures of within-group data variability. What are they?
5. A researcher conducts an independent t test using the 0.05 criterion for significance.
a. If H0 is rejected, what is the probability of alpha error? Of beta error?
b. What is the alternate hypothesis if the question is whether there is a significant difference? What if the question is whether the first group is significantly lower than the second group?
6. A researcher is interested in whether significantly more questions are asked at a town hall meeting when the politician provides verbal reinforcement for each question. The politician responds “thank you” each time someone in group 1 asks a question. The group 2 subjects do not receive any response from the politician. The numbers of questions that group participants asked in six 30-minute sessions are as follows:
Group 1: 13, 15, 12, 17, 14, 14
Group 2: 10, 12, 12, 11, 13, 9
a. Is the difference significant?
b. Write out the alternate hypothesis for this problem.
c. What will omega squared indicate? What is its value?
7. A researcher wants to analyze differences in patients’ attitudes about the care they receive when they receive a rebate on their bill. One group receives a rebate; the second does not.
a. What are the IV and DV?
b. The IV data in an independent t must have what scale?
c. The DV data in an independent t must have what scale?
8. What is the relationship between sample size and the likelihood of a significant finding?
9. A social worker randomly selects 20 cases from among the unemployed and assigns half to each of two groups. The first group is a control group and receives no special treatment. Those in the second group receive a weekly newsletter about job opportunities. During the study, one of the people in group 2 disappears. The data indicate the number of weeks unemployed. The question is whether group 2 people are unemployed for less time than the group 1 people:
Group 1: 10, 12, 14, 14, 15, 15, 15, 16, 16, 18
Group 2: 6, 9, 9, 10, 12, 12, 12, 13, 14
a. Is the test one-tailed or two-tailed?
b. What is the alternate hypothesis?
c. Is t significant? If so, with 0.95 confidence, what is the interval between the means of the populations represented by these samples?
10. Twenty prison inmates who volunteer to participate in a study are given scholastic aptitude tests and found to have M = 44.552, s = 6.577. A group of 20 non-offending adults randomly selected from a local mall is given the same test and has M = 49.979 and s = 5.101. What is a 0.95 confidence interval of the difference between the means of the populations represented by these two groups?
Answers to Try It! Questions
1. The critical values for t are highest in distributions with the fewest degrees of freedom. This adjusts for the fact that small distributions are less likely to emulate the characteristics of a normal population.
2. If degrees of freedom for a one-sample t test are 14, n = 15, because df = n − 1.
3. As long as the question is whether the two groups are significantly different, the test is two-tailed.
4. The alternate hypothesis would have been HA: µ1 ≠ µ2 and the decision would have been to fail to reject H0. With a critical value of 2.262 in the two-tailed test, the difference would not have been statistically significant.
5. Yes, it would have changed the decision. Because the original problem was a one-tailed test (remember that the nurses’ assistants were predicted to have more compassion than health professionals generally), there is no rejection region in the other tail of the distribution. The entire rejection region is in the positive tail.
6. The answer is no. Once an analysis has been completed it is unethical to, in effect, change the hypotheses after the fact. It is entirely appropriate, however, to gather new data, frame new hypotheses, and repeat the study. In fact this type of replication is very important to developing a body of research.
7. It is a two-tailed test. The word “differs” allows for a difference in either direction.