chapter 5
The t-Test
Learning Objectives
After reading this chapter, you will be able to. . .
1. explain the advantage of the one-sample t-test over the z-test.
2. compare the one-sample t-test to the independent t-test.
3. distinguish between one-sample and one-tailed t-tests.
4. explain hypothesis testing in statistical analysis.
5. determine practical significance.
6. construct a confidence interval for the difference between the means.
7. discuss research applications for the t-tests.
8. present results of a t-test and draw conclusions based on hypotheses.
9. interpret t-test results and report in APA format.
10. discuss nonparametric Mann-Whitney U-test compared to the t-test.
suk85842_05_c05.indd 137 10/23/13 1:17 PM
CHAPTER 5Section 5.1 Estimating the Standard Error of the Mean
The z-test (Chapter 4) is more than just an expansion of the z score to groups because it introduces statistical significance. Those who work with quantitative data need to be able to distinguish between outcomes that probably occurred by chance and those that are likely to emerge each time the data is gathered and analyzed. When data indicates that a group of clients, each grieving over the loss of a loved one, becomes more positive and peaceful with time, the therapist needs to know whether this would have happened anyway with the passage of time, or whether it has something to do with the treatment the therapist provided. The z-test answers such questions.
The z-test has important limitations. The greatest difficulty is the need to have a value for the population standard error of the mean, $\sigma_M$. It is not the type of information that tends to come up as a matter of course, and it can be fairly difficult to calculate when the researcher does not have access to population data.
The second limitation is that the z-test allows only one type of comparison: a sample to a population. What if two therapists want to compare their respective groups of grieving clients to see if one type of therapy for grief counseling is better than the other? The z-test provides nowhere to go with such questions, which is where William Sealy Gosset comes in.
Gosset worked for Guinness Brewing during the early part of the 20th century. Part of his responsibility was quality control, and he studied ways to make sure that day-to-day brewing remained consistent with Guinness’s standards. A man of remarkable ability, Gosset devised procedures for quantifying product quality and then testing the consistency of the quality over time. To this end, he developed t-tests.
Gosset had some sense of how important the t-tests were and wanted to publish so that others could benefit. The roadblock was a no-publishing policy at Guinness, a policy that was begun after one of Gosset’s predecessors published what the Guinness people considered trade secrets. Although he felt that Guinness would not be compromised by the publication of information about t-tests, Gosset published under the pseudonym “Student.” In traditional statistics textbooks, there are still references to “Student’s t.”
The pseudonym suggests Gosset’s unpretentious nature, or at least his lack of ego. In the course of his career as a statistician, he became associated with two other giants in the field, Karl Pearson and Ronald Fisher. They were men of substantial ego, but Gosset maintained good relations with both, even while they came to despise each other.
5.1 Estimating the Standard Error of the Mean
The first problem Gosset solved was how to work around the need for the population standard error of the mean, $\sigma_M$, that the z-test requires. The z-test statistic requires two parameter (population) values: the mean ($\mu_M$) and the standard error of the mean ($\sigma_M$).
$$z = \frac{M - \mu_M}{\sigma_M}$$
If the value of $\sigma_M$ is not published, it can be difficult to determine. Although it can be calculated by dividing the population standard deviation by the square root of the number in the sample ($\sigma_M = \sigma/\sqrt{n}$), that is small consolation if the population standard deviation is also not available.
Gosset’s solution was to use sample values (statistics) as estimates of the population values (parameters). When samples are large and randomly selected, the sample mean (M) can estimate the value of $\mu$, and the sample standard deviation (s) can estimate $\sigma$.
What if the sample is not large enough to emulate the population? When the sample is not like the population, the sampling error results in statistics that are not good estimates of parameter values. Gosset’s ingenious solution was to build in a correction for sampling error that conforms to the likelihood of making such errors. Because small samples have the greatest risk of sampling error, the smallest samples also require the biggest corrections.
To determine the standard error of the mean, Gosset used the sample standard deviation (s) as an estimate of the population standard deviation ($\sigma$). So instead of $\sigma_M = \sigma/\sqrt{n}$, he estimated the standard error of the mean this way:
$$\mathrm{SEM} = \frac{s}{\sqrt{n}}$$
(Formula 5.1)
Where
SEM = the estimated standard error of the mean
s = the standard deviation of the sample
n = the number in the sample
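Formula 5.1 can be sketched in a few lines of Python. This is only an illustration: the function name `estimated_sem` and the five scores are hypothetical, and the sketch assumes that the standard library's `statistics.stdev` (which divides by n − 1) matches the sample standard deviation used in this chapter.

```python
import math
import statistics


def estimated_sem(scores):
    """Estimated standard error of the mean (Formula 5.1): s / sqrt(n)."""
    s = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    n = len(scores)               # the number in the sample
    return s / math.sqrt(n)


# Hypothetical scores, chosen only to make the arithmetic easy to follow
scores = [4, 6, 8, 10, 12]
print(round(estimated_sem(scores), 3))  # 1.414
```

Because n appears in the denominator, the same standard deviation yields a smaller estimated standard error as the sample grows.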
5.2 The One-Sample t-Test
The one-sample t is a test of whether a sample is likely to belong to a specified population. Using the estimated standard error of the mean (SEM) in the place of $\sigma_M$ makes the test statistic for the one-sample t-test as follows:
$$t = \frac{M - \mu_M}{\mathrm{SEM}}$$
(Formula 5.2)
Where
t = the calculated value of t
M = the sample mean
$\mu_M$ = the mean of the distribution of sample means
SEM = the estimated standard error of the mean
The numerator of the one-sample t-test is the same as the numerator for the z-test because both tests answer the same question: Is the sample characteristic of the population to which it is compared? With the t-test, we are freed from the need to have the population standard error of the mean. However, choosing to rely on statistics from samples instead of population values comes with some risk, the sampling error discussed in Chapter 4.
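As a rough sketch of how Formulas 5.1 and 5.2 fit together, the function below computes t from summary statistics alone. The function name and the summary values (a sample mean of 52, a population mean of 50, a sample standard deviation of 4, and n = 16) are hypothetical and were picked so the result can be checked by hand.

```python
import math


def one_sample_t_from_summary(sample_mean, pop_mean, sample_sd, n):
    """One-sample t (Formula 5.2) with the SEM from Formula 5.1 as the denominator."""
    sem = sample_sd / math.sqrt(n)        # Formula 5.1: estimated standard error
    return (sample_mean - pop_mean) / sem  # Formula 5.2


# Hypothetical summary values: SEM = 4 / sqrt(16) = 1, so t = (52 - 50) / 1
t = one_sample_t_from_summary(sample_mean=52.0, pop_mean=50.0, sample_sd=4.0, n=16)
print(t)  # 2.0
```

Note that no population standard deviation appears anywhere in the call; that is the practical advantage over the z-test.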
One important difference between z and the t-tests is that z involves just one distribution, the standard normal distribution, and its characteristics are well-known (Table A in the Critical Values Tables Appendix). Because there is just the one z distribution, one value of z (1.96 for a two-tailed test at p = .05) is always the point at which a result is statistically significant. But there are many t distributions, and each one has slightly different characteristics. Consequently, the value that indicates statistical significance in a t-test depends on the particular distribution, and the larger the sample size, the closer the t distribution is to the z distribution, as seen in Figure 5.1.
Figure 5.1: Comparison of t distributions to a z distribution
Recall that degrees of freedom (df), discussed in Chapter 1 (Section 1.5), are the number of scores free to vary when the final value of a characteristic is known. The value that indicates significance in each t distribution is indexed to the particular distribution’s degrees of freedom. For the one-sample t-test, degrees of freedom are the number of scores in the sample, minus one, n − 1.
• If the sample size (n) = 10, then df = 9.
• If n = 30, df = 29, and so on.
Each time the sample size changes, the degrees of freedom change, and it is a different t distribution with a different critical value to indicate significance. The risk that a sample will not reflect the population accurately is greatest with the smallest samples, so to protect against the errors associated with small samples, that is where the values that indicate significance are largest. As df increase, the critical values decline until ultimately the critical values for t match the critical value for z. For the critical value of t to equal that of z, the sample must reach an infinite size; this is not a practical possibility, but the point still has value.
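The relationship between sample size, df, and the critical value can be illustrated with a small lookup. The sketch below copies a handful of two-tailed p = .05 critical values from this chapter's Table 5.1; the dictionary and function names are hypothetical, and only the listed df values are covered.

```python
import math

# Selected two-tailed critical values of t at p = .05, copied from Table 5.1.
# math.inf stands in for the table's infinite-df row, where t equals z.
T_CRIT_05_TWO_TAILED = {
    1: 12.706, 2: 4.303, 5: 2.571, 9: 2.262,
    11: 2.201, 20: 2.086, 29: 2.045, math.inf: 1.96,
}


def df_one_sample(n):
    """Degrees of freedom for the one-sample t-test: n - 1."""
    return n - 1


dfs = sorted(T_CRIT_05_TWO_TAILED)
values = [T_CRIT_05_TWO_TAILED[df] for df in dfs]

# Each critical value is smaller than the one before it as df increase.
is_declining = all(a > b for a, b in zip(values, values[1:]))

for df in dfs:
    print(df, T_CRIT_05_TWO_TAILED[df])
print("declines as df increase:", is_declining)
print("n = 10 gives df =", df_one_sample(10),
      "and critical value", T_CRIT_05_TWO_TAILED[df_one_sample(10)])
```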
Calculating the One-Sample t
The steps for completing the one-sample t follow:
1. Calculate the sample mean (Formula 1.1) and the sample standard deviation (Formula 1.3).
2. Calculate the estimated standard error of the mean (Formula 5.1).
3. Calculate the value of t (Formula 5.2).
4. Compare the calculated t to the critical value for t according to the appropriate df.
[Figure 5.1 curves: standard normal z distribution; t distribution (n close to 30); t distribution (n < 30). All three curves are centered at 0.]
A psychologist working for the police department in a major city is interested in the level of job stress among law enforcement officers. The question is whether job stress is significantly different among police officers than it is in the general population, where stress has a mean of 27.353. The tool test initiative (the “too-tite,” for short) is administered to 10 randomly selected police officers. Their too-tite scores are as follows:
22, 26, 29, 29, 33, 35, 36, 38, 40, 42
1. Verify that for this sample, the mean (M) = 33.0, and the sample standard deviation (s) = 6.412.
Note that because $\mu$ = 27.353, $\mu_M$ = 27.353. Remember that the mean of a population based on individual values equals the mean of the distribution of sample means, or $\mu = \mu_M$ (Chapter 4).
2. Because the estimated standard error of the mean (SEM) = $s/\sqrt{n}$ (Formula 5.1), SEM = $6.412/\sqrt{10}$ = 2.028.
3. $t = \frac{M - \mu_M}{\mathrm{SEM}} = \frac{33.0 - 27.353}{2.028} = 2.785$.
4. To compare the calculated t to the critical value for t, consult Table 5.1, which appears again as Table B in the Appendix. For this example, we are interested in the columns under “Two-Tailed Tests.”
Table 5.1: t Distribution critical values

df    Two-Tailed Tests       One-Tailed Tests
      p = .05    p = .01     p = .05    p = .01
1     12.706     63.657      6.314      31.821
2     4.303      9.925       2.920      6.965
3     3.182      5.841       2.353      4.541
4     2.776      4.604       2.132      3.747
5     2.571      4.032       2.015      3.365
6     2.447      3.707       1.943      3.143
7     2.365      3.499       1.895      2.998
8     2.306      3.355       1.860      2.896
9     2.262      3.250       1.833      2.821
10    2.228      3.169       1.812      2.764
11    2.201      3.106       1.796      2.718
12    2.179      3.055       1.782      2.681
13    2.160      3.012       1.771      2.650
14    2.145      2.977       1.761      2.624
15    2.131      2.947       1.753      2.602
16    2.120      2.921       1.746      2.583
17    2.110      2.898       1.740      2.567
18    2.101      2.878       1.734      2.552
19    2.093      2.861       1.729      2.539
20    2.086      2.845       1.725      2.528
21    2.080      2.831       1.721      2.518
22    2.074      2.819       1.717      2.508
23    2.069      2.807       1.714      2.500
24    2.064      2.797       1.711      2.492
25    2.060      2.787       1.708      2.485
26    2.056      2.779       1.706      2.479
27    2.052      2.771       1.703      2.473
28    2.048      2.763       1.701      2.467
29    2.045      2.756       1.699      2.462
30    2.042      2.750       1.697      2.457
∞     1.96       2.576       1.645      2.326
Table adapted from SHAZAM. (2011). Critical values of the t distribution. Retrieved from: shazam.econ.ubc.ca/intro/critval.htm
• The first column in the table is for degrees of freedom. In our example, n = 10, so df = 9.
• The second column has the critical values of t when the test is conducted at p = .05. The probability level (p) at which the test is conducted sets the probability of a type I, or alpha ($\alpha$), error: finding a result significant when it actually occurred by chance. The .05 level is the traditional default alpha level for statistical significance.
• The third column has the critical values of t when testing at p = .01.
For p = .05, the critical value for df = 9 is 2.262. In order not to confuse the critical value of t with the calculated value from the t-test (2.785), it is helpful to write the critical value and note its df and the level of probability for the test as follows:
$t_{crit\,.05(9)} = 2.262$ is less than $t_{obs\,.05(9)} = 2.785$
The subscripts to t indicate that the test was conducted at p = .05 with 9 df. Based on this information, the police psychologist concluded that the mean job stress level among law enforcement officers for the police department in a major city (M = 33.0) is significantly higher (t(9) = 2.785, p < .05) than the general population job stress level ($\mu$ = 27.353).
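The four calculation steps for the police example can be reproduced from the raw scores. The sketch below uses only the scores, the population mean, and the Table 5.1 critical value given in the text; the variable names are hypothetical, and the report line is one possible layout rather than a prescribed format.

```python
import math
import statistics

scores = [22, 26, 29, 29, 33, 35, 36, 38, 40, 42]  # too-tite scores from the text
mu = 27.353     # general-population mean from the text
t_crit = 2.262  # Table 5.1: two-tailed, p = .05, df = 9

n = len(scores)
m = statistics.mean(scores)       # step 1: sample mean
s = statistics.stdev(scores)      # step 1: sample standard deviation
sem = s / math.sqrt(n)            # step 2: Formula 5.1
t_obs = (m - mu) / sem            # step 3: Formula 5.2
df = n - 1

# Step 4: in a two-tailed test the sign of t is ignored.
significant = abs(t_obs) >= t_crit

print(f"M = {m:.1f}, s = {s:.3f}, SEM = {sem:.3f}")
print(f"t({df}) = {t_obs:.3f}, critical value = {t_crit}, significant: {significant}")
```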
Interpreting t-Test Results
If these were z-test results, simply comparing the calculated value to 1.96 would indicate whether they were significant. Determining significance for the t-test requires consulting the table since the value that indicates significance, the critical value, depends on the degrees of freedom for the particular problem.
In the example just completed, t = 2.785. From the table of critical values, $t_{.05(9)}$ = 2.262. When the calculated value is equal to or larger than the critical value, the result is statistically significant. This is illustrated in Figure 5.2, where the plus and minus 1.96 values associated with statistical significance in a z-test are replaced with the plus and minus 2.262 for a t-test with 9 degrees of freedom. The concept is the same as for the z-test: The critical value indicates the point at which the calculated value is unlikely to have occurred by chance. The difference is that the value changes for each t distribution, and it is determined by degrees of freedom.
Is job stress significantly different between law enforcement officers and the general public? According to this data, it is. Law enforcement officers are characteristic of some population with higher job stress than what is present in the general public.
Another Example
Consider the following research on stress. The psychologist decides to compare secondary school teachers’ levels of stress to the job stress of the general public. For 12 randomly selected classroom teachers, the stress scores are as follows:
19, 21, 22, 25, 27, 27, 32, 33, 35, 35, 36, 37
1. Verify that M = 29.083 and s = 6.374; $\mu_M$ is still 27.353.
2. SEM = $s/\sqrt{n}$ = $6.374/\sqrt{12}$ = 1.840
3. $t = \frac{M - \mu_M}{\mathrm{SEM}} = \frac{29.083 - 27.353}{1.840} = 0.940$
4. $t_{.05(11)}$ = 2.201
Try It! A: In which distributions are the critical values for t highest?
Figure 5.2: Statistical significance in a t-test with 9 degrees of freedom
Because the sample size, and therefore the df, has changed, the critical value is different for this problem than it was for the example about stress among police officers. At the p = .05 level with df = 11, the difference between the secondary school teachers and the general population is not statistically significant; the calculated value of t is less than the critical value from the table. Although the critical value declines as sample size (and so df) increases, t = 0.940 is never going to be statistically significant, as scanning down the table of critical values will verify. At p = .05, there is no critical value as small as 0.940, and the critical values at p = .01 are larger yet. Even with a sample of infinite size, the lowest critical value when testing at p = .05 is 1.96, the same as the critical value for the z-test.
The One-Tailed Test
The z and t significance tests to this point have all been two-tailed tests. The two tails refer to the fact that a significant z-test or t-test result can occur in either extreme of the distribution. In a two-tailed test, it does not matter whether the values of z or t are positive (in the upper tail of the distribution) or negative (in the lower tail of the distribution). The result is significant as long as the absolute calculated value, the value without regard to the sign, is as large as or larger than the critical value.
In the first t-test example, if law enforcement officers’ too-tite scores had been the same distance below the mean for the population as they were above, that is, if M = 21.706 instead of 33.0 and the SEM had remained the same at 2.028, the t value would have been
$$t = \frac{M - \mu_M}{\mathrm{SEM}} = \frac{21.706 - 27.353}{2.028} = -2.785$$
Try It! B: If the degrees of freedom for a one-sample t-test (df) = 14 for a problem, what must the sample size be?
The negative t value would have been significant in a two-tailed test because its absolute value (2.785) exceeds the critical value (2.262). The sign of t does not matter. Sometimes, rather than significantly different, which can be either a positive or a negative difference, the issue is whether a sample mean is significantly greater or less than the population mean. The police psychologist might have framed the question this way:
Are police officers significantly more stressed than the general population?
With that wording, the analysis becomes a one-tailed test. Language such as more stressed or less stressed asks for a prediction about how the sample is expected to differ from the population rather than just asking whether the sample may be different from the population. It specifies the direction of the difference (more or less) and so the specific tail in which the difference will occur, rather than allowing for a difference in either tail.
Apply It!
Testing a Pilot Program
A troubled middle school in San Francisco has turned to a transcendental meditation program known as “quiet time” to relieve high stress, increase test scores, and improve behavior among participating students. The principal of the middle school wants to determine whether the quiet time is producing results and decides to compare grade point averages (GPAs) between the quiet time students and the other students at the school.
The question is whether GPAs are significantly higher among the meditating students than they are among the student population in general. Prior research at other schools has indicated that GPAs will improve; therefore, the principal has specified the direction of the difference by asking if the GPAs will be higher. This wording indicates a one-tailed t-test. Because of widespread skepticism about the program from school board members, the principal chooses a conservative p = .01 criterion for statistical significance.
During the 2010 school year, 10% of the student population was chosen at random to participate in the quiet time program. For that year, the grade point average for the student population as a whole was 2.53. The GPAs for 12 randomly selected students who had been involved in the quiet time program were as follows:
2.49, 2.85, 3.18, 2.82, 2.89, 2.50, 3.26, 3.20, 2.48, 2.93, 2.67, 2.47
For these 12 students, M = 2.812 and s = 0.295.
SEM = $s/\sqrt{n}$ = $0.295/\sqrt{12}$ = 0.085
(continued)
Try It! You can find many Internet applets to help calculate significance. Buckingham Browne & Nichols School provides one such resource you can explore. Follow the link provided here to learn more.
http://www.bbn-school.org/us/math/ap_stats/applets/applets.html#anchor4673498
The “Statistically Significant” Region in the One-Tailed Test
Why is there so much emphasis on the wording? What difference does it make? The way the question is framed determines whether both tails of the t-distribution are examined for a significant difference, or just one.
For tests where p = .05, the area where a result is statistically significant is in the most extreme 5% of the distribution.
• In two-tailed tests, the difference can be in either direction, with the resulting value being positive or negative. That is, the most extreme 5% of the distribution is divided into the lowest 2.5% in the left tail of the distribution and the highest 2.5% in the right tail. Distribution A in Figure 5.3 indicates a two-tailed test.
• When prior research indicates what to expect in an analysis, sometimes a one-tailed test is adopted. In this instance, the entire 5% region in which a result is statistically significant is placed in one tail.
Apply It! (continued)
Note that because $\mu$ = 2.530, $\mu_M$ = 2.530.
$$t = \frac{M - \mu_M}{\mathrm{SEM}} = \frac{2.812 - 2.530}{0.085} = 3.318$$
To compare the calculated t to the critical value for t, consult Table 5.1 and look for the columns for one-tailed tests; they are the final two columns in the table. For this analysis with p = .01 and df = 11, the critical value is $t_{.01(11)}$ = 2.718.
At the p = .01 level with df = 11, the results are statistically significant because 3.318 > 2.718. Therefore, students who had been enrolled in the quiet time program had significantly higher GPAs than those of the general middle school student population.
By using statistics, the principal was able to present quantitative evidence that the quiet time program is associated with higher student GPAs. Based on this analysis, as well as on other tests measuring stress and behavioral issues, the principal was able to raise additional funds to expand the pilot program.
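The quiet time analysis can be checked from the raw GPAs. The sketch below is an illustration with hypothetical variable names; it carries full precision through every step, so the calculated t comes out close to, but not identical to, the 3.318 obtained above from rounded values of M and SEM. The statistical decision is the same either way.

```python
import math
import statistics

gpas = [2.49, 2.85, 3.18, 2.82, 2.89, 2.50, 3.26, 3.20, 2.48, 2.93, 2.67, 2.47]
mu = 2.53       # GPA of the whole student population, from the text
t_crit = 2.718  # Table 5.1: one-tailed, p = .01, df = 11

n = len(gpas)
m = statistics.mean(gpas)
sem = statistics.stdev(gpas) / math.sqrt(n)  # Formula 5.1
t_obs = (m - mu) / sem                       # Formula 5.2

# One-tailed test predicting a HIGHER mean: only a positive t at or beyond
# the critical value counts, so the sign is kept rather than dropped.
significant = t_obs >= t_crit

print(f"t({n - 1}) = {t_obs:.3f}; significant at p = .01, one-tailed: {significant}")
```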
Apply It! Boxes written by Shawn Murphy.
Consider the first t-test example with the police officers. If the question had been, Are police officers significantly more stressed than the general public? then this would have been the one-tailed test, with a rejection region like that depicted in the B distribution in Figure 5.3.
Figure 5.3: Two-tailed and one-tailed t-tests
The Critical Values in the One-Tailed Test
One- and two-tailed tests employ different critical values. This is because two-tailed tests cut off the extreme 2.5% of the distribution in each tail, whereas one-tailed tests cut off the most extreme 5% of the distribution in just one tail. Placing all 5% in one tail moves the point at which a z or t score becomes statistically significant closer to the mean of the distribution than in a two-tailed test. If the difference is in the direction predicted, less extreme values are statistically significant in a one-tailed test.
[Figure 5.3 panels: A. Statistically significant regions in a two-tailed test (lowest 2½% and highest 2½%). B. Statistically significant region in a one-tailed test where the sample is predicted to be higher than the population (highest 5%).]
Try It! C: A researcher asks whether intelligence scores are significantly different for math majors and journalism majors. Is the researcher calling for a one-tailed or a two-tailed test?
• In a one-tailed z-test, for example, z = 1.645 cuts off the most extreme upper 5% of the distribution instead of the z = 1.96 required for a two-tailed test.
• In the t-test with df = 9, the critical value for two-tailed tests was $t_{.05(9)}$ = 2.262. For the equivalent one-tailed test, it is $t_{.05(9)}$ = 1.833.
• The critical values for one-tailed tests are in the last two columns of Table 5.1.
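The z values in the first bullet can be recovered from the standard normal distribution itself. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the observed t of 2.0 is a hypothetical value chosen to fall between the two df = 9 critical values.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1
alpha = 0.05

# Two-tailed: alpha is split, leaving 2.5% beyond the critical value in each tail.
z_two_tailed = z.inv_cdf(1 - alpha / 2)
# One-tailed: the full 5% sits in a single tail.
z_one_tailed = z.inv_cdf(1 - alpha)
print(round(z_two_tailed, 3), round(z_one_tailed, 3))

# Critical values for df = 9 from Table 5.1
t_two_tailed, t_one_tailed = 2.262, 1.833

t_obs = 2.0  # hypothetical calculated t, in the predicted (positive) direction
significant_one_tailed = t_obs >= t_one_tailed
significant_two_tailed = abs(t_obs) >= t_two_tailed
print(significant_one_tailed, significant_two_tailed)
```

The same calculated value clears the one-tailed criterion but not the two-tailed one, which is what "less extreme values are statistically significant" means in practice.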
The downside to one-tailed tests is that if something unexpected occurs and there is an extreme value of t or z in the opposite direction to the one predicted, there is no way to find the result statistically significant because the entire significance region is in the other tail. For this reason, some researchers and analysts do not use one-tailed tests. From an opposing perspective, predicting the tail, as in a one-tailed test, places the full 5% critical $\alpha$-level in the right or left extreme of the distribution. This makes for a more powerful test in the predicted direction: 5% of the distribution falls in the rejection region of that tail, compared to 2.5% in a two-tailed test. It also increases the probability of a type I error in that tail, as there is a higher chance of falsely rejecting the null hypothesis when the difference is in the predicted direction. See Figure 5.4 for a comparison of one- versus two-tailed tests and their respective critical $\alpha$-level (or rejection of the null) areas.
Figure 5.4: One-tailed versus two-tailed tests
5.3 Hypothesis Testing
In statistical testing, there are two possible outcomes. The calculated value either is, or is not, statistically significant. So before the analysis even begins, two predictions cover all possible results. A hypothesis is a prediction based on an underlying premise. The prediction that the result will not be statistically significant is called the null hypothesis. The prediction that the result will be significant is called the alternate (research) hypothesis. Both hypotheses make their predictions in terms of population values.
[Figure 5.4 panels: A. One-tailed: a 0.05 critical α-level in a single tail, beyond z = 1.645 or beyond z = −1.645. B. Two-tailed: a 0.025 critical α-level in each tail, beyond z = −1.96 and beyond z = 1.96.]
• The null hypothesis predicts that the mean of the population from which the sample was drawn ($\mu_1$) has the same value as the mean of the population ($\mu_2$) to which it is compared, or $\mu_1 = \mu_2$.
• The symbol for the null hypothesis is $H_0$, and it is written this way: $H_0: \mu_1 = \mu_2$.
A statistically significant z or t value indicates that the sample probably does not represent the population to which it is compared; it is characteristic of some other population. The hypothesis that predicts this outcome is called the alternate (research) hypothesis for a two-tailed test, and its form for the z-test and t-test is
$H_a: \mu_1 \neq \mu_2$.
Every z-test or t-test involves both hypotheses because either prediction may hold for a given test, but whatever the outcome, results are stated in terms of the null hypothesis. The two possible outcomes are these:
• When a result is statistically significant, the decision is to reject the null hypothesis. Results indicate that the sample likely belongs to a population other than the one to which it was compared, so the possibility that $\mu_1 = \mu_2$ is rejected.
• When a result is not statistically significant, the decision is to fail to reject the null hypothesis, rather than to “accept the null hypothesis.” This is because it would be extremely difficult to prove that $\mu_1 = \mu_2$. Just because the calculated value is not great enough to reject the null hypothesis does not necessarily mean that $\mu_1 = \mu_2$. We can be sure only that there is not enough of a difference to be confident in rejecting $H_0$.
The Null and Alternate Hypotheses in a One-Tailed Test
In one-tailed tests, the null hypothesis is still $H_0: \mu_1 = \mu_2$, but the alternate hypothesis is altered to reflect the more specific prediction. Think about the example of the police officers and job stress. If the analyst had predicted that law enforcement officers would have higher job stress than the general population, the alternate hypothesis would have been
$H_a: \mu_1 > \mu_2$
The symbols reflect a prediction that the population from which the sample was drawn was expected to have a higher mean stress level than the mean stress level of the general population.
Note that the first population mean, $\mu_1$, represents the mean of the population from which the sample was drawn. This is consistent with the fact that in the test statistics for z and t, M precedes $\mu_M$. If an analysis predicts that the sample belongs to a population with a lower mean than the population to which it is compared, the alternate hypothesis is
$H_a: \mu_1 < \mu_2$
Try It! The hypothesis stating “no difference” has been referred to as the “nil” hypothesis. Visit Null Hypothesis: The Journal of Unlikely Science to read an atypical explanation of hypothesis testing.
http://www.null-hypothesis.co.uk/science//item/what_is_a_null_hypothesis
Another Example
A hospital administrator has read a study indicating that compassion for patients is higher among nursing assistants than among health care professionals generally and wishes to test that finding at San Juan General Hospital. The mean for all health professionals on the Index of Careful Recuperative Effort (the I-CARE) is 19.600. For 10 nursing assistants, I-CARE results are
15, 17, 20, 20, 23, 23, 23, 25, 27, 28
Is the difference statistically significant?
The null hypothesis for this analysis is
$H_0: \mu_1 = \mu_2$
To test whether nursing assistants have more compassion for patients than the population of health care professionals do, the alternate hypothesis reflects a one-tailed test:
$H_a: \mu_1 > \mu_2$
To complete the test,
1. Verify that for the I-CARE sample, M = 22.1, and s = 4.149.
2. Note that SEM = $s/\sqrt{n}$ = $4.149/\sqrt{10}$ = 1.312
3. $t = \frac{M - \mu_M}{\mathrm{SEM}} = \frac{22.1 - 19.600}{1.312} = 1.905$
4. From the column for one-tailed tests, note that the critical value is $t_{.05(9)}$ = 1.833.
With the entire rejection region in one tail of the distribution, the calculated value of t is more extreme than the critical value from the table. The results are statistically significant. Reject $H_0$.
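The link between the form of the alternate hypothesis and the statistical decision can be sketched as a small function. The function name `decide` and its `alternative` labels ("different", "greater", "less") are hypothetical conveniences; the I-CARE scores, the population mean, and the critical value come from the example above.

```python
import math
import statistics


def decide(t_obs, t_crit, alternative):
    """State the decision about H0 for a given alternate hypothesis.

    alternative: "different" (Ha: mu1 != mu2, two-tailed),
                 "greater"   (Ha: mu1 > mu2, one-tailed),
                 "less"      (Ha: mu1 < mu2, one-tailed).
    t_crit is the positive table value from the matching column.
    """
    if alternative == "different":
        reject = abs(t_obs) >= t_crit   # either tail counts
    elif alternative == "greater":
        reject = t_obs >= t_crit        # only the upper tail counts
    elif alternative == "less":
        reject = t_obs <= -t_crit       # only the lower tail counts
    else:
        raise ValueError("alternative must be 'different', 'greater', or 'less'")
    return "reject H0" if reject else "fail to reject H0"


icare = [15, 17, 20, 20, 23, 23, 23, 25, 27, 28]  # I-CARE scores from the text
mu = 19.6                                          # mean for all health professionals
sem = statistics.stdev(icare) / math.sqrt(len(icare))
t_obs = (statistics.mean(icare) - mu) / sem

# One-tailed critical value for p = .05, df = 9 (Table 5.1)
print(round(t_obs, 3), decide(t_obs, 1.833, "greater"))
```

The function never returns "accept H0", mirroring the convention that a nonsignificant result only fails to reject the null hypothesis.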
5.4 The Independent t-Test
Gosset’s one-sample t-test represents an important development to anyone who needs to make a judgment about whether a particular sample belongs to a specified population. It relieves the researcher of the need to have that standard error of the mean parameter ($\sigma_M$), which sometimes is not available. In Gosset’s next test, everything needed to complete the test can be calculated from the sample data. No population parameters are necessary. The formula refers to a parameter value, but it is one that remains constant for all such tests, so it is not something that an analyst or researcher needs to look for.
Try It! D: Had the I-CARE problem been a two-tailed test, (a) what would the alternate hypothesis have been, and (b) what would the correct decision have been?
Try It! E: If the t value in the I-CARE problem had been −1.905 instead of positive 1.905, would that have changed the statistical decision?
The z-test and the one-sample t-test are important tools for researchers and data analysts, but they still leave many questions unanswered. What if the question is whether the difference between two samples is statistically significant? Suppose a psychologist working with substance abusers wishes to know which of two therapeutic approaches is more productive. Questions that compare two samples to each other are the domain of the independent t-test.
The word independent in the term “independent t-test” refers to the samples in the analysis. This test requires that the two samples involve separate groups of people; those who are in one group cannot also be in the other.
The Population Based on Difference Scores
The z-test and the one-sample t-test are based on the distribution of sample means. A frequency distribution of sample means makes it possible to ask questions about whether a particular sample is likely to have been drawn from a specified population.
The independent t-test is based on a population created from “difference scores.” Rather than sampling one group at a time and then plotting the sample mean to create the distribution of sample means, this population is created by
• selecting two samples of the same size,
• computing the sample mean for each,
• subtracting the second mean from the first ($M_1 - M_2$), and
• plotting the difference between the two means in a frequency distribution.
If this process is repeated enough times, all the individual scores would be represented in the distribution, making the resulting population a distribution of difference scores. As with the distribution of sample means, the fact that the result is a population does not require that each score be registered individually, only that all scores be represented. The shape of the distribution is affected by the following outcomes:
• When the mean of the first sample is greater than the mean of the second sample ($M_1 > M_2$), the difference between the pair of sample means will be positive, a result plotted in the right half of the distribution.
• When $M_1 < M_2$, the difference is negative, and the difference is plotted in the left half of the distribution.
• Occasionally, $M_1$ and $M_2$ might have the same value and $M_1 - M_2 = 0$, a score that occurs in the middle of the distribution.
The parameter mean for this distribution of difference scores is symbolized $\mu_{M_1 - M_2}$. The subscript $M_1 - M_2$ reminds us that the mean is based on the differences between all the pairs of sample means. Like the mean of the z distribution, the mean of the distribution of difference scores is always 0, but not because it is mathematically manipulated.
There will usually be some difference between the mean of the first and the mean of the second; however, the mean of all those $M_1 - M_2$ differences will be 0. This is because over many pairs of samples, the positive differences counterbalance the negative differences, so $\mu_{M_1 - M_2} = 0$.
CHAPTER 5Section 5.4 The Independent t-Test
The variability in all the difference scores is reflected in another parameter value called the standard error of the difference: $\sigma_{M_1 - M_2}$. The symbols in the notation indicate that this parameter is the standard deviation of all of those theoretical difference scores. In practice, the standard error of the difference is estimated, just as the standard error of the mean in the one-sample t-test is estimated.
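The idea of a distribution of difference scores can be illustrated with a small simulation. The sketch below is not part of the chapter's procedure; the population (10,000 roughly normal scores with a mean of 50 and a standard deviation of 10), the sample size, and the function name are all hypothetical choices made only for illustration.

```python
import random
import statistics

def simulate_difference_scores(population, n, pairs, seed=0):
    """Draw `pairs` pairs of samples of size n and return each M1 - M2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(pairs):
        m1 = statistics.mean(rng.choices(population, k=n))
        m2 = statistics.mean(rng.choices(population, k=n))
        diffs.append(m1 - m2)
    return diffs

# Hypothetical population (an assumption, not data from the chapter).
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 10) for _ in range(10_000)]

diffs = simulate_difference_scores(population, n=8, pairs=5_000)
mean_diff = statistics.mean(diffs)   # expected to land near 0
se_diff = statistics.stdev(diffs)    # empirical standard error of the difference
print(round(mean_diff, 3), round(se_diff, 3))
```

Because both samples in every pair come from the same population, the mean of the simulated differences should sit close to 0, and their standard deviation plays the role of the standard error of the difference.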
Using the Distribution of Difference Scores to Explain Results
The mean of the distribution of difference scores is always 0, regardless of which population is involved. What is the importance of the distribution of difference scores to the independent t-test? It indicates how much $M_1 - M_2$ difference can be expected when pairs of samples are drawn from the same population, and the mean of the second is subtracted from the mean of the first. It provides a range of the random differences we can expect when the samples are drawn from the same population. When differences exceed those expected chance differences, the explanation is that the samples probably do not represent the same population. This is the conclusion that a statistically significant value of t in the independent t-test indicates.
The Hypotheses in the Independent t-Test
The null hypothesis in an independent t-test looks the same as it does for the one-sample t-test ($H_0$: $\mu_1 = \mu_2$, or $\mu_1 - \mu_2 = 0$), but it is explained differently. The null hypothesis predicts that the mean of the population represented by the first sample is the same as the mean of the population represented by the second sample. In other words, the samples come from the same population.
The alternate hypothesis predicts that this is not so, that the samples have been drawn from different populations. If the independent t-test is one-tailed, the alternate hypothesis indicates the direction of the predicted difference, just as with the one-sample t-test: either $H_a$: $\mu_1 < \mu_2$ or $H_a$: $\mu_1 > \mu_2$.
From z Scores, the z-Test, and the One-Sample t to the Independent t-Test
There is an important harmony in each of the statistical procedures discussed so far.
1. The numerator for each formula involves a difference score.
• For the z score, the numerator is $x - M$.
• For the z-test, the numerator is $M - \mu_M$.
• For the one-sample t-test, the numerator is also $M - \mu_M$.
2. The denominator for each formula involves a measure of data variability.
• For the z score, the denominator is $s$.
• For the z-test, the denominator is $\sigma_M$.
• For the one-sample t-test, the denominator is SEM.
These patterns are repeated in the independent t-test where, as with the previous tests, the numerator is a difference score and the denominator is a measure of data variability.
The Test Statistic for the Independent t-Test
The difference score that is the numerator in the independent t-test is
$$M_1 - M_2 - \mu_{M_1 - M_2}$$
It is the mean of the first sample minus the mean of the second sample minus the mean of the distribution of differences. Because we already know that $\mu_{M_1 - M_2} = 0$, that part of the term is usually omitted, so the numerator becomes just $M_1 - M_2$.
Recall that the standard deviation of all the difference scores in the distribution of difference scores is the standard error of the difference: $\sigma_{M_1 - M_2}$. The estimate of that parameter value is the statistic SEd, which is the denominator in the independent t-test. This is how the test statistic looks:
$$t = \frac{M_1 - M_2}{SE_d}$$ (Formula 5.3)
Where
$M_1$ = the mean of the first sample
$M_2$ = the mean of the second sample
$SE_d$ = the estimated standard error of the difference
There are two samples in this test, and SEd must sum variability within both of them. For this reason, SEd is sometimes called a “measure of pooled variance.” When the sample sizes are equal, the formula for this statistic is as follows:
$$SE_d = \sqrt{(SEM_1)^2 + (SEM_2)^2}$$ (Formula 5.4)
Where
$(SEM_1)^2$ = the square of the estimated standard error of the mean for sample 1
$(SEM_2)^2$ = the square of the estimated standard error of the mean for sample 2
$SEM = s/\sqrt{n}$
We used the estimated standard error of the mean (SEM) to measure data variability in the one-sample t-test. It was the denominator in the t statistic. Because the independent t-test involves two samples, the standard error of the mean must be calculated for both samples so that the standard error of the difference, SEd, can combine variance from within both groups.
An Independent t-Test Example
Suppose that a sport psychologist is interested in athletes’ willingness to take risks as they participate in their sport. With research to suggest that having a winning record prompts athletes to repeat what has been successful in the past and to be more conventional in their play, the psychologist reasons that risk taking will be lower in athletes with winning
records than in those with losing records. Using the Demonstration Assessment for Ready Evaluation (DARE), the psychologist has data from two teams of eight volleyball players, the first with a winning record, the second with a losing record. Note that the hypothesis is that risk taking will be higher in the team with the losing record. This has implications for the alternate hypothesis and the way the results are interpreted. The DARE scores for members of the two teams are as follows:
Team with the winning record: 37, 51, 57, 60, 66, 62, 68, 69
Team with the losing record: 34, 37, 44, 47, 51, 54, 57, 59
The hypotheses for this problem are
$H_0$: $\mu_1 = \mu_2$
$H_a$: $\mu_1 < \mu_2$
Because the prediction is that risk taking will be higher in the team with the losing record, and that team is the second team in the test statistic, the alternate hypothesis has the form $\mu_1 < \mu_2$.
The steps to completing the independent t-test follow:
1. Calculate the means and standard deviations for each group.
2. Calculate the standard error of the mean for each group.
3. Calculate the standard error of the difference.
4. Calculate the value of t.
5. For the independent t-test, degrees of freedom are $n_1 + n_2 - 2$, or the number in Group 1 plus the number in Group 2 minus 2. (The df is calculated by subtracting 1 for each group; since there are two groups, the total subtracted is 2.)
6. Compare the value of t to the critical value of t.
Following are the results. Note that the subscripts to M, s, and SEM indicate the group that the statistic represents: the team with the winning record (1) or the team with the losing record (2).
1. $M_1 = 58.75$, $s_1 = 10.634$
   $M_2 = 47.875$, $s_2 = 9.109$
2. $SEM_1 = s_1/\sqrt{n_1} = 10.634/\sqrt{8} = 3.760$
   $SEM_2 = s_2/\sqrt{n_2} = 9.109/\sqrt{8} = 3.221$
3. $SE_d = \sqrt{(SEM_1)^2 + (SEM_2)^2} = \sqrt{3.760^2 + 3.221^2} = 4.951$
4. $t = \dfrac{M_1 - M_2}{SE_d} = \dfrac{58.75 - 47.875}{4.951} = 2.197$
5. $df = 8 + 8 - 2 = 14$
6. The critical value for $t_{.05(14)} = 1.761$
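The six steps above can be checked with a few lines of Python. This is a minimal sketch that assumes equal sample sizes, as Formula 5.4 does; the function name is hypothetical, and because the code carries full precision rather than rounding at each step, the last decimal place may differ slightly from the hand calculation.

```python
import math
import statistics

def independent_t_equal_n(sample1, sample2):
    """Return (t, SEd, df) using Formulas 5.3 and 5.4 (equal sample sizes)."""
    if len(sample1) != len(sample2):
        raise ValueError("Formula 5.4 assumes equal sample sizes")
    n = len(sample1)
    sem1 = statistics.stdev(sample1) / math.sqrt(n)   # Step 2, group 1
    sem2 = statistics.stdev(sample2) / math.sqrt(n)   # Step 2, group 2
    se_d = math.sqrt(sem1**2 + sem2**2)               # Step 3
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se_d  # Step 4
    return t, se_d, 2 * n - 2                         # Step 5: n1 + n2 - 2

# DARE scores from the example
winning = [37, 51, 57, 60, 66, 62, 68, 69]
losing = [34, 37, 44, 47, 51, 54, 57, 59]

t, se_d, df = independent_t_equal_n(winning, losing)
print(f"t = {t:.3f}, SEd = {se_d:.3f}, df = {df}")
```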
The critical value of t is from the table for one-tailed tests because the prediction was that the winning team would have a lower (rather than just a different) mean level of risk taking than the losing team.
The calculated t value exceeds the table value. Was the result statistically significant?
• The prediction was that the level of risk taking would be lower among players who have a winning record (Group 1) than among those who have a losing record (Group 2).
• But in fact, the group with the winning record had a higher mean risk-taking score than the losers.
• Because it is a one-tailed test for which the prediction was that t would be negative, the entire rejection region was in the negative tail of the t distribution.
• No positive value of t, no matter how extreme, can be significant in a one-tailed test when the sign of t is the opposite of what was predicted.
This problem magnifies the gamble taken with one-tailed tests. If the result is in the direction opposite to what is predicted, the magnitude of the value does not matter. The proper statistical decision is to fail to reject the null hypothesis.
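The decision rule for a one-tailed test can be written out explicitly. The sketch below is only an illustration of the logic described above; the function name and the `predicted_sign` argument are hypothetical conveniences, not notation from the chapter.

```python
def one_tailed_decision(t, critical, predicted_sign):
    """Decide a one-tailed test.

    predicted_sign is -1 when Ha predicts M1 < M2 (a negative t)
    and +1 when Ha predicts M1 > M2 (a positive t).
    """
    in_predicted_tail = (t * predicted_sign) > 0
    if in_predicted_tail and abs(t) > critical:
        return "reject H0"
    return "fail to reject H0"

# Risk-taking example: Ha was mu1 < mu2, but the calculated t was positive.
decision = one_tailed_decision(t=2.197, critical=1.761, predicted_sign=-1)
print(decision)
```

Even though 2.197 is larger than 1.761, the function returns "fail to reject H0" because the value falls in the tail opposite to the one predicted.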
Try It!
F. Because the risk-taking problem produced a result that was not in the tail predicted and so was not statistically significant, would it be appropriate to just run the test again as a two-tailed test?
Apply It!
Sport Science
Sport science is a discipline that studies the application of scientific principles and techniques with the aim of improving sporting performance. A sport scientist is interested in determining if a certain training method will change the 100-yard dash times of sprinters. Does training with a parachute on your back (a small one opened up behind you, creating a drag) affect sprint times? It might seem obvious that it improves times, but the concern is that perhaps it alters the way sprinters run in a way that leads to poorer running mechanics and slower times.
The sport scientist used high school sprinters to test his theory. No population data was available, so an independent, two-tailed t-test was used with p = .05. This is a two-tailed test because the scientist is interested only in determining if this training has an effect. It is unknown whether the training will lower or raise the average sprint times.
Sixty high school sprinters were chosen at random from throughout the country. During the track season, 30 sprinters incorporated parachutes into their normal training (Group 1), and the remaining 30 (Group 2) did not. The null hypothesis is
$H_0$: $\mu_1 = \mu_2$
Another Independent t-Test Example
A sociologist is interested in whether optimism about the future differs between those who are employed and those who are unemployed. Data are collected with the Upward Scientific Differential (the Upside) and are as follows:
Unemployed: 5, 7, 7, 8, 11, 14, 15
Employed: 7, 10, 12, 15, 15, 16, 17
1. $M_1 = 9.571$, $s_1 = 3.823$
   $M_2 = 13.143$, $s_2 = 3.625$
Apply It! (continued)
The null hypothesis predicts that the mean of the population represented by the first group is the same as the mean of the population of the second group. At the end of the season, the final 100-yard sprint times were recorded for each of the 30 runners in the two groups. The means, standard deviations, and standard errors of the mean for the two groups were the following:
$M_1 = 12.07$ seconds, $s_1 = 0.97$ seconds
$M_2 = 12.21$ seconds, $s_2 = 0.94$ seconds
$SEM_1 = s_1/\sqrt{n_1} = 0.97/\sqrt{30} = 0.177$
$SEM_2 = s_2/\sqrt{n_2} = 0.94/\sqrt{30} = 0.172$
The two samples are independent, and participants in each group were randomly selected. As can be seen for the standard errors of the mean, the two samples have similar variability. Therefore, the assumptions for an independent t-test are met. The measure of pooled variance is calculated first.
$$SE_d = \sqrt{(SEM_1)^2 + (SEM_2)^2} = \sqrt{0.177^2 + 0.172^2} = 0.2468$$
$$t = \frac{M_1 - M_2}{SE_d} = \frac{12.07 - 12.21}{0.2468} = -0.567$$
With $df = n_1 + n_2 - 2 = 30 + 30 - 2 = 58$, the critical value for $t_{.05(58)}$ is approximately 2.002 (two-tailed test).
In this instance, the difference is not statistically significant. The results indicate that training with parachutes has no significant effect on the 100-yard-dash times of high school sprinters.
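When only summary statistics are reported, as in this box, the same calculation can be run from the means, standard deviations, and group sizes. The following is a minimal sketch under the equal-n assumption of Formula 5.4; the function name is hypothetical, and the degrees of freedom follow the $n_1 + n_2 - 2$ rule given in the steps earlier in this section.

```python
import math

def t_from_summary(m1, s1, n1, m2, s2, n2):
    """Independent t from summary statistics (Formulas 5.3 and 5.4)."""
    if n1 != n2:
        raise ValueError("Formula 5.4 assumes equal sample sizes")
    sem1 = s1 / math.sqrt(n1)
    sem2 = s2 / math.sqrt(n2)
    se_d = math.sqrt(sem1**2 + sem2**2)
    return (m1 - m2) / se_d, se_d, n1 + n2 - 2

# Parachute-training summary statistics from the box
t_sprint, se_sprint, df_sprint = t_from_summary(12.07, 0.97, 30, 12.21, 0.94, 30)
print(f"t = {t_sprint:.3f}, SEd = {se_sprint:.4f}, df = {df_sprint}")
```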
Apply It! Boxes written by Shawn Murphy.
2. $SEM_1 = s_1/\sqrt{n_1} = 3.823/\sqrt{7} = 1.445$
   $SEM_2 = s_2/\sqrt{n_2} = 3.625/\sqrt{7} = 1.370$
3. $SE_d = \sqrt{(SEM_1)^2 + (SEM_2)^2} = \sqrt{1.445^2 + 1.370^2} = 1.991$
4. $t = \dfrac{M_1 - M_2}{SE_d} = \dfrac{9.571 - 13.143}{1.991} = -1.794$
5. $df = 7 + 7 - 2 = 12$
6. The critical value for $t_{.05(12)} = 2.179$ (two-tailed test).
With a calculated t value of −1.794, the difference is not large enough to be statistically significant, because the absolute value of t is smaller than the critical value of 2.179. The proper decision is once again to fail to reject the null hypothesis. In the distribution of difference scores on which the independent t-test is based, differences in optimism between these two groups are within the amount that could have been expected to occur by chance.
SEd for Unequal Samples
Formula 5.4 for the standard error of the difference is based on the assumption that the two samples have the same size. The fact that the formula calls for equal treatment of the two standard errors of the mean values bears this out. When the sample sizes differ, the t statistic looks the same, but the standard error of the difference has to accommodate the unequal sample sizes. Formula 5.5 adjusts for n values that are different in each group:
$$SE_d = \sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$ (Formula 5.5)
Do not be distracted by the length of the formula. It involves just the number of scores in each group and the square of the standard deviation for each group, a statistic called the variance, which came up in Chapter 1. Be careful, however, with the order of mathematical operations; remember, for example, to work within the parentheses first.
The steps for completing the SEd for unequal sample sizes are as follows:
1. Determine each group’s variance ($s^2$). As the notation indicates, it is just the standard deviation squared. If your calculator does not have a standard deviation function and you do the variance longhand, just delete the final step, which is the square root, and the result is the variance. If your calculator has a standard deviation function, just square the result to get the variance. If the variance is calculated in Excel, the command is VAR.
2. Take the number in Group 1 minus 1 ($n_1 - 1$) and multiply the result by the variance for Group 1 ($s_1^2$).
3. Repeat Step 2 for Group 2.
4. Add the results of Steps 2 and 3 together and divide their sum by the number in the two groups minus 2 ($n_1 + n_2 - 2$).
5. Add the result of $1/n_1$ plus $1/n_2$.
6. Multiply the result of Step 4 and the result of Step 5.
7. The last step is to find the square root of the result of Step 6.
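The seven steps translate directly into a short function. The sketch below is one way to write Formula 5.5 in Python; the function and variable names are hypothetical. As a check, it is applied to the equal-n Upside data (n = 7 per group, s1 = 3.823, s2 = 3.625), where Formula 5.5 should agree with the 1.991 obtained from Formula 5.4.

```python
import math

def se_difference_pooled(n1, var1, n2, var2):
    """Standard error of the difference, Formula 5.5 (sample sizes may differ)."""
    weighted_sum = (n1 - 1) * var1 + (n2 - 1) * var2   # Steps 2 and 3, added
    pooled_variance = weighted_sum / (n1 + n2 - 2)     # Step 4
    size_term = 1 / n1 + 1 / n2                        # Step 5
    return math.sqrt(pooled_variance * size_term)      # Steps 6 and 7

# Step 1: variances are the squared standard deviations.
se_equal = se_difference_pooled(7, 3.823**2, 7, 3.625**2)
print(round(se_equal, 3))
```

With equal group sizes the two formulas are algebraically the same, which is why the check reproduces the earlier value.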
Try It!
G. Because the question in the Upside example is whether optimism about the future differs between the two groups, is it a one-tailed or a two-tailed test?
The Independent t-Test With Unequal Sample Sizes
Now, return to the problem on the level of optimism among the employed versus the unemployed. Suppose that the unemployed person with the score of 15 is no longer willing to be involved in the study and insists that the score be deleted. The data in that case will be the following:
Unemployed: 5, 7, 7, 8, 11, 14
Employed: 7, 10, 12, 15, 15, 16, 17
$M_1 = 8.667$, $s_1 = 3.266$, which makes $s_1^2 = 10.667$.
Group 2 is unchanged: $M_2 = 13.143$, $s_2 = 3.625$, and $s_2^2 = 13.143$.
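$$SE_d = \sqrt{\left[\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$SE_d = \sqrt{\left[\frac{(6 - 1)10.667 + (7 - 1)13.143}{6 + 7 - 2}\right]\left(\frac{1}{6} + \frac{1}{7}\right)}$$
$$SE_d = \sqrt{\left(\frac{132.190}{11}\right)(0.310)} = \sqrt{3.725} = 1.930$$
$$t = \frac{M_1 - M_2}{SE_d} = \frac{8.667 - 13.143}{1.930} = -2.319$$
With $df = 6 + 7 - 2 = 11$, the critical value for $t_{.05(11)} = 2.201$ (two-tailed test).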
Ordinarily, eliminating a score will reduce the potential for a significant result because the critical value from the table is higher with one less score. But in this case, removing the score makes the result significant. Why? Removing that one most extreme score affects the outcome in two ways. First, it increases the difference between means: the mean of the first group was already lower than that of the second group, and eliminating the highest score from the group that already had a lower mean value increased the distance.
Second, eliminating the highest score from the group also decreases the variability within the group. Consequently, the variance has a lower value than when the score of $x = 15$ was in the group. Less within-group variability (smaller $s_1^2$) reduces the value of the standard error of the difference, SEd. With a smaller denominator in the t-ratio and more distance between the means of the two groups, the result is inevitably a larger value of t. The increase is great enough that even with a larger critical value to meet because df = 11 rather than 12, the calculated t value exceeds the table value and is statistically significant at p = .05.
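The effect of dropping the one score can be seen by running the pooled formula on both versions of the data. This is a minimal sketch using Formula 5.5 throughout; the function name is hypothetical, and because nothing is rounded along the way, the results may differ from hand calculations in the third decimal place.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (t, SEd, df) with SEd computed from Formula 5.5."""
    n1, n2 = len(sample1), len(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_d = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se_d
    return t, se_d, n1 + n2 - 2

employed = [7, 10, 12, 15, 15, 16, 17]
unemployed_full = [5, 7, 7, 8, 11, 14, 15]
unemployed_reduced = [5, 7, 7, 8, 11, 14]      # the score of 15 removed

t_full, se_full, df_full = pooled_t(unemployed_full, employed)
t_reduced, se_reduced, df_reduced = pooled_t(unemployed_reduced, employed)

for label, t, se, df in [("full", t_full, se_full, df_full),
                         ("reduced", t_reduced, se_reduced, df_reduced)]:
    print(f"{label}: t = {t:.3f}, SEd = {se:.3f}, df = {df}")
```

Comparing the two rows shows both changes at once: the numerator grows in absolute size and SEd shrinks, so the absolute value of t rises from below the critical value of 2.179 (df = 12) to above the critical value of 2.201 (df = 11).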
If the adjustment for unequal sample sizes had not been made by using Formula 5.5, and if SEd had been calculated for this problem as though the sample sizes were equal, what would its value have been?
$SEM_1 = s_1/\sqrt{n_1} = 3.266/\sqrt{6} = 1.333$
$SEM_2 = s_2/\sqrt{n_2} = 3.625/\sqrt{7} = 1.370$ (this is the same as before, of course)
$$SE_d = \sqrt{(SEM_1)^2 + (SEM_2)^2} = \sqrt{1.333^2 + 1.370^2} = 1.912$$
It is not a large adjustment compared to the SEd = 1.930 from the unequal sample sizes formula, but neither was the change in sample size very large (from n = 7 to n = 6). It is Formula 5.5, not Formula 5.4, that is designed to adjust for sample size differences.
Assumptions Associated With the Independent t-Test
Each statistical test is based on certain assumptions or conditions. Those associated with the independent t are the following:
1. The samples are independent.
2. The participants in each group are randomly selected.
3. The two samples should have similar variability (i.e., range, s, or $s^2$).
4. The measures of the dependent variable must be at least interval scale.
With respect to Condition 3, the term for equivalent variability is homoscedasticity. The first graph in Figure 5.5 shows a case in which equal variances can be assumed: the data points are spread about equally in both groups. In the second graph in Figure 5.5, the spread is not equal. The Group 1 data points (lower on the x-axis) are closer together, whereas the Group 2 data points (higher on the x-axis) are spread farther apart, indicating more variability than in Group 1. When within-group variability differs significantly between the groups (p < .05), equal variances cannot be assumed, and heteroscedasticity has occurred.
Homoscedasticity is important because we want to ensure that within-group variances are equivalent at the outset, so that differences produced by the experimental manipulation are not confounded with within-group variability (or error). In other words, too much difference in within-group variability (or error) on the dependent variable (DV) between the groups defined by the independent variable (IV) will jeopardize the statistical conclusion validity of the results. Software such as SPSS provides Levene’s test for equality of variances. A nonsignificant Levene’s test (p > .05) indicates that equal variances can be assumed and that homoscedasticity has not been violated. Conversely, a significant Levene’s test (p < .05) indicates that equal variances cannot be assumed and that heteroscedasticity has occurred. Because the assumption of homogeneity of variance is then not tenable, an adjustment must be made when calculating the t value.
Although Conditions 1 and 4 are fairly strict requirements, the t-test is robust in the face of violations to Conditions 2 and 3. It takes something obviously wrong to invalidate the results of a t-test.
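To give a sense of what a test of equal variances computes, the sketch below calculates the mean-centered form of Levene's statistic (W) by hand. It is an illustration only and is not claimed to reproduce the output of any particular package; the function name is hypothetical. The data are the placebo and stimulant recall scores from the Excel example in this section, and the comparison value of 4.41 is the standard table value of F at p = .05 with 1 and 18 degrees of freedom.

```python
import statistics

def levene_w(*groups):
    """Levene's W based on absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean
    z = []
    for g in groups:
        g_mean = statistics.mean(g)
        z.append([abs(x - g_mean) for x in g])
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

placebo = [2, 5, 2, 4, 7, 1, 2, 3, 4, 5]
stimulant = [6, 6, 7, 10, 12, 9, 6, 5, 5, 7]

w = levene_w(placebo, stimulant)
F_CRITICAL_1_18 = 4.41   # table value of F(.05) for df = 1 and 18
equal_variances_assumed = w < F_CRITICAL_1_18
print(round(w, 3), equal_variances_assumed)
```

A W smaller than the critical F corresponds to a nonsignificant result, the situation in which equal variances are assumed.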
Figure 5.5: Homoscedasticity versus heteroscedasticity
[Figure: two panels, A. Homoscedasticity and B. Heteroscedasticity, each showing the distribution of DV scores for Group 1 and Group 2 of the IV.]
The Independent t-Test on Excel
An expert is interested in analyzing the effect of a mild stimulant on participants’ recall ability. Twenty volunteers are randomly selected from a larger group and then randomly assigned to groups. Those in the first group receive a placebo; those in the second group receive a mild stimulant. Then, members of both groups are given a complex scenario from which they are required to retrieve details. The number of details that the volunteers in each group can correctly retrieve follows:
Placebo: 2, 5, 2, 4, 7, 1, 2, 3, 4, 5
Stimulant: 6, 6, 7, 10, 12, 9, 6, 5, 5, 7
Are the retrieval differences statistically significant? We will answer the question with Excel as follows:
1. Create a data set in Excel by naming the variables “placebo” (cell A1) and “stimulant” (cell B1).
2. Enter the scores in their respective columns.
3. Click the Data tab and then Data Analysis. Because the ranges for the scores in the two samples are fairly similar (6 in Group 1 and 7 in Group 2), we will assume equal variances.
4. Select t-Test: Two-Sample Assuming Equal Variances.
5. Click OK.
6. Excel calls Group 1 “variable 1.” Indicate that the range for the data in this group is A2:A11. Indicate that the range for “variable 2,” which is actually the data for Group 2, is B2:B11.
7. Indicate that the “hypothesized mean difference” is 0 by entering 0 in the box. This reflects the null hypothesis, $\mu_1 = \mu_2$.
8. Select a range for the output (say, C15) and click OK. If column C is expanded so that all the output shows, the result is Figure 5.6.
Figure 5.6: The independent t-test data set in Excel
The output provides descriptive statistics including the means and variances for each group. Rather than indicating whether the result is or is not statistically significant, Excel calculates the probability that this value of t could have occurred by chance if the samples both came from the same population. Results are produced for both one-tailed and two-tailed tests.
• Any time p = .05 or less, the result is statistically significant.
• The lower the p value in the result, the less likely it is that the difference between the groups occurred by chance.
• The data indicates that both the one-tailed and two-tailed tests have p values less than .05. Both one-tailed and two-tailed results are significant for this data.
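For readers without Excel, the same equal-variances analysis can be sketched in Python. The code below is an assumption-laden stand-in, not a reproduction of Excel's output: it computes the pooled variance, the t statistic, and the degrees of freedom, then compares the absolute value of t with the standard table values for df = 18 (1.734 one-tailed and 2.101 two-tailed at p = .05) instead of reporting exact p values.

```python
import math
import statistics

placebo = [2, 5, 2, 4, 7, 1, 2, 3, 4, 5]
stimulant = [6, 6, 7, 10, 12, 9, 6, 5, 5, 7]

n1, n2 = len(placebo), len(stimulant)
mean1, mean2 = statistics.mean(placebo), statistics.mean(stimulant)
var1, var2 = statistics.variance(placebo), statistics.variance(stimulant)

# Pooled variance and standard error of the difference (equal variances assumed)
pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_d = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

t_stat = (mean1 - mean2) / se_d
df = n1 + n2 - 2

T_CRIT_ONE_TAILED = 1.734   # table value, df = 18, p = .05
T_CRIT_TWO_TAILED = 2.101   # table value, df = 18, p = .05

print(f"means: {mean1:.2f}, {mean2:.2f}; variances: {var1:.3f}, {var2:.3f}")
print(f"t = {t_stat:.3f}, df = {df}")
print("significant, one-tailed:", abs(t_stat) > T_CRIT_ONE_TAILED)
print("significant, two-tailed:", abs(t_stat) > T_CRIT_TWO_TAILED)
```

The sign of t is negative simply because the placebo group is entered first and has the lower mean.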
CHAPTER 5Section 5.5 The Confidence Interval of the Mean Difference
5.5 The Confidence Interval of the Mean Difference
When an independent t-test result is statistically significant, the conclusion is that the two samples represent populations with different means. This is how it is written symbolically: $\mu_1 - \mu_2 \neq 0$. The two sample means can be thought of as estimates of their respective population means. The more difference there is between $M_1$ and $M_2$ when t is significant, the greater the difference there probably is between their respective population means.
In this instance, the confidence interval is called the confidence interval of the mean difference for the independent t-test, and it provides another way to examine the amount of difference between the means of two different populations. The confidence interval indicates, with a specified level of probability, a range of values within which the difference between the means of the populations represented by the first and second samples is likely to fall. The lower-bound limit (LL) and upper-bound limit (UL) of this range are reported as the CI. Figure 5.7 shows the distinction: the CI of each individual distribution gives a lower and an upper limit, based on a 95% confidence level, within which the actual population parameter is expected to fall, while the CI of the difference between the means indicates, with 95% confidence, how precisely the calculated sample mean difference represents the “true” population difference.
Figure 5.7: Confidence interval of the difference between means
Source: www.jedcampbell.com. Retrieved from http://www.jedcampbell.com/wp-content/uploads/2011/01/2samplettest.png
The confidence interval uses this formula:
$$CI_{.95} = \pm t(SE_d) + (M_1 - M_2)$$ (Formula 5.6)
Where
$CI_{.95}$ = a .95 confidence interval, which means that because the t-test was conducted at p = .05, there is a 1 − .05 = .95 probability of capturing the true value of the difference between the means of the populations represented by these two samples.
$t$ = the value from the table, given the df for the problem and the p value at which the test is conducted. The t value for df = 18 and p = .05 is 2.101.
[Figure 5.7 panels: two-tailed hypothesis test; confidence intervals (CI) for each mean; CI for the difference between means, plotted for sets 1 and 2.]
$SE_d$ = the denominator for the t ratio.
$M_1 - M_2$ = the absolute value of the difference between the sample means.
For the recall ability problem in Figure 5.6, the confidence interval is as follows:
1. Because Excel did not print the SEd, we will calculate it.
$s_1^2 = 3.389$ and $s = \sqrt{s^2}$, so $s_1 = \sqrt{3.389} = 1.841$
$s_2^2 = 5.344$, so $s_2 = \sqrt{5.344} = 2.312$
$SEM_1 = 1.841/\sqrt{10} = 0.582$
$SEM_2 = 2.312/\sqrt{10} = 0.731$
$SE_d = \sqrt{(SEM_1)^2 + (SEM_2)^2} = \sqrt{0.582^2 + 0.731^2} = 0.934$
2. Now, we calculate the confidence interval of the difference:
$CI_{.95} = \pm t(SE_d) + (M_1 - M_2)$
$CI_{.95} = \pm 2.101(0.934) + (3.8)$
$CI_{.95} = 1.838$ (LL) to $5.762$ (UL)
3. Although the final term in the formula indicates $M_1 - M_2$, it is the absolute value of the difference between the sample means that is needed. In addition, the ± symbol indicates that to complete the formula, we treat the t value from the table twice, once as a positive value and once as a negative value:
a. $2.101 \times 0.934$ and
b. $-2.101 \times 0.934$.
The two results added to the difference between the means provide the range of values within which the difference between the related population means will occur 95% of the time. Stated formally, with 95% confidence, the difference in the population means represented by these samples is somewhere between 1.838 and 5.762.
Because we concluded that the two samples are likely to belong to distinct populations (in other words, the difference is statistically significant and unlikely to be due to chance), we have estimated how much difference there is between the population means.
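The confidence interval calculation can also be sketched in Python. The function below is a hypothetical helper that applies Formula 5.6 with SEd from Formula 5.4, so it assumes equal sample sizes; the critical t of 2.101 is the table value for df = 18 cited above. Because the code does not round intermediate values, the limits may differ from the hand calculation in the third decimal place.

```python
import math
import statistics

def ci_mean_difference(sample1, sample2, t_critical):
    """Return (LL, UL) for |M1 - M2| using Formula 5.6 (equal n assumed)."""
    if len(sample1) != len(sample2):
        raise ValueError("this sketch uses Formula 5.4, which assumes equal n")
    n = len(sample1)
    sem1 = statistics.stdev(sample1) / math.sqrt(n)
    sem2 = statistics.stdev(sample2) / math.sqrt(n)
    se_d = math.sqrt(sem1**2 + sem2**2)
    diff = abs(statistics.mean(sample1) - statistics.mean(sample2))
    margin = t_critical * se_d
    return diff - margin, diff + margin

placebo = [2, 5, 2, 4, 7, 1, 2, 3, 4, 5]
stimulant = [6, 6, 7, 10, 12, 9, 6, 5, 5, 7]

lower, upper = ci_mean_difference(placebo, stimulant, t_critical=2.101)
print(f"95% CI of the difference: {lower:.3f} to {upper:.3f}")
```

An interval that does not include 0 is consistent with a statistically significant two-tailed result at the same p level.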
Try It!
Visit HyperStat Online to learn about an easier calculation of confidence intervals and other statistics resources.
http://davidmlane.com/hyperstat/confidence_intervals.html
CHAPTER 5Section 5.6 Effect Sizes
5.6 Effect Sizes
Results that are statistically significant are not always important. Remember that significance indicates that a result is not likely to have occurred by chance; the result is not a random occurrence. That is different from saying that the result is important. How do we know whether an outcome has practical relevance? Effect sizes are calculated as a gauge for how important the outcome is.
When a result is significant, the next step is to calculate an effect size, or a measure of the practical importance of the result. One of the more common effect sizes for the independent t-test is Cohen’s d. Refer to the recall ability problem completed in Excel in Figure 5.6. Cohen’s d is
$$d = \frac{M_1 - M_2}{s_{dv}}$$ (Formula 5.7)
Where
$M_1 - M_2$ = absolute value of the difference between the means of the two samples
$s_{dv}$ = the standard deviation for all the data/scores in both groups
Here are the means of the two samples for the problem where participants were given either a placebo or stimulant to test their recall ability:
$M_1 = 3.50$
$M_2 = 7.30$
The denominator in Cohen’s d is the standard deviation of all 20 scores together. Do not make the mistake of just trying to calculate a mean for the standard deviations from the two groups. The standard deviation of all 20 scores together can be quite different from the standard deviations of the two groups, averaged. Here again are those 20 scores:
Placebo: 2, 5, 2, 4, 7, 1, 2, 3, 4, 5
Stimulant: 6, 6, 7, 10, 12, 9, 6, 5, 5, 7
$s_{dv} = 2.817$
$$d = \frac{|M_1 - M_2|}{s_{dv}} = \frac{|3.50 - 7.30|}{2.817} = 1.349$$
It is the absolute value of this statistic that matters. It is an indicator of how much impact the independent variable (the IV, Chapter 1) has had on the dependent variable (the DV, Chapter 1). The IV is the variable that defines the groups. In this case, it is whether the individual received a mild stimulant or a placebo. The DV is the measure, recall ability in this example. Cohen’s interpretations of d values are as follows:
CHAPTER 5Section 5.7 Statistical Tests, Variables, and Research Design
d = 0.2, a “small” effect
d = 0.5, a “medium” effect
d = 0.8, a “large” effect
These guidelines are “point indicators” rather than ranges of values. Cohen leaves unanswered how to interpret, for example, a value between 0.2 and 0.5. In this example, besides being statistically significant, the difference in recall ability between those who received a stimulant and those who received a placebo is a “large” effect. If the data was not contrived to begin with, the result would be important. In situations where t is not significant, there is no need to calculate an effect size because any difference between means is attributed to sampling error.
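The effect size calculation for the recall ability problem can be sketched as follows. The function implements d as this chapter defines it in Formula 5.7, with the standard deviation of all scores combined in the denominator; the function name is hypothetical. The last lines also compute the average of the two group standard deviations, only to illustrate the earlier warning that it is not the same quantity.

```python
import statistics

def cohens_d_all_scores(sample1, sample2):
    """Formula 5.7: |M1 - M2| divided by the SD of all scores together."""
    s_dv = statistics.stdev(list(sample1) + list(sample2))
    d = abs(statistics.mean(sample1) - statistics.mean(sample2)) / s_dv
    return d, s_dv

placebo = [2, 5, 2, 4, 7, 1, 2, 3, 4, 5]
stimulant = [6, 6, 7, 10, 12, 9, 6, 5, 5, 7]

d, s_dv = cohens_d_all_scores(placebo, stimulant)

# Not the denominator of Formula 5.7: the two group SDs, averaged
averaged_sd = (statistics.stdev(placebo) + statistics.stdev(stimulant)) / 2

print(f"s_dv = {s_dv:.3f}, d = {d:.3f}, averaged group SDs = {averaged_sd:.3f}")
```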
5.7 Statistical Tests, Variables, and Research Design
The independent t-test is often used in social and behavioral science research. The test allows a researcher to select two groups that initially represent the same population but may come to differ for some reason, perhaps because a particular treatment is administered to one of the groups. The analysis is a matter of determining whether the difference is great enough that the two groups represent distinct populations.
Research design is a formal plan for posing questions such as whether two samples repre- sent a common population. After the question is posed, the researcher gathers the relevant data and analyzes it to provide the answers. Research designs must specify the variables involved.
The independent variable (IV) defines the groups in a research design. In the problem about recall ability after using a stimulant, the independent variable was whether participants were administered the stimulant or the placebo. This is how the groups were created. The dependent variable (DV) is represented in each participant’s score, so the DV was the number of items each participant recalled.
The independent t-test involves a nominal scale IV divided into two groups and a DV that is interval or ratio scale. When the research design involves different numbers of groups or data of different scales, other tests are involved, some of which are discussed in later chapters, but the independent t-test is the right test for designs that pose questions such as these:
• Do prison psychologists and school psychologists (occupation is the IV) differ in salary (the DV)?
• Do those who do or do not receive flu shots (IV) miss significantly different numbers of work days (DV)?
• Are verbal ability scores (DV) different for social science majors than for communications majors (major is the IV)?
The way the groups are created also relates to research design. Although it is common to hear references to “an experiment,” truly experimental research involves the random
selection of participants and their random assignment to groups. The second question in the list above might fit an experiment if a group of participants is randomly selected and then half are randomly assigned to receive flu shots. The other two questions belong to designs that are nonexperimental because the independent variable cannot be assigned, even if those in the two groups were randomly selected.
In Chapter 6, factorial analysis of variance (ANOVA) will be introduced. It is conceptually similar to the independent t-test except that it allows the use of more than one IV. For example, if the research on flu shots compared those who received flu shots with those who did not, but also divided each of those two categories into males and females, the result would be a quasiexperimental design: a design in which one of the IVs is assigned but another cannot be, because gender is not up to the researcher.
5.8 Presenting Results
Using the data from Figure 5.6, we can see that a few steps need to be taken in order to present the results. First, we need to state the hypotheses so we are aware of what conclusions we can and cannot make, as well as what we are looking for with the test results. The hypotheses should be written with statistical notation and include a written explanation identifying the independent and dependent variables. We might write the hypotheses for this study as follows:
H0: μ1 = μ2
The null hypothesis states there is no significant difference between the mean number of details volunteers in the placebo group can correctly retrieve versus the number of details those who received a mild stimulant can recall.
Ha: μ1 ≠ μ2
The alternative (or research) hypothesis states there is a significant difference between the mean number of details volunteers in the placebo group can correctly retrieve versus the number of details those who received a mild stimulant can recall.
Because we are focusing on identifying a difference and not making assumptions about the direction of the difference, we will utilize a two-tailed test. This can be seen in the writing of the alternative hypothesis. Since the means are stated as not being equal to each other, we are just exploring whether there is a difference at all. If we had wanted to assume that those who received the mild stimulant would retrieve more details, then we would have used Ha: μ1 < μ2.
Next, we need to obtain the necessary statistics from the Excel output in Figure 5.6. According to the results, the calculated t-test statistic is −4.07. Since we conducted a two-tailed test, we would use the “P(T<=t) two-tail” number, 0.00, to determine the probability of getting this result by chance. The question we are focusing on is whether the difference between the two groups is large enough to be statistically significant or small enough to have occurred by chance. In summary, it is extremely unlikely that we would have obtained this difference by chance, so the difference between the two groups is statistically significant, p < .01.
Lastly, we would determine what to do with our hypotheses. Because the difference between the two groups was statistically significant, we would reject the null hypothesis. It appears that the mild stimulant group (M = 7.30) recalled significantly more details than the placebo group (M = 3.50).
Using another example, let us say we want to determine if there is a significant difference in ages between our sample of males and females. In this case, our independent variable is sex (male, female) and the dependent variable is age. This is a common test done when we want to make sure our groups are similar to each other in certain attributes, such as age or other interval/ratio data, before continuing forward with our study. In this case, we would have the following hypotheses:
H0: μ1 = μ2
The null hypothesis states there is no significant difference between the mean age of males and females.
Ha: μ1 ≠ μ2
The alternative (or research) hypothesis states there is a significant difference between the mean age of males and females.
Figure 5.8 presents the t-test results and basic descriptive statistics. One of the major assumptions of a t-test is equal variances between the two groups. To check it, we evaluate the F and Sig. values presented under Levene’s test for equality of variances. If the Sig. value (more commonly called p) is less than .05, then the two groups have significantly different variances. If the Sig. value is greater than .05, then the two groups appear to have equal or similar variances. In this case, p = .24, which is greater than .05, so the assumption of equal variances is met. From this point forward, we will look only at values on the row labeled “Equal variances assumed” (the top row). We would have used the row labeled “Equal variances not assumed” (the bottom row) if p had been less than .05.
The t-test statistic is 2.56, which is significant at p = .011 (that is, p < .05). Thus, we would reject the null hypothesis, as there appears to be a significant difference between the mean ages of males (M = 39.65) and females (M = 43.19) in our sample. This may be important to note in the results of your study, or you may consider drawing a more adequate sample. We want to make sure the two groups are as “equal” as possible in areas that are not the focus of the study but might contribute to the effects seen from other variables.
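The t values on both rows of Figure 5.8 can be reproduced from the group statistics alone. The following Python sketch is only an illustration and is not part of the SPSS procedure: the function names are hypothetical, and the inputs are the rounded means, standard deviations, and sample sizes reported for the two groups.

```python
import math

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Independent t-test assuming equal variances; returns (t, df)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff, df

def welch_t(m1, s1, n1, m2, s2, n2):
    """Independent t-test without assuming equal variances; returns (t, df)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se_diff = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / se_diff, df

# Group statistics from Figure 5.8: (mean, standard deviation, n)
female = (43.19, 9.392, 112)
male = (39.65, 10.062, 88)

t_equal, df_equal = pooled_t(*female, *male)
t_welch, df_welch = welch_t(*female, *male)
print(f"Equal variances assumed:     t({df_equal}) = {t_equal:.3f}")
print(f"Equal variances not assumed: t({df_welch:.3f}) = {t_welch:.3f}")
```

The first function pools the two variances, which is what the “Equal variances assumed” row does; the second keeps them separate and adjusts the degrees of freedom, which corresponds to the “Equal variances not assumed” row.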
Another important statistic to consider here is the confidence interval. In this study, the 95% CI was from 0.82 to 6.26, which means we are 95% confident that the population means represented by these two samples would differ by an amount in this range. An important check of statistical significance here can be done by determining whether 0, which is the difference we are testing via the null, is included in this interval. In this case, 0 is not included, which means there is a significant difference that was supported by the t-test results. If 0 was included, then it is possible that there could be no difference between the population means and that the null would be retained.
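As a rough check on the interval, the sketch below rebuilds the 95% CI of the difference from the group statistics in Figure 5.8. It is an illustration under stated assumptions: the critical t value of 1.972 for df = 198 is taken from a t table rather than computed, and the variable names are hypothetical.

```python
import math

# Group statistics from Figure 5.8
m_f, s_f, n_f = 43.19, 9.392, 112   # females
m_m, s_m, n_m = 39.65, 10.062, 88   # males

# Standard error of the difference, assuming equal variances
df = n_f + n_m - 2
pooled_var = ((n_f - 1) * s_f**2 + (n_m - 1) * s_m**2) / df
se_diff = math.sqrt(pooled_var * (1 / n_f + 1 / n_m))

# Assumption: two-tailed critical t for df = 198 at alpha = .05,
# read from a t table (approximately 1.972).
t_crit = 1.972

mean_diff = m_f - m_m
lower = mean_diff - t_crit * se_diff
upper = mean_diff + t_crit * se_diff

# The difference is significant at .05 when the interval excludes 0
excludes_zero = not (lower <= 0 <= upper)
print(f"95% CI [{lower:.2f}, {upper:.2f}], excludes 0: {excludes_zero}")
```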
Calculation of Cohen’s d by hand is demonstrated in Section 5.6. If we input our means and standard deviations for the two groups, males and females, we get an effect size of 0.36. Therefore, even though there is a significant difference between the ages of the males and females, the effect is small. The significant result may be due in part to the large sample size, which affects our significance level but not the effect size.
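The effect size for these two groups can be approximated with a few lines of Python. This is a sketch, not the procedure from Section 5.6: it assumes the pooled standard deviation is the square root of the average of the two group variances, a convention used by some online effect size calculators, and the function name is hypothetical. A sample-size-weighted pooled standard deviation would give a slightly different value.

```python
import math

def cohens_d(m1, s1, m2, s2):
    """Cohen's d with the pooled SD taken as sqrt((s1^2 + s2^2) / 2)."""
    pooled_sd = math.sqrt((s1**2 + s2**2) / 2)
    return (m1 - m2) / pooled_sd

# Means and standard deviations from Figure 5.8 (females, then males)
d = cohens_d(43.19, 9.392, 39.65, 10.062)
print(f"d = {d:.2f}")
```

Because d divides the mean difference by a standard deviation rather than a standard error, it does not grow as the sample size grows.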
SPSS Steps for an Independent-Samples t-Test
Analyze → Compare Means → Independent Samples t-Test. Place age into the Test Variable(s) box and sex into the Grouping Variable box. Click Define Groups and, under Use Specified Values, input 1 in the Group 1 box for male and 2 in the Group 2 box for female. Then click Continue and OK.
Figure 5.8: SPSS output of t-test results
Group Statistics

Age
Sex      N     Mean    Std. Deviation   Std. Error Mean
Female   112   43.19   9.392            .887
Male     88    39.65   10.062           1.073

Independent Samples Test

Levene’s Test for Equality of Variances (Age)
                          F       Sig.
Equal variances assumed   1.407   .237

t-test for Equality of Means (Age)
                                                                                                95% Confidence Interval of the Difference
                              t       df        Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower   Upper
Equal variances assumed       2.564   198       .011              3.540             1.381                   .817    6.262
Equal variances not assumed   2.543   180.563   .012              3.540             1.392                   .793    6.287
Try It!
SPSS does not directly calculate Cohen’s d. Visit Dr. Lee A. Becker’s website to learn about and use an online effect size calculator.
http://www.uccs.edu/~lbecker/
5.9 Interpreting Results
Though you should refer to the most recent edition of the APA manual for specific details on formatting statistics, Table 5.2 may be used as a quick guide in presenting the statistics covered in this chapter.
Table 5.2: Guide to APA formatting of t-test results
Abbreviation or Term   Description
CI                     Confidence interval; presented as CI [lowest, highest]
d                      Cohen’s effect size
df                     Degrees of freedom
p                      Probability or significance level. If statistically significant, report p < .05 or p < .01. If not statistically significant, report p = “calculated p level”
SEM                    Standard error of the mean; standard error of measurement
t                      t test statistic or score
Source: Publication Manual of the American Psychological Association, 6th edition. © 2009 American Psychological Association, pp. 119–122.
Note that d, df, p, SEM, and t are italicized, whereas CI is not. The symbol for degrees of freedom rarely appears in the presentation of results as it is replaced with its corresponding number in parentheses after the letter for the test that is being conducted (e.g., z, t, F). The following are some examples of how to interpret and report the results using these abbreviations, though you may use different combinations of results.
Using the results from Figure 5.8, we could interpret and report the results in the following way:
• Females were significantly older (M = 43.19, SD = 9.39) than males (M = 39.65, SD = 10.06) in a sample of participants, t(198) = 2.56, p < .05, 95% CI [0.82, 6.26].
Using the data from Apply It! Testing a Pilot Program, we could interpret and report the results in the following way:
• Middle school students who participated in the quiet time program had significantly higher GPAs (M = 2.81, SD = 0.30) than the student population (μ = 2.53), t(11) = 3.32, p < .01.
Using the data from Apply It! Sport Science, we could interpret and report the results in the following way:
• The difference between those who trained with parachutes (M = 12.07, SD = 0.97) and those who did not (M = 12.11, SD = 0.94) was not statistically significant, t(29) = −0.57.
5.10 Nonparametric Tests
As mentioned in the previous sections, when the data set violates parametric assumptions (i.e., skewness and kurtosis, unequal variances, or small sample sizes), equivalent nonparametric tests are warranted. The nonparametric equivalents of the independent t-test are the Mann-Whitney U-test and the Wilcoxon rank-sum (Ws) test. Both tests rank the scores of each group relative to the other and then calculate a z-score that is compared to the critical value at the 5% α-level (z = 1.645 for a one-tailed test or z = 1.96 for a two-tailed test). According to Field (2009), the two tests are interrelated, so it does not matter which one is chosen. Since the Mann-Whitney is the more common approach, we will discuss and present this technique in greater detail.
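The two critical z values quoted above come from the standard normal distribution. As a small illustration, they can be recovered with Python's standard library; the variable names are hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: all of alpha sits in a single tail
z_one_tailed = standard_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha is split evenly between the two tails
z_two_tailed = standard_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed: {z_one_tailed:.3f}, two-tailed: {z_two_tailed:.2f}")
```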
The Mann-Whitney U-Test
The statistic named for Henry Mann and his student Donald Whitney was derived in 1947 as a test for ordinal data, unlike the t-test, which is used for interval data. The Mann-Whitney U-test is well suited to small sample sizes because it is more robust to violations of parametric assumptions. As sample sizes get larger, the results of the U-test resemble those of the t-test, which makes the U-test applicable and versatile across testing situations. One disadvantage of this test, and of nonparametric tests in general, is that they are more conservative, meaning that it is more difficult to reject the null hypothesis. The trade-off is that they are not as prone to type I errors.
The U-test is based on ranked or ordinal data, and therefore the influence of outliers (which distort the sample distribution) is greatly reduced. U is calculated from the summed ranks of each group and the two sample sizes. The formula for each group is as follows:
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - T_1$$ Formula 5.8
Where
$n_1$ = sample size for Group 1
$n_2$ = sample size for Group 2
$T_1$ = the observed sum of ranks for Group 1
$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - T_2$$ Formula 5.9
Where
$n_1$ = sample size for Group 1
$n_2$ = sample size for Group 2
$T_2$ = the observed sum of ranks for Group 2
Try It!
There are several websites that will help in these calculations. Leon Avery and Richard Lowry provide two such websites that you can access via the links provided below. Use the data provided in this section to see if you can get the same results using these online resources.
http://elegans.som.vcu.edu/~leon/stats/utest.html
http://vassarstats.net/utest.html
To illustrate the calculation of the U-test, we will work with the sample data in Table 5.3. The data represent driving mistakes made by two gender groups, male and female. Each participant was given two glasses of wine and instructed to drive a simulation track. Hitting a cone or swerving off course on the driving track was counted as a mistake.
Table 5.3: Sample data set of male and female driving mistakes
Male                                              Female
Participant   Number of   Initial                 Participant   Number of   Initial
Number        Mistakes    Rank      Rank          Number        Mistakes    Rank      Rank
1             9           1         1             1             18          12        12.5
2             13          5         5.5           2             15          9         8.5
3             19          14        14            3             20          15        15
4             15          8         8.5           4             18          13        12.5
5             10          2         2             5             21          16        16
6             16          10        10            6             13          6         5.5
7             12          3         3.5           7             17          11        11
8             12          4         3.5           8             14          7         7
Sum of ranks                        48            Sum of ranks                        88
Based on Table 5.3, the initial step is to rank all the values across both groups, starting with 1 as the lowest rank. If there are tied ranks (the same value for two or more members), then the average of the tied ranks is taken. For instance, Male #4 has 15 driving mistakes, which is the same as Female #2. The average of their initial ranks, (8 + 9)/2 = 8.5, appears in the Rank column for both. Once rankings are complete, the ranks are summed for each group, as seen in the last row of the table. As a check, the two sums should total N(N + 1)/2 = (16)(17)/2 = 136, and 48 + 88 = 136.
Using the summed rank of the male group, the U is then calculated:
$$U_1 = (8)(8) + \frac{8(8 + 1)}{2} - 48$$
$$U_1 = 64 + 36 - 48$$
$$U_1 = 52$$
For the female group,
$$U_2 = (8)(8) + \frac{8(8 + 1)}{2} - 88$$
$$U_2 = 64 + 36 - 88$$
$$U_2 = 12$$
The final step is to use the smaller U value to calculate a Z value (Formula 5.10) that will be compared against Zcritical of 1.96 for a two-tailed test.
$$Z = \frac{U - (n_1 n_2)/2}{\sqrt{(n_1 n_2)(n_1 + n_2 + 1)/12}}$$ Formula 5.10
$$Z = \frac{12 - (64/2)}{\sqrt{(64)(17)/12}}$$
$$Z = \frac{12 - 32}{9.52}$$
$$Z = -2.10$$
The absolute value of the Z result is greater than Zcritical, 2.10 > 1.96, which indicates that there is a significant difference in driving mistakes between males and females. Specifically, females tended to make significantly more driving mistakes than their male counterparts, U = 12, Z = −2.10, p < .05.
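The ranking and the three formulas can also be worked through in code, which is a convenient way to check hand calculations. The Python sketch below is an illustration only: the helper `average_ranks` is a hypothetical name, the scores are the numbers of mistakes listed in Table 5.3, and no correction for ties is applied to Z.

```python
import math

def average_ranks(values):
    """Rank values from lowest (1) to highest; tied values share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # first position held by v
        last = first + ordered.count(v) - 1   # last position held by v
        rank_of[v] = (first + last) / 2
    return [rank_of[v] for v in values]

# Number of mistakes per participant, from Table 5.3
males = [9, 13, 19, 15, 10, 16, 12, 12]
females = [18, 15, 20, 18, 21, 13, 17, 14]

ranks = average_ranks(males + females)
n1, n2 = len(males), len(females)
t1 = sum(ranks[:n1])   # sum of ranks for the male group
t2 = sum(ranks[n1:])   # sum of ranks for the female group

# Sanity check: the two rank sums must total N(N + 1)/2
assert t1 + t2 == (n1 + n2) * (n1 + n2 + 1) / 2

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1   # Formula 5.8
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2   # Formula 5.9
u = min(u1, u2)                         # the smaller U is used for Z

# Formula 5.10
z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(f"T1 = {t1}, T2 = {t2}, U1 = {u1}, U2 = {u2}, Z = {z:.2f}")
```

Two quick checks are useful whenever U is computed by hand: the rank sums should add up to N(N + 1)/2, and U1 + U2 should equal n1 × n2.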
SPSS Steps for the Mann-Whitney U-Test
The ease of steps in SPSS also makes this a popular test. First, go to Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples and input the grouping IV (categorical) variable, select Define Groups to code your groups, and then input the DV (continuous) variable into Test Variable List. You will notice that in this screen, there is the default option of the Mann-Whitney U-test, but you can also select other nonparametric options and tests. (See an overview of each in Field, 2009, pp. 547–548.)
Interpreting Results
Using the data in Figure 5.6, Figure 5.9 was created in SPSS. (Note that the orientation of the data in the Excel spreadsheet in Figure 5.6 is a bit different from the SPSS spreadsheet in Figure 5.9.) Following the aforementioned steps for a Mann-Whitney U-test with a grouping IV (Group) and a DV (Recall_Ability), the test was executed (see Figure 5.10), with the SPSS output presented in Figure 5.11. In reporting the results, the recall ability of the Stimulant group (mean rank = 14.70) is significantly different from that of the Placebo group (mean rank = 6.30), U = 8, Z = −3.203, p < .05, r = −.716. Note that the effect size (r) is not automatically reported by SPSS; it can be calculated using Formula 5.11, where n is the total number of participants.
$$r = \frac{Z}{\sqrt{n}}$$ Formula 5.11
$$r = \frac{-3.203}{\sqrt{20}}$$
$$r = -.716$$
Using the same SPSS output, the Wilcoxon rank-sum test, developed by Frank Wilcoxon in 1945 for equal sample sizes, may be reported simply by substituting the Ws value in the table for the U value. That is, the recall ability of the Stimulant group (mean rank = 14.70) is significantly different from that of the Placebo group (mean rank = 6.30), Ws = 63, z = −3.203, p < .05, r = −.716.
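The link between the rank sums, U, and the effect size can be verified from the values SPSS reports in Figure 5.11. The sketch below is illustrative, with hypothetical variable names: it takes the two sums of ranks and the Z value as given and applies Formulas 5.8, 5.9, and 5.11.

```python
import math

n_stimulant, n_placebo = 10, 10
t_stimulant, t_placebo = 147.0, 63.0   # sums of ranks from Figure 5.11
z = -3.203                             # Z as reported by SPSS

# Formulas 5.8 and 5.9, one U per group; the smaller one is reported
product = n_stimulant * n_placebo
u_stimulant = product + n_stimulant * (n_stimulant + 1) / 2 - t_stimulant
u_placebo = product + n_placebo * (n_placebo + 1) / 2 - t_placebo
u = min(u_stimulant, u_placebo)

# Formula 5.11: effect size r, with n the total number of participants
r = z / math.sqrt(n_stimulant + n_placebo)
print(f"U = {u:.0f}, r = {r:.3f}")
```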
Figure 5.9: The Mann-Whitney U-test data set in SPSS
Figure 5.10: The Mann-Whitney U-test steps in SPSS
Figure 5.11: The Mann-Whitney U-test SPSS output
Ranks

Recall_Ability
Group       N    Mean Rank   Sum of Ranks
Stimulant   10   14.70       147.00
Placebo     10   6.30        63.00
Total       20

Test Statistics(a)

                                 Recall_Ability
Mann-Whitney U                   8.000
Wilcoxon W                       63.000
Z                                −3.203
Asymp. Sig. (2-tailed)           .001
Exact Sig. [2*(1-tailed Sig.)]   .001(b)

a. Grouping Variable: Group
b. Not corrected for ties.
Summary
Gosset’s tests help liberate the researcher. In the one-sample test, samples can be compared to populations without the need for determining σM, which may not be available (Objective 1). The independent t-test does not require any parameters and allows the researcher to examine two independent samples for significant differences. The independent t is widely used in research. It has been a test of tremendous importance (Objectives 2 and 7).
In hypothesis testing, there are two predictions relevant to the t-tests. In the case of the one-sample test, either the sample represents the population to which it is compared, or it does not. In the case of the independent t, either both samples belong to populations with the same mean, or they do not. The hypotheses simplify the way we report statistical results, although the use of one-tailed versus two-tailed tests adds another element. Recall that one-tailed tests provide an alternate hypothesis that is directional. It predicts how the mean of the population represented by the first group will differ from the mean of the population represented by the second, not just that the means differ.
Be careful not to confuse the type of t-test with the type of hypothesis. The one-sample t-test compares the sample to the population. The one-tailed t-test makes a prediction about how the first group differs from the second (Objectives 3 and 4).
When an independent t-test result is significant, it indicates that the samples probably represent populations with different means. In effect, the sample means are estimates of the means of those distinct populations, although sampling error sometimes diminishes the accuracy of the estimate. The confidence interval of the difference is another approach to inferring the means of the two populations. Without providing actual values for the population means, the confidence interval estimates the difference between the two means. As such it is an “interval estimate” to contrast with the “point estimate” the sample means provide (Objective 6).
Calculating effect sizes helps ensure that statistical significance is not confused with practical importance. Cohen’s d, an effect size for the independent t-test, indicates how much a significant result matters. Effect size calculations (and other procedures you will encounter later in the book) keep the researcher grounded with the reminder that a significant result only indicates that the result is not likely to be a random occurrence. Cohen’s d and all other effect sizes estimate how much an independent variable affects a dependent variable (Objective 5). In addition, knowing how to present results (Objective 8) and interpret them in APA format (Objective 9) as they relate to describing t-test results is important when using and writing about statistical data. And finally, due to violations of parametric assumptions of data, nonparametric equivalents of the t-test were discussed in the Mann-Whitney U and Wilcoxon rank-sum tests with appropriate calculations and SPSS analysis (Objective 10).
If Gosset’s tests liberate us from the need for parameters, and Mann, Whitney, and Wilcoxon free us from parametric assumptions, then Fisher’s analysis of variance (ANOVA) is going to relieve us of the restriction to two groups. Chapter 6 is about ANOVA. It is an important development in statistical analysis, but the logic involved will sound very familiar.
Key Terms
alternate (research) hypothesis Theory that predicts that the two samples belong to populations with distinct means.
Cohen’s d A procedure for calculating effect sizes for significant independent t-test results.
confidence interval of the mean difference For the independent t-test, the CI provides another way to examine the amount of difference between the means of two different populations. With a specified level of probability, the confidence interval indicates a range of values within which the difference between the means of the populations represented by the two samples is expected to fall, assuming equal variances.
critical value A value from a table that indicates the point at which a calculated test result is statistically significant.
distribution of difference scores The theoretical population upon which the independent t-test is based.
effect sizes Results that provide an indicator of the practical importance of a statistically significant outcome.
experimental research Research in which members of the groups in the study are randomly selected and the treatment is randomly assigned.
heteroscedasticity A violation of the homogeneity of variance where equal variances cannot be assumed between groups. An adjustment in the t-test calculation is made if such a violation occurs.
homoscedasticity An assumption of hypothesis testing that there are equal variances assumed between group differences. The Levene’s test is used to indicate a violation of this assumption.
hypothesis A prediction based on an underlying premise. Hypotheses can be directional as in a one-tailed prediction or nondirectional as in a two-tailed hypothesis.
independent t-test Assessment that tests whether two samples likely belong to populations with identical means.
nonexperimental Type of research design in which random selection and random assignment are not completed.
null hypothesis For the independent t-test, statement that predicts that the two samples belong to populations with identical means.
one-sample t A test of whether a sample is likely to belong to a specified population.
one-tailed test In t-tests, a one-tailed test is based on a prediction about the direction of the difference. For example, the analyst predicts that the sample represents a population with either a significantly smaller mean than the population to which it is compared or a significantly larger mean than that population.
quasiexperimental Research design in which some variables are randomly assigned and some are not.
standard error of the differences The measure of variability in the distribution of difference scores, symbolized s with the subscript M1 − M2.
two-tailed test In t-tests, the occurrence of a significant difference in either tail of the population distribution. In the case of a one-sample t, the mean of the population represented by the sample mean can be either significantly smaller than the population mean to which it is compared, or significantly larger than that population mean. Either outcome can result in a significant difference.
Chapter Exercises
Answers to Try It! Questions
The answers to all Try It! questions introduced in this chapter are provided below.
A. The critical values for t are highest for distributions with the fewest degrees of freedom. This is an adjustment for the fact that small distributions are less likely to emulate the characteristics of a normal population.
B. If degrees of freedom for a one-sample t-test are 14, n = 15, because df = n − 1.
C. As long as the question is whether the two groups are significantly different, it is a two-tailed test.
D. The alternate hypothesis would have been Ha: μ1 ≠ μ2 and the decision would have been to fail to reject H0. With a critical value of 2.262 in the two-tailed test, the difference would not have been statistically significant.
E. Yes, it would have changed the decision. Because the original problem was a one-tailed test (remember that the nursing assistants were predicted to have more compassion than health professionals generally), there is no rejection region in the other tail of the distribution. The entire rejection region is in the positive tail.
F. The answer is no. Once an analysis has been completed it is unethical to, in effect, change the hypotheses after the fact. It is entirely appropriate, however, to gather new data, frame new hypotheses, and repeat the study. In fact, this type of replication is very important to developing a body of research.
G. It is a two-tailed test. The word “differs” allows for a difference in either direction.
Review Questions
The answers to the odd-numbered items can be found in the answers appendix.
1. A group of clients have anxiety scores as follows:
67, 55, 88, 74, 69, 81, 72, 70
a. What is the value of SEM?
b. Is this group representative of the population of all who are treated for anxiety disorders for whom μ = 66.0?
2. Applicants to a graduate program in psychology have the following scores on the Graduate Record Exam, Quantitative portion:
375, 400, 425, 425, 490, 500, 510, 530
Do they represent the national population for whom μ = 500?
3. A second therapist working with those who have anxiety-related disorders has clients with the following scores:
52, 58, 64, 67, 67, 69, 70, 71
Is this group significantly different from the group in Exercise 1?
4. The test statistics for the one-sample t-test and the independent t-test both include measures of within-group data variability. What are they?
5. A researcher conducts an independent t-test using the .05 criterion for significance. If H0 is rejected, what is the probability of alpha error? Of beta error?
6. A researcher is interested in whether there are significantly more questions asked at a town hall meeting when the politician provides verbal reinforcement for each question. The politician responds “thank you” each time someone in Group 1 asks a question. The Group 2 subjects do not receive any response from the politician. The number of questions asked in six 30-minute sessions are as follows:
Group 1: 13, 15, 12, 17, 14, 14
Group 2: 10, 12, 12, 11, 13, 9
a. Is the difference significant?
b. Write out the alternate hypothesis for this problem.
c. What will Cohen’s d indicate? What is its value?
7. A researcher wants to analyze differences in patients’ attitudes about the care they receive when they receive a rebate on their bill. One group receives a rebate; the other group does not.
a. What are the IV and DV?
b. The IV data in an independent t must have what scale?
c. The DV data in an independent t must have what scale?
8. What is the relationship between sample size and the likelihood of a significant finding?
9. A social worker randomly selects 20 cases from among the unemployed and assigns half to each of two groups. The first group is a control group and receives no special treatment. Those in the second group receive a weekly newsletter about job opportunities. During the study, one of the people in Group 2 disappears. The data indicates the number of weeks unemployed. The question is whether Group 2 people are unemployed for less time than the Group 1 people:
Group 1: 10, 12, 14, 14, 15, 15, 15, 16, 16, 18
Group 2: 6, 9, 9, 10, 12, 12, 12, 13, 14
a. Is the test one-tailed, or two-tailed?
b. What is the alternate hypothesis?
c. Is t significant?
d. If so, is the result important (d)?
e. With .95 confidence, what is the interval between the means of the populations represented by these samples?
10. Twenty prison inmates who volunteer to participate in a study are given scholastic aptitude tests and found to have M = 44.552, s = 6.577. A group of 20 nonoffending adults randomly selected from a local mall are given the same test and have M = 49.979, s = 5.101. What is a .95 confidence interval of the difference between the means of the populations represented by these two groups?
Analyzing the Research
Review the article abstracts provided below. You can then access the full articles via your university’s online library portal to answer the critical thinking questions. Answers can be found in the answers appendix.
Using Sample Groups Data for a Test Instruments Study
Castro-Villarreal, F. (2010). Prior exposure to the Rorschach test and differences in selected Rorschach variables. Journal of Projective Psychology & Mental Health, 17(2), 126–134.
Article Abstract
The present study examined the effect of prior exposure to the Rorschach on selected Rorschach variables. An availability sample of Mexican-American undergraduate students (N = 59) was randomly assigned to one of two conditions: (a) an experimental group (n = 28) or (b) a control group (n = 31). Experimental participants were exposed to Card I of the Rorschach at Time 1 and one week later at Time 2. Control participants were exposed to the same card at Time 2 only. Both groups provided demographic information. Response differences were investigated using independent t-test analyses. As hypothesized due to increased media and pop-culture exposure and the release of all ten cards on Wikipedia, the majority of participants reported having seen the Rorschach before. However, between-group differences on the dependent variables were not significant; thus prior exposure did not differentially impact response on selected Rorschach variables. Implications are presented with regard to the validity of Rorschach administration, scoring, and interpretation, support for the appropriateness of re-testing, test security issues, and to undercut efforts to opt out of being administered the Rorschach on the basis of prior exposure. Research with a Mexican-American sample is also noted as a much needed addition to the Rorschach literature base.
Critical Thinking Questions
1. The experiment used unequal sample sizes in running independent-samples t-test. Would this be an issue? If so, how can a researcher overcome this?
2. Did the research account for homogeneity of the groups? What indication in the article was given? How can you test for this as a researcher?
3. What are the written and statistical notations for the null and alternative hypotheses for the first hypothesis in this study? Will we retain or reject the null hypotheses?
4. The fifth hypothesis predicted that the experimental group would give fewer space responses than the control group as measured by the total number of space responses given by each participant. Would a one-tailed or two-tailed significance test be used and why?
5. What type of research design is this: experimental, quasiexperimental, or nonexperimental? Give a reason for your answer.
Using Sample Groups Data for an Education Style Study
Mebane, M., Porcelli, R., Iannone, A., Attanasio, C., & Francescato, D. (2008). Evaluation of the efficacy of affective education online training in promoting academic and professional learning and social capital. International Journal of Human-Computer Interaction, 24(1), 68–86.
Article Abstract
This study aimed to compare the efficacy of face-to-face and online education seminars in the professional training of psychology majors and in promoting the development of social capital. Forty-four university students, balanced by gender, age, and academic achievement were divided in two groups, online and face to face, taught by the same teacher. Individual learning scales were administered and students were tested on group processes tasks to measure professional skills acquisition. A follow-up interview 9 months after the end of the seminar was undertaken to assess whether social bonds, formed during the seminars, had lasted over time. An analysis of variance and Mann-Whitney test were used to analyze the data. Results show that students of both groups increased their academic knowledge and social capital, but Computer Supported Collaborative Learning students acquired more group observation skills.
Critical Thinking Questions
1. When comparing the mean rank of online and face-to-face participants in facilitating the circle time, would it have been more efficient to use the Wilcoxon rank-sum test instead of the Mann-Whitney U-test?
2. What type of test is the Mann-Whitney U-test? Why use this type of test instead of a t-test?
3. The Mann-Whitney U-test shows differences between the two groups in favor of the online group, specifically the four main dimensions used to analyze group dynamics. Is the significance value greater than or less than .05 to make this test significant?