WEEK 4 PPOL 505 Exercise 3
7 HYPOTHETICALS AND YOU: TESTING YOUR QUESTIONS
Difficulty Scale
(don’t plan on going out tonight)
WHAT YOU WILL LEARN IN THIS CHAPTER
· Understanding the difference between a sample and a population
· Understanding the importance of the null and research hypotheses
· Using criteria to judge a good hypothesis
SO YOU WANT TO BE A SCIENTIST
You might have heard the term hypothesis used in other classes. You may even have had to formulate one for a research project you did for another class, or you may have read one or two in a journal article. If so, then you probably have a good idea what a hypothesis is. For those of you who are unfamiliar with this often-used term, a hypothesis is basically “an educated guess.” Its most important role is to reflect the general problem statement or question that was the motivation for asking the research question in the first place.
That’s why taking the care and time to formulate a really precise and clear research question is so important. This research question will guide your creation of a hypothesis, and in turn, the hypothesis will determine the techniques you will use to test it and answer the question that was originally asked.
So, a good hypothesis translates a problem statement or a research question into a format that makes it easier to examine. This format is called a hypothesis. We will talk about what makes a hypothesis a good one later in this chapter. Before that, let’s turn our attention to the difference between a sample and a population. This is an important distinction, because while hypotheses usually describe a population, hypothesis testing deals with a sample and then the results are generalized to the larger population. We also address the two main types of hypotheses (the null hypothesis and the research hypothesis). But first, let’s formally define some simple terms that we have used earlier in Statistics for People Who (Think They) Hate Statistics.
SAMPLES AND POPULATIONS
As a good scientist, you would like to be able to say that if Method A is better than Method B in your study, this is true forever and always and for all people in the universe, right? Indeed. And, if you do enough research on the relative merits of Methods A and B and test enough people, you may someday be able to say that.
But don’t get too excited, because it’s unlikely you will ever be able to speak with such confidence. It takes too much money ($$$) and too much time (all those people!) to do all that research, and besides, it’s not even necessary. Instead, you can just select a representative sample from the population and test your hypothesis about the relative merits of Methods A and B on that sample.
Given the constraints of never enough time and never enough research funds, with which almost all scientists live, the next best strategy is to take a portion of a larger group of participants and do the research with that smaller group. In this context, the larger group is referred to as a population, and the smaller group selected from that population is referred to as a sample. Statistics as a field, in fact, is all about looking at a sample and inferring to the population it represents. Indeed, the word statistic technically means a number that describes a sample (and the word we use for a number that describes a population is parameter).
A measure of how well a sample approximates the characteristics of a population is called sampling error. Sampling error is basically the difference between the values of the sample statistic and the population parameter. The higher the sampling error, the less precision you have in sampling, and the more difficult it will be to make the case that what you find in the sample indeed reflects what you expected to find in the population. And just as there are measures of variability regarding distributions, so are there measures of the variability of this difference between a sample measure and a population measure. This is often called the standard error; it’s basically the standard deviation of the difference between these two values.
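To make sampling error and the standard error a little more concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the population of 10,000 scores is invented, the sample size of 50 is arbitrary, and `simulate_sample_means` is a hypothetical helper, not something from the book.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

# Hypothetical population of 10,000 test scores (made-up values).
rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(10_000)]
mu = statistics.fmean(population)  # the population parameter

means = simulate_sample_means(population, sample_size=50, n_samples=2_000)

# Sampling error for each sample: sample statistic minus population parameter.
sampling_errors = [m - mu for m in means]

# Standard error: the standard deviation of those sample means.
standard_error = statistics.pstdev(means)

# Common approximation for comparison: sigma / sqrt(n).
approx = statistics.pstdev(population) / 50 ** 0.5
print(f"empirical SE = {standard_error:.2f}, sigma/sqrt(n) = {approx:.2f}")
```

Each individual sample misses the population mean by a little (its sampling error), and the standard error summarizes how large those misses typically are.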
Samples should be selected from populations in such a way that the sample matches as closely as possible the characteristics of the population. You know, to minimize the sampling error. The goal is to have the sample be as much like the population as possible. The most important implication of ensuring similarity between the two is that the research results based on the sample can be generalized to the population. When the sample accurately represents the population, the results of the study are said to have a high degree of generalizability.
A high degree of generalizability is an important quality of good research because it means that the time and effort (and $$$) that went into the research may have implications for groups of people other than the original participants.
It’s easy to equate “big” with “representative.” Keep in mind that it is far more important to have an accurately representative sample than it is to have a big sample (people often think that big is better—only true on Thanksgiving, by the way). Having lots and lots of participants in a sample may be very impressive, but if the participants do not represent the larger population, then the research will have little value.
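The "big is not the same as representative" point can be checked with a quick sketch. The population and both samples below are invented for illustration; the "unrepresentative" sample is assumed to contain only people who score above the population average.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population of 100,000 scores (made-up values).
population = [rng.gauss(50, 10) for _ in range(100_000)]
mu = statistics.fmean(population)

# Small but representative: 100 people chosen completely at random.
small_random = rng.sample(population, 100)

# Big but unrepresentative: 20,000 people, all drawn from above-average scorers.
above_average = [x for x in population if x > mu]
big_biased = rng.sample(above_average, 20_000)

error_small = abs(statistics.fmean(small_random) - mu)
error_big = abs(statistics.fmean(big_biased) - mu)
print(f"small random sample misses by {error_small:.2f}")
print(f"big biased sample misses by   {error_big:.2f}")
```

Under these assumptions, the sample that is 200 times larger still lands much farther from the population mean, because adding more unrepresentative participants does nothing to remove the bias.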
THE NULL HYPOTHESIS
Okay. So we have a sample of participants selected from a population, and to begin the test of our research hypothesis, we first formulate the null hypothesis.
The null hypothesis is an interesting little creature. If it could talk, it would say something like “I represent no relationship between the variables that you are studying.” In other words, null hypotheses are statements of equality, as demonstrated by the following real-life null hypotheses taken from a variety of popular social and behavioral science journals. Names have been changed to protect the innocent.
· There will be no difference between the average score of 9th graders and the average score of 12th graders on a memory test.
· There is no difference between the effectiveness of community-based, long-term care and the effectiveness of in-home, long-term care in promoting the social activity of older adults when measured using the Margolis Scale of Social Activities.
· There is no relationship between reaction time and problem-solving ability.
· There is no difference between high- and low-income families in the amount of assistance families offer their children in school-related activities.
What these four null hypotheses have in common is that they all contain a statement that two or more things are equal to or unrelated to each other (that’s the “no difference” and “no relationship” part).
The Purposes of the Null Hypothesis
What are the basic purposes of the null hypothesis? The null hypothesis acts as both a starting point and a benchmark against which the actual outcomes of a study can be measured.
Let’s examine each of these purposes in more detail.
First, the null hypothesis acts as a starting point because it is the state of affairs that is accepted as true in the absence of any other information. For example, let’s look at the first null hypothesis we stated earlier:
There will be no difference between the average score of 9th graders and the average score of 12th graders on a memory test.
Given absolutely no other knowledge of 9th and 12th graders’ memory skills, you have no reason to believe that there will be differences between the two groups, right? If you know nothing about the relationship between these variables, the best you can do is guess. And that’s taking a chance. You might speculate as to why one group might outperform another, using theory or common sense, but if you have no evidence a priori (“from before”), then what choice do you have but to assume that they are equal?
This lack of a relationship as a starting point is a hallmark of this whole topic. Until you prove that there is a difference, you have to assume that there is no difference. And a statement of no difference or no relationship is exactly what the null hypothesis is all about. Such a statement ensures that (as members of the scientific community) we are starting on a level playing field with no bias toward one or the other direction as to how the test of our hypothesis will turn out.
Furthermore, if there are any differences between these two groups, then you have to assume that these differences are due to the most attractive explanation for differences between any groups on any variable—chance! That’s right: Given no other information, chance is always the most likely and attractive explanation for the observed differences between two groups or the relationship between variables. Chance explains what we cannot. You might have thought of chance as the odds of winning that $5,000 jackpot at the penny slots, but we’re talking about chance as all that other “stuff” that clouds the picture and makes it even more difficult to understand the “true” nature of relationships between variables.
For example, you could take a group of soccer players and a group of football players and compare their running speeds, thinking about whether playing soccer or playing football makes athletes faster. But look at all the factors we don’t know about that could contribute to differences. Who is to know whether some soccer players practice more or whether some football players are stronger or whether both groups are receiving additional training of different types?
What’s more, perhaps the way their speed is being measured leaves room for chance; a faulty stopwatch or a windy day can contribute to differences unrelated to true running speed. As good researchers, our job is to eliminate chance factors from explaining observed differences and to evaluate other factors that might contribute to group differences, such as intentional training or nutrition programs, and see how they affect speed.
The point is, if we find differences between groups and the differences are not due to training, then we have no choice but to attribute the difference to chance. And, by the way, you might find it useful to think of chance as being somewhat equivalent to the idea of error. When we can control sources of error, the likelihood that we can offer a meaningful explanation for some outcome increases.
The second purpose of the null hypothesis is to provide a benchmark against which observed outcomes can be compared to see if these differences are due to some other factor. The null hypothesis helps to define a range within which any observed differences between groups can be attributed to chance (which is the null hypothesis’s contention) or are due to something other than chance (which perhaps would be the result of the manipulation of some variable, such as training in our example of the soccer and football players).
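One way to picture the null hypothesis as a benchmark is to shuffle the group labels many times and see how large a difference chance alone tends to produce. The sketch below does this for the soccer and football example. The sprint times are made up, `permutation_range` is a hypothetical helper, and the choice of the middle 95% as the "chance range" is an assumption for illustration, not a rule taken from the chapter.

```python
import random
import statistics

def permutation_range(group_a, group_b, n_shuffles=5_000, seed=0):
    """Return the observed mean difference and the middle 95% of the
    differences produced by randomly relabeling the participants."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # labels now carry no information
        diffs.append(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
    diffs.sort()
    low = diffs[int(0.025 * n_shuffles)]
    high = diffs[int(0.975 * n_shuffles) - 1]
    return observed, low, high

# Made-up sprint times in seconds for two small groups of athletes.
soccer = [5.1, 5.3, 4.9, 5.0, 5.2, 5.4, 5.0, 5.1]
football = [5.0, 5.2, 5.1, 5.3, 4.9, 5.2, 5.1, 5.0]

observed, low, high = permutation_range(soccer, football)
outside_chance_range = observed < low or observed > high
print(f"observed difference = {observed:.3f}")
print(f"chance range        = [{low:.3f}, {high:.3f}]")
print("outside the chance range?", outside_chance_range)
```

If the observed difference falls inside the range that shuffled labels produce, chance remains a perfectly good explanation; only a difference outside that range invites another explanation, such as training.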
Most research studies have an implied null hypothesis, and you may not find it clearly stated in a research report or journal article. Instead, you’ll find the research hypothesis clearly stated, and this is where we now turn our attention.
CORE CONCEPTS IN STATS VIDEO
Probability and Hypothesis Testing
Hypothesis testing involves analyzing data to test our research hypothesis. We've collected data and are ready to analyze it. We'll need to complete three steps. First, examine the data by creating tables and figures in order to understand it. Second, calculate descriptive statistics (measures of central tendency and variability) to describe the data. Finally, calculate inferential statistics to test our research hypothesis.
Think of this process like a criminal trial. State an initial assumption and a mutually exclusive alternative to this assumption. Analyze the collected evidence, the data. In this scenario, the jury has two possible decisions when deciding whether or not to reject the initial assumption: guilty or not guilty. These three steps are all critical parts of the process of hypothesis testing: making an initial assumption (known as the null hypothesis) and an alternative to that assumption; analyzing the collected data; and making a decision whether to reject or accept the initial assumption (the null hypothesis).
The decision you make is based on the probability of obtaining a value of a statistic. If your statistic has a low probability of occurring, so low that it is unlikely to have happened because of chance, you reject the initial assumption, or null hypothesis. This is similar to what happens in jury trials, where if the jury decides that the likelihood an innocent person did these things is low enough, they'll reject the presumption of innocence and instead decide on a guilty verdict. What do we mean by "low probability"? If the probability of a statistic is less than [...] to reject the null hypothesis.
Think about flipping coins. The first step is to state the null and the alternative hypotheses. In flipping coins, we start with the null hypothesis assumption that coins are fair. There is an equal probability of getting heads or tails. The mutually exclusive alternative to this assumption is that coins are not fair and there is not an equal probability of getting heads or tails. The second step would be to calculate a statistic. In flipping coins, you could flip a sample of coins and then count the number of heads. The next step is to make a decision about the null hypothesis. You can either reject or accept the null hypothesis that coins are fair. If the number of heads in your sample of coin flips has a low probability of occurring, less than [...] that coins are fair in favor of the alternative hypothesis, that coins are not fair.
We use these examples to illustrate the process of hypothesis testing. But the reality is that research isn't about flipping coins and counting the number of heads. And that's what research is about: collecting data from samples.
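The coin-flipping steps described in the transcript can be sketched in a few lines. Two things here are assumptions: the cutoff for "low probability" does not survive in the transcript, so the commonly used value of 0.05 is assumed, and `binomial_two_sided_p` is a hypothetical helper that computes an exact two-sided binomial probability, which is only one reasonable way to do this test.

```python
from math import comb

def binomial_two_sided_p(heads, flips, p_heads=0.5):
    """Probability, if the null hypothesis (a fair coin) is true, of a head
    count at least as unlikely as the one observed."""
    def pmf(k):
        return comb(flips, k) * p_heads**k * (1 - p_heads)**(flips - k)

    observed = pmf(heads)
    # Add up every outcome that is no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(flips + 1)
               if pmf(k) <= observed * (1 + 1e-9))

ALPHA = 0.05  # assumed cutoff; the transcript's own value is missing

for heads in (52, 65):
    p = binomial_two_sided_p(heads, 100)
    decision = "reject" if p < ALPHA else "do not reject"
    print(f"{heads} heads in 100 flips: p = {p:.4f} -> {decision} the null hypothesis")
```

With 100 flips, 52 heads is well within what a fair coin does by chance, while 65 heads is rare enough under the assumed cutoff that the "coins are fair" assumption would be rejected.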
THE RESEARCH HYPOTHESIS
Whereas a null hypothesis is usually a statement of no relationship between variables or that a certain value is zero, a research hypothesis is usually a definite statement that a relationship exists between variables. For example, for each of the null hypotheses stated earlier, here is a corresponding research hypothesis. Notice that we said “a” and not “the” corresponding research hypothesis because there certainly could be more than one research hypothesis for any one null hypothesis.
· The average score of 9th graders is different from the average score of 12th graders on a memory test.
· The effectiveness of community-based, long-term care is different from the effectiveness of in-home, long-term care in promoting the social activity of older adults when measured using the Margolis Scale of Social Activities.
· Slower reaction time and problem-solving ability are positively related.
· There is a difference between high- and low-income families in the amount of assistance families offer to their children in school-related activities.
Each of these four research hypotheses has one thing in common: They are all statements of inequality. They posit a relationship between variables and not an equality, as the null hypothesis does.
The nature of this inequality can take two different forms—a directional or a nondirectional research hypothesis. If the research hypothesis posits no direction to the inequality (such as only saying “different from”), the hypothesis is a nondirectional research hypothesis. If the research hypothesis posits a direction to the inequality (such as “more than” or “less than”), the research hypothesis is a directional research hypothesis.
The Nondirectional Research Hypothesis
A nondirectional research hypothesis reflects a difference between groups, but the direction of the difference is not specified.
For example, the following research hypothesis is nondirectional in that the direction of the difference between the two groups is not specified:
The average score of 9th graders is different from the average score of 12th graders on a memory test.
The hypothesis is a research hypothesis because it states that there is a difference, and it is nondirectional because it says nothing about the direction of that difference.
A nondirectional research hypothesis, like this one, would be represented by the following equation:
$$H_1: \bar{X}_{9} \neq \bar{X}_{12}, \qquad (7.1)$$
where
· $H_1$ represents the symbol for the first (of possibly several) research hypotheses,
· $\bar{X}_{9}$ represents the average memory score for the sample of 9th graders,
· $\bar{X}_{12}$ represents the average memory score for the sample of 12th graders, and
· ≠ means “is not equal to.”
The Directional Research Hypothesis
A directional research hypothesis reflects a difference between groups, and the direction of the difference is specified.
For example, the following research hypothesis is directional because the direction of the difference between the two groups is specified:
The average score of 12th graders is greater than the average score of 9th graders on a memory test.
One is hypothesized to be greater than (not just different from) the other.
Examples of two other directional hypotheses are these:
· A is greater than B (or A > B).
· B is greater than A (or A < B).
These both represent inequalities (greater than or less than). A directional research hypothesis such as the one described earlier, where 12th graders are hypothesized to score better than 9th graders, would be represented by the following equation:
$$H_1: \bar{X}_{12} > \bar{X}_{9}, \qquad (7.2)$$
where
· $H_1$ represents the symbol for the first (of possibly several) research hypotheses,
· $\bar{X}_{9}$ represents the average memory score for the sample of 9th graders,
· $\bar{X}_{12}$ represents the average memory score for the sample of 12th graders, and
· > means “is greater than.”
What is the purpose of the research hypothesis? It is this hypothesis that is directly tested as an important step in the research process. The results of this test are compared with what you would expect if you were wrong (reflecting the null hypothesis) to see which of the two is the more attractive explanation for any differences between groups or variables you might observe.
Table 7.1 gives the four null hypotheses and accompanying directional and nondirectional research hypotheses.
Another way to talk about directional and nondirectional hypotheses is to talk about one- and two-tailed tests. A one-tailed test (reflecting a directional hypothesis) posits a difference in a particular direction, such as when we hypothesize that Group 1 will score higher than Group 2. A two-tailed test (reflecting a nondirectional hypothesis) posits a difference but in no particular direction. We talk about “tails” because we often understand statistical results by applying them to a normal curve that has two “tails.” The importance of this distinction begins when you test different types of hypotheses (one- and two-tailed) and establish probability levels for rejecting or not rejecting the null hypothesis. More about this in Chapters 8 and 9. Promise.
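As a small preview of why the number of tails matters, the sketch below computes the probability in one tail and in both tails of a standard normal curve for the same test statistic. The value 1.8 is made up, `tail_probabilities` is a hypothetical helper, and treating the statistic as a z score on a normal curve is an assumption for illustration.

```python
from statistics import NormalDist

def tail_probabilities(z):
    """Return (one_tailed, two_tailed) probabilities for a z statistic,
    where the one-tailed test predicts a positive difference."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_tailed = 1 - std_normal.cdf(z)              # area in the upper tail only
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))   # area in both tails
    return one_tailed, two_tailed

one, two = tail_probabilities(1.8)  # 1.8 is a made-up test statistic
print(f"one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
```

For a positive statistic, the two-tailed probability is exactly twice the one-tailed probability, so the same result can look more or less surprising depending on whether the research hypothesis was directional.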
Table 7.1 ⬢ Null Hypotheses and Corresponding Research Hypotheses
| Null Hypothesis | Nondirectional Research Hypothesis | Directional Research Hypothesis |
| --- | --- | --- |
| There will be no difference in the average score of 9th graders and the average score of 12th graders on a memory test. | Twelfth graders and 9th graders will differ on a memory test. | Twelfth graders will have a higher average score on a memory test than will 9th graders. |
| There is no difference between the effectiveness of community-based, long-term care for older adults and the effectiveness of in-home, long-term care for older adults when measured using the Margolis Scale of Social Activities. | The effect of community-based, long-term care for older adults is different from the effect of in-home, long-term care for older adults when measured using the Margolis Scale of Social Activities. | Older adults exposed to community-based, long-term care score higher on the Margolis Scale of Social Activities than do older adults receiving in-home, long-term care. |
| There is no relationship between reaction time and problem-solving ability. | There is a relationship between reaction time and problem-solving ability. | There is a positive relationship between reaction time and problem-solving ability. |
| There is no difference between high- and low-income families in the amount of assistance families offer their children in educational activities. | The amount of assistance offered by high-income families to their children in educational activities is different from the amount of support offered by low-income families to their children in educational activities. | The amount of assistance offered by high-income families to their children in educational activities is more than the amount of support offered by low-income families to their children in educational activities. |
Some Differences Between the Null Hypothesis and the Research Hypothesis
Besides the null hypothesis usually representing an equality and the research hypothesis usually representing an inequality, the two types of hypotheses differ in several other important ways.
First, for a bit of review, the two types of hypotheses differ in that one (the null hypothesis) usually states that there is no relationship between variables (an equality), whereas the research hypothesis usually states that there is a relationship between the variables (an inequality). This is the primary difference.
Second, null hypotheses always refer to the population, whereas research hypotheses usually refer to the sample. We select a sample of participants from a much larger population. We then try to generalize the results from the sample back to the population. If you remember your basic philosophy and logic (you did take these courses, right?), you’ll remember that going from small (as in a sample) to large (as in a population) is a process of induction.
Third, because the entire population cannot be directly tested (again, it is impractical, uneconomical, and often impossible), you can’t say with 100% certainty that there is no real difference between segments of the population on some variable. Rather, you have to infer it (indirectly) from the results of the test of the research hypothesis, which is based on the sample. Hence, the null hypothesis must be indirectly tested, and the research hypothesis can be directly tested.
Fourth, in statistics, the null hypothesis is written with an equal sign, while the research hypothesis is written with a not equal to, greater than, or less than sign.
Fifth, null hypotheses are always written using Greek symbols, and research hypotheses are always written using Roman symbols. Thus, the null hypothesis that the average score for 9th graders is equal to that of 12th graders is represented like this:
$$H_0: \mu_{9} = \mu_{12}, \qquad (7.3)$$
where
· $H_0$ represents the null hypothesis,
· $\mu_{9}$ represents the theoretical average for the population of 9th graders, and
· $\mu_{12}$ represents the theoretical average for the population of 12th graders.
The research hypothesis that the average score for a sample of 12th graders is greater than the average score for a sample of 9th graders is shown in Formula 7.2 (presented earlier).
Finally, because you cannot directly test the null hypothesis, it is an implied hypothesis. But the research hypothesis is explicit and is stated as such. This is another reason why you rarely see null hypotheses stated in research reports and will very often see a statement (be it in symbols or words) of the research hypothesis.
LIGHTBOARD LECTURE VIDEO
Hypothesis Testing
Hypothesis testing is a weird sort of statistics thing. We pretend that we think something is true, when in fact we think the opposite is true. Let me show you what I mean.
Here is a correlation coefficient. And your research hypothesis might be, hey, A and B are correlated. In reality, the correlation is either greater than [...] isn't. It's either greater than [...] Your null hypothesis is going to be that it's equal to [...] But what you really think is that it's greater than [...]
So you have what's really out there, and you can't affect that at all. Then you have your guesses. And your guesses are either going to be that it's greater than [...] exactly equal to [...] All you have to prove is that it's greater than [...] You don't have to prove it's exactly .26 or any of that stuff. Now the guess that says it's greater than [...] your research hypothesis. That's what you really think. The guess that says it's equal to [...] that's your null hypothesis.
So you've got these two situations: whatever is true in the real world and whatever your guess was. So there's really only four possibilities, right? You could be right. You could be wrong. There's two ways to be right and two ways to be wrong.
Let's think. If your guess is that the correlation is greater than [...] and it really is greater than [...] If your guess is that it is greater than [...] and it's not greater than [...] then that's really sad and you're wrong. If your guess is the null hypothesis, which is how we typically do things, you're guessing that the correlation is exactly equal to [...] And if it really is, that, strangely enough, also makes you really happy. But if you're guessing that the correlation is exactly equal to [...] that makes you sad.
So a researcher usually has this research hypothesis (in this case, that the correlation is greater than [...]). And if they're right, that's the best possible outcome.
WHAT MAKES A GOOD HYPOTHESIS?
You now know that hypotheses are educated guesses—a starting point for a lot more to come. As with any guess, some hypotheses are better than others right from the start. We can’t stress enough how important it is to ask the question you want answered and to keep in mind that any hypothesis you present is a direct extension of the original question you asked. This question will reflect your personal interests and motivation and your understanding of what research has been done previously. With that in mind, here are criteria you might use to decide whether a hypothesis you read in a research report or one that you formulate is acceptable.
To illustrate, let’s use an example of a study that examines how providing afterschool child care to employees who work late affects those parents’ adjustment to work. Here is a well-written directional research hypothesis:
Parents who enroll their children in afterschool programs will have a more positive attitude toward work, as measured by the Attitude Toward Work survey, than will parents who do not enroll their children in such programs.
Here are the criteria.
First, a good hypothesis is stated in declarative form and not as a question. (It ends with a period or, if you’re really excited, an exclamation mark!) While the preceding hypothesis may have started in the researcher’s mind as the question “What are the benefits of afterschool programs at work … ?” it was not posed as a question, because hypotheses are most effective when they make a clear and forceful statement.
Second, a good hypothesis posits an expected relationship between variables. The hypothesis in our example clearly describes the expected relationship between afterschool child care and parents’ attitude. These variables are being tested to see if one (enrollment in the afterschool program) has an effect on the other (attitude).
Notice the word expected in the second criterion. Defining an expected relationship is intended to prevent a fishing trip to look for any relationships that may be found (sometimes called the “shotgun” approach), which may be tempting but is not very productive. You do get somewhere using the shotgun approach, but because you don’t know where you started, you have no idea where you end up.
In the fishing-trip approach, you throw out your line and take anything that bites. You collect data on as many things as you can, regardless of your interest or even whether collecting the data is a reasonable part of a scientific investigation. Or, to use a shotgun analogy, you load up them guns and blast away at anything that moves, and you’re bound to hit something. The problem is, you may not want what you hit, and worse, you may miss what you want to hit, and worst of all (if possible), you may not know what you hit! Big data and data mining (see Chapter 19) anyone? Good researchers do not want just anything they can catch or shoot. They want specific results. To get them, researchers need their opening questions and hypotheses to be clear, forceful, and easily understood.
Third, hypotheses reflect the theory or literature on which they are based. As you read in Chapter 1, the accomplishments of scientists rarely can be attributed to just their own hard work. Their accomplishments are always due, in part, to many other researchers who came before them and laid the framework for later explorations. A good hypothesis reflects this, in that it has a substantive link to existing literature and theory. In the aforementioned example, let’s assume that there is literature indicating that parents are more comfortable knowing their children are being cared for in a structured environment and that parents can then be more productive at work. Knowing this would allow one to hypothesize that an afterschool program would provide the security parents are looking for. In turn, this allows them to concentrate on working rather than calling or texting to find out whether Rachel or Gregory got home safely.
Fourth, a hypothesis should be brief and to the point. You want your hypothesis to describe the relationship between variables in a declarative form and to be as direct and explicit as possible. The more to the point it is, the easier it will be for others (such as your master’s thesis or doctoral dissertation committee members!) to read your research and understand exactly what you are hypothesizing and what the important variables are. In fact, when people read and evaluate research (as you will learn more about later in this chapter), the first thing many of them do is find the hypotheses to get a good idea as to the general purpose of the research and how things will be done. A good hypothesis tells you both of these things.
Fifth, good hypotheses are testable hypotheses—and testable hypotheses contain variables that can be measured. This means that you can actually carry out the intent of the question reflected by the hypothesis. You can see from our example hypothesis that the important comparison is between parents who have enrolled their child in an afterschool program and those who have not. Then, attitude will be measured. Identifying these two groups of parents and measuring the variable of attitude are both reasonable objectives. Attitude is measured by the Attitude Toward Work survey (a fictitious title, but you get the idea), and let’s assume that the validity and reliability of that measure have been established. Think how much harder things would be if the hypothesis were stated as follows: Parents who enroll their children in afterschool care feel better about their jobs. Although you might get the same message, the results might be more difficult to interpret given the ambiguous nature of the phrase feel better.
In sum, research hypotheses should
· be stated in declarative form,
· posit a relationship between variables,
· reflect a theory or a body of literature on which they are based,
· be brief and to the point, and
· be testable.
When a hypothesis meets each of these five criteria, you know that it is good enough to continue with a study that has a good chance of answering the research question from which the hypothesis was derived.
Real-World Stats
You might think that the use of null and research hypotheses is beyond question in the world of scientific research as the best way to do social science. Well, you would be wrong. Here’s just a sampling of some of the articles published in professional journals over the past few years that raise concerns. Most aren’t ready to throw out hypothesis testing as the common way of conducting science, but it’s not a bad idea to question every now and then whether the method is always the best model to use.
Want to know more? Go online or to the library and find …
· Jeff Gill raises a variety of issues that call into question the use of the null hypothesis significance-testing model as the best way to evaluate hypotheses. He focuses on political science and how the use of the technique is widely misunderstood. Major problems are discussed and some solutions offered. You can find the article by looking for this reference: Gill, J. (1999). The insignificance of null hypothesis significance testing. Political Research Quarterly, 52, 647–674.
· Howard Wainer and Daniel Robinson take these criticisms one step further and suggest that the historical use of such procedures was reasonable but that modifications to significance testing and the interpretations of outcomes would serve modern science well. Basically, they are saying that other tools (such as effect size, which we discuss in Chapter 11) should be used to evaluate outcomes. Read all about it in Wainer, H., & Robinson, D. H. (2003). Shaping up the practice of null hypothesis significance testing. Educational Researcher, 32, 22–30.
· Finally, in the really interesting article “A Vast Graveyard of Undead Theories: Publication Bias and Psychological Science’s Aversion to the Null,” Christopher Ferguson and Moritz Heene raise the very real issue that many journals refuse to publish outcomes where a null result is found (such as no difference between groups). They believe that when such outcomes are not published, false theories are never taken to task and never tested for their truthfulness. Thus, the replicability of science (a very important aspect of the entire scientific process) is compromised. You can find more about this in Ferguson, C. J., & Heene, M. (2012). A vast graveyard of undead theories: Publication bias and psychological science’s aversion to the null. Perspectives on Psychological Science, 7, 555–561.
We hope you get the idea from the preceding examples that science is not black-and-white, cut-and-dried, or any other metaphor indicating there is only a right way and only a wrong way of doing things. Science is an organic and dynamic process that is always changing in its focus, methods, and potential outcomes.
Summary
A central component of any scientific study is the hypothesis, and the different types of hypotheses (null and research) help form a plan for answering the questions asked by the purpose of our research. The null hypothesis provides a starting point and benchmark for research, and we use it as a comparison as we evaluate the acceptability of the research hypothesis. Now let’s move on to how null hypotheses are actually tested.
Time to Practice
1. Go to the library online or in person and select five empirical research articles (those that contain actual data) from your area of interest. For each one, list the following:
a. What is the null hypothesis (implied or explicitly stated)?
b. What is the research hypothesis (implied or explicitly stated)?
c. And what about those articles with no hypothesis clearly stated or implied? Identify those articles and see if you can write a research hypothesis for them.
2. While you’re looking through the journal, select two other articles from an area in which you are interested and write a brief description of the sample and how it was selected from the population. Be sure to include some words about whether the researchers did an adequate job of selecting the sample; be able to justify your answer.
3. For the following research questions, create one null hypothesis, one directional research hypothesis, and one nondirectional research hypothesis.
a. What are the effects of attention span on out-of-seat classroom behavior?
b. What is the relationship between the quality of a marriage and the quality of the spouses’ relationships with their siblings?
c. What is the best way to treat an eating disorder?
4. Go back to the five hypotheses that you found in Question 1 and evaluate each using the five criteria that were discussed at the end of the chapter.
5. What kinds of problems might using a poorly written or ambiguous research hypothesis introduce?
6. What is the null hypothesis, and what is one of its important purposes? How does it differ from the research hypothesis?
7. What is chance in the context of a research hypothesis? And what do we do about chance in our research studies?
8. Why does the null hypothesis presume no relationship between variables?
Student Study Site
Get the tools you need to sharpen your study skills! Visit edge.sagepub.com/salkindfrey7e to access practice quizzes, eFlashcards, original and curated videos, data sets, and more!
8 PROBABILITY AND WHY IT COUNTS FUN WITH A BELL-SHAPED CURVE
8: MEDIA LIBRARY
Premium Videos
Core Concepts in Stats Video
Lightboard Lecture Video
Time to Practice Video
Difficulty Scale
(not too easy and not too hard but very important)
WHAT YOU WILL LEARN IN THIS CHAPTER
· Understanding probability and why it is basic to the understanding of statistics
· Applying the characteristics of the normal, or bell-shaped, curve
· Computing and interpreting z scores and understanding their importance
WHY PROBABILITY?
And here you thought this was a statistics class! Ha! Well, as you will learn in this chapter, the study of probability is the basis for the normal curve (much more on that later) and the foundation for inferential statistics.
Why? First, the normal curve provides us with a basis for understanding the probability associated with any possible outcome (such as the chance of getting a certain score on a test or the chance of a coin flip coming up “heads”).
Second, the study of probability is the basis for determining the degree of confidence we have in stating that a particular finding or outcome is “true.” Or, better said, that an outcome (like an average score) may not have occurred due to chance alone. For example, let’s compare Group A (which participates in 3 hours of extra swim practice each week) and Group B (which has no extra swim practice each week). We find that Group A differs from Group B on a test of fitness, but can we say that the difference is more than would be expected just randomly (and, therefore, maybe due to the extra practice)? The tools that the study of probability provides allow us to determine the exact mathematical likelihood that the difference is not due to chance but to something else.
All that time we spent on hypotheses in the previous chapter was time well spent. Once we put together our understanding of what a null hypothesis and a research hypothesis are with the ideas that are the foundation of probability, we’ll be in a position to discuss how likely certain outcomes (formulated by the research hypothesis) are.
THE NORMAL CURVE (AKA THE BELL-SHAPED CURVE)
What is a normal curve? Well, the normal curve (also called a bell-shaped curve, or bell curve) is a visual representation of a distribution of scores that has three characteristics. Each of these characteristics is illustrated in Figure 8.1.
Figure 8.1 ⬢ The normal, or bell-shaped, curve
First, the normal curve represents a distribution of values in which the mean, the median, and the mode are equal to one another. You probably remember from Chapter 4 that if the median and the mean are different, then the distribution is skewed in one direction or the other. The normal curve is not skewed. It’s got a nice hump (only one), and that hump is right in the middle.
Second, the normal curve is perfectly symmetrical about the mean. If you fold one half of the curve along its center line, the two halves would lie perfectly on top of each other. They are identical. One half of the curve is a mirror image of the other.
Finally (and get ready for a mouthful), the tails of the normal curve are asymptotic, which is a big word. What it means is that they come closer and closer to the horizontal axis but never touch it. See if you have some idea (in advance, because we will talk about it later) why this is so important, because it’s really a cornerstone of all this probability stuff.
The normal curve’s shape looks kind of like a bell; this gives the graph its other name, the bell-shaped curve.
When one of your devoted authors, Neil, was knee-high, he always wondered how the tail of a normal curve can approach the horizontal or x-axis yet never touch it. Try this. Place two pencils one inch apart and then move them closer (by half) so they are one-half inch apart, and then closer (one-quarter inch apart), and closer (one-eighth inch apart). They continually get closer, right? But they never (and never will) touch. Same thing with the tails of the curve. The tail slowly approaches the axis on which the curve “rests,” but the tail and the axis can never really touch.
Why is this important? As you will learn later in this chapter, the fact that the tails never touch the x-axis means that there is an infinitely small likelihood that a score can be obtained that is very extreme (way out under the left or right tail of the curve). If the tails did touch the x-axis, then the likelihood that a very extreme score could be obtained would be nonexistent.
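If you'd like to see those never-quite-zero tails in numbers, here is a minimal sketch. It assumes Python's standard `statistics.NormalDist` class and a standard normal curve (mean 0, standard deviation 1); the list of z values is our own choice for illustration, not something from the chapter.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
curve = NormalDist(mu=0, sigma=1)

# Area in the left tail beyond z standard deviations below the mean.
# By symmetry, the right tail beyond +z has the same area.
tail_areas = {z: curve.cdf(-z) for z in (1, 2, 3, 4, 5, 6)}

for z, area in tail_areas.items():
    print(f"beyond {z} SD: {area:.10f}")
```

Each tail area is smaller than the one before it, yet none of them is zero, which is the asymptotic idea in action.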
Hey, That’s Not Normal!
We hope your next question is, “But there are plenty of sets of scores where the distribution is not normal or bell shaped, right?” Yes (and it’s a big but).
First, most of the time when scores are allowed to vary and we measure a lot of people, the shape of the distribution of all those people will look pretty normal. In nature in general, many things are distributed with the characteristics that we call normal. That is, there are lots of events or occurrences right in the middle of the distribution but relatively few on each end, as you can see in Figure 8.2, which shows the distribution of IQ and height in the general population.
Figure 8.2 ⬢ How scores can be distributed
Even if individual scores aren’t normally distributed, though, researchers tend to make statistical inferences about summaries of scores, like measures of central tendency, and the distribution of those values will tend to be normal regardless of the distribution of individual scores. When we deal with big sample sizes (more than 30), and we take repeated samples from a population, the means of those samples will distribute themselves pretty closely to the shape of a normal curve. This is very important, because a lot of what we do when we talk about inferring from a sample to a population is based on the assumption that the means of those samples are distributed normally. And that’s just another way to say that the sample’s characteristics continue to approach those characteristics of the population.
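You can watch this happen with a small simulation. The sketch below is only an illustration under assumptions of our own: a made-up, heavily skewed population (exponential scores with a mean near 10), samples of 40 scores each, and a simple moment-based skewness measure, where a value near 0 means "roughly symmetrical."

```python
import random
import statistics

random.seed(505)  # fixed seed so the sketch is repeatable

# A strongly skewed, made-up "population": exponential scores, mean near 10.
population = [random.expovariate(1 / 10) for _ in range(50_000)]

def skewness(values):
    """Simple moment-based skewness: 0 for a perfectly symmetrical set."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

# Draw 2,000 samples of 40 scores each and keep each sample's mean.
sample_means = [
    statistics.fmean(random.sample(population, 40)) for _ in range(2_000)
]

print("population skewness: ", round(skewness(population), 2))
print("sample-mean skewness:", round(skewness(sample_means), 2))
print("population mean:     ", round(statistics.fmean(population), 2))
print("mean of sample means:", round(statistics.fmean(sample_means), 2))
```

The individual scores are badly lopsided, but the sample means should come out far more symmetrical and should center on the population mean.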
For example, there are very few people who are brilliant and very few who are intellectually or cognitively at the absolute bottom of the group. There are lots who are right in the middle and fewer as we move toward the tails of the curve. Likewise, there are relatively few very tall people and relatively few very short people, but lots of people fall right in the middle. In both of these examples, the distribution of intellectual skills and of height approximates a normal distribution.
Consequently, those events that tend to occur in the extremes of the normal curve have a smaller probability associated with each occurrence. We can say with a great deal of confidence that the odds of any one person (whose height we do not know beforehand) being very tall (or very short) are just not very great. But we know that the odds of any one person being average in height, or right around the middle, are pretty good. Those events that tend to occur in the middle of the normal curve have a higher probability of occurring than do those in the extreme. And this is true for height, weight, general intelligence, weight-lifting ability, number of Star Wars action figures owned, and on and on… .
More Normal Curve 101
You already know the three main characteristics that make a curve normal or make it appear bell shaped, but there’s more to it than that. Take a look at the curve in Figure 8.3.
Figure 8.3 ⬢ A normal curve divided into different sections
The distribution represented here has a mean of 100 and a standard deviation of 10. We’ve added numbers across the x-axis that represent the distance in standard deviations from the mean for this distribution. You can see that the x-axis (representing the scores in the distribution) is marked from 70 through 130 in increments of 10, which is the value of 1 standard deviation for this distribution. We made up these numbers (100 and 10), so don’t go nuts trying to find out where we got them from.
So, a quick review tells us that this distribution has a mean of 100 and a standard deviation of 10. Each vertical line within the curve separates the curve into a section, and each section is bound by particular scores. For example, the first section to the right of the mean of 100 is bound by the scores 100 and 110, and this section represents 1 standard deviation from the mean (which is 100).
And below each raw score (70, 80, 90, 100, 110, 120, and 130), you’ll find a corresponding standard deviation (−3, −2, −1, 0, +1, +2, and +3). Each standard deviation in our example is 10 points. So 1 standard deviation above the mean (which is 100) is the mean plus 10 points or 110. One standard deviation below the mean is the mean minus 10 points or 90. Not so hard, is it?
If we extend this argument further, then you should be able to see how nearly all of the scores in a normal distribution with a mean of 100 and a standard deviation of 10 fall between 70 and 130 (which spans −3 to +3 standard deviations).
Now, here’s a big fact that is always true about normal distributions, means, and standard deviations: For any distribution of scores (regardless of the value of the mean and standard deviation), if the scores are distributed normally, almost 100% of the scores will fit between −3 and +3 standard deviations from the mean. This is very important, because it applies to all normal distributions. Because the rule does apply (once again, regardless of the value of the mean or standard deviation), distributions can be compared with one another. We’ll get to that again later.
With that said, we’ll extend this idea a bit more. If the distribution of scores is normal, we can also say that between different points along the x-axis (such as between the mean and 1 standard deviation), a certain percentage of cases will fall. In fact, between the mean (which in this case is 100; got that yet?) and 1 standard deviation above the mean (which is 110), about 34% (actually 34.13%) of all cases in the distribution of scores will fall. That’s about a third of all scores. Because the normal curve is symmetrical, this is true for going the other direction, too. About a third of all scores fall between the mean and 1 standard deviation below the mean! This is a fact you can take to the bank because it will always be true.
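To check that these percentages really don't care what the mean and standard deviation are, here is a minimal sketch using Python's `statistics.NormalDist`. The helper name `share_within` and the two extra distributions (50 and 5, 12 and 2) are our own illustrative choices.

```python
from statistics import NormalDist

def share_within(mean, sd, k):
    """Proportion of a normal distribution within k SDs of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

# The chapter's example (100, 10) plus two made-up distributions.
for mean, sd in [(100, 10), (50, 5), (12, 2)]:
    one = share_within(mean, sd, 1)
    three = share_within(mean, sd, 3)
    print(f"mean={mean}, sd={sd}: within 1 SD = {one:.2%}, within 3 SD = {three:.2%}")
```

Every row should report the same two shares: about 68% within 1 standard deviation and about 99.7% within 3.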
LIGHTBOARD LECTURE VIDEO
Normal Curve
So it turns out when you measure almost anything in the natural world, and the scores are allowed to vary, that when you plot or graph all those scores, it always looks like the exact same shape. That's nice. It always looks a little something like this. And that's not perfectly drawn. It should be perfectly symmetrical. But what are you going to do? This is called the normal curve. And sometimes it's called the bell-shaped curve or the bell curve because it looks like the bottom of a bell that's ringing. And on a curve like this-- this is like other graphs and charts you've seen, where along the bottom here, what we call the x-axis, are often the scores that might range from a low score to some high score. And along this axis, what we call the y-axis, is something having to do with the frequency, how common these scores are. By the way, we call this the y-axis-- and the teacher told me this once-- because if you go like this, like the letter Y, it reaches up and down like that. So that's vertical. They'll cut all this out. So the normal curve-- let's figure out what the normal curve is. It is perfectly symmetrical. And right down the middle is the mean. And on the normal curve, half the scores are above the mean and half are below the mean. So that means the mean is also the median. And also on the normal curve, the highest score, the one that happens the most often, is also the mean and the median. So that means the mode and the mean and the median are all the same thing. The best part about a normal curve, though, is that we've defined it based on how many standard deviations you are away from the mean because we know that 34% of all the scores on a normal curve are between the mean and one standard deviation below. And we also know that another 34% of all the scores are between the mean and a standard deviation above. 
So if you fill all this in, the amount of ink here actually is going to be-- let's see-- 34 plus 34-- 68 percent of all the colors, all the ink I would use to fill this curve are within a standard deviation of the mean. Most scores are close to the mean on the normal curve. That means if someone gets a score far away from the mean, like, say, two standard deviations out, that's much less common than if they get a score that's close to the mean. In fact, we know that this is 34% and 34%. If you're between one standard deviation and two standard deviations out, that's another 14% of the entire range of scores possible. And we're already adding up to something like 96% of all the scores. You could be further out. You could be out in this little area here, be two standard deviations out and beyond. But there's only room for about another 2%. Now, the normal curve goes on forever. So there's actually some small possibilities you could be even further away than three standard deviations. But by knowing the normal curve and knowing that almost everything ends up being charted like a normal curve, you can figure out how rare any score is.
Want to go further? Take a look at Figure 8.4. Here, you can see the same normal curve in all its glory (the mean equals 100 and the standard deviation equals 10) and the percentage of cases that we would expect to fall within the boundaries defined by the mean and the standard deviation.
Figure 8.4 ⬢ A normal curve divided into different sections
Here’s what we can conclude:
| The distance between … | includes … | and the scores that are included (if the mean = 100 and the standard deviation = 10) are … |
| --- | --- | --- |
| the mean and 1 standard deviation | 34.13% of all the cases under the curve | from 100 to 110 or 100 to 90 |
| 1 and 2 standard deviations | 13.59% of all the cases under the curve | from 110 to 120 or 90 to 80 |
| 2 and 3 standard deviations | 2.15% of all the cases under the curve | from 120 to 130 or 80 to 70 |
| 3 standard deviations and above | 0.13% of all the cases under the curve | 130 and above or 70 and below |
If you add up all the values in either half of the normal curve, guess what you get? That’s right (just about almost), 50%. Why? The distance between the mean and all the scores to the right or to the left of the mean underneath the normal curve includes 50% of all the scores.
And because the curve is symmetrical about its central axis (each half is a mirror image of the other), the two halves together represent 100% of all the scores. Not rocket science, but important to point out, nonetheless.
Now, be sure to keep in mind that we are using a mean of 100 and a standard deviation of 10 only as a particular example. Obviously, there are all sorts of distributions with different means and standard deviations, although measurement folks tend to design standardized tests so they will have easy to remember means (like 100) and standard deviations (like 10 or 15).
All of this is pretty neat, especially when you consider that the values of 34.13% and 13.59% and so on are absolutely independent of the actual values of the mean and the standard deviation. These percentages are due to the shape of the curve and do not depend on the value of any of the scores in the distribution or the value of the mean or standard deviation. In fact, you could draw a normal curve on a piece of cardboard and cut it out, so you had a bell-shaped piece of cardboard. Then if you cut out the area between the mean and 1 standard deviation and weighed it, it would tip the scale at exactly 34.13% of the entire piece of bell-shaped cardboard. (Try it—it’s true.) Or imagine that you are filling in the curve’s shape with ink or paint; that area between the mean and 1 standard deviation would take 34% of the total ink used!
In our example, this means that (roughly) 68% (34.13% doubled) of the scores fall between the raw score values of 90 and 110. What about the other 32%? Good question. One half of the scores remaining (16%, or 13.59% + 2.15% + 0.13%) fall above (to the right of) 1 standard deviation above the mean, and one half fall below (to the left of) 1 standard deviation below the mean. And because the curve slopes, and the amount of area decreases as you move farther away from the mean, it is no surprise that the likelihood that a score will fall more toward the extremes of the distribution is less than the likelihood it will fall toward the middle. That’s why the curve has a bump in the middle and is not skewed in either direction. And that’s why scores that are farther from the mean have a lower probability of occurring than scores that are closer to the mean.
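If you don't feel like cutting up cardboard, the areas can be approximated in a few lines. This is a sketch, assuming Python's `statistics.NormalDist` and the chapter's example distribution (mean 100, standard deviation 10). One caution: computed directly, the band between 2 and 3 standard deviations comes out closer to 2.14% than to the 2.15% in the table, a small rounding difference.

```python
from statistics import NormalDist

dist = NormalDist(mu=100, sigma=10)  # the chapter's example distribution

# Upper-half boundaries at 0, 1, 2, and 3 standard deviations above the mean.
cuts = [100, 110, 120, 130]

bands = {}
for low, high in zip(cuts, cuts[1:]):
    bands[(low, high)] = dist.cdf(high) - dist.cdf(low)
beyond_three = 1 - dist.cdf(130)

for (low, high), area in bands.items():
    print(f"{low} to {high}: {area:.2%}")
print(f"130 and above: {beyond_three:.2%}")
print(f"upper half total: {sum(bands.values()) + beyond_three:.2%}")
```

Because the curve is symmetrical, the same areas apply to the matching bands below the mean.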
OUR FAVORITE STANDARD SCORE: THE Z SCORE
You have read more than once that distributions differ in their measures of central tendency and variability.
But in the general practice of applying statistics (and using them in research activities), we find ourselves working with distributions that are indeed different, yet we will be required to compare them with one another. And to do such a comparison, we need some kind of a standard.
Say hello to standard scores. These are scores that are comparable because they are standardized in units of standard deviations. For example, a standard score of 1 in a distribution with a mean of 50 and a standard deviation of 10 means the same as a standard score of 1 from a distribution with a mean of 100 and a standard deviation of 5; they both represent 1 standard deviation and are an equivalent distance from their respective means. Also, we can use our knowledge of the normal curve and assign a probability to the occurrence of a value that is 1 standard deviation or farther from the mean. We’ll do that later.
Although there are other types of standard scores, the one that you will see most frequently in your study of statistics is called a z score. This is the result of dividing the amount that a raw score differs from the mean of the sample scores by the standard deviation, as shown in Formula 8.1:
(8.1)
$$z = \frac{X - \bar{X}}{s},$$
where
· z is the z score,
· X is the individual score,
· $\bar{X}$ is the mean of the distribution, and
· s is the distribution’s standard deviation.
For example, in Formula 8.2, you can see how the z score is calculated if the mean is 100, the raw score is 110, and the standard deviation is 10:
(8.2)
$$z = \frac{(110 - 100)}{10} = +1.0.$$
It’s just as easy to compute a raw score given a z score as the other way around. You already know the formula for a z score given the raw score, the mean, and the standard deviation. But if you know only the z score and the mean and the standard deviation, then what’s the corresponding raw score? Easy: Just use the formula $X = z(s) + \bar{X}$. You can easily convert raw scores to z scores and back again if necessary. For example, a z score of −0.5 in a distribution with a mean of 50 and an s of 5 would equal a raw score of X = (−0.5)(5) + 50, or 47.5.
As you can see in Formula 8.1, we use $\bar{X}$ and s for the mean and the standard deviation, respectively. In some books (and in some lectures), the population mean is represented by the Greek letter mu, or μ, and the population standard deviation is represented by the Greek letter sigma, or σ. One can be strict about when to use what, but for our purposes, we will use letters of the Roman alphabet.
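The two conversions are short enough to write out as code. Here is a minimal sketch; the function names `z_score` and `raw_score` are ours, chosen just for illustration.

```python
def z_score(x, mean, sd):
    """Formula 8.1: distance from the mean in standard deviation units."""
    return (x - mean) / sd

def raw_score(z, mean, sd):
    """The reverse conversion: X = z(s) + mean."""
    return z * sd + mean

print(z_score(110, 100, 10))   # the Formula 8.2 example
print(raw_score(-0.5, 50, 5))  # the z = -0.5 example from the text
```

Running a score through `z_score` and then back through `raw_score` with the same mean and standard deviation returns the score you started with.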
CORE CONCEPTS IN STATS VIDEO
Normal Distributions
Let's move to the next part of analyzing data: testing hypotheses, where you'll rely on another type of distribution. Normal distributions are theoretical distributions representing populations that consist of an infinite number of scores calculated from a mathematical formula. The first characteristic is symmetry. They also have an infinite number of scores extending to both ends, known as tails. Because these distributions are created from a mathematical formula, you can generate an infinite number of scores. And because of that, there is no score with a frequency of zero. That's why these distributions never touch the x-axis. The second aspect of normal distributions is modality. And we're going to represent the modality of the distribution with the mean. Given that normal distributions represent populations, what's the mean of a normal distribution? It's the population mean that we represent with the symbol mu. The third aspect of normal distributions is variability. And we represent the variability of a distribution with the standard deviation. For a normal distribution, this is the standard deviation in the population represented by the lowercase symbol sigma. Given how hypothetical and theoretical these distributions are, you might be wondering, why do we use them? The statistics that we use to test hypotheses are based on the assumption that variables are normally distributed. But there's a problem with normal distributions. And the problem is that how you interpret a score depends on how the variable is measured. For these two variables, you could have a score of 76. But you would interpret the score very differently for height versus weight. A height of 76, which is 6 foot 4, would probably be considered to be above the average. On the other hand, a weight of 76, or 76 pounds, would most likely be considered to be well below the average for the population. So what do we need?
We need one normal distribution that can be applied to any normally distributed variable, and that's what's known as a standard normal distribution. In essence, it's just one other type of normal distribution. In terms of its symmetry, the standard normal distribution is symmetrical with an infinite number of scores in both tails, just like any normal distribution. However, in terms of modality, the mean of the standard normal distribution is not mu, but 0. In terms of the variability of the standard normal distribution, it's equal to 1 and not sigma. Let's look at an example. Assuming height in the population is normally distributed with a mean, or mu, of 69 inches, and a standard deviation, sigma, of three inches, how could you evaluate a height, an x, of six foot one, or 73 inches? We would transform it into a z-score using the formula. So z equals x minus mu over sigma. So 73 minus 69 over three, which is 4 over three, which is 1.33. We can evaluate a z equal to 1.33 in two ways. One is the distance of the score from the mean of zero. We could say that a height of 6 foot 1, or 73 inches, is 1.33 standard deviations above the mean. The other way to evaluate a z score is by its position relative to all of the other z-scores in the distribution. To do this, we would draw the distribution and locate our z-score. So here's the standard normal distribution. We know that the mean is 0. And now we would locate our z-score of 1.33. Since it's positive, we would put it to the right. Let's start with our z-score, which is 1.33. Move down the z column until we reach that. Now move to the right, and notice that there's two areas that are represented. The area between the mean and that z-score, and the area beyond z. So to answer our question, we would move down the beyond column until we reach our z of 1.33 to reach the number that 9.18% of the population is taller than six foot one.
The following data show the original raw scores plus the z scores for a sample of 10 scores that has a mean of 12 and a standard deviation of 2. Any raw score above the mean will have a corresponding z score that is positive, and any raw score below the mean will have a corresponding z score that is negative. For example, a raw score of 15 has a corresponding z score of +1.5, and a raw score of 8 has a corresponding z score of –2.0. And, of course, a raw score of 12 (or the mean) has a z score of 0 (because 12 is no distance from the mean). Z scores are kind of weird, because the average person scores a zero!
| X | $X - \bar{X}$ | z Score |
| --- | --- | --- |
| 12 | 0 | 0.0 |
| 15 | 3 | 1.5 |
| 11 | –1 | –0.5 |
| 13 | 1 | 0.5 |
| 8 | –4 | –2.0 |
| 14 | 2 | 1.0 |
| 12 | 0 | 0.0 |
| 13 | 1 | 0.5 |
| 12 | 0 | 0.0 |
| 10 | –2 | –1.0 |
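The table above can be reproduced in a few lines. This sketch assumes the standard deviation is the sample version (n − 1 in the denominator), which is what Python's `statistics.stdev` computes and which matches the stated value of 2 for these 10 scores.

```python
import statistics

scores = [12, 15, 11, 13, 8, 14, 12, 13, 12, 10]

mean = statistics.mean(scores)  # sample mean
sd = statistics.stdev(scores)   # sample standard deviation (n - 1 in the denominator)

# One z score per raw score: deviation from the mean divided by the SD.
z_scores = [(x - mean) / sd for x in scores]

for x, z in zip(scores, z_scores):
    print(f"{x:>3}  {x - mean:>+5.1f}  {z:>+5.1f}")
```

The z scores average out to zero, which is why the "average person" scores a zero.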
Following are just a few observations about these scores, as a little review.
First, those scores below the mean (such as 8 and 10) have negative z scores, and those scores above the mean (such as 13 and 14) have positive z scores.
Second, positive z scores always fall to the right of the mean and are in the upper half of the distribution. And negative z scores always fall to the left of the mean and are in the lower half of the distribution.
Third, when we talk about a score being located 1 standard deviation above the mean, it’s the same as saying that the score is a z score of 1. For our purposes, when comparing scores across distributions, z scores and standard deviations are equivalent. In other words, a z score is simply the number of standard deviations from the mean.
Finally (and this is very important), z scores across different distributions are comparable. Here’s another table, similar to the one earlier, that will illustrate this last point. These 10 scores have a mean of 57.3 and a standard deviation of about 15.61.
| Raw Score | $X - \bar{X}$ | z Score |
| --- | --- | --- |
| 67 | 9.7 | 0.621 |
| 54 | –3.3 | –0.211 |
| 65 | 7.7 | 0.493 |
| 33 | –24.3 | –1.557 |
| 56 | –1.3 | –0.083 |
| 76 | 18.7 | 1.198 |
| 65 | 7.7 | 0.493 |
| 33 | –24.3 | –1.557 |
| 48 | –9.3 | –0.596 |
| 76 | 18.7 | 1.198 |
In the first distribution you saw, with a mean of 12 and a standard deviation of 2, a raw score of 12.8 has a corresponding z score of +0.4, which means that a raw score of 12.8 is 0.4 standard deviations above the mean. In the second distribution, with a mean of 57.3 and a standard deviation of 15.61, a raw score of about 63.5 has a corresponding z score of +0.4 as well. A miracle? No. We did that on purpose to point out how you can compare performance based on scores from different sets of data or distributions. The raw scores of 12.8 and 63.5 are the same distance from their respective means when that distance is measured in standard deviations. When these raw scores are represented as standard scores, they are directly comparable to one another in terms of their relative location in their respective distributions. Whoever got those two scores did equally well compared to others who took those tests.
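To see which raw scores sit at the same relative spot in each distribution, here is a small sketch that works straight from the two sets of tabled scores. The helper names `raw_at` and `z_of` are our own, and the standard deviation is again assumed to be the sample version (`statistics.stdev`).

```python
import statistics

first = [12, 15, 11, 13, 8, 14, 12, 13, 12, 10]
second = [67, 54, 65, 33, 56, 76, 65, 33, 48, 76]

def raw_at(z, scores):
    """Raw score lying z sample standard deviations from the sample mean."""
    return statistics.mean(scores) + z * statistics.stdev(scores)

def z_of(x, scores):
    """z score of raw score x relative to the given set of scores."""
    return (x - statistics.mean(scores)) / statistics.stdev(scores)

for name, scores in [("first", first), ("second", second)]:
    x = raw_at(0.4, scores)
    print(f"{name}: raw score at z = +0.4 is {x:.2f}")
```

Very different raw scores, same z score: that is what makes performance on the two tests comparable.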
People Who Loved Statistics
The normal curve is everywhere, especially in the natural world. In fact, understanding distributions and how they tell us almost everything we need to know about the probability of any particular outcome is what allows biostatisticians (statisticians interested in biology) to do their work. Someone who loves statistics and uses probability to evaluate the effectiveness and safety of drugs is Dionne L. Price. Dr. Price is a research director for the Food and Drug Administration and a Fellow of the American Statistical Association. She had an interest in math at an early age and was raised by a school teacher as part of a large education-oriented family. Although it used to be rare for girls to pursue mathematics and statistics, Dr. Price did not let that dissuade her. As she explains, “mathematics can open up doors of opportunities unimagined. The sky is the limit, and I urge students interested in mathematics to go for it!”
What z Scores Represent
You already know that a particular z score represents not only a raw score but also a particular location along the x-axis of a distribution. And the more extreme the z score (such as –2.0 or +2.6), the farther it is from the mean.
Because you already know the percentage of area that falls between certain points along the x-axis (such as about 34% between the mean and a standard deviation of +1, for example, or about 14% between a standard deviation of +1 and a standard deviation of +2), we can make the following true statements as well:
· Eighty-four percent of all the scores fall below a z score of +1 (the 50% that fall below the mean plus the 34% that fall between the mean and a z score of 1).
· Sixteen percent of all the scores fall above a z score of +1 (because the total area under the curve has to equal 100%, and 84% of the scores fall below a score of +1.0).
Think about both of these facts for a moment.
What we are saying is that, given the normal distribution, different areas of the curve are encompassed by different values of standard deviations or z scores.
Okay—here it comes. These percentages or areas can also easily be seen as representing probabilities of a certain score occurring. For example, here’s a big scary question of the kind you can now ask and answer (drum roll, please):
In a distribution with a mean of 100 and a standard deviation of 10, what is the probability that any person will score 110 or above?
The answer? The probability is 16% or 16 out of 100 or .16. How did we get this?
First, we computed the corresponding z score, which is +1 [(110 − 100)/10]. Then, given the knowledge we already have (see Figure 8.4), we know a z score of 1 represents a location on the x-axis below which 84% (50% plus 34%) of all the scores in the distribution fall. Above that is 16% of the scores or a probability of .16.
In other words, because we already know the areas between the mean and 1, 2, or 3 standard deviations above or below the mean, we can easily figure out the probability that the value of any particular z score has of occurring or the probability of any person getting that score.
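The same answer can be checked numerically. What follows is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `prob_at_or_above` is our own label (not the book's), and the sketch assumes the scores are exactly normally distributed.

```python
from statistics import NormalDist

def prob_at_or_above(raw_score, mean, sd):
    """Probability of a score at or above raw_score in a normal distribution."""
    z = (raw_score - mean) / sd        # convert the raw score to a z score
    return 1 - NormalDist().cdf(z)     # area above z under the standard normal curve

p = prob_at_or_above(110, mean=100, sd=10)
print(round(p, 2))  # 0.16
```

The unrounded value is closer to .1587, which is where the "about 16%" in the text comes from.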
Now the method we just went through is fine for z values of 1, 2, and 3. But what if the value of the z score is not a whole number like 2 but is instead 1.23 or −2.01? We need to find a way to be more precise.
How do we do that? Simple—learn calculus and apply it to the curve to compute the area underneath it at almost every possible point along the x-axis, or (and we like this alternative much more) use Table B.1 found in Appendix B (the normal distribution table). This is a listing of all the values (except the very most extreme) for the areas under a curve that correspond to different z scores.
Table B.1 has two columns. The first column, labeled “z Score,” is simply the z score that has been computed. The second column, “Area Between the Mean and the z Score,” is the exact area underneath the curve that is contained between the two points.
For example (and you should turn to Table B.1 and try this as you read along), if we wanted to know the area between the mean and a z score of +1, we would find the value 1.00 in the column labeled “z Score” and read across to the second column, where we would find the area between the mean and a z score of 1.00 to be 34.13. Seen that before?
Why aren’t there any plus or minus signs in this table (such as −1.00)? Because the curve is symmetrical, it does not matter whether the value of the z score is positive or negative. The area between the mean and 1 standard deviation in any direction is always 34.13%.
Here’s the next step. Let’s say that for a particular z score of 1.38, you want to know the probability associated with that z score. If you wanted to know the percentage of the area between the mean and a z score of 1.38, you would find in Table B.1 the corresponding area for the z score of 1.38, which is 41.62, indicating that more than 41% of all the cases in the distribution fall between a z score of 0 and 1.38. Then we know that about 92% (50% plus 41.62%) will fall at or below a z score of 1.38. Now, you should notice that we did this last example without any raw scores at all. Once you get to this table, they are just no longer needed.
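If you do not have Table B.1 at hand, the lookups above can be approximated in code. This is a sketch only: `area_mean_to_z` is a hypothetical helper name, and it relies on the standard normal curve from Python's `statistics` module rather than on the printed table.

```python
from statistics import NormalDist

def area_mean_to_z(z):
    """Area (as a percentage) between the mean and a z score, like Table B.1."""
    # The curve is symmetrical, so the sign of z does not matter.
    return (NormalDist().cdf(abs(z)) - 0.5) * 100

print(round(area_mean_to_z(1.00), 2))       # 34.13
print(round(area_mean_to_z(1.38), 2))       # 41.62
print(round(50 + area_mean_to_z(1.38), 2))  # 91.62, the area at or below z = 1.38
```

Because the helper takes the absolute value of z, it mirrors the table's lack of plus or minus signs.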
But are we always interested only in the amount of area between the mean and some other z score? What about between two z scores, neither of which is the mean? For example, what if we were interested in knowing the amount of area between a z score of 1.5 and a z score of 2.5, which translates to a probability that a score falls between the two z scores? How can we use the table to compute the answer to such questions? It’s easy. Just find the corresponding amount of area each z score encompasses and subtract one from the other. Often, drawing a picture helps, as in Figure 8.5.
Figure 8.5 ⬢ Using a drawing to figure out the difference in area between two z scores
For example, let’s say that we want to find the area between raw scores of 110 and 125 in a distribution with a mean of 100 and a standard deviation of 10. Here are the steps we would take.
1. Compute the z score for a raw score of 110, which is (110 − 100)/10, or +1.
2. Compute the z score for a raw score of 125, which is (125 − 100)/10, or +2.5.
3. Using Table B.1 in Appendix B, find the area between the mean and a z score of +1, which is 34.13%.
4. Using Table B.1 in Appendix B, find the area between the mean and a z score of +2.5, which is 49.38%.
5. Because you want to know the area between the two, subtract the smaller from the larger: 49.38 − 34.13 = 15.25%. Here’s the picture that’s worth a thousand words, in Figure 8.5.
Okay—so we can be pretty confident that the probability of a particular score occurring can be best understood by examining where that score falls in a distribution relative to other scores. In this example, the probability of a score occurring between a z score of +1 and a z score of +2.5 is about 15%.
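The five steps can be sketched in a few lines of Python. The function name `area_between` is an assumption of ours, and the code works from the normal curve directly instead of Table B.1, so the result differs slightly from the table-based 15.25% because each table entry is rounded.

```python
from statistics import NormalDist

def area_between(low, high, mean, sd):
    """Percentage of scores between two raw scores in a normal distribution."""
    dist = NormalDist(mu=mean, sigma=sd)
    # Area below the higher score minus area below the lower score
    return (dist.cdf(high) - dist.cdf(low)) * 100

print(round(area_between(110, 125, mean=100, sd=10), 1))  # 15.2
```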
Here’s another example. In a set of scores with a mean of 100 and a standard deviation of 10, a raw score of 117 has a corresponding z score of 1.70. The area between the mean and this z score is 45.54%, so the area at or below it is 95.54% (50% + 45.54%), meaning that the probability of a score falling at or below a z score of 1.70 is 95.54% or 95.5 out of 100 or .955.
Just two things about standard scores. First, even though we are focusing on z scores, there are other types of standard scores as well. For example, a T score is a type of standard score that is computed by multiplying the z score by 10 and adding 50. One advantage of this type of score is that you rarely have a negative T score. As with z scores, T scores allow you to compare standard scores from different distributions.
Second, a standard score is a whole different animal from a standardized score. A standardized score is one that comes from a distribution with a predefined mean and standard deviation. Standardized scores from tests such as the SAT and GRE (Graduate Record Exam) are used so that comparisons can easily be made between scores from different forms or administrations of the test, which all have the same mean and standard deviation.
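To make the T score conversion described above concrete, here is a small sketch. The function name `t_score` and the sample z scores are ours, chosen only for illustration.

```python
def t_score(z):
    """Convert a z score to a T score: multiply by 10 and add 50."""
    return z * 10 + 50

print(t_score(1.0))   # 60.0
print(t_score(-1.5))  # 35.0
```

A z score would have to fall below −5 before the T score turned negative, which is why negative T scores are rare.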
What z Scores Really Represent
The name of the statistics game is being able to estimate the probability of an outcome. If we take what we have talked about and done so far in this chapter one step further, we can determine the probability of some event occurring. Then, we will use some criterion to judge whether we think that event is as likely, more likely, or less likely than what we would expect by chance. The research hypothesis presents a statement of the expected event, and we collect data and then use our statistical tools to evaluate how likely that event is.
That’s the 20-second version of what inferential statistics is, but that’s a lot. So let’s take everything from this paragraph and go through it again with an example.
Let’s say that your lifelong friend, trusty Lew, gives you a coin and asks you to determine whether it is a “fair” one—that is, if you flip it 10 times, you should come up with 5 heads and 5 tails.
We would expect 5 heads (or 5 tails) because the probability is .5 of a head or a tail on any one flip (if it’s a legit coin). On 10 independent flips (meaning that one flip does not affect another), we should get 5 heads and so on. Now the question is, “How many heads would it take to conclude that the coin is fake or rigged?”
Let’s say the criterion for fairness we will use is that if, in flipping the coin 10 times, we get a number of heads that would turn up by chance less than 5% of the time, we’ll say the coin is rigged and call the police on Lew. This 5% criterion is one standard that is used by statisticians. If the event (be it the number of heads or the score on a test or the difference between the average scores for two groups) falls in the extreme (and we’re saying the extreme is defined as less than 5% of all such occurrences), it’s an unlikely, or in this case an unfair, outcome.
Back to the coin and Lew.
Because there are 2 possible outcomes (heads or tails) and we are flipping the coin 10 times, there are 2^10 (2 raised to the 10th power), or 1,024, possible outcomes, such as 9 heads and 1 tail, 7 heads and 3 tails, 10 heads and 0 tails, and on and on. Here’s the distribution of how many heads you can expect, just by chance alone, on 10 flips. For example, the probability associated with getting 6 heads in 10 flips is about 21%.
Number of Heads | Probability
0 | 0.00
1 | 0.01
2 | 0.04
3 | 0.12
4 | 0.21
5 | 0.25
6 | 0.21
7 | 0.12
8 | 0.04
9 | 0.01
10 | 0.00
So, the likelihood of any particular outcome is known. The likelihood of 6 heads from 10 tosses? About .21, or 21%. Now it’s decision time. Just how many heads would one have to get on 10 flips to conclude that the coin is fixed, biased, busted, broken, or loony?
Well, as many good statisticians do, we’ll define the criterion as 5%, which we discussed. If the probability of the observed outcome (the results of our 10 flips) is less than 5%, we’ll conclude that it is so unlikely that something other than chance must be responsible—and that something is a bogus coin.
If you look at the table, you can see that 8, 9, or 10 heads all represent outcomes that have less than 5% probability of occurring. So if the result of 10 coin flips were 8, 9, or 10 heads, the conclusion would be that the coin is not a fair one. (Yep—you’re right: 0, 1, and 2 qualify for the same decision. Sort of the other side of the coin—groan.)
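The coin-flip probabilities can be reproduced with the binomial formula. The sketch below uses `math.comb` from the Python standard library; the function name `heads_probabilities` and the flagging of outcomes under 5% are our own illustration of the criterion described in the text.

```python
from math import comb

def heads_probabilities(n_flips=10, p_heads=0.5):
    """Probability of each possible number of heads in n_flips independent flips."""
    return {
        k: comb(n_flips, k) * p_heads**k * (1 - p_heads)**(n_flips - k)
        for k in range(n_flips + 1)
    }

probs = heads_probabilities()
for k, p in probs.items():
    # Mark the outcomes whose individual probability is below the 5% criterion
    flag = "  <- less than 5%" if p < 0.05 else ""
    print(f"{k:2d} heads: {p:.2f}{flag}")
```

With a fair coin, the flagged rows are 0, 1, 2, 8, 9, and 10 heads, matching the decision described above.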
The same logic applies to our discussion of z scores earlier. Just how extreme a z score would we expect before we could proclaim that an outcome is not due just to chance but to some other factor? If you look at the normal curve table in Appendix B, you’ll see that the cutoff point for a z score of 1.65 includes about 45% of the area under the curve. If you add that to the other 50% of the area on the other side of the curve, you come up with a total of 95%. That leaves just 5% above that point on the x-axis. Any score that represents a z score of 1.65 or above is then into pretty thin air—or at least in a location that has a much smaller chance of occurring than others.
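As a rough check on that cutoff, the sketch below asks the standard normal curve directly, using `statistics.NormalDist` from the Python standard library. The variable names are ours. The unrounded cutoff is about 1.645, which a two-decimal table can only approximate.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# z score that leaves 5% of the area above it
cutoff = std_normal.inv_cdf(0.95)
print(round(cutoff, 2))  # 1.64

# Area above the z score of 1.65 used in the text
print(round(1 - std_normal.cdf(1.65), 3))  # 0.049
```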
Hypothesis Testing and z Scores: The First Step
What we showed you here is that any event can have a probability associated with it. And we use those probability values to decide how unlikely we think an event might be. For example, getting only 1 head and 9 tails in 10 tosses of a coin is highly unlikely. We also said that if an event seems to occur only 5 out of 100 times (5%), we will deem that event to be rather unlikely relative to all the other events that could occur.
It’s much the same with any outcome related to a research hypothesis. The null hypothesis, which you learned about in Chapter 7, claims that there is no difference between two values, such as two group means, or some sample value and zero. We try to test the armor of the null for any chinks that might be there.
In other words, if, through the test of the research hypothesis, we find a difference and calculate that the likelihood of that difference occurring by chance is somewhat extreme, then the research hypothesis is a more attractive explanation than the null. So, if we find a z score (and remember that z scores have probabilities of occurrence associated with them as well) that is extreme (how extreme?—less than a 5% chance of occurring), we like to say that the reason for the extreme score is something to do with treatments or relationships or a real difference between groups and not just chance. We’ll go into much greater detail on this point in the following chapter.
Using SPSS to Compute z Scores
SPSS does lots of really cool things, but it’s the little treats like the one you’ll see here that make the program such a great timesaver. Now that you know how to compute z scores by hand, let’s let SPSS do the work.
To have SPSS compute z scores for the set of data you see in the first column in Figure 8.6, follow these steps.
1. Enter the data in a new SPSS window.
2. Click Analyze → Descriptive Statistics → Descriptives.
3. Double-click on the variable Score to move it to the Variable(s): box.
4. Click Save standardized values as variables in the Descriptives dialog box.
5. Click OK.
You can see in Figure 8.6 how SPSS computes the corresponding z scores. (Be careful: when SPSS does almost anything, it automatically takes you to an Output window where you will not see the computed z scores! You have to switch back to the Data View.)
Figure 8.6 ⬢ Having SPSS compute z scores for you
FAT AND SKINNY FREQUENCY DISTRIBUTIONS
You could certainly surmise by now that distributions can be very different from one another in a variety of ways. In fact, there are four different ways in which they can differ: average value (you know—the mean, median, or mode), variability (range, variance, and standard deviation), skewness, and kurtosis. Those last two are new terms, and we’ll define them as we show you what they look like. Let’s discuss each of the four characteristics and then illustrate them.
Average Value
We’re back once again to measures of central tendency. You can see in Figure 8.7 how three different distributions can differ in their average value. Notice that the mean for Distribution C is more than the mean for Distribution B, which, in turn, is more than the mean for Distribution A.
Figure 8.7 ⬢ How distributions can differ in their average score
Variability
In Figure 8.8, you can see three distributions that all have the same average value but differ in variability. The variability in Distribution A is less than that in Distribution B and, in turn, less than that found in Distribution C. Another way to say this is that Distribution C has the largest amount of variability of the three distributions and Distribution A has the least.
Figure 8.8 ⬢ How distributions can differ in variability
Skewness
Skewness is a measure of the lack of symmetry, or the lopsidedness, of a distribution. In other words, one “tail” of the distribution is longer than another. For example, in Figure 8.9, Distribution A’s right tail is longer than its left tail, corresponding to a smaller number of occurrences at the high end of the distribution. This is a positively skewed distribution. Because the tail on the right, where higher values are, is longer, we call it positively skewed. This might be the case when you have a test that is very difficult, such that only a few people get scores that are relatively high and many more get scores that are relatively low. Distribution C’s right tail is shorter than its left tail, corresponding to a larger number of occurrences at the high end of the distribution. This is a negatively skewed distribution and would be the case for an easy test (lots of high scores and relatively few low scores). And Distribution B—well, it’s just right, with equal lengths of tails and no skewness.
Kurtosis
Even though this sounds like a medical condition, it’s the last of the four ways in which we can classify how distributions differ from one another. Kurtosis has to do with how flat or peaked a distribution appears, and the terms used to describe this characteristic are relative ones.
For example, the term platykurtic refers to a distribution that is relatively flat compared with a normal, or bell-shaped, distribution. The term leptokurtic refers to a distribution that is relatively peaked, taller, compared with a normal, or bell-shaped, distribution. In Figure 8.10, Distribution A is platykurtic compared with Distribution B. Distribution C is leptokurtic compared with Distribution B. Figure 8.10 looks similar to Figure 8.8 for a good reason—distributions that are platykurtic, for example, are relatively more dispersed than those that are not. Similarly, a distribution that is leptokurtic is less variable or dispersed relative to others.
Figure 8.9 ⬢ Degree of skewness in different distributions
Figure 8.10 ⬢ Degrees of kurtosis in different distributions
While skewness and kurtosis are used mostly as descriptive terms (such as “That distribution is negatively skewed”), there are mathematical indicators of how skewed or kurtotic a distribution is. For example, skewness is computed by subtracting the value of the median from the mean. If the mean of a distribution is 100 and the median is 95, the skewness value is 100 − 95 = 5, a positive number, and the distribution is positively skewed. If the mean of a distribution is 85 and the median is 90, the skewness value is 85 − 90 = −5, and the distribution is negatively skewed. There’s an even more sophisticated formula, which uses the standard deviation of the distribution so that skewness indicators can be compared with one another (see Formula 8.3):
(8.3)
$$Sk = \frac{3(\bar{X} - M)}{s},$$
where
· Sk is Pearson’s (he’s the correlation guy you learned about in Chapter 5) measure of skewness,
· $\bar{X}$ is the mean,
· M is the median, and
· s is the standard deviation.
Here’s an example: The mean of Distribution A is 100, the median is 105, and the standard deviation is 10. For Distribution B, the mean is 120, the median is 116, and the standard deviation is 10. Using Pearson’s formula, the skewness of Distribution A is –1.5, and the skewness of Distribution B is 1.2. Distribution A is negatively skewed, and Distribution B is positively skewed. However, Distribution A is more skewed than Distribution B, regardless of the direction.
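The two skewness values in this example can be verified with a one-line function. This is only a sketch; the name `pearson_skewness` is ours.

```python
def pearson_skewness(mean, median, sd):
    """Pearson's skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd

print(pearson_skewness(mean=100, median=105, sd=10))  # -1.5 (Distribution A)
print(pearson_skewness(mean=120, median=116, sd=10))  # 1.2  (Distribution B)
```

Comparing absolute values (1.5 versus 1.2) is what shows Distribution A to be the more skewed of the two.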
Let’s not leave kurtosis out of this discussion. It, too, can be computed using a fancy formula as follows:
(8.4)
$$K = \frac{\sum \left( \frac{X - \bar{X}}{s} \right)^{4}}{n} - 3,$$
where
· K = measure of kurtosis,
· ∑ = sum,
· X = the individual score,
· $\bar{X}$ = the mean of the sample,
· s = the standard deviation, and
· n = the sample size.
This is a pretty complicated formula that basically looks at how flat or peaked a set of scores is. The 3 is subtracted because, for a normal distribution, the first part of the formula works out to about 3, so K equals zero when the distribution is normal or mesokurtic (now there’s a new word to throw around). A positive K indicates a leptokurtic distribution, and a negative K indicates a platykurtic one. If the individual scores (the Xs in the formula) differ greatly from the mean (and there is lots of variability), then the curve will probably be quite flat.
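For readers who want to see the kurtosis formula in action, here is a minimal sketch. It assumes that s is the sample standard deviation (the n − 1 version), which the text does not state explicitly, and the scores are hypothetical data made up for illustration.

```python
from statistics import mean, stdev

def kurtosis(scores):
    """Average fourth power of the standardized scores, minus 3."""
    x_bar = mean(scores)
    s = stdev(scores)  # assumption: sample standard deviation (n - 1)
    n = len(scores)
    return sum(((x - x_bar) / s) ** 4 for x in scores) / n - 3

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, not from the book
print(round(kurtosis(scores), 2))  # -0.87
```

Other software may use slightly different corrections for sample size, so values from a statistics package will not necessarily match this sketch exactly.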
Real-World Stats
More about obesity in children …
You have probably heard about all the concerns regarding childhood obesity. Nowicka and colleagues investigated the possible reduction of obesity in children by focusing on physical activity as an intervention. What might this have to do with z scores? A z score was one of their primary outcome or dependent variables: mean body mass index (BMI) z score = 3.24, SD = 0.49.
The participants were invited to participate in a 1-week sports camp, and after the camp, a coach from a local sports club supported the children during participation in a chosen activity for 6 months. Weight, height, body composition, and lifestyle were measured at baseline and after 12 months. The results? Children who participated in the intervention had a significant decrease in BMI z score.
Why did the researchers use z scores? Most probably because the children who were being compared with one another came from different distributions of scores (they had different means and standard deviations), and by using a standard score, those differences (at least in the variability of the scores) were equalized.
Want to know more? Go online or to the library and find …
Nowicka, P., Lanke, J., Pietrobelli, A., Apitzsch, E., & Flodmark, C. E. (2009). Sports camp with six months of support from a local sports club as a treatment for childhood obesity. Scandinavian Journal of Public Health, 37, 793–800.
Summary
Being able to figure out a z score, and being able to estimate how likely it is to occur in a sample of data, is the first and most important skill for understanding the whole notion of inference. Once we know how likely a test score (or other outcome values, such as a difference between groups) is, we can compare that likelihood with what we would expect by chance and then make informed decisions. As we start Part IV of Statistics for People Who (Think They) Hate Statistics, we’ll apply this model to specific examples of testing questions about the difference.
Time to Practice
1. What are the characteristics of the normal curve? What human behavior, trait, or characteristic can you think of that is distributed normally? What makes you think it may be distributed normally?
2. To compute a standard score, what three bits of information do you need?
3. Standard scores, such as z scores, allow us to make comparisons across different samples. Why?
4. Why is a z score a standard score, and why can standard scores be used to compare scores from different distributions with one another?
5. The mean of a set of test scores is 50, and the standard deviation is 5. For a raw score of 55, the corresponding z score is +1. What’s the z score when the standard deviation is half as much, or 2.5? From this example, what can you conclude is the effect of decreasing the amount of variability in a set of scores on a standard score (given all else is equal, such as the same raw score), and why is this effect important?
6. For the following set of scores, fill in the cells. The mean is 74.13, and the standard deviation is 9.98.
Raw Score | z Score
68.0 | ?
? | –1.6
82.0 | ?
? | 1.8
69.0 | ?
? | –0.5
85.0 | ?
? | 1.7
72.0 | ?
7. For the following set of scores, compute standard scores. Do it using SPSS (easy) and do it manually to keep SPSS honest (not as easy as using SPSS but once you get the hang of it—easy enough). Notice any differences?
1. 18
2. 19
3. 15
4. 20
5. 25
6. 31
7. 17
8. 35
9. 27
10. 22
11. 34
12. 29
13. 40
14. 33
15. 21
Time to Practice Video
Chapter 8: Problem 7
Chapter eight, problem seven wants us to compute a z-score, and it wants us to do it two ways, one, with SPSS, and the other by hand. Since I'm using a computer, I'm not going to use pencil and paper. But I'll show you how to do it in Excel, so that way you can see the steps that we follow, and that you could do it actually by hand if you chose. Question seven gives us the data that we need, which I've entered into SPSS. Computing a z-score in SPSS is super simple. When we click here, we're just going to go under Analyze, Descriptive Statistics, and then Descriptives. First, you're going to want to move the scores themselves into the right-hand box, which is the box that does the analysis. And then in the lower left, you're going to see this little box here that says Save standardized values as variables. You click that. Hit OK. Our actual output doesn't matter. But when we look here at the z-scores, they've popped up into our spreadsheet. So that's how simple it is to do it in SPSS. But this doesn't show you the steps that you need.
So let's look at Excel. When you open up an Excel document, you have to label everything. So I've gone ahead and labeled each of the columns, so we know what steps we want to follow. The two bits of information we need, besides the individual score, are the average and standard deviation. So let's figure those out together. First, when we click here, we want to average all the scores together, which is really easy in Excel. You just click under the little Sum sign and go to Average, and it's going to show you, and it highlights all of it, A2 through A16. Enter, and you have your average of 25.73. And I'm going to add that here, just so I don't forget. I'm going to take out that information. The next thing I want to do is compute the standard deviation. Go back to the Sum sign. Under More Functions, click, and look for standard deviation or the abbreviation of standard deviation. Click there. Double-click.
And it's also showing you A2 through A16, which is what we want. Hit Enter. When we click out of there, you'll see that the standard deviation of 7.63 is there. And so enter that in here. So we keep that information for ourselves. Now, we first need to compute the difference between the score and the average. So to do that, let's just add in the average here. And then if we simply highlight and drag it down, it will automatically populate every cell. Excel is really friendly that way for automatic population. Then we want to compute the difference. And to compute the difference, enter in an equal sign and then 18 or click A2. Then you're just going to hit minus B2, and then Enter, and it shows us the difference. Since we've entered a formula in, if we click here, and we hover over that little box on the bottom and drag it down, it will automatically populate with that formula. So then we have our standard deviation, which we know is 7.63. So let's enter that information in. And make sure it's a plus, not a comma. And once we have that, we populate down.
And then we want the z-score, which is the difference score divided by the standard deviation. So again, we're going to do an equal, the difference, the divide by sign. Click on D2, and hit Enter. That gives us our z-score. We drag it down. And now we have all of our z-scores. And so the question is, are they the same? Do they look the same as what we had? If we look at our data set here, you'll notice that they really are pretty similar. The differences between them would be rounding errors, since we rounded to 25.73 and 7.63. So this is how you do it both in SPSS and by hand or in a different type of spreadsheet. A z-score is really simple to compute, but you want to make sure you have all the data in the right order.
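The spreadsheet steps walked through in the video can also be sketched in Python with the Problem 7 scores. This sketch assumes the standard deviation in the video is the sample version (dividing by n − 1), which is what `statistics.stdev` computes; the variable names are ours.

```python
from statistics import mean, stdev

scores = [18, 19, 15, 20, 25, 31, 17, 35, 27, 22, 34, 29, 40, 33, 21]

average = mean(scores)  # about 25.73
sd = stdev(scores)      # sample standard deviation (n - 1), about 7.64

# z score = (raw score - mean) / standard deviation
z_scores = [(x - average) / sd for x in scores]

for x, z in zip(scores, z_scores):
    print(f"{x:3d}  {z:6.2f}")
```

Because the code keeps every decimal place of the mean and standard deviation, its z scores may differ in the last digit from values computed with the rounded 25.73 and 7.63 quoted in the video.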
8. Questions 8a through 8d are based on a distribution of scores with $\bar{X} = 75$ and standard deviation = 6.38. Draw a small picture to help you see what’s required.
a. What is the probability of a score falling between the raw scores of 70 and 80?
b. What is the probability of a score falling above a raw score of 80?
c. What is the probability of a score falling between a raw score of 81 and 83?
d. What is the probability of a score falling below a raw score of 63?
9. Jake needs to score in the top 10% of his class to earn a physical fitness certificate. The class mean is 78, and the standard deviation is 5.5. What raw score does he need to get that valuable piece of paper?
10. Imagine you are in charge of a program in which members are evaluated on five different tests at the end of the program. Why doesn’t it make sense to simply compute the average of the five scores as a measure of performance rather than compute a z score for each test for each individual and average those?
11. Who is the better student, relative to his or her classmates? Here’s all the information you ever wanted to know:
Math
Class mean | 81
Class standard deviation | 2
Reading
Class mean | 87
Class standard deviation | 10
Raw Scores
 | Math Score | Reading Score | Average
Noah | 85 | 88 | 86.5
Talya | 87 | 81 | 84.0
z Scores
 | Math Score | Reading Score | Average
Noah | __________ | __________ | __________
Talya | __________ | __________ | __________
12. Here’s an interesting extra-credit question. As you know, one of the defining characteristics of the normal curve is that the tails do not touch the x-axis. Why don’t they touch?