CHAPTER 2 SCIENTIFIC METHODS FOR STUDYING PEOPLE IN INTERACTION
Could simply thinking about smart people improve your performance? Dutch researchers hypothesized that merely contemplating professors could make people more knowledgeable. Fifty-eight Dutch college students volunteered for a pair of pilot studies to develop materials for future psychology experiments. Seated in individual computer cubicles for the first pilot study, two-thirds of the participants started out by imagining and listing a typical professor's behaviors, lifestyle, and appearance; some did so for two minutes and some for nine minutes. The remaining third of the participants skipped this task. Then, in the second pilot study, all participants completed a 60-item general knowledge test, composed of items from the Trivial Pursuit game. Examples included “Who painted La Guernica?” (a. Dali, b. Miro, c. Picasso, d. Velasquez) and “What is the capital of Bangladesh?” (a. Dhaka, b. Hanoi, c. Yangon, d. Bangkok). Participants typically answered about 50% correctly (where 25% would be chance). None of the participants saw any link between the first pilot study and the second. In fact, however, participants who had spent time thinking about typical professors actually performed far better on the knowledge test than those who had not (Table 2.1): Professorial thoughts gained them an advantage of 6% to 14% on this multiple-choice test of factual knowledge.
What happened? Did participants become more knowledgeable merely as a result of thinking about professors? (We wish.) The experiment's authors (Dijksterhuis & van Knippenberg, 1998) speculated that having “professor” on the mind may have caused participants to work harder, use better strategies, or trust their hunches. Other studies will unravel this mystery (see Chapter 4). For the present, however, this study illustrates the simultaneously elegant and provocative nature of a well-designed experiment.
TABLE 2.1 Trivial Pursuit Score, after Thinking about Professors
| | Time Spent Thinking about the Typical Professor | | |
| --- | --- | --- | --- |
| | 0 Minutes | 2 Minutes | 9 Minutes |
| Percent Correct | 45.2 | 51.8 | 58.9 |
Source: From Dijksterhuis & van Knippenberg, 1998, Experiment 2. Copyright © American Psychological Association. Adapted with permission.
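The advantage reported for the professor conditions can be checked directly from the percentages in Table 2.1. The following is a minimal sketch in Python; the variable names are illustrative only and do not come from the original study.

```python
# Percent correct by minutes spent thinking about professors (Table 2.1).
percent_correct = {0: 45.2, 2: 51.8, 9: 58.9}

# The 0-minute group serves as the comparison baseline.
baseline = percent_correct[0]

# Advantage of each professor condition over the baseline, in percentage points.
advantage = {
    minutes: round(score - baseline, 1)
    for minutes, score in percent_correct.items()
    if minutes != 0
}

for minutes, gain in advantage.items():
    print(f"{minutes} minutes: +{gain} percentage points over baseline")
```

The differences come to about 6.6 and 13.7 percentage points, broadly in line with the 6% to 14% advantage described in the text.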
This chapter aims to help you understand the unsurpassed power of the scientific experiment but also its limitations. Whether studying the influence of thinking about social roles, as in the professor study, or the impact of group decision making, as in Lewin's organ-meats study from Chapter 1, all social psychologists grapple with determining the scientific methods best suited to test their hypotheses. Methods form the bedrock on which our field rests, and experiments are the most solid scientific rocks of all. Careful scientific methods and outcomes provide the best information about the plausibility of our ideas, whether the hypotheses concern the power of roles, the influence of group decision making, or the causes of aggression.
To explain social psychology's methods, this chapter first describes forming scientific hypotheses, which involves the process of conceptualization. Second, the chapter describes testing hypotheses, that is, the process of operationalization, whereby researchers create empirical, working definitions of their concepts. Third, the chapter describes three research strategies that utilize social psychology's operational methods: descriptive, correlational, and experimental. Fourth, we will see that some methodological challenges are peculiar to social settings and the core social motives that people bring to them. Finally, the chapter includes a brief discussion of research ethics. By the end of the chapter, I hope you will see how the research enterprise operates and what dilemmas it confronts, that is, how social psychologists conduct their science.
FORMING HYPOTHESES: CONCEPTUALIZATION
The first chapter noted that social psychology is scientific. As a science, it involves the dynamic interplay among theory, method, and application—all three of which influence each other. This first section will tackle conceptualization: how social psychologists develop hypotheses, based on both theory and application to social problems (see Fiske, 2004 for other sources of ideas). The second section addresses operationalization: how social psychologists test scientific hypotheses, whether theory-based or application-oriented.
Application as a Source of Hypotheses
Most textbooks and lectures assume theory as the basis of research, but most introductory students assume real-world issues as the basis of research ideas, so the chapter starts there for the moment. Many social psychologists in fact go into the field because of an interest in improving the world, and many government agencies that fund research also emphasize the importance of applications.
Indeed, some of the most famous social-psychological research focuses on an important social issue, rather than a specific theory. For example, Craig Haney, Curtis Banks, and Philip Zimbardo (1973) started from the premise that prisons fail on humanitarian, pragmatic, and economic grounds. Their approach stemmed from their social-psychological analysis of why this apparent failure occurs. The most frequent explanation for the failure of prisons, they argued, is the dispositional hypothesis: That is, the supposed nature of the people who populate prisons (both correction officers and inmates) creates the awful conditions in prisons. By contrast, a social-psychological account attends to the prison as a social situation, which itself creates the awful conditions and failures to rehabilitate. Thus, the researchers' goal was to evaluate the dispositional hypothesis (and by extension, the situational one as an alternative account) as a way to understand prisons. From an applied problem and a social-psychological analysis of the setting came their situational hypothesis.
They arranged to have Stanford student volunteers “arrested” by real police in the students' own residences and taken to a temporary basement prison, where they were stripped of their normal clothes; made to wear only muslin smocks, ankle chains, and rubber sandals; forced to cover their hair with nylon stocking caps; and constantly bossed and degraded by guards. For example, guards insulted prisoners ten times as often as vice versa and threatened them five times as often; the guards acted aggressively and spoke primarily in commands, without using names or other individuating references, while the prisoners resisted and spoke primarily to ask questions (Table 2.2). “Despite the fact that guards and prisoners were free to engage in any form of interaction (positive or negative, supportive or affrontive, etc.), the characteristic nature of their encounters tended to be negative, hostile, affrontive, and dehumanizing” (p. 80). The researchers felt compelled to halt the study when the cruelty of the guards (randomly assigned students) made the prisoners (other randomly assigned students) unbearably miserable. Five of the ten prisoners had already been released early because of extreme emotional reactions (depression, crying, rage, acute anxiety).
TABLE 2.2 Behavior as a Function of Randomly Assigned Roles: Prison Study
| | Role | |
| --- | --- | --- |
| Behavior | Guard | Prisoner |
| Verbal threats | 27 | 5 |
| Deprecation/insults | 61 | 6 |
| Physical resistance | 2 | 32 |
| Weapon or tool use | 23 | 6 |
Entries are numbers of instances of each behavior.
Source: Data from Haney et al., 1973.
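The "ten times" and "five times" comparisons in the text follow from the counts in Table 2.2. Here is a small Python sketch of that arithmetic; the dictionary layout is just one convenient way to hold the table, not anything used by the original researchers.

```python
# Instances of each behavior as (guard, prisoner) counts, from Table 2.2.
counts = {
    "Verbal threats": (27, 5),
    "Deprecation/insults": (61, 6),
    "Physical resistance": (2, 32),
    "Weapon or tool use": (23, 6),
}

# Ratio of guard instances to prisoner instances for each behavior.
ratios = {behavior: guard / prisoner for behavior, (guard, prisoner) in counts.items()}

for behavior, ratio in ratios.items():
    print(f"{behavior}: guards showed {ratio:.1f} times as many instances as prisoners")
```

Insults come out at roughly ten to one and threats at roughly five to one, while physical resistance runs the other way, with prisoners far ahead of guards.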
Before the study terminated, the guards had learned to dislike and victimize the prisoners; the prisoners adopted the guards' negative views and became passive. The study is a controversial classic in American social psychology courses, for methodological and ethical reasons, as we will see. Nevertheless, it illustrates the power of the situation—randomly assigned roles—to create extreme behavior.
As another example of application driving hypotheses, consider the Lewin organ-meat study described in the last chapter. In that instance, an applied problem, namely, getting families to consume unusual and economical foods in a time of scarcity, motivated the research. But Lewin's brilliance lay in his conceptual analyses of the social-psychological processes, namely, group decisions, that are most likely to prove effective.
When novice scientists become interested in a problem, they often leap directly from the application (e.g., gender stereotyping, sexual attraction, casual lies) to the method. By contrast, great social psychologists analyze the problem more carefully, to identify the social forces at work, in order to conceptualize the process more scientifically and ultimately more usefully. Lewin is widely quoted as having said, “Nothing is so practical as a good theory” (although in fact he attributed this bit of wisdom to an anonymous businessman). Whatever the source, the point is that application is improved by good theory, and theory is improved by relevant application.
Theory as a Source of Hypotheses
Hypotheses most often derive from theory, in what is called basic research. A theory, as a system of logical principles, attempts to explain or account for relationships among natural, observable phenomena. The theory is stated in abstract, general terms and generates more specific hypotheses (testable propositions), discussed later. What makes a good theory that can generate hypotheses for research? A good theory, first, posits causal relationships and also attempts to be coherent, parsimonious, and falsifiable. Let's address each feature in turn.
CAUSAL RELATIONSHIPS
Social psychologists create theories that attempt to explain people's interactions with each other. As such, social-psychological theories attempt to explain why people behave as they do in social situations. In explaining why, the theories posit causal relationships: Some behavior occurs because of some prior situation. Thus, social psychologists aim to make valid causal inferences (what causes what), a topic that occupies most of this chapter. Scientists are interested not only in describing and predicting people's social reactions but also in being able to say what causes them. Knowing how and why a phenomenon occurs creates a deeper understanding of it, not to mention the possibility of influencing it.
For example, if we knew that merely assigning people to low- and high-power roles in a highly regulated environment (such as a prison) can cause people to abuse other people, then we would understand much more about social power and abuse, as well as prisons, than if we merely knew that people differ in need for social power and in abusiveness, even if we know power and abuse are sometimes correlated. If we knew that merely making people think about professors, intelligence, and great books can cause people to retrieve their knowledge more effectively, then we would understand much more about mind-set and intellectual performance than if we merely knew that people differ in mind-sets and differ in performance, even if the two tend to be correlated. Positing causal relationships—not just correlations—features in any scientific theory.
COHERENCE
Another characteristic of good scientific theories is that they are coherent, that is, (a) all the parts of the theory agree with each other, so it is not self-contradictory, and (b) the theory makes internal sense. The last chapter gave an example of how common sense could be self-contradictory (opposites attract, but birds of a feather flock together). As another example, the expression “out of sight, out of mind” directly contradicts the expression “absence makes the heart grow fonder.” Taken together, those would not be good scientific theory, in part because they contradict each other.
Coherence goes beyond simply not contradicting each other, for science must also tell a good story that hangs together and makes internal sense. Some theories in social psychology may be perfectly correct, but they do not tell a good story, so they do not generate much research. In the broad sense, coherence is an element in the aesthetics of scientific explanation; a coherent explanation is more pleasing to consider, is more feasible to test, and generates more research.
PARSIMONY
In addition to causality and coherence, scientific theories ideally are parsimonious; they are compact, explaining things simply, using as few concepts as absolutely necessary. Why garbage up the explanatory system with extra components? To predict one person's violence, surely a thousand predictors could combine to do a good job, but what use is that? If we can predict violence using three or four major factors, that is neater, more compact, and ultimately more useful, because those three or four factors will be the important ones and we can feasibly measure them. Scientists seek compact explanations, for both practical and aesthetic reasons, to the extent that reality allows simplicity and elegance.
FALSIFIABILITY
Finally, for theory to guide research, the ideas have to be falsifiable. That is, the theory has to be stated in such a way that it is possible to produce data that would make researchers conclude that the theory is wrong or that it is wrong in part, that is, not supported in the particular set of observations. If every conceivable observation that one can imagine can be explained by the theory, then something is wrong with the theory. Many a theory has lost its credibility within psychology precisely because it could explain everything. The theory was not so interesting or useful anymore because it could explain any result and its opposite.
For example, the useless prediction that the weather tomorrow will be between −80 and +120 degrees F will be correct but too vague to be falsifiable. (It is also uninteresting.) As another example, the statement that “all behavior is caused by some kind of instinct” proved to be unfalsifiable because every time a behavior seemed not to be predicted by a known instinct, a new instinct was invented. As instincts proliferated, they became useless as explanatory mechanisms (Boring, 1950). Some people have accused Freudian theory of explaining virtually every kind of observation. This probably is not totally fair, for it depends on whose version of Freudian theory is meant and whether one treats it as a scientific theory or as a more humanist project. From a basic science perspective, however, if a theory digests every conceivable thing in its path, it is not falsifiable. Theories have to predict concrete observations that would support or contradict the theory.
Theories are not typically destroyed by a single critical experiment that by itself falsifies the theory. Rather, the accumulating data build up contradictions that do not fit the theory, and confidence in the theory diminishes over time. Eventually, a new theory comes along to do a better job of accounting for the overall pattern of research results. Although a theory can be undermined and discarded, established scientists rarely use the term “disproved.” The process is more cumulative than that, and often the theory may hold under certain limited circumstances.
Conversely, novices often consider that a theory can be “proved”; it cannot. A theory can be supported by quantities of data that are consistent with it, or the theory can fail to gain support from data and be undermined by data that are inconsistent with it. But even if a theory is well supported, some researcher, somewhere, sometime, potentially could come up with some data that contradict the accumulated evidence, or some theorist, somewhere, sometime, could create a theory that accounts for the available data better than previous theories do. Thus, a theory can never be considered conclusively proved for all time.
With this caution, some theories may be extremely well supported, and the current consensus of scientists may agree that a given theory explains the available data better than any other available theory. Reviews of the research can even quantify the direction of results and what factors modify the results. This technique—called a meta-analysis—isolates the same research question across many studies, extracting a quantitative estimate of the size of the effect, and combines those effects over studies. This text uses meta-analyses whenever possible (for 322 examples, see Richard, Bond, & Stokes-Zoota, 2003) because they most reliably indicate the direction taken by a particular line of research. Meta-analysis is one method for scientists to focus on the weight of the data pro and con. Precisely because theories cannot be proved, but rather depend on the weight of the evidence, the theory must provide the possibility of enough supportive data, as well as the chance for contrary data that could falsify it.
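To make the idea of combining effects over studies concrete, here is a minimal Python sketch of one common pooling approach, a fixed-effect average that weights each study by the inverse of its sampling variance. The weighting scheme is an assumption chosen for illustration (the text does not specify one), and the effect sizes and variances below are invented, not taken from any real meta-analysis.

```python
from math import sqrt

# Hypothetical (effect size, sampling variance) pairs, one per study.
# These numbers are made up purely for illustration.
studies = [(0.30, 0.04), (0.10, 0.02), (0.25, 0.05)]


def pooled_effect(studies):
    """Return the inverse-variance weighted mean effect and its standard error."""
    # More precise studies (smaller variance) receive larger weights.
    weights = [1.0 / variance for _, variance in studies]
    total_weight = sum(weights)
    estimate = sum(w * effect for w, (effect, _) in zip(weights, studies)) / total_weight
    standard_error = sqrt(1.0 / total_weight)
    return estimate, standard_error


estimate, standard_error = pooled_effect(studies)
print(f"Pooled effect: {estimate:.3f} (standard error {standard_error:.3f})")
```

The pooled estimate sits closest to the most precise study, which is how this kind of summary reflects the weight of the evidence rather than a simple vote count across studies.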
SUMMARY OF THEORIES AS A SOURCE OF HYPOTHESES
Scientific theories specify causal relationships and aim for coherence, parsimony, and falsifiability. Given a coherent, parsimonious, falsifiable theory of causality, scientists generate more specific hypotheses that apply to particular situations. Whereas a theory is a broad system of ideas—for example, evolutionary theory, or the social-psychological theories in the rest of this book—a single hypothesis is a more specific, testable statement.
Hypotheses
A hypothesis is a statement of the relationship that is expected to exist between two or more conceptual variables. Ideally, a hypothesis takes the form of a declarative sentence consisting of “something-verb-something else.” The verb in the middle usually takes the form: affects, changes, increases, decreases, is related to. The hypothesis usually has an explicitly causal structure to it—one factor influences another factor: Group decisions influence individual behavior; thinking about intelligence improves performance; being powerless reduces confrontation; violent television increases aggression.
Although nothing should be simpler, finding the hypothesis explicitly stated in some social scientific articles can be hard, as my students often discover, to their surprise. When you participate in an experiment, if you have that option, or when you read and think critically about a scientific study, the first step is to know what the hypothesis is. (It is not enough simply to copy down what the experimenter tells you or what it says somewhere in the abstract.) Put the hypothesis into this form: “a does something to b.” Sometimes the hypothesis is “a and b do something to c and d.” More than one factor can appear on either side of this verb-arrow. Stating the hypothesis as “something(s) affect something(s)” reveals the structure of the researcher's argument.
Variables
Besides the verb (such as “affect”) that connects them, hypotheses are made up of variables: Variable a influences variable b. A variable may be defined as a characteristic that varies or changes either across people or in the same person across time and place. As an active learning exercise, you might pause and generate some social-psychological variables.
Actually, many of the chapter headings from here on constitute social-psychological variables, or pointers to some: self (self-concept, self-esteem), attitudes and persuasion, attraction, helping, aggression, stereotyping and prejudice, conformity and obedience. For example, helping, as a variable, can influence other variables, such as liking, and helping can result from other variables, such as attraction.
In addition, the most common applications of social psychology create variables. As some of the Chapter 1 examples noted, health affects people's social situation, and the social situation affects health, even survival. Social psychologists Ellen Langer and Judith Rodin (Rodin & Langer, 1977) concluded that elderly patients actually lived longer when given control over apparently trivial aspects of their lives (Table 2.3). One group of residents heard a lecture on the importance of taking responsibility for themselves and were offered a houseplant to water, as well as being able to choose movie times; the comparison group received a lecture about staff responsibility for their care, a houseplant that the staff would water, and staff-scheduled movie times. Over a year after the intervention, nurses evaluated the residents on measures that included whether they seemed happy, actively interested, sociable, self-initiating, and vigorous. The experimental group scored higher and also lived longer. The study is not flawless, and we will come back to it. However, the point here is its striking result on a variety of social-psychological variables, including survival. Although social psychologists rarely deal with death as an outcome variable, it could be one, as in this study. In other applied areas, other real-world variables enter in: arrest rates, voting participation, consumer choice.
TABLE 2.3 Effects of Perceived Control on Mortality in a Nursing Home
| | Condition | |
| --- | --- | --- |
| Measure, 18 Months Later | Responsibility Induced | Comparison |
| Mortality rate | 15% | 30% |
| Nurses' evaluation of health (scale from 45 to 630) | 352.33 | 262.00 |
Source: From Rodin & Langer, 1977. Copyright © American Psychological Association. Adapted with permission.
Here is another variable: “What about gender?” you ask. Isn't gender a social-psychological variable? Some variables, such as gender, education, socioeconomic class, ethnicity, and age, are demographic variables; such population characteristics are one kind of person variable. Although often used in social psychology, these person variables are less often the focus of research. Social psychologists tend to look less at person variables (features of a research participant that the person carries all the time) and look more at situation variables that can change according to the social context. Later, we discuss more examples of social situation variables: how many other people were present, what role a person has in a group, whether two people depend on each other. Those are not intrinsic features of the participant; those are variables that depend on the situation. Social psychology focuses on such situational variables, although the less typical kind, person variables, also come up. Sometimes social psychologists will examine person variables in combination with situation variables, as we will see. Recall from Chapter 1 that field founder Kurt Lewin originally described behavior as a function of both person and situation.
Conceptual Variables
A hypothesis focuses on the general version of any given variable. The conceptual variable is the abstract version of the variable used in the hypothesis; it occurs in the dictionary. In studying aggression, one could look it up in the dictionary to get a conceptual point of view and an abstract definition. Or one could examine theoretical discussions by social psychologists, for example, about the nature and causes of aggression, to consider the conceptual version of the variable. In the same way, a researcher could conceptualize attractiveness, love, helping, or prejudice; all of these are social-psychological conceptual variables. All of them have definitions within social psychology that usually resemble the dictionary definition. This, then, is the abstract, conceptual version of the variable, used in stating the hypothesis.
Many conceptual variables are also hypothetical constructs, that is, ideas that link together various other ideas or observations as sharing some common property. For example, an attitude is a (well-established) hypothetical construct. We cannot directly observe an attitude, but we can observe many manifestations that are consistent with the concept. And the attitude concept links to other concepts (prejudice, attraction, self-esteem) in useful ways. Hypothetical constructs often are nonobservable entities inside people's heads, but they are posited because they bring together many phenomena that seem to be related. Goals, instincts, and schemas are other hypothetical constructs. People can debate their utility for clarifying scientific theory and generating research.
In contrast, other conceptual variables are less often considered hypothetical constructs because they were not invented to serve psychological theories, as the attitude concept was, but instead are considered to have some intrinsic reality. Helping, for example, while an abstract concept that may be expressed in multiple ways, occurs mainly as an observable behavior, and few would label it a hypothetical construct. Thus, all hypothetical constructs are conceptual variables, but not all conceptual variables are hypothetical constructs.
Summary of Hypotheses
More specific than theories are hypotheses, which state an expected relationship between two or more variables. Variables are factors that can vary across people, contexts, and time. Situation variables are most common in social psychology, but person variables enter as well. Hypotheses use the abstract or conceptual versions of variables.
TESTING HYPOTHESES: OPERATIONALIZATION
Having formulated a hypothesis, the researcher turns from conceptualizing to operationalizing, the focus of this section.
Operational Variables
As noted, hypotheses use conceptual versions of variables. Then there is the operational variable—the working definition or operationalization, which refers to the concrete, specific, precise empirical definition of the variable, used in testing the hypothesis. The translation from the conceptual version to the operational version constitutes the critical step in research. How concepts are operationalized matters because social psychologists generate evidence from concrete versions of the variable: measuring it or, in the case of an experiment, manipulating it. The evidence is only as good as the operationalization.
The operational version of the variable has to be so specific and concrete that one can literally point to it. This degree of concreteness goes to what is called public reproducibility, discussed in the next section. The operationalized version of the variable has to be so specific that researchers can show other researchers exactly how they could study it too in precisely the same way. Whether the operational variable is a survey questionnaire measuring attitudes, a computer key-press measuring speed of response, or a scenario whereby two strangers meet in a waiting room, researchers have to be able to describe their operationalized variables so precisely that someone else can replicate exactly what they did, down to the last detail.
Sometimes, the potential gap between the researchers' idea (e.g., the concept of aggression) and what they actually do in their research (the operational or working definition) is problematic. Professional reviewers, and novices too, often judge whether the researcher's idea has been operationalized properly, that is, whether the research effectively and realistically captures the phenomenon. Such criticism is valid, and a practical alternative can emerge from it.
For example, students might feel that the operational definition trivializes a concept that matters to them: “I can't believe that they are studying love by having people meet each other for five minutes and report how attracted they are.” (The answer is that it depends on what aspect of love is being studied; as an operationalization of initial attraction, that method might be acceptable, but as the close relationships chapter shows, it will not do to operationalize love that way.) For any topic, say, aggression or love, contradictions between what some people think it is (i.e., their concept) and how others study it (i.e., the particular operationalizations) can lead to creative new research directions.
Levels of a Variable
One of the key decisions in operationalizing a variable is picking appropriate levels of the variable to use. If a variable is defined as a characteristic that varies, then it must vary through different levels, values, qualities, or quantities. For example, how many different levels or values does gender have? (In other words, how many different genders are there?) Two. A reasonable answer, although for some purposes, degrees of masculinity and femininity might matter, or transgender individuals might be relevant.
How many different levels or values does income have? The answer depends on how specific the researcher wants to be. Usually scientists break down income, as in a survey; the researchers provide ranges, and participants select the range that includes their own income. That way, instead of having practically infinite numbers of options ($12,001, $12,002, …), the researchers get answers that are accurate enough for their purposes ($15,000 or less; $15,001 to $25,000; etc.). But that is a decision on the part of the researcher, depending on the purpose of the research. For instance, in studying students, one would use income categories different from those in studying established doctors.
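The choice of income levels can be sketched in Python as a mapping from an exact dollar amount to a survey range. Only the first two ranges below come from the example in the text; the higher cutoffs and the function name income_level are hypothetical, and a researcher studying a different population would pick different ones.

```python
from bisect import bisect_left

# Upper bounds of each income range, in dollars. The first two bounds follow
# the example in the text; the remaining bounds are hypothetical.
UPPER_BOUNDS = [15_000, 25_000, 50_000, 100_000]

LABELS = [
    "$15,000 or less",
    "$15,001 to $25,000",
    "$25,001 to $50,000",      # hypothetical range
    "$50,001 to $100,000",     # hypothetical range
    "More than $100,000",      # hypothetical range
]


def income_level(income):
    """Return the survey range (the level of the variable) containing an income."""
    # bisect_left finds the first range whose upper bound is >= income.
    return LABELS[bisect_left(UPPER_BOUNDS, income)]


for income in (12_001, 12_002, 15_000, 15_001, 250_000):
    print(f"${income:,} -> {income_level(income)}")
```

Here $12,001 and $12,002 fall into the same level, which is the point of the ranges: the many possible exact values collapse into a handful of levels that are accurate enough for the research purpose.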
Take aggression as another example. When researchers consider aggression as a variable, the levels they use depend on whom they are studying and for what purpose. If they are investigating kindergarten children and focusing on interpersonal violence, homicide is not a likely level of aggression. However, if they are examining urban violence, homicide is a relevant level of the variable, whereas shoving somebody is probably not too relevant (although it can lead to homicide). In the Stanford prison study, insults and verbal put-downs constituted levels of interpersonal aggression. When social psychologists choose a variable, then, they make a choice about what levels or values of that variable they are interested in studying, depending on how detailed and inclusive they want to be.
Take, as another variable, comfort with the drinking habits of one's fellow students; such norms about alcohol consumption do influence people's own drinking. Comfort with the local customs has been measured on an 11-point scale, from 0 = not at all comfortable to 10 = very comfortable (Prentice & Miller, 1993). Participants' own comfort contrasted with what they perceived to be the comfort of all the other students, a phenomenon called pluralistic ignorance. For example, in September, male students on average believed other students were 1.64 scale-points more comfortable with local drinking habits than they themselves were; by December, their own comfort had increased, nearly to match the (falsely) perceived norm (Table 2.4). Women showed the opposite pattern; their discomfort did not change over the semester, as they perceived other students' comfort to increase. For men, this fit a pattern of drinking more over the course of the year, to fit in with the (at first falsely) perceived social norms.
TABLE 2.4 Ratings of Own and Others' Comfort with Norms about Drinking
| Measure | Self | Average Student |
| --- | --- | --- |
| Men: comfort in September | 5.84 | 7.48 |
| Men: comfort in December | 7.08 | 7.58 |
| Women: comfort in September | 6.08 | 7.16 |
| Women: comfort in December | 5.94 | 7.74 |
Source: From Prentice & Miller, 1993. Copyright © American Psychological Association. Adapted with permission.
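The gap between students' own comfort and the comfort they attributed to the average student can be computed from Table 2.4. This Python sketch simply subtracts the two columns; the data structure and names are illustrative, not part of the original study.

```python
# (own comfort, perceived comfort of the average student) on the 0-10 scale, Table 2.4.
ratings = {
    ("Men", "September"): (5.84, 7.48),
    ("Men", "December"): (7.08, 7.58),
    ("Women", "September"): (6.08, 7.16),
    ("Women", "December"): (5.94, 7.74),
}

# Positive gap: others are seen as more comfortable than oneself.
gaps = {
    group: round(average_student - own, 2)
    for group, (own, average_student) in ratings.items()
}

for (gender, month), gap in gaps.items():
    print(f"{gender}, {month}: others seen as {gap:.2f} scale points more comfortable")
```

For men the gap shrinks from 1.64 points in September to 0.50 in December, whereas for women it widens from 1.08 to 1.80, matching the opposite patterns described in the text.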
Consider the issue of levels in the comfort variable. If the researchers had, alternatively, simply looked at a dichotomy (comfortable/uncomfortable), they would not have uncovered people's illusions so easily as they did with an 11-point scale. Conversely, they did not need to ask respondents to distinguish along a more detailed continuous range (on a scale from 1 = extremely uncomfortable to 100 = extremely comfortable) in order to find their effect. Even when researchers choose to use an intermediate range (typically 5–15-point scales), they must decide how to break up that range, how to label it (e.g., strongly disagree, moderately disagree, neutral, moderately agree, strongly agree), and whether to omit or include a neutral point.
Sometimes, consumers of research, both students and scientists, identify variables that they think are mistakenly broken up into inappropriate levels, variables that for one reason or another tend to group together people or responses that do not belong together. For example, the census faces a major dilemma in measuring ethnicity. Traditionally, the government and many survey researchers have collapsed all different kinds of Latinos into one category and all different kinds of Asians into another. Depending on the purpose of the research, that might be appropriate or not. Each of those categories lumps together people from many different backgrounds who might, in terms of their own social identity, regard it as a mistaken lumping. “Asian” and “Latino” are social creations on the part of the culture, as is calling somebody “black.” Whether the researchers are studying people's own identities or some people's categories for other people, different levels of the variable ethnicity might apply. Studying identity is likely to require many levels (for Asian, at least South Asian, Southeast Asian, East Asian, but more probably the specific country of ancestors and identity as well as current nationality). Ethnicity might also include, as the 2000 census did for the first time, the possibility of checking multiple categories, indicating increasingly common multiracial identities. Depending on context, global identity (e.g., European) or subgroup identity (e.g., British) might matter (Crisp, Stone, & Hall, 2006). On the other hand, studying stereotypes might require only the global category “Asian” or even the mistaken belief that “they” are all “Chinese.”
Conceptual and operational versions of variables both include the assumption that variables have levels, but specifying levels of variables often arises in the process of operationalization. The purpose of the research, the hypothesis in particular, helps determine the appropriate levels of the variables.
Scientific Standards in Operationalizing Variables
In operationalizing variables for testing a scientific hypothesis, scientific standards require operational precision, public reproducibility, and accuracy.
OPERATIONAL PRECISION
In scientific methods, as noted earlier, an operationalization is the working definition of a concept for the purposes of studying it. Operational precision then forms the cornerstone of standards for scientific methods; it entails first executing one's research consistently and exactly, then describing one's sampling, procedures, manipulations, and measures in specific detail. All this entails specifying what a concept means. If social psychologists want to study aggression, for example, operationalizing the concept of aggression forces them to be scientifically precise, in order to be clear about what exactly they mean by aggression. Operational precision requires picking the exact observable variable. In studying aggression, shall they study insults, punches, homicide, or spouse abuse? As noted earlier, scientists could operationalize aggression many different ways, and choosing is the first step in operational precision.
Then, the method must be conducted in the same way every time. The exacting nature of asking the survey question consistently and of conducting the experiment the same way every time conveys operational precision. In choosing methods and conducting studies, researchers have to compromise between everyday reality and precise, feasible, scientific standards. In this compromise, operational precision helps to meet important scientific standards.
Finally, the detail of the methods sections of empirical articles also expresses the importance of operational precision. As a researcher, one must document one's method in complete detail and down to the last nuance. If one cannot do that, then the research is not truly scientific. Operational precision forces social researchers to be specific and unambiguous. It requires the kind of scientific rigor necessary to progress.
PUBLIC REPRODUCIBILITY
Scientific methods also must be publicly reproducible, which means that others can examine and replicate the methods and the data they produce. Science depends on replication from one laboratory to another.
Private evidence is not evidence. One example of not being publicly reproducible comes from the early days of psychology. Around the end of the 19th century, the first laboratory psychologists, Wilhelm Wundt (1897) and Hermann Ebbinghaus (1885), relied a great deal on themselves as research subjects and sometimes just on introspection. They and their graduate students trained themselves to observe their own thought processes, specifically, their own memory systems. They would observe and reflect on how they memorized a list of numbers, for example. The problem is that if they developed a theory based on how they memorize numbers, and other scientists developed theories based on how they memorize numbers, how does one decide which is right?
Social psychologists have the same challenge because most people have strong intuitions about how they interact with other people, and they may or may not be right. People cannot tell more than they can know (Nisbett & Wilson, 1977a). In one study, students watched a teacher on one of two videotapes, one in which he behaved warmly, encouraged discussion, and described appealing assignments and another in which he behaved coldly, discouraged discussion, and gave unappealing assignments (Nisbett & Wilson, 1977b). Roughly 70% of students rated his physical appearance in his warm guise as appealing, and in his cold guise roughly the same percentage rated it as irritating. Despite these clear effects, 90% to 95% of the students claimed that his likeability had no effect or the opposite effect on their ratings of his appearance (Figure 2.1). Participants could not accurately report what had influenced their responses.
Figure 2.1 Inaccuracy of Verbal Reports
Source: From Nisbett and Wilson, 1977b. Copyright © American Psychological Association. Adapted with permission.
People may be unaware of a stimulus that affects their response, unaware of the response, or unaware of the connection between the stimulus and the response. Accordingly, psychologists cannot rely on research participants' verbal reports about why they responded as they did, because those reports are unreliable. In the same way, researchers cannot report on their own internal processes, using them as a form of evidence.
Psychologists cannot get inside the head and directly observe affective and cognitive processes, so we must use outward manifestations of the interior thinking and feeling processes. Those outward manifestations then encompass scientific methods that any researcher can use. Public reproducibility (being able to replicate someone else's methods exactly) is impossible for introspection. Introspective “methods” are not appropriate in modern science, except when—along with theory and application—researchers use them in first generating hypotheses. But introspection, intuition, hunches, and common sense do not belong in the main stages, when researchers go beyond generating a hypothesis. The key to being publicly reproducible is that one laboratory has to be able to replicate what another laboratory has already done. Conducting research carefully and writing methods down in detail (operational precision) constitute the first steps, but what is done and written needs to be observable and repeatable by others (public reproducibility).
OBSERVATIONAL ACCURACY
Social psychologists try to generate accurate observations that conform to scientific standards. Observational accuracy involves both precision and lack of bias (Rosenthal, 1995).
Observational precision minimizes the random error—or the noise—in measurements. An example is an antique clock that runs, unpredictably, sometimes fast and sometimes slow. This form of inaccuracy is imprecision; the clock would be approximately right, but not reliably off by exactly the same amount or even the same direction each time. A sloppy human observer, coder, or data-entry typist also would lack precision. Being careful, social psychologists (and other social and behavioral scientists) acknowledge that our data are likely to contain some random error, so we average over many observations, for the errors to cancel out.
The other form of accuracy entails lack of observational bias, which describes a constant error. That is, suppose a watch is always fast or slow by a constant amount. For example, some people set their watches ahead by five minutes, a constant bias that improves their on-time record. Which would be preferable, a watch that has a constant bias or a watch that has random error? Having a watch with a lot of random error and noise would prevent effectively adjusting for its inaccuracy, for one would never know whether it was five minutes fast or five minutes slow. With constant bias, as in setting one's watch ahead, one can always adjust to get the accurate answer, although one may not always remember in a pinch (hence, this trick's usefulness for the consistently tardy). With scientific observations, of course, the trick is knowing when observations are biased and by how much. More frequently, social psychologists simply attempt to prevent bias, as described later.
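The contrast between the erratic clock and the watch set ahead can be simulated. The sketch below is only an illustration under stated assumptions: the true time, the size of the random error (a standard deviation of five minutes), the number of readings, and the random seed are all hypothetical choices, not values from the text.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_MINUTES = 720.0      # hypothetical true time: noon, in minutes past midnight
N_READINGS = 10_000

# Imprecise clock: unpredictable error, sometimes fast and sometimes slow.
noisy = [TRUE_MINUTES + rng.gauss(0, 5) for _ in range(N_READINGS)]

# Biased watch: always exactly five minutes fast.
biased = [TRUE_MINUTES + 5.0 for _ in range(N_READINGS)]

noisy_mean_error = statistics.fmean(noisy) - TRUE_MINUTES
biased_mean_error = statistics.fmean(biased) - TRUE_MINUTES

print(f"mean error, noisy clock:  {noisy_mean_error:+.2f} min")
print(f"mean error, biased watch: {biased_mean_error:+.2f} min")
```

Averaging many readings pulls the random error toward zero, which is why researchers average over many observations. The constant bias survives averaging untouched, so it has to be known and subtracted, or prevented in the first place.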
From Concept to Operation: Some Examples
As we have seen so far, both application and theory generate hypotheses, which state relationships between conceptual variables. Operational variables then test a hypothesis, adhering to scientific standards.
Consider a specific hypothesis as an example: Attractiveness increases persuasion (Table 2.5). Without going into theoretical detail (see attraction chapter), this hypothesis seems plausible. “What is beautiful is good” applies to people, at least in some respects, according to a meta-analysis. In a quantitative review of 76 studies (Eagly, Ashmore, Makhijani, & Longo, 1991), attractive people are consistently credited with being sociable, well-adjusted, powerful, and intellectually competent, although not necessarily trustworthy or concerned about others. On some counts, but not all, attractive people might well be persuasive. Certainly, advertisers think so. And so do political parties; think of all the attention paid to managing candidates' appearance.
TABLE 2.5 From Concept to Operation: Attractiveness Increases Persuasion
| | Hypothesized Cause | → | Effect |
| Conceptual variables | Attractiveness | → | Persuasion |
| Operational variables | Pretested photographs | | Topic: college issues, health, consumer goods, or social policy |
| | or | | |
| | Same person, with better or worse grooming | | Measure: scale of agreement (choose levels, wording) |
| | or | | or |
| | Same person, smiling or not | | behavior (e.g., signing petition, buying product) |
Suppose, then, that our hypothesis is that attractiveness increases persuasiveness. Notice that “attractiveness” and “persuasiveness” are the abstract, conceptual versions of the variables used in the hypothesis. How might one operationalize this hypothesis? One could operationalize attractiveness in many ways. The meta-analysis by Eagly et al. (1991) documents several methods in use, but the most common one is preselecting people as attractive and unattractive, according to judges' ratings, and showing them in head-and-shoulders photographs or videotapes. A minority of studies change the appearance of the same target, for example, with makeup, hairstyle, and expression.
Suppose we pick simply the amount of smiling the person does, because we could set that up in an experiment with the same average-looking person smiling either a lot or very little. The meta-analysis indicates that using the same person twice produces stronger results than preselected attractive and unattractive photographs of different people. Then, to get down to the most specific form of the operationalization, we could present half the people with photographs of a communicator who is smiling and the other half with the same person not smiling. If we use a color photograph instead of black and white, the meta-analysis indicates the effects are likely to be stronger, so that would be wise. What's more, the effect is likely to be stronger if we do not present a lot of additional information about the person.
What about the operational version of persuasiveness? One could measure persuasion in lots of different ways also. First, we have to pick a topic. Another meta-analysis (Johnson & Eagly, 1989) lists topics that range from the merits of disposable razors to senior comprehensive exams to media coverage of hijacking. College issues, health, consumer products, and social policy are the most common topics in persuasion studies.
Then we have to measure persuasion. Examining the meta-analysis indicates, reasonably, that several questionnaire items tend to produce larger effects than one item, presumably because more items create more reliable measures (random error cancels out). Most often, researchers use an agreement scale (e.g., from 1 to 7), but we would have to decide the number of levels that make sense and how to label them. Or we could use a behavioral measure, such as signing a petition or buying a product. Each of these operational decisions would come up in measuring persuasion, to test the hypothesis that attractiveness increases persuasion.
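One standard way to quantify why several items outperform one is the Spearman-Brown formula from classical test theory. It is not part of the meta-analysis cited above, and it assumes the items are parallel (equally reliable, with independent errors). The single-item reliability of .30 used below is a hypothetical value chosen only for illustration.

```python
def composite_reliability(single_item_reliability: float, n_items: int) -> float:
    """Spearman-Brown: reliability of the sum (or mean) of n parallel items."""
    r = single_item_reliability
    return n_items * r / (1 + (n_items - 1) * r)

# Hypothetical single-item reliability; not a value from the source.
for k in (1, 2, 4, 8):
    print(k, round(composite_reliability(0.30, k), 2))
```

Under these assumptions, reliability climbs steadily as items are added, with diminishing returns: each extra item helps less than the one before.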
Any decision about operationalizing a variable in a particular form is a kind of hypothesis itself. For example, a hypothesis lurks behind the idea to use smiling and unsmiling pictures to operationalize attractiveness, namely, the hypothesis that smiling is attractive. This too is an empirical question. In one study, Reis and nine students (1990) tested the hypothesis that smiling increases perceived attractiveness. Thirty undergraduates served as stimulus people. Each one was photographed with either a neutral or smiling facial expression. Attractiveness thus had two levels: The undergraduates either smiled or not. Then the researchers asked 100 other college students to rate the photographs on 20 trait adjectives, which thus measured attractiveness (Table 2.6). They found, for example, that smiling faces seemed more sincere and likeable, sociable and exciting, as well as competent and intelligent. But a smile also indicates submission, so smiling faces seemed less independent and self-assured, and less masculine, more feminine. Thus, smiling does reliably increase certain kinds of attractiveness, so our operationalization of attractiveness as smiling makes some sense, and increases in perceived sincerity and competence might well predict increased persuasion. However, our operationalization is weak in the persuasion context because smiling people seemed less independent and self-assured. All operationalizations present trade-offs, and researchers' skill partly involves balancing among them to pick the most reasonable and convincing operationalization, which after all is simply a working definition.
TABLE 2.6 Trait Perception Scores as a Function of Smiling
| Dimension | Not Smiling | Smiling |
| Sincere, likable, modest, sensitive, kind, trustworthy, nurturing, cooperative | 3.99 | 4.58 |
| Sociable, interesting, exciting, sexually warm | 3.89 | 4.61 |
| Competent, intelligent | 4.31 | 4.53 |
| Independent, self-assured, strong | 4.36 | 4.22 |
| Masculine, not feminine | 4.11 | 3.92 |
Source: From Reis et al., 1990. Copyright © Wiley. Adapted with permission.
Summary of Testing Hypotheses via Operationalization
Operationalization involves concrete versions of variables, the working definitions in the specific prediction for a particular study. Scientific standards for operationalization include operational precision, public reproducibility, and observational accuracy (precision and lack of bias), as indicated in several examples.
CHOOSING A RESEARCH STRATEGY
All research involves variables and hypotheses. Having formulated a hypothesis with conceptual variables, and now in the process of operationalizing the variables, the researcher must choose an overall research strategy. One way to understand this choice is to consider three commonly identified basic strategies (e.g., Rosenthal, 1995; Stangor, 2010): descriptive, correlational, and experimental. Each research strategy differs in the type of hypothesis it investigates, as well as particular issues it raises.
Descriptive Research
Descriptive research aims to depict accurately some characteristic in a population of interest. Descriptive research focuses on only one variable at a time, assessing the amount or average level of a given variable in a population. It concerns an atypical kind of hypothesis, raises specific issues about sampling, and addresses a unique kind of question.
NOT A TRUE HYPOTHESIS
In descriptive research, the “hypothesis” is an exception to our earlier definition; descriptive research does not have an explicit hypothesis in the same sense as other kinds of research. Remember that a hypothesis was defined as a statement of the expected relationship between two or more variables. Because descriptive research addresses the amounts of a single given characteristic, it cannot very well look at the relationship between two or more variables. Typical questions might include: What is the percentage of athletes on campus? How productive is the average professor? What is the violent-crime rate in New York City?
A common kind of descriptive research is a public opinion survey. What proportion of the people in the United States think we should get rid of welfare, vote Republican in a given election, or believe the president is doing a good job? The media report such poll results all the time, especially during presidential election years. Public opinion surveys address other pressing issues: In November 2007, the Harris Poll reported that 41% of adults believe in ghosts and 31% believe in witches.
The census also provides descriptive research results. Its most important figures describe population numbers by area. These numbers determine a region's share of representatives in Congress, federal allocations of money, and other critical resources. The census also paints a portrait of our lifestyle, based on a subset of questionnaires that are longer than usual. For example, which type of pet is more common: cats or dogs? Answer: Dog-owning households outnumber cat-owning households (31.2 versus 27.0 million), but more pets are cats, because cat households average 2.2 each, while dog households average 1.7 each (U.S. Bureau of the Census, 1997). Almost all (88%) claim their pet as a member of the family, according to a December 2007 Harris Poll.
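The cats-versus-dogs answer follows from multiplying households by pets per household. A short Python check, using only the figures quoted above (the variable names are illustrative):

```python
# Figures quoted above (U.S. Bureau of the Census, 1997).
dog_households_millions = 31.2
cat_households_millions = 27.0
dogs_per_household = 1.7
cats_per_household = 2.2

total_dogs_millions = dog_households_millions * dogs_per_household
total_cats_millions = cat_households_millions * cats_per_household

print(f"dogs: about {total_dogs_millions:.1f} million")
print(f"cats: about {total_cats_millions:.1f} million")
```

Because the per-household averages are rounded, the totals are approximate: roughly 53 million dogs against roughly 59 million cats.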
RANDOM SAMPLING
The crucial issue in descriptive research is whether the researchers have selected a good, unbiased sample. Is it a random sample of the population to be described? For example, suppose a researcher wants to know what proportion of undergraduates on campus want to go to graduate school. A researcher could ask people in the introductory social psychology class if they are planning to go to graduate school, but these students might be a biased sample, not representative of the larger campus population. Even psychology students in general are not a good random sample of the campus population. If a researcher wanted a good, accurate description of the campus population, the person could ask literally everyone on campus, but random sampling provides a still better way. Researchers do not actually have to poll the entire student body to get an accurate estimate of the proportion of people planning to go to graduate school. They can use a sample more efficiently, but the sample does have to be random for the estimate to be accurate. In a good random sample, everyone on the campus is equally likely to be included.
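The difference between polling a psychology class and drawing a random sample can be sketched with a simulated campus. Every number below is invented for illustration: the campus size, the share of psychology students, the graduate-school rates, the sample sizes, and the random seed are all assumptions, not data from any study.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical campus of 10,000 students (all numbers invented for illustration).
# Each student is (major, plans_grad_school).
psych = [("psychology", i < 600) for i in range(1_000)]   # 60% plan grad school
others = [("other", i < 2_700) for i in range(9_000)]     # 30% plan grad school
campus = psych + others

def proportion_planning(students):
    """Share of the given students who plan to attend graduate school."""
    return sum(plans for _, plans in students) / len(students)

true_rate = proportion_planning(campus)                    # 0.33 by construction

# Convenience sample: 200 students from psychology classes only.
convenience = rng.sample(psych, 200)

# Simple random sample: every student on campus is equally likely to be drawn.
random_sample = rng.sample(campus, 500)

print(f"true campus rate:      {true_rate:.2f}")
print(f"psychology-class poll: {proportion_planning(convenience):.2f}")
print(f"random sample of 500:  {proportion_planning(random_sample):.2f}")
```

In this made-up campus, the psychology-class poll lands near the psychology students' own rate rather than the campus rate, whereas the random sample of only 500 lands close to the true figure.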
A true random sample requires that every member of the population of interest has an equal chance of being in the sample. A random sample provides a reliable, cost-effective estimate of the population as a whole. (That is why presidential election polls can sample as few as 1,000 respondents nationwide to estimate the whole adult population within a few percentage points.) In the national census, some politicians are suspicious of sampling and insist on a person-by-person count, even though (a) such supposedly exhaustive counts miss a lot of people (primarily poor and homeless people), (b) intensive sampling could save the country a great deal of money, and (c) modern techniques make sampling potentially even more accurate than attempting a person-by-person count. Again, in the year 2000 presidential election, the crucial Florida votes might have been better assessed by the exit poll samples than by the unreliable machine counts and the confusing paper ballots with their incomplete punch marks. Ironically, the television networks were criticized for relying on exit polls that may have been the more accurate measure of people's intended votes.
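Why 1,000 respondents can suffice is visible in the usual textbook approximation for the 95% margin of error of a proportion. The sketch below assumes a simple random sample, a large population, and the worst case of an even 50/50 split; real polls use weighting and more complex designs, so this is an approximation and not a description of any pollster's method.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: +/- {100 * margin_of_error(n):.1f} percentage points")
```

With 1,000 respondents the margin is about 3 percentage points. Because the margin shrinks only with the square root of the sample size, ten times as many respondents would narrow it to about 1 point.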
Consider another sampling problem. Suppose researchers want to study the proportion of people on campus who have been victims of violence. They could examine campus police records, but what kinds of violence might be omitted from those reports? In particular, fights, rape, and abuse are not always reported to the police. What if researchers advertised in the campus newspaper for people to volunteer to report instances of aggression anonymously? Volunteers who would come forward for a study would be an incomplete, biased sample as well. The best approach is to draw a truly random sample (picking people by using a random number table) and then work hard to get as many of the sample as possible to respond. Indeed, victimization reports to survey researchers may best estimate crime rates. Researchers often aim to get at least a 60–70% response rate from their intended sample, to minimize bias from nonresponders. This is not easy, but researchers have found that incentives, such as unconditional gifts, do help (Tourangeau, 2004).
Why can't researchers just use volunteers, instead of contacting unsuspecting random recruits and then hounding them to respond? Unfortunately, the kinds of people who spontaneously volunteer for studies differ systematically from the kinds of people who do not (Rosenthal & Rosnow, 1975). Volunteers are better educated, higher social status, more intelligent, more sociable, and higher in need for social approval. They may also be more arousal-seeking (if the study concerns stress, sensory isolation, or hypnosis), more unconventional (especially for a study of sex), more likely to be female (unless the study is physically or emotionally stressful), less authoritarian (i.e., more comfortable with loose hierarchies), and more likely to be Jewish or perhaps Protestant than Catholic. Moreover, paying people the modest amounts researchers can usually afford does not dramatically change the volunteer biases. Going back to the most recent example, do you think the typical volunteer may be more likely, less likely, or about average on exposure to violence on campus? For such reasons, then, a random sample would be better than relying on volunteers, even paid volunteers.
The time of the semester at which people volunteer also turns out to matter. Reliable personality differences correlate with the time of the semester at which people volunteer for a study. People who participate earlier are more organized, more likely to plan ahead, and more conforming, whereas people who participate at the end of the semester tend to be more hostile to authority (Neuberg & Newson, 1993). If researchers were trying to study people exposed to violence, even the time of the semester at which they solicited their volunteers might bias their sample.
If we consider moving outside a student population for a random sample, what about going to shopping malls? Consider some of the biases in who shows up at a mall. Would they be a good random sample for assessing the incidence of violence? Consider who shows up at different times of the day or week. Would Friday night shoppers be more or less likely than Monday morning shoppers to be victims of violence? A mall does not provide a truly random, representative sample of the country. Online surveys also do not randomly sample but still do better than college students.
Descriptive research asks what proportion of the population has experienced violence, believes in space aliens, gets divorced, or possesses a home computer. Or it may ask, What is the average level of trust in government? Time spent holiday shopping? To get accurate answers to these questions requires a truly random sample, in which every single member of the population has an equal chance of participating.
COMPARED WITH WHAT?
The utility of descriptive research comes in the comparison of the results (e.g., an average or percentage) with some baseline expectation. When pollsters say 40% of the U.S. population thinks our president is doing a good job, why is that a meaningful statistic? The statistic matters only in comparison with that president's own prior ratings, other presidents' ratings, or one's current impressions of the president's popularity. If people thought that everyone approved, then the 40% figure would surprise people as low. Or if people thought that nobody approved, then 40% is not so bad. Alternatively, perhaps 40% compares favorably with other presidents at the same point in their term. As another example, crime statistics interest people mainly when they have gone up or down (compared with the last report), that is, when they differ from people's expectations.
What always makes descriptive research interesting is the “compared-with-what” question, which implies an implicit or explicit baseline. A baseline sets a standard for comparison with new data. When somebody quotes a survey result, or when the newspaper reports one, think about precisely why the statistic matters. The implicit “compared-with-what” question always lurks behind the descriptive result: compared with what one personally might expect, what pollsters reported last time, or what people generally think? A 41% belief in ghosts is surprising because many of us would not expect the number to be so high; also, it is surprising compared with 30 years earlier, when only 11% reported believing in ghosts. If pollsters say only 25% of U.S. teens can identify Winston Churchill's country or the year of the United States' independence, why is that interesting? It would be interesting because the compared-with-what baseline is the standard that everyone ought to know history.
No single comparison is right or wrong: Different comparisons answer different questions. Knowing that, for black Americans, the last 50 years have seen steady improvements in infant survival, life expectancy, years of schooling, home ownership, and numbers of elected officials (Pettigrew, 1996; see Figure 2.2), what comparison does this make? It compares black Americans with themselves previously, and the results look encouraging. But when one compares black Americans with whites, on each of these measures, blacks are disadvantaged. The useful comparison depends on the question to be answered.
Figure 2.2 Blacks' Progress and Disparities Compared with Themselves and with Whites
Source: From Pettigrew, 1996. Copyright © Allyn & Bacon. U.S. Bureau of the Census, 1979, 1992a. Adapted with permission.
SUMMARY OF DESCRIPTIVE RESEARCH
Descriptive research reports the estimated amount or value of a particular variable in the population. For accuracy, the sample needs to be carefully selected at random. Behind every descriptive statistic is the baseline or compared-with-what question that answers whether and why the result matters. In that sense, then, the hypothesis in a descriptive study would indeed concern the relationship between two variables, the estimated amount of the variable in a population, compared with an implicit baseline.
Correlational Research
Correlational research exposes the implicit comparison lurking behind descriptive research; it specifies the variable being compared with another variable. Correlational research investigates whether changes in one variable are related to changes in another variable. Thus, it always considers at least two variables, whereas descriptive research considers only one variable at a time. If descriptive research addresses questions such as “What is the level of violence on campus?” correlational research addresses the relationship between being a perpetrator of violence and being (for example) an athlete.
HYPOTHESIS
Because a hypothesis in correlational research takes the form that changes in one variable are related to changes in another variable, correlational research always considers at least two variables, asking whether they reliably go together. At the conceptual level, a correlational hypothesis could propose, for example, that changes in stress relate to changes in aggression: As stress increases, aggression increases. Alternatively, to go back to an earlier hypothesis: Attractiveness increases with persuasiveness, so as somebody is more attractive, the person is also more persuasive.
Moving from the conceptual to the operational, researchers could examine for instance the stress-aggression correlation among college students, prison guards, or nursing home patients. Researchers could look at the attractiveness-persuasion hypothesis among politicians, rocket scientists, or salespeople. Focusing on the latter for a moment, one can ask, are the more attractive salespeople also the more persuasive ones? One operationalization suggested earlier was the number of smiles and the listeners' agreement to buy a product (for example, shampoo). The specific prediction in the study would be that number of salesperson smiles correlates with listeners' agreeing to buy the shampoo.
Let's go back to the stress-aggression hypothesis. What specific ways could researchers operationalize that hypothesis into observable variables for a correlational study? Researchers could measure, for instance, college students' reports of stress on a standardized scale of stressful life events (e.g., unusual pressure from academics, activities, family, relationships) and see whether the stress score correlates with self-reported or acquaintance-reported aggression. Indeed, meta-analysis suggests a clear relationship (Carlson & Miller, 1988). Whether considering the operational prediction or the conceptual hypothesis, correlational research always addresses the relationship between at least two variables.
TYPE OF RELATIONSHIP
As long as I teach social psychology, I will continue to borrow Robert Rosenthal's class demonstration of types of correlational relationships: One way to think about correlational studies is that each person in a sample has two variables (imagine the person in Figure 2.3 “holding” one in each hand). In the case of a positive correlation, the person is high on one variable and high on the other (both hands raised). This would include high attractiveness and high persuasiveness, or high stress and high aggression. A positive correlation also predicts (both hands lowered) low attractiveness and low persuasiveness, or low stress and low aggression. Medium levels of each variable go together as well (both hands waist-level). You can think of it as two variables moving together: high-high, low-low, or medium-medium (in class, I am now flapping my arms). Lots of variables have positive correlations, such as the amount of sleep that you get and how well you do on your exam the next day, the number of people you know in a group and how comfortable you feel, threat to your ethnic group and prejudice against outsiders.
Figure 2.3 Illustrating One Way to Think about Correlations
In contrast, negative correlations mean that participants are high on one variable and low on the other (one hand up, the other down, looking roughly like a semaphore). For example, people who are rude are probably not too persuasive. High on rudeness, low on persuasion. Low on rudeness, high on persuasion. Another negative correlation would be distance and attraction; people are often attracted to people who live nearby, not far away: low distance, high attraction. If the variables go opposite to each other, that is a negative correlation.
Finally, in zero correlations, one may know the person's value on one variable (x), but it does not predict anything about the value on the other variable (y). What is the correlation between the amount of blue you wear in your wardrobe and how much you exercise? Zero. What is the correlation between a professor's research productivity and teaching ratings? Close to zero. (On some dimensions, such as being up-to-date, organized, committed, and enthusiastic, the correlation is somewhat positive, whereas on others, such as facilitating interaction or managing the course, it is zero; Feldman, 1987; Hattie & Marsh, 1996).
Correlations range from −1.00 to +1.00. The ones in social psychology all tend to be much less than plus one and much greater than minus one. In social psychology, we never get perfect 1.00 correlations between anything; if someone does, then chances are that somebody made up the data. A strong correlation in social science is about .50 or above, a medium one is about .30, and a small one is about .10, according to psychological statisticians (Cohen, 1992). Similarly, −.50 would be a strong correlation, −.30 would be moderate, and −.10 would be weak. (Statistical significance, a separate matter, would depend on both the size of the correlation and the size of the sample, a topic beyond the scope of this text.) Correlations between social science variables rarely approach +1.00 or −1.00, in part because all our variables are measured imperfectly, so the data carry a lot of noise (random error), which attenuates the correlations. A 1.00 correlation is not plausible unless the researcher (mistakenly) measured the exact same thing twice, and even then some error would be expected.
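The attenuation just described is often summarized by Spearman's classic formula from test theory: the observed correlation is roughly the true correlation multiplied by the square root of the product of the two measures' reliabilities. The sketch below is an illustration only; the function name and the reliability values are assumptions, not anything taken from the studies discussed here.

```python
import math


def attenuated_r(true_r, reliability_x, reliability_y):
    """Expected observed correlation when both measures contain random error.

    Classical test theory (Spearman's attenuation formula):
        observed r = true r * sqrt(reliability_x * reliability_y)
    Reliabilities run from 0 (pure noise) to 1 (no measurement error).
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 <= rel <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical values: a true correlation of .50, measured with two scales
# whose reliabilities are both .80, would be observed at about .40.
print(round(attenuated_r(0.50, 0.80, 0.80), 2))
```

Under these assumed numbers, even two fairly reliable scales shave a fifth off the underlying relationship, which is one reason perfect correlations are implausible.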
More important, though, in social psychology, the most interesting results are the ones that are not obvious. If the correlation between two variables is perfect, they may be redundant (measuring the same thing), so it may not be interesting. The most interesting correlations show that two quite different variables are, in fact, related: for example, a correlation between warm weather and student protests.
Some people like to get a visual sense of how patterns of data generate correlations. Consider a study of stereotypes about various groups (Fiske, Cuddy, Glick, & Xu, 2002, Study 2). A group's perceived status could correlate with its perceived competence, in various ways. People could take (a) the just-world view that groups get what they deserve (a positive status-competence correlation) or (b) the sour-grapes view that high-status groups get arbitrary and unfair breaks, whereas struggling low-status groups are held back (a zero or even negative correlation). Figure 2.4 shows people's average views of various groups, where each dot is one group. Because the cloud of points runs from “southwest to northeast,” it shows that the perceived status-competence correlation is positive and high (.88), supporting the just-world view that as perceived status increases, perceived competence does too.
Figure 2.4 Positive and Negative Correlations in Out-Group Stereotypes
Source: Data from Fiske et al., 2002, Study 2, unpublished figure.
The study also examined the groups' perceived competitiveness and perceived warmth. If people think that groups who compete with the mainstream are not sincere, trustworthy, and kind, then the correlation between perceived competition and perceived warmth should be negative, which it is: −.31. The cloud of points would run from “northwest to southeast.” As perceived competition increases (special breaks, power struggles), then perceived warmth decreases.
Regardless of whether one examines the numbers or figures, the overall point is that correlations express a type of relationship that can be positive, negative, or zero, and stronger or weaker.
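For readers who like to see the arithmetic, here is a minimal sketch of how a correlation coefficient (Pearson's r) is computed and how the rough benchmarks cited from Cohen (1992) might be applied. The scores are invented for illustration, and treating "about .50," "about .30," and "about .10" as sharp cutoffs is a simplifying assumption of this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariation = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when a variable never varies")
    return covariation / spread


def describe(r):
    """Label direction and size, using Cohen's rough benchmarks as cutoffs."""
    size = abs(r)
    if size >= 0.50:
        strength = "strong"
    elif size >= 0.30:
        strength = "medium"
    elif size >= 0.10:
        strength = "small"
    else:
        return "near zero"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Hypothetical scores for six people (not data from any study).
stress = [1, 2, 3, 4, 5, 6]
aggression = [2, 1, 4, 3, 6, 5]      # tends to rise with stress
rudeness = [1, 2, 3, 4, 5, 6]
persuasiveness = [6, 5, 5, 3, 2, 1]  # tends to fall as rudeness rises

for label, xs, ys in [
    ("stress vs. aggression", stress, aggression),
    ("rudeness vs. persuasiveness", rudeness, persuasiveness),
]:
    r = pearson_r(xs, ys)
    print(f"{label}: r = {r:+.2f} ({describe(r)})")
```

Applied to the two correlations reported above, `describe(0.88)` gives "strong positive" and `describe(-0.31)` gives "medium negative."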
CAUSAL AGENDAS
The main issue for correlation: Researchers or reporters usually are hiding a causal agenda. Usually they are not really just saying A goes with B, but they have a hidden agenda claiming that A causes B. Let's take our example of attractiveness and persuasion as correlated. Consistent with Table 2.5, you probably thought that attractiveness causes people to be more persuasive. In a typical correlational test of that hypothesis, however, the data might show only that across a series of local elections, the attractiveness of the candidate is correlated with the proportion of the vote garnered; the more attractive candidate tends to win. Given these correlational data, however, we do not really know that the attractiveness was causing the persuasion (i.e., gaining the votes), but that was our implicit idea.
In correlational research, causality is usually an implicit hidden agenda. But, in fact, researchers do not have a right to say, in a correlational study, what causes what because they do not actually know. Take this obvious example: Attractiveness causes persuasion. What about the opposite? What if winning elections makes people more attractive? It could; success and self-confidence can make people smile, or at least get them to the tailor and the hairstylist, any of which could make them more attractive. So the causality could indeed go the other way.
A correlation potentially represents any of three possible relationships: A causes B, B causes A, or some third variable causes both of them (Table 2.7). As an example of a third variable, money could cause both winning elections and being attractive. The more money politicians have, the better their health, the better they eat, the better they dress, and the more handlers they can pay to tell them what to say, how to smile, and which awkward mannerisms to ditch. This way, a third variable, money, could cause both attractiveness and electoral success. Every time you hear about a correlation between two variables, think hard about which of these causal patterns could be operating, because the person reporting the correlation almost always has a hidden causal agenda in mind. But the opposite pattern could hold, or a third variable could cause both.
TABLE 2.7 Three Patterns of Causality for a Correlation
| Observed Positive Correlation | Attractiveness and Persuasion |
| --- | --- |
| Possible causality | Attractiveness → Persuasion |
| | Persuasion → Attractiveness |
| | Money → Attractiveness |
| | Money → Persuasion |
Take some other examples: The weekly consumption of ice cream and the crime rate go up and down together, so sugar causes crime, right? What is an obvious third variable? Heat. Another interesting one: storks and babies. There is a positive correlation between storks nesting and babies arriving. Do storks cause babies to arrive? No, there is a third variable: heat again, or rather the lack of it. Certain storks like to nest in warm chimneys, people stay inside more in cold weather, and babies result. Take measured IQ and school performance. Does IQ cause better school performance? Maybe. But doing well in school can also increase performance on an intelligence test. And third variables, such as social class and cultural experience, strongly affect both. IQ and school performance seem like obvious correlates, but people often overlook the third variables that strongly affect both of them. In short, the problem with inferring causality from correlations is that correlations indicate only association: All one knows is that two variables go together. Storks and babies, ice cream and crime, IQ and school performance. But the correlation does not tell why or how.
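The ice cream and crime example can be made concrete with a small simulation. In the sketch below, heat drives both variables and neither one affects the other, yet the two still correlate. The partial correlation used to "hold heat constant" is one standard statistical tool chosen here for illustration; it goes beyond what this text covers, and every number in the simulation is invented.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y after statistically removing z from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_weeks(n_weeks=5000, seed=1):
    """Simulated weeks in which heat drives both outcomes and neither drives the other."""
    rng = random.Random(seed)
    heat = [rng.gauss(0, 1) for _ in range(n_weeks)]
    ice_cream = [h + rng.gauss(0, 1) for h in heat]  # depends on heat only
    crime = [h + rng.gauss(0, 1) for h in heat]      # depends on heat only
    return heat, ice_cream, crime


heat, ice_cream, crime = simulate_weeks()
r_ic = pearson_r(ice_cream, crime)
r_after_heat = partial_r(r_ic, pearson_r(ice_cream, heat), pearson_r(crime, heat))
print(f"ice cream vs. crime: r = {r_ic:.2f}")
print(f"same, holding heat constant: r = {r_after_heat:.2f}")
```

By construction, the raw correlation should land in the neighborhood of .50 and the partial correlation near zero. Real data are never this tidy, because researchers can only control for the third variables they thought to measure.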
INFERRING CAUSALITY
Inferring causality requires three factors identified originally by philosopher John Stuart Mill, plus a fourth that will help us to think about some of social scientists' assumptions. The first necessary factor is association, which we have just been discussing as correlation. For one variable plausibly to cause another variable, they must be associated or correlated in some fashion, positive or negative. For example, if researchers hypothesize that violent television causes aggression, then they have to show that the two are associated; people who watch more televised violence also commit more violence. Association allows survey researchers to predict one variable from another. For example, an actuary or demographer can predict someone's mortality with a reasonable degree of probability, given the person's age, gender, income, and family history. But the prediction-by-association is mute about causality.
Inferring causality in addition requires temporal priority, meaning the alleged cause has to come before the alleged effect. This seems obvious; causality does not run backward in time. However, in specific cases, temporal priority may not be so easy to establish. For example, to make the argument that attractiveness causes electoral success is easy because one normally considers attractiveness to be a fairly stable aspect of a person that predates running for office. But researchers would have to be careful to measure attractiveness before the election season began, in case winning (or the anticipation of it) actually made people more attractive (they cheer up and look better). Similar temporal issues arise with the example of television violence. Does a habit of watching violent television develop before or after a person becomes aggressive? In each case, temporal priority is a critical ingredient to inferring causality.
The third condition for inferring causality is ruling out alternative explanations or establishing a nonspurious relationship. A spurious relationship just means that a third variable explains the relationship between the two variables of interest. If cold weather causes the storks to nest in warm chimneys and people to be inside making babies, then the relationship between storks and babies is spurious. If ice cream consumption and crime are both related to hot summers, then the direct relationship between the two is spurious. In contrast, if no plausible third variable causes the two variables that are correlated, then the relationship is not spurious and one is closer to inferring causality. Establishing temporal priority and nonspurious relationships (ruling out third variables) can be tough with correlational data. However, some statistical techniques (beyond the scope of this text) can help researchers examine potential causality. Even so, correlation does not establish causation.
The other factor some social scientists would require for inferring causality is a theoretical rationale or reasoning. A researcher cannot convincingly argue for causality without making a plausible case for psychological mechanisms, exactly why and how one variable affects the other. For example, most scientific psychologists do not accept ESP (extrasensory perception or mind-reading) as a reliable phenomenon, in part because most available studies either have failed to produce evidence or have been plagued with methodological flaws. That is, nonspurious association has not been established. But even if the studies did show convincing support, the other trouble with ESP is that no one has posited plausible mechanisms by which it could operate (not to mention reliable association, temporal priority, and nonspurious relations). Lacking a theoretical rationale, researchers are reluctant to pursue causality.
SUMMARY OF CORRELATIONAL RESEARCH
Correlational hypotheses hold that one variable relates to another variable; the relationship can be positive (the variables tend to go together) or negative (the variables tend to go in opposite directions). Reports of correlation often have a hidden causal agenda, but three patterns are possible: causation, reverse causation, or a third variable. Correlation does not imply causation, because at best correlation can show only association, with temporal priority and theoretical rationale potentially provided as well. But the difficulty comes in showing that the correlational relationship is not spurious, that no third variable accounts for it. As a partial solution, researchers can measure plausible third variables and make convincing arguments that the patterns of correlations are consistent with their hypothesized view of causality. But only one formal method demonstrates a nonspurious relationship and therefore establishes causality, namely, the experimental method.
Experimental Research
Let's begin the discussion of experiments by comparing them to the two previous kinds of research and then work through an example regarding media violence, using it to see how experiments allow causal inferences. With that background, we will formally define experiments and walk through various examples of experimental designs. All of this should make you a more informed consumer of experimental research presented in public forums, not to mention in this book.
ADVANTAGE OVER DESCRIPTIVE RESEARCH
Consider the following poll results: 61% of Americans report being bothered some or a lot by television violence, according to a 2005 Pew poll, and 83% believe that television violence causes violence in real life, according to a 2000 First Amendment Center poll. Most people apparently think that media violence numbs people, making them insensitive to violence (76%) and telling them that violence is fun and acceptable (71%) (“Are music and movies killing America's soul?” 1995).
Suppose you are advising parents and media moguls about television and movie programming. What do these poll results tell you? Many people say they are against media violence and think it has a bad influence. These are excellent and useful descriptive statistics. Do these results tell whether media violence actually does make people more likely to aggress against other people? No.
ADVANTAGE OVER CORRELATIONAL RESEARCH
Suppose researchers want to pursue the hypothesis that television violence really does increase aggression; the hypothesis is stated in terms of a positive association between the two conceptual variables (television violence and aggression), with an explicit causal agenda concerning which causes which. The operational version of these variables might be, for television violence, the reported number of violent television shows regularly watched, and for aggression in children, their aggressiveness as rated by the teacher. Suppose further that, as the aggression chapter describes, researchers discover a positive association. As a statement about children's natural television-viewing habits and spontaneous aggression, this is an informative result, but it does not provide the fullest support for the hypothesis.
As just noted, three possible patterns of causality could underlie that finding:
· Television violence may indeed cause aggressive behavior.
· Aggressive behavior might cause a preference for violent television shows.
· Poor social skills (or some other third variable) might cause both a preference for violent television and aggressive behavior toward peers.
Clearly, a theory could account for each of these patterns of causality (and some theories will, in the chapter on aggression), so plausibility of rationale does not help differentiate among these alternatives. One could also attempt to establish temporal priority, by measuring each variable at two points in time and seeing whether television at Time 1 predicts aggression at Time 2, or aggression at Time 1 predicts television at Time 2, or both. (This would support temporal priority.) Thus, researchers could work on association, rationale, and temporal priority, given these correlations.
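The two-wave logic just described can be sketched as a comparison of two "cross-lagged" correlations. The simulation below builds a hypothetical world in which Time 1 viewing feeds Time 2 aggression but not the reverse; all coefficients are assumptions, and Time 1 viewing and aggression are kept unrelated purely to keep the illustration simple.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_two_waves(n_children=5000, seed=2):
    """Hypothetical two-wave data: early viewing raises later aggression,
    but early aggression does not raise later viewing."""
    rng = random.Random(seed)
    tv1 = [rng.gauss(0, 1) for _ in range(n_children)]
    agg1 = [rng.gauss(0, 1) for _ in range(n_children)]
    # Viewing habits are fairly stable from Time 1 to Time 2.
    tv2 = [0.7 * t + rng.gauss(0, 1) for t in tv1]
    # Later aggression reflects earlier aggression plus earlier viewing.
    agg2 = [0.5 * a + 0.4 * t + rng.gauss(0, 1) for a, t in zip(agg1, tv1)]
    return tv1, agg1, tv2, agg2


tv1, agg1, tv2, agg2 = simulate_two_waves()
tv_then_aggression = pearson_r(tv1, agg2)
aggression_then_tv = pearson_r(agg1, tv2)
print(f"TV at Time 1 with aggression at Time 2: r = {tv_then_aggression:.2f}")
print(f"aggression at Time 1 with TV at Time 2: r = {aggression_then_tv:.2f}")
```

A pattern like this one, with the first correlation clearly positive and the second near zero, would be consistent with viewing coming first. It still says nothing about unmeasured third variables.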
But what can we do about those pesky third variables, such as social skills, rotten environment, or aggressive personality? Under some circumstances, researchers can make an excellent argument for causality in correlational data, and one of the major advantages of the correlational method is that one can measure important variables in the real world. Nevertheless, despite what researchers can do statistically to control for third variables (as stated earlier, a topic beyond this book), with correlational data, researchers can never completely rest easy that they have ruled out all the alternatives, that is, that the correlation is not spurious.
EXAMPLE AND DEFINITION OF AN EXPERIMENT
Now consider an experiment on the same hypothesis that media violence increases aggression (Bushman, 1995): 148 college students watch Karate Kid III, a movie rated 7.2 on an 11-point scale for violence, and another 148 watch Gorillas in the Mist, a movie rated 2.2 on the same scale for violence; both are rated equally exciting, action-packed, and entertaining. Immediately afterward, each participant competes with an ostensible opponent in a reaction time game, in which the person slower to respond receives an unpleasant blast of noise. Just before each trial, each player sets the intensity of the noise blast to be administered if the opponent loses. Then researchers measure whether the first group plays more aggressively than the second group, that is, sets higher levels of noise for the opponent. Indeed, the simple effect of the violent video is to make participants select noise at 4.6 on a scale of 10, whereas nonviolent-video watchers selected 3.9, so some initial evidence fits the hypothesis. (We discuss more results later.) How do this design and set of results improve on a correlational study?
First, let's get some terms straight. In experiments, the potentially causal variable is called the independent variable, and that causal variable can affect the dependent variable, the measure. The independent variable is manipulated by the experimenter, such that some participants receive one level and others receive a different level (for example, viewing the violent or nonviolent movie). The independent variable is independent of other features of the individual participant, because the experimenter controls and assigns its levels. The dependent variable is measured by the experimenter, such that participants provide responses that the experimenter records. The dependent variable depends on the independent variable. That is, the manipulated independent variable causes changes in the measured dependent variable. Learn this distinction. When you read about experiments in this book, or if you participate in an experiment, you should be able to identify the hypothesis and the independent and dependent variables. What's the manipulation? What's the measure? The experiment will be more understandable if you know the independent variable and the dependent variable, because then you know what the researcher was trying to do.
A true experiment contains three components: manipulation, randomization, and control. First, the independent variable must be a manipulation set by the researcher. The experimenter can change the levels of the independent variable for different people, by assigning them to different experimental conditions. That is, the experimenter must be able at will to put some people in the experimental condition and others in the control condition, for example, give some people the violent movie and others the nonviolent movie. It is easy to assign different people to watch different videos in the laboratory, but it would be difficult to assign them different patterns of television or movies over a longer period.
The necessity to be able to manipulate the independent variable in an experiment determines the kinds of variables that can be operationalized experimentally. For example, an experimenter could manipulate the apparent characteristics of a partner in a two-person task (e.g., attractiveness or membership in a stereotyped group), by providing a photograph of an expected interaction partner or by employing a confederate. Equally, an experimenter could manipulate characteristics of a persuasive communication (e.g., argument quality or personal relevance). But an experimenter could not manipulate a research participant's own long-term attractiveness or membership in a stereotyped group per se, nor could an experimenter manipulate all the persuasive communications in a participant's environment. Likewise, an experimenter cannot manipulate a participant's personality, gender, age, or other person variables (called demographic variables when related to population variables such as ethnicity and age; called individual difference variables when related to personality). The first feature of an experiment, the ability to manipulate the independent variable at will, determines the kinds of independent variables experimenters can investigate. Nevertheless, you may be entertained and impressed by the ingenuity of some of the experiments in social psychology.
The second feature of a true experiment is randomization, that is, random assignment of participants to the different levels of the independent variable. In randomization, every participant has an equal chance of receiving each level of the independent variable. Note the similarities and differences between random assignment and random sampling to participate in the study in the first place, discussed under correlational research. In random sampling, every member of the population of interest has an equal chance of participating in the study; in random assignment (randomization), every member of the sample has an equal chance of receiving each level of the independent variable. A study can have both random sampling and random assignment or either one alone, but true experiments require only random assignment to conditions.
What techniques constitute truly random assignment? Flipping a coin, if the independent variable has two levels, would randomly assign participants to conditions. A random number table could serve the same purpose. But alternating participants, one to each condition, in order of running the study, is not truly random. (What if people sign up with their heterosexual dating partners, and the women tend to sign up first? All the men could end up in one condition and all the women in the other.) What about running one condition on one day of the week and another condition on another day of the week? (Not good. Maybe some majors have labs or studios on Wednesdays and Fridays, and others on Tuesdays and Thursdays.) Morning and afternoon would not work either, because maybe morning people differ from night people.
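The contrast between alternating and truly random assignment can be sketched in a few lines. The sign-up sheet below is hypothetical (100 couples, with the woman in each couple signing first), and the function names are inventions for this illustration.

```python
import random


def alternate(participants, conditions):
    """Assign in sign-up order, cycling through conditions (NOT random)."""
    return {p: conditions[i % len(conditions)] for i, p in enumerate(participants)}


def coin_flip(participants, conditions, rng):
    """Give every participant an independent, equal chance of each condition."""
    return {p: rng.choice(conditions) for p in participants}


def share_of_women(assignment, condition):
    """Proportion of women among the participants placed in one condition."""
    group = [p for p, c in assignment.items() if c == condition]
    return sum(p.startswith("woman") for p in group) / len(group)


# Hypothetical sign-up sheet: 100 couples, the woman in each couple signing first.
signups = [f"{sex}_{i}" for i in range(100) for sex in ("woman", "man")]
conditions = ["violent", "nonviolent"]

alternating = alternate(signups, conditions)
randomized = coin_flip(signups, conditions, random.Random(3))

# Alternating puts every woman in the "violent" condition and every man in the other.
print(share_of_women(alternating, "violent"))
# Coin flips typically leave the "violent" group roughly half women.
print(share_of_women(randomized, "violent"))
```

With alternation, gender is perfectly confounded with condition. With coin flips, the two groups differ in gender only by chance, and the same holds for every other characteristic participants bring with them.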
One flaw of the nursing home study, making it less than a true experiment, is that the residents were assigned by floor, with one entire floor assigned to receive the experimental condition (control over their environment), and the other floor to serve as the baseline comparison group. Residents may not be assigned to a floor totally at random, and indeed, the experimental floor residents were healthier before the study began, which may partially or entirely account for the results. The researchers may not have been able, for practical reasons, to assign residents randomly to experimental and control groups within floors; real-world considerations often compromise the design of a true experiment.
What about self-selection into conditions? What would be the obvious problem with allowing participants to pick whether they watched the violent or nonviolent movie? Clearly, the more aggressive ones would pick the violent movie, so the experimenter would not know whether aggressive personality or the movie itself made them play more aggressively afterward. Indeed, in one study (Bushman, 1995, Study 1), participants who scored higher on the personality trait of aggression were more interested in violent videos than were participants low in trait aggressiveness (Figure 2.5). Because of biases inherent in self-selection, then, truly random assignment matters.
Figure 2.5 Trait Aggression's Influence on Choice of Violent Videos
Source: From Bushman, 1995. Copyright © 1995 by the American Psychological Association. Adapted with permission.
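The self-selection problem can also be simulated. In the sketch below, trait aggression raises noise-blast settings for everyone, and the violent movie adds a fixed amount on top. Every number is hypothetical (including the assumed "true" movie effect of 0.7 scale points); nothing here reproduces Bushman's data.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def run_study(self_selection, n=4000, true_effect=0.7, seed=4):
    """Simulate noise-blast settings; return the violent-minus-nonviolent mean difference.

    All numbers are hypothetical. Trait aggression raises settings in both
    conditions; the violent movie adds `true_effect` on top of that.
    """
    rng = random.Random(seed)
    violent, nonviolent = [], []
    for _ in range(n):
        trait = rng.gauss(0, 1)
        if self_selection:
            # More aggressive people tend to pick the violent movie.
            watches_violent = trait + rng.gauss(0, 1) > 0
        else:
            # Coin flip: condition is unrelated to personality.
            watches_violent = rng.random() < 0.5
        setting = 4.0 + trait + (true_effect if watches_violent else 0.0) + rng.gauss(0, 1)
        (violent if watches_violent else nonviolent).append(setting)
    return mean(violent) - mean(nonviolent)


print(f"self-selected difference: {run_study(self_selection=True):.2f}")
print(f"randomized difference:    {run_study(self_selection=False):.2f}")
```

In this simulated world, the self-selected comparison overstates the movie's effect because personality and movie choice travel together, whereas the randomized comparison should land close to the assumed 0.7.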
The third feature of a true experiment is experimental control over potential third variables to prevent confounding the independent variable with an extraneous variable. This is equivalent to compensating for potential contamination by third variables in correlational research, to prevent a spurious relationship. The unique feature of experiments is experimental control: Researchers can isolate the independent variable, making sure that nothing else correlates with it, or in other words, prevent confounding the manipulation with unwanted other factors. In confounding, an extraneous variable is manipulated simultaneously with the independent variable of interest. If participants watch a violent movie or a nonviolent movie, for example, it would never do to give the violent-movie watchers ice cream and the nonviolent-movie watchers popcorn. Snacks would be confounded with the independent variable.
One of the problems with the prison study is that it did not precisely isolate its independent variable, prison role (guard vs. prisoner). Although student volunteers were randomly assigned one role or the other, the role was confounded with a variety of theatrical effects (costumes, public “arrest”) that make it hard to isolate or precisely identify the manipulation. To be fair, the researchers' aim was to re-create a variety of features of the prisoner/guard role, but the effect of that choice was to create confounds that undermine its judged validity as a true experiment.
As another example, in the last chapter's organ-meats study, Lewin himself identified confounds in his experiment. Recall that some participants heard a lecture, and some made a group decision about the wisdom of consuming organ meats. The finding that the group decision was more persuasive could be caused also by (a) the differing personalities of the leaders of the two groups or (b) the expectation, created only for the group-decision participants, that they would be queried later about their introduction of the new food. Lewin (1952) addressed these confounds, as social psychologists have ever since, by replicating his results on another sample but without the confounds.
To summarize, then, the three crucial features of experiments are manipulation of the independent variable, random assignment to levels of the independent variable, and control over potentially contaminating third variables to prevent confounding.
MAIN PURPOSE OF EXPERIMENTS
A true experiment's crucial features create its unique power as a research method to infer causality. Experiments demonstrate causality by fulfilling the four requirements outlined earlier, establishing association, temporal priority, rationale, and nonspurious relationships (see Table 2.8).
TABLE 2.8 How Experiments Demonstrate Causality
| Factor Necessary to Causality | Feature of True Experiments |
| --- | --- |
| Association | Measure DV at different levels of IV |
| Temporal priority | Manipulate IV, then measure DV |
| | Random assignment to levels of IV prevents preexisting differences on DV |
| Rationale | Theory explains mechanisms |
| | Theory makes specific predictions |
| Nonspurious relationships | Isolate the IV from contaminating confounds |
IV = independent variable; DV = dependent variable.
Association is established by comparing results in different experimental conditions. For example, if the participants in the violent-video condition behave more aggressively than those in the nonviolent-video condition, then video violence is associated with violent behavior.
Temporal priority is established in two ways. First, because experimental participants are randomly assigned, they should show no preexisting differences on the dependent variable; for example, the groups should be equivalent, on average, in aggressive behavior before the experiment. Second, the independent variable is manipulated before the dependent variable is measured. Participants first watched the video and then played the game that assessed their levels of aggression.
Rationale is provided by specifying the theory that underlies the hypothesis and making specific predictions about the particular operationalizations. Often, the theory will suggest measurable mechanisms for the effect; that is, exactly how does television influence aggression? For example, if a theory proposed that television violence desensitizes people, an experimenter could measure desensitization, along with aggression. Or if a theory proposed that television violence excites people, one could measure arousal and so forth.
Finally, the strength of well-designed experiments lies in their ability to rule out alternative explanations, that is, spurious relationships. Isolating the independent variable, the manipulation, entails being certain that it is not confounded or contaminated with another variable. In our ongoing example, the people watching the two kinds of videos are randomly assigned, so that rules out personality differences, age, gender, ethnicity, and other person variables. Moreover, they all sit in the same chairs, are instructed by the same experimenter, and so on. And the videos themselves are equivalent on possibly confounding dimensions, such as interest and excitement, so that violence is the only measurable difference between them. If the only thing that differs between the experimental and control conditions in the experiment is the violence of the video, then it is the only possible cause of their subsequent differences in aggressive play.
PROS AND CONS OF EXPERIMENTS
As mentioned earlier, experiments limit the type, timing, and duration of variables they can manipulate and measure. Because of the types of operationalizations necessary in a laboratory setting, psychology experiments can be limited in how much they generalize beyond the laboratory. Of course, parallel limitations occur in descriptive opinion polls and in correlational studies in the field. For example, one cannot necessarily generalize from a survey response to behavior in the mall or the voting booth.
Another limitation always present in laboratory experiments (but not unique to them) is the unrepresentative sample of participants. Psychology student participants are not randomly sampled at all. So the question that any experiment raises is, how much does its sample represent the larger population? College sophomores, the habitual research sample in social psychology, do have less rigid attitudes, a less fixed sense of self, stronger cognitive skills, greater respect for authority, and more fluid relationships than older adults (Sears, 1986). These differences lead some to argue that results might well be biased by relying exclusively on college student participants (Henry, 2008; see commentaries, same issue).
Every type of research has trade-offs, strengths, and weaknesses. Researchers raise several points regarding the role of experiments in regard to the “real world.” First, laboratory results on important social issues routinely replicate in field studies with more representative samples, although without the precise controls and possibility of causal inferences. Sometimes the results are stronger in the field than in the laboratory. Together, laboratory experiments and field studies converge to make strong arguments, especially in research on aggression, helping, close relationships, and discrimination.
Second, experiments are often designed as demonstrations of plausible scenarios. The experiment shows that results consistent with the hypothesis can and do occur. Experiments demonstrate that an effect can occur this way (these are sufficient conditions for it to occur) at least among some populations. Thus, one way to think about experiments, and what they do best, is that they show how a process could happen. An experiment demonstrates the sufficient conditions for A to produce B. The scientific hypothesis being tested is then supported under at least some circumstances.
Third, experimenters distinguish among mundane, experimental, and psychological realism (Wilson, Aronson, & Carlsmith, 2010). Mundane realism describes how much the experimental setting resembles the comparable real-world setting. For example, people often aggress by physical assault, but for obvious reasons, this dependent measure is infeasible in the laboratory. If the experimenter allows the participants to deliver noise blasts as punishment to an insulting competitor, that would have low mundane realism. But if participants get caught up in the situation, as they do, then it has high experimental realism (involving, impactful, taken seriously). If it also re-creates in the lab the same psychological processes that occur in daily life, then it ranks high on psychological realism. Experimenters must create psychological realism (avoiding artificial processes), often desire experimental realism (involvement), and worry less about mundane realism. Novices often reverse these priorities, but thereby test theory less effectively.
Fourth, generalizability is an empirical question. If researchers can establish an effect in one population and one setting, the effect plausibly might work elsewhere too. Thus, generalizability from the laboratory to other populations and settings is a testable hypothesis: An effect found with college students may also hold for other people. One summer, Amherst, Massachusetts, hosted about 3,000 Airstream trailers and their occupants on a campus athletic field. One enterprising graduate student conducted research with those people, who differ a lot from students, just to get a novel sample. Suppose the same effect occurs for college students and for members of the Wally Byam Caravan Club International, who are mostly retired and have enough money to buy an Airstream caravan. Then this effect may well hold true for “people in general,” although it might still be limited, because people who go to college or own Airstream trailers are among the more privileged in society. So these questions of generalizability are always open, but it is an empirical question to see whether an effect generalizes. The question of external validity (or generalizability) asks whether researchers can generalize from their own sample to the population of interest. That is why social psychologists do research in the “real world” too. A random sample survey, an online sample, or a field experiment can help to generalize effects beyond the laboratory participants and setting.
Fifth, with regard to generalizing to contexts outside the laboratory, the main purpose of experiments is to infer causality, and experiments have unparalleled elegance for doing exactly that, as reviewed here and as shown throughout social psychology. Experiments are high on internal validity, which means that researchers can make a causal inference if the experiment meets the standards of a true experiment, outlined before. That is, the experiment, in and of itself, can be scientifically sound, internally valid.
EXPERIMENTAL DESIGNS AND RESULTS
Reading the rest of this book will be easier, given some basic acquaintance with types of experimental designs. The most basic type of experiment would include one independent variable with two groups, as in the violent-video example used so far: The independent variable is the violence of the video, and it has two levels, an experimental group (violent) and a control group (nonviolent). Similarly, the Stanford prison study's independent variable, social role, had two levels, to which participants were randomly assigned: prisoners and guards. And the nursing home study's independent variable had two levels, experimental (increased responsibility) and control (baseline). Our attraction-persuasion study had two levels (attractive, unattractive).
An independent variable also can have more than two levels, for example, if video violence included movies rated X, R, PG-13, and G for violence. Similarly, the independent variable attractiveness could have multiple levels; that is, the experimental conditions might include a highly attractive communicator, an average communicator, and a completely unattractive communicator. Researchers might operationalize this independent variable by selecting photographs at each level of attractiveness, as rated by independent judges. The specific prediction might be that persuasion would be highest for the attractive communicator, moderate for the average one, and lowest for the unattractive one. On the other hand, perhaps people would be most persuaded by the average communicator, who most resembles their own level of attractiveness, but they might distrust the attractive communicator and reject the unattractive one. Without three levels of the independent variable, one could not detect this inverted U–shaped effect. Comparing only attractive and unattractive communicators would show no apparent effect, because persuasion would be similarly low at both extremes. Sometimes, then, a hypothesis will specify the levels of the independent variable.
Moving up a level of complexity, let's return to the violent-video example. Suppose researchers hypothesize that watching violent videos, and having peers who approve, jointly increase aggression, but not otherwise. This hypothesis involves two independent variables (degree of video violence and peer approval). One study (Leyens, Herman, & Dunand, 1982) paired ordinarily submissive children with peers in an independent variable with three levels: Some children watched the video alone, some with a dominant peer, and some with a submissive peer. Presumably a dominant peer represented aggressive values and encouraged subsequent aggression. The second independent variable, violence of the video they watched, had two levels, neutral and violent. Children were randomly assigned to one of six conditions formed by this 2 × 3 combination.
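As a rough illustration of how random assignment to the six cells of a 2 × 3 design could be carried out, consider the short Python sketch below. It is not the procedure reported by Leyens et al. (1982); the participant numbers, the seed, and the function name `assign_conditions` are hypothetical, chosen only to show one common approach: shuffle the participants, then deal them out across the six conditions so that the groups end up the same size.

```python
import itertools
import random

# Levels of the two independent variables described in the text.
VIDEO_LEVELS = ("neutral", "violent")
PEER_LEVELS = ("alone", "dominant peer", "submissive peer")


def assign_conditions(participant_ids, seed=None):
    """Randomly assign each participant to one of the 2 x 3 = 6 conditions.

    Hypothetical helper: shuffles the participants, then cycles through
    the six cells so that group sizes differ by at most one.
    """
    rng = random.Random(seed)
    conditions = list(itertools.product(VIDEO_LEVELS, PEER_LEVELS))
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: conditions[i % len(conditions)]
        for i, pid in enumerate(shuffled)
    }


# Hypothetical example: 24 participants numbered 1 to 24.
assignment = assign_conditions(range(1, 25), seed=42)
for pid in sorted(assignment):
    video, peer = assignment[pid]
    print(f"Participant {pid:2d}: {video} video, {peer}")
```

Because chance alone decides who lands in which cell, preexisting differences among participants should, on average, be spread evenly across the six conditions.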
This type of design, with two independent variables, allows researchers to examine the main effect of each independent variable in isolation from the other, that is, the effect of each variable alone, averaging over levels of the other. So the researchers could examine either the effects of the violent video, averaged over the three levels (types) of peers, or the effect of peers, averaged over levels of video violence (Table 2.9).
TABLE 2.9 Peers and Video Violence Influence Aggression
Hypothesis: IV1: Watching violent video with IV2: Peers who approve → DV: Aggressive behavior

| IV2: Type of Peer | IV1: Video, Neutral | IV1: Video, Violent | Peer Main Effect (average over videos) |
|---|---|---|---|
| None (watch alone) | 1.86 | 3.67 | 2.76 |
| Dominant peer (approving violence) | 3.30 | 3.93 | 3.62 |
| Submissive peer (not approving violence) | 3.33 | 2.25 | 2.79 |
| Video main effect (average over peers) | 2.83 | 3.28 | |

Main effects: Effect of video violence (2.83 vs. 3.28) (not statistically different); effect of peer (2.76 vs. 3.62 vs. 2.79) (statistically significant difference)
Interaction: Unique effect of combination (e.g., 1.86 vs. 3.93) (statistically significant)
IV = independent variable; DV = dependent variable.
Source: Data from Leyens et al., 1982.
In addition, this design allows one to examine the interaction or interplay of the two independent variables. Recall that the hypothesis stated that video violence had no effect, unless peers approved. The effects of one variable (video violence) depend on the level of the other variable (peer approval). In that case, only some of the six possible combinations will increase aggression (Table 2.9). The three effects (two main effects and an interaction) can occur separately or together, as the table indicates.
The result of the video violence by peer approval study shows a statistically significant main effect for type of peer approval: Regardless of what they watch, peer attitudes matter. A dominant, violence-approving peer causes even submissive children to behave more aggressively (3.62) than they would alone (2.76) or with another submissive child (2.79). (Aggression scores ranged from 1 to 5, resulting from the degree to which the child later took an opportunity to interfere with the video quality of another child's cartoon show.) The other independent variable, video violence, does not show a statistically significant main effect by itself, although the averages indicate a slight difference between violent and nonviolent videos.
Further, the interaction of the two independent variables indicates that children are especially aggressive when a video is violent and their peers approve. The combination of a violent video and a dominant peer creates a peak of aggression (3.93), whereas that of a neutral video and no peer creates the least aggression (1.86). The other four combinations fall in between these two extremes.
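To make the arithmetic behind Table 2.9 concrete, the following minimal Python sketch recomputes the main-effect averages, and the effect of the video within each peer condition, from the six cell means. It assumes equal numbers of children in each cell (so that simple unweighted averages apply), which the table does not state; the variable names are illustrative only. The sketch describes patterns in the averages; judging statistical significance would require the raw scores, which are not given here.

```python
# Cell means copied from Table 2.9 (aggression scores on a 1-to-5 scale).
cell_means = {
    "alone":      {"neutral": 1.86, "violent": 3.67},
    "dominant":   {"neutral": 3.30, "violent": 3.93},
    "submissive": {"neutral": 3.33, "violent": 2.25},
}


def mean(values):
    """Unweighted average (assumes equal cell sizes)."""
    values = list(values)
    return sum(values) / len(values)


# Main effect of peer: average each row over the two videos.
peer_main = {peer: mean(row.values()) for peer, row in cell_means.items()}

# Main effect of video: average each column over the three peer types.
video_main = {
    video: mean(row[video] for row in cell_means.values())
    for video in ("neutral", "violent")
}

# Effect of the video within each peer condition (violent minus neutral).
# If these differences are unequal, the averages show an interaction pattern.
video_effect_by_peer = {
    peer: row["violent"] - row["neutral"] for peer, row in cell_means.items()
}

print("Peer main effect: ", {k: round(v, 2) for k, v in peer_main.items()})
print("Video main effect:", {k: round(v, 2) for k, v in video_main.items()})
print("Video effect by peer:",
      {k: round(v, 2) for k, v in video_effect_by_peer.items()})
```

A main effect compares the averages in the margins of the table, whereas an interaction shows up as a video effect that changes from one peer condition to another.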
A similar interaction occurred in the video-violence study conducted with college students. As described earlier, they watched the Karate Kid III or Gorillas in the Mist and played a reaction-time game in which the loser receives noise blasts previously set by the opponent. In addition to the results reported earlier, the violent video (independent variable) interacted with the personality trait of aggression, to produce uniquely high levels of aggression. Aggressive participants who had watched the violent video administered far higher noise than any of the remaining three combinations (Figure 2.6; Bushman, 1995).
Figure 2.6 Interaction of Trait Aggression and Video Violence on Aggression
Source: From Bushman, 1995. Copyright © 1995 by the American Psychological Association. Adapted with permission.
The concept of an interaction effect is complex and not intuitive, so let's examine another example. Going back to the attractiveness-persuasion experiment, suppose researchers hypothesize that people are persuaded by people who match their level of attractiveness. In other words, highly attractive people like highly attractive communicators and more average people like more average communicators. Researchers could measure people's attractiveness, which would be a person variable, so it would not be an experimental manipulation. Or, alternatively, researchers could try to manipulate people's temporary feelings of attractiveness by showing them flattering or unflattering pictures of themselves. In any event, the other independent variable, the communicator's attractiveness, could be manipulated simply by presenting a photograph of a highly attractive or more average communicator and assessing persuasion. People who felt attractive might prefer communicators who were attractive, and people who felt more average might prefer communicators who were more average. Notice that the hypothesis predicts an interaction only, with no main effects (participants' own attractiveness does not make them more easily persuaded, nor does the other person's attractiveness make the person inherently more or less persuasive; it is the combination that matters). Like the previous example of video violence and peer approval, the effect of each independent variable depends on the level of the other independent variable.
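A small numerical sketch may help show what "an interaction only, with no main effects" looks like. The persuasion scores below are entirely hypothetical (invented for illustration, not taken from any study), as are the variable and function names. In this made-up pattern, matching pairs are persuaded more than mismatched pairs, so every marginal average is identical while the communicator effect reverses direction across participant groups.

```python
# HYPOTHETICAL persuasion scores (0-to-10 scale), invented for illustration.
# Keys: (participant's felt attractiveness, communicator's attractiveness).
persuasion = {
    ("attractive", "attractive"): 7.0,
    ("attractive", "average"): 3.0,
    ("average", "attractive"): 3.0,
    ("average", "average"): 7.0,
}
LEVELS = ("attractive", "average")


def marginal_mean(factor_index, level):
    """Average the cells that share one level of one factor."""
    cells = [score for key, score in persuasion.items()
             if key[factor_index] == level]
    return sum(cells) / len(cells)


# Main effects: compare the marginal averages for each factor.
participant_main = {lvl: marginal_mean(0, lvl) for lvl in LEVELS}
communicator_main = {lvl: marginal_mean(1, lvl) for lvl in LEVELS}

# Interaction contrast: the communicator effect among attractive participants
# minus the communicator effect among average participants.
effect_for_attractive = (persuasion[("attractive", "attractive")]
                         - persuasion[("attractive", "average")])
effect_for_average = (persuasion[("average", "attractive")]
                      - persuasion[("average", "average")])
interaction_contrast = effect_for_attractive - effect_for_average

print("Participant main effect: ", participant_main)
print("Communicator main effect:", communicator_main)
print("Interaction contrast:    ", interaction_contrast)
```

In these invented data the marginal averages are equal for both factors (no main effects), yet the interaction contrast is far from zero, which is the crossover pattern the matching hypothesis predicts.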
SUMMARY OF EXPERIMENTAL RESEARCH
Experimental research has advantages over descriptive and correlational research, as well as its own pros and cons. Experiments have independent and dependent variables, along with three crucial components: manipulation of the independent variable, random assignment to experimental condition, and control over potential third variables to prevent confounding. Experiments are designed to infer causality by establishing association between independent and dependent variables, temporal priority, theoretical rationale, and nonspurious relationships. Generalizability of experiments is an empirical question, but they must create realistic psychological processes. Experiments allow researchers to assess the main effects of each independent variable separately, as well as their interaction.
METHODOLOGICAL CHALLENGES IN SOCIAL SETTINGS
Whatever the hypotheses and whatever the research strategy (descriptive, correlational, or experimental), social psychologists face unique challenges in research methods. As in other sciences, the act of observing and measuring affects the phenomenon. The distinctive feature of most psychology is that the observer and the observed are the same species, plus the act of observation is a social phenomenon in itself. The contact between the researcher and the participant constitutes a two-way communication. For example, people adjust themselves when they know other people are watching, and observers compensate for people's efforts to manage the impressions they create.
As a social interaction between two humans, the research enterprise invokes the core social motives introduced in Chapter 1: belonging, understanding, controlling, self-enhancing, and trusting. Each motive introduces possible confounds and biases that threaten the validity of research. Social psychologists have learned to guard against these experimental artifacts, social threats to scientific accuracy.
Expectancy Effects and Motives to Belong
Research participants generally want to get along with researchers. Even if the social contact is brief, participants, especially volunteers, are motivated to be part of the temporary team formed by researcher(s) and participant(s). Wanting to belong elicits compliance, so participants are typically agreeable. Wanting to belong also invites mimicry. (Recall people imitating the nonverbal behavior and facial expressions of conversation partners.) However, too much agreeableness and too much mimicry create problems for researchers.
Participants may agreeably mimic the researcher's attitude, which can be conveyed unintentionally and nonverbally. This is a problem if the researcher's attitude reveals the hypothesis. Think back to the video-violence studies. If the researchers think watching violence makes people aggressive, they themselves may become subtly aggressive (abrupt behavior, irritated expression), causing the participants to imitate and reciprocate. Think back to the attractiveness and persuasion studies. If researchers believe their hypothesis, they may smile slightly as they observe the attractive communicators.
Expectancy effects in research consist of the researcher bringing about the expected results by inadvertently influencing participants to behave in the predicted direction. In the original instance of this phenomenon, teachers were told that some of their pupils would experience an intellectual growth spurt that year, and they did in fact observe those students performing better by year's end. The problem is that the intellectual bloomers had been randomly selected, and the teachers' expectancies had created the effect. The interpersonal expectancy effect (Rosenthal, 1994) has been demonstrated in a variety of domains (listed from smallest to largest effects): laboratory interviews, reaction time, learning and ability, person perception, inkblot tests, everyday situations, psychophysical judgments, and especially animal learning. The mechanisms, at least for teachers advantaging certain students (and maybe even laboratory rats), are a warm attitude (climate), communicated nonverbally, and greater effort to interact with the favored few.
The discovery of expectancy effects caused a revolution in research. To this day, researchers avoid expectancy effects by remaining blind to the experimental condition of the participant until afterward (so they cannot unconsciously affect the person's behavior). Also, they verbally communicate neither the hypotheses nor the comparison conditions to the participants ahead of time, so the participants also cannot consciously or unconsciously behave as “good subjects” and produce the expected results. In short, expectancy effects can be prevented, but they are a particular concern in social-psychological research, because the participants typically want to get along socially.
Participant Construal and Motives to Understand
Another core motive, understanding, also enters the social psychology laboratory. As soon as participants arrive at this admittedly novel context, they want to make sense of it. If each participant makes sense of the situation in a unique way, then the study data will contain a lot of random error (noise) that will obscure the effects of interest (signal). For this reason, researchers provide all participants with a single understandable framework for the study.
But not just any plausible framework will do. Recall that people respond to the social situation; this is the basis of situationism, introduced in the previous chapter. Recall also that they respond to the situation as they understand it. Hence, researchers must be sensitive to the participants' perspective on the research setting. It does not matter what is objectively true, or true from the experimenter's perspective; what matters is the participants' perspective. Thus, for example, two facts might be inconsistent from the experimenter's perspective, but if they are psychologically consistent from the participants' perspective, they are consistent for the purposes of the experiment.
Understanding the participant's psychological construal of the research situation is important so that the researcher and participant have the same shared meaning. In experiments, for example, this allows experimenters to be sure that the manipulated social influences come across as intended. Getting inside the participant's head to create the intended understanding is one of the great challenges of social-psychological methods.
Demand Characteristics and Motives to Control
Participants do not wish only to understand the novel social context presented by a research setting; they also wish to have some sense of efficacy and control. From a motivational perspective, they must feel they have a choice about the role they play in research. They may follow instructions or not, figure out the hypothesis or not, confirm the hypothesis or not. Factors in the research setting (other than the independent variable) may affect their behavior, as demand characteristics (Orne, 1962): prior experience in psychological research, rumors about that study, researchers' behavior, instructions, equipment, setting, and response alternatives. Participants may respond to perceived attempts to control their behavior by cooperating or resisting, so it is important that they do not react to independent variables as the researcher's attempts to influence them. The research must be set up to minimize overt cues to expected behavior, a requirement that also meets participant motives for control. Participants must feel able to respond freely to the research stimuli.
Social Desirability and Motives to Self-Enhance
Participants enter the research setting with motives not only to belong, understand, and control. They also have fundamental motives to protect the self and the images of the self that they present. In the social setting of the research context, participants want to come across well. They worry about social desirability, that is, complying with the norms for responses that reflect positively on self.
Consequently, participants resist responding in ways that make them vulnerable to looking incompetent, unkind, dishonest, unfair, biased, and so on. Unfortunately, some of the most interesting human behavior is incompetent, unkind, dishonest, unfair, and biased. Much of what social psychologists study consists of flaws in human nature, with an eye to diminishing their effects, so sometimes we need to see participants at less than their best. How else can we study topics such as aggression and racism?
Researchers guard against social desirability biases in several ways. Most frequently, researchers honestly assure participants of the anonymity of their responses and the confidentiality of their data. In addition, researchers may bury sensitive questions in a series of less provocative questions, to minimize the impact of self-presentational concerns. Researchers may frame questions as neutrally as possible. They can also separately measure people's tendency to respond in a socially desirable way and statistically control for it in analyzing their data. Finally, researchers may measure subtle responses that escape the participants' notice or control, such as speed of response or distance of seating.
Positivity Biases and Motives to Trust
Finally, according to the positivity bias, people are predisposed to think well of other people, at least in-group others, all else being equal. This means that they will tend to rate others positively and resist negative responses—inconvenient if one develops a 3-point scale consisting of dislike, neutral, like, because virtually every participant will respond like. Even a 5-point scale will elicit a consensus around like moderately. We return to this point in the chapter on interpersonal attraction, but here the point is that researchers need to compensate for participants' tendencies to like and trust others.
Then there's liking and trusting the researcher. Participants expect the researcher to be competent, kind, honest, fair, and unbiased, and they will give the researcher the benefit of the doubt. Researchers need to justify that positivity bias in their demeanor toward participants. Part of that responsibility involves ethical behavior, covered next.
Summary of Methodological Challenges in Social Settings
Researchers beware: Participants, motivated by core social motives, may create artifacts in social-psychological research: (a) motivated by the belonging motive, they may conform to experimenter expectancy; (b) motivated to understand, they may construe the experiment in unintended ways; (c) motivated to control, they may react to perceived demands to respond as expected; (d) motivated to self-enhance, they may respond in socially desirable ways; and (e) motivated to trust, they may show strong positivity biases.
ETHICS IN RESEARCH
Social psychologists rarely deal with death as a variable, but recall the nursing home study (Rodin & Langer, 1977), in which elderly patients may have lived longer when given trivial levels of control over their lives. When the researchers first hypothesized this, should they have tried to persuade the nursing home to give houseplants and movie choices to all the patients at the outset? Should they have halted their study when the data first started coming in? At what point should they have applied their findings?
In the nursing home study, researchers could not know with any certainty that the personal control would prolong people's lives, unless they completed the study as designed. And even then, replications and improvements are needed. This points out the importance of causal inferences in making ethical decisions about people. For example, if we know that merely perceiving control over one's environment can cause better health and well-being, as in the nursing home experiment, then we understand much more about perceived control and the causes of health than if we merely know that people differ in perceived control and in health, even if we know they are correlated. If researchers terminate the experiment before it is completed, then they may not be certain that the initial results were not just a statistical fluke. This is a major issue in medical research, where desperate patients may pressure scientists to give them unproven experimental treatments, rather than allowing themselves to be randomly assigned, perhaps to the control group, which may receive the currently standard treatment.
Similar issues arise in particularly impactful social psychology studies, such as the prison study, that dramatically affect participants' well-being. Should the Stanford prison researchers instead have studied real prisoners and guards? Or should they have tried to observe or interview people in parallel situations, such as Jennifer in her prison-like National Labor Federation, the Brooklyn cult described in the opening chapter? What are some advantages and disadvantages of each of these methods? Given their extreme results, should they have stopped the study when they did? Stopping the study deprived the researchers of complete measures and full knowledge, but how does that weigh against the miserable experience of the participants? Should they never have begun the experiment in the first place?
Even in more ordinary social psychology experiments, participants may experience temporary discomfort, and researchers must obtain their consent ahead of time and allow them to withdraw from the experiment if necessary. In every case, participants incur costs, ranging from inconvenience, to boredom, to embarrassment, to temporary anxiety. These costs must be weighed against the benefits of the research to the participants (learning about themselves), to the researcher (advancing science), to the field (greater theoretical understanding), and to society (solutions to social problems). Ethical decisions about research depend on weighing the potential costs and benefits perceived by researchers and independent review boards (present in every institution that conducts research with human participants).
Ethical Dilemmas
Important variables (pain, failure, stress, fear, aggression) are often unpleasant. Most people are motivated to diminish human suffering and want research to inform those efforts, but researchers must confront these experiences directly, in order to study them. Researchers try to minimize the extent to which they study these experiences by inflicting them on people, but as chapters on aggression and discrimination will show, for instance, the choice of research method is not always simple.
What's more, in social psychology, unambiguous inferences may require that participants are temporarily unaware or deceived about the intent of the study. If participants know that the research is studying selfishness, prejudice, or conformity, they might shape their behavior to fit the socially desirable response, instead of acting spontaneously. Thus, researchers sometimes withhold information about the experiment until the end. Sometimes, they even have to mislead participants, to prevent their guessing the true nature of the study. Deception research was particularly popular during the 1950s to 1960s heyday of high-impact experimental social psychology, when some of the classic studies were conducted, but it decreased in the next two decades (Nicks, Korn, & Mainieri, 1997). Whenever deception appears as an option, researchers must consider more straightforward methods, before resorting to omitting or distorting information, and all researchers must inform participants afterwards about the true nature of the study and obtain permission to use their data before they leave.
Another ethical issue that arises in social psychology is the fear that the data may disadvantage the participants. Suppose a research project shows that one demographic or personality group on average scores lower than another on some socially valued dimension, whether competence, warmth, coordination, aggression, or spatial skills. The disadvantaged group might not have wanted to contribute to these research findings, if they are then used to justify discrimination. On the other hand, no one can censor scientific findings. And the interpretation of those findings is always open to debate. For instance, maybe group averages differ, but only a little, and the variation is so great within the groups that their distributions overlap almost completely. And average differences between groups (for example, between the genders on aggression) do not in themselves explain why the differences occur, which may reflect interactions of biology and society in producing those outcomes.
Finally, social psychologists who work with special populations—including students in psychology classes—have to consider carefully their recruitment of participants. Unintentional coercion always lurks, in both obvious and subtle forms. Recruiting students to work on the professor's own research, for extra credit, may or may not be appropriate, depending on the circumstances. Such ethical dilemmas apply even more clearly to recruiting people who are institutionalized, very young or very old, encountered in a psychological services context, or from a different culture.
Ethical Decisions
Ethical treatment of human participants in research requires that they be respected, not be harmed, and consent to participate. As noted, ethical choices depend on weighing costs and benefits. Resolving ethical conflict includes acknowledging that no absolute answers exist; reasonable people disagree. Researchers are obligated to (a) do the best research possible and (b) protect human participants. When these goals conflict, they must consult their consciences, but they can also be biased by their investment in their own projects. To counteract their bias and ensure fair consideration of the advantages and disadvantages of their research, investigators must undergo collegial review.
Participants always need to know that they have a choice about participating in the experiment, they can stop at any time, they will receive any promised compensation, their responses will be available only to appropriate audiences, and they can get any appropriate additional information. In short, they need to be treated with respect, as humans with control over how they participate in the research experience. Informed consent means that the participant's agreement to engage in the research process is contingent on having a reasonable amount of information about what that will entail.
Summary of Ethics in Research
Social-psychological research raises ethical dilemmas because it involves human participants and important social problems. Sometimes it also involves deception, uncomfortable experiences, participant disadvantages, or special populations. Researchers, with the advice of colleagues and review panels, must weigh the costs and benefits of their research methods, as part of the scientific enterprise.
CHAPTER SUMMARY
Conceptualization, the first step in research, entails forming scientific hypotheses, which come from both application and theory. Theories, broad systems of logical principles that explain or account for observed natural phenomena, adhere to certain principles: positing causal relationships and being coherent, parsimonious, and falsifiable. Derived from theory, hypotheses state relationships expected between two or more variables. Hypotheses specify variables in a conceptual version, and researchers then operationalize the hypotheses, providing working definitions of the variables in a particular research context. In operational form, studies aim to meet scientific standards of operational precision, public reproducibility, and observational accuracy (lack of bias).
Besides making tactical operational decisions about variables, social psychologists must choose an overall research strategy, which also guides how they operationalize their variables. The overall strategies involve using descriptive, correlational, or experimental research methods. Descriptive research, characteristic of polls and surveys, raises issues of random sampling for accuracy and comparison to baseline. Correlational research specifies relationships among variables, often with an implicit causal agenda, but rarely with the ability to make causal inferences. Experiments are designed precisely to infer causality, although their generalizability often remains to be tested, depending on the sample. Artifacts can affect any social research.
With this knowledge, scientists have to make ethical decisions about whether and how to run their research enterprise. Experiments, with their unique power to infer causality, can improve people's lives. Correlational data, without randomized trials, can never truly establish the certainty that a particular intervention improves a particular outcome. Public policy is best based on large-scale randomized experiments, to the extent possible, so that people may get the best we know how to provide. Donald Campbell, many years ago (1969), argued for the experimenting society that would create only those social programs that had been well supported as accomplishing what they were designed to accomplish. Even on a more individual scale, before social psychologists generally assert that something in the social situation (an attractive person, a shampoo ad, a television show, a relationship, a prison) influences people in a particular way, an experiment generally provides the best evidence.
The chapter included accounts of research on test performance, prison roles, health, drinking habits, attractiveness, racial disparities, stereotyping, and aggression, all illustrating social influences on people as individuals and as group members, social processes that the rest of the book will now take up in earnest.
SUGGESTIONS FOR FURTHER READING
1. Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ: Erlbaum.
2. Dunn, D. S. (2013). The practical researcher: A student guide to conducting psychological research (3rd ed.). New York: Wiley.
3. Pelham, B. W., & Blanton, H. (2012). Conducting research in psychology: Measuring the weight of smoke (4th ed.). Belmont, CA: Wadsworth.
4. Pettigrew, T. F. (1996). How to think like a social scientist. New York: Harper Collins.
5. Reis, H. T., & Gosling, S. D. (2010). Social psychological methods outside the laboratory. In S. T. Fiske, D. T. Gilbert, & G. Lindzey (Eds.), The handbook of social psychology (5th ed.). New York: Wiley.
6. Rosenthal, R. (1995). Methodology. In A. Tesser (Ed.), Advanced social psychology (pp. 17–49). New York: McGraw-Hill.
7. Sansone, C., Morf, C. C., & Panter, A. T. (2003). Sage handbook of methods in social psychology. Thousand Oaks, CA: Sage.
8. Shaughnessy, J. J., Zechmeister, E. B., & Zechmeister, J. S. (2012). Research methods in psychology (9th ed.). New York: McGraw-Hill.
9. Stangor, C. (2010). Research methods for the behavioral sciences (4th ed.). New York: Houghton Mifflin.
10. Wilson, T. D., Aronson, E., & Carlsmith, K. (2010). The art of laboratory experimentation. In S. T. Fiske, D. T. Gilbert, & G. Lindzey (Eds.), The handbook of social psychology (5th ed.). New York: Wiley.