CHAPTER 5

MEASUREMENT

In Chapter 3, we discussed conceptualization—how authors define key terms in their research. This chapter focuses on operationalization—how authors measure things. Researchers develop specific procedures for detecting social phenomena: Do students have low or high self-esteem? How popular is a politician with the public? Are spouses dividing up household chores fairly? To address these sorts of questions, researchers do not simply need definitions of concepts like self-esteem or popularity or marital inequality; they need a system for investigating how much esteem or popularity or inequality actually exists.

In keeping with the theme of this book, let’s briefly compare the “measurements” people take in everyday life to those undertaken by social scientists. Then we’ll focus on the imperfections that pervade social research.

OPERATIONALIZATION IN EVERYDAY LIFE

In everyday life, people tend to measure things in casual and offhand ways. For example, my students sometimes say, “It’s freezing in this classroom!” In this case, they may have “measured” the temperature by simply sensing its effect on their skin rather than looking at the more precise reading that a thermostat might display. Or, consider how a person may eat a jalapeño and proclaim, “That’s a very spicy pepper,” based on the burning sensation on his or her tongue; in contrast, a more scientific approach would be to measure the amount of capsaicin present and then rate the pepper’s spiciness via the Scoville scale.1

We also use relatively informal procedures to assess social phenomena, such as the personalities of our friends, relatives, coworkers, and acquaintances. By casually observing our companions, we try to measure the degree to which they are compassionate, lazy, hardworking, shy, gluttonous, hard-core partiers, and so on. We look for examples and treat them as indicators—as signs that a particular quality is present. Did Mary take time to comfort a friend? Did she get “totally drunk” recently? We look for evidence and classify people according to the impressions they make on us.

As casual observers, we do not measure things very systematically or impartially. We tend to create a mental image—a prejudice of sorts—and then look for evidence that confirms our expectations. For example, many Americans (including me) watched President George W. Bush for any statement or action that indicated a low level of intelligence. Did he use improper grammar? Did he try to exit a news conference via a locked door? More recently, many Americans have scrutinized Vice President Joseph Biden’s behavior in order to discover indicators that he is gaffe prone. Did you hear that Biden referred to Obama as “articulate” and “clean”? Did you see the news story in which Biden mistakenly asked someone in a wheelchair to “stand up and be recognized by the audience”?

The more indicators we can assemble, the more convinced we grow about our classifications. We may confidently tell others what we think (“Bush is so dumb—he may be the stupidest president in U.S. history”), or we may repeat our opinions to ourselves, silently or under our breath (“Gosh, that Mary is the most kindhearted person I know” or “Biden—he’s such a gaffe machine”).

As we’ll see, researchers try to do better than this, but they are far from perfect.

SCHOLARS’ MEASUREMENTS ARE (USUALLY) BETTER THAN LAYPERSONS’

Researchers collect data in a variety of ways. For example, they may ask respondents a series of questions via a survey (as your instructors do via teaching evaluations); they may examine news coverage of an issue over time, performing content analysis; they may observe social interaction (in person or via video) and analyze what they see as part of an experimental design or as part of an ethnographic project. A general research methods textbook is a good place to get an overview of the broad array of strategies social scientists use to obtain and analyze data (e.g., Babbie, 2010).

In this section, I will not weigh all the pros and cons of the various strategies that researchers employ to collect data and take measurements. Instead, I will make three simple points to support my argument that scholarly measurements—especially (but not only) those that appear in quantitative journal articles—tend to be superior to the measurement systems used in everyday life.

1) Social scientists carefully think through how best to measure something.

Before and during a study, researchers consider and compare different ways to gauge the degree to which a phenomenon exists. By doing a literature review, scholars read about the measurement techniques that have (or have not) worked well for researchers in the past. In the literature, researchers openly discuss and debate measurement strategies and try to improve upon past work when possible. There are even specialized journals devoted to measurement and other methodological issues. Some of these outlets are interdisciplinary, such as the International Journal of Social Research Methodology; others are disciplinary, such as Psychological Methods, Sociological Methods and Research, and the International Journal of Research & Method in Education.

2) Scholars usually attempt to use consistent, systematic measurement procedures throughout a study.

In everyday life, we might be tempted to measure intelligence one way for a president we like and another way for a president we don’t like. We may be tempted to treat a grammatical mistake as a trivial error on one occasion and as irrefutable proof of “stupidity” on another occasion. We can do this because we have not articulated a specific and fair set of procedures that will be used to identify and weigh indicators of intelligence. Researchers (usually) try to do better than that.

Let’s return to Sun et al.’s (2003) study of binge drinking for an example. As with intelligence, ordinary conversationalists may have no clear idea of how to measure a person’s level of binge drinking, which leads them to rely on haphazard observations as they classify whether or how often their companions overimbibe. Sun et al., on the other hand, employed a specific procedure that was consistently applied to every student (more than 1,000) in their sample. They relied on a survey question that many researchers before them had used successfully.

 

Think back over the last two weeks. How many times have you had five or more drinks2 at a sitting?

 

· None

· Once

· Twice

· 3 to 5 times

· 6 to 9 times

· 10 or more times

 

This sort of questioning does rely on respondents’ memory of and honesty about their own past behavior, both of which may be problematic. It also includes ambiguous language, such as the term sitting. Nevertheless, a survey question such as this can be seen as a careful attempt to collect data in an objective and systematic fashion. The question is relatively simple, clear, and succinct. It treats everyone in the sample fairly, without playing favorites as we might in everyday life. It efficiently measures whether (and how often) students binge drink without taking too much time or effort on the part of researchers or respondents.

3) Researchers tell their readers exactly how they measured something so that others can find flaws or propose better measures.

In everyday life, we may never tell others (or even be able to tell ourselves) how we are measuring something like intelligence or binge drinking. In contrast, scholars try to make their measurement choices explicit. The methods sections of journal articles often describe the exact procedures that researchers used to measure each variable; readers can thus better understand how the study was done and how it might be done better. For instance, someone might argue that a different wording would improve the question that Sun et al. (2003) used to measure binge drinking, leading to the collection of more accurate data in a subsequent study.

These three brief points do not constitute an exhaustive list, but they are enough to support my general argument: Social scientists take measurement seriously, and they should be commended for their efforts. Researchers tend to measure things more carefully, systematically, and explicitly than people do in everyday life. Don’t take my word for it—try Exercise 5.1 a few times and see for yourself.

CRITIQUING MEASURES

Although researchers tend to put a great deal of thought into their measurement systems, the results are far from perfect. There is often no one best way to gauge the degree to which a social phenomenon exists. Instead, researchers must choose from a wide range of options, all with advantages and disadvantages. As a result, different researchers tend to employ different strategies to measure “the same” phenomenon.

EXERCISE 5.1

1. Find a couple of standard quantitative journal articles; they can be the same ones you used in the exercises for Chapters 3 and 4. Take a close look at the methods sections to see if the authors describe the procedures they used to measure key concepts or variables in their studies. Then, think about how these same concepts might be measured in everyday life. Can you make an argument that the social scientists put more thought and effort into their measurement systems than ordinary people probably would?

2. Find a recent textbook on research methods at your university library, such as Babbie (2010). Read the chapter (or sections) on operationalization. Make a list of three ideas or strategies that reflect social scientists’ greater concern with careful measurement compared to the concern laypeople exhibit.

 

Imagine you want to measure your body weight to see if you are getting fatter over time. You’re a fully grown adult and aren’t getting any taller. You have choices: You might step on a bathroom scale, you might put on an old pair of pants to see how well they fit, or you might jump into a swimming pool to see how large a splash you make on the surrounding patio. Some of these strategies may seem better than others. You might even pick one strategy and then use it repeatedly year after year to see if your weight has changed. It may seem most sensible to write down your bathroom-scale weight every January 1 to track your progress over the years.

If you switched measurement systems yearly—using the pants method one year, the swimming pool method the next, and the bathroom scale the year after that—it would be more difficult to compare the results of your inquiries from year to year. Thus, a reasonable person might be tempted to stick to a single method to make the results comparable over time. This sounds sensible, but what if you chose a “weak” method early on (e.g., the swimming pool method) and wanted to make a genuine improvement in your measurement system?

You’ve arrived at a dilemma: Using a consistent measurement allows for easier comparisons across separate studies; however, changing a measurement may help improve its effectiveness. Which route do you choose?

Scholars face this dilemma all the time, and they make different judgment calls about it. Sometimes, they are content to reuse existing measures (developed by themselves or by prior scholars), while other times, they want to innovate and improve on existing measures. As researchers debate which measures are better—the new ones or the old ones—additional new-and-improved instruments continue to be created and used in journal articles, giving future scholars even more options to choose from.

The proliferation of inconsistent measures—sometimes called discontinuity—offers us at least two ways of critiquing journal articles.

First, if we realize that authors often have many options when they measure something, then we can question the choices they make. We can ask: Would their research have been stronger if a different measure had been used? What are the strengths and weaknesses of a particular measure in comparison to one or more measures that a researcher chose not to use?

Second, researchers’ use of inconsistent measures can allow us to ask deeper and more challenging questions about the utility and value of research over time: If scholars frequently use inconsistent measures, then is their research comparable and cumulative? Are different scholars studying the same thing, and can their research findings be combined into a coherent set of implications, facts, or lessons about the social world?

Let’s explore measurement discontinuity by considering the topics of binge drinking and marital equality, followed by a shorter discussion of several miscellaneous examples.

Measuring Binge Drinking

Recall Sun et al.’s (2003) measurement system: They asked respondents (via a paper-and-pencil questionnaire) to answer the question: “How many times have you had five or more drinks at a sitting?” While many social scientists have adopted similar techniques to study binge drinking, their approaches are not entirely consistent. In fact, there is a large amount of discontinuity in research on this topic.

For instance, scholars make different choices regarding the time frame. When reflecting on drinking behavior, researchers sometimes ask respondents to think about the past week, or the past month, or the past six months, or the past year (see Courtney & Polich, 2009). These are not necessarily trivial decisions; different measures can produce different results and are thus a matter of debate. One advantage of using a shorter time frame is that respondents can better remember their recent behavior; a disadvantage is that the short time period may be anomalous. If I ask, “How many times did you binge drink?” during two weeks that include spring or summer break, your answer may be much different than if I ask you about the two weeks that precede your final exams. Students’ drinking is likely to vary during different times of the year.

Scholars also differ on the number of drinks that constitute a binge. Some prefer to use “five in a sitting” (like Sun et al., 2003), but there is no consensus on that number or that phrasing. Some prefer to use four drinks for women to accommodate their lower metabolisms; some prefer to ask more precisely about drinks per hour rather than using the language of sitting or occasion; some argue that drinkers’ body weights need to be measured and taken into account (see Courtney & Polich, 2009).

Moreover, various labels and distinctions can be found in the literature, reflecting different definitions and different methodological choices. Sun et al. (2003) treated all their binge-drinking respondents the same; it didn’t matter if respondents binge drank once, five times, or ten times in the past two weeks—they were all placed into the same category and then compared (statistically) to those who never binge drank. In contrast, some researchers choose to distinguish between different kinds of binge drinkers. Read, Beattie, Chamberlain, and Merrill (2008) distinguished between “lower-level binge drinkers” (e.g., males who drank five or six drinks in a sitting) and “heavy binge drinkers” (e.g., males who drank seven or more drinks). Meanwhile, Kokavec and Crowe (1999) drew a line between “regular drinkers,” who consume at least ten drinks every day, and “binge drinkers,” who consume at least ten drinks, but on no more than two days per week.
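To make this discontinuity concrete, the Python sketch below applies three simplified classification rules to one hypothetical two-week drinking log. The function names, the log itself, and the decision to treat each day as a single “sitting” are assumptions made for illustration. The rules paraphrase the summaries above (Read et al.’s male thresholds only; Kokavec and Crowe’s weekly limit averaged over the two weeks) and are not the authors’ actual coding procedures.

```python
"""Hypothetical sketch: one drinking log, three binge-drinking schemes."""


def sun_style(log):
    """Dichotomy in the spirit of Sun et al. (2003): any sitting with five
    or more drinks in the two-week window counts as binge drinking.
    Assumption: each day in the log is treated as one 'sitting'."""
    binges = sum(1 for drinks in log if drinks >= 5)
    return "binge drinker" if binges > 0 else "non-binge drinker"


def read_style_male(log):
    """Male thresholds as summarized for Read et al. (2008):
    5-6 drinks = lower-level binge, 7 or more = heavy binge."""
    peak = max(log)
    if peak >= 7:
        return "heavy binge drinker"
    if peak >= 5:
        return "lower-level binge drinker"
    return "non-binge drinker"


def kokavec_crowe_style(log):
    """Simplified reading of Kokavec and Crowe (1999): ten or more drinks
    every day = regular drinker; ten or more drinks on no more than two
    days per week (averaged over the log) = binge drinker."""
    heavy_days = sum(1 for drinks in log if drinks >= 10)
    weeks = len(log) / 7
    if heavy_days == len(log):
        return "regular drinker"
    if 0 < heavy_days / weeks <= 2:
        return "binge drinker"
    return "neither category"


# Hypothetical two-week log for one male student (drinks per day).
log = [0, 2, 0, 0, 6, 0, 0,
       0, 0, 3, 0, 6, 0, 0]

for scheme in (sun_style, read_style_male, kokavec_crowe_style):
    print(scheme.__name__, "->", scheme(log))
```

Under these assumptions, the same student counts as a binge drinker under the first rule, a lower-level binge drinker under the second, and neither a regular nor a binge drinker under the third.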

When it comes to binge drinking, there is a plethora of strategies for measuring and parsing the phenomenon, with many more techniques created with each passing decade.3

Measuring Marital Equality

Researchers have been studying equality in marriage for several decades, but scholars have not achieved consensus regarding how best to measure the phenomenon. Many scholars have focused on sharing household labor as the most important dimension of marital equality, as opposed to power, respect, sexual relations, communication, or other factors. Yet, even among scholars who focus on labor, there is no standard set of procedures for measuring whether, or to what degree, couples are “sharing” the labor. Instead, many dozens (if not hundreds) of different procedures have been pursued.

One of the earliest and most influential attempts was Blood and Wolfe’s (1960) survey research. To measure the division of labor, Blood and Wolfe employed a questionnaire that asked respondents a series of eight “Who usually…?” questions: Who usually washes the dishes, does the grocery shopping, mows the lawn, keeps track of money and bills, and so on. To answer, participants could choose from five options: “husband always” (1), “husband more than wife” (2), “husband and wife exactly the same” (3), “wife more than husband” (4), and “wife always” (5). By assigning quantitative values to these responses—which I put in parentheses—and adding up the scores for each chore, Blood and Wolfe (1960) created a numerical representation of the degree of inequality in respondents’ marriages. A high score would indicate a marriage where the wife was doing more of the household labor, whereas a low score would indicate that the husband was doing more. For example, if the “wife always” did each of the eight chores, the score would be 40 (8 × 5 = 40). If the “husband always” did each chore, the score would be 8 (8 × 1 = 8). If husband and wife shared every chore exactly equally, the score would fall at the midpoint, 24 (8 × 3 = 24).
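Blood and Wolfe’s scoring arithmetic can be sketched in a few lines of Python. The response codes come from the description above; the function name and the mixed set of answers are hypothetical, and the sketch only illustrates the summing logic rather than reproducing the original questionnaire.

```python
"""Hypothetical sketch of Blood and Wolfe (1960)-style scoring."""

# Response codes as described in the text (1 = husband always ... 5 = wife always).
CODES = {
    "husband always": 1,
    "husband more than wife": 2,
    "husband and wife exactly the same": 3,
    "wife more than husband": 4,
    "wife always": 5,
}


def division_of_labor_score(answers):
    """Sum the coded answers to the eight 'Who usually...?' questions.
    Higher totals mean the wife does more of the household labor."""
    if len(answers) != 8:
        raise ValueError("expected answers to eight chore questions")
    return sum(CODES[answer] for answer in answers)


# The two extremes described in the text, plus the exactly-shared midpoint.
print(division_of_labor_score(["wife always"] * 8))  # 40
print(division_of_labor_score(["husband always"] * 8))  # 8
print(division_of_labor_score(["husband and wife exactly the same"] * 8))  # 24

# A hypothetical respondent with a mix of answers: 15 + 8 + 3 + 2 = 28.
mixed = (["wife always"] * 3 + ["wife more than husband"] * 2
         + ["husband and wife exactly the same"] + ["husband always"] * 2)
print(division_of_labor_score(mixed))  # 28
```

A total of 24 marks exactly equal sharing; totals above 24 lean toward the wife doing more, and totals below 24 toward the husband.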

In subsequent decades and across hundreds of journal articles and books, Blood and Wolfe’s (1960) system has been slightly tweaked, radically revised, or cast aside entirely (see Warner, 1986; Shelton & John, 1996; Harris, 2006). I’ll mention just four areas of discontinuity.

First, researchers have disagreed with Blood and Wolfe’s (1960) list of questions—perhaps dropping lawn mowing but adding other chores like disciplining the children. Thus, the list of chores not only changes in content but also grows or shrinks: For example, Goldberg, Smith, and Perry-Jenkins (2012, p. 818) collected data on 27 tasks, whereas Geist and Cohen (2011, p. 835) focused on only three key tasks.

Second, many scholars have focused solely on spouses’ labor and do not allow respondents to indicate whether a third person (such as a child, relative, or hired help) sometimes or always does a particular chore (e.g., Blood & Wolfe, 1960; Hank & Jürges, 2007). Other scholars do include the third-party response option but process it differently. For example, Geist and Cohen (2011) treated tasks done by third parties as being “shared equally” between spouses—giving the husband and wife equal credit for the chore. However, Lewin-Epstein, Stier, and Braun (2006) argued that women often supervise third-party contributions to household labor, and so they chose to treat third-party tasks as “wife mostly responsible.”
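The practical stakes of the third-party choice show up when both rules are applied to the same answers. The sketch below is a hypothetical illustration: it reuses Blood and Wolfe’s 1 to 5 codes and assumes that “shared equally” maps to 3 and “wife mostly responsible” maps to 4 (“wife more than husband”). Neither study necessarily scored its data on this scale, and the answers are invented.

```python
"""Hypothetical sketch: two rules for recoding third-party chores."""

CODES = {
    "husband always": 1,
    "husband more than wife": 2,
    "husband and wife exactly the same": 3,
    "wife more than husband": 4,
    "wife always": 5,
}

# Assumption: each study's rule is mapped onto the same 1-5 scale.
THIRD_PARTY_RULES = {
    "shared equally (Geist & Cohen style)": "husband and wife exactly the same",
    "wife mostly responsible (Lewin-Epstein et al. style)": "wife more than husband",
}


def score(answers, third_party_as):
    """Sum coded answers, substituting a spouse answer for 'third party'."""
    total = 0
    for answer in answers:
        if answer == "third party":
            answer = third_party_as
        total += CODES[answer]
    return total


answers = ["wife always", "wife always", "third party", "third party",
           "husband always", "husband and wife exactly the same",
           "wife more than husband", "third party"]

for rule, recode in THIRD_PARTY_RULES.items():
    print(f"{rule}: {score(answers, recode)}")
```

With three third-party answers, this respondent’s total moves from 27 under the first rule to 30 under the second, even though nothing about the marriage itself has changed.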

Third, many scholars have followed Blood and Wolfe’s (1960) lead and collected data from only one spouse in each marriage. It is obviously cheaper and more efficient to gather data from a single spouse rather than both. However, this measurement strategy raises a question: Should one person’s interpretation be treated as an adequate portrayal of a marriage (e.g., see Safilios-Rothschild, 1969)? A smaller number of scholars collect data from both husbands and wives. This seems meritorious but raises another thorny question: What should be done when there are discrepancies between husbands’ and wives’ descriptions of their marriages? Researchers respond in different ways. For example, Hank and Jürges (2007) decided to use the average of spouses’ estimates of their household labor—splitting the difference—whereas Lee and Waite (2005) kept spouses’ estimates separate.

A fourth area of discontinuity in the marital equality literature involves a (deceptively) straightforward question: What counts as close enough for a marriage to be classified as equal? It was this discontinuity that helped motivate my own research on marital equality (Harris, 2000, 2006). I noticed that some measurement systems treated husbands as “egalitarian” if they did at least 40 percent of the housework (Haas, 1980; Smith & Reid, 1986), whereas other systems drew the line at 45 percent (Hochschild, 1989, p. 282). I also noticed that some researchers set hourly cutoff points instead of using percentages. Piña and Bengtson (1993, p. 905) decided that equality had been achieved if spouses’ contributions were within seven hours per week of each other (e.g., he does 13 hours, and she does 20). In another study (Benin & Agostinelli, 1988, p. 353), researchers classified as equal those marriages where husbands and wives both did between 16 and 20 hours of housework per week. Blumstein and Schwartz (1983, pp. 144–145), meanwhile, employed a cutoff point of 11 to 20 hours per week and counted couples as egalitarian if both spouses fell within that range. As you can see, the same marriage might be classified as either equal or unequal, depending on the particular measurement system that a researcher chose to employ.
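The cutoffs listed above can be applied side by side to a single couple. This Python sketch uses the text’s own example (he does 13 hours a week, she does 20). The rule labels are shorthand, and two details are assumptions of this illustration rather than facts taken from the cited studies: the percentage rules are read as the husband’s share of the couple’s combined housework hours, and all bounds are treated as inclusive.

```python
"""Hypothetical sketch: one marriage, five equality cutoffs."""


def husband_share(his_hours, her_hours):
    """Husband's fraction of the couple's combined weekly housework hours."""
    return his_hours / (his_hours + her_hours)


# Each rule returns True if the couple counts as "equal" under it.
# Assumptions: shares are of combined hours, and all bounds are inclusive.
RULES = {
    "husband >= 40% (Haas; Smith & Reid)":
        lambda h, w: husband_share(h, w) >= 0.40,
    "husband >= 45% (Hochschild)":
        lambda h, w: husband_share(h, w) >= 0.45,
    "within 7 hours (Piña & Bengtson)":
        lambda h, w: abs(h - w) <= 7,
    "both 16-20 hours (Benin & Agostinelli)":
        lambda h, w: 16 <= h <= 20 and 16 <= w <= 20,
    "both 11-20 hours (Blumstein & Schwartz)":
        lambda h, w: 11 <= h <= 20 and 11 <= w <= 20,
}


def classify(his_hours, her_hours):
    """Label the couple 'equal' or 'unequal' under every rule."""
    return {name: ("equal" if rule(his_hours, her_hours) else "unequal")
            for name, rule in RULES.items()}


# The couple from the text's example: he does 13 hours, she does 20.
for name, verdict in classify(13, 20).items():
    print(f"{name}: {verdict}")
```

Under these assumptions, the couple is classified as equal by two of the five rules (the seven-hour rule and the 11-to-20-hour rule) and as unequal by the other three; his 39.4 percent share falls just short of even the 40 percent cutoff.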

Thus, similar to research on binge drinking, many discontinuities pervade the literature on marital equality. Scholars invent measures, innovate upon them, and never settle on a single, consistent system. And while some measurement choices may in fact be better than others, I think you can see that none is perfect. Rather, scholars must select from flawed options as they attempt to be careful, systematic, and explicit about their measurement strategies.

Miscellaneous Examples

No one—including me—can claim to have read more than a tiny fraction of the vast amount of social research that scholars have produced. Nevertheless, my sense is that the subfields of binge drinking and marital equality are not unique. From what I have seen, discontinuity pervades research on most topics across the social sciences. Consider these miscellaneous examples:

 

· In just a five-year period, researchers published 16 articles on aspiration and used 16 different measurement strategies (Bonjean, Hill, & McLemore, 1965).

· By the late 1980s, social scientists had developed more than 200 ways to measure self-esteem (Scheff, Retzinger, & Ryan, 1989; see also Blascovich & Tomaka, 1991).

· Researchers have created hundreds of instruments for measuring the concept of quality of life (Gladis, Gosch, Dishuk, & Crits-Christoph, 1999).

· Twenty-five years after the initial development of a widely used measure of gender identity—the Bem Sex Role Inventory—researchers had not settled on a particular set of questions or a consistent procedure for scoring respondents’ answers (Hoffman & Borders, 2001).

· Prejudice has been measured in a myriad of ways, each with strengths and weaknesses, giving researchers a large number of diverse options to choose from (Olson, 2009).

 

Still, don’t simply accept my argument on faith. Discontinuity may not be universal, and it is likely to be more pronounced in some areas of research than in others. I encourage you to complete Exercise 5.2 and see how much discontinuity (if any) exists in whatever subfields interest you the most.

Remember, the goal of this book is not for you to simply memorize a set of facts or opinions. Rather, I hope to instill in you a healthy respect for social science research while also providing you with a set of concepts and questions that can be used to find the imperfections that pervade most journal articles.

CONCLUSION

I started this chapter by arguing that laypersons are not very careful, systematic, or explicit in how they measure things. People draw inferences about what someone is like or what is going on based on very casual observations. Ironically, this enables us to say things like “President Bush is so dumb” while using measurement systems that are, arguably, pretty feeble.

In contrast, I argued, social scientists do better. They think through their measurement strategies and purposefully employ a particular set of procedures. They read about the instruments that have been used in prior research and consider whether to adopt or adapt these techniques. Researchers inform their readers exactly how key variables were measured so that others can learn from their successes and mistakes.

Unfortunately, there is rarely one best way to measure something. Instead, there are usually many options, each with advantages and disadvantages, and scholars often don’t achieve consensus regarding what set of operational procedures will best gauge whether, and to what degree, a phenomenon exists. Thus, measures can be criticized for being weaker than or incomparable to the available alternatives.

Once again, this chapter has affirmed the tone and thesis of my book: Journal articles should be read with due respect for the time, effort, and expertise that went into them. Yet, articles should be approached with a healthy dose of skepticism, as virtually any piece of research can be critiqued for the methodological choices that its authors made. Exercise 5.2 offers a set of questions that may help you get started practicing the ideas from this chapter.

EXERCISE 5.2

Find an article that measures a phenomenon you are interested in. Self-esteem, quality of life, and prejudice are options, or you might use an article you found for the exercises in Chapter 3 or 4.

 

1. What do you think of the measurement strategy the authors used? Do you see any obvious strengths or weaknesses? Can you argue that the authors could have done something different or better? Do the authors themselves mention measurement choices they considered but decided not to use—and, if so, what are the pros and cons of the various alternatives?

2. For a challenging scavenger hunt, find additional articles that measure the same phenomenon. How much, if at all, do the authors’ measurement strategies differ? Are there potential advantages or disadvantages to the different measures that the different researchers used?

It could be challenging for a beginner to find multiple articles that measure the same phenomenon. You will have to search (with guidance from your instructor or librarian, if needed) a database such as Sociological Abstracts or PsycINFO, as I mentioned in earlier chapters. One potentially time-saving strategy might be to locate a review article that summarizes years of research on a topic—these kinds of papers often compare the ways different researchers have measured the same thing.