Interpreting Data and Quantitative Fluency: Charts One and Two


My research methods class has already helped me. I am taking a human development class and we are writing annotations and a synthesis for three different studies and it's so easy to understand all the studies and data posted from SPSS! I'm even helping out a few classmates so they can include information in their annotations that they didn't understand before.

Emma T., Student

Substance abuse is a social problem of remarkable proportions. About 18 million Americans have an alcohol use disorder (Grant et al. 2004; Hasin, Stinson, Ogburn, and Grant 2007; National Institute on Alcohol Abuse and Alcoholism [NIAAA] 2018), and about 80,000 die every year from alcohol-related causes (NIAAA 2018). While in college, four out of ten students binge drink (Wechsler et al. 2002), and about one out of three could be diagnosed as alcohol abusers (Knight et al. 2002). Drinking is a factor in almost half of on-campus sexual assaults (Sinozich and Langton 2014), and almost one in four victims of violence in the general population perceive their attackers to have been under the influence of drugs and/or alcohol. And finally, almost half of jail inmates report having alcohol dependence or abuse problems (Karberg and James 2005). All told, the annual costs of prevention and treatment for alcohol and drug abuse exceed $340 billion in the United States (Miller and Hendrie 2008). Across the globe, alcohol misuse results in about 2.5 million deaths annually (World Health Organization [WHO] 2013).

Within all of these facts, we have presented several concepts, including alcohol, college students, and alcohol dependence. While we all have our own ideas about what these concepts mean, do we all have the same idea in mind when we hear these terms? For example, are community colleges classified within the term college? How is alcohol abuse different from dependence?

Whether your goal is to examine the factors related to criminal offending, to deliver useful services, or to design effective social policies, at some point, you will probably need to read the research literature on substance abuse. Every time you begin to review or design relevant research, you will have to answer two questions: The first concerns conceptualization: "What is meant by substance abuse in this research?" The second concerns measurement: "How was substance abuse measured?" Both questions must be answered to evaluate the validity of substance abuse research. You cannot make sense of the results of a study until you know how the concepts were defined and measured. Nor are you ready to begin a research project until you have defined your concepts and constructed valid measures of them. Measurement validity is essential to successful research; in fact, without valid measures, it is fruitless to attempt to achieve the other two aspects of validity: causal validity (see Chapter 6) and generalizability (see Chapter 5).

CONCEPTUALIZATION AND MEASUREMENT

In this chapter, we first address the issue of conceptualization, using substance abuse and related concepts as examples. We also provide examples of the conceptualization process for other terms such as street gangs and inmate misconduct. We then focus on measurement, reviewing first how measures of substance abuse have been constructed using available data (i.e., arrest data), questions on surveys, observations, and less direct and unobtrusive measures. Then we explain how to assess the validity and reliability of these measures. The final topic is the level of measurement reflected in different measures. By chapter's end, you should have a good understanding of measurement, the first of the three legs on which a research project's validity rests.

CONCEPTS

Every concept requires an explicit definition before it is used in research, because we cannot otherwise be certain that all readers will share the same definition. It is even more important to define concepts that are somewhat abstract or unfamiliar. When we refer to concepts such as poverty, social control, or strain, we cannot be certain that others know exactly what we mean.

Many high school and college students have become familiar with the term binge drinking, but you may be surprised to learn that even researchers do not agree on how to measure it. The definition that Henry Wechsler et al. (2002) used is "heavy episodic drinking"; more specifically, "we defined binge drinking as the consumption of at least 5 drinks in a row for men or 4 drinks in a row for women during the 2 weeks before completion of the questionnaire" (205). While this definition is widely accepted among social researchers, the NIAAA (College Alcohol Study 2008) provides a more precise definition: "A pattern of drinking alcohol that brings blood alcohol concentration to 0.08 grams percent or above." Most researchers consider the so-called 5/4 definition (5 drinks for men, 4 for women) to be a reasonable approximation to this more precise definition. We can't say either of these definitions is correct or even that one is better. However, if we were to conduct research on the topic of binge drinking, we would need to specify what we mean when we use the term and be sure that others know that definition. And of course, the definition has to be useful for our purposes: A definition based solely on blood alcohol concentration will not be useful if we are not taking blood measures.
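To see how a conceptual definition becomes a concrete decision rule, consider a minimal Python sketch of the two definitions quoted above. The function names and the sex labels ("male" and "female") are our own illustrative assumptions, not part of either definition; the sketch encodes only the 5/4 thresholds and the 0.08 cutoff as stated.

```python
def is_binge_episode(drinks_in_a_row: int, sex: str) -> bool:
    """Apply the 5/4 rule: at least 5 drinks in a row for men, 4 for women."""
    thresholds = {"male": 5, "female": 4}  # assumed category labels
    if sex not in thresholds:
        raise ValueError(f"unknown sex category: {sex!r}")
    return drinks_in_a_row >= thresholds[sex]


def is_binge_by_bac(bac_grams_percent: float) -> bool:
    """Apply the blood-based rule: BAC of 0.08 grams percent or above."""
    return bac_grams_percent >= 0.08


# The same respondent can be classified only by the rule we have data for.
print(is_binge_episode(4, "female"))  # True
print(is_binge_episode(4, "male"))    # False
print(is_binge_by_bac(0.08))          # True
```

Note that the two rules need different data: the first requires a self-reported drink count, the second a blood measure. This is why a definition's usefulness depends on what we can actually collect.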

We call binge drinking a concept: a mental image that summarizes a set of similar observations, feelings, or ideas. To make that concept useful in research (and even in ordinary discourse), we have to define it. Many concepts are used in everyday discourse without consistent definition; sometimes definitions of concepts are themselves the object of intense debate, and the meanings of concepts may change over time.

Conceptualization in Practice

If we are to do an adequate job of conceptualization, we must do more than think up some definition, any definition, for our concepts. We may need to distinguish subconcepts (dimensions) of the concept. We also should ask how the concept's definition fits within the theoretical framework guiding the research and what assumptions underlie this framework.

Concept: A mental image that summarizes a set of similar observations, feelings, or ideas.

Conceptualization: The process of specifying what we mean by a term. In deductive research, conceptualization helps translate portions of an abstract theory into testable hypotheses involving specific variables. In inductive research, conceptualization is an important part of the process used to make sense of related observations.

CHAPTER 4 • CONCEPTUALIZATION AND MEASUREMENT

CASE STUDY

Defining Youth Gangs

Do you have a clear image in mind when you hear the term youth gangs? Although this is a very ordinary term, social scientists' attempts to define precisely the concept, youth gang, have not yet succeeded: "Neither gang researchers nor law enforcement agencies can agree on a common definition . . . and a concerted national effort . . . failed to reach a consensus" (Howell 2003, 75). Exhibit 4.1 lists a few of the many alternative definitions of youth gangs.

What is the basis of this conceptual difficulty? Howell (2003, 27-28) suggests that defining the term youth gangs has been difficult for four reasons:

1. Youth gangs are not particularly cohesive.

2. Individual gangs change their focus over time.

3. Many have a "hodgepodge of features," with diverse members and unclear rules.

4. There are many incorrect but popular myths about youth gangs.

In addition, youth gangs are only one type of social group, and it is important to define youth gangs in a way that distinguishes them from other types of groups (childhood play groups, youth subculture groups, delinquent groups, and adult criminal organizations). You can think of social group as a broader concept that has multiple dimensions, one of which may be a youth gang. In the same way, you can think of substance abuse as a concept with three dimensions: alcohol abuse, drug abuse, and polysubstance abuse. Whenever you define a concept, you need to consider whether the concept is unidimensional or multidimensional. If it is multidimensional, your job of conceptualization is not complete until you have specified the related subconcepts that belong under the umbrella of the larger concept. And finally, the concept you define must capture an idea that is distinctly separate from related ideas.

SECTION II • FUNDAMENTALS OF RESEARCH

Exhibit 4.1 Alternative Definitions of Youth Gangs

CASE STUDY

Defining Substance Abuse

What observations or images should we associate with the concept substance abuse? Someone leaning against a building with a liquor bottle, barely able to speak coherently? College students drinking heavily at a party and passing out? Someone in an Alcoholics Anonymous group drinking one beer? A 12-year-old boy drinking a small glass of wine in an alley? A 12-year-old boy drinking a small glass of wine at the dinner table in France? Do all these images have something in common that we should use as a definition of substance abuse for the purposes of a particular research study? Do some of them? Should we take into account cultural differences? Social situations? Physical tolerance for alcohol? Individual standards?

Many researchers now use the definition of substance abuse contained in the American Psychiatric Association's (2000) Diagnostic and Statistical Manual of Mental Disorders, Text Revision (DSM-IV-TR): "a maladaptive pattern of substance use manifested by recurrent and significant adverse consequences related to the repeated use of substances . . . must have occurred repeatedly during the same 12-month period or been persistent" (DSM-IV-TR, Substance Abuse Features section, 198). But, despite its popularity among professionals, we cannot judge the DSM-IV-TR definition of substance abuse as correct or incorrect. Each researcher has the right to conceptualize as he or she sees fit. However, we can say that the DSM-IV-TR definition of substance abuse is useful, partly because it has been widely adopted. It is also stated in a clear and precise language that minimizes differences in interpretation and maximizes understanding.

One caution is in order. The definition of any one concept rests on a shared understanding of the terms used in the definition. So, if our audience does not already have a shared understanding of terms such as adequate social functioning, self-care functioning, and repeated use, we must also define these terms before we are finished with the process of defining substance abuse.

CASE STUDY

Defining Poverty

Poverty is a very important variable in criminological research because of its relationship with many forms of offending and victimization, both at the aggregate level (i.e., city or state) and at the individual level. Decisions about how to define the concept of poverty have always been somewhat controversial, however, because different notions of what poverty is shape estimates of how prevalent it is and what can be done about it.

Most of the statistics that you see in the newspaper about the poverty rate reflect a conception of poverty that was formalized by Mollie Orshansky of the Social Security Administration in 1965 and subsequently adopted by the federal government and many researchers (Putnam 1977). She defined poverty as an absolute standard, based on the amount of money required to purchase an emergency diet that is estimated to be nutritionally adequate for about two months. The idea is that people are truly poor if they can barely purchase the food they need and other essential goods. This poverty standard is adjusted for household size and composition (number of children and adults), and the minimal amount needed for food is multiplied by three, because a 1955 survey indicated that poor families spend about one third of their incomes on food (Orshansky 1977).
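The arithmetic behind this absolute standard can be sketched in a few lines of Python. This is only an illustration of the "food budget times three" logic described above: the function names are ours, the dollar amounts are hypothetical, and the adjustment for household size is represented simply by passing in a different food cost for each household type.

```python
def absolute_poverty_threshold(minimal_food_cost: float, multiplier: float = 3.0) -> float:
    """Threshold = cost of the minimal food budget times the multiplier.

    The multiplier of 3 reflects the survey finding that poor families
    spent about one third of their income on food.
    """
    return minimal_food_cost * multiplier


def is_below_threshold(family_income: float, minimal_food_cost: float) -> bool:
    """True if the family's cash income falls below the absolute threshold."""
    return family_income < absolute_poverty_threshold(minimal_food_cost)


# Hypothetical figures for illustration only, not official values.
food_cost = 7_000.0  # assumed annual cost of a minimal diet for one household type
print(absolute_poverty_threshold(food_cost))    # 21000.0
print(is_below_threshold(18_000.0, food_cost))  # True
```

Because only cash income enters the comparison, the sketch also makes visible the point debated below: counting noncash benefits as income, or raising the food-based standard, would change who falls under the line.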


Does this sound straightforward? As is often the case with important concepts, the meaning of an absolute poverty standard has been the focus of a vigorous debate (Eckholm 2006). Although the traditional definition of absolute poverty accounts only for a family's cash income, some argue that noncash benefits that low-income people can receive, such as food stamps, housing subsidies, and tax rebates, should be added to cash income before the level of poverty is calculated. Douglas Besharov of the American Enterprise Institute terms this approach "a much needed corrective" (Eckholm 2006, A8). But some social scientists have proposed increasing the absolute standard for poverty so that it reflects what a low-income family must spend to maintain a "socially acceptable standard of living" that allows for a telephone, house repairs, and decent clothes (Uchitelle 1999). A new "multidimensional poverty index" (MPI) to aid international comparisons considers absolute deprivations in health, education, and living standards (Alkire, Roche, Santos, and Seth 2011). Others argue that the persistence of poverty should be considered, so someone who is poor for no more than a year, for example, is distinguished from someone who is poor for many years (Walker, Tomlinson, and Williams 2010). Any change in the definition of poverty will change eligibility for government benefits such as food stamps and Medicaid, so the feelings about this concept run deep.

Some social scientists disagree altogether with the absolute standard and have instead urged adoption of a relative poverty standard (see Exhibit 4.2). They identify the poor as those in the lowest fifth or tenth of the income distribution or as those having some fraction of the average income. The idea behind this relative conception is that poverty should be defined in terms of what is normal in a given society at a particular time. "For example, while a car may be a luxury in some poor countries, in a country where most families own cars and public transportation is inadequate, a car is a basic necessity for finding and commuting to work" (Mayrl et al. 2004, 10).
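A short Python sketch shows how the two relative rules just described differ from a fixed dollar threshold. The income values are invented for illustration, and the choice of one half as the fraction of average income is our assumption; the text specifies only "some fraction."

```python
import statistics


def relative_poverty_line(incomes: list[float], fraction: float = 0.5) -> float:
    """Poverty line set at a fraction of the mean income (fraction is an assumed choice)."""
    return fraction * statistics.mean(incomes)


def lowest_fifth(incomes: list[float]) -> list[float]:
    """Incomes falling in the bottom 20% of the distribution."""
    ranked = sorted(incomes)
    return ranked[: len(ranked) // 5]


# Hypothetical household incomes, for illustration only.
incomes = [10_000, 20_000, 30_000, 40_000, 50_000,
           60_000, 70_000, 80_000, 90_000, 100_000]

print(relative_poverty_line(incomes))  # 27500.0
print(lowest_fifth(incomes))           # [10000, 20000]
```

Under either rule the line moves whenever the overall distribution moves, which is exactly what distinguishes a relative standard from an absolute one.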

Some social scientists prefer yet another conception of poverty. With the subjective approach, poverty is defined as what people think would be the minimal income they need to make ends meet. Of course, many have argued that this approach is influenced too much by the different standards that people use to estimate what they "need" (Ruggles 1990). There is a parallel debate about the concept of subjective well-being, which is now measured annually with responses (on a 10-point scale) to four questions in the United Kingdom by its Office of National Statistics (Venkatapuram 2013). The four questions are (Venkatapuram 2013, 9):

1. Overall, how satisfied are you with your life nowadays?

2. Overall, to what extent do you feel the things you do in your life are worthwhile?

3. Overall, how happy did you feel yesterday?

4. Overall, how anxious did you feel yesterday?

Which do you think is the most reasonable approach to defining poverty: some type of absolute standard, a relative standard, or a subjective standard? Be careful here: Conceptualization has consequences! Research using the standard absolute concept of poverty indicated that the percentage of Americans in poverty declined by 1.7% in the 1990s, but use of a relative concept of poverty led to the conclusion that poverty increased by 2.7% (Mayrl et al. 2004). No matter which conceptualization we decide to adopt, our understanding of the concept of poverty will be sharpened after we consider these alternative definitions.

From Concepts to Variables: Measurement Operations

After defining the concepts for a study, we can identify variables corresponding to the concepts and develop procedures to measure them. Recall that a variable is a characteristic or property that can vary (e.g., religion, alcohol use, victimization). This is an important step. Consider the concept of social control, which Black (1984) defines as all of the processes by which people define and respond to deviant behavior. What variables do you think represent this conceptualization of social control? Proportion of persons arrested in a community? Average length of sentences for crimes? Types of bystander reactions to public intoxication? Some combination of these?

Exhibit 4.2 Absolute, Relative, and Subjective Poverty Standards

Panels: Absolute Standard; Relative Standard (Mean Household Income by Quintile: 1967 to 2009; series for the Highest, Fourth, Third, Second, and Lowest Fifth, with labeled values of $170,844, $79,694, $49,534, $29,257, and $11,552); Subjective Standard.

Source: Based on Giovanni Vecchi, Università di Roma "Tor Vergata," Poverty Lines. Bosnia and Herzegovina Poverty Analysis Workshop, September 17-21, 2007.

Although we must proceed carefully to specify what we mean by a concept such as social control, some concepts are represented well by the specific variables in the study and need not be defined so carefully. We may define binge drinking as heavy episodic drinking and measure it, as a variable, by asking people how many drinks they consumed in succession during one drinking episode (see Wechsler, Davenport, Dowdall, Moeykens, and Castillo 1994). That is pretty straightforward.

As we will see, some concepts are not so easy to measure, however. The goal is to devise measurement procedures that actually measure the concepts we intend to measure, in other words, to achieve measurement validity. The operationalization process specifies the operations that will indicate the value of a variable for each case. Exhibit 4.3 represents the operationalization process in three studies. Researchers must provide an operational definition, which includes what is measured, how it was measured, and the rules used to assign a value to what is observed and to interpret the value. For example, one researcher defines her concept (binge drinking) and chooses one variable (frequency of heavy episodic drinking) to represent it. This variable is then measured with responses to a single question (or indicator): "How often within the past two weeks did you consume five or more drinks containing alcohol in a row?" Another researcher defines her concept (poverty) as having two aspects or dimensions, subjective poverty and absolute poverty. Subjective poverty is measured with responses to a survey question: "Would you say you are poor?" Absolute poverty is measured by comparing family income to the poverty threshold. A third researcher decides that her concept (social class) can be indicated with three measured variables: income, education, and occupational prestige. The values of these three variables for each case studied are then combined into a single indicator.

Operationalization: The process of specifying the operations that will indicate the value of a variable for each case.

Operational definition: The set of rules and operations used to find the value of cases on a variable.

Indicator: The question or other operation used to indicate the value of cases on a variable.
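The third example, in which several measured variables are combined into one indicator, can be made concrete with a brief Python sketch. The text does not say how the researcher combines income, education, and occupational prestige, so the approach below (standardizing each variable and summing the results with equal weight) is purely our assumption, as are the function names and the sample values.

```python
import statistics


def z_scores(values: list[float]) -> list[float]:
    """Standardize values so variables on different scales can be added.

    Assumes the values are not all identical (standard deviation > 0).
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def social_class_index(income: list[float], education: list[float],
                       prestige: list[float]) -> list[float]:
    """One composite score per case: the sum of the three standardized variables."""
    columns = [z_scores(income), z_scores(education), z_scores(prestige)]
    return [sum(case) for case in zip(*columns)]


# Hypothetical data for three cases, for illustration only.
index = social_class_index(
    income=[20_000, 40_000, 60_000],   # annual dollars
    education=[10, 12, 14],            # years of schooling
    prestige=[30, 50, 70],             # occupational prestige score
)
print([round(score, 2) for score in index])
```

Whatever combination rule is chosen, it belongs in the operational definition, because a different rule (for example, unequal weights) could rank the same cases differently.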

Good conceptualization and operationalization can prevent confusion later in the research process. For example, a researcher may find that substance abusers who join a self-help group are less likely to drink again than those who receive hospital-based substance abuse treatment. But what is it about these treatment alternatives that is associated with successful abstinence? Level of peer support? Beliefs about the causes of alcoholism? Financial investment in the treatment? If the researcher had considered such aspects of the concept of substance abuse treatment before collecting her data, she might have been able to measure different elements of treatment and then identify which, if any, were associated with differences in abstinence rates. Because she did not measure these variables, she will not contribute as much as she might have to our understanding of substance abuse treatment.

Social researchers have many options for operationalizing their concepts. Measures can be based on activities as diverse as asking people questions, reading judicial opinions, observing social interactions, coding words in books, checking census data, enumerating the contents of trash receptacles, or drawing urine and blood samples. We focus here on the operations of using published data, asking questions, observing behavior, and using unobtrusive means of measuring people's behavior and attitudes.

Using Available Data

Government reports are rich and readily accessible sources of criminal justice data, as are datasets available from nonprofit advocacy groups, university researchers, and some private businesses. For example, law enforcement and health statistics provide several community-level indicators of substance abuse (Gruenewald, Treno, Taff, and Klitzner 1997). Statistics on arrests for the sale and possession of drugs, drunk driving arrests, and liquor law violations (such as sales to minors) can usually be obtained on an annual basis (and often quarterly) from local police departments or state crime information centers.

Exhibit 4.3 Concepts, Variables, and Indicators

Still, indicators such as these cannot be compared across communities or over time without a careful review of how they were constructed in each community. The level of alcohol in the blood that is legally required to establish intoxication can vary among communities, creating the appearance of different rates of substance abuse even though drinking and driving practices may be identical. Enforcement practices can vary among police jurisdictions and over time (Gruenewald et al. 1997). We also cannot assume that available data are accurate, even when they appear to measure the concept in which we are interested in a way that is consistent across communities. "Official" counts of homeless persons have been notoriously unreliable because of the difficulty of locating homeless persons on the streets, and government agencies have, at times, resorted to "guesstimates" by service providers (Rossi 1989). Even available data for such seemingly straightforward measures as cause of death can contain a surprising amount of error. For example, between 30% and 40% of death certificates incorrectly identify the cause of death (Altman 1998).

Government statistics that are generated through a central agency such as the U.S. Bureau of the Census are often of high quality, but caution is warranted when using official data collected by local levels of government. For example, the Uniform Crime Reports (UCR) program administered by the Federal Bureau of Investigation (FBI) imposes standard classification criteria, with explicit guidelines and regular training at the local level, but data are still inconsistent for many crimes. Consider only a few of the many sources of inconsistency between jurisdictions: variation in the classification of forcible rape cases due to differences in what is considered to be "carnal knowledge of a female"; different decisions about what is considered "more than necessary force" in the definition of strong-arm robberies; and whether offenses in which threats were made but no physical injury occurred are classified as aggravated or simple assaults (Mosher, Miethe, and Phillips 2002). A new National Incident-Based Reporting System (NIBRS) corrects some of the problems with the UCR, but it requires much more training and documentation and has not yet been widely used (Mosher et al. 2002). Moreover, as we learned in Chapter 1, both of these systems rely on victims to report their experiences to police in order to be counted.

In some cases, problems with an available indicator can be lessened by selecting a more precise indicator. For example, the number of single-vehicle nighttime crashes, whether fatal or not, is a more specific indicator of the frequency of drinking and driving than the number of single-vehicle fatal accidents alone (Gruenewald et al. 1997). Focusing on a different level of aggregation may also improve data quality, because procedures for data collection may differ among cities, counties, states, and so on (Gruenewald et al. 1997). It is only after such factors as legal standards, enforcement practices, and measurement procedures have been taken into account that comparisons among communities become credible.

Constructing Questions

Asking people questions is the most common and probably the most versatile operation for measuring social variables. Most concepts about individuals can be defined in such a way that measurement with one or more questions becomes an option. In this section, we introduce some options for writing single questions; in Chapter 8, we explain why single questions can be inadequate measures of some concepts and then we examine approaches that rely on multiple questions to measure a concept.

Measuring variables with single questions is very popular. Public opinion polls based on answers to single questions are reported frequently in newspaper articles and TV newscasts: "Do you favor or oppose U.S. policy in . . . ?" "If you had to vote today, for which candidate would you vote?" Criminal justice surveys also rely on single questions to measure many variables: "Overall, how satisfied are you with the police in your community?" "How would you rate your current level of safety?"


Closed-ended (fixed-choice) questions: Survey questions providing preformatted response choices for the respondent to circle or check.

Open-ended questions: Survey questions to which the respondent replies in his or her own words, either by writing or by talking.

Systematic social observation (SSO): A careful method of observing phenomena.

Single questions can be designed with or without explicit response choices. The question that follows is a closed-ended (fixed-choice) question, because respondents are offered explicit responses to choose from. It has been selected from the Core Alcohol and Drug Survey distributed by the Core Institute (1994) at Southern Illinois University for the Fund for the Improvement of Postsecondary Education (FIPSE) Core Analysis Grantee Group (Presley, Meilman, and Lyerla 1994).

Compared to other campuses with which you are familiar, this campus's use of alcohol is . . . (Mark one)

___ Greater than [that of] other campuses

___ Less than [that of] other campuses

___ About the same as [that of] other campuses

Response choices should be mutually exclusive and exhaustive, so every respondent can find one and only one choice that applies to him or her (unless the question includes a check all that apply option). To make response choices exhaustive, researchers may need to offer at least one option with room for ambiguity. For example, a questionnaire asking college students to indicate their school status should not use freshman, sophomore, junior, senior, and graduate student as the only response choices. Most campuses also have students in a "special" category, so you might add other (please specify) to the five fixed responses to this question. If respondents do not find a response option that corresponds to their answer to the question, they may skip the question entirely or choose a response option that does not indicate what they are really thinking.
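The school status example can be turned into a small Python sketch of what it means for a set of response choices to be exhaustive. The helper names and the sample answer are hypothetical; the five fixed choices and the other (please specify) option come from the example above.

```python
OTHER = "other (please specify)"
SCHOOL_STATUS_CHOICES = ["freshman", "sophomore", "junior", "senior", "graduate student"]


def choices_are_distinct(choices: list[str]) -> bool:
    """A necessary check for mutual exclusivity: no choice label is repeated."""
    return len(set(choices)) == len(choices)


def code_response(answer: str, choices: list[str], allow_other: bool = True):
    """Return the matching choice, the catch-all option, or None if nothing fits."""
    normalized = answer.strip().lower()
    if normalized in choices:
        return normalized
    return OTHER if allow_other else None


print(choices_are_distinct(SCHOOL_STATUS_CHOICES))                                 # True
print(code_response("Junior", SCHOOL_STATUS_CHOICES))                              # junior
print(code_response("special student", SCHOOL_STATUS_CHOICES, allow_other=False))  # None
print(code_response("special student", SCHOOL_STATUS_CHOICES))                     # other (please specify)
```

A result of None marks a respondent with no applicable choice, the situation in which people tend to skip the question or pick an inaccurate answer; adding the catch-all option removes that gap.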

Most surveys of a large number of people contain primarily fixed-choice questions, which are easy to process with computers and analyze with statistics. With fixed-choice questions, respondents are also more likely to answer the question that the researcher really wants them to answer. Including response choices reduces ambiguity and makes it easier for respondents to answer. However, fixed-response choices can obscure what people really think if the choices do not match the range of possible responses to the question; many studies show that some respondents will choose response choices that do not apply to them simply to give some sort of answer (Peterson 2000). We will discuss question wording and response options in greater detail in Chapter 8.

Open-ended questions, that is, questions without explicit response choices to which respondents write in their answers, are preferable when the range of responses cannot adequately be anticipated, as with questions that have not previously been used in surveys and questions that are asked of new groups. Open-ended questions can also lessen confusion about the meaning of responses involving complex concepts. The next question is an open-ended version of the earlier fixed-choice question: How would you say alcohol use on this campus compares to that on other campuses?

Making Observations

Observations can be used to measure characteristics of individuals, events, and places. The observations may be the primary form of measurement in a study or they may supplement measures obtained through questioning. Reiss (1971) developed a careful method of observing phenomena that he termed systematic social observation (SSO). In his classic study of police interaction with the public, Reiss's SSO method involved riding in police squad cars, observing police-citizen interactions, and recording features of these interactions on a form.

Sampson and Raudenbush (1999) and St. Jean (2007) used direct observation (and other techniques) in their studies of neighborhood disorder and crime in addition to SSO. Teams drove in "a sport utility vehicle at a rate of 5 miles per hour down every street" in a sample of Chicago neighborhoods. On both sides of the vehicle, video cameras recorded activities, while a trained observer completed a log for each block. Sampson and Raudenbush's classic research (1999) resulted in 23,816 observer logs containing information about building conditions and land use, while the videotapes were coded to measure features of streets, buildings, businesses, and social interaction on 15,141 blocks. Direct observation is often the method of choice for measuring behavior in natural settings, as long as it is possible to make the requisite observations.

Collecting Unobtrusive Measures

Unobtrusive measures allow us to collect data about individuals or groups without their direct knowledge or participation. In their classic book (now revised), Webb et al. (1966 [2000]) identified four types of unobtrusive measures: physical trace evidence, archives (available data), simple observation, and contrived observation (using hidden recording hardware or manipulation to elicit a response). Let us consider the first two types in more detail: physical trace evidence and archives.

Unobtrusive measures: Measures that allow researchers to collect data about individuals or groups without their direct knowledge or participation.


Ryan Charles Meldrum, PhD, Assistant Professor of Criminal Justice, Florida International University

Source: Courtesy of Ryan Charles Meldrum

Ryan Meldrum's research focuses on the causes of juvenile delinquency. His path to becoming a delinquency researcher began when, out of sheer curiosity, he took a class on juvenile delinquency as an undergraduate student at Oregon State University. Having grown up in a small farming town in rural Oregon, Meldrum became interested in understanding why some teenagers would engage in delinquent and criminal behavior. His interest in making a career out of studying this topic was solidified during his graduate studies at Florida State University.

One of Meldrum's main areas of research concerns the measurement and operationalization of associating with delinquent peers. Traditionally, this construct is measured by having survey respondents report on the delinquent behavior of their friends. However, because of concerns over the accuracy of such reports, researchers are increasingly making use of reports of peer delinquency based on social-networking measurement strategies. With this measurement strategy, respondents are first asked to report who their friends are, and then researchers obtain self-reports of delinquency from those friends directly, bypassing concerns that someone might inaccurately recall and report on the behavior of their friends. Meldrum's research, and that of others exploring this topic, demonstrates how the empirical significance of peer delinquency for understanding individual involvement in delinquency may have been exaggerated in past studies relying solely on respondent perceptions of peer behavior.

His advice for students interested in a similar career is this:

Think like a researcher. One of the most challenging things for me early on was transitioning from a student who was responsible for consuming knowledge to a professor who was expected to produce new knowledge. What worked for me? I started to listen to news stories and read research with an eye toward thinking of new questions that needed to be answered or how a study could be conducted on the topic I was learning about. One day during my third year in graduate school, the light bulb went off in my head, the floodgates of ideas opened, and I have been working ever since to answer questions related to the causes of juvenile delinquency.


The physical traces of past behavior are one type of unobtrusive measure that is most useful when the behavior of interest cannot be directly observed (perhaps because it is hidden or occurred in the past) and has not been recorded in a source of available data. To measure the prevalence of drinking in college dorms or fraternity houses, we might count the number of empty bottles of alcoholic beverages in the surrounding Dumpsters. However, you can probably see that care must be taken to develop trace measures that are useful for comparative purposes. For instance, comparison of the number of empty bottles in Dumpsters outside different dorms can be misleading; at the very least, you would need to take into account the number of residents in the dorms, the time since the last trash collection, and the accessibility of each Dumpster to passersby.
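One way to see why raw bottle counts can mislead is to convert them to a rate. The short Python sketch below is only an illustration: the function name, the choice of "bottles per resident per day" as the rate, and the dorm figures are all hypothetical, and the sketch does not attempt to adjust for how accessible each Dumpster is to passersby.

```python
def bottles_per_resident_day(bottle_count, residents, days_since_collection):
    """Return empty bottles per resident per day of accumulation.

    Hypothetical helper: one simple way to make trace counts comparable
    across dorms of different sizes and trash-collection schedules.
    """
    if residents <= 0 or days_since_collection <= 0:
        raise ValueError("residents and days_since_collection must be positive")
    return bottle_count / (residents * days_since_collection)


# Illustrative numbers only; these are not real observations.
dorm_a = bottles_per_resident_day(bottle_count=120, residents=300, days_since_collection=4)
dorm_b = bottles_per_resident_day(bottle_count=60, residents=50, days_since_collection=3)

print(f"Dorm A: {dorm_a:.2f} bottles per resident per day")
print(f"Dorm B: {dorm_b:.2f} bottles per resident per day")
```

In this made-up case, Dorm A has twice as many bottles in its Dumpster, yet Dorm B has the higher rate once the number of residents and the days since collection are taken into account.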

Unobtrusive measures can also be created from such diverse forms of media as newspaper archives or magazine articles, TV or radio talk shows, legal opinions, historical documents, personal letters, or e-mail messages. An investigation of the drinking climate on campuses might include a count of the amount of space devoted to ads for alcoholic beverages in a sample of issues of the student newspaper. Campus publications also might be coded to indicate the number of times that statements discouraging substance abuse appear. With this tool, you could measure the frequency of articles reporting substance abuse-related crimes, the degree of approval of drinking expressed in TV shows or songs, or the relationship between region of the country and amount of space devoted in the print media to alcohol consumption.

Combining Measurement Operations

Using available data, asking questions, making observations, and using unobtrusive indicators are interrelated measurement tools, each of which may include or be supplemented by the others. From people's answers to survey questions, the U.S. Bureau of the Census develops widely consulted reports containing data on people, firms, and geographic units in the United States. Data from employee surveys may be supplemented by information available in company records. Interviewers may record observations about those whom they question. Researchers may use insights gleaned from questioning participants to make sense of the social interaction they have observed. Unobtrusive indicators can be used to evaluate the honesty of survey responses.

Questioning can be a particularly poor approach for measuring behaviors that are very socially desirable, such as voting or attending church, or those that are socially stigmatized or illegal, such as abusing alcohol or drugs. Triangulation, as we saw in Chapter 1, can strengthen measurement considerably (Brewer and Hunter 1989). When we achieve similar results with different measures of the same variable, particularly when they are based on such different methods as survey questions and field-based observations, we can be more confident in the validity of each measure. If results diverge with different measures, it may indicate that one or more of these measures are influenced by more measurement error than we can tolerate. Divergence between measures could also indicate that they actually operationalize different concepts. An interesting example of this interpretation of divergent results comes from research on crime. Official crime statistics indicate only those crimes that are reported to and recorded by the police; when surveys are used to measure crimes with self-reports of victims, many more victimizations are uncovered that were not reported to police. We will talk more about triangulation in Chapter 13.

CASE STUDY

Measuring Inmate Misconduct

As we already highlighted in Chapter 1, it is possible to measure offending in several different ways, including with official arrest data, victimization surveys, and self-report offending data from surveys. There are different types of measurement error associated with each type of measurement tool, but generally, official data tend to yield lower estimates than survey data. What if we wanted to measure offending behavior inside correctional facilities? This is an important question, because one indicator of the safety of a prison or jail is the level of inmate offending behavior, generally termed inmate misconduct.

Similar to detection of crime in the general population, detection of crime in a correctional facility is largely influenced by the willingness of victims or witnesses to report the events to authorities. For incidents that are reported, relevant datasets are also influenced by whether an incident makes it into the official record. To determine the convergence between official incident records of inmate misconduct and self-reported offending, Steiner and Wooldredge (2014) collected survey data from inmates as well as official records for the same inmates in correctional facilities in Ohio and Kentucky. They collected data for two groups of inmates: those who had previously served time and those who had not. To be eligible to participate in the survey, however, respondents had to have been in confinement for six months or longer, because the survey asked about misconduct that they engaged in during the past six months. Over 5,600 inmates completed the survey.

To operationalize whether inmates had committed an assault, they were asked whether they had "physically assaulted another inmate for reasons other than because he tried to hurt you first," or "you stabbed another inmate for reasons other than because he tried to hurt you first" (Steiner and Wooldredge 2014, 1083). Official measures of assault included these behaviors as well as attempted assaults. The authors stated, "Differences in the operational definitions of the types of offenses should be kept in mind when interpreting the findings" (1084). Despite official records including attempted as well as completed assaults, results


[Bar chart for Exhibit 4.4: prevalence (vertical axis from 0 to 0.05) of Assault, Drug, and Property misconduct, with separate bars for Official and Self-Reported measures.]

Source: Adapted from Steiner and Wooldredge (2014, 1083).

Level of measurement: The complexity of the mathematical means that can be used to express the relationship between a variable's values. The nominal level of measurement, which is qualitative, has no mathematical interpretation; the quantitative levels of measurement (ordinal, interval, and ratio) are progressively more complex mathematically.

Nominal level of measurement: Variables whose values have no mathematical interpretation; they vary in kind or quality but not in amount.

indicated that incidence of self-reported offending behavior was 80% higher than official rates of inmate-perpetrated assault. Steiner and Wooldredge also found this to be the case for drug-related offenses, but not for theft-related offenses. Exhibit 4.4 summarizes their findings. As you see, how we operationalize concepts affects our findings. There are many ways we can operationalize constructs in research. We will highlight a few of them next.

VARIABLES AND LEVELS OF MEASUREMENT

Whether we collect information through observations, questions, available data, or the use of unobtrusive measures, the data that result from our particular procedures may vary in mathematical precision. We express this level of precision as the variable's level of measurement. A variable's level of measurement also has important implications for the types of statistics that can be used with the variable, as you will learn in Chapter 14. There are four levels of measurement: nominal, ordinal, interval, and ratio. Exhibit 4.5 depicts the differences among these four levels.

Nominal Level of Measurement

The nominal level of measurement (also called the categorical or qualitative level) identifies variables whose values have no mathematical interpretation; they vary only in kind or quality but not in amount. In fact, it is conventional to refer to the values of nominal variables as attributes instead of values. Gender is one example. The variable gender has two attributes (or categories or qualities): male and female. We might indicate male with the value 1 and female with the value 2, but these numbers do not tell us anything about the difference between male and female except that they are different. Female is not one more unit of gender than male, nor is it twice as much gender. Ethnicity, occupation, religious affiliation, and region of the country are also measured at the nominal level. A person may be Spanish or Portuguese, but one ethnic group does not represent more ethnicity than another, just a different ethnicity. A person may be a doctor (arbitrarily valued as 5) or a truck driver (arbitrarily valued as 7), but one does not represent two units more occupation than the other. The values assigned to nominal variables should be thought of as codes, not numbers.

Exhibit 4.4 Comparisons of the Prevalence of Self-Reported and Official Measures of Inmate Misconduct

[Exhibit 4.5 panel labels: Nominal or categorical level of measurement: Nationality (American, Canadian, British). Ordinal level of measurement: Level of conflict (Medium). Interval level of measurement: Temperature in degrees Fahrenheit. Ratio level of measurement: Group size.]

Although the attributes of categorical variables do not have a mathematical meaning, they must be assigned to cases with great care. The attributes we use to measure (categorize) cases must be mutually exclusive and exhaustive:

• A variable's attributes or values are mutually exclusive attributes if every case can have only one attribute.

• A variable's attributes or values are exhaustive attributes when every case can be classified into one of the categories.

When a variable's attributes are mutually exclusive and exhaustive, every case corresponds to one, and only one, attribute. Imagine the challenge of coming up with an exhaustive set of attributes when a variable with a large number of attributes is being studied.
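The two requirements above can be checked mechanically once cases have been coded. The following Python sketch is a minimal illustration under assumed names: `check_coding`, the region attributes, and the four cases are hypothetical and are not drawn from any study discussed here.

```python
def check_coding(assignments, attributes):
    """Flag cases that violate the two requirements for a coding scheme.

    assignments: dict mapping a case ID to the list of attributes a coder
                 assigned to that case (hypothetical data structure).
    attributes:  the set of attributes the variable is allowed to take.
    """
    unclassified = []  # no attribute fits: the scheme is not exhaustive
    multiple = []      # more than one attribute fits: not mutually exclusive
    for case_id, assigned in assignments.items():
        valid = [a for a in assigned if a in attributes]
        if len(valid) == 0:
            unclassified.append(case_id)
        elif len(valid) > 1:
            multiple.append(case_id)
    return unclassified, multiple


# Illustrative nominal variable: region of the country.
attributes = {"Northeast", "Midwest", "South", "West"}
assignments = {
    "case1": ["South"],
    "case2": ["Midwest", "West"],  # coded into two categories
    "case3": [],                   # no category applied
    "case4": ["Northeast"],
}

unclassified, multiple = check_coding(assignments, attributes)
print("Not classified (scheme not exhaustive):", unclassified)
print("Multiply classified (not mutually exclusive):", multiple)
```

When both lists come back empty, every case corresponds to one, and only one, attribute.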

Mutually exclusive attributes: A variable's attributes or values are mutually exclusive if every case can have only one attribute.

Exhaustive attributes: A variable's attributes or values in which every case can be classified as having one attribute.


Exhibit 4.5 Levels of Measurement

Ordinal level of measurement: A measurement of a variable in which the numbers indicating a variable's values specify only the order of the cases, permitting greater than and less than distinctions.

Ordinal Level of Measurement

The first of the three quantitative levels is the ordinal level of measurement. At this level, the numbers assigned to cases specify only the order of the cases, permitting greater than and less than distinctions; absolute mathematical distinctions cannot be made between categories.

The properties of variables measured at the ordinal level are illustrated in Exhibit 4.5 by the contrast between the levels of conflict in three groups. The first group, symbolized by two people shaking hands, has a low level of conflict. The second group, symbolized by two persons using fists against each other, has a higher level of conflict. The third group, symbolized by two people pointing guns at each other, has an even higher level of conflict. To measure conflict, we would put the groups in order by assigning 1 to the low-conflict group, 2 to the group using fists, and 3 to the high-conflict group using guns. The numbers thus indicate only the relative position or order of the cases. Although low level of conflict is represented by the number 1, it is not mathematically two fewer units of conflict than the high level of conflict, which is represented by the number 3. These numbers really have no mathematical qualities; they are only used to represent relative rank in the measurement of conflict.

The Favorable Attitudes Toward Antisocial Behavior Scale measures attitudes toward antisocial behavior among high school students with a series of questions that involves an ordinal distinction (see Exhibit 4.6). The response choices for each question range from "very wrong" to "not wrong at all"; there's no particular quantity of "wrongness" that these distinctions reflect, but the idea is that a student who responds that it is "not wrong at all" to a question about taking a handgun to school has a more favorable attitude toward antisocial behavior than does a student who says it is "a little bit wrong," which is in turn more favorable than those who respond "wrong" or "very wrong."

Exhibit 4.6 Example of Ordinal Measures: Favorable Attitudes Toward Antisocial Behavior Scale

1. How wrong do you think it is for someone your age to take a handgun to school?

2. How wrong do you think it is for someone your age to steal anything worth more than $5?

4. How wrong do you think it is for someone your age to attack someone with the idea of seriously hurting them?

5. How wrong do you think it is for someone your age to stay away from school all day when their parents think they are at school?

Response choices for each question: Very wrong, Wrong, A little bit wrong, Not wrong at all

Sources: Lewis, Chandra, Gwen Hyatt, Keith Lafortune, and Jennifer Lembach. 2010. History of the Use of Risk and Protective Factors in Washington State's Healthy Youth Survey. Portland, OR: RMC Research Corporation. See also Arthur, Michael W., John S. Briney, J. David Hawkins, Robert D. Abbott, Blair L. Brooke-Weiss, and Richard F. Catalano. 2007. "Measuring Risk and Protection in Communities Using the Communities That Care Youth Survey." Evaluation and Program Planning 30:197-211.

As with nominal variables, the different values of a variable measured at the ordinal level must be mutually exclusive and exhaustive. They must cover the range of observed values and allow each case to be assigned no more than one value. Often, questions that use an ordinal level of measurement simply ask respondents to rate their response to some question or statement along a continuum that represents, for example, strength of agreement, level of importance, or relative frequency. Similar to variables measured at the nominal level, variables measured at the ordinal level in this way classify cases in discrete categories and so are termed discrete measures.

A series of similar questions may be used instead of one question to measure the same concept. The set of questions in the Favorable Attitudes Toward Antisocial Behavior Scale shown in Exhibit 4.6 is a good example. In such a multi-item index, or scale, numbers are assigned to reflect the order of the responses (such as 1 for very wrong, 2 for wrong, 3 for a little bit wrong, and 4 for not wrong at all); these responses are then summed or averaged to create the index score. One person's responses to the five questions in Exhibit 4.6 could thus range from 5 (meaning they said each behavior is very wrong) to 20 (meaning they said each behavior is not wrong at all). However, even though these are numeric scores, they still reflect an ordinal level of measurement, because the responses they are based on involve only ordinal distinctions.
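The arithmetic behind such an index is simple enough to sketch in a few lines of Python. The codes below follow the 1-to-4 scheme given in the text; the function name `index_score` and the sample respondent are hypothetical and are included only for illustration.

```python
# Codes follow the scheme described in the text (1 = very wrong ... 4 = not wrong at all).
RESPONSE_CODES = {
    "very wrong": 1,
    "wrong": 2,
    "a little bit wrong": 3,
    "not wrong at all": 4,
}


def index_score(responses, average=False):
    """Sum (or average) the ordinal codes for one respondent's answers."""
    codes = [RESPONSE_CODES[r] for r in responses]
    total = sum(codes)
    return total / len(codes) if average else total


# Hypothetical respondent answering the five questions.
respondent = ["very wrong", "wrong", "very wrong", "very wrong", "a little bit wrong"]

score = index_score(respondent)                   # 1 + 2 + 1 + 1 + 3
mean_score = index_score(respondent, average=True)

print("Summed index:", score)
print("Averaged index:", mean_score)
print("Lowest possible sum:", index_score(["very wrong"] * 5))
print("Highest possible sum:", index_score(["not wrong at all"] * 5))
```

The lowest and highest possible sums reproduce the 5-to-20 range described above.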

Interval Level of Measurement

The numbers indicating the values of a variable at the interval level of measurement represent fixed measurement units (i.e., the change between each value/unit is equal and incremental), but there is no absolute or fixed zero point. This level of measurement is represented in Exhibit 4.5 by the difference between two Fahrenheit temperatures. Although 60 degrees is 30 degrees hotter than 30 degrees, 60 in this case is not twice as hot as 30. Why not? Because heat does not begin at 0 degrees on the Fahrenheit scale.

An interval-level measure is created by a scale that has fixed measurement units but no absolute or fixed zero point. The numbers can therefore be added and subtracted, but ratios are not meaningful. Again, the values must be mutually exclusive and exhaustive.
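A brief Python sketch can make the Fahrenheit example concrete. It re-expresses the same two temperatures on the Kelvin scale, which is not discussed in this chapter but does begin at an absolute zero, and compares ratios and differences. The function and variable names are illustrative assumptions.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins (a scale with an absolute zero)."""
    return fahrenheit_to_celsius(f) + 273.15


low_f, high_f = 30.0, 60.0

# The ratio of the Fahrenheit numbers suggests "twice as hot"...
ratio_f = high_f / low_f
# ...but the same two temperatures on a scale with a true zero are much closer.
ratio_k = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)

# Differences, by contrast, convert consistently: 30 Fahrenheit degrees
# always corresponds to the same number of Celsius degrees.
diff_f = high_f - low_f
diff_c = fahrenheit_to_celsius(high_f) - fahrenheit_to_celsius(low_f)

print(f"Ratio of Fahrenheit values: {ratio_f:.2f}")
print(f"Ratio of Kelvin values:     {ratio_k:.2f}")
print(f"Difference: {diff_f:.1f} F = {diff_c:.2f} C")
```

Because the ratio changes when the zero point changes while the difference does not, only addition and subtraction are meaningful for the Fahrenheit values.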

Social scientists often treat indexes that were created by combining responses to a series of variables measured at the ordinal level as interval-level measures. Another example of an index such as this could be created with responses to the Core Institute's (2015) questions about friends' disapproval of substance use (see Exhibit 4.7). The survey has 13 questions on the topic, each of which has the same three response choices. If Do Not Disapprove is valued at 1, Disapprove is valued at 2, and Strongly Disapprove is valued at 3, the summed index of disapproval would range from 13 to 39. The average could then be treated as a fixed unit of measurement. So, a score of 20 could be treated as if it were 4 more units than a score of 16.

Ratio Level of Measurement

The numbers indicating the values of a variable at the ratio level of measurement represent fixed measuring units relative to an absolute zero point. (Zero means absolutely no amount of whatever the variable measures or represents.) For example, the following question was used on the National Minority SA/HIV Prevention Initiative Youth Questionnaire to measure the number of days during the past 30 days that the respondent drank at least one alcoholic beverage. We can easily calculate the number of days that separate any response from any other response (except for the missing value of don't know).

During the past 30 days, on how many days did you drink one or more drinks of an alcoholic beverage?

Discrete measure: A measure that classifies cases in distinct categories.

Index: The sum or average of responses to a set of questions about a concept.

Interval level of measurement: A measurement of a variable in which the numbers indicating a variable's values represent fixed measurement units but have no absolute, or fixed, zero point.

Ratio level of measurement: A measurement of a variable in which the numbers indicating a variable's values represent fixed measuring units, and there is an absolute zero point.

Exhibit 4.7 Ordinal-Level Variables Can Be Added to Create an Index With Interval-Level Properties: Core Alcohol and Drug Survey

Source: Core Institute 2015.

Exhibit 4.5 displays another example of a variable measured at the ratio level. The number of people in the first group is 5, and the number in the second group is 7. The ratio of the two groups' sizes is then 1.4, a number that mirrors the relationship between the sizes of the groups. Note that there does not actually have to be any group with a size of 0; what is important is that the numbering scheme begins at an absolute zero: in this case, the absence of any people. The number of days a convicted felon was sentenced to prison would represent a ratio level of measurement, because sentence length begins with an absolute zero point. The number of days an addict stays clean after treatment, too, has a ratio level of measurement.

For most statistical analyses in social science research, the interval and ratio levels of measurement can be treated as equivalent. In addition to having numerical values, both the interval and ratio levels also involve continuous measures: The numbers indicating the values of variables are points on a continuum, not discrete categories. But despite these similarities, there is an important difference between variables measured at the interval and ratio levels. On a ratio scale, 10 is 2 points higher than 8 and is also 2 times greater than 5; the numbers can be compared in a ratio. Ratio numbers can be added and subtracted, and because the numbers begin at an absolute zero point, they can be multiplied and divided (so ratios can be formed between the numbers). For example, people's ages can be represented by values ranging from 0 years (or some fraction of a year) to 120 or more. A person who is 30 years old is 15 years older than someone who is 15 years old (30 - 15 = 15) and is twice as old as that person (30 / 15 = 2). Of course, the numbers also are mutually exclusive and exhaustive, so that every case can be assigned one and only one value.

The Case of Dichotomies

Dichotomies, variables having only two values, are a special case from the standpoint of levels of measurement. Although variables with only two categories are generally thought of as nominally measured, we can also think of a dichotomy as indicating the presence or absence of an attribute. Suppose, for example, we were interested in differences between individuals who had not used illegal drugs in the last year and those who had used at least one illegal drug in the last year. We could create a variable that indicated this dichotomous distinction by coding those individuals who said they did not use any of the substances listed as 0 and all others as 1. Viewed in this way, there is an inherent order to the two values: In one group, the attribute of consuming illegal substances is absent (those coded 0), and in another, it is present (those coded 1).
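The 0/1 coding just described can be sketched as follows. The respondents and substance names are invented for illustration, and `code_drug_use` is a hypothetical helper. One convenient consequence of this coding, noted here as a general arithmetic fact rather than a claim from the text, is that the mean of the codes equals the proportion of cases in which the attribute is present.

```python
def code_drug_use(substances_used):
    """Code a respondent as 0 (attribute absent) or 1 (attribute present)."""
    return 0 if len(substances_used) == 0 else 1


# Hypothetical answers: each list holds the substances a respondent reported
# using in the last year (an empty list means none were reported).
respondents = [
    [],
    ["marijuana"],
    [],
    ["cocaine", "marijuana"],
    [],
]

codes = [code_drug_use(r) for r in respondents]

# With 0/1 coding, the mean is the proportion coded 1.
proportion_present = sum(codes) / len(codes)

print("Codes:", codes)
print("Proportion reporting any use:", proportion_present)
```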

Comparing Levels of Measurement

Exhibit 4.8 summarizes the types of comparisons that can be made with different levels of measurement, as well as the mathematical operations that are legitimate for each one. All four levels of measurement allow researchers to assign different values to different cases. All three quantitative measures allow researchers to rank cases in order.

An important thing to remember is that researchers choose levels of measurement in the process of operationalizing the variables; the level of measurement is not inherent in the variable itself. Many variables can be measured at different levels, with different procedures. For example, the Core Alcohol and Drug Survey (Core Institute 2015) identifies binge drinking by asking students, "Think back over the last two weeks. How many times have you had five or more drinks at a sitting?" You might be ready to classify this as a ratio-level measure, but you must first examine the fixed response options given to respondents.

Continuous measure: A measure with numbers indicating the values of variables as points on a continuum.

Dichotomy: A variable having only two values.


This is a closed-ended question, and students are asked to indicate their answer by checking None, Once, Twice, 3 to 5 times, 6 to 9 times, or 10 or more times. Use of these categories makes the level of measurement ordinal. The distance between any two cases cannot be clearly determined. A student with a response in the "6 to 9 times" category could have binged only one more time than a student who responded in the "3 to 5 times" category or he or she could have binged four more times. With these response categories, you cannot mathematically distinguish the number of times a student binged, only the relative amount of binging behavior.
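The loss of information can be illustrated by mapping exact counts onto the survey's response categories. In the Python sketch below, the category labels come from the question described above, while the function name and the example counts are hypothetical.

```python
def binge_category(times):
    """Map an exact count of binge episodes to the survey's response category."""
    if times < 0:
        raise ValueError("times must be zero or greater")
    if times == 0:
        return "None"
    if times == 1:
        return "Once"
    if times == 2:
        return "Twice"
    if times <= 5:
        return "3 to 5 times"
    if times <= 9:
        return "6 to 9 times"
    return "10 or more times"


# Two hypothetical pairs of students. Each pair lands in the same two
# adjacent categories, yet the true gap between the students differs.
pair_close = (5, 6)  # actual difference of 1 episode
pair_far = (3, 9)    # actual difference of 6 episodes

for low, high in (pair_close, pair_far):
    print(f"{low} -> {binge_category(low)!r}, {high} -> {binge_category(high)!r}, "
          f"true difference = {high - low}")
```

Once only the categories are recorded, the two pairs are indistinguishable, which is why the measure supports rank comparisons but not exact distances.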

It is usually a good idea to try to measure variables at the highest level of measurement possible. The more information available, the more ways we have to compare cases. We also have more possibilities for statistical analysis with quantitative variables than with qualitative variables. Thus, if doing so does not distort the meaning of the concept that is to be measured, measure at the highest level possible. For example, even if your primary concern is only to compare teenagers with young adults, measure age in years rather than in categories; you can always combine the ages later into categories corresponding to teenager and young adult.

Be aware, however, that other considerations may preclude measurement at a high level. For example, many people are very reluctant to report their exact incomes, even in anonymous questionnaires. So, asking respondents to report their income in categories (such as under $10,000; $10,000-$19,999; $20,000-$29,999; etc.) will result in more responses, and thus more valid data, than asking respondents for their income in dollars.

Often, researchers treat variables measured at the interval and ratio levels as comparable. They then refer to this as the interval-ratio level of measurement. You will learn in Chapter 14 that different statistical procedures are used for variables with fixed measurement units, but it usually doesn't matter whether there is an absolute zero point.

DID WE MEASURE WHAT WE WANTED TO MEASURE?

Do the operations developed to measure our concepts actually do so? Are they valid? If we have weighed our measurement options, carefully constructed our questions and observational procedures, and carefully selected indicators from the available data, we should be on the right track. Even so, we cannot have much confidence in a measure until we have empirically evaluated its validity. We must also evaluate its reliability (consistency).

Interval-ratio level of measurement: A measurement of a variable in which the numbers indicating the variable's values represent fixed measurement units, but there may be no absolute or fixed zero point.


Exhibit 4.8 Properties of Measurement Levels

Measurement Validity

In Chapter 2, you learned that measurement validity refers to the extent to which measures indicate what they are intended to measure. We want to discuss it in more detail here, along with the ways validity can be assessed.

We briefly discussed the difference between official police reports and survey data in Chapter 1. We noted that official reports underestimate the actual amount of offending because a great deal of offending behavior never comes to the attention of police (Mosher et al. 2002). There is also evidence that arrest data often reflect the political climate and police policies as much as they do criminal activity. For example, let's suppose we wanted to examine whether illicit drug use was increasing or decreasing since the advent of the United States' "War on Drugs," which heated up in the 1980s and is still being fought today. During this time, arrest rates for drug offenses soared, giving the illusion that drug use was increasing at an epidemic pace. However, self-report surveys that asked citizens directly about their drug use during this time period found that use of most illicit drugs was actually declining or had stayed the same (Regoli and Hewitt 1994). In your opinion, then, which measure of drug use, arrest data from the Uniform Crime Reports (UCR) or self-report surveys, was more valid?

The extent to which measures indicate what they are intended to measure can be assessed with one or more of four basic approaches: face validation, content validation, criterion validation, and construct validation. Whatever the approach to validation, no one measure will be valid for all times and places. For example, the validity of self-report measures of substance abuse varies with such factors as whether the respondents are sober or intoxicated at the time of the interview, whether the measure refers to recent or lifetime abuse, and whether the respondents see their responses as affecting their chances at receiving housing, treatment, or some other desired outcome (Babor, Stephens, and Marlatt 1987). In addition, persons with severe mental illness are, in general, less likely to respond accurately (Corse, Hirschinger, and Zanis 1995). These types of possibilities should always be considered when evaluating measurement validity.

Face Validity

Researchers apply the term face validity to the confidence gained from careful inspection of a concept to see if it is appropriate "on its face"; that is, whether it appears to measure what it intends to measure. For example, if college students' alcohol consumption is what we are trying to measure, asking for students' favorite color seems unlikely on its face to tell us much about their drinking patterns. A measure with greater face validity would be a count of how many drinks they have consumed in the past week.

Although every measure should be inspected in this way, face validation on its own is not the gold standard of measurement validity. The question "How much beer or wine did you have to drink last week?" may look valid on its face as a measure of frequency of drinking, but people who drink heavily tend to underreport the amount they drink. So the question would be an invalid measure in a study that includes heavy drinkers.

Content Validity

Content validity establishes that the measure covers the full range of the concept's meaning. To determine that range of meaning, the researcher may solicit the opinions of experts and review literature that identifies the different aspects of the concept.

An example of a measure that covers a wide range of meaning is the Michigan Alcoholism Screening Test (MAST). The MAST includes 24 questions representing the following subscales: recognition of alcohol problems by self and others; legal, social, and work problems; help seeking; marital and family difficulties; and liver pathology (Skinner and Sheu 1982). Many experts familiar with the direct consequences of substance abuse agree that these dimensions capture the full range of possibilities. Thus, the MAST is believed to be valid from the standpoint of content validity.

Measurement validity: The type of validity that is achieved when a measure measures what it is presumed to measure.

Face validity: The type of validity that exists when an inspection of the items used to measure a concept suggests that they are appropriate "on their face."

Content validity: The type of validity that establishes that a measure covers the full range of the concept's meaning.

Criterion Validity

Consider the following scenario: When people drink an alcoholic beverage, the alcohol is absorbed into their blood and then gradually metabolized (broken down into other chemicals) in their liver (NIAAA 1997). The alcohol that remains in their blood at any point, unmetabolized, impairs both thinking and behavior (NIAAA 1994). As more alcohol is ingested, cognitive and behavioral consequences multiply. These biological processes can be identified with direct measures of alcohol concentration in the blood, urine, or breath. Questions about alcohol consumption, on the other hand, can be viewed as attempts to measure indirectly what biochemical tests measure directly.

Criterion validity is established when the scores obtained on one measure can be accurately compared to those obtained with a more direct or already validated measure of the same phenomenon (the criterion). A measure of blood-alcohol concentration or a urine test could serve as the criterion for validating a self-report measure of drinking, as long as the questions we ask about drinking refer to the same time period. Observations of substance use by friends or relatives could also, in some circumstances, serve as a criterion for validating self-report substance use measures.

Criterion validation studies of substance abuse measures have yielded inconsistent results. Self-reports of drug use agreed with urinalysis results for about 85% of the drug users who volunteered for a health study in several cities (Weatherby et al. 1994). On the other hand, the posttreatment drinking behavior self-reported by 100 male alcoholics was substantially less than the drinking behavior observed by the alcoholics' friends or relatives (Watson, Tilleskjor, Hoodecheck-Schow, Pucel, and Jacobs 1984). Such inconsistent findings can occur because of differences in the adequacy of measures across settings and populations. This underscores our point that you cannot assume that a measure that was validated in one study is also valid in another setting or with a different population.
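As a rough illustration of how agreement with a criterion might be quantified, the sketch below computes the share of respondents whose self-report matches a urinalysis result. The data and the function name `percent_agreement` are hypothetical and are not taken from Weatherby et al. (1994); simple percent agreement is only one of several ways such a comparison could be summarized.

```python
def percent_agreement(self_report, criterion):
    """Return the share of cases where the self-report matches the criterion."""
    if len(self_report) != len(criterion):
        raise ValueError("both measures must cover the same respondents")
    if not self_report:
        raise ValueError("at least one respondent is required")
    matches = sum(1 for s, c in zip(self_report, criterion) if s == c)
    return matches / len(self_report)


# Hypothetical data for ten respondents:
# True = drug use reported (self-report) or detected (urinalysis).
self_report = [True, True, False, True, False, True, True, False, True, True]
urinalysis = [True, True, False, True, True, True, True, False, True, False]

agreement = percent_agreement(self_report, urinalysis)
print(f"Agreement with criterion: {agreement:.0%}")
```

With these made-up values, 8 of the 10 self-reports match the test result. The comparison is only meaningful if, as the text notes, both measures refer to the same time period.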

An attempt at criterion validation is well worth the effort because it greatly increases confidence that the measure is actually measuring the concept of interest; in essence, criterion validity offers evidence of this. Often, however, no other variable might reasonably be considered a criterion for individual feelings or beliefs or other subjective states. Even with variables for which a reasonable criterion exists, the researcher may not be able to gain access to the criterion, as would be the case with a tax return or employer document as a criterion for self-reported income.

Construct Validity

Measurement validity also can be established by showing that a measure is related to a variety of other measures as specified in a theory. This validation approach, known as construct validity, is commonly used in social research when no clear criterion exists for validation purposes. For example, in one study of the validity of the Addiction Severity Index (ASI), McLellan and his colleagues (1985) compared subject scores on the ASI to a number of indicators that they felt from prior research should be related to substance abuse: medical problems, employment problems, legal problems, family problems, and psychiatric problems. They could not use a criterion validation approach because they did not have a more direct measure of abuse, such as laboratory test scores or observer reports. However, their extensive research on the subject had given them confidence that these sorts of other problems were all related to substance abuse, and thus, their measures seemed to be valid from the standpoint of construct validity. Indeed, the researchers found that individuals with higher ASI ratings tended to have more problems in each of these areas, giving us more confidence in the ASI's validity as a measure.

Criterion validity: The type of validity that is established by comparing the scores obtained on the measure being validated to those obtained with a more direct or already validated measure of the same phenomenon (the criterion).

Construct validity: The type of validity that is established by showing that a measure is related to other measures as specified in a theory.

The distinction between criterion and construct validation is not always clear. Opinions can differ about whether a particular indicator is indeed a criterion for the concept that is to be measured. For example, if you need to validate a question-based measure of sales ability for applicants to a sales position, few would object to using actual sales performance as a criterion. But what if you want to validate a question-based measure of the amount of social support that people receive from their friends? Should you ask people about the social support they have received? Could friends' reports of the amount of support they provided serve as a criterion? Are verbal accounts of the amount of support provided adequate? What about observations of social support that people receive? Even if you could observe people in the act of counseling or otherwise supporting their friends, can an observer be sure that the interaction is indeed supportive? There isn't really a criterion here, just related concepts that could be used in a construct validation strategy. Even biochemical measures of substance abuse are questionable as criteria for validating self-reported substance use. Urine test results can be altered by ingesting certain substances, and blood tests vary in their sensitivity to the presence of drugs over a particular period of time.

What construct and criterion validation have in common is the comparison of scores on one measure to scores on other measures that are predicted to be related. It is not so important that researchers agree that a particular comparison measure is a criterion rather than a related construct. But it is very important to think critically about the quality of the comparison measure and whether it actually represents a different measure of the same phenomenon. For example, it is only a weak indication of measurement validity to find that scores on a new self-report measure of alcohol use are associated with scores on a previously used self-report measure of alcohol use.

Measurement Reliability

Reliability means that a measurement procedure yields consistent scores as long as the phenomenon being measured is not changing. For example, if we gave students a survey with the same questions asking them about their alcohol consumption, the measure would be reliable if the same students gave approximately the same answers six months later (assuming their drinking patterns had not changed much). If a measure is reliable, it is affected less by random error or chance variation than if it is unreliable. Reliability is a prerequisite for measurement validity; we cannot really measure a phenomenon if the measure we are using gives inconsistent results. Unfortunately, because it is usually easier to assess reliability than validity, you are more likely to see an evaluation of measurement reliability in research than an evaluation of measurement validity.

Problems in reliability can occur when inconsistent measurements are obtained after the same phenomenon is measured multiple times, with multiple indicators, or by multiple observers. To assess these different inconsistencies, there are four possible methods: test-retest reliability, interitem reliability, alternate-forms reliability, and intraobserver and interobserver reliability.

Test-Retest Reliability

When researchers measure a phenomenon that does not change between two points separated by an interval of time, the degree to which the two measurements yield comparable, if not identical, values is the test-retest reliability of the measure. If you take a test of your math ability and then retake the test two months later, the test is performing reliably if you receive a similar score both times, presuming that nothing happened during the two months to change your math ability. Of course, if events between the test and the retest have changed the variable being measured, then the difference between the test and retest scores should reflect that change.

Reliability: A measure is reliable when it yields consistent scores or observations of a given phenomenon on different occasions. Reliability is a prerequisite for measurement validity.

Test-retest reliability: A measurement showing that measures of a phenomenon at two points in time are highly correlated if the phenomenon has not changed or have changed only as much as the phenomenon itself.

One example of how test-retest reliability may be assessed is a study by Sobell et al. (1988) of alcohol abusers' past drinking behavior (using the Lifetime Drinking History questionnaire) and life changes (using the Recent Life Changes questionnaire). All 69 subjects in the study were patients in an addiction treatment program. They had not been drinking immediately prior to the interview (as determined by a breath test). The two questionnaires were administered by different interviewers about two or three weeks apart; both times, they asked the subjects to recall events eight years prior to the interviews. Reliability was high: 92% of the subjects reported the same life events both times, and at least 81% of the subjects were classified consistently at both interviews as having had an alcohol problem or not. When asked about their inconsistent answers, subjects reported that in the earlier interview they had simply dated an event incorrectly, misunderstood the question, evaluated the importance of an event differently, or forgotten an event. Answers to past drinking questions were less reliable when they were very specific, apparently because the questions exceeded subjects' capacities to remember accurately.
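One common way to summarize test-retest reliability is the correlation between scores obtained at the two points in time. The minimal sketch below assumes hypothetical weekly drink counts for six respondents measured twice; the data and the helper `pearson_r` are illustrative only and are not drawn from Sobell et al. (1988), whose results are reported above as percentages of consistent answers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when a measure does not vary")
    return covariation / sqrt(variation_x * variation_y)


# Hypothetical drinks-per-week reports from the same six respondents.
time_1 = [2, 5, 0, 8, 3, 6]
time_2 = [3, 5, 1, 7, 3, 6]

r = pearson_r(time_1, time_2)
print(f"Test-retest correlation: {r:.2f}")
```

A correlation close to 1 suggests consistent scores across the two administrations, provided the underlying behavior has not changed in the interval.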

Interitem Reliability (Internal Consistency)

When researchers use multiple items to measure a single concept, they are concerned with interitem reliability (or internal consistency). For example, if we are to have confidence that a set of questions (such as those in Exhibit 4.9) reliably measures attitudes toward violence, the answers to the questions should be highly associated with one another. The stronger the association between the individual items and the more items included, the higher the reliability of the index. Cronbach's alpha is a reliability measure commonly used to measure interitem reliability. Of course, interitem reliability cannot be computed if only one question is used to measure a concept. For this reason, it is much better to use a multi-item index to measure an important concept (Viswanathan 2005).
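The text names Cronbach's alpha without giving its formula. The sketch below uses the standard formula, alpha = k/(k - 1) x (1 - sum of item variances / variance of total scores), where k is the number of items. The responses are hypothetical, coded 0 to 2 in the manner of the three questions in Exhibit 4.9, and the function name `cronbach_alpha` is an assumption made for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    if len(responses) < 2:
        raise ValueError("at least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha cannot be computed with fewer than two items")
    if any(len(r) != k for r in responses):
        raise ValueError("every respondent must answer every item")
    items = list(zip(*responses))  # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(r) for r in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers from six respondents to three items (No=0, Not sure=1, Yes=2).
responses = [
    [0, 0, 1],
    [2, 2, 2],
    [1, 2, 2],
    [0, 1, 1],
    [2, 1, 2],
    [0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Consistent with the text, the function refuses to compute a value when only one item is supplied.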

Alternate-Forms Reliability

Researchers are testing alternate-forms reliability when they compare subjects' answers to slightly different versions of survey questions (Litwin 1995). A researcher may reverse the order of the response choices in an index or modify the question wording in minor ways and then readminister that index to subjects. If the two sets of responses are not too different, alternate-forms reliability is established.

A related test of reliability is the split-halves reliability approach. A survey sample is divided in two by flipping a coin or using some other random assignment method. These two halves of the sample are then administered the two forms of the questions. If the responses of the two halves of the sample are about the same, the measure's reliability is established.
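The random assignment step in the split-halves approach can be sketched as follows. This is a minimal illustration under stated assumptions: the respondent IDs, the seed, the index scores, and the function name `split_halves` are all hypothetical, and comparing mean scores is just one simple way to judge whether the two halves responded "about the same."

```python
import random
from statistics import mean


def split_halves(respondent_ids, seed=0):
    """Randomly assign respondents to two halves of (nearly) equal size."""
    shuffled = list(respondent_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Ten hypothetical respondents, randomly divided into two halves.
half_a, half_b = split_halves(range(1, 11), seed=42)

# Hypothetical index scores collected after each half answered its own form.
scores_form_a = [4, 5, 3, 6, 4]
scores_form_b = [5, 4, 3, 6, 5]

difference = abs(mean(scores_form_a) - mean(scores_form_b))
print(f"Half A: {sorted(half_a)}, Half B: {sorted(half_b)}")
print(f"Difference in mean index score: {difference:.1f}")
```

A small difference between the halves would be read as support for the measure's reliability; how small is small enough is a judgment the researcher has to justify.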

Intraobserver and Interobserver Reliability

When ratings by an observer, rather than ratings by the subjects themselves, are being assessed at two or more points in time, test-retest reliability is termed intraobserver reliability or intrarater reliability. Let's say a researcher observes a grade school cafeteria for signs of bullying behavior on multiple days. If her observations captured the same degree of bullying on every Friday, it can be said that her observations were reliable. When researchers use more than one observer to rate the same persons, events, or places, interobserver reliability is their goal. If observers are using the same instrument to rate the same thing, their ratings should be very similar. In this case, the researcher interested in cafeteria bullying would use more than one observer. If the measurement of bullying is similar across the observers, we can have much more confidence that the ratings reflect the actual degree of bullying behavior.

It is also important to establish an adequate level of intercoder reliability when data, whether observations or interviews, are transferred from their original form into structured codes or simply into a data entry program or spreadsheet. There can be weak links in data processing, so the consistency of coders should be tested.

Interitem reliability: An approach that calculates reliability based on the correlation among multiple items used to measure a single concept.

Cronbach's alpha: A statistic that measures the reliability of items in an index or scale.

Alternate-forms reliability: A procedure for testing the reliability of responses to survey questions in which subjects' answers are compared after the subjects have been asked slightly different versions of the questions or when randomly selected halves of the sample have been administered slightly different versions of the questions.

Split-halves reliability: Reliability achieved when responses to the same questions by two randomly selected halves of a sample are about the same.

Intraobserver reliability: Consistency of ratings by an observer of an unchanging phenomenon at two or more points in time.

Interobserver reliability: When similar measurements are obtained by different observers rating the same persons, events, or places.

Intercoder reliability: When the same codes are entered by different coders who are recording the same data.

Exhibit 4.9 Questions Used in the Violent Defensive Values Index

Would you approve of a man punching a stranger who had hit the man's child after the child accidentally damaged the stranger's car?

No (0)

I don't know or not sure (1)

Yes (2)

Would you approve of a man punching a stranger who was beating up a woman and the man saw it?

No (0)

I don't know or not sure (1)

Yes (2)

Would you approve of a man punching a stranger who had broken into the man's house?

No (0)

I don't know or not sure (1)

Yes (2)

Source: Cao, Adams, and Jensen 1997, 370.

Ways to Improve Reliability and Validity

We must always assess the reliability of a measure if we hope to be able to establish its validity. Remember that a reliable measure is not necessarily a valid measure, as Exhibit 4.10 illustrates. This discrepancy is a common flaw of self-report measures of substance abuse. The multiple questions in self-report indexes of substance abuse are answered by most respondents in a consistent way, so the indexes are reliable. However, a number of respondents will not admit to drinking, even though they drink a lot. Their answers to the questions are consistent, but they are consistently misleading. So, the indexes based on self-report are reliable but invalid. Such indexes are not useful and should be improved or discarded. Unfortunately, many measures are judged to be worthwhile on the basis only of a reliability test.

Exhibit 4.10 The Difference Between Reliability and Validity: Drinking Behavior

Measure: "How much do you drink?"

Subject 1 (Time 1 and Time 2): Measure is reliable and valid.

Subject 2 (Time 1 and Time 2): Measure is reliable but invalid.

The reliability and validity of measures in any study must be tested after the fact to assess the quality of the information obtained. But if it turns out that a measure cannot be considered reliable and valid, little can be done to save the study. Hence, it is supremely important to select measures that are likely to be reliable and valid in the first place. In studies that use interviewers or observers, careful training is often essential to achieving a consistent approach. In most cases, however, the best strategy is to use measures that have been used before and whose reliability and validity have been established in other contexts. Know that the selection of tried and true measures still does not absolve researchers from the responsibility of testing the reliability and validity of the measure in their own studies.

The process of evaluating the reliability and validity of measures about individuals is termed psychometrics. Measures of individuals that range from tests you take in school to personality assessments you complete on the job are advertised as psychometrically valid after multiple studies have demonstrated their reliability and validity. The process of evaluating the reliability and validity of measures about organizations, neighborhoods, or other collective units is termed ecometrics, a method used by Raudenbush and Sampson (1999). For example, Raudenbush and Sampson's ecometric evaluation of their observational measures of Chicago neighborhoods included a test of the consistency of ratings by multiple observers of the same neighborhoods.

Psychometrics: The process of evaluating the reliability and validity of measures about individuals.
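The consistency of ratings by multiple observers, discussed above, is often summarized with an agreement statistic. The sketch below uses Cohen's kappa, a statistic this chapter does not name, which adjusts the observed agreement between two observers for the agreement expected by chance. The cafeteria ratings and the function name `cohens_kappa` are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if each rater assigned categories independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight lunch periods by two cafeteria observers.
observer_1 = ["bullying", "none", "none", "bullying", "none", "bullying", "none", "none"]
observer_2 = ["bullying", "none", "none", "none", "none", "bullying", "none", "bullying"]

kappa = cohens_kappa(observer_1, observer_2)
print(f"Cohen's kappa: {kappa:.2f}")
```

Here the observers agree on 6 of 8 periods (75%), but because much of that agreement could arise by chance, kappa is noticeably lower than the raw percentage.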

It may be possible to improve the reliability and validity of measures in a study that already has been conducted if multiple measures were used. For example, in a study of housing for homeless mentally ill persons, residents' substance abuse was assessed with several different sets of direct questions as well as with reports from subjects' case managers and others (Goldfinger et al. 1996). It was discovered that the observational reports were often inconsistent with self-reports and that different self-report measures were not always in agreement and were thus unreliable. A more reliable measure of substance abuse was initial reports of lifetime substance abuse problems. This measure was extremely accurate in identifying all those who subsequently abused substances during the project.

A COMMENT ON MEASUREMENT IN A DIVERSE SOCIETY

Throughout this chapter, we have communicated how important measurement is for research. Although it is crucial to have evidence of reliability and validity, it is important that such evidence also applies to different subgroups within the population. Often, people of color; women; the poor; the lesbian, gay, bisexual, and transgender (LGBTQ) community; and other groups have not been adequately represented in the development or testing of various measurement instruments (Witkin 2001). Just because a measure appears valid does not mean that you can assume that it validly measures a construct for different subgroups.

It is reasonable to consider whether the concepts we use have universal meaning or differ across cultures or other groups. Hui and Triandis (1985) suggest that four components must be evaluated to determine whether a concept differs across cultures:

1. Conceptual equivalence. The concept must have the same meaning, have similar precursors and consequences, and relate to other concepts in the same way.

2. Operational equivalence. The concept must be evident in the same way so that the operationalization is equivalent.

3. Item equivalence. Items used must have the same meaning to each culture.

4. Scalar equivalence. The values used on a scale mean the same in intensity or magnitude.

Take the concept self-control, which has been linked to delinquency and adult offending (for a review of this construct, see Gottfredson and Hirschi 1990). Researchers have used several different types of variables to measure self-control, sometimes called self-regulation for young children, but the majority of research has simply assumed that these measures validly measure self-control for all individuals. Sulike and his colleagues (2010) wanted to know whether the measure of self-regulation that had been used in many studies of children differentially measured the construct across different socioeconomic, gender, and race/ethnicity groups. The construct they measured was actually called effortful control (EC) and is generally conceptualized as the ability to shift and focus attention as needed and to activate and inhibit behavior (e.g., aggression) as needed, especially when one does not want to. Sulike et al. (2010) state, "Much of the work on EC has been conducted with [a] primarily European-American sample" (11).

Ecometrics: The process of evaluating the reliability and validity of measures about organizations, neighborhoods, or other collective units.

To determine whether their EC measure captured self-regulation similarly across gender and race/ethnic groups, Sulike and his colleagues (2010) tested children from 53 preschools in and around Houston, Texas, and 58 preschools in and around Tallahassee, Florida. Researchers gave over 800 preschoolers different tasks designed to measure different factors related to self-regulation, including a task called "waiting for bow," in which a wrapped gift was placed on the table within the child's reach while the researcher explained that he or she forgot the bow on the gift. Children were asked to stay in their seats and not touch or open the gift until the researcher came back from retrieving the bow in another room. Without elaborating on the complex statistical analysis performed by the researchers, suffice it to say that they did not find any differences in the results of their EC measure across subgroups. They concluded, "This indicates that the construct of EC behaviors in a similar way across groups, and that a wide variety of tasks index a single latent EC construct" but they caution, "it would be useful to examine the measurement invariance of EC across different levels of SES and in a range of cultures, including groups outside the United States" (20).

Similar concerns have been noted for scales measuring depression. For example, Newmann (1987) has argued that gender differences in levels of depressive symptoms may reflect differences in the socialization process of males and females. She suggests that some scales ask questions about items such as crying, being lonely, and feeling sad, which are more likely to be answered in the affirmative by women than by men because men are socialized to not express such feelings. More recent studies have found similar gender differences in response patterns (Cole, Kawachi, Maller, and Berkman 2000; Sigmon et al. 2005). Similarly, Ortega and Richey (1998) note that people of color may respond differently to questions used in depression scales. Some ethnic groups report feelings of sadness or hopelessness as physical complaints and therefore have high scores on these questions but low scores on emotion-related items. Different ethnic groups respond differently to "how do you feel" questions and "what do you think" questions. Ortega and Richey also note that some items in depression scales, such as suicidal ideation, are not meaningful to some ethnic groups. The elderly are more likely to endorse some items that also measure physical changes as opposed to changes brought about by depression (Sharp and Lipsky 2002). In sum, we must think critically about how our conceptualizations and operationalizations may be understood differently across different subgroups of society.

CONCLUSION

Remember always that measurement validity is a necessity for social research. Gathering data without careful conceptualization or conscientious efforts to operationalize key concepts is often a wasted effort.

The difficulties of achieving valid measurement vary with the concept being operationalized and the circumstances of the particular study. The examples in this chapter of difficulties in achieving valid measures of substance abuse should sensitize you to the need for caution, particularly when the concepts you wish to measure are socially stigmatized and/or illegal.

Planning ahead is the key to achieving valid measurement in your own research; careful evaluation is the key to sound decisions about the validity of measures in others' research. Statistical tests can help determine whether a given measure is valid after data have been collected, but if it appears after the fact that a measure is invalid, little can be done to correct the situation. If you cannot tell how key concepts were operationalized when you read a research report, do not trust the findings. And if a researcher does not indicate the results of tests used to establish the reliability and validity of key measures, remain skeptical.
