Sociology Quantitative Method Discussion


What You’ll Learn in This Chapter

Now we conclude the discussion of measurement begun in Chapter 5. Researchers often need to employ multiple indicators to measure a variable adequately and validly. Indexes, scales, and typologies are useful composite measures made up of several indicators of variables. In this chapter you’ll learn the logic and skills of constructing such measures.

Chapter 6: Indexes, Scales, and Typologies

In this chapter . . .

Introduction

Indexes versus Scales

Index Construction
    Item Selection
    Examination of Empirical Relationships
    Index Scoring
    Handling Missing Data
    Index Validation
    The Status of Women: An Illustration of Index Construction

Scale Construction
    Bogardus Social Distance Scale
    Thurstone Scales
    Likert Scaling
    Semantic Differential
    Guttman Scaling

Typologies

INTRODUCTION

As we saw in Chapter 5, many social science concepts have complex and varied meanings. Making measurements that capture such concepts can be a challenge. Recall our discussion of content validity, which concerns whether we’ve captured all the different dimensions of a concept.

To achieve broad coverage of the various dimensions of a concept, we usually need to make multiple observations pertaining to it. Thus, for example, Bruce Berg (1989:21) advises in-depth interviewers to prepare “essential questions,” which are “geared toward eliciting specific, desired information.” In addition, the researcher should prepare extra questions: “questions roughly equivalent to certain essential ones, but worded slightly differently.”

Multiple indicators are used with quantitative data as well. Though you can sometimes construct a single questionnaire item that captures the variable of interest (“Sex: ❑ Male ❑ Female” is a simple example), other variables are less straightforward and may require you to use several questionnaire items to measure them adequately.

What do you think?

Often, data analysis aims at reducing a mass of observations to a more manageable form. Our use of concepts to stand for many similar observations is one example. The trick is to have the reduction represent the original observations well enough to be accurate and useful.

Sometimes this sort of reduction can be accomplished in the analysis of quantitative data. You could, for example, ask people to answer five different questions, reduce each person’s answers to a single number, and then use that number to reproduce that person’s answers. So, if you told me that you had assigned someone a score of 3, I would be able to tell you how he or she answered each of the original five questions.

How in the world could such a little bit of information communicate so much?

See the “What do you think? Revisited” box toward the end of the chapter.

Quantitative data analysts have developed specific techniques for combining indicators into a single measure. This chapter discusses the construction of two types of composite measures of variables: indexes and scales. Although scales and indexes can be used in any form of social research, they are most common in survey research and other quantitative methods. A short section at the end of the chapter considers typologies, which are relevant to both qualitative and quantitative research.

Composite measures are frequently used in quantitative research, for several reasons. First, social scientists often wish to study variables that have no clear and unambiguous single indicators. Single indicators do suffice for some variables, such as age. We can determine a survey respondent’s age simply by asking, “How old are you?” Similarly, we can determine a newspaper’s circulation merely by looking at the figure the newspaper reports. In the case of complex concepts, however, researchers can seldom develop single indicators before they actually do the research. This is especially true with regard to attitudes and orientations. Rarely can a survey researcher, for example, devise single questionnaire items that adequately tap respondents’ degrees of prejudice, religiosity, political orientations, alienation, and the like. More likely, the researcher will devise several items, each of which provides some indication of the variables. Taken individually, each of these items is likely to prove invalid or unreliable for many respondents. A composite measure, however, can overcome this problem.

Second, researchers may wish to employ a rather refined ordinal measure of a particular variable (alienation, say), arranging cases in several ordinal categories from very low to very high, for example. A single data item might not have enough categories to provide the desired range of variation. However, an index or a scale formed from several items can provide the needed range.

Finally, indexes and scales are efficient devices for data analysis. If considering a single data item gives us only a rough indication of a given variable, considering several data items can give us a more comprehensive and more accurate indication. For example, a single newspaper editorial may give us some indication of the political orientation of that newspaper. Examining several editorials would probably give us a better assessment, but the manipulation of several data items simultaneously could be very complicated. Indexes and scales (especially scales) are efficient data-reduction devices: They allow us to summarize several indicators in a single numerical score, while sometimes nearly maintaining the specific details of all the individual indicators.

INDEXES VERSUS SCALES

The terms index and scale are commonly used imprecisely and interchangeably in social research literature. These two types of measures do have some characteristics in common, but in this book we’ll distinguish between them. However, you should be warned of a growing tendency in the literature to use the term scale to refer to both indexes and scales, as they are distinguished here.

First, let’s consider what they have in common. Both scales and indexes are ordinal measures of variables. Both rank-order the units of analysis in terms of specific variables such as religiosity, alienation, socioeconomic status, prejudice, or intellectual sophistication. A person’s score on either a scale or an index of religiosity, for example, indicates his or her relative religiosity vis-à-vis other people.

Further, both scales and indexes are composite measures of variables: measurements based on more than one data item. Thus, a survey respondent’s score on an index or scale of religiosity is determined by the responses given to several questionnaire items, each of which provides some indication of religiosity. Similarly, a person’s IQ score is based on answers to a large number of test questions. The political orientation of a newspaper might be represented by an index or scale score reflecting the newspaper’s editorial policy on various political issues.

Despite these shared characteristics, distinguishing between indexes and scales is useful. In this book we’ll do so through the manner in which scores are assigned. We construct an index simply by accumulating scores assigned to individual indicators. We might measure prejudice, for example, by counting the number of prejudiced statements each respondent agreed with. We construct a scale, however, by assigning scores to patterns of responses, recognizing that some items reflect a relatively weak degree of the variable whereas others reflect something stronger. For example, agreeing that “Women are different from men” is, at best, weak evidence of sexism compared with agreeing that “Women should not be allowed to vote.” A scale takes advantage of differences in intensity among the attributes of the same variable to identify distinct patterns of response.

index: A type of composite measure that summarizes and rank-orders several specific observations and represents some more general dimension.

scale: A type of composite measure composed of several items that have a logical or empirical structure among them. Examples of scales include Bogardus social distance, Guttman, Likert, and Thurstone scales.

Let’s consider this simple example of sexism a bit further. Imagine asking people to agree or disagree with the two statements just presented. Some might agree with both, some might disagree with both. But suppose I told you someone agreed with one and disagreed with the other: Could you guess which statement they agreed with and which they did not? I would guess the person in question agreed that women were different but disagreed that they should be prohibited from voting. On the other hand, I doubt that anyone would want to prohibit women from voting and assert that there is no difference between men and women. That would make no sense.

Now consider this. The two responses we wanted from each person would technically yield four response patterns: agree/agree, agree/disagree, disagree/agree, and disagree/disagree. We’ve just seen, however, that only three of the four patterns make any sense or are likely to occur. Where indexes score people on the basis of their responses, scales score people on the basis of response patterns: We determine what the logical response patterns are and score people in terms of the pattern their responses most closely resemble.

Figure 6-1 illustrates the difference between indexes and scales. Let’s assume we want to develop a measure of political activism, distinguishing those people who are very active in political affairs, those who don’t participate much at all, and those who are somewhere in between.

The first part of Figure 6-1 illustrates the logic of indexes. The figure shows six different political actions. Although you and I might disagree on some specifics, I think we could agree that the six actions represent roughly the same degree of political activism.

Using these six items, we could construct an index of political activism by giving each person 1 point for each of the actions he or she has taken. If you wrote to a public official and signed a petition, you’d get a total of 2 points. If I gave money to a candidate and persuaded someone to change his or her vote, I’d get the same score as you. Using this approach, we’d conclude that you and I had the same degree of political activism, even though we had taken different actions.

The second part of Figure 6-1 describes the logic of scale construction. In this case, the actions clearly represent different degrees of political activism, ranging from simply voting to running for office. Moreover, it seems safe to assume a pattern of actions in this case. For example, all those who contributed money probably also voted. Those who worked on a campaign probably also gave some money and voted. This suggests that most people will fall into only one of five idealized action patterns, represented by the number under each set of boxes in the figure. The discussion of scales, later in this chapter, describes ways of identifying people with the type they most closely represent.

As you might surmise, scales are generally superior to indexes, because scales take into consideration the intensity with which different items reflect the variable being measured. Also, as the example in Figure 6-1 shows, scale scores convey more information than do index scores. Again, be aware that the term scale is commonly misused to refer to measures that are only indexes. Merely calling a given measure a scale instead of an index doesn’t make it better.

There are two other misconceptions about scaling that you should know. First, whether the combination of several data items results in a scale almost always depends on the particular sample of observations under study. Certain items may form a scale within one sample but not within another. For this reason, do not assume that a given set of items is a scale simply because it has turned out that way in an earlier study.

Second, the use of specific scaling techniques (such as Guttman scaling, to be discussed) does not ensure the creation of a scale. Rather, such techniques let us determine whether or not a set of items constitutes a scale.


FIGURE 6-1 Indexes versus Scales. Both indexes and scales seek to measure variables such as political activism. Whereas indexes count the number of indicators of the variable, scales take account of the differing intensities of those indicators.

Index-Construction Logic

Here are several types of political actions people may have taken. By and large, the different actions represent similar degrees of political activism. To create an index of overall political activism, we might give people 1 point for each of the actions they’ve taken.

Actions shown: wrote a letter to a public official; signed a political petition; gave money to a political cause; gave money to a political candidate; wrote a political letter to the editor; persuaded someone to change her or his voting plans.

Scale-Construction Logic

Here are some political actions that represent very different degrees of activism: e.g., running for office represents a higher degree of activism than simply voting does. It seems likely, moreover, that anyone who has taken one of the more demanding actions would have taken all the easier ones as well. To construct a scale of political activism, we might score people according to which of the following “ideal” patterns comes closest to describing them.

Scale score                                  0     1     2     3     4
Ran for office                               No    No    No    No    Yes
Worked on a political campaign               No    No    No    Yes   Yes
Contributed money to a political campaign    No    No    Yes   Yes   Yes
Voted                                        No    Yes   Yes   Yes   Yes
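To make the contrast in Figure 6-1 concrete, here is a minimal Python sketch of the two scoring logics. The action names, the function names, and the tie-breaking rule (an ambiguous set of answers receives the lower score) are assumptions made for illustration; the chapter says only that people are scored by the ideal pattern they most closely resemble.

```python
# Index logic: one point per action taken (actions treated as roughly equal).
# Item names are hypothetical labels for the six actions in Figure 6-1.
INDEX_ITEMS = [
    "wrote_public_official",
    "signed_petition",
    "gave_to_cause",
    "gave_to_candidate",
    "wrote_letter_to_editor",
    "persuaded_voter",
]


def index_score(actions_taken):
    """Count how many of the six index actions a person reports."""
    return sum(1 for item in INDEX_ITEMS if item in actions_taken)


# Scale logic: items ordered from least to most demanding.
SCALE_ITEMS = ["voted", "contributed_money", "worked_on_campaign", "ran_for_office"]

# Ideal pattern for score k: "yes" on the k easiest items, "no" on the rest.
IDEAL_PATTERNS = {
    score: tuple(position < score for position in range(len(SCALE_ITEMS)))
    for score in range(len(SCALE_ITEMS) + 1)
}


def scale_score(responses):
    """Return the score of the ideal pattern closest to the observed answers.

    Ties go to the lower score (an assumption made for this sketch).
    """
    observed = tuple(bool(responses[item]) for item in SCALE_ITEMS)

    def mismatches(score):
        return sum(o != e for o, e in zip(observed, IDEAL_PATTERNS[score]))

    return min(IDEAL_PATTERNS, key=lambda score: (mismatches(score), score))


def answers_implied_by(score):
    """Reproduce the answers implied by a given scale score."""
    return dict(zip(SCALE_ITEMS, IDEAL_PATTERNS[score]))


you = {"wrote_public_official", "signed_petition"}
me = {"gave_to_candidate", "persuaded_voter"}
print(index_score(you), index_score(me))

campaigner = {"voted": True, "contributed_money": True,
              "worked_on_campaign": True, "ran_for_office": False}
print(scale_score(campaigner))
print(answers_implied_by(3))
```

In this sketch a scale score of 3 is enough to reproduce all four answers, whereas an index score of 2 tells us only how many actions someone took, not which ones.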

An examination of actual social science research reports will show that researchers use indexes much more than scales. Ironically, however, the methodological literature contains little if any discussion of index construction, whereas discussions of scale construction abound. There appear to be two reasons for this disparity. First, indexes are more frequently used because scales are often difficult or impossible to construct from the data at hand. Second, methods of index construction seem so obvious and straightforward that they aren’t discussed much.

Constructing indexes is not simple, however. The general failure to develop index-construction techniques has resulted in many bad indexes in social research. With this in mind, I’ve devoted over half of this chapter to the methods of index construction. With a solid understanding of the logic of this activity, you’ll be better equipped to try constructing scales.

INDEX CONSTRUCTION

Let’s look now at four main steps in the construction of an index: selecting possible items, examining their empirical relationships, scoring the index, and validating it. We’ll conclude this discussion by examining the construction of an index that provided interesting findings about the status of women in different countries.

Item Selection

The first step in creating an index is selecting items for a composite index, which is created to measure some variable.

[Photo caption: Composite measures involve the combination of elements to create something new. Sometimes it works, sometimes it doesn’t.]

Face Validity. The first criterion for selecting items to be included in an index is face validity (or logical validity). If you want to measure political conservatism, for example, each of your items should appear on its face to indicate conservatism (or its opposite, liberalism). Political party affiliation would be one such item. Another would be an item asking people to approve or disapprove of the views of a well-known conservative public figure. In constructing an index of religiosity, you might consider items such as attendance of religious services, acceptance of certain religious beliefs, and frequency of prayer; each of these appears to offer some indication of religiosity.

Unidimensionality. The methodological literature on conceptualization and measurement stresses the need for unidimensionality in scale and index construction; that is, a composite measure should represent only one dimension of a concept. Thus, items reflecting religiosity should not be included in a measure of political conservatism, even though the two variables might be empirically related.

General or Specific. Although measures should tap the same dimension, the general dimension you’re attempting to measure may have many nuances. In the example of religiosity, the indicators mentioned previously (ritual participation, belief, and so on) represent different types of religiosity. If you want to focus on ritual participation in religion, you should choose items specifically indicating this type of religiosity: attendance at religious services and other rituals such as confession, bar mitzvah, bowing toward Mecca, and the like. If you want to measure religiosity in a more general way, you should include a balanced set of items, representing each of the different types of religiosity. Ultimately, the nature of the items included will determine how specifically or generally the variable is measured.

Variance. In selecting items for an index, you must also be concerned with the amount of variance they provide. If an item is intended to indicate political conservatism, for example, you should note what proportion of respondents would be identified as conservatives by the item. If a given item identified no one as a conservative or everyone as a conservative (for example, if nobody indicated approval of a radical-right political figure), that item would not be very useful in the construction of an index.

To guarantee variance, you have two options. First, you can select several items that generate responses that divide people about equally in terms of the variable; for example, about half conservative and half liberal. Although no single response would justify characterizing a person as very conservative, a person who responded as a conservative on all items might be so characterized.

The second option is to select items differing in variance. One item might identify about half the subjects as conservative, while another might identify few of the respondents as conservatives. Note that this second option is necessary for scaling, and it’s reasonable for index construction as well.

Examination of Empirical Relationships

The second step in index construction is to examine the empirical relationships among the items being considered for inclusion. (See Chapter 14 for more.) An empirical relationship is established when respondents’ answers to one question (in a questionnaire, for example) help us predict how they will answer other questions. If two items are empirically related to each other, we can reasonably argue that each reflects the same variable, and we can include them both in the same index. There are two types of possible relationships among items: bivariate and multivariate.
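The idea that answers to one item help us predict answers to another can be sketched as a small percentage table. The counts below are invented purely for illustration, and the function name is hypothetical; the sketch simply compares how often respondents say “yes” to item B depending on how they answered item A.

```python
from collections import Counter

# Invented answers to two yes/no items, stored as (item_a, item_b) pairs.
pairs = (
    [("yes", "yes")] * 40
    + [("yes", "no")] * 10
    + [("no", "yes")] * 15
    + [("no", "no")] * 35
)


def percent_yes_on_b(pairs):
    """Percent answering "yes" on item B within each answer to item A."""
    totals = Counter(a for a, _ in pairs)
    yes_on_b = Counter(a for a, b in pairs if b == "yes")
    return {a: 100 * yes_on_b[a] / totals[a] for a in totals}


table = percent_yes_on_b(pairs)
# A large gap between the two percentages suggests the items are related.
difference = table["yes"] - table["no"]
print(table, difference)
```

If the two percentages were about equal, knowing a person’s answer to item A would tell us nothing about item B, and we would have little reason to treat the two items as indicators of the same variable.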

Text not available due to copyright restrictions

Bivariate Relationships among Items. A bivariate relationship is, simply put, a relationship between two variables. Suppose we want to measure respondents’ support for U.S. participation in the United Nations. One indicator of different levels of support might be the question “Do you feel the U.S. financial support of the UN is ❑ Too high ❑ About right ❑ Too low?”

A second indicator of support for the United Nations might be the question “Should the United States contribute military personnel to UN peacekeeping actions? ❑ Strongly approve ❑ Mostly approve ❑ Mostly disapprove ❑ Strongly disapprove.”

Both of these questions, on their face, seem to reflect different degrees of support for the United Nations. Nonetheless, some people might feel the United States should give more money but not provide troops. Others might favor sending troops but cutting back on financial support.

If the two items both reflect degrees of the same thing, however, we should expect responses to the two items to generally correspond with one another. Specifically, those who approve of military support should be more likely to favor financial support than would those who disapprove of military support. Conversely, those who favor financial support should be more likely to favor military support than would those disapproving of financial support. If these expectations are met, we say there is a bivariate relationship between the two items.

Here’s another example. Suppose we want to determine the degree to which respondents feel women have the right to an abortion. We might ask (1) “Do you feel a woman should have the right to an abortion when her pregnancy was the result of rape?” and (2) “Do you feel a woman should have the right to an abortion if continuing her pregnancy would seriously threaten her life?”

Some respondents might agree with item (1) and disagree with item (2); others will do just the reverse. If both items tap into some general opinion people have about the issue of abortion, then the responses to these two items should be related to each other. Those who support the right to an abortion in the case of rape should be more likely to support it if the woman’s life is threatened than would those who disapproved of abortion in the case of rape. This would be another example of a bivariate relationship between the two items.

To determine the relative strengths of relationships among the several pairs of items, you should examine all the possible bivariate relationships among the several items being considered for inclusion in an index. Percentage tables or more-advanced statistical techniques may be used for this purpose. How we evaluate the strength of the relationships, however, can be rather subtle. The box “‘Cause’ and ‘Effect’ Indicators” examines some of these subtleties.

Be wary of items that are not related to one another empirically: It’s unlikely they measure the same variable. You should probably drop any item that’s not related to several other items.

At the same time, a very strong relationship between two items presents a different problem. If two items are perfectly related to each other, then only one needs to be included in the index; because it completely conveys the indications provided by the other, nothing more would be added by including the other item. (This problem will become even clearer in a later section.)

Here’s an example to illustrate the testing of bivariate relationships in index construction. I once conducted a survey of medical school faculty members to find out about the consequences of a “scientific perspective” on the quality of patient care provided by physicians. The primary intent was to determine whether scientifically inclined doctors treated patients more impersonally than did other doctors.

The survey questionnaire offered several possible indicators of respondents’ scientific perspectives. Of those, three items appeared to provide especially clear indications of whether the doctors were scientifically oriented:

1. As a medical school faculty member, in what capacity do you feel you can make your greatest teaching contribution: as a practicing physician or as a medical researcher?

2. As you continue to advance your own medical knowledge, would you say your ultimate medical interests lie primarily in the direction of total patient management or the understanding of basic mechanisms? [The purpose of this item was to distinguish those who were mostly interested in overall patient care from those mostly interested in biological processes.]

3. In the field of therapeutic research, are you generally more interested in articles reporting evaluations of the effectiveness of various treatments or articles exploring the basic rationale underlying the treatments? [Similarly, I wanted to distinguish those more interested in articles dealing with patient care from those more interested in biological processes.] (Babbie 1970:27–31)

For each of these items, we might conclude that those respondents who chose the second answer are more scientifically oriented than respondents who chose the first answer. Though this comparative conclusion is reasonable, we should not be misled into thinking that respondents who chose the second answer to a given item are scientists in any absolute sense. They are simply more scientifically oriented than those who chose the first answer to the item.

To see this point more clearly, let’s examine the distribution of responses to each item. From the first item (greatest teaching contribution), only about one-third of the respondents appeared scientifically oriented. That is, a little over one-third said they could make their greatest teaching contribution as medical researchers. In response to the second item (ultimate medical interests), approximately two-thirds chose the scientific answer, saying they were more interested in learning about basic mechanisms than learning about total patient management. In response to the third item (reading preferences), about 80 percent chose the scientific answer.

These three questionnaire items can’t tell us how many “scientists” there are in the sample, because none of them is related to a set of criteria for what constitutes being a scientist in any absolute sense. Using the items for this purpose would present us with the problem of three quite different estimates of how many scientists there were in the sample.

However, these items do provide us with three independent indicators of respondents’ relative inclinations toward science in medicine. Each item separates respondents into the more scientific and the less scientific. But each grouping of more or less scientific respondents will have a somewhat different membership from the others. Some respondents who seem scientific in terms of one item will not seem scientific in terms of another. Nevertheless, to the extent that each item measures the same general dimension, we should find some correspondence among the several groupings. Respondents who appear scientific in terms of one item should be more likely to appear scientific in their response to another item than do those who appear nonscientific in their response to the first. In other words, we should find an association or correlation between the responses given to two items.

Figure 6-2 shows the associations among the responses to the three items. Three bivariate tables are presented, showing the distribution of responses for each pair of items. An examination of the three bivariate relationships presented in the figure supports the suggestion that the three items all measure the same variable: scientific orientation. To see why this is so, let’s begin by looking at the first bivariate relationship in the table. The table shows that faculty who responded that “researcher” was the role in which they could make their greatest teaching contribution were more likely to identify their ultimate medical interests as “basic mechanisms” (87 percent) than were those who answered “physician” (51 percent). The fact that the “physicians” are about evenly split in their ultimate medical interests is irrelevant for our purposes. It’s only relevant that they are less scientific in their medical interests than are the “researchers.” The strength of this relationship can be summarized as a 36 percentage-point difference.
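The percentage-point comparisons read from Figure 6-2 can be reproduced with a few lines of arithmetic. This is only a sketch: the percentages and column bases are those reported in the figure, while the dictionary layout, the "less"/"more" labels, and the function name are assumptions made for illustration.

```python
# For each pair of items: percent giving the more "scientific" answer on one
# item, among those who gave the less ("less") or more ("more") scientific
# answer on the other item. Percentages are those reported in Figure 6-2.
FIGURE_6_2 = {
    "teaching contribution vs. ultimate interest": {"less": 51, "more": 87},
    "reading preference vs. ultimate interest": {"less": 32, "more": 70},
    "reading preference vs. teaching contribution": {"less": 15, "more": 36},
}


def percentage_point_difference(cell):
    """Strength of a bivariate relationship as a percentage-point gap."""
    return cell["more"] - cell["less"]


differences = {
    name: percentage_point_difference(cell) for name, cell in FIGURE_6_2.items()
}

# Column bases reported in Figure 6-2, used to recover the overall shares.
N_PHYSICIAN, N_RESEARCHER = 268, 159
N_EFFECTIVENESS, N_RATIONALE = 78, 349

share_researcher = round(100 * N_RESEARCHER / (N_PHYSICIAN + N_RESEARCHER), 1)
share_rationale = round(100 * N_RATIONALE / (N_EFFECTIVENESS + N_RATIONALE), 1)

print(differences)
print(share_researcher, share_rationale)
```

The two shares computed at the end correspond to the “a little over one-third” who chose researcher and the “about 80 percent” who preferred articles on the rationale for treatments.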

CHAPTER 6 INDEXES, SCALES, AND TYPOLOGIES176

ultimate medical interests can be summarized as a 38 percentage-point dif erence, and the strength of the relationship between reading pre- ferences and the two teaching contributions as a 21 percentage-point dif erence. In summary, then, each single item produces a dif erent grouping of “scientii c” and “nonscientii c” respondents. However, the responses given to each of the items correspond, to a greater or lesser degree, to the responses given to each of the other items.

Initially, the three items were selected on the basis of face validity—each appeared to give some indication of faculty members’ orientations to science. By examining the bivariate relation- ship between the pairs of items, we have found support for the expectation that they all measure basically the same thing. However, that support does not sui ciently justify including the items in a composite index. Before combining them in a single index, we need to examine the multivari- ate relationships among the several variables.

Multivariate Relationships among Items Where- as a bivariate relationship deals with two variables at a time, a multivariate relationship uses more than two variables. To present the trivariate relation- ships among the three variables in our example, we would i rst categorize the sample medical school respondents into four groups according to (1) their greatest teaching contribution and (2) their reading preferences. Figure 6-3 does just that. h e numbers in parentheses indicate the number of respondents in each group. h us, 66 of the faculty members who said they could best teach as physi- cians also said they preferred articles dealing with the ef ectiveness of treatments. h en, for each of the four groups, we would determine the percent- age of those who say they are ultimately more inter- ested in basic mechanisms. So, for example, of the 66 faculty mentioned, 27 percent are primarily in- terested in basic mechanisms, as the i gure shows.

h e arrangement of the four groups is based on a previously drawn conclusion regarding sci- entii c orientations. h e group in the upper left corner of the table is presumably the least sci- entii cally oriented, based on greatest teaching

FIGURE 6-2 Bivariate Relationships among

Scientifi c Orientation Items. If several indicators

are measures of the same variable, then they should

be empirically correlated with one another, as you can

observe in this case. Those who choose the scientifi c

orientation on one item are more likely to choose the

scientifi c orientation on another items.

Greatest Teaching Contribution

Physician Researcher

49%

51%

13%

87%

100% (268)

100% (159)

Basic mechanisms

Total patient management

U lt

im a te

M e d

ic a l In

te re

s t

a.

Reading Preferences

Effectiveness Rationale

68%

32%

30%

70%

100% (78)

100% (349)

Basic mechanisms

Total patient management

U lt

im a te

M e d

ic a l In

te re

s t

b.

Reading Preferences

Effectiveness Rationale

85%

15%

64%

36%

100% (78)

100% (349)

Researcher

Physician

G re

a te

s t

T e a c h

in g

C o

n tr

ib u

ti o

n

c.

h e same general conclusion applies to the other bivariate relationships. h e strength of the relationship between reading preferences and

INDEX CONSTRUCTION 177

The same is true among those most interested in articles dealing with the rationale for treatments (89 percent minus 58 percent: second row). The original relationship between teaching contribution and ultimate medical interest is essentially the same as in Figure 6-2, even among those respondents judged as scientific or nonscientific in terms of reading preferences.

We can draw the same conclusion from the columns in Figure 6-3. Recall that the original relationship between reading preferences and ultimate medical interests was summarized as a 38 percentage-point difference. Looking only at the “physicians” in Figure 6-3, we see that the relationship between the other two items is now 31 percentage points. The same relationship is found among the “researchers” in the second column.

The importance of these observations becomes clearer when we consider what might have happened. In Figure 6-4, hypothetical data tell a much different story than do the actual data in Figure 6-3. As you can see, Figure 6-4 shows that the original relationship between teaching contribution and ultimate medical interest persists, even when reading preferences are introduced into the picture. In each row of the table, the “researchers” are more likely to express an interest in basic mechanisms than are the “physicians.” Looking down the columns, however, we note that there is no relationship between reading preferences and ultimate medical interest. If we know whether a respondent feels he or she can best teach as a physician or as a researcher, knowing the respondent’s reading preference adds nothing to our evaluation of his or her scientific orientation. If something like Figure 6-4 resulted from the actual data, we would conclude that reading preference should not be included in the same index as teaching contribution, because it contributed nothing to the composite index.

This example used only three questionnaire items. If more were being considered, then more-complex multivariate tables would be in order, constructed of four, five, or more variables. The

contribution and reading preference. The group in the lower right corner is presumably the most scientifically oriented in terms of those items.

Recall that expressing a primary interest in basic mechanisms was also taken as an indication of scientific orientation. As we should expect, then, those in the lower right corner are the most likely to give this response (89 percent), and those in the upper left corner are the least likely (27 percent). The respondents who gave mixed responses in terms of teaching contributions and reading preferences have an intermediate rank in their concern for basic mechanisms (58 percent in both cases).

This figure tells us many things. First, we may note that the original relationships between pairs of items are not significantly affected by the presence of a third item. Recall, for example, that the relationship between teaching contribution and ultimate medical interest was summarized as a 36 percentage-point difference. Looking at Figure 6-3, we see that among only those respondents who are most interested in articles dealing with the effectiveness of treatments, the relationship between teaching contribution and ultimate medical interest is 31 percentage points (58 percent minus 27 percent: first row).

FIGURE 6-3 Trivariate Relationships among Scientific Orientation Items. Indicators of the same variable should be correlated in a multivariate analysis as well as in bivariate analyses. Those who chose the scientific responses on greatest teaching contribution and reading preferences are the most likely to choose the scientific response on the third item.

Percent Interested in Basic Mechanisms

                                 Greatest Teaching Contribution
Reading Preferences              Physician      Researcher
Effectiveness of treatments      27% (66)       58% (12)
Rationale behind treatments      58% (219)      89% (130)
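The row and column comparisons made from Figure 6-3 amount to simple subtraction within each category of the third item. The following is a minimal Python sketch using the percentages from the figure; the dictionary layout and the names `pct_basic`, `row_diffs`, and `col_diffs` are illustrative assumptions, not part of the original study.

```python
# Percent interested in basic mechanisms, taken from Figure 6-3.
# Keys are (reading preference, greatest teaching contribution).
pct_basic = {
    ("effectiveness", "physician"): 27,
    ("effectiveness", "researcher"): 58,
    ("rationale", "physician"): 58,
    ("rationale", "researcher"): 89,
}

# Effect of teaching contribution within each reading-preference row.
row_diffs = {
    reading: pct_basic[(reading, "researcher")] - pct_basic[(reading, "physician")]
    for reading in ("effectiveness", "rationale")
}

# Effect of reading preference within each teaching-contribution column.
col_diffs = {
    teaching: pct_basic[("rationale", teaching)] - pct_basic[("effectiveness", teaching)]
    for teaching in ("physician", "researcher")
}

print(row_diffs)  # {'effectiveness': 31, 'rationale': 31}
print(col_diffs)  # {'physician': 31, 'researcher': 31}
```

Each item keeps a 31-point relationship with ultimate medical interest when the other item is held constant, which is the pattern we expect when both items contribute to the same index.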


each point in the index. You’ll be forced to reach some kind of compromise between these conflicting desires.

The second decision concerns the actual assignment of scores for each particular response. Basically you must decide whether to give each item in the index equal weight or different weights. Although there are no firm rules, I suggest—and practice tends to support this method—that items be weighted equally unless there are compelling reasons for differential weighting. That is, the burden of proof should be on differential weighting; equal weighting should be the norm.

Of course, this decision must be related to the earlier issue regarding the balance of items chosen. If the index is to represent the composite of slightly different aspects of a given variable, then you should give each aspect the same weight. In some instances, however, you may feel that, say, two items reflect essentially the same aspect, and the third reflects a different aspect. If you wished to have both aspects equally represented by the index, you might decide to give the different item a weight equal to the combination of the two similar ones. In such a situation, you might want to assign a maximum score of 2 to the different item and a maximum score of 1 to each of the similar ones.
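The weighting scheme just described can be sketched in a few lines of Python. The item names and the sample respondent are hypothetical; the only point is that the two similar items together carry the same maximum (2 points) as the one different item.

```python
# Maximum score per item: two items tap the same aspect (1 point each),
# and one item taps a different aspect (2 points). Item names are hypothetical.
MAX_SCORE = {"similar_a": 1, "similar_b": 1, "different": 2}

def weighted_index(responses):
    """Sum the weights of the items answered in the keyed direction."""
    return sum(MAX_SCORE[item] for item, keyed in responses.items() if keyed)

# Hypothetical respondent: keyed answer on one similar item and on the different item.
respondent = {"similar_a": True, "similar_b": False, "different": True}
print(weighted_index(respondent))  # 3

# Each aspect contributes at most 2 points to the 0-to-4 index.
aspect_one_max = MAX_SCORE["similar_a"] + MAX_SCORE["similar_b"]
aspect_two_max = MAX_SCORE["different"]
```

Under equal weighting, by contrast, every entry in `MAX_SCORE` would be 1 and the doubled-up aspect would count twice as much as the other.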

Although the rationale for scoring responses should take such concerns into account, you’ll typically experiment with different scoring methods, examining the relative weights given to different aspects but at the same time worrying about the range and distribution of cases provided. Ultimately, the scoring method chosen will represent a compromise among these several demands. Of course, as in most research activities, such a decision is open to revision on the basis of later examinations. Validation of the index, to be discussed shortly, may lead you to recycle your efforts toward constructing a completely different index.

In the example taken from the medical school faculty survey, I decided to weight the items equally, because I’d chosen them, in part, because they represented slightly different aspects of the overall variable scientific orientation. On

purpose of this step in index construction, again, is to discover the simultaneous interaction of the items in order to determine which should be included in the same index.

Index Scoring

When you’ve chosen the best items for the index, you next assign scores for particular responses, thereby creating a single composite index out of the several items. There are two basic decisions to be made in this step.

First, you must decide the desirable range of the index scores. Certainly a primary advantage of an index over a single item is the range of gradations it offers in the measurement of a variable. As noted earlier, political conservatism might be measured from “very conservative” to “not at all conservative” (or “very liberal”). How far to the extremes, then, should the index extend?

In this decision, the question of variance enters once more. Almost always, as the possible extremes of an index are extended, fewer cases are to be found at each end. The researcher who wishes to measure political conservatism to its greatest extreme may find there is almost no one in that category.

The first decision, then, concerns the conflicting desire for (1) a range of measurement in the index and (2) an adequate number of cases at

FIGURE 6-4 Hypothetical Trivariate Relationship among Scientific Orientation Items. This hypothetical relationship would suggest that not all three indicators would contribute effectively to a composite index.

Percent Interested in Basic Mechanisms

                                 Greatest Teaching Contribution
Reading Preferences              Physician      Researcher
Effectiveness of treatments      51% (66)       87% (12)
Rationale behind treatments      51% (219)      87% (130)
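Subtracting within the columns of the hypothetical Figure 6-4 shows why reading preference would be dropped in that case. This is only a sketch; the nested-dictionary layout and the helper name `added_effect` are assumptions made for illustration.

```python
# Hypothetical percent interested in basic mechanisms, from Figure 6-4.
# Outer keys: reading preference; inner keys: greatest teaching contribution.
hypothetical = {
    "effectiveness": {"physician": 51, "researcher": 87},
    "rationale": {"physician": 51, "researcher": 87},
}

def added_effect(table, teaching):
    """Percentage-point change due to reading preference, holding teaching fixed."""
    return table["rationale"][teaching] - table["effectiveness"][teaching]

for teaching in ("physician", "researcher"):
    print(teaching, added_effect(hypothetical, teaching))  # 0 in both columns

# Teaching contribution still matters within a row: 87 - 51 = 36 points.
teaching_effect = (
    hypothetical["effectiveness"]["researcher"]
    - hypothetical["effectiveness"]["physician"]
)
print(teaching_effect)  # 36
```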


may be independent of one another, though they contribute to the same variable:

Family Stress is a scale of stressful events within the family. The experience of any one of these events—parent job loss, parent separation, parent illness—is independent of the other events. Indeed, prior research on events utilized in stress scales has demonstrated that the events in these scales typically are independent of one another and reliabilities on the scales [are] low. (2005:176)

If the indicators of a variable are logically related to one another, on the other hand, that relationship should be used as a criterion for determining which are the better indicators.

Handling Missing Data

Regardless of your data-collection method, you’ll frequently face the problem of missing data. In a content analysis of the political orientations of newspapers, for example, you may discover that a particular newspaper has never taken an editorial position on one of the issues being studied. In an experimental design involving several retests of subjects over time, some subjects may be unable to participate in some of the sessions. In virtually every survey, some respondents fail to answer some questions (or choose a “don’t know” response). Although missing data present problems at all stages of analysis, they’re especially troublesome in index construction. However, several methods for dealing with these problems exist.

First, if there are relatively few cases with missing data, you may decide to exclude them from the construction of the index and the analysis. (I did this in the medical school faculty example.) The primary concerns in this instance are whether the numbers available for analysis will remain sufficient and whether the exclusion will result in a biased sample whenever the index is used in the analysis. The latter possibility can be examined through a comparison—on other relevant variables—of those who would be included in or excluded from the index.

each of the items, the respondents were given a score of 1 for choosing the “scientific” response to the item and a score of 0 for choosing the “nonscientific” response. Each respondent, then, could receive a score of 0, 1, 2, or 3. This scoring method provided what was considered a useful range of variation—four index categories—and also provided enough cases for analysis in each category.
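This equal-weight scoring can be sketched as follows. The item keys and the three sample respondents are hypothetical; the “scientific” responses are the ones named in the text (teaching best as a researcher, preferring articles on rationale, and primary interest in basic mechanisms).

```python
from collections import Counter

# The "scientific" response to each of the three items (key names are hypothetical).
SCIENTIFIC = {
    "teaching": "researcher",
    "reading": "rationale",
    "interest": "basic mechanisms",
}

def index_score(answers):
    """Score 1 for each scientific response and 0 otherwise, then sum (0 to 3)."""
    return sum(1 for item, keyed in SCIENTIFIC.items() if answers[item] == keyed)

# Hypothetical respondents, for illustration only.
respondents = [
    {"teaching": "physician", "reading": "effectiveness", "interest": "total patient management"},
    {"teaching": "physician", "reading": "rationale", "interest": "basic mechanisms"},
    {"teaching": "researcher", "reading": "rationale", "interest": "basic mechanisms"},
]

scores = [index_score(r) for r in respondents]
print(scores)           # [0, 2, 3]
print(Counter(scores))  # number of cases at each index score
```

Tallying the scores, as the last line does, is a quick way to check whether each index category contains enough cases for analysis.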

Here’s a similar example of index scoring, from a study of work satisfaction. One of the key variables was job-related depression, measured by an index composed of the following four items, which asked workers how they felt when thinking about themselves and their jobs:

• “I feel downhearted and blue.”
• “I get tired for no reason.”
• “I find myself restless and can’t keep still.”
• “I am more irritable than usual.”

The researchers, Amy Wharton and James Baron, report, “Each of these items was coded: 4 = often, 3 = sometimes, 2 = rarely, 1 = never” (1987:578). They go on to explain how they measured other variables examined in the study:

Job-related self-esteem was based on four items asking respondents how they saw themselves in their work: happy/sad; successful/not successful; important/not important; doing their best/not doing their best. Each item ranged from 1 to 7, where 1 indicates a self-perception of not being happy, successful, important, or doing one’s best. (1987:578)

As you look through the social research literature, you’ll find numerous similar examples of cumulative indexes being used to measure variables. Sometimes the indexing procedures are controversial, as evidenced in “What Is the Best College in the United States?”

Although it’s often appropriate to examine the relationships among indicators of a variable being measured by an index or scale, you should realize that the indicators are sometimes independent of one another. For example, Stacy De Coster notes that the indicators of family stress


constructing a measure of political conservatism, for example, you may discover that respondents who failed to answer a given question were generally as conservative on other items as were those who gave the conservative answer. As another example, a recent study measuring religious beliefs found that people who answered “don’t know” about a given belief were almost identical to the “disbelievers” in their answers about other beliefs. (Note: You should take these examples only as suggesting general ways to analyze your own data—not as empirical guides.) Whenever the analysis of missing data yields such

Second, you may sometimes have grounds for treating missing data as one of the available responses. For example, if a questionnaire has asked respondents to indicate their participation in various activities by checking “yes” or “no” for each, many respondents may have checked some of the activities “yes” and left the remainder blank. In such a case, you might decide that a failure to answer meant “no,” and score missing data in this case as though the respondents had checked the “no” space.
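Treating a blank as “no” on such a checklist can be sketched like this. The activity names and the respondent are hypothetical, and the recoding is only defensible when the pattern of blanks supports it, as in the checklist case described above.

```python
# Hypothetical checklist items.
ACTIVITIES = ["voted", "attended_meeting", "signed_petition"]

def participation_score(checks):
    """Count 'yes' answers; a blank (None or absent) is scored as 'no'."""
    return sum(1 for activity in ACTIVITIES if checks.get(activity) == "yes")

# This respondent checked "yes" twice and left the third item blank.
respondent = {"voted": "yes", "attended_meeting": "yes", "signed_petition": None}
print(participation_score(respondent))  # 2
```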

Third, a careful analysis of missing data may yield an interpretation of their meaning. In

ISSUES AND INSIGHTS

What Is the Best College in the United States?

Each year U.S. News and World Report issues a special report ranking the nation’s colleges and universities. Their rankings reflect an index, created from several items: educational expenditures per student, graduation rates, selectivity (percent accepted of those applying), average SAT scores of first-year students, and similar indicators of quality.

Typically, Harvard is ranked the number one school in the nation, followed by Yale and Princeton. However, the 1999 “America’s Best Colleges” issue shocked educators, prospective college students, and their parents. The California Institute of Technology had leaped from ninth place in 1998 to first place a year later. Although Harvard, Yale, and Princeton still did well, they had been supplanted. What had happened at Caltech to produce such a remarkable surge in quality?

The answer was to be found at U.S. News and World Report, not at Caltech. The newsmagazine changed the structure of the ranking index in 1999, which made a big difference in how schools fared.

Bruce Gottlieb (1999) gives this example of how the altered scoring made a difference.

So, how did Caltech come out on top? Well, one variable in a school’s ranking has long been educational expenditures per student, and Caltech has traditionally been tops in this category. But until this year, U.S. News considered only a school’s ranking in this category—first, second, etc.—rather than how much it spent relative to other schools. It didn’t matter whether Caltech beat Harvard by $1 or by $100,000. Two other schools that rose in their rankings this year were MIT (from fourth to third) and Johns Hopkins (from 14th to seventh). All three have high per-student expenditures and all three are especially strong in the hard sciences. Universities are allowed to count their research budgets in their per-student expenditures, though students get no direct benefit from costly research their professors are doing outside of class.


“purity” of your index and reduce the likelihood that it will relate to other variables in ways you may have hypothesized.

If you’re creating an index out of several items, you can sometimes handle missing data by using proportions based on what is observed. Suppose your index is composed of six indicators, and you have only four observations for a particular subject. If the subject has earned 4 points out of a possible 4, you might assign an index score of 6; if the subject has 2 points (half the possible score on four items), you could assign a score of 3 (half the possible score on six observations).
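The proportional approach reduces to one line of arithmetic: the share of possible points earned on the observed items, multiplied by the full index range. A minimal sketch follows; the function name `prorated_score` and its default arguments are assumptions chosen to match the six-indicator example.

```python
def prorated_score(observed, n_items=6, max_per_item=1):
    """Scale points earned on the observed items up to the full index range.

    `observed` holds the scores for answered items only. Returns None when
    nothing was observed, because no proportion can be computed.
    """
    if not observed:
        return None
    possible = len(observed) * max_per_item
    return sum(observed) / possible * n_items * max_per_item

print(prorated_score([1, 1, 1, 1]))  # 6.0 (4 of a possible 4 points)
print(prorated_score([1, 0, 1, 0]))  # 3.0 (2 of a possible 4 points)
```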

interpretations, then, you may decide to score such cases accordingly.

There are many other ways of handling the problem of missing data. If an item has several possible values, you might assign the middle value to cases with missing data; for example, you could assign a 2 if the values are 0, 1, 2, 3, and 4. For a continuous variable such as age, you could similarly assign the mean to cases with missing data (more on this in Chapter 14). Or, you can supply missing data by assigning values at random. All of these are conservative solutions, because any such changes weaken the

In its “best colleges” issue two years ago, U.S. News made precisely this point, saying it considered only the rank ordering of per-student expenditures, rather than the actual amounts, on the grounds that expenditures at institutions with large research programs and medical schools are substantially higher than those at the rest of the schools in the category. In other words, just two years ago, the magazine felt it unfair to give Caltech, MIT, and Johns Hopkins credit for having lots of fancy laboratories that don’t actually improve undergraduate education.

Gottlieb reviewed each of the changes in the index and then asked how 1998’s ninth-ranked Caltech would have done had the revised indexing formula been in place a year earlier. His conclusion: Caltech would have been first in 1998 as well. In other words, the apparent improvement was solely a function of how the index was scored.
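The difference between scoring a component by rank and scoring it by amount can be illustrated with a small sketch. The schools and dollar figures below are invented for illustration, and the min-max rescaling is an assumption chosen for simplicity; it is not the formula U.S. News used.

```python
# Hypothetical per-student expenditures in dollars (invented figures).
spending = {"School A": 190_000, "School B": 90_000, "School C": 85_000}

# Rank-based scoring: only the ordering matters (1 = highest spender).
ordered = sorted(spending, key=spending.get, reverse=True)
rank_score = {school: position + 1 for position, school in enumerate(ordered)}

# Amount-based scoring: rescale dollars to 0-1 so the size of the gap counts.
low, high = min(spending.values()), max(spending.values())
amount_score = {
    school: (value - low) / (high - low) for school, value in spending.items()
}

print(rank_score)    # {'School A': 1, 'School B': 2, 'School C': 3}
print(amount_score)  # School A scores 1.0; Schools B and C both fall below 0.05
```

By rank, School A is only one step ahead of School B. By amount, it is far ahead of both, which is how a change in scoring alone can reorder the overall index.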

For a very different ranking of colleges and universities, you might be interested in the “Webometrics Ranking” (www.webometrics.info/), which focuses on schools’ presence on the web. This website details the items included in the index, as well as how they are combined to produce an overall ranking of the world’s institutions of higher education. As of January 2008, MIT was the top-ranked American university, but you’ll have to examine the methodological description to know what that means.

Composite measures such as scales and indexes are valuable tools for understanding society. However, it’s important that we know how those measures are constructed and what that construction implies.

So, what’s really the best college in the United States? It depends on how you define “best.” There is no “really best,” only the various social constructions we can create.

Sources: “America’s Best Colleges,” U.S. News and World Report, August 30, 1999; Bruce Gottlieb, “Cooking the School Books: How U.S. News Cheats in Picking Its ‘Best American Colleges,’ ” Slate, August 31, 1999, http://www.slate.com/crapshoot/99-08-31/crapshoot.asp.


The choice of a particular method to be used depends so much on the research situation that I can’t reasonably suggest a single “best” method or rank the several I’ve described. Excluding all cases with missing data can bias the representativeness of the findings, but including such cases by assigning scores to missing data can influence the nature of the findings. The safest and best method is to construct the index using alternative methods and see whether the same findings follow from each. Understanding your data is the final goal of analysis anyway.
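One way to follow this advice is to score the same cases under several missing-data rules and compare the results. The sketch below uses hypothetical responses to a three-item index with items scored 0 or 1, where `None` marks a missing answer; the function names are illustrative.

```python
# Hypothetical responses to three 0/1 items; None marks missing data.
cases = [
    [1, 1, 1],
    [1, None, 0],
    [0, 0, None],
    [1, 1, None],
]

def listwise(case):
    """Exclude any case with missing data (returns None for such cases)."""
    return None if None in case else sum(case)

def missing_as_zero(case):
    """Treat a missing answer as the 0 response."""
    return sum(value or 0 for value in case)

def prorated(case):
    """Scale the observed proportion up to the full three-item range."""
    observed = [value for value in case if value is not None]
    return sum(observed) / len(observed) * len(case)

for method in (listwise, missing_as_zero, prorated):
    print(method.__name__, [method(case) for case in cases])
```

If the substantive findings hold under all three scorings, the choice of method matters little; if they diverge, the missing data deserve a closer look.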

Now that we’ve covered several aspects of index construction, see the box “How Healthy Is Your State?” for more on choosing indicators and scoring items.

Index Validation

Up to this point, we’ve discussed all the steps in the selection and scoring of items that result in a composite index purporting to measure some variable. If each of the preceding steps is carried out carefully, the likelihood of the index actually measuring the variable is enhanced. To demonstrate success, however, we need to validate the index. Following the basic logic of validation, we assume that the index provides a measure of some variable; that is, the scores on the index arrange cases in a rank order in terms of that variable. An index of political conservatism rank-orders people in terms of their relative conservatism. If the index does that successfully, then people scored as relatively conservative on the

ISSUES AND INSIGHTS

How Healthy Is Your State?

Since 1990, United Health Foundation, the American Public Health Association, and Partnership for Prevention have collaborated on an annual evaluation of the health status of each of the 50 states. Table 6-1, “2008 Overall Rankings,” shows the results of their 2008 research. The scores indicate where each state stands in comparison to the nation as a whole. The healthiest state in 2008, Vermont, was 24.8 percent healthier than the national average. You may be interested in seeing how your state ranks.

Since you are, by now, a critical consumer of social research, I can hear you asking, “Wait a minute, how did they measure healthy?” Good question. Table 6-2, “Weight of Individual Measures,” provides a summary of the components of their definition of what constitutes good or bad health. You’ll see that they’ve included indicators in a variety of categories. Some represent positive indications (as in high school graduation rates) and some are negative indicators (as in smoking and binge drinking). Moreover, Table 6-2 indicates the weight assigned to each indicator in the construction of each state’s overall score.

Review each indicator and see whether you agree that it reflects how healthy states are. Perhaps you can think of other indicators that might have been used.

The full report provides a wealth of thoughtful discussion on why each of these indicators was chosen. Check it out at www.americashealthrankings.org/2008.

Source: The United Health Foundation, American Public Health Association, and Partnership for Prevention, America’s Health Rankings: A Call to Action for Individuals and Their Communities, pp. 8, 32. http://www.americashealthrankings.org/2008/. ©2008 United Health Foundation.


TABLE 6-1 2008 Overall Rankings

Alphabetical by State                 Rank Order
Rank  State             Score*        Rank  State             Score*
40    Alabama           –7.0          1     Vermont           24.8
30    Alaska            1.3           2     Hawaii            21.6
33    Arizona           0.4           3     New Hampshire     19.9
43    Arkansas          –8.1          4     Minnesota         18.8
24    California        5.3           5     Utah              18.2
19    Colorado          9.7           6     Massachusetts     17.7
7     Connecticut       17.5          7     Connecticut       17.5
35    Delaware          –1.6          8     Idaho             16.1
45    Florida           –8.9          9     Maine             15.3
41    Georgia           –7.8          10    Washington        14.9
2     Hawaii            21.6          11    Rhode Island      14.0
8     Idaho             16.1          12    North Dakota      12.5
31    Illinois          0.8           13    Nebraska          12.0
34    Indiana           –0.6          14    Wyoming           11.8
15    Iowa              11.6          15    Iowa              11.6
22    Kansas            6.7           16    Oregon            11.3
37    Kentucky          –3.6          17    Wisconsin         10.3
50    Louisiana         –15.2         18    New Jersey        9.8
9     Maine             15.3          19    Colorado          9.7
26    Maryland          3.4           20    Virginia          9.0
6     Massachusetts     17.7          21    South Dakota      7.5
27    Michigan          2.0           22    Kansas            6.7
4     Minnesota         18.8          23    Montana           6.5
49    Mississippi       –15.0         24    California        5.3
38    Missouri          –4.9          25    New York          3.8
23    Montana           6.5           26    Maryland          3.4
13    Nebraska          12.0          27    Michigan          2.0
42    Nevada            –7.9          27    Pennsylvania      2.0
3     New Hampshire     19.9          29    New Mexico        1.7
18    New Jersey        9.8           30    Alaska            1.3
29    New Mexico        1.7           31    Illinois          0.8
25    New York          3.8           32    Ohio              0.7
36    North Carolina    –3.2          33    Arizona           0.4
12    North Dakota      12.5          34    Indiana           –0.6
32    Ohio              0.7           35    Delaware          –1.6
43    Oklahoma          –8.1          36    North Carolina    –3.2
16    Oregon            11.3          37    Kentucky          –3.6
27    Pennsylvania      2.0           38    Missouri          –4.9
11    Rhode Island      14.0          39    West Virginia     –5.0
48    South Carolina    –10.7         40    Alabama           –7.0
21    South Dakota      7.5           41    Georgia           –7.8
47    Tennessee         –9.7          42    Nevada            –7.9
46    Texas             –9.0          43    Arkansas          –8.1
5     Utah              18.2          43    Oklahoma          –8.1
1     Vermont           24.8          45    Florida           –8.9
20    Virginia          9.0           46    Texas             –9.0
10    Washington        14.9          47    Tennessee         –9.7
39    West Virginia     –5.0          48    South Carolina    –10.7
17    Wisconsin         10.3          49    Mississippi       –15.0
14    Wyoming           11.8          50    Louisiana         –15.2

*Scores presented in this table indicate the percentage a state is above or below the national norm.


TABLE 6-2 Weight of Individual Measures

Name of Measure                     Percentage of Total    Effect on Score

Determinants
  Personal Behaviors
    Prevalence of Smoking           10.0                   Negative
    Prevalence of Binge Drinking    5.0                    Negative
    Prevalence of Obesity           5.0                    Negative
  Community and Environment
    High School Graduation          5.0                    Positive
    Violent Crime                   5.0                    Negative
    Occupational Fatalities         2.5                    Negative
    Infectious Disease              5.0                    Negative
    Children in Poverty             5.0                    Negative
    Air Pollution                   5.0                    Negative
  Public and Health Policies
    Lack of Health Insurance        5.0                    Negative
    Public Health Funding           2.5                    Positive
    Immunization Coverage           5.0                    Positive
  Clinical Care
    Adequacy of Prenatal Care       5.0                    Positive
    Primary Care Physicians         5.0                    Positive
    Preventable Hospitalizations    5.0                    Negative
Health Outcomes
    Poor Mental Health Days         2.5                    Negative
    Poor Physical Health Days       2.5                    Negative
    Geographic Disparity            5.0                    Negative
    Infant Mortality                5.0                    Negative
    Cardiovascular Deaths           2.5                    Negative
    Cancer Deaths                   2.5                    Negative
    Premature Death                 5.0                    Negative

Overall Health Ranking              100.0                  —
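A quick check on a weighting scheme like the one in Table 6-2 is to confirm that the weights sum to 100 and to see how much of the total rests on positive versus negative indicators. The sketch below only tallies the published weights; it does not reproduce how the report converts each measure into a state's score, and the names `MEASURES`, `total_weight`, and `positive_weight` are illustrative.

```python
# Weight (percentage of total) and direction of effect, from Table 6-2.
MEASURES = {
    "Prevalence of Smoking": (10.0, "Negative"),
    "Prevalence of Binge Drinking": (5.0, "Negative"),
    "Prevalence of Obesity": (5.0, "Negative"),
    "High School Graduation": (5.0, "Positive"),
    "Violent Crime": (5.0, "Negative"),
    "Occupational Fatalities": (2.5, "Negative"),
    "Infectious Disease": (5.0, "Negative"),
    "Children in Poverty": (5.0, "Negative"),
    "Air Pollution": (5.0, "Negative"),
    "Lack of Health Insurance": (5.0, "Negative"),
    "Public Health Funding": (2.5, "Positive"),
    "Immunization Coverage": (5.0, "Positive"),
    "Adequacy of Prenatal Care": (5.0, "Positive"),
    "Primary Care Physicians": (5.0, "Positive"),
    "Preventable Hospitalizations": (5.0, "Negative"),
    "Poor Mental Health Days": (2.5, "Negative"),
    "Poor Physical Health Days": (2.5, "Negative"),
    "Geographic Disparity": (5.0, "Negative"),
    "Infant Mortality": (5.0, "Negative"),
    "Cardiovascular Deaths": (2.5, "Negative"),
    "Cancer Deaths": (2.5, "Negative"),
    "Premature Death": (5.0, "Negative"),
}

total_weight = sum(weight for weight, _ in MEASURES.values())
positive_weight = sum(
    weight for weight, effect in MEASURES.values() if effect == "Positive"
)

print(total_weight)     # 100.0
print(positive_weight)  # 22.5
```

Smoking alone carries 10 percent of the total, twice the weight of most other measures, which is the kind of differential weighting a critical reader should expect the report to justify.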


token, all the 0’s had to answer this item with “total patient management.” Thus, 0 percent of those respondents said “basic mechanisms.” Here’s how the table looks with the information we already know.

                                                                  Index of Scientific Orientations
                                                                  0      1      2      3
Percent who said they were more interested in basic mechanisms   0      ??     ??     100

If the individual item is a good reflection of the overall index, we should expect the 1’s and 2’s to fill in a progression between 0 percent and 100 percent. More of the 2’s should choose “basic mechanisms” than 1’s. This is not guaranteed by the way the index was constructed, however; it is an empirical question—one we answer in an item analysis. Here’s how this particular item analysis turned out.

                                                                  Index of Scientific Orientations
                                                                  0      1      2      3
Percent who said they were more interested in basic mechanisms   0      16     91     100

As you can see, in accord with our assumption that the 2’s are more scientifically oriented than the 1’s, we find that a higher percentage of the 2’s (91 percent) than the 1’s (16 percent) say “basic mechanisms.”
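The logic of this item analysis can be sketched in a few lines. The respondents below are hypothetical (the study's raw data are not reproduced here); each tuple records whether the respondent gave the scientific response (1) or not (0) on the three items, with the ultimate-medical-interest item last. The function name `item_analysis` is an assumption.

```python
from collections import defaultdict

# Hypothetical respondents: (teaching, reading, interest), 1 = scientific response.
respondents = [
    (0, 0, 0), (0, 0, 0),
    (1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0),
    (0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 1, 1),
    (1, 1, 1), (1, 1, 1),
]

def item_analysis(rows, item):
    """Percent giving the scientific response on `item` at each index score."""
    by_score = defaultdict(list)
    for row in rows:
        by_score[sum(row)].append(row[item])
    return {
        score: 100 * sum(values) / len(values)
        for score, values in sorted(by_score.items())
    }

print(item_analysis(respondents, item=2))
# {0: 0.0, 1: 25.0, 2: 75.0, 3: 100.0}
```

The 0 and 100 at the extremes are guaranteed by how the index is built; only the values for scores 1 and 2 are an empirical result.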

index should appear relatively conservative in all other indications of political orientation, such as their responses to other questionnaire items. There are several methods of validating an index.

Item Analysis

The first step in index validation is an internal validation called item analysis. In item analysis, you examine the extent to which the composite index is related to (or predicts responses to) the individual items it comprises. Here’s an illustration of this step.

In the index of scientific orientations among medical school faculty, for example, index scores ranged from 0 (most interested in patient care) to 3 (most interested in research). Now let’s consider one of the items in the index: whether respondents wanted to advance their own knowledge more with regard to total patient management or more in the area of basic mechanisms. The latter were treated as being more scientifically oriented than the former. The following empty table shows how we would examine the relationship between the index and the individual item.

                                                                  Index of Scientific Orientations
                                                                  0      1      2      3
Percent who said they were more interested in basic mechanisms   0      ??     ??     ??

If you take a minute to reflect on the table, you may see that we already know the numbers that go in two of the cells. To get a score of 3 on the index, respondents had to say “basic mechanisms” in response to this question and give the “scientific” answers to the other two items as well. Thus, 100 percent of the 3’s on the index said “basic mechanisms.” By the same

item analysis: An assessment of whether each of the items included in a composite measure makes an independent contribution or merely duplicates the contribution of other items in the measure.


as their responses to other items in a questionnaire. Of course, we’re talking about relative conservatism, because we can’t make an absolute definition of what constitutes conservatism. However, those respondents scored as the most conservative on the index should be the most conservative in answering other questions. Those scored as the least conservative on the index should be the least conservative on other items. Indeed, the ranking of groups of respondents on the index should predict the ranking of those groups in answering other questions dealing with political orientations.

In our example of the scientific orientation index, several questions in the questionnaire offered the possibility of such external validation. Table 6-3 presents some of these items, which provide several lessons regarding index validation. First, we note that the index strongly predicts the responses to the validating items in the sense that the rank order of scientific responses among the four groups is the same as the rank order provided by the index itself. That is, the percentages reflect greater scientific orientation as you read across the rows of the table. At the same time, each item gives a different description of scientific orientations overall. For example, the last validating item indicates that the great majority of all faculty were engaged in research during the preceding year. If this were the only indicator of scientific orientation, we would conclude that nearly all faculty were scientific. Nevertheless, those scored as more scientific on the index are more likely to have engaged in research than are those who were scored as relatively less scientific. The third validating item provides a different descriptive picture: Only a minority of the faculty overall say they would prefer duties limited exclusively to research. (Only among those scored 3 on the index do a majority agree with that statement.) Nevertheless, the percentages giving this answer correspond to the scores assigned on the index.

An item analysis of the other two components of the index yields similar results, as follows.

                                                                 Index of Scientific Orientations
                                                                 0      1      2      3
Percent who said they could teach best as medical researchers   0      4      14     100
Percent who said they preferred reading about rationales        0      80     97     100

Each of the items, then, seems an appropriate component in the index. Each seems to reflect the same quality that the index as a whole measures.

In a complex index containing many items, this step provides a convenient test of the independent contribution of each item to the index. If a given item is found to be poorly related to the index, it may be assumed that other items in the index cancel out the contribution of that item, and it should be excluded from the index. In other words, if the item in question contributes nothing to the index’s power, it should be excluded.

Although item analysis is an important first test of the index’s validity, it is scarcely sufficient. If the index adequately measures a given variable, it should successfully predict other indications of that variable. To test this, we must turn to items not included in the index.

External Validation

People scored as politically conservative on an index should appear conservative by other measures as well, such

external validation: The process of testing the validity of a measure, such as an index or scale, by examining its relationship to other, presumed indicators of the same variable. If the index really measures prejudice, for example, it should correlate with other indicators of prejudice.


validating items are insufficient. One way is to examine the relationships between the validating items and the individual items included in the index. If you discover that some of the index items relate to the validators and others do not, you’ll have improved your understanding of the index as it was initially constituted.

There is no cookbook solution to this dilemma; it is an agony serious researchers must learn to survive. Ultimately, the wisdom of your decision to accept an index will be determined by the usefulness of that index in your later analyses. Perhaps you’ll initially decide that the index is a good one and that the validators are defective, but you’ll later find that the variable in question (as measured by the index) is not related to other variables in the ways you expected. You may then have to compose a new index.

The Status of Women: An Illustration of Index Construction

For the most part, I’ve talked about index construction in the context of survey research, but other types of research also lend themselves to this kind of composite measure. For example, when the United Nations (1995) set about examining the status of women in the world, they chose to create two indexes, reflecting two different dimensions.

The Gender-related Development Index (GDI) compared women with men in terms of three indicators: life expectancy, education, and income. These indicators are commonly used in monitoring the status of women in the world. The Scandinavian countries of Norway, Sweden, Finland, and Denmark ranked highest on this measure.

The second index, the Gender Empowerment Measure (GEM), aimed more at power issues and comprised three different indicators:

• The proportion of parliamentary seats held by women

Bad Index versus Bad Validators Nearly every index constructor at some time must face the apparent failure of external items to validate the index. If the internal item analysis shows inconsistent relationships between the items included in the index and the index itself, something is wrong with the index. But if the index fails to predict strongly the external validation items, the conclusion to be drawn is more ambiguous. You must choose between two possibilities: (1) the index does not adequately measure the variable in question, or (2) the validation items do not adequately measure the variable and thereby do not provide a sufficient test of the index.

Having worked long and conscientiously on the construction of an index, you’ll likely find the second conclusion compelling. Typically, you’ll feel you have included the best indicators of the variable in the index; the validating items are, therefore, second-rate indicators. Nevertheless, you should recognize that the index is purportedly a very powerful measure of the variable; thus, it should be somewhat related to any item that taps the variable even poorly.

When external validation fails, you should reexamine the index before deciding that the

TABLE 6-3 Validation of Scientific Orientation Index

                                                                                Index of Scientific Orientations
                                                                                Low                  High
                                                                                 0      1      2      3
Percent interested in attending scientific lectures at the medical school       34     42     46     65
Percent who say faculty members should have experience as medical researchers   43     60     65     89
Percent who would prefer faculty duties involving research activities only       0      8     32     66
Percent who engaged in research during the preceding academic year              61     76     94     99
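One way to read Table 6-3 is to ask whether each validating item rises steadily as the index score goes from 0 to 3. The following is only a minimal sketch of that check, using the percentages from the table; the shortened item labels and the function name `rises_with_index` are my own illustrative choices, not part of the original study.

```python
# Percentages from Table 6-3, ordered by index score 0 (low) to 3 (high).
# The short labels are paraphrases chosen for this sketch.
validators = {
    "interested in attending scientific lectures": [34, 42, 46, 65],
    "faculty should have research experience": [43, 60, 65, 89],
    "prefer research-only faculty duties": [0, 8, 32, 66],
    "engaged in research last academic year": [61, 76, 94, 99],
}


def rises_with_index(percentages):
    """Return True if each percentage is higher than the one before it."""
    return all(a < b for a, b in zip(percentages, percentages[1:]))


# Map each validating item to whether it behaves as the index predicts.
validation = {name: rises_with_index(p) for name, p in validators.items()}

for name, ok in validation.items():
    verdict = "consistent" if ok else "inconsistent"
    print(f"{name}: {verdict} with the index")
```

A strictly rising pattern is a simple, fairly strict reading of "validation"; a real analysis would more likely look at the strength of each relationship rather than only its direction.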


in terms of income, education, and life expectancy, they were still denied access to power. And whereas the GDI scores were higher in the wealthier nations than in the poorer ones, GEM scores showed that women’s empowerment did not seem to depend on national wealth, with many poor, developing countries outpacing some rich, industrial ones in regard to such empowerment.

By examining several different dimensions of the variables involved in their study, the UN researchers also uncovered an aspect of women’s earnings that generally goes unnoticed. Population Communications International (1996:1) summarizes the finding nicely:

• The proportion of administrative, managerial, professional, and technical positions held by women

• A measure of access to jobs and wages

Once again, the Scandinavian countries ranked high but were joined by Canada, New Zealand, the Netherlands, the United States, and Austria. Having two different measures of gender equality allowed the researchers to make more-sophisticated distinctions. For example, in several countries, most notably Greece, France, and Japan, women fared relatively well on the GDI but quite poorly on the GEM; thus, although they were doing fairly well

ISSUES AND INSIGHTS

Indexing the World

If you browse the web in search of indexes, you’ll be handsomely rewarded. Here are just a few examples of the ways in which people have used the logic of social indexes to monitor the state of the world or large portions of it.

The well-being of nations is commonly measured in economic terms, such as the gross domestic product (GDP) per capita, average income, or stock market averages. In 1972, however, the mountainous kingdom of Bhutan drew global attention by proposing an index of “gross national happiness,” augmenting economic factors with measures of physical and mental health, freedom, environment, marital stability, and other indicators of noneconomic well-being. The World Database of Happiness expands this general idea to 24 countries at worlddatabaseofhappiness.eur.nl/hap_quer/hqi_fp.htm.

Columbia University’s “Environmental Sustainability Index” is one of several measures that seek to monitor nations’ environmental impact on the planet. You can explore this further and download data for analysis at sedac.ciesin.columbia.edu/es/compendium.html#data.

The well-being of America’s young people is the focus of the “Child and Youth Well-Being Index,” housed at Duke University. See www.soc.duke.edu/~cwi/.

Money Magazine has indexed the 100 best places to live in America, using factors such as economics, housing, schools, health, crime, weather, and public facilities. See the details at money.cnn.com/magazines/moneymag/bplive/2007/top100/.

The Heritage Foundation offers an “Index of Economic Freedom” for those planning business ventures around the world; see www.heritage.org/index/.

For Christians who believe in prophecies of the end of times, “The Rapture Index” uses 45 indicators (including inflation, famine, floods, liberalism, and Satanism) to gauge how close or far away the end is. See www.raptureready.com/rap7.html.

See if you can find some other, similar indexes.


that not all indicators of a variable are equally important or equally strong. The first senator might have voted for the seven least conservative bills, whereas the second senator might have voted for the four most conservative bills. (The second senator might have considered the other six bills too liberal and voted against them.)

Scales offer more assurance of ordinality by tapping the intensity structures among the indicators. The several items going into a composite measure may have different intensities in terms of the variable. Many methods of scaling are available. To illustrate the variety of techniques at hand, we’ll look at four scaling procedures, along with a technique called the semantic differential. Although these examples focus on questionnaires, the logic of scaling, like that of indexing, applies to other research methods as well.

Bogardus Social Distance Scale

Let’s suppose you’re interested in the extent to which U.S. citizens are willing to associate with, say, sex offenders. You might ask the following questions:

1. Are you willing to let sex offenders live in your country?

2. Are you willing to let sex offenders live in your community?

Every year, women make an invisible contribution of eleven trillion U.S. dollars to the global economy, the UNDP [United Nations Development Programme] report says, counting both unpaid work and the underpayment of women’s work at prevailing market prices. This “underevaluation” of women’s work not only undermines their purchasing power, says the 1995 HDR [Human Development Report], but also reduces their already low social status and affects their ability to own property and use credit. Mahbub ul Haq, the principal author of the report, says that “if women’s work were accurately reflected in national statistics, it would shatter the myth that men are the main breadwinners of the world.” The UNDP report finds that women work longer hours than men in almost every country, including both paid and unpaid duties. In developing countries, women do approximately 53% of all work and spend two-thirds of their work time on unremunerated activities. In industrialized countries, women do an average of 51% of the total work, and, like their counterparts in the developing world, perform about two-thirds of their total labor without pay. Men in industrialized countries are compensated for two-thirds of their work.

The box “Indexing the World” gives some other examples of indexes that have been created to monitor the state of the world.

As you can see, indexes can be constructed from many different kinds of data for a variety of purposes. (See the box “Assessing Women’s Status” for more on this topic.) Now we’ll turn our attention from the construction of indexes to an examination of scaling techniques.

SCALE CONSTRUCTION

Good indexes provide an ordinal ranking of cases on a given variable. All indexes are based on this kind of assumption: A senator who voted for seven out of ten conservative bills is considered to be more conservative than one who voted for only four of them. What an index may fail to take into account, however, is

Assessing Women’s Status

In our discussion of the Gender Empowerment Measure (GEM), we analyze the status of women in countries around the world. How might you use the logic of this analysis to examine and assess the status of women in a particular organization, such as the college you attend or a corporation you’re familiar with?

APPLYING CONCEPTS IN EVERYDAY LIFE


logic demands that once a person has refused a relationship presented in the scale, he or she will also refuse all the harder ones that follow it.

The Bogardus social distance scale illustrates the important economy of scaling as a data-reduction device. By knowing how many relationships with sex offenders a given respondent will accept, we know which relationships were accepted. Thus, a single number can accurately summarize five or six data items without a loss of information.
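The data-reduction idea can be sketched in a few lines of code. This is only an illustration under the assumption that a respondent's answers follow the scale's logic perfectly; the short item labels and the function names `responses_from_score` and `fits_scale` are hypothetical, introduced here for the example.

```python
# The five items, ordered from easiest (least close) to hardest (closest).
# Labels paraphrase the questionnaire items for this sketch.
ITEMS = [
    "live in your country",
    "live in your community",
    "live in your neighborhood",
    "live next door to you",
    "marry your child",
]


def responses_from_score(score):
    """Rebuild the yes/no answers from the number of relationships accepted."""
    if not 0 <= score <= len(ITEMS):
        raise ValueError("score must be between 0 and the number of items")
    # Under the scale's logic, a score of k means the k easiest items
    # were accepted and every harder item was refused.
    return {item: position < score for position, item in enumerate(ITEMS)}


def fits_scale(responses):
    """True if no item is accepted after an easier item was refused."""
    answers = [responses[item] for item in ITEMS]
    return answers == sorted(answers, reverse=True)


# A respondent who accepts three relationships.
example = responses_from_score(3)
```

Here a score of 3 tells us the respondent accepts sex offenders in the country, community, and neighborhood but not as next-door neighbors or in-laws. Respondents whose answers do not fit the scale's logic would, of course, not be reconstructed correctly.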

Motoko Lee, Stephen Sapp, and Melvin Ray (1996) noticed an implicit element in the Bogardus social distance scale: It looks at social distance from the point of view of the majority group in a society. These researchers decided to turn the tables and create a “reverse social distance” scale: looking at social distance from the perspective of the minority group. Here’s how they framed their questions (1996:19):

Considering typical Caucasian Americans you have known, not any specific person nor the worst or the best, circle Y or N to express your opinion.

Y N 5. Do they mind your being a citizen in this country?

Y N 4. Do they mind your living in the same neighborhood?

Y N 3. Would they mind your living next to them?

Y N 2. Would they mind your becoming a close friend to them?

Y N 1. Would they mind your becoming their kin by marriage?

As with the original scale, the researchers found that knowing the number of items minority respondents agreed with also told the researchers which ones were agreed with: 99 percent of the time in this case.

Thurstone Scales

Often, the inherent structure of the Bogardus social distance scale is not appropriate to the variable being measured. Indeed, such a logical structure among several indicators is seldom

3. Are you willing to let sex offenders live in your neighborhood?

4. Would you be willing to let a sex offender live next door to you?

5. Would you let your child marry a sex offender?

These questions increase in terms of how closely the respondents want to associate with sex offenders. Beginning with the original concern to measure willingness to associate with sex offenders, you have thus developed several questions indicating differing degrees of intensity on this variable. The kinds of items presented constitute a Bogardus social distance scale (created by Emory Bogardus). This scale is a measurement technique for determining the willingness of people to participate in social relations, of varying degrees of closeness, with other kinds of people.

The clear differences of intensity suggest a structure among the items. Presumably, if a person is willing to accept a given kind of association, he or she would be willing to accept all those preceding it in the list, that is, those with lesser intensities. For example, the person who is willing to permit sex offenders to live in the neighborhood will surely accept them in the community and the nation but may or may not be willing to accept them as next-door neighbors or relatives. This, then, is the logical structure of intensity inherent among the items.

Empirically, one would expect to find the largest number of people accepting co-citizenship and the fewest accepting intermarriage. In this sense, we speak of “easy items” (for example, residence in the United States) and “hard items” (for example, intermarriage). More people agree to the easy items than to the hard ones. With some inevitable exceptions,

Bogardus social distance scale A measurement technique for determining the willingness of people to participate in social relations, of varying degrees of closeness, with other kinds of people. It is an especially efficient technique in that one can summarize several discrete answers without losing any of the original details of the data.


Thurstone scaling is not often used in research today, primarily because of the tremendous expenditure of energy and time required to have 10 to 15 judges score the items. Because the quality of their judgments would depend on their experience with the variable under consideration, professional researchers might be needed. Moreover, the meanings conveyed by the several items indicating a given variable tend to change over time. Thus, an item might have a given weight at one time and quite a different weight later on. To be effective, a Thurstone scale would have to be updated periodically.

Likert Scaling

You may sometimes hear people refer to a questionnaire item containing response categories such as “strongly agree,” “agree,” “disagree,” and “strongly disagree” as a Likert scale. This is technically a misnomer, although Rensis Likert (pronounced “LICK-ert”) did create this commonly used question format. Likert also created a technique for combining the items into a scale, but while Likert’s scaling technique is rarely used, his answer format is one of the most frequently used formats in survey research.

The particular value of this format is the unambiguous ordinality of response categories. If respondents were permitted to volunteer or select such answers as “sort of agree,” “pretty much agree,” “really agree,” and so forth, you would find it impossible to judge the relative strength of agreement intended by the various respondents. The Likert format solves this problem.

Though seldom used, Likert’s scaling method is fairly easy to understand, based on the relative intensity of different items. As a simple example, suppose we wish to measure prejudice against women. To do this, we create a set of 20 statements, each of which reflects that prejudice.

apparent. A Thurstone scale (a format created by Louis Thurstone) is an attempt to develop a format for generating groups of indicators of a variable that have at least an empirical structure among them.

One of the basic formats is that of “equal-appearing intervals.” A group of judges is given perhaps a hundred items felt to be indicators of a given variable. Each judge is then asked to estimate how strong an indicator of a variable each item is by assigning scores of perhaps 1 to 13. If the variable were prejudice, for example, the judges would be asked to assign the score of 1 to the very weakest indicators of prejudice, the score of 13 to the strongest indicators, and intermediate scores to those in between.

Once the judges have completed this task, the researcher examines the scores assigned to each item to determine which items produced the greatest agreement among the judges. Those items on which the judges disagreed broadly would be rejected as ambiguous. Among those items producing general agreement in scoring, one or more would be selected to represent each scale score from 1 to 13.
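The item-selection step can be sketched as follows. Everything specific here is an assumption made for illustration: the judge ratings are invented, the standard deviation is used as one possible measure of disagreement, the cutoff `MAX_SPREAD` is arbitrary, and the median stands in for the item's scale value. The text does not prescribe any of these choices.

```python
from statistics import median, pstdev

# Hypothetical ratings from five judges (1 = weakest indicator, 13 = strongest).
judge_scores = {
    "item_a": [2, 2, 3, 2, 2],
    "item_b": [1, 13, 7, 4, 10],  # judges disagree broadly
    "item_c": [12, 13, 12, 12, 13],
    "item_d": [7, 6, 7, 7, 8],
}

MAX_SPREAD = 1.0  # assumed cutoff for "general agreement" among judges


def select_items(scores, max_spread=MAX_SPREAD):
    """Keep items the judges agreed on, grouped by their median scale value."""
    selected = {}
    for item, ratings in scores.items():
        if pstdev(ratings) <= max_spread:  # reject ambiguous items
            selected.setdefault(round(median(ratings)), []).append(item)
    return selected


scale_items = select_items(judge_scores)
```

With these made-up ratings, `item_b` is dropped as ambiguous, and the remaining items would represent scale scores 2, 7, and 12. A full scale would need at least one agreed-upon item for every score from 1 to 13.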

The items selected in this manner might then be included in a survey questionnaire. Respondents who appeared prejudiced on those items representing a strength of 5 would then be expected to appear prejudiced on those having lesser strengths, and if some of those respondents did not appear prejudiced on the items with a strength of 6, it would be expected that they would also not appear prejudiced on those with greater strengths.

If the Thurstone scale items were adequately developed and scored, the economy and effectiveness of data reduction inherent in the Bogardus social distance scale would appear. A single score might be assigned to each respondent (the strength of the hardest item accepted), and that score would adequately represent the responses to several questionnaire items. And, as is true of the Bogardus scale, a respondent who scored 6 might be regarded as more prejudiced than one who scored 5 or less.

Thurstone scale A type of composite measure, constructed in accordance with the weights assigned by “judges” to various indicators of some variables.


give 15 points to people disagreeing with that statement.

As I’ve said earlier, Likert scaling is seldom used today. The item format devised by Likert, however, is one of the most commonly used formats in contemporary questionnaire design. Typically, it’s now used in the creation of simple indexes. With, say, five response categories, scores of 0 to 4 or 1 to 5 might be assigned, taking the direction of the items into account (for example, assign a score of 5 to “strongly agree” for positive items and to “strongly disagree” for negative items). Each respondent would then be assigned an overall score representing the summation of the scores he or she received for responses to the individual items.
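A simple index built from Likert-format items might be scored along these lines. This is a sketch, not a standard routine: the text names only four response categories, so the middle category “undecided” is my assumption for the fifth, and the function names are hypothetical.

```python
# Five response categories, scored 1 to 5. The middle category,
# "undecided", is an assumption; the text names only the other four.
CATEGORIES = [
    "strongly disagree",
    "disagree",
    "undecided",
    "agree",
    "strongly agree",
]


def item_score(answer, positive=True):
    """Score one answer, reversing the direction for negative items."""
    points = CATEGORIES.index(answer) + 1  # 1 through 5
    return points if positive else 6 - points


def index_score(answers, directions):
    """Sum the item scores to get a respondent's overall index score."""
    return sum(item_score(a, d) for a, d in zip(answers, directions))


# One respondent answering three items; the second item is worded negatively.
respondent = ["agree", "disagree", "strongly agree"]
directions = [True, False, True]
total = index_score(respondent, directions)
```

Reversing the negative items before summing is what “taking the direction of the items into account” amounts to: disagreeing with a negatively worded statement earns the same points as agreeing with a positively worded one.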

Semantic Differential

Like the Likert format, the semantic differential asks respondents to choose between two opposite positions. Here’s how it works.

Suppose you’re evaluating the effectiveness of a new music-appreciation lecture on subjects’ appreciation of music. As a part of your study, you want to play some musical selections and have the subjects report their feelings about them. A good way to tap those feelings would be to use a semantic differential format.

To begin, you must determine the dimensions along which subjects should judge each selection. Then you need to find two opposite terms, representing the polar extremes along each dimension. Let’s suppose one dimension that interests you is simply whether subjects enjoyed the piece or not. Two opposite terms in this case could be “enjoyable” and “unenjoyable.” Similarly, you might want to know whether they regarded the individual selections as “complex” or “simple,” “harmonic” or “discordant,” and so forth.

Once you have determined the relevant dimensions and have found terms to represent the extremes of each, you might prepare a rating sheet each subject would complete for each piece of music. Figure 6-5 shows what it might look like.

One of the items might be “Women can’t drive as well as men.” Another might be “Women shouldn’t be allowed to vote.” Likert’s scaling technique would demonstrate the difference in intensity between these items as well as pegging the intensity of the other 18 statements.

Let’s suppose we ask a sample of people to agree or disagree with each of the 20 statements. Simply giving one point for each of the indicators of prejudice against women would yield the possibility of index scores ranging from 0 to 20. A true Likert scale goes one step beyond that and calculates the average index score for those agreeing with each of the individual statements. Let’s say that all those who agreed that women are poorer drivers than are men had an average index score of 1.5 (out of a possible 20). Those who agreed that women should be denied the right to vote might have an average index score of, say, 19.5, indicating the greater degree of prejudice reflected in that response.
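The item analysis just described can be sketched with a tiny, made-up data set. The answers below are hypothetical (three statements and four respondents rather than 20 statements and a real sample), and the function names are my own; the point is only to show the two steps: build the simple index, then average it among those who agreed with each statement.

```python
# Hypothetical agree (1) / disagree (0) answers. Each row is one respondent;
# each column is one statement reflecting prejudice.
answers = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 0],
]


def simple_index(row):
    """One point for each statement the respondent agreed with."""
    return sum(row)


def item_weights(rows):
    """Average simple-index score among those agreeing with each statement."""
    weights = []
    for item in range(len(rows[0])):
        scores = [simple_index(r) for r in rows if r[item] == 1]
        weights.append(sum(scores) / len(scores) if scores else None)
    return weights


def scale_score(row, weights):
    """Rescore a respondent using the item weights instead of one point each."""
    return sum(w for answer, w in zip(row, weights) if answer == 1)


weights = item_weights(answers)
rescored = [scale_score(row, weights) for row in answers]
```

In this toy example the third statement earns the largest weight because only the most prejudiced respondent agreed with it, which mirrors the contrast between the driving item and the voting item in the text.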

As a result of this item analysis, respondents could be rescored to form a scale: 1.5 points for agreeing that women are poorer drivers, 19.5 points for saying women shouldn’t vote, and points for other responses reflecting how those items related to the initial, simple index. If those who disagreed with the statement “I might vote for a woman for president” had an average index score of 15, then the scale would

Likert scale A type of composite measure developed by Rensis Likert in an attempt to improve the levels of measurement in social research through the use of standardized response categories in survey questionnaires to determine the relative intensity of different items. Likert items are those using such response categories as “strongly agree,” “agree,” “disagree,” and “strongly disagree.” Such items may be used in the construction of true Likert scales as well as other types of composite measures.

semantic differential A questionnaire format in which the respondent is asked to rate something in terms of two, opposite adjectives (e.g., rate textbooks as “boring” or “exciting”), using qualifiers such as “very,” “somewhat,” “neither,” “somewhat,” and “very” to bridge the distance between the two opposites.


perhaps multivariate relations among those items. In scale construction, however, you would also look for relatively “hard” and “easy” indicators of the variable being examined.

Earlier, when we talked about attitudes regarding a woman’s right to have an abortion, we discussed several conditions that can affect people’s opinions: whether the woman is married, whether her health is endangered, and so forth. These differing conditions provide an excellent illustration of Guttman scaling.

Here are the percentages of the people in the 2006 GSS sample who supported a woman’s right to an abortion, under three different conditions:

Woman’s health is seriously endangered    87%
Pregnant as a result of rape              77%
Woman is not married                      38%

The different percentages supporting abortion under the three conditions suggest something about the different levels of support that each item indicates. For example, if someone would support abortion when the mother’s life is seriously endangered, that’s not a very strong indicator of general support for abortion, because almost everyone agreed with that. Supporting abortion for unmarried women seems a much stronger indicator of support for abortion in general; fewer than half the sample took that position.
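The reasoning above amounts to ranking the items from “easy” to “hard” by how many people endorse them. A minimal sketch, using the three percentages just given (the dictionary keys are shortened labels of my own):

```python
# Percent of the 2006 GSS sample supporting abortion under each condition.
support = {
    "woman's health is seriously endangered": 87,
    "pregnant as a result of rape": 77,
    "woman is not married": 38,
}

# The more people endorse an item, the "easier" it is, and the weaker
# it is as an indicator of general support.
easiest_to_hardest = sorted(support, key=support.get, reverse=True)

for rank, condition in enumerate(easiest_to_hardest, start=1):
    print(f"{rank}. {condition} ({support[condition]}%)")
```

The resulting order, from health to rape to unmarried, is the order in which a Guttman scale would expect pro-choice responses to accumulate.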

On each line of the rating sheet, the subject would indicate how he or she felt about the piece of music: whether it was enjoyable or unenjoyable, for example, and whether it was “somewhat” that way or “very much” so. To avoid creating a biased pattern of responses to such items, it’s a good idea to vary the placement of terms that are likely to be related to each other. Notice, for example, that “discordant” and “traditional” are on the left side of the sheet, with “harmonic” and “modern” on the right. Most likely, those selections scored as “discordant” would also be scored as “modern” as opposed to “traditional.”

Both the Likert and semantic differential formats have a greater rigor and structure than do other question formats. As I’ve indicated earlier, these formats produce data suitable to both indexing and scaling.

Guttman Scaling

Researchers today often use the scale developed by Louis Guttman. Like Bogardus, Thurstone, and Likert scaling, Guttman scaling is based on the fact that some items under consideration may prove to be more-extreme indicators of the variable than others. One example should suffice to illustrate this pattern.

The construction of a Guttman scale would begin with some of the same steps that initiate index construction. You would begin by examining the face validity of items available for analysis. Then, you would examine the bivariate and

[Figure 6-5: rating sheet. Columns: Very Much, Somewhat, Neither, Somewhat, Very Much. Rows (left term / right term): Enjoyable / Unenjoyable; Simple / Complex; Discordant / Harmonic; Traditional / Modern.]

FIGURE 6-5 Semantic Differential: Feelings about Musical Selections. The semantic differential asks respondents to describe something or someone in terms of opposing adjectives.

Guttman scale A type of composite measure used to summarize several discrete observations and to represent some more-general variable.


The final column in the table indicates the number of survey respondents who gave each of the response patterns. The great majority (1,785, or 97 percent) fit into one of the scale types. The presence of mixed types, however, indicates that the items do not form a perfect Guttman scale. (It would be extremely rare for such data to form a Guttman scale perfectly.)

Recall at this point that one of the chief functions of scaling is efficient data reduction. Scales provide a technique for presenting data in a summary form while maintaining as much of the original information as possible. When the scientific orientation items were formed into an index in our earlier discussion, respondents were given one point for each scientific response they gave. If these same three items were scored as a Guttman scale, some respondents would be assigned scale scores that would permit the most accurate reproduction of their original responses to all three items.

In the present example of attitudes regarding abortion, respondents fitting into the scale types would receive the same scores as were assigned in the index construction. Persons selecting all three pro-choice responses would still be scored 3, those who selected pro-choice responses to the two easier items and were opposed on the hardest item would be scored 2, and so on. For each of the four scale types, we could predict accurately from their scores all the actual responses given by all the respondents.

The mixed types in the table present a problem, however. The first mixed type (– + –) was scored 1 on the index to indicate only one pro-choice response. But, if 1 were assigned as a scale score, we would predict that the 44 respondents in this group had chosen only the easiest item (approving abortion when the woman’s life was endangered), and we would be making two errors for each such respondent: thinking their response pattern was (+ – –) instead of (– + –). Scale scores are assigned, therefore, with the aim of minimizing the errors that would be made in reconstructing the original responses.

Guttman scaling is based on the notion that anyone who gives a strong indicator of some variable will also give the weaker indicators. In this case, we would assume that anyone who supported abortion for unmarried women would also support it in the case of rape or of the woman’s health being threatened. Table 6-4 tests this assumption by presenting the number of respondents who gave each of the possible response patterns.

The first four response patterns in the table compose what we would call the scale types: those patterns that form a scalar structure. Following those respondents who supported abortion under all three conditions (line 1), we see that those with only two pro-choice responses (line 2) have chosen the two easier ones; those with only one such response (line 3) chose the easiest of the three (the woman’s health being endangered). And finally, there are some respondents who opposed abortion in all three circumstances (line 4).

The second part of the table presents mixed types, or those response patterns that violate the scalar structure of the items. The most radical departures from the scalar structure are the last two response patterns: those who accepted only the hardest item and those who rejected only the easiest one.

TABLE 6-4 Scaling Support for Choice of Abortion

               Women’s Health   Result of Rape   Woman Unmarried   Number of Cases
Scale types          +                +                 +                 728
                     +                +                 –                 653
                     +                –                 –                 207
                     –                –                 –                 197
                                                                 Total = 1,785
Mixed types          –                +                 –                  44
                     +                –                 +                   7
                     –                –                 +                   3
                     –                +                 +                   4
                                                                 Total = 58
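The split between scale types and mixed types in Table 6-4 can be reproduced with a short sketch. The counts come from the table; the encoding of “+” as `True` and the function name `is_scale_type` are choices made for this example.

```python
# Response patterns from Table 6-4 as (health, rape, unmarried) -> cases.
# True is a pro-choice response; items run from easiest to hardest.
pattern_counts = {
    (True, True, True): 728,
    (True, True, False): 653,
    (True, False, False): 207,
    (False, False, False): 197,
    (False, True, False): 44,
    (True, False, True): 7,
    (False, False, True): 3,
    (False, True, True): 4,
}


def is_scale_type(pattern):
    """A scale type never accepts a harder item after refusing an easier one."""
    return list(pattern) == sorted(pattern, reverse=True)


scale_cases = sum(n for p, n in pattern_counts.items() if is_scale_type(p))
total_cases = sum(pattern_counts.values())
share = scale_cases / total_cases

print(f"{scale_cases} of {total_cases} respondents fit a scale type ({share:.0%})")
```

Run on these counts, the sketch finds 1,785 of the 1,843 respondents in scale types, the 97 percent cited in the discussion of the table.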


Except in the case of perfect (100 percent) reproducibility, there is no way of saying that a set of items does or does not form a Guttman scale in any absolute sense. Virtually all sets of such items approximate a scale. As a general guideline, however, coefficients of 90 or 95 percent are the commonly used standards in this regard. If the observed reproducibility exceeds the coefficient you’ve specified, you’ll probably decide to score and use the items as a scale.

The decision concerning criteria in this regard is, of course, arbitrary. Moreover, a high degree of reproducibility does not ensure that the scale constructed in fact measures the concept under consideration, although it increases confidence that all the component items measure the same thing. Also, you should realize that a high coefficient of reproducibility is most likely when few items are involved.

One concluding remark with regard to Guttman scaling: It’s based on the structure observed among the actual data under examination. This important point is often misunderstood. It does not make sense to say that a set of questionnaire

Table 6-5 illustrates the index and scale scores that would be assigned to each of the response patterns in our example. Note that one error is made for each respondent in the mixed types. This is the minimum we can hope for in a mixed-type pattern. In the first mixed type, for example, we would erroneously predict a pro-choice response to the easiest item for each of the 44 respondents in this group, making a total of 44 errors.

The extent to which a set of empirical responses form a Guttman scale is determined by the accuracy with which the original responses can be reconstructed from the scale scores. For each of the 1,843 respondents in this example, we’ll predict three questionnaire responses, for a total of 5,529 predictions. Table 6-5 indicates that we’ll make 58 errors using the scale scores assigned. The percentage of correct predictions is called the coefficient of reproducibility: the percentage of original responses that could be reproduced by knowing the scale scores used to summarize them. In the present example, the coefficient of reproducibility is 99 percent.

TABLE 6-5 Index and Scale Scores

               Response Pattern   Number of Cases   Index Scores   Scale Scores   Total Scale Errors
Scale types         + + +               728               3              3                  0
                    + + –               653               2              2                  0
                    + – –               207               1              1                  0
                    – – –               197               0              0                  0
Mixed types         – + –                44               1              2                 44
                    + – +                 7               2              3                  7
                    – – +                 3               1              0                  3
                    – + +                 4               2              3                  4
                                                                  Total scale errors = 58

Coefficient of reproducibility = 1 – (number of errors / number of guesses) = 1 – 58 / (1,843 × 3) = 1 – 58 / 5,529 = 0.9895 = 99%

Note: This table presents one common method for scoring mixed types, but you should be advised that other methods are also used.
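The scale scores and the coefficient of reproducibility in Table 6-5 can be reproduced with a brief sketch. The counts come from the table. The rule for breaking ties (when two scale scores would produce the same number of errors, take the higher one) is an assumption I chose because it matches the scores shown in the table; as the note says, other scoring methods are also used. The function names are hypothetical.

```python
# Response patterns as (health, rape, unmarried) -> number of cases.
# True is a pro-choice response; items run from easiest to hardest.
pattern_counts = {
    (True, True, True): 728,
    (True, True, False): 653,
    (True, False, False): 207,
    (False, False, False): 197,
    (False, True, False): 44,
    (True, False, True): 7,
    (False, False, True): 3,
    (False, True, True): 4,
}
N_ITEMS = 3


def ideal_pattern(score):
    """The pattern a perfect scale predicts: the `score` easiest items accepted."""
    return tuple(i < score for i in range(N_ITEMS))


def errors(pattern, score):
    """Number of responses we would mispredict from this scale score."""
    return sum(a != b for a, b in zip(pattern, ideal_pattern(score)))


def scale_score(pattern):
    """Score with the fewest errors; ties go to the higher score (assumption)."""
    # min() keeps the first minimum it sees, so iterate from high to low.
    return min(range(N_ITEMS, -1, -1), key=lambda s: errors(pattern, s))


total_errors = sum(
    n * errors(p, scale_score(p)) for p, n in pattern_counts.items()
)
total_guesses = sum(pattern_counts.values()) * N_ITEMS
reproducibility = 1 - total_errors / total_guesses
```

With these counts the sketch arrives at 58 errors out of 5,529 predictions, or a coefficient of about 0.9895, in line with the table.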


TYPOLOGIES

This chapter now ends with a short discussion of typology construction and analysis. Recall that indexes and scales are constructed to provide ordinal measures of given variables. We attempt to assign index or scale scores to cases in such a way as to indicate a rising degree of prejudice, religiosity, conservatism, and so forth. In such cases, we’re dealing with single dimensions.

Often, however, the researcher wishes to summarize the intersection of two or more variables, thereby creating a set of categories or types, which we call a typology. You may, for example, wish to examine the political orientations of newspapers separately in terms of domestic issues and foreign policy. The fourfold presentation in Table 6-6 describes such a typology.

Newspapers in cell A of the table are conservative on both foreign policy and domestic policy; those in cell D are liberal on both. Those in cells B and C are conservative on one and liberal on the other.
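A typology like this is simply a lookup on two variables at once. Here is a minimal sketch; the cell letters follow Table 6-6, while the newspaper names are invented for illustration and the function name `newspaper_type` is hypothetical.

```python
from collections import Counter

# Cell labels follow Table 6-6; each key is (domestic policy, foreign policy).
TYPOLOGY = {
    ("conservative", "conservative"): "A",
    ("conservative", "liberal"): "B",
    ("liberal", "conservative"): "C",
    ("liberal", "liberal"): "D",
}


def newspaper_type(domestic, foreign):
    """Classify a newspaper by its stance on the two dimensions."""
    return TYPOLOGY[(domestic.lower(), foreign.lower())]


# Invented newspapers with (domestic, foreign) orientations.
papers = {
    "Daily Example": ("Conservative", "Conservative"),
    "Evening Sample": ("Conservative", "Liberal"),
    "Morning Placeholder": ("Liberal", "Liberal"),
}

types = {name: newspaper_type(*stance) for name, stance in papers.items()}
type_counts = Counter(types.values())
```

Notice that the resulting letters are labels, not scores: nothing in the code (or in the typology) says that type B is “more” or “less” than type C, which is why a typology is typically a nominal rather than an ordinal measure.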

As another example, Rodney Coates (2006) created a typology of “racial hegemony” from two dimensions:

1. Political Ideology
   a. Democratic
   b. Non-Democratic
2. Military and Industrial Sophistication
   a. Low
   b. High

He then used the typology to examine modern examples of colonial rule, with specific reference to race relations. The cases he looked at allowed him to illustrate and refine the typology. He

items (perhaps developed and used by a previous researcher) constitutes a Guttman scale. Rather, we can say only that they form a scale within a given body of data being analyzed. Scalability, then, is a sample-dependent, empirical matter. Although a set of items may form a Guttman scale among one sample of survey respondents, for example, there is no guarantee that this set will form such a scale among another sample. In this sense, then, a set of questionnaire items in and of themselves never forms a scale, but a set of empirical observations may.

This concludes our discussion of indexing and scaling. Like indexes, scales are composite measures of a variable, typically broadening the meaning of the variable beyond what might be captured by a single indicator. Both scales and indexes seek to measure variables at the ordinal level of measurement. Unlike indexes, however, scales take advantage of any intensity structure that may be present among the individual indicators. To the extent that such an intensity structure is found and the data from the people or other units of analysis comply with the logic of that intensity structure, we can have confidence that we’ve created an ordinal measure.

You can further pursue the topic of indexes and scales at the website for the Bureau of Labor Statistics, Measurement Issues in the Consumer Price Index: www.bls.gov/cpi/cpigm697.htm. The federal government’s Consumer Price Index (CPI) is one of those composite measures that affect many people’s lives, determining cost-of-living increases, in this case. This site discusses some aspects of the measure.

typology The classification (typically nominal) of observations in terms of their attributes on two or more variables. The classification of newspapers as liberal-urban, liberal-rural, conservative-urban, or conservative-rural would be an example.

TABLE 6-6 A Political Typology of Newspapers

                              Foreign Policy
                         Conservative    Liberal
Domestic Policy
  Conservative                A             B
  Liberal                     C             D


the rural newspapers are scored as type A (conservative on both dimensions) as compared with 30 percent of the urban ones. Moreover, suppose that only 5 percent of the rural newspapers are scored as type B (conservative only on domestic issues) as compared with 40 percent of the urban ones. It would be incorrect to conclude from an examination of type B that urban

points out that such a device represents Weber’s “ideal type”:

As stipulated by Weber, ideal types represent a type of abstraction from reality. These abstractions, constructed from the logical extraction of elements derived from specific examples, provide a theoretical model by which and from which we may examine reality. (2006:87)

Frequently, you arrive at a typology in the course of an attempt to construct an index or scale. The items that you felt represented a single variable appear to represent two. You might have been attempting to construct a single index of political orientations for newspapers but discovered, empirically, that foreign and domestic politics had to be kept separate.

In any event, you should be warned against a difficulty inherent in typological analysis. Whenever the typology is used as the independent variable, there will probably be no problem. In the preceding example, you might compute the percentages of newspapers in each cell that normally endorse Democratic candidates; you could then easily examine the effects of both foreign and domestic policies on political endorsements.
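Using a typology as the independent variable amounts to computing one percentage per cell. The sketch below does this in Python for a handful of newspapers; the records are entirely hypothetical and serve only to show the shape of the calculation.

```python
from collections import defaultdict

# Hypothetical records: (typology cell, normally endorses Democratic candidates?)
records = [
    ("A", False), ("A", False), ("A", True),
    ("B", True), ("B", False),
    ("C", True), ("C", True),
    ("D", True), ("D", True), ("D", False),
]


def percent_endorsing(records):
    """Return the percentage of newspapers in each cell that endorse Democrats."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [endorsing, total]
    for cell, endorses in records:
        counts[cell][0] += int(endorses)
        counts[cell][1] += 1
    return {cell: 100 * yes / total for cell, (yes, total) in sorted(counts.items())}


result = percent_endorsing(records)
for cell, pct in result.items():
    print(f"Type {cell}: {pct:.0f}% endorse Democratic candidates")
```

Comparing cells that differ on only one dimension (A with B, or C with D, for example) then suggests the effect of that dimension on endorsements.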

It’s extremely difficult, however, to analyze a typology as a dependent variable. If you want to discover why newspapers fall into the different cells of the typology, you’re in trouble. That becomes apparent when we consider the ways you might construct and read your tables. Assume, for example, that you want to examine the effects of community size on political policies. With a single dimension, you could easily determine the percentages of rural and urban newspapers that were scored conservative and liberal on your index or scale.

With a typology, however, you would have to present the distribution of the urban newspapers in your sample among types A, B, C, and D. Then you would repeat the procedure for the rural ones in the sample and compare the two distributions. Let’s suppose that 80 percent of

If I were to tell you that we had given each respondent one point for every relationship they were willing to have with sex offenders, and I told you further that a particular respondent had been given a score of 3, would you be able to reproduce each of these five answers?

1. Are you willing to let sex offenders live in your country? YES

2. Are you willing to let sex offenders live in your community? YES

3. Are you willing to let sex offenders live in your neighborhood? YES

4. Would you be willing to let a sex offender live next door to you? NO

5. Would you let your child marry a sex offender? NO

Although this logic is very clear in the case of the Bogardus social distance scale, we’ve also seen how social researchers approximate that structure in creating other types of scales, such as Thurstone and Guttman scales, which also take account of differing intensities among the indicators of a variable.
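The idea that a single score can reproduce every answer can be sketched in a few lines of Python. The item wording below paraphrases the five questions, and the function name is hypothetical; the sketch assumes a respondent whose answers follow the intensity structure perfectly.

```python
# The five relationships, ordered from least to most intense
ITEMS = [
    "live in your country",
    "live in your community",
    "live in your neighborhood",
    "live next door to you",
    "marry your child",
]


def answers_from_score(score: int) -> list[str]:
    """Reproduce the YES/NO answers implied by a social distance score."""
    if not 0 <= score <= len(ITEMS):
        raise ValueError("score must be between 0 and the number of items")
    # A score of s means YES to the s least intense items and NO to the rest
    return ["YES" if i < score else "NO" for i in range(len(ITEMS))]


for item, answer in zip(ITEMS, answers_from_score(3)):
    print(f"{answer:3}  {item}")
```

With a simple index, by contrast, a score of 3 would tell you only how many items were accepted, not which ones.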

What do you think? REVISITED


Don’t think that typologies should always be avoided in social research; often they provide the most appropriate device for understanding the data. To examine the pro-life orientation in depth, you might create a typology involving both abortion and capital punishment. Libertarianism could be seen in terms of both economic and social permissiveness. You have been warned, however, against the special difficulties involved in using typologies as dependent variables.

newspapers are more conservative on domestic issues than are rural ones because 85 percent of the rural newspapers, compared with 70 percent of the urban ones, have this characteristic. The relative sparsity of rural newspapers in type B is due to their concentration in type A. It should be apparent that an interpretation of such data would be very difficult for anything other than description.

In reality, you’d probably examine two such dimensions separately, especially if the dependent variable has more categories of responses than does the example given.
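The arithmetic behind the rural and urban comparison is easy to check. The Python sketch below uses only the percentages stated in the example (types A and B); the shares for types C and D are not given in the text, so they are left out rather than guessed.

```python
# Percent of newspapers in each type, as stated in the example
rural = {"A": 80, "B": 5}
urban = {"A": 30, "B": 40}


def domestic_conservative(distribution):
    """Types A and B are both conservative on domestic issues."""
    return distribution["A"] + distribution["B"]


print("Type B only:           rural", rural["B"], "vs. urban", urban["B"])
print("Domestic conservative: rural", domestic_conservative(rural),
      "vs. urban", domestic_conservative(urban))
```

Looking at type B alone, urban newspapers appear far more conservative on domestic issues (40 percent versus 5 percent). Adding type A reverses the picture (85 percent rural versus 70 percent urban), which is why each dimension is better examined separately.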

Main Points

Introduction

• Single indicators of variables seldom capture all the dimensions of a concept, have sufficient validity to warrant their use, or permit the desired range of variation to allow ordinal rankings. Composite measures, such as scales and indexes, solve these problems by including several indicators of a variable in one summary measure.

Indexes versus Scales

• Although both indexes and scales are intended as ordinal measures of variables, scales typically satisfy this intention better than do indexes.

• Whereas indexes are based on the simple cumulation of indicators of a variable, scales take advantage of any logical or empirical intensity structures that exist among a variable’s indicators.

Index Construction

• The principal steps in constructing an index include selecting possible items, examining their empirical relationships, scoring the index, and validating it.

• Criteria of item selection include face validity, unidimensionality, the degree of specificity with which a dimension is to be measured, and the amount of variance provided by the items.

• If different items are indeed indicators of the same variable, then they should be related empirically to one another. In constructing an index, the researcher needs to examine bivariate and multivariate relationships among the items.

• Index scoring involves deciding the desirable range of scores and determining whether items will have equal or different weights.

• Various techniques allow items to be used in an index in spite of missing data.

• Item analysis is a type of internal validation based on the relationship between individual items in the composite measure and the measure itself. External validation refers to the relationships between the composite measure and other indicators of the variable, that is, indicators not included in the measure.

of measurement in your proposal. As in the case of operationalization, you may find this easier to formulate in the case of quantitative studies, but the logic of multiple indicators may be applied to all research methods.

If your study will involve the use of composite measures, you should identify the type(s), the indicators to be used in their construction, and the methods you’ll use to create and validate them. If the study you’re planning in this series of exercises will not include composite measures, you can test your understanding of the chapter by exploring ways in which they could be used, even if you need to temporarily vary the data-collection method and/or variables you have in mind.

Review Questions

1. In your own words, what is the difference between an index and a scale?

2. Suppose you wanted to create an index for rating the quality of colleges and universities. What are three data items that might be included in such an index?

3. Why do you suppose Thurstone scales have not been used more widely in the social sciences?

4. What would be some questionnaire items that could measure attitudes toward nuclear power and that would probably form a Guttman scale?

Online Study Resources

Go to www.cengage.com/login

and click on “Create My Account” for access to this powerful online study tool. You’ll get a personalized study plan based on your responses to a diagnostic pretest. Once you’ve mastered

Scale Construction

• Four types of scaling techniques are represented by the Bogardus social distance scale, a device for measuring the varying degrees to which a person would be willing to associate with a given class of people; Thurstone scaling, a technique that uses judges to determine the intensities of different indicators; Likert scaling, a measurement technique based on the use of standardized response categories; and Guttman scaling, a method of discovering and using the empirical intensity structure among several indicators of a given variable. Guttman scaling is probably the most popular scaling technique in social research today.

• The semantic differential is a question format that asks respondents to make ratings that lie between two extremes, such as “very positive” and “very negative.”

Typologies

• A typology is a nominal composite measure often used in social research. Typologies can be used effectively as independent variables, but interpretation is difficult when they are used as dependent variables.

Key Terms

Bogardus social distance scale
external validation
Guttman scale
index
item analysis
Likert scale
scale
semantic differential
Thurstone scale
typology

Proposing Social Research: Composite Measures

This chapter has extended the issue of measurement to include those in which variables are measured by more than one indicator. What you’ve learned here may extend the discussion


resources in addition to CengageNOW to aid you in studying for your exams. For example, you’ll find Tutorial Quizzes with feedback, Internet Exercises, Flash Cards, Glossary and Crossword Puzzles, as well as Learning Objectives, GSS Data, Web Links, Essay Questions, and a Final Exam.

the material with the help of interactive learning tools, you can take a posttest to confirm that you’re ready to move on to the next chapter.

Website for The Basics of Social Research, 5th edition

At the book companion website (www.cengage.com/sociology/babbie) you’ll find many