Journal of Applied Psychology, 1989, Vol. 74, No. 3, 478-494

Validity of Personnel Decisions: A Conceptual Analysis of the Inferential and Evidential Bases

John F. Binning Illinois State University

Gerald V. Barrett University of Akron

Issues common to both the process of building psychological theories and validating personnel decisions are examined. Inferences linking psychological constructs and operational measures of constructs are organized into a conceptual framework, and validation is characterized as the process of accumulating various forms of judgmental and empirical evidence to support these inferences. The traditional concepts of construct-, content-, and criterion-related validity are unified within this framework. This unified view of validity is then contrasted with more conventional views (e.g., Uniform Guidelines, 1978), and misconceptions about the validation of employment tests are examined. Next, the process of validating predictor constructs is extended to delineate the critical inferences unique to validating performance criteria. Finally, an agenda for programmatic personnel selection research is described, emphasizing a shift in the behavioral scientist's role in the personnel selection process.

Demonstrating the validity of decisions based on psychological assessment procedures is of fundamental importance to personnel and other applied psychologists. Furthermore, few would argue with the fact that generating and articulating validity evidence is a complex process. To fully appreciate this complexity, it is important to realize that conceptions of validity have evolved over the years through the melding of legal, technical, and practical concerns about the quality and utility of personnel decisions. Inevitably, differences of interpretation and opinion have arisen as each constituency has viewed these myriad concerns from uniquely important perspectives. Perhaps equally inevitable, however, is the confusion that has grown out of these differences. Because this confusion ultimately limits the effectiveness of practitioners and theorists alike, the need for greater clarity cannot be overestimated (Guion, 1987; Landy, 1986; Tenopyr, 1986).

This article is based on the premise that all validity issues discussed in personnel contexts have some conceptual counterpart in the general process of theory development (Landy, 1986). Moreover, various departures from this "ideal" process have led to myopic, if not erroneous, conceptions of validity. To elucidate how these departures have distorted conceptions of validity, the article is divided into four major sections. In the section that immediately follows, we review how the general concept of scientific validity implies a simple model in which constructs and measures of such are inferentially linked. In the next section, we suggest that in personnel selection contexts, a conceptually truncated adaptation of this model often implicitly guides the validation of predictor-criterion relationships. This truncation has for years had an undesirably limiting influence on conceptions of validity. Perhaps its most damaging effect has been the relative neglect of criterion validity concerns. In remedial response to this, a third model is presented. This model is designed to restore and clarify the severed criterion portions of the original. Finally, suggested strategies for elaborating the proposed model and broadening conceptions of validation are discussed.

A shorter version of this article was presented at the annual meeting of the Academy of Management, Anaheim, California, August 1988.

We would like to thank the following people for helpful comments on earlier drafts of this article: Jeff Facteau, Mel Goldstein, Steven Landau, Pat Maloney, Tim Mooney, John Pryor, Pat Raymark, Glenn Reeder, Bob Rumery, Jay Thomas, Karen Williams, Kenneth York, and two anonymous reviewers.

Correspondence concerning this article should be addressed to John F. Binning, Department of Psychology, Illinois State University, Normal, Illinois 61761.

Validation Vis-a-Vis Theory Development

It is now commonly accepted that validity is not a characteristic of a test or assessment procedure but, instead, of inferences made from test or assessment information (Cronbach, 1970; Guion, 1980, 1987; Landy, 1986; Society for Industrial and Organizational Psychology, 1987; American Psychological Association, 1985). An inference is valid to the extent that it is supported by sound evidence. Expressed alternatively by Nunnally (1978), "one validates not a measuring instrument but rather some use to which a measuring instrument is put" (p. 87). Logically, therefore, to examine the concept of validity in personnel decision making, it is important to delineate (a) the types of inferences involved in applied personnel decision situations and (b) the nature of evidence that can be used to support such inferences.

Inferences Linking Psychological Constructs

Following Landy's (1986) lead, it is both appropriate and important to view the process of validating a particular selection procedure as a special case of hypothesis testing and scientific theory building. The following rudimentary characterization of the theory-building process will provide a backdrop for further discussion of some important validity concepts.

Psychological constructs are labels for clusters of covarying behaviors. In this way, a virtually infinite number of behaviors is reduced to a system of fewer labels, which simplifies and economizes the exchange of information and facilitates the process of discovering behavioral regularities. For example, it is less cumbersome to refer to the relation between verbal and quantitative ability than to the abilities to add, subtract, multiply, and divide numbers, fractions, decimals, and so forth, and their relations to reading, spelling, understanding word meanings, and so on.

Putting aside the perennial debate over the objective existence of psychological traits and psychologists' constructs (Cronbach & Meehl, 1955; Kane, 1982; Loevinger, 1957; Messick, 1981; Nunnally, 1978), viewed pragmatically, a construct is merely a hypothesis about which behaviors will reliably covary. Constructs are heuristic devices for describing behavioral domains. Of course, construct domains can vary in being large versus small, specific versus general, and fuzzy versus clearly defined (Guion, 1987; Nunnally, 1978). Also, constructs become the object of conceptual scrutiny in their own right. In other words, psychologists hypothesize both (a) whether certain behaviors will covary and (b) whether the clusters of covarying behaviors (constructs) tend to covary in meaningful ways. In this general sense, the terms construct validation and theory development imply the same basic process. Both refer to the process of identifying (and often reifying) constructs by developing measures of such constructs and examining relationships among the various measures.

Nunnally (1978) delineated the four inferences that form the core of this construct validation process. These four inferences logically bind the components of the model presented in Figure 1. One can attempt to determine whether an inferred relationship between two constructs (e.g., anxiety and manual dexterity) exists by developing measures or causal conditions for each (labeled X and Y, respectively). It is important to emphasize that these measures are nothing more than procedures for sampling behaviors within the respective construct domains. The following four inferences then follow logically:

1. X and Y relate in some specified way.
2. X is a measure of (or treatment that induces) anxiety.
3. Anxiety and manual dexterity are causally related in some specified way.
4. Y is a measure of (or treatment that induces) manual dexterity.

Even though these four inferences are interrelated, a single experiment cannot validate all four inferences simultaneously. In fact, Inference 1 is the only one that can be empirically tested directly. That is, we can use our measures of anxiety and manual dexterity to derive scores that are subsequently found to relate either experimentally or correlationally. These data serve as empirical evidence of the veridicality of Inference 1. From this one empirical finding, therefore, it would be necessary to infer the truth or falsity of the others, because Inferences 2, 3, and 4 each link an observable measure with a hypothetical construction. Of course, merely finding a correlation between X and Y leaves open several alternative interpretations of possible relationships. For example, anxiety and manual dexterity are perhaps both related to some third construct.

Figure 1. Critical inferential linkages in the theory-building process. [Diagram not recoverable from the source; surviving labels: Measure of Anxiety, Measure of Manual Dexterity, and Inference 1 linking the two measures.]
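The third-construct interpretation above can be made concrete with a small numerical sketch. The scores are invented for illustration, the variable names are hypothetical, and the partial correlation is a standard statistic brought in for this example only; it is not a procedure proposed in this article. The sketch shows two measures, X and Y, that correlate strongly even though most of their association is carried by a third variable, Z.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of X and Y after statistically removing Z from both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical scores for eight examinees (assumption: invented data).
z = [1, 2, 3, 4, 5, 6, 7, 8]  # unmeasured third construct
x = [2, 1, 4, 3, 6, 5, 8, 7]  # measure X (e.g., an anxiety scale)
y = [2, 3, 2, 3, 6, 7, 6, 7]  # measure Y (e.g., a manual dexterity task)

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(r_xy, pearson(x, z), pearson(y, z))
print(round(r_xy, 2), round(r_xy_given_z, 2))
```

In this toy data set the X-Y correlation is large, whereas the correlation that remains after Z is partialled out is close to zero. Evidence for Inference 1 alone therefore does not settle Inference 3.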

To provide incontrovertible proof that the four inferences are correct, it would be necessary to empirically demonstrate three of the inferences. If three of the linkages are unequivocally proven correct, then complete confidence in the fourth would be justified. However, because this direct empirical proof is impossible (Nunnally, 1978), typical practice is to assume that two of the three inferences (2, 3, or 4) are correct and this, combined with empirical evidence of Inference 1, allows a valid conclusion regarding the remaining inference. Generally, these conclusions about construct validity are strengthened in those situations in which the truth of the assumptions is obvious to everyone scrutinizing the conclusions drawn. Specifically, we are more confident that a test validly measures a given construct if (a) the behavioral domain of the other construct is explicitly defined and (b) the assumption of a relationship between the two constructs is unarguable (Nunnally, 1978).

The Three Faces of Construct Validity

To avoid confusion, it is important to realize that the term construct validity has thus far been used to describe the soundness of evidence supporting any of the four inferences. Thus, the term is being used in its most general sense in reference to construct-construct links (Inference 3), construct-measure links (Inferences 2 and 4), and measure-measure links (Inference 1). However, what have traditionally been of particular concern to research psychologists and psychometricians are construct-measure links (i.e., Inference 2 or 4). In the heyday of trait psychology, construct validity often referred to whether a given test or measurement procedure allowed accurate inferences about an individual's standing on a psychological construct of particular interest (D. T. Campbell & Fiske, 1959; Cronbach & Meehl, 1955; Ebel, 1977; Guion, 1980; Messick, 1980). These two uses of the term construct validity (equal concern for Inferences 1, 2, 3, & 4 vs. primary concern for only Inference 2 or 4) are clearly congruent. In fact, the difference in perspective was recognized by Loevinger (1957) when she referred to the validity of the construct versus the validity of the test as a measure of the construct (Landy, 1986). If theory building is of primary interest, Inferences 1, 2, 3, and 4 are all of equal importance. On the other hand, in specific situations (e.g., development of a new test), Inference 2 (or 4) is emphasized. This becomes potentially more confusing because the term construct validity has a somewhat different connotation in the personnel selection literature. Here, it has been frequently used to describe a specific evidential approach for justifying a specific measure-construct link (i.e., the predictor-performance linkage portrayed in Figure 2) by documenting underlying construct-construct and construct-measure links (Schwab, 1980). The inferences implicated in this latter meaning are described in detail in the next section. Perhaps the most important issue at this juncture is to realize that these various meanings of the term construct validity are nothing more than different views of the same logical system, with varying emphasis on different inferences.

Examining Traditional Conceptions of Validity

A common conception of the personnel selection process involves (a) analysis of the job to determine (b) a performance domain, defined in terms of job behaviors or outcomes, which then guides (c) the selection or development of certain assessment procedures, which make possible (d) predictions about the likelihood that applicants will perform the job with a certain degree of proficiency, and then subsequently (e) evaluating individual performance by some operational criterion measure (e.g., Cascio, 1987; Muchinsky, 1987; Society for I/O Psychology, 1987; APA, 1985; Uniform Guidelines, 1978). This process implies a framework, presented in Figure 2, which parallels Figure 1 in many respects. The framework represented in Figure 2 portrays the following inferences:

5. Predictor measurements relate to criterion measurements.
6. The predictor measure is an adequate sample from a psychological construct domain.
7. The predictor construct domain overlaps with the performance domain.
8. The criterion measure is an adequate sample from the performance domain.
9. The predictor measure is related to the performance domain.

These inferences serve to link the components in Figure 2 analogously to the inferences in Figure 1. It is important to realize, however, that in the transition from Figure 1 to Figure 2, two important differences have arisen. First, an additional measure-construct linkage (Inference 9) has been created, linking the predictor measure and the performance domain. Second, rather than equal emphasis being placed on all inferences, this additional measure-construct (Inference 9) link has taken on greater relative importance.

The implications of this way of thinking for understanding the validation process are explored in the discussion that follows. Before detailing how validation of personnel selection decisions is merely a special case of the more general construct validation process, it would be helpful to discuss the process of conceptualizing and constructing behavioral domains.

Figure 2. A common conception of the inferences for personnel selection.

Contrasting Predictor Construct Domains and Performance Domains

In an attempt to simplify the virtually infinite number of behaviors that can be exhibited by human beings, psychologists attempt to identify naturally occurring clusters, then construct labels for them, and investigate the covariance between them. Predictor constructs, therefore, represent those clusters of covariant behaviors identified through psychological research and constructed to enhance our general understanding of behavior.

In contrast to the psychologists' search for naturally occurring behavioral construct domains across myriad situations, organizational designers in effect create behavioral domains to enhance their understanding and prediction of job behavior. In fact, it is important to realize that from our pragmatic viewpoint, a job performance domain is a construct, albeit in a conceptually different sense than is usually implied in the psychological literature. Nonetheless, the performance of any job in any organization is a cluster of interlocked and covariant behaviors, and this cluster consists of a subset of all possible behaviors necessary for the organization to accomplish its broader goals and objectives (Weick, 1979). Just as psychological constructs represent behavioral domains, performance associated with a job (or distinguishable aspects of job performance) represents a behavioral domain.

Performance domains are conceptually distinct from predictor constructs in that the universe to be sampled is delineated differently. Construct domains on the predictor side are conceived of by the research psychologist with reference to some theoretical framework developed to explain general regularities in human behavior. Performance domains are determined, or at least influenced, by organizational decision makers and selection specialists collaborating to translate broad organizational objectives into normative statements of valued behaviors and outcomes.

The overriding reason for constructing behavioral domains on both the predictor and the performance side is the parceling of myriad behaviors into meaningful clusters to enhance understanding and communication. However, this parceling process is different on the predictor versus the performance side because of differences in (a) the conceptualization of predictor domains versus performance domains, (b) the specific purposes for specifying behavioral domains, (c) the methods used to cluster behaviors, and often (d) the language system used to communicate about the resulting construct systems.

First, the source of covariance between job behaviors is designed by the organization and induced by various external control and coordination mechanisms (Mintzberg, 1983). This can be contrasted with the naturally occurring covariance resulting from the individual's personal predispositions or the interaction of these predispositions with untold environmental influences, as is often conceptualized for psychological constructs. Second, predictor constructs are clusters of behaviors created by research psychologists to capture general regularities in behavior. Performance domains are designed more or less rationally to interlock in such a way as to maximize efficient attainment of organizational goals. As a result, performance domains are clusters of behavior-outcome units that are differentially valued by the organization. Depending on which goals are most operative, clustering systems can vary considerably (Campion & Thayer, 1985; Griffin, 1982; Harvey, 1986).

Third, through psychologists' use of both correlational (e.g., factor analytic) and experimental methods, behavior is empirically examined, and reliable regularities are conceptualized and assigned construct labels. In contrast, organizational designers typically rely on rational and relatively informal methods for delineating performance domains. Finally, construct terminology on the predictor side reflects the concern for identifying domains of behavior caused by personal dispositions (e.g., "she is extraverted"). Terminology used in organizations to describe performance domains is more often goal-related (e.g., "she is customer-service oriented"). These different terminologies for describing behavioral domains have been discussed extensively elsewhere (Fleishman, 1982; Pearlman, 1980).

On the predictor side, behaviors can be clustered hierarchically in varying levels of inclusiveness. For example, Dunnette (1976) reviewed attempts to conceptualize intellectual functioning by pointing out that behavior can be grouped into a single, global construct (i.e., the g factor), several less inclusive constructs (e.g., Thurstone's, 1938, seven primary mental abilities), or literally hundreds of constructs (e.g., Guilford's, 1967, structure of intellect model).

Organizations also conceptualize job behavior at different levels of inclusiveness, depending on the purpose at hand. For purposes of administrative decision making, global "constructions" of job behavior are evoked, because the overriding imperative is the comparison of employees' overall contribution to the organization. In this situation, the constructed system of performance domains can be conceptualized as merely a system of different job titles, each connoting a different domain of behaviors. On the other hand, when remedial feedback about job behavior is required, organizations often cluster performance into a number of behaviorally meaningful dimensions. For example, the process of constructing behaviorally anchored rating scales (Bernardin & P. C. Smith, 1981; P. C. Smith & Kendall, 1963) can be viewed as resulting in a performance construct system that enhances intraorganizational communication and decision making about job performance in a specific organization (Feldman, 1986). A source of difficulty for many job analysts and performance appraisal system designers is the fact that the organization's conceptual system for describing performance is ineffective for certain purposes.

Performance domains result from the division of labor fundamental to organizing human activity. The conceptualization and resulting terminology used in an organization to describe performance differences (both within and between jobs) serve to make behavior in organizations more understandable and orderly. Building on Weick's (1979) basic tenet that organizing is a "consensually validated grammar for reducing equivocality by means of sensible interlocked behavior" (p. 3), we propose that performance construct systems are an important part of the grammar and culture of a given organization. The manner in which performance behavior is clustered and labeled is part of the consensually validated conceptual scheme that helps make sense out of the complex stream of interlocked behaviors in the organization.

Viewed from this perspective, selection decisions represent attempts to identify regularities in applicants' behavior, but only those behaviors identified by the organization as valuable for coordination with others' behavior that are necessary for goal attainment. Personnel selection, then, is the process of identifying and mapping predictor samples of behavior to effectively overlap with performance domains. Validity, therefore, can be viewed as the extent to which these two construct systems overlap.

The "Unitarian" Conception of Validity

The trilogy of construct, content, and criterion-related validities was first articulated in the "Technical Recommendations for Psychological Tests and Diagnostic Techniques" (American Psychological Association, American Educational Research Association, & National Council of Measurement Used in Education, 1954). As noted by Landy (1986), this trilogy was quite valuable in that it enhanced the clarity with which validity concepts were typically discussed at the time. As is the case with many popular conceptualizations, however, its initial usefulness was replaced by growing confusion. This confusion is due, in part, to the tendency for certain erroneous interpretations, misconceptions, and legal mandates to become crystallized as part of professional psychology's conventional wisdom (G. V. Barrett, 1972) or tenets of orthodoxy (Guion, 1976). It was many years before this conventional wisdom was questioned in a systematic way (Dunnette & Borman, 1979; Guion, 1977, 1978, 1980; Messick, 1975, 1980; Tenopyr, 1977; Tenopyr & Oeltjen, 1982) and, by then, confusion was running rampant.

For many years, the concepts of construct, content, and criterion-related validity have been described as different types of validity. Some recent descriptions have gone so far as to suggest that each of these validity analysis strategies (Lawshe, 1985) should be chosen according to the kinds of inferences or conclusions one wishes to make about job applicants (e.g., Lawshe, 1985; Saal & Knight, 1988) or the nature of the selection procedure (e.g., R. S. Barrett, 1980). Although the latter view has to some degree been induced by the prevailing opinion in Title VII litigation, this linking of different validities to different inferences or types of predictors is logically problematic because of the implication that in any given decision situation, only one of the three validity concepts is useful. On the contrary, an inference drawn from currently available information about some aspect of future job performance (Inference 9) is the single overriding inference; and content-, construct-, and criterion-related considerations are all quite relevant for justifying its validity. These three concepts are more appropriately viewed as labels for three evidential bases (Messick, 1980) from which inferences about future job performance can be supported or justified.

The applied decision maker is concerned about the extent to which test or assessment information will allow accurate predictions about subsequent job performance (Inference 9). One general approach to justifying Inference 9 would be to generate direct empirical evidence that assessment scores relate to valid measurements of job performance (Sussmann & Robertson, 1986). Inference 5 represents this linkage, which has historically been of primary pragmatic concern to personnel psychologists. The term criterion-related has traditionally been used to denote this type of evidence and, in fact, often implies the unnecessary restriction that only correlational evidence is appropriate. As Landy (1986) ably pointed out, substantive theories are seldom, if ever, built solely on correlational evidence. Viewed in this way, criterion-related evidence can be experimental and quasi-experimental in nature.

Why, therefore, have personnel specialists relied so heavily on correlational evidence of validity? This bias might derive from the fact that in personnel selection situations, the constructs of interest are conceived of as enduring person characteristics (e.g., abilities) on the predictor side and fixed job performance measures on the criterion side. Neither of these is typically thought to be amenable to experimental manipulation, except under the most contrived laboratory conditions. Perhaps another factor contributing to this bias in favor of correlational evidence was the conventionally held belief in the situational specificity of validity (Schmidt & Hunter, 1981), which would preclude the use of laboratory analogues to real work settings.

Logically, then, one approach for justifying Inference 9 is to empirically link the predictor and the criterion. However, this results in only partial justification. Analogous to the validation of inferences in Figure 1, to have complete confidence in the validity of Inference 9, both Inferences 5 and 8 must be justified. The relative neglect of Inference 8 by those collecting criterion-related evidence represents a critical truncation of the validation process. Suffice it to say that for criterion-related evidence to be a compelling argument for Inference 9, strong evidence of both Inferences 5 and 8 is required.

What personnel specialists have traditionally implied by the label construct validity is tied to Inferences 6 and 7. Analogous to the logic presented earlier, it is assumed that if Inferences 6 and 7 can be supported by sound evidence, then one can confidently believe Inference 9 to be true. The difference is merely one of focus. Therefore, the general conception of construct validity is merely viewed differently in the context of validating personnel selection decisions. In a selection context, Inference 9 is most critical. If it can be shown that a test measures a specific construct (Inference 6) that has been determined to be critical for job performance (Inference 7), then inferences about job performance from test scores (Inference 9) are, by logical implication, justified.

How does a personnel selection specialist support Inferences 6 and 7? Evidence supporting Inference 6 primarily takes the form of empirically based relationships and judgments that are both convergent and discriminant in nature (D. T. Campbell & Fiske, 1959; Cook & D. T. Campbell, 1979; Cronbach & Meehl, 1955; Drasgow & Miller, 1982; Rezmovic & Rezmovic, 1981). Convergent evidence exists when (a) test scores relate to scores on other tests of the same construct, (b) test scores from people who differ in the extent to which they possess the focal construct also differ in a predictable way, or (c) test scores relate to scores on tests of other constructs that are theoretically expected to be related. Discriminant evidence occurs when test scores do not relate to scores on tests of theoretically independent constructs. Note that this discussion can apply equally to criterion measurement (Inference 8).
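As a rough illustration of the convergent and discriminant pattern just described, the sketch below correlates a focal test with a second test of the same construct and with a test of a theoretically independent construct. All scores and variable names are hypothetical and were invented for this example; an actual study would use far larger samples and multiple methods.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for six examinees (assumption: invented data).
focal_test = [10, 12, 14, 16, 18, 20]
same_construct_test = [11, 12, 15, 15, 19, 21]      # expected to converge
independent_construct_test = [7, 5, 9, 9, 5, 7]     # expected not to relate

convergent_r = pearson(focal_test, same_construct_test)
discriminant_r = pearson(focal_test, independent_construct_test)
print(round(convergent_r, 2), round(discriminant_r, 2))
```

A high convergent correlation together with a near-zero discriminant correlation is the pattern that would count as support for Inference 6; the same check could be run on criterion measures in support of Inference 8.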

Inference 7, because it links two hypothetical behavioral domains, cannot be examined empirically. Analogous to Inference 3 in Figure 1, Inference 7 must be justified theoretically and logically on the basis of accumulated knowledge of construct-construct relations. On closer conceptual scrutiny, however, the analogy loses its relevance because the two constructs being related do not share common nomological (Margenau, 1950) status. The unique conceptual issues that arise when relating predictor and performance domains are examined in detail later. For now, to the extent that Inferences 6 and 7 are supported, the use of the predictor test to predict job performance is construct valid.

A third approach for justifying Inference 9 involves demonstrating that the predictor is isomorphic and obviously interchangeable with the performance domain. This line of reasoning is particularly defensible when it is realized that predictor tests are always samples of behavior from which we infer something about behavior on a job (Dunnette, 1963). The behaviors sampled may be dissimilar or similar ("sign vs. sample") to the criterion behaviors being predicted (Wernimont & Campbell, 1968). If an applicant performs behaviors as part of the assessment phase that closely resemble behaviors in the performance domain, then many personnel specialists feel that, logically, the inference about future job performance is better justified. This line of reasoning underlies the type of evidence traditionally labeled content validity. Of course, various specific procedures for analyzing the degree of isomorphism between predictors and criteria have been proposed (Doran, 1987; Faley & Sundstrom, 1985; Hamilton, 1981; Lawshe, 1975; Schmitt & Ostroff, 1986; Trattner, 1982), but the same basic logic underlies each.
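Among the procedures cited above, Lawshe's (1975) content validity ratio is commonly stated as CVR = (n_e - N/2) / (N/2), where N is the number of panelists and n_e is the number who rate an item as essential. The article itself does not present this formula; the sketch below is offered only as one concrete instance of quantifying predictor-domain overlap, and the function name and panel counts are hypothetical.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe-style CVR: ranges from -1 (no one rates the item essential)
    through 0 (exactly half do) to +1 (everyone does)."""
    if n_panelists <= 0 or not 0 <= n_essential <= n_panelists:
        raise ValueError("need 0 <= n_essential <= n_panelists and n_panelists > 0")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel: 8 of 10 subject matter experts rate an item essential.
cvr = content_validity_ratio(8, 10)
print(cvr)
```

Items with values near +1 would be retained as clearly sampling the performance domain; where to set the cutoff is a separate judgment that this sketch does not address.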

Content-related evidence of validity has traditionally involved justifying Inference 9 by rational examination of the manner in which the performance domain is sampled by the predictor. Analogous to statistical sampling theory, if a predictor sample is constructed in congruence with certain principles (e.g., ensuring representativeness as well as relevance of the sample), one can assume that scores from that sample will accurately estimate the universe from which the sample is drawn. It is this emphasis on operationalization and sample construction that motivated Tenopyr (1977) to refer to content validation as "content-oriented test construction" (p. 52). Therefore, when a selection specialist can rationally defend the strategy for sampling the performance domain used in a given testing situation, content-related validity evidence supports the inference that scores from the test are valid for predicting future performance.

Decision Validity Versus Predictor Development

Thus far, the concepts of construct-, content-, and criterion-related evidence have been discussed solely as evidential bases for justifying decision validity. However, the implications of differences between the three can be traced back in the decision-making process. By doing so, their differences can be more clearly appreciated.

Personnel decision making involves two fundamental phases:

(a) constructing the predictor as a sample of some behavioral

domain and (b) using this behavioral information to make pre-

dictions about future job behavior. This latter data combination

phase is the immediate precursor to employment decisions and

has therefore received considerable legal and professional scru-

tiny. Yet, the data collection phase, which involves specifying

the behavioral data base, has equally important implications for

subsequent decision quality (Sawyer, 1966). The respective roles

of the construct-, content-, and criterion-related concepts in the

development of predictor samples of behavior deserve conceptual

scrutiny.

With reference to Figure 2, the point of departure for the

development of any personnel selection system is the perfor-

mance domain. From this delineation of desirable job behaviors

or outcomes, selection specialists "work backwards" to specify

which behaviors or outcomes should be sampled by the predic-

tors. There are three routes from the performance domain to

predictor development: The construct-related approach in-

volves identifying psychological construct domains that overlap

significantly with the performance domain (Inference 7) and

then developing predictors that adequately sample these con-

struct domains (Inference 6). The content-related approach in-

volves developing predictors that directly sample the perfor-

mance domain. The criterion-related approach involves

developing some operational measure of behaviors in the perfor-

mance domain (Inference 8) and then identifying or developing

predictors that will relate empirically with the operational crite-

rion measure (Inference 5).

We would like to draw attention to a fundamental difference

between the criterion-related approach and the other two ap-

proaches. The criterion is merely an operational sample of the

performance domain. At its best—that is, being neither defi-

cient nor contaminated—it taps the entire performance do-

main, and the criterion-related approach reduces logically to

the content-related approach. At its worst, it represents an

atheoretical and circuitous, if not entirely misleading, route

to predictor development (e.g., "dust-bowl empiricism"). From

this perspective, we propose that the construct-related and con-

tent-related approaches represent the two fundamental predic-

tor sampling strategies. Construct-related implies that predictor

sampling is guided by evoking a psychological construct do-

main. Content-related implies that predictor sampling is guided

by evoking a performance domain. To the extent that the two

domains are derived differently and relations between the two

are not well understood, construct- and content-related ap-

proaches can lead to substantive differences in predictor devel-

opment and consequent decision validity (R. S. Barrett, 1980).

In contrast with the construct- and content-related approaches,

the criterion-related approach is best characterized as a re-

search strategy for empirically assessing the quality of either

predictor sampling strategy. Viewed from this perspective, judg-

ments of validity are tantamount to judgments about the ade-

quacy of behavior sampling (construct- and content-related) or

empirical indexes of such adequacy (criterion-related).
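The dependence of criterion-related evidence on criterion quality can be sketched with a small simulation. The example below is only an illustration under stated assumptions: performance is modeled as the sum of two independent components, the predictor taps only the first, the deficient criterion omits that component, and the contaminated criterion is biased by knowledge of predictor scores. All variable names, weights, and sample sizes are hypothetical and are not taken from the article.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_criterion_quality(n=20000, seed=1):
    """Correlate one predictor with three versions of the 'criterion'."""
    rng = random.Random(seed)
    predictor, performance, deficient, contaminated = [], [], [], []
    for _ in range(n):
        comp_a = rng.gauss(0, 1)  # performance component tapped by the predictor
        comp_b = rng.gauss(0, 1)  # performance component the predictor ignores
        x = comp_a + rng.gauss(0, 1)  # predictor score (component a plus error)
        p = comp_a + comp_b           # full performance domain (assumed)
        predictor.append(x)
        performance.append(p)
        # Deficient criterion: measures only component b, so it omits the
        # part of performance that the predictor actually reflects.
        deficient.append(comp_b + rng.gauss(0, 0.5))
        # Contaminated criterion: full performance plus bias from raters
        # who know the predictor scores (hypothetical weight of 1.5).
        contaminated.append(p + 1.5 * x)
    return {
        "performance_domain": pearson_r(predictor, performance),
        "deficient_criterion": pearson_r(predictor, deficient),
        "contaminated_criterion": pearson_r(predictor, contaminated),
    }


validities = simulate_criterion_quality()
for label, r in validities.items():
    print(f"{label}: r = {r:.2f}")
```

Under these assumptions the same predictor appears nearly useless against the deficient criterion and far better than it really is against the contaminated one, which is why an empirical index is only as informative as the criterion sampling behind it.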

Generating Evidence for Decision Validity

There has been considerable debate over the years regarding

whether the construct-related or the content-related viewpoint

provides the more fruitful model for guiding predictor development

and subsequent decision making. For instance, a

fundamental conceptual issue was raised when Wernimont and

J. P. Campbell (1968) argued that the classic validity model and

its emphasis on predictor tests as signs of underlying constructs

should be replaced by the behavioral consistency approach in

which predictors represent samples of job behavior. Upon

closer examination, this issue is really one of how predictor do-

mains should best be specified. If the predictor test is labeled a

sign, it implies that the behavior domain was specified by the

theory surrounding a psychological construct. If the predictor

test is labeled a sample, it implies that the behavior domain was

specified by the "theory" surrounding job performance. Ulti-

mately, the resolution of this controversy depends on one's real-

izing that the two approaches are inextricably intertwined in

the inferential system portrayed in Figure 2. Interestingly, this

distinction parallels in certain respects the long-standing con-

troversy in personality psychology between traditional trait ver-

sus situationalist approaches for predicting behavior (Mischel,

1973). The issue of whether intrinsic attributes or environmental

characteristics are the more potent influences on behavior

is certainly not fully resolved; however, advances have been

made to integrate the trait-situation perspectives (Kenrick &

Funder, 1988; Mischel, 1984; Schneider, 1987).

In our view, personnel psychologists should never avidly rec-

ommend the abandonment of construct-based theory develop-

ment, because it is the hallmark of fruitful scientific inquiry.

Tenopyr (1977) pointed out that for a test to have high predic-

tive value, it must share the same psychological constructs that

underlie job behavior. This view recognizes that content speci-

fication is part of the construct validation process. That is, part

of justifying that a test measures a given construct is the exami-

nation of the internal structure of the test to assess the extent to

which it is consistent with the theory surrounding the construct.

Irrespective of this conceptual unity, the construct versus con-

tent perspectives are explicitly recognized in the Uniform

Guidelines (1978), Standards (1985), and Principles (1987),

and therefore it is pragmatically important to draw clear opera-

tional and semantic distinctions between them.

Criterion-related evidence is by its nature empirical, whereas

content-related and construct-related evidence are typically

conceived of as relying more on human judgment and thus are

used differently to justify inferences from test scores. Perhaps

because the precision introduced by careful quantification of

psychological phenomena is fundamental to scientific inquiry,

criterion-related evidence has been endorsed as legally superior

to the other two forms of evidence (Uniform Guidelines, 1978).

However, the scientific superiority of criterion-related evidence

has not received this endorsement in professional guidelines

(e.g., Society for I/O Psychology, 1987; APA, 1985).

Although this issue has been debated for years, the present

framework makes it clear that there is no inherent or immutable

superiority of criterion-related evidence (especially when re-

stricted in form to predictor-criterion correlations) over other

lines of evidence. An uncritical bias in favor of criterion-related

evidence can have deleterious effects on theoretical understand-

ing. Validation research for assessment centers provides a case

in point. After reviewing the empirical validity evidence, Kli-

moski and Brickner (1987) concluded that despite consistent

empirical evidence, the theoretical explanations of "the predic-

tive validity of assessment centers remains a puzzle" (p. 256).

G. V. Barrett, Alexander, O'Connor, and Forbes (1978) argued

that coincidental empirical relationships can be discovered

when relying on a "dust-bowl empiricism" approach to valida-

tion. These coincidental relationships, when atheoretically dis-

covered and interpreted, will detract from our ultimate under-

standing of complex criterion behavior, because they lack what

Guion (1980) has called job-relatedness. The work of Schmidt

and Hunter (1981) also casts suspicion on the reliability of evi-

dence from individual criterion-related validity studies due to

the excessive sampling error that results from the use of small

validation samples.
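The sampling-error point can be illustrated with a brief simulation. This is a minimal sketch, assuming a bivariate normal population with a fixed population validity of .30 and comparing many hypothetical studies of 30 versus 300 cases; the function names, the population value, and the sample sizes are illustrative assumptions and are not figures reported by Schmidt and Hunter (1981).

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_validity_studies(true_r, n, n_studies, seed=0):
    """Return the observed validity coefficient from each simulated study."""
    rng = random.Random(seed)
    observed = []
    for _ in range(n_studies):
        predictor, criterion = [], []
        for _ in range(n):
            x = rng.gauss(0, 1)
            noise = rng.gauss(0, 1)
            predictor.append(x)
            # Construct a criterion whose population correlation with x is true_r.
            criterion.append(true_r * x + math.sqrt(1 - true_r ** 2) * noise)
        observed.append(pearson_r(predictor, criterion))
    return observed


# Hypothetical population validity of .30, studied with small and large samples.
small = simulate_validity_studies(true_r=0.30, n=30, n_studies=500)
large = simulate_validity_studies(true_r=0.30, n=300, n_studies=500)
spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)
print(f"n = 30:  SD of observed r = {spread_small:.3f}")
print(f"n = 300: SD of observed r = {spread_large:.3f}")
```

Although every simulated study draws from the same population, the small-sample coefficients scatter much more widely, so a single small study can easily suggest that a valid test is invalid, or the reverse.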

One could reasonably argue that content-related and con-

struct-related evidence, when based on sound professional

judgment about appropriate test use, are often superior to crite-

rion-related evidence. Research does indicate that pooled esti-

mates of criterion-related validity, based on the opinions of per-

sonnel psychologists, are more accurate than empirical evi-

dence obtained from small-sample validity studies (Hirsch,

Schmidt, & Hunter, 1986; Schmidt, Hunter, Croll, & McKen-

zie, 1983). The traditional emphasis placed on criterion-related

evidence may suggest only that evidence largely based on judg-

ment is more likely to be questioned because of the widely held

belief that judgments are inherently fallible. It may also suggest

that people do not fully realize the subjective nature of judg-

ments about the relevance of criterion measures.

In some discussions of validity, an appeal is made to follow

"professionally accepted procedures" for generating evidence

(Lawshe, 1985). It is easily argued that there is nothing ap-

proaching a specific, unambiguous set of professionally ac-

cepted standards for determining the validity of inferences from

test scores (Landy, 1986). Even casual examination of legal tes-

timony and professional literature indicates that no consensus

exists regarding issues of content-related validation (Kleiman

& Faley, 1978); concurrent versus predictive validation (G. V.

Barrett, Phillips, & Alexander, 1981; Guion & Cranny, 1982;

Schmitt, Gooding, Noe, & Kirsch, 1984); adequate sample

sizes (Monahan & Muchinsky, 1983; Schmidt, Hunter, & Urry,

1976); validity generalization (Burke, 1984; Callender & Osburn,

1980; Gutenberg, Arvey, Osburn, & Jeanneret, 1983;

James, Demaree, & Mulaik, 1986; Schmitt & Noe, 1986); and

criterion development (G. V. Barrett & Kernan, 1987; Kleiman

& Durham, 1981), to mention only a few. As a matter of fact,

the validation of certain inferences is actually being made by

the courts rather than by professionals who regularly use the

measuring instruments upon which these inferences are based.

For instance, from a breathalyzer reading of .10, the courts infer

that driving ability is impaired. This may not be a valid infer-

ence, even though the test, by all psychometric and professional

standards, may be valid for measuring blood alcohol levels. A

number of the same issues will likely arise regarding the use of

polygraphs, drug tests, and genetic screening. Similarly, some

of the early courts attempted to mandate that a certain validity

coefficient, per se, made a test valid or not. Clearly, the process

of drawing inferences from test scores is a very complex one,

particularly if one considers the interrelated roles of technical,

practical, and legal opinion.

Which Inference Is Which?

Throughout this article, the goal has been to delineate the

inferences that logically underlie the process of validating psy-

chological constructs and measurement procedures. However,

there is another sense in which multiple inferences are discussed

by personnel decision makers. There are many potential infer-

ences about future job behaviors that may be drawn from the

same test scores. These multiple inferences about future job be-

haviors should not be confused with the inferences represented

in Figures 1 and 2. The many different inferences about future

job behavior are merely specific examples of Inference 9 in Fig-

ure 2. It is possible to conceive of tests that yield scores that are

unquestionably valid indicators of some underlying, theoreti-

cally meaningful construct (see Ebel's, 1961, discussion of mea-

surement in the physical sciences). Of course, what is valid is

the inference that test scores reflect differences in the construct

(e.g., Inference 6), but this is conceptually quite different from

inferences about future job behavior drawn from test scores (In-

ference 9). Guion (1974, 1987) highlighted this distinction by

differentiating between job relevance and validity of trait mea-

surement. With enough theoretical and empirical corrobora-

tion, it can be confidently concluded that test scores and con-

struct differences covary systematically. Therefore, test scores

make valid inferences about the construct possible. One may also

attempt to infer whether a person who scores in a particular

way on the test will perform in a certain way on a job, training

program, and so on, but these are quite different kinds of

inferences. The test may not be valid for some or all of these

purposes, because each implies a different criterion to be pre-

dicted. Likewise, the inferences about job performance may

be valid, but inferences about other outcomes (e.g., tenure)

may not.

It is important to make this distinction to emphasize that a

test can be construct valid (in the sense that it validly measures

a given construct) and yet certain inferences about future be-

havior may not be valid. For selection purposes, then, this test

would not be construct valid in the traditional Title VII sense.

It is this differential past usage of the term construct validity that motivated Guion (1980) to "identify the unifying concept of validity as similar, but not necessarily identical, to what has been meant by construct validity" (p. 393). Perhaps some future confusion could be allayed by using the term validity in the unifying sense to refer to the justifiable confidence in our selection decision. Construct-related validity should be reserved for references to a particular evidential approach to demonstrating validity that focuses on justifying certain critical construct-measure and construct-construct inferences (Inferences 6 and 7).

Figure 3. A modified framework detailing the inferences for criterion development.

Construct-Related Validity of the Criterion

The discussion thus far has reflected a common view of vali-

dation. That is, personnel specialists generally place more em-

phasis on validity of the predictors because the overriding or-

ganizational imperative is to gather predictor information as

the basis for important and inevitable selection decisions. How-

ever, we would like to call attention to Cascio's (1987) statement

that "in order to emerge from the 'dark ages' we need clear

thinking, in-depth theorizing about criteria, and identification

of the goals of criterion measurement" (p. 51). The importance

of this statement should become clearer on examination of Fig-

ure 3, which represents an adaptation of the validation para-

digm presented in Figure 2. The model presented in Figure 3

was designed specifically to link several traditional concepts

unique to personnel decision making and to highlight the con-

ceptual differences between predictor construct and perfor-

mance domains.

Note that the systems of inferences detailed in Figures 1, 2,

and 3 are logically symmetrical. Therefore, the issues raised

when discussing validity of the predictor are equally important

for validating criterion measures (Frederiksen, 1986). The ca-

veat proposed here is that criterion measures must be validated

analogously to predictors (Guion, 1961, 1976, 1987; James,

1973), with reference to the inferential linkages being sup-

ported by evidence. The importance of this point is often under-

estimated.

In a typical selection situation, again, Inference 9 is the criti-

cal inference for which confirming evidence is required. The

validation process involves accumulating evidence of various

forms to justify Inference 9 in either a direct empirical way (e.g.,

validity coefficients, contrasted groups, or test construction

analyses) or more judgmentally by confirmation of Inferences

6, 7, and 8. Inference 7 represents whether a specific psychological

construct underlies job performance, whereas Inference 8

represents whether the operational criterion samples the perfor-

mance domain. Generating evidence for Inferences 7 and 8 is

the process of accumulating construct-related evidence of crite-

rion validity.

The present framework helps to identify possible loci for the

criterion problem. The problem results from a tendency to truncate the

nomological network (specifically, Inferences 7, 8, and 10), which

in turn leads to a myopic view of criterion validity. Two interre-

lated effects of this myopia are likely to result. First, the develop-

ment of criterion measures is likely to be less psychometrically

rigorous than predictor development. Wiggins (1973) stated

that "basically, the 'problem' resides in the considerable dis-

crepancy that typically exists between our intuitive standards

of what criteria of performance should entail and the measures

that are currently employed for evaluating such criteria" (p.

39). Second, performance criteria are likely to be less deeply or

richly embedded in networks of theoretical relationships than

are constructs on the predictor side. Perhaps this state of affairs

has resulted partially from the differences between research for

administrative prediction and research for scientific under-

standing (Anderson & Shanteau, 1977; Loevinger, 1957). Re-

search for prediction tends to ignore the importance of multide-

terminant functional relationships between variables.

The value of an employee's behavior or accomplishments to

an organization is ultimately a relative value judgment by some

member or members of the organization (Fiske, 1951). Stated

strongly, Campbell (1983) maintains that "the meaning of per-

formance is not something to be 'discovered'; it should be im-

posed" (p. 286). As such, it is amenable to different interpre-

tations depending on who is making the judgment. As dominant

coalitions or critical alliances (Weick, 1979) shift and the orga-

nization's values change, so do normative judgments of an em-

ployee's worth (Guion, 1961). As a result, it is less likely that

the systematic procedures that characterize professional test

development will be applied to criterion development (Banks

& Roberson, 1985). Once a test is rigorously developed, it has

perceived potential for long-term usefulness in various predic-

tion and assessment applications. The same kind of rigor in the

development of criterion measures, even assuming that the or-

ganization would "foot the bill," might quickly lose its utility

with a change in values that often occurs with a change in the

organization's leadership. Also, idiosyncratic values regarding

behavior in different organizations logically require customized

criterion measurement systems. Still another factor contribut-

ing to a lack of substantive criterion development is the long-

held belief in the dynamic nature of criteria (Ghiselli, 1956).

This conventional, yet perhaps erroneous (G. V. Barrett, Cald-

well, & Alexander, 1985), belief that performance determinants

change significantly over time logically mandates a greater ex-

penditure of resources for criterion development than many are

willing to accept. One result is that little emphasis is likely to

be placed on research as a means of accumulating knowledge

about the appraisal system (Smith, 1976). For these and many

other economic and logistic reasons, it can be assumed that cri-

terion measures are not generally likely to command the con-

cern for rigor necessary for optimal development. Also note that

this issue of rigor in behavioral criterion development is not

unique to personnel selection research (O'Grady, 1982). Rushton,

Brainerd, and Pressley (1983) analyzed behavioral criterion

deficiency in 12 major areas of psychological research and con-

cluded that it is a formidable and pervasive problem.

There is a more fundamental conceptual basis for assuming

that criterion development is likely to be deficient. As Figure 3

illustrates, there are three inferences linking the psychological

construct required for job performance and the operational cri-

terion measure. The truth of Inference 8 is to some extent em-

pirically testable by the construct-related validation procedures

discussed earlier. However, Inference 7, linking the performance

construct with the underlying psychological construct, is justi-

fiable only through rational deductive analysis (Cascio, 1987).

This inference must be based on the judgments of certain peo-

ple. On the one hand, the criterion must be defined by organiza-

tion leaders who are responsible for formulating and translating

valued organizational outcomes. On the other hand, selection

specialists are required to infer from job analytic data the pre-

dictor constructs required for job performance. Incidentally,

Smith (1976) pointed out that the translation of goals to valued

behavior should also be validated (Inference 11). This mandate

for collaborative decision making between various professional

groups has obvious implications for the quality of the resulting

criterion measurement system.

When considering construct-related validation of the crite-

rion, unique conceptual and practical issues do arise. On the

predictor side, a test is constructed to sample certain criterial

behaviors (Messick, 1980) that are specified by the psychologi-

cal construct theory and judged to be indicators of a specific

construct or set of constructs. Criterion measures, likewise, are

developed to be samples of an underlying behavior domain. No-

tice that the relative position of the psychological construct do-

main and performance domain has been changed in Figure 3.

The framework proposed here portrays psychological constructs as more deeply embedded in the nomological network. They are more fruitfully conceptualized as labels for behavioral regularities that underlie both the behavior sampled by the predictor and the behavior in the performance domain as sampled by the criterion.

Delineating the Performance Domain

Two prevalent ways of conceptualizing performance domains

are discussed in the performance appraisal literature. One

school of thought places relative emphasis on a conceptualiza-

tion of performance domains as collections of overt job behav-

iors (e.g., Borman, 1983), whereas the other places relative em-

phasis on outcomes or results (e.g., Kane, 1986). The former is

motivated by concern for developing psychological theories that

capture behavioral regularities important to organizational

functioning. The latter recognizes the importance of goal at-

tainment to organizational functioning.

We join others in stressing the inextricable relationship be-

tween job behaviors and outcomes. We propose that perfor-

mance domains are composed of behavior-outcome units. Out-

comes are valued by the organization, and behaviors are the

means to these valued ends. As a result, behaviors take on

different value, depending on the value of their consequent out-

comes. Therefore, optimal description of the performance do-

main for a given job requires careful and complete delineation

of valued outcomes and the accompanying requisite behaviors

(Fine, 1986; James, 1973).

The behavior versus outcome distinction is reflected in the

distinction between composite and multiple criterion models.

The relative merits of these models have been examined in de-

tail elsewhere (Brogden & Taylor, 1950; J. P. Campbell, Dun-

nette, Lawler, & Weick, 1970; Carroll & Schneier, 1982; Dun-

nette, 1963; Guion, 1965; James, 1973; Schmidt & Kaplan,

1971; Smith, 1976; Thorndike, 1949). The important differ-

ence between these models is often viewed as whether different

types of operational criterion information should be combined

or not. For this analysis, however, a more fundamental differ-

ence between the two models is the way in which performance

domains are conceptualized. The composite criterion model im-

plies a unitary (and often economic) conception reflecting an

employee's total worth to an organization. As a result, opera-

tional criteria are designed to reflect the underlying domain by

sampling the "(economic) end products of job behaviors"

(Schmidt & Kaplan, 1971, p. 424; parentheses added). In con-

trast, the multiple criterion model conceptualizes performance

as a behavioral domain within which some behaviors are more

valuable than others for achieving organizational goals. Opera-

tional criteria developed to tap this domain are more behavior-

ally oriented, focusing on individual incidents or dimensions

of actual job behaviors that lead to the attainment of valued

organizational outcomes.

It is important to emphasize that these notions of total eco-

nomic worth versus performance behavior domain are no less

hypothetical constructions than, for instance, intelligence or

reading ability (Schwab, 1980). The fact that job performance

can be described in both ways is reflected in the job analysis

literature by reference to job-oriented (what is accomplished)

versus worker-oriented (what is done to accomplish) bases for

job description (Cummings & Schwab, 1973; McCormick,

1976). However, the relevance of this distinction for criterion

development has generally been examined only unsystematically. In

what follows, we discuss in greater detail the procedures

used to justify Inferences 7, 8, 10, and 11.

Generating Evidence of Criterion Validity

Job analysis provides the evidential basis for justifying Infer-

ences 7, 8, 10, and 11. Most personnel professionals are quick

to agree that systematic job analysis provides the prerequisite

data base for all subsequent selection activities. Yet, perhaps no

other professional activity is better characterized by the idio-

syncratic use of unstandardized procedures and lack of general

principles to guide data collection (Tenopyr, 1986). Clearly, the

proliferation of job analysis procedures is ample testimony to

the conclusion that very little in the way of standard job analytic

procedures exists. Regardless of the reasons for this lack of stan-

dard practice, one result is a relative dearth of both conceptual

and empirical guidelines for adequately justifying the critical

inferential linkages (Inferences 7, 8, 10, and 11).

Job analysis involves examining job demands and translating

them into behavior-outcome units that define the performance

domain and that subsequently make optimal person-job

matches possible. Inference 10 represents the extent to which

the actual job demands have been adequately analyzed, result-

ing in a valid description of the performance domain. The pro-

cess of substantiating Inference 10 is commonly referred to as

job description. There are at least two fundamental reasons to

suspect the validity of Inference 10 in most applied selection

situations. First, fully adequate taxonomies of job characteris-

tics, which are required for proper delineation of the perfor-

mance domain, have yet to be developed (Fleishman, 1975;

Fleishman & Quaintance, 1984). Second, most jobs are accu-

rately characterized as collections of demands with associated

behavioral universes with fuzzy, if not indeterminate, boundaries

(Weick, 1979), making their unequivocal delineation logically

impossible.

Inference 11 represents the extent to which behavior-out-

come links have been substantiated. Again, job analysis is the

process of discovering and specifying these links. Some job

analysis procedures more systematically delineate behavior-

outcome links than do others. For example, the critical inci-

dents technique (Flanagan, 1954) formally elicits organization-

ally valued outcomes and systematically ties job behaviors to

these. Functional job analysis also formally assesses these link-

ages through group interviews of subject matter experts (Fine,

1986). Regardless of which method is used, to the extent that

job analysis is conducted without explicating behavior-out-

come links, the validity of Inference 11 is suspect.

Inference 8 links an operational criterion with the perfor-

mance domain. As such, it represents the inference that the op-

erational criterion validly measures the performance domain.

This process is what is commonly referred to as criterion devel-

opment. When the multiple criterion model is used to guide

criterion development, job analysis data in the form of worker-

oriented (what is done to accomplish) behavior requirements

are useful for justifying this inferential linkage. When the com-

posite criterion model is used to guide criterion development,

job analysis data in the form of job-oriented (what is accom-

plished) behavior requirements are most useful. In either case,

justification of Inference 8 typically takes the form of (a) claims

on the part of the job analyst that all major behavioral dimen-

sions or outcomes have been identified and are represented in

the operational criterion measure (e.g., performance rating in-

strument or objective indexes) and, occasionally, (b) psycho-

metric evidence of accuracy or lack of bias in indexes of job

performance (Dickinson, 1987; Kleiman & Durham, 1981). In

other words, criterion measures are usually validated (i.e., evi-

dence for Inference 8 is generated) by rational, albeit tacit,

claims about the content-related evidence of validity. The posi-

tion advanced in this article is that sole reliance on content-

related evidence of criterion validity necessarily means that the

evidential base is deficient relative to the numerous other forms

of evidence available. This is particularly evident in light of

Feldman's (1986) call for a taxonomy of appraisal tasks. He is

pointing out that different types of tasks influence the manner

in which appraisal judgments are made. He goes on to examine

the differences in how these judgments are validated.

The conventional practice of relying solely on single criterion

measures and methods, whose content is often rather unsystem-

atically determined, inevitably leads to many validation efforts

with questionable criterion validity (Guion, 1976). This issue

has been addressed over the years and labeled the criterion

problem. However, except for James's (1973) exposition, little

in-depth analysis of the conceptual issues surrounding the crite-

rion problem has been advanced. Suffice it to say that in many,

if not most, validation situations, the validity of Inference 8 is

suspect, which in turn weakens conclusions about the validity

of other inferences in the system.

Inference 7 is likewise typically justified by the job analyst's

claim that from a specific job analysis, he or she has inferred the

requisite psychological constructs that underlie performance.

This process is commonly referred to as deriving job specifica-

tions. Job analysis data in the form of ability requirements

(Dunnette, 1976; Fleishman, 1982; Pearlman, 1980) are useful

for justifying this inference. Some important strides

have been made in establishing both theoretical and empirical

linkages between job behaviors and underlying attributes (e.g.,

Fine, 1986; Fine & Wiley, 1971; Fleishman, 1978, 1982; Lopez,

Kesselman, & Lopez, 1981; McCormick, 1976). However, in

practice, it is not uncommon for Inference 7 to be informally

justified by job analysts' judgments. Sole reliance on this induc-

tive approach (Bass & G. V. Barrett, 1981) means that the valid-

ity of Inference 7 is suspect whenever these judgments are not

based on current knowledge of construct-behavior relations

and sound reasoning about criterion development. Clearly,

Dunnette's (1976) call to link "the two worlds of behavioral

taxonomies" (p. 514) is still operative.

In addition, personnel specialists must adopt a broader view

of what qualifies as relevant empirical evidence for criterion

linkages. For example, a program designed to train critical job

skills, which is then evaluated by using job performance cri-

teria, provides criterion-related evidence for Inference 7. The

point here is that a wealth of empirical evidence supporting cri-

terion inferences might be more systematically derived from the

extant training literature. This is particularly relevant in those

cases in which training has altered psychological attributes gen-

erally regarded as enduring and less amenable to change. These

cases would be more relevant because selection programs are

more often designed to assess relatively stable constructs be-

cause of the perceived impracticality of trying to change them

through training.

Given that Inferences 7 and 8 may often be justified on tenu-

ous evidential bases, the model presented in Figure 3 leads logi-

cally to an intriguing conclusion regarding the relative superior-

ity of criterion-related versus construct-related evidence of va-

lidity for selection decisions. From this perspective, Inference

5 is a surrogate for the fundamental Inference 9, which links

predictor information to an applicant's true performance in the

organization. It is this inference for which validity evidence is

ultimately sought. Taking this logic one step further, to the ex-

tent that Inference 8 is questionable, empirical evidence of In-

ference 5 is not as relevant to Inference 9. Sound evidence of

Inferences 6 and 7 would provide a much more substantive jus-

tification of Inference 9, in this instance. Most selection special-

ists would find it rather easy to recall situations in which their

confidence in Inference 5 (as an index of Inference 9) was se-

verely weakened by information about the deficiency or con-

tamination of a poorly developed criterion measure. Yet, in the

same situation, use of an established assessment instrument, in

combination with a rigorous rationale for why performance re-

quires certain psychological constructs, would provide a firmer

evidential basis on which to conclude the validity of resulting

decisions. This is a case in which construct evidence of validity

is superior to criterion-related evidence. Although this has been

suggested by others, we contend that the lack of critical analysis

of Inferences 5 and 8, which characterizes most validation re-

search, has caused a dramatic underestimate of the frequency

with which construct-related evidence is judged superior to cri-

terion-related evidence.
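
The argument above can be made concrete with a small numerical sketch. It assumes, for illustration only, a simple linear model that is not part of the original framework: the criterion measure correlates with true performance at a level called criterion_relevance here (a stand-in for the strength of Inference 8), and whatever else the criterion measure captures (contamination, error) is unrelated to the predictor. Under that assumption, the expected predictor-criterion correlation (Inference 5) equals the predictor-performance correlation (Inference 9) multiplied by criterion_relevance. All function and parameter names are hypothetical.

```python
def _check_correlation(name: str, value: float) -> None:
    """Reject values that cannot be correlation coefficients."""
    if not -1.0 <= value <= 1.0:
        raise ValueError(f"{name} must lie in [-1, 1], got {value}")


def observed_validity(true_validity: float, criterion_relevance: float) -> float:
    """Expected correlation between predictor and criterion measure (Inference 5).

    Assumption (illustrative only): the part of the criterion measure that is
    not true performance is uncorrelated with the predictor.
    """
    _check_correlation("true_validity", true_validity)
    _check_correlation("criterion_relevance", criterion_relevance)
    return true_validity * criterion_relevance


def implied_true_validity(observed: float, criterion_relevance: float) -> float:
    """Back out the predictor-performance correlation (Inference 9)
    from an observed coefficient, under the same assumption."""
    _check_correlation("observed", observed)
    _check_correlation("criterion_relevance", criterion_relevance)
    if criterion_relevance == 0:
        raise ValueError("a criterion unrelated to performance says nothing about Inference 9")
    if abs(observed) > abs(criterion_relevance):
        raise ValueError("observed coefficient is inconsistent with the assumed model")
    return observed / criterion_relevance


if __name__ == "__main__":
    true_r = 0.50  # hypothetical strength of Inference 9
    for relevance in (1.0, 0.8, 0.6, 0.4):
        r5 = observed_validity(true_r, relevance)
        print(f"criterion relevance {relevance:.1f} -> expected observed validity {r5:.2f}")
```

In this sketch, a predictor with a true validity of .50 paired with a criterion that correlates only .60 with the performance domain would be expected to yield an observed coefficient of about .30. The sketch does not cover contamination that is itself related to the predictor, in which case the observed coefficient could overstate rather than understate Inference 9.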

In the next section, the personnel selection framework pre-

sented in Figure 3 is broadened. Recommendations for future

conceptions of the validation process are then discussed in this

context. It is important to remember that although terminology

will change somewhat and different inferences will be empha-

sized, the process is essentially one of traditional construct vali-

dation. It should become clear from this perspective that the

science of psychology as applied to personnel decision making

involves the development of theories, validation of constructs,

and generation of evidence to support important inferences

about people and their behavior at work.

The Psychological Science of Personnel

Decision Making

Our contention is that the validation process discussed thus

far, if adequately adapted to the unique needs of personnel se-

lection, provides a broader framework for expanding concep-

tions of validity. Thus far, however, our discussion of critical

inferential linkages has resulted in a focus on a more narrow

conception of the validation process than is ultimately desir-

able. We now present a broader view of the nomological frame-

work relevant for developing theory within personnel psychol-

ogy. This framework is schematically presented in Figure 4. The

left side of Figure 4 represents the more traditional notion of

construct validity described by Cronbach and Meehl (1955).

The center of Figure 4 represents the focus of greatest interest

to applied decision makers. Inferences 5, 6, 7, and 8 are of par-

ticular concern because of their direct relevance for justifying

Inference 9. That is, they are relevant for determining the extent

to which inferences from scores on a test of some predictor con-

struct allow predictions of actual job behavior.

Inference 9 is of utmost importance to applied decision mak-

ers. Empirical evidence of Inference 5 provides partial support

of Inference 9 and can be conceived of as a special case of Infer-

ence 13. Evidence supporting Inference 5 can be direct and take

the form of empirically observed relationships. Messick (1980),

when discussing construct-related validity, stated that "some of

the construct's nomological relations thus become criterial

when made specific to the applied setting" (p. 1019). He added

that these predictive relationships are singled out for special at-

tention under the rubric of criterion-related validity and differ

from general nomological relations in being more narrowly fo-

cused on specific sets of data and specific applied settings. Similarly,

Cook and D. T. Campbell (1979) explicitly stated that

priorities regarding validity issues are fundamentally different

between theoretical and applied researchers. From this perspec-

tive, it can be seen that criterion-related validity evidence is ap-

propriately viewed as a special type of convergent (or discrimi-

nant) evidence of construct-related validity. One can also gener-

ate various indexes of content overlap to support Inference 9.

Evidence of Inference 9 can also be indirect and take the form

of convergent and discriminant relationships between the com-

ponents linked by Inferences 6, 7, and 8. Extending this logic

beyond the original focus, Inference 6 is strengthened by evi-

dence of Inferences 11, 12, 13, and so forth.

The right side of Figure 4 represents construct-related validity

of the criterion. As mentioned earlier, performance domains

have traditionally not been as deeply embedded in networks of

theoretical relationships as constructs on the predictor side.

Theorists and practitioners need to be increasingly aware of the

need to empirically investigate linkages of Inferences 5, 16, 17,

and 18 as evidence to support Inferences 7, 8, 14, and 15 (Vance,

MacCallum, Coovert, & Hedge, 1988).

Traditionally, the focus has been almost exclusively on Infer-

ence 5 through correlations between test scores and a criterion

measure, occasionally on Inference 17 through development of

alternative criterion measures (e.g., Alexander & Wilkins, 1982;

Cascio & Valenzi, 1978; Holzbach, 1978; Lee, Malone, &

Greco, 1981), and on Inference 8 through assessment of rating

accuracy by "true scores" of performance domains (e.g., Ber-

nardin & Buckley, 1981; Borman, 1979; Hedge & Kavanagh,

1988). Recently, Heneman (1986), in a comparison of supervi-

sory ratings and results-oriented performance indexes, called

for greater emphasis on convergent evidence of criterion valid-

ity. Similarly, James (1973) called for more emphasis on the

three levels of criterion measurement proposed by J. P. Camp-

bell et al. (1970), namely, job behaviors, results, and organiza-

tional outcomes (Smith, 1976).

A Social-Cognitive Perspective on Job Analysis and

Criterion Development

We join others in calling for renewed interest in more rigor-

ous, conceptually coherent criterion development. One impor-

tant issue is that job analysis efforts need to be directed more

at capturing the reality of the organizational context in which

criterion judgments are actually made (Feldman, 1986; Stern,

Stein, & Bloom, 1956; Wiggins, 1973). Our contention is that

typical job analyses produce information that is useful for de-

veloping explicit performance criteria, yet is potentially irrele-

vant to the implicit criteria that often may be used to evaluate

day-to-day performance or promotability (Turnage & Muchin-

sky, 1984). To the extent that the validity of Inference 10 is ques-

tionable, all other inferences in the system are questionable.

Figure 4. An elaborated model for personnel decision research.

Increased concern for the validity of Inference 10 has motivated considerable research in recent years on a variety of social and cognitive factors affecting job analysis data (Arvey, Davis, McGowen, & Dipboye, 1982; Cornelius, DeNisi, & Blencoe, 1984; DeNisi, Cornelius, & Blencoe, 1987; Friedman & Harvey, 1986; Green & Stutzman, 1986; J. E. Smith & Hakel, 1979). The validity of both Inferences 7 and 8 is dependent on job analytic data used to translate performance behavior into measurable criterion elements and to delineate overlap with predictor constructs. Better understanding the judgments that underlie perceptions of jobs and performance may contribute to improved criterion development (Guion, 1986). In other words, the basic cognitive processes that underlie perceptions of people may also underlie the perceptions of jobs, and therefore the vast research on person perceptions can be integrated and generalized to enhance our understanding of the determinants of job perceptions (e.g., Binning, Zaba, & Whattam, 1986; Cantor & Mischel, 1979; Cleveland & Landy, 1983; Cooper, 1981; Feldman, 1981, 1986; Funder, 1987; Lord, 1985a, 1985b; Swann, 1984).

A particularly integrative approach is exemplified by Cantor, Mischel, and Schwartz's (1982) prototype assessment of psychological situations. They assessed people's prototypical beliefs about person-action combinations in common situations and found considerable consistency across people. This approach could be adapted to the study of job prototypes and their effects on job analysts' perceptions of jobs as well as incumbent performance. Similarly, a considerable amount of job design research exists showing that social information affects perceptions of task characteristics (Salancik & Pfeffer, 1978; Thomas & Griffin, 1983; Weiss & Shaw, 1979).

At a more molar level, criterion development is largely a sociopolitical process and therefore deserves greater attention from this perspective (e.g., Katz & Kahn, 1966; Longenecker, Sims, & Gioia, 1987; Mitchell & Linden, 1982; Weick, 1979). Programs designed to train raters of performance could also benefit from the integration of research described earlier (Bernardin & Buckley, 1981; Borman, 1979; Landy & Farr, 1980).

Expanding on the framework presented, Inference 16 might involve relating specific performance criteria to measures of nonjob behaviors that are theoretically expected to relate to performance behaviors in some specified way (e.g., Blau, 1985; Rousseau, 1978; Youngblood, 1984). Youngblood's (1984) study of work and nonwork explanations for absenteeism might exemplify this approach. He found that absenteeism could be explained by the importance of leisure activities engaged in away from the work setting. In a similar vein, perhaps successful managers are more proficient at organizing successful family vacations than are their less competent organizational counterparts. The point we are making here is that the delineation of behavioral domains can be conceptualized by reference to theory surrounding psychological constructs (predictor side) or "theory" surrounding job performance (criterion side). Meaningful behavioral regularities may be discovered by investigating relationships between either type of behavioral domain.

Methods used by personality and social psychologists could be adapted for the study of these criterion relationships. For instance, Mischel (1984), Funder (1987), and Funder and Colvin (1988) reviewed studies relating both lay perceptions and objective measures of personality to independent measures of behavior gathered from peers and family members. Relationships between work and nonwork behavior could be investigated in an analogous manner. Similarly, the logic of biographical data could be adapted to criterion research. Although biodata instruments are typically used for prediction purposes, data about nonwork behavior could be collected using an analogous questionnaire format. For example, in attempting to verify the accuracy of biographical information, Shaffer, Saunders, &

Owens (1986) compared respondents' and parents' descrip-

tions. Extending this to criterion validation, family and peer

descriptions of nonwork behavior could be compared with performance

behavior. We call attention to Weick's (1979) contention

that "events inside organizations resemble events outside

organizations; sensitivities of the worker inside are continuous

with sensitivities of the worker outside" (p. 31).

In the present framework, the systematic discovery of replica-

ble relationships between measures of performance and non-

work behavior (or reactions) would strengthen the construct-

related validity of performance measurement. In a similar vein,

multivariate studies relating alternative performance criteria to

alternative predictors (Inference 18) are also desirable. We have

now come logically full circle. Figure 4 should eventually be-

come spherical in shape as the nomological network linking

psychological constructs and performance domains becomes

more fully articulated and interwoven.

Another Call for Experimenting Organizations

Perhaps the greatest advancement for the science of person-

nel psychology will come only when the values driving organiza-

tional administrators' decisions about behavioral science re-

search are changed. For many reasons, the behavioral sciences

have what Staw (1977) described as a "center-periphery" rela-

tionship to the administrative users of scientific knowledge. He

contends that new knowledge is created by researchers "who

are presumably at the center of knowledge" (p. 426) and it is

considered their responsibility to disseminate expertise to or-

ganizational users in a prepackaged, "formula-like" fashion.

Failures of behavioral science interventions are thus more likely

to be attributed to deficiencies in science and knowledge rather

than inappropriate expectations for generalizability. Staw

(1977) envisioned a much healthier relationship in which the

seat of innovation is at the local organizational level. This shift

in values reorients the role of behavioral science research so

that it lies in the periphery as a resource to guide organizational

experimentation. Concomitant with this reorientation is a shift

in the educational role of behavioral science and the manner

in which knowledge is disseminated. Rather than persuading

practitioners to adopt a particular theory or planned interven-

tion, psychologists' efforts should be directed more toward sell-

ing the benefits of experimenting organizations, where ongoing,

systemwide, multivariate research is made an integral part of

organizational functioning. Consequently, the psychologist's

role would increasingly involve training practitioners in re-

search evaluation skills.

A concomitant shift from summative to formative evaluation

(Staw, 1977) is also desirable. The process of inferring whether

a specific program or intervention has worked or has had a posi-

tive effect is referred to as summative evaluation. A more itera-

tive and ongoing process of selecting program goals and build-

ing organizational interventions is referred to as formative

evaluation. Formative evaluation implies the successive approx-

imation of desired organizational systems, built through a series

of trials in which failures are considered as informative as suc-

cesses. In this new research context, the term failure merely

implies some unpredicted outcome or result, equally useful for

refinement in the next stage of program development. It is nec-

essary to change typical organizational values so that systems

can be developed to effectively monitor, provide feedback, and

utilize negative as well as positive data.

The creation of experimenting organizations could have vast

implications for personnel selection research. Greater emphasis

would be placed on large-scale, programmatic research involv-

ing the melding of laboratory and field settings (G. V. Barrett,

1972; Flanagan & Dipboye, 1981). This could lead not only to

richer, more efficient theory development but also better under-

standing of longitudinal changes in employee and job character-

istics. For instance, Helmreich, Sawin, and Carsrud (1986)

demonstrated the effects of predictor-criterion time lags on the

predictive power of personality characteristics. They argued

that personality traits have their most potent effects on job per-

formance only after considerable time on a job. Longitudinal

data gathering is important for criterion measurement as well.

Meyer (1987) presented data suggesting that cognitive ability

tests are more predictive of managerial promotional progress

over time than of supervisory ratings of performance at a given

point in time. The same general issue can be raised regarding

social influence effects that result from any organizational inter-

vention. Administrators' preoccupation with one-shot, short-

term identification of successful selection procedures has most

likely masked many useful approaches to employee selection.

Experimenting organizations also would create an environ-

ment in which macro and micro issues could be more systemat-

ically integrated. For example, little work has been done to cre-

ate contingency models relating organizational structures with

job design and criterion development processes. Yet, Mintzberg

(1983) describes in detail how organizations' structural param-

eters affect individual-level control mechanisms and job charac-

teristics. Some organizational structures (e.g., machine bureau-

cracies) contain jobs that are more amenable to the multiple

criterion model. The primary coordination mechanism in ma-

chine bureaucracies is standardization of work processes. This

is possible because jobs are highly routine, and behavior-out-

come links are explicitly understood and programmed. In other

types of organizations (e.g., professional bureaucracies, adhoc-

racies) the composite criterion model may be more appropriate.

In these types of organizations (or parts of organizations) where

work processes are more complex and less programmable, co-

ordination is achieved through other mechanisms such as stan-

dardization of work output, necessitating a performance do-

main that comprises outcomes rather than behaviors. In still

other types of jobs such as higher level managerial jobs, neither

work processes nor work outputs are specifiable a priori (J. P.

Campbell, 1983; Feldman, 1986; Palermo, 1983). In these in-

stances, coordination is achieved primarily through standard-

ization of input skills and knowledge. In these jobs, an individu-

al's worth to the organization might be more fruitfully indexed

by assessment of changes in job-related knowledge. This logic

underlies trait-based and competency-based assessment (Sokol

& Oresick, 1986). We note that it is conceptually flawed to refer

to trait-based performance appraisal, because job performance

is not being appraised. Personal characteristics are being as-

sessed with the assumption that they will reflect an individual's

worth to the organization. Rather than assessing an individual's

contribution to organizational goal attainment, the potential of

such attainment is assessed by using the same logic as personnel

selection.
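
The contingencies described in this paragraph can be summarized as a small lookup table. The sketch below is only an illustrative restatement of the text, not a contingency model proposed by Mintzberg (1983) or by the authors; the class, field, and function names are hypothetical, and the "criterion_focus" entry for machine bureaucracies is an inference from the contrast the text draws between outcomes and behaviors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriterionStrategy:
    coordination: str      # primary coordination mechanism named in the text
    criterion_focus: str   # what the performance domain is said to comprise
    suggested_model: str   # criterion approach the text associates with it


# Keys are lower-case labels for the structures or job types discussed above.
CONTINGENCIES = {
    "machine bureaucracy": CriterionStrategy(
        coordination="standardization of work processes",
        criterion_focus="work behaviors (inferred from the text)",
        suggested_model="multiple criterion model",
    ),
    "professional bureaucracy": CriterionStrategy(
        coordination="standardization of work output",
        criterion_focus="outcomes rather than behaviors",
        suggested_model="composite criterion model",
    ),
    "adhocracy": CriterionStrategy(
        coordination="standardization of work output",
        criterion_focus="outcomes rather than behaviors",
        suggested_model="composite criterion model",
    ),
    "higher level managerial job": CriterionStrategy(
        coordination="standardization of input skills and knowledge",
        criterion_focus="changes in job-related knowledge",
        suggested_model="trait-based or competency-based assessment",
    ),
}


def suggest_strategy(structure: str) -> CriterionStrategy:
    """Look up the criterion approach the text associates with a structure."""
    key = structure.strip().lower()
    if key not in CONTINGENCIES:
        raise KeyError(f"no contingency recorded for {structure!r}")
    return CONTINGENCIES[key]


if __name__ == "__main__":
    for name in CONTINGENCIES:
        s = suggest_strategy(name)
        print(f"{name}: {s.coordination} -> {s.suggested_model}")
```

A table of this kind is a bookkeeping device only. Whether a given organization or job fits one of these categories remains an empirical question of the sort that experimenting organizations would be positioned to study.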

Micro-macro integration is desirable on the predictor side as

well. A basic tenet of Schneider's (1987) attraction-selection-attrition

framework is that macro-organizational structure

differences can be best understood through types or profiles of

individual employee characteristics. We believe that the cre-

ation of experimenting organizations will ultimately do more

to enhance our movement from test validation to selection re-

search (Guion, 1976), making it more likely that dynamic, mul-

tivariate relationships can be fruitfully understood and used to

enhance the quality of staffing decisions and ultimate organiza-

tional effectiveness.

References

Alexander, E. R., & Wilkins, R. D. (1982). Performance rating validity:

The relationship of objective and subjective measures of performance.

Group and Organization Studies, 7, 485-496.

American Psychological Association. (1985). Standards for educational

and psychological testing. Washington, DC: Author.

American Psychological Association, American Educational Research

Association, & National Council of Measurement Used in Education

(joint committee). (1954). Technical recommendations for psycho-

logical tests and diagnostic techniques. Psychological Bulletin, 51,

201-238.

Anderson, N. H., & Shanteau, J. (1977). Weak inference with linear

models. Psychological Bulletin, 84, 1155-1170.

Arvey, R. D., Davis, G. A., McGowen, S. L., & Dipboye, R. L. (1982).

Potential sources of bias in job analytic processes. Academy of Man-

agement Journal, 25, 618-629.

Banks, C. G., & Roberson, L. (1985). Performance appraisers as test

developers. Academy of Management Review, 10, 128-142.

Barrett, G. V. (1972). Symposium: Research models of the future for

industrial and organizational psychology. Personnel Psychology, 25,

1-17.

Barrett, G. V., Alexander, R. A., O'Connor, E. J., & Forbes, J. B. (1978).

Values and professional judgment in validating and litigating tests for

civil service positions. Professional Psychology, 9, 137-144.

Barrett, G. V., Caldwell, M. S., & Alexander, R. A. (1985). The concept

of dynamic criteria: A critical reanalysis. Personnel Psychology, 38,

41-56.

Barrett, G. V., & Kernan, M. C. (1987). Performance appraisal and ter-

minations: A review of court decisions since Brito v. Zia with implica-

tions for personnel practices. Personnel Psychology, 40, 489-504.

Barrett, G. V., Phillips, J. S., & Alexander, R. A. (1981). Concurrent and

predictive validity designs: A critical reanalysis. Journal of Applied

Psychology, 66, 1-6.

Barrett, R. S. (1980). Is the test content-valid: Or, does it really measure

a construct? Employee Relations Law Journal, 6, 459-475.

Bass, B. M., & Barrett, G. V. (1981). People, work, and organizations

(2nd ed.). Boston: Allyn & Bacon.

Bernardin, H. J., & Buckley, M. R. (1981). A consideration of strategies

in rater training. Academy of Management Review, 2, 205-212.

Bernardin, H. J., & Smith, P. C. (1981). Clarification of some issues

regarding the development and use of behaviorally anchored rating

scales (BARS). Journal of Applied Psychology, 66, 458-463.

Binning, J. F., Zaba, A. J., & Whattam, J. C. (1986). Explaining the

biasing effects of performance cues in terms of cognitive categoriza-

tion. Academy of Management Journal, 29, 521-535.

Blau, G. J. (1985). Relationship of extrinsic, intrinsic, and demographic

predictors to various types of withdrawal behaviors. Journal of Applied

Psychology, 70, 442-450.

Borman, W. C. (1979). Format and training effects on rating accuracy

and rating errors. Journal of Applied Psychology, 64, 410-421.

Borman, W. C. (1983). Implications of personality theory and research

for the rating of work performance in organizations. In F. Landy, S.

Zedeck, & J. Cleveland (Eds.), Performance measurement and theory

(pp. 127-172). Hillsdale, NJ: Erlbaum.

Brogden, H. E., & Taylor, E. K. (1950). The dollar criterion. Applying

the cost accounting concept to criterion construction. Personnel

Psychology, 3, 133-167.

Burke, M. J. (1984). Validity generalization: A review and critique of

the correlational model. Personnel Psychology, 37, 93-116.

Callender, J. C., & Osburn, H. G. (1980). Development and test of a

new model of validity generalization. Journal of Applied Psychology,

65, 664-670.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant

validation by the multitrait-multimethod matrix. Psychological Bul-

letin, 56, 81-105.

Campbell, J. P. (1983). Some possible implications of "modeling" for

the conceptualization of measurement. In F. Landy, S. Zedeck, & J.

Cleveland (Eds.), Performance measurement and theory (pp. 277-

298). Hillsdale, NJ: Erlbaum.

Campbell, J. P., Dunnette, M. D., Lawler, E. E., & Weick, K. E. (1970).

Managerial behavior, performance, and effectiveness. New York: Mc-

Graw-Hill.

Campion, M. A., & Thayer, P. W. (1985). Development and field evalua-

tion of an interdisciplinary measure of job design. Journal of Applied

Psychology, 70, 29-43.

Cantor, N., & Mischel, W. (1979). Prototypes in person perception. In

L. Berkowitz (Ed.), Advances in experimental social psychology (Vol.

12, pp. 3-52). New York: Academic Press.

Cantor, N., Mischel, W., & Schwartz, J. C. (1982). A prototype analysis

of psychological situations. Cognitive Psychology, 14, 45-77.

Carroll, S. J., & Schneier, C. E. (1982). Performance appraisal and re-

view systems. Glenview, IL: Scott, Foresman.

Cascio, W. F. (1987). Applied psychology in personnel management (3rd

ed.). Englewood Cliffs, NJ: Prentice-Hall.

Cascio, W. F., & Valenzi, E. R. (1978). Relations among criteria of police

performance. Journal of Applied Psychology, 63, 22-28.

Cleveland, J. N., & Landy, F. J. (1983). The effects of person and job

stereotypes on two personnel decisions. Journal of Applied Psychol-

ogy, 68, 609-619.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design

& analysis issues for field settings. Chicago: Rand McNally.

Cooper, W. H. (1981). Ubiquitous halo. Psychological Bulletin, 90,

218-244.

Cornelius, E. T., III, DeNisi, A. S., & Blencoe, A. G. (1984). Expert and

naive raters using the PAQ: Does it matter? Personnel Psychology, 37,

453-464.

Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.).

New York: Harper & Row.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychologi-

cal tests. Psychological Bulletin, 52, 281-302.

Cummings, L. L., & Schwab, D. P. (1973). Performance in organiza-

tions: Determinants & appraisal. Glenview, IL: Scott, Foresman.

DeNisi, A. S., Cornelius, E. T., III, & Blencoe, A. G. (1987). Further

investigation of common knowledge effects on job analysis ratings.

Journal of Applied Psychology, 72, 262-268.

Dickinson, T. L. (1987). Designs for evaluating the validity and accu-

racy of performance ratings. Organizational Behavior and Human

Performance, 40, 1-21.

Doran, R. (1987). How to examine construct validity of item banks. Quality and Quantity, 21, 139-151.

Drasgow, F., & Miller, H. E. (1982). Psychometric and substantive issues in scale construction and validation. Journal of Applied Psychology, 67, 268-279.

Dunnette, M. D. (1963). A note on the criterion. Journal of Applied Psychology, 47, 251-254.

Dunnette, M. D. (1976). Aptitudes, abilities, and skills. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 473-520). Chicago: Rand McNally.

Dunnette, M. D., & Borman, W. C. (1979). Personnel selection and classification. Annual Review of Psychology, 30, 477-525.

Ebel, R. L. (1961). Must all tests be valid? American Psychologist, 16, 640-647.

Ebel, R. L. (1977). Comments on some problems of employment testing. Personnel Psychology, 30, 55-63.

Faley, R. H., & Sundstrom, E. (1985). Content representativeness: An empirical method of evaluation. Journal of Applied Psychology, 70, 567-571.

Feldman, J. M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66, 127-148.

Feldman, J. M. (1986). Instrumentation and training for performance appraisal: A perceptual-cognitive viewpoint. In K. M. Rowland & G. R. Ferris (Eds.), Research in personnel and human resources management (pp. 45-99). Greenwich, CT: JAI Press.

Fine, S. A. (1986). Job analysis. In R. A. Berk (Ed.), Performance assessment: Methods and applications (pp. 53-81). Baltimore, MD: Johns Hopkins University Press.

Fine, S. A., & Wiley, W. W. (1971). An introduction to functional job analysis. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.

Fiske, D. W. (1951). Values, theory, and the criterion problem. Personnel Psychology, 4, 93-98.

Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327-358.

Flanagan, M. F., & Dipboye, R. L. (1981). Research settings in industrial and organizational psychology: Facts, fallacies, and the future. Personnel Psychology, 34, 37-47.

Fleishman, E. A. (1975). Toward a taxonomy of human performance. American Psychologist, 30, 1127-1149.

Fleishman, E. A. (1978). Relating individual differences to the dimensions of human tasks. Ergonomics, 21, 1007-1019.

Fleishman, E. A. (1982). Systems for describing human tasks. American Psychologist, 37, 821-834.

Fleishman, E. A., & Quaintance, M. K. (1984). The description of human tasks. New York: Academic Press.

Frederiksen, N. (1986). Construct validity and construct similarity: Methods for use in test development and test validation. Multivariate Behavioral Research, 21, 3-28.

Friedman, L., & Harvey, R. J. (1986). Can raters with reduced job descriptive information provide accurate Position Analysis Questionnaire (PAQ) ratings? Personnel Psychology, 39, 779-789.

Funder, D. C. (1987). Errors and mistakes: Evaluating the accuracy of social judgment. Psychological Bulletin, 101, 75-90.

Funder, D. C., & Colvin, C. R. (1988). Friends and strangers: Acquaintanceship, agreement, and the accuracy of personality judgment. Journal of Personality and Social Psychology, 55, 149-158.

Ghiselli, E. E. (1956). Dimensional problems of criteria. Journal of Applied Psychology, 40, 374-377.

Green, S. B., & Stutzman, T. (1986). An evaluation of methods to select respondents to structured job-analysis questionnaires. Personnel Psychology, 39, 543-564.

Griffin, R. W. (1982). Task design: An integrative approach. Glenview, IL: Scott, Foresman.

Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.

Guion, R. M. (1961). Criterion measurement and personnel judgments. Personnel Psychology, 14, 141-149.

Guion, R. M. (1965). Personnel testingAtev/ York: McGraw-Hill. Guion, R. M. (1974). Open a new window: Validities and values in psy-

chological measurement. American Psychologist.'29, 287-296.

Guion, R. M. (1976). Recruiting, selection, and job placement. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 777-828). Chicago: Rand McNally.

Guion, R. M. (1977). Content validity, the source of my discontent. Applied Psychological Measurement, 1, 1-10.

Guion, R. M. (1978). Content validity in moderation. Personnel Psy- chology, 31, 205-214.

Guion, R. M. (1980). On trinitarian doctrines of validity. Professional Psychology. 11. 385-398.

Guion, R. M. (1986). Personnel evaluation. In R. A. Berk (Ed.), Perfor- mance assessment: Methods and applications (pp. 345-375). Balti- more, MD: Johns Hopkins University Press.

Guion, R. M. (1987). Changing views for personnel selection research. Personnel Psychology, 40, 199-213.

Guion, R. M., & Cranny, C. J. (1982). A note on concurrent and predic- tive validity designs: A critical reanalysis. Journal of Applied Psychol- ogy, 67. 239-244.

Gutenberg, R. L., Arvey, R. D., Osburn, H. G., & Jeanneret, P. R. (1983). Moderating effects of decision-making/information-process- ing job dimensions on test validities. Journal of Applied Psychology, 68, 602-608.

Hamilton, J. W. (1981). Options for small sample sizes in validation: A case for the J-coefficient. Personnel Psychology, 34, 805-816.

Harvey, R. J. (1986). Quantitative approaches to job classification: A review and critique. Personnel Psychology, 39, 267-289.

Hedge, J. W., & Kavanagh, M. J. (1988). Improving the accuracy of performance evaluations: Comparison of three methods of performance appraiser training. Journal of Applied Psychology, 73, 68-73.

Helmreich, R. L., Sawin, L. L., & Carsrud, A. L. (1986). The honeymoon effect in job performance: Temporal increases in the predictive power of achievement motivation. Journal of Applied Psychology, 71, 185-188.

Heneman, R. L. (1986). The relationship between supervisory ratings and results-oriented measures of performance: A meta-analysis. Personnel Psychology, 39, 811-826.

Hirsch, H. R., Schmidt, F. L., & Hunter, J. E. (1986). Estimation of employment validities by less experienced judges. Personnel Psychology, 39, 337-344.

Holzbach, R. L. (1978). Rater bias in performance ratings: Superior-, self-, and peer ratings. Journal of Applied Psychology, 63, 579-588.

James, L. R. (1973). Criterion models and construct validity for criteria. Psychological Bulletin, 80, 75-83.

James, L. R., Demaree, R. G., & Mulaik, S. A. (1986). A note on validity generalization procedures. Journal of Applied Psychology, 71, 440-450.

Kane, M. T. (1982). A sampling model for validity. Applied Psychological Measurement, 6, 125-160.

Kane, J. S. (1986). Performance distribution assessment. In R. A. Berk (Ed.), Performance assessment: Methods and applications (pp. 237-273). Baltimore, MD: Johns Hopkins University Press.

Katz, D., & Kahn, R. L. (1966). The social psychology of organizations. New York: Wiley.

Kenrick, D. T., & Funder, D. C. (1988). Profiting from controversy: Lessons from the person-situation debate. American Psychologist, 43, 23-34.

Kleiman, L. S., & Durham, R. L. (1981). Performance appraisal, promotion and the courts: A critical review. Personnel Psychology, 34, 103-121.

Kleiman, L. S., & Faley, R. H. (1978). Assessing content validity: Standards set by the courts. Personnel Psychology, 31, 701-713.

Klimoski, R., & Brickner, M. (1987). Why do assessment centers work? The puzzle of assessment center validity. Personnel Psychology, 40, 243-260.

Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist, 41, 1183-1192.

Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72-107.

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.

Lawshe, C. H. (1985). Inferences from personnel tests and their validity. Journal of Applied Psychology, 70, 237-238.

Lee, R., Malone, M., & Greco, S. (1981). Multitrait-multimethod-multirater analysis of performance ratings for law enforcement personnel. Journal of Applied Psychology, 66, 625-632.

Loevinger, J. (1957). Objective tests as instruments of psychological theory [Monograph No. 9]. Psychological Reports, 3, 635-694.

Longenecker, C. O., Sims, H. P., Jr., & Gioia, D. A. (1987). Behind the mask: The politics of employee appraisal. Academy of Management Executive, 1, 183-193.

Lopez, F. M., Kesselman, G. A., & Lopez, F. E. (1981). An empirical test of a trait-oriented job analysis technique. Personnel Psychology, 34, 479-502.

Lord, R. G. (1985a). Social information processing and behavioral measurement: Application to leadership measurement. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior (Vol. 7, pp. 87-128). Greenwich, CT: JAI Press.

Lord, R. G. (1985b). Accuracy in behavioral measurement: An alternative definition based on raters' cognitive schema and signal detection theory. Journal of Applied Psychology, 70, 66-71.

Margenau, H. (1950). The nature of physical reality. New York: McGraw-Hill.

McCormick, E. J. (1976). Job and task analysis. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 651-696). Chicago: Rand McNally.

Messick, S. (1975). Meaning and values in measurement and evaluation. American Psychologist, 30, 1012-1027.

Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012-1027.

Messick, S. (1981). Constructs and their vicissitudes in educational and psychological measurement. American Psychologist, 89, 575-588.

Meyer, H. H. (1987). Predicting supervisory ratings versus promotional progress in test validation studies. Journal of Applied Psychology, 72, 696-697.

Mintzberg, H. (1983). Structure in fives: Designing effective organizations. Englewood Cliffs, NJ: Prentice-Hall.

Mischel, W. (1973). Toward a cognitive social learning reconceptualization of personality. Psychological Review, 80, 252-283.

Mischel, W. (1984). Convergences and challenges in search for consistency. American Psychologist, 39, 351-364.

Mitchell, T. R., & Linden, R. C. (1982). The effects of the social context on performance evaluations. Organizational Behavior and Human Performance, 29, 241-256.

Monahan, C. J., & Muchinsky, P. M. (1983). Three decades of personnel selection research: A state-of-the-art analysis and evaluation. Journal of Occupational Psychology, 56, 215-225.

Muchinsky, P. M. (1987). Psychology applied to work (2nd ed.). Chicago: Dorsey Press.

Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.

O'Grady, K. E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin, 92, 766-777.

Palermo, D. S. (1983). Cognition, concepts, and an employee's theory of the world. In F. Landy, S. Zedeck, & J. Cleveland (Eds.), Performance measurement and theory (pp. 97-115). Hillsdale, NJ: Erlbaum.

Pearlman, K. (1980). Job families: A review and discussion of their implications for personnel selection. Psychological Bulletin, 87, 1-27.

Rezmovic, E. L., & Rezmovic, V. (1981). A confirmatory factor analysis approach to construct validation. Educational and Psychological Measurement, 41, 61-72.

Rousseau, D. M. (1978). Relationship of work to nonwork. Journal of Applied Psychology, 63, 513-517.

Rushton, J. P., Brainerd, C. J., & Pressley, M. (1983). Behavioral development and construct validity: The principle of aggregation. Psychological Bulletin, 94, 18-38.

Saal, F. E., & Knight, P. A. (1988). Industrial/organizational psychology: Science and practice. Pacific Grove, CA: Brooks/Cole.

Salancik, G. R., & Pfeffer, J. (1978). A social information processing approach to job attitudes and task design. Administrative Science Quarterly, 23, 224-252.

Sawyer, J. (1966). Measurement and prediction, clinical and statistical. Psychological Bulletin, 66, 178-200.

Schmidt, F. L., & Hunter, J. E. (1981). Employment testing: Old theories and new research. American Psychologist, 36, 1128-1137.

Schmidt, F. L., Hunter, J. E., Croll, P. R., & McKenzie, R. C. (1983). Estimation of employment test validities by expert judgment. Journal of Applied Psychology, 68, 590-601.

Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61, 473-485.

Schmidt, F. L., & Kaplan, L. B. (1971). Composite vs. multiple criteria: A review and resolution of the controversy. Personnel Psychology, 24, 419-434.

Schmitt, N., & Noe, R. A. (1986). On shifting standards for conclusions regarding validity generalization. Personnel Psychology, 39, 849-851.

Schmitt, N., & Ostroff, C. (1986). Operationalizing the "behavioral consistency" approach: Selection test development based on a content-oriented strategy. Personnel Psychology, 39, 91-108.

Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analysis of validity studies published between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37, 407-422.

Schneider, B. (1987). The people make the place. Personnel Psychology, 40, 437-454.

Schwab, D. P. (1980). Construct validity in organizational behavior. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior (Vol. 2, pp. 3-44). Greenwich, CT: JAI Press.

Shaffer, G. S., Saunders, V., & Owens, W. A. (1986). Additional evidence for the accuracy of biographical data: Long-term retest and observer ratings. Personnel Psychology, 39, 791-809.

Smith, J. E., & Hakel, M. D. (1979). Convergence among data sources, response bias, and reliability and validity of a structured job analysis questionnaire. Personnel Psychology, 32, 677-692.

Smith, P. C. (1976). Behaviors, results, and organizational effectiveness: The problem of criteria. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 745-776). Chicago: Rand McNally.

Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149-155.

Society for Industrial and Organizational Psychology. (1987). Principles for the validation and use of personnel selection procedures (3rd ed.). Washington, DC: Author.

Sokol, M., & Oresick, R. (1986). Managerial performance appraisal. In R. A. Berk (Ed.), Performance assessment: Methods and applications. Baltimore, MD: Johns Hopkins University Press.

Staw, B. M. (1977). The experimenting organization: Problems and prospects. In B. M. Staw (Ed.), Psychological foundations of organizational behavior (2nd ed.; pp. 421-437). Santa Monica, CA: Goodyear.

Stern, G. G., Stein, M. I., & Bloom, B. S. (1956). Methods in personality assessment. Glencoe, IL: Free Press.

Sussmann, M., & Robertson, D. U. (1986). The validity of validity: An analysis of validation study designs. Journal of Applied Psychology, 71, 461-468.

Swann, W. B. (1984). Quest for accuracy in person perception: A matter of pragmatics. Psychological Review, 91, 457-477.

Tenopyr, M. L. (1977). Content-construct confusion. Personnel Psychology, 30, 47-54.

Tenopyr, M. L. (1986). Needed directions for measurement in work settings. In B. S. Plake & J. C. Witt (Eds.), The future of testing (pp. 269-288). Hillsdale, NJ: Erlbaum.

Tenopyr, M. L., & Oeltjen, P. D. (1982). Personnel selection and classification. Annual Review of Psychology, 33, 581-618.

Thomas, J., & Griffin, R. (1983). The social information processing model of task design: A review of the literature. Academy of Management Review, 8, 672-682.

Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. New York: Wiley.

Thurstone, L. L. (1938). Primary mental abilities. Psychometric Monographs, No. 4. Chicago: University of Chicago Press.

Trattner, M. H. (1982). Synthetic validity and its application to the Uniform Guidelines validation requirements. Personnel Psychology, 35, 383-397.

Turnage, J. L., & Muchinsky, P. M. (1984). A comparison of the predictive validity of assessment center evaluations versus traditional measures in forecasting supervisory job performance: Interpretive implications of criterion distortion for the assessment paradigm. Journal of Applied Psychology, 69, 595-602.

Uniform Guidelines on Employee Selection Procedures. 43 Federal

Vance, R. J., MacCallum, R. C., Coovert, M. D., & Hedge, J. W. (1988). Construct validity of multiple job performance measures using confirmatory factor analysis. Journal of Applied Psychology, 73, 74-80.

Weick, K. E. (1979). The social psychology of organizing. New York: Random House.

Weiss, H. M., & Shaw, J. B. (1979). Social influences on judgments about tasks. Organizational Behavior and Human Performance, 24, 126-140.

Wernimont, P. F., & Campbell, J. P. (1968). Signs, samples, and criteria. Journal of Applied Psychology, 52, 372-376.

Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.

Youngblood, S. A. (1984). Work, nonwork, and withdrawal. Journal of Applied Psychology, 69, 106-117.

Received December 8, 1987

Revision received September 23, 1988

Accepted October 4, 1988