Journal of Applied Psychology 1989, Vol. 74, No. 3, 478-494
Copyright 1989 by the American Psychological Association, Inc. 0021-9010/89/$00.75
Validity of Personnel Decisions: A Conceptual Analysis of the Inferential and Evidential Bases
John F. Binning Illinois State University
Gerald V. Barrett University of Akron
Issues common to both the process of building psychological theories and validating personnel decisions are examined. Inferences linking psychological constructs and operational measures of constructs are organized into a conceptual framework, and validation is characterized as the process of accumulating various forms of judgmental and empirical evidence to support these inferences. The traditional concepts of construct-, content-, and criterion-related validity are unified within this framework. This unified view of validity is then contrasted with more conventional views (e.g., Uniform Guidelines, 1978), and misconceptions about the validation of employment tests are examined. Next, the process of validating predictor constructs is extended to delineate the critical inferences unique to validating performance criteria. Finally, an agenda for programmatic personnel selection research is described, emphasizing a shift in the behavioral scientist's role in the personnel selection process.
Demonstrating the validity of decisions based on psychologi-
cal assessment procedures is of fundamental importance to per-
sonnel and other applied psychologists. Furthermore, few
would argue with the fact that generating and articulating valid-
ity evidence is a complex process. To fully appreciate this com-
plexity, it is important to realize that conceptions of validity
have evolved over the years through the melding of legal, techni-
cal, and practical concerns about the quality and utility of per-
sonnel decisions. Inevitably, differences of interpretation and
opinion have arisen as each constituency has viewed these myr-
iad concerns from uniquely important perspectives. Perhaps
equally inevitable, however, is the confusion that has grown out
of these differences. Because this confusion ultimately limits the
effectiveness of practitioners and theorists alike, the need for
greater clarity cannot be overestimated (Guion, 1987; Landy,
1986; Tenopyr, 1986).
This article is based on the premise that all validity issues
discussed in personnel contexts have some conceptual counter-
part in the general process of theory development (Landy,
1986). Moreover, various departures from this "ideal" process
have led to myopic, if not erroneous, conceptions of validity. To
elucidate how these departures have distorted conceptions of
validity, the article is divided into four major sections. In the
section that immediately follows, we review how the general
concept of scientific validity implies a simple model in which
constructs and measures of such are inferentially linked. In the
next section, we suggest that in personnel selection contexts, a
conceptually truncated adaptation of this model often implic-
itly guides the validation of predictor-criterion relationships.
This truncation has for years had an undesirably limiting in-
fluence on conceptions of validity. Perhaps its most damaging
effect has been the relative neglect of criterion validity concerns.
In remedial response to this, a third model is presented. This
model is designed to restore and clarify the severed criterion
portions of the original. Finally, suggested strategies for elabo-
rating the proposed model and broadening conceptions of vali-
dation are discussed.
A shorter version of this article was presented at the annual meeting of the Academy of Management, Anaheim, California, August 1988.
We would like to thank the following people for helpful comments on earlier drafts of this article: Jeff Facteau, Mel Goldstein, Steven Landau, Pat Maloney, Tim Mooney, John Pryor, Pat Raymark, Glenn Reeder, Bob Rumery, Jay Thomas, Karen Williams, Kenneth York, and two anonymous reviewers.
Correspondence concerning this article should be addressed to John F. Binning, Department of Psychology, Illinois State University, Normal, Illinois 61761.
Validation Vis-a-Vis Theory Development
It is now commonly accepted that validity is not a character-
istic of a test or assessment procedure but, instead, of inferences
made from test or assessment information (Cronbach, 1970;
Guion, 1980, 1987; Landy, 1986; Society for Industrial and
Organizational Psychology, 1987; American Psychological
Association, 1985). An inference is valid to the extent that it is
supported by sound evidence. Expressed alternatively by Nunnally
(1978), "one validates not a measuring instrument but rather
some use to which a measuring instrument is put" (p. 87). Logi-
cally, therefore, to examine the concept of validity in personnel
decision making, it is important to delineate (a) the types of
inferences involved in applied personnel decision situations and
(b) the nature of evidence that can be used to support such in-
ferences.
Inferences Linking Psychological Constructs
Following Landy's (1986) lead, it is both appropriate and im-
portant to view the process of validating a particular selection
procedure as a special case of hypothesis testing and scientific
theory building. The following rudimentary characterization of
the theory-building process will provide a backdrop for further
discussion of some important validity concepts.
Psychological constructs are labels for clusters of covarying
behaviors. In this way, a virtually infinite number of behaviors
is reduced to a system of fewer labels, which simplifies and
economizes the exchange of information and facilitates the pro-
cess of discovering behavioral regularities. For example, it is less
cumbersome to refer to the relation between verbal and quanti-
tative ability than to the abilities to add, subtract, multiply, and
divide numbers, fractions, decimals, and so forth, and their re-
lations to reading, spelling, understanding word meanings, and
so on.
Putting aside the perennial debate over the objective existence
of psychological traits and psychologists' constructs (Cronbach
& Meehl, 1955; Kane, 1982; Loevinger, 1957; Messick, 1981;
Nunnally, 1978), viewed pragmatically, a construct is merely a
hypothesis about which behaviors will reliably covary. Con-
structs are heuristic devices for describing behavioral domains.
Of course, construct domains can vary in being large versus
small, specific versus general, and fuzzy versus clearly defined
(Guion, 1987; Nunnally, 1978). Also, constructs become the
object of conceptual scrutiny in their own right. In other words,
psychologists hypothesize both (a) whether certain behaviors
will covary and (b) whether the clusters of covarying behaviors
(constructs) tend to covary in meaningful ways. In this general
sense, the terms construct validation and theory development
imply the same basic process. Both refer to the process of iden-
tifying (and often reifying) constructs by developing measures
of such constructs and examining relationships among the vari-
ous measures.
Nunnally (1978) delineated the four inferences that form the
core of this construct validation process. These four inferences
logically bind the components of the model presented in Figure
1. One can attempt to determine whether an inferred relation-
ship between two constructs (e.g., anxiety and manual dexter-
ity) exists by developing measures or causal conditions for each
(labeled X and Y, respectively). It is important to emphasize that
these measures are nothing more than procedures for sampling
behaviors within the respective construct domains. The following
four inferences then follow logically:
1. X and Y relate in some specified way.
2. X is a measure of (or treatment that induces) anxiety.
3. Anxiety and manual dexterity are causally related in some specified way.
4. Y is a measure of (or treatment that induces) manual dexterity.
Even though these four inferences are interrelated, a single ex-
periment cannot validate all four inferences simultaneously. In
fact, Inference 1 is the only one that can be empirically tested
directly. That is, we can use our measures of anxiety and man-
ual dexterity to derive scores that are subsequently found to
relate either experimentally or correlationally. These data serve
as empirical evidence of the veridicality of Inference 1. From
this one empirical finding, therefore, it would be necessary to
infer the truth or falsity of the others, because Inferences 2, 3,
and 4 each link an observable measure with a hypothetical
construction. Of course, merely finding a correlation between X
and Y leaves open several alternative interpretations of possible
relationships. For example, anxiety and manual dexterity are
perhaps both related to some third construct.
Figure 1. Critical inferential linkages in the theory-building process. (Diagram: a measure of anxiety, X, and a measure of manual dexterity, Y, joined by the numbered inferential links.)
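The third-construct interpretation can be made concrete with a small simulation. The following is only an illustrative sketch under assumed conditions: the effect sizes, the error variances, and the name `simulate_third_construct` are hypothetical and do not come from Nunnally (1978). It shows that X and Y can correlate substantially (Inference 1) even when anxiety and manual dexterity have no direct relation to each other and each depends only on some third construct.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_third_construct(n=5000, seed=0):
    """Return r(X, Y) when both constructs depend only on a third construct.

    All coefficients below are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    x_scores, y_scores = [], []
    for _ in range(n):
        third = rng.gauss(0, 1)                # unmeasured third construct
        anxiety = third + rng.gauss(0, 1)      # no direct link to dexterity
        dexterity = -third + rng.gauss(0, 1)   # no direct link to anxiety
        x_scores.append(anxiety + rng.gauss(0, 0.5))    # measure X
        y_scores.append(dexterity + rng.gauss(0, 0.5))  # measure Y
    return pearson_r(x_scores, y_scores)


if __name__ == "__main__":
    print(f"r(X, Y) = {simulate_third_construct():.2f}")
```

Under these assumptions the observed correlation is clearly negative, yet concluding that anxiety and manual dexterity are causally related (Inference 3) would be unwarranted; the empirical finding alone does not distinguish between the two interpretations.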
To provide incontrovertible proof that the four inferences are
correct, it would be necessary to empirically demonstrate three
of the inferences. If three of the linkages are unequivocally
proven correct, then complete confidence in the fourth would
be justified. However, because this direct empirical proof is im-
possible (Nunnally, 1978), typical practice is to assume that two
of the three inferences (2, 3, or 4) are correct and this, combined
with empirical evidence of Inference 1, allows a valid conclu-
sion regarding the remaining inference. Generally, these con-
clusions about construct validity are strengthened in those situ-
ations in which the truth of the assumptions is obvious to every-
one scrutinizing the conclusions drawn. Specifically, we are
more confident that a test validly measures a given construct if
(a) the behavioral domain of the other construct is explicitly
defined and (b) the assumption of a relationship between the
two constructs is unarguable (Nunnally, 1978).
The Three Faces of Construct Validity
To avoid confusion, it is important to realize that the term
construct validity has thus far been used to describe the sound-
ness of evidence supporting any of the four inferences. Thus,
the term is being used in its most general sense in reference
to construct-construct links (Inference 3), construct-measure
links (Inferences 2 and 4), and measure-measure links (Infer-
ence 1). However, what have traditionally been of particular
concern to research psychologists and psychometricians are
construct-measure links (i.e., Inference 2 or 4). In the heyday
of trait psychology, construct validity often referred to whether
a given test or measurement procedure allowed accurate infer-
ences about an individual's standing on a psychological con-
struct of particular interest (D. T. Campbell & Fiske, 1959;
Cronbach & Meehl, 1955; Ebel, 1977; Guion, 1980; Messick,
1980). These two uses of the term construct validity (equal con-
cern for Inferences 1, 2, 3, & 4 vs. primary concern for only
Inference 2 or 4) are clearly congruent. In fact, the difference
in perspective was recognized by Loevinger (1957) when she
referred to the validity of the construct versus the validity of the
test as a measure of the construct (Landy, 1986). If theory build-
ing is of primary interest, Inferences 1, 2, 3, and 4 are all of
equal importance. On the other hand, in specific situations (e.g.,
development of a new test), Inference 2 (or 4) is emphasized.
This becomes potentially more confusing because the term con-
struct validity has a somewhat different connotation in the per-
sonnel selection literature. Here, it has been frequently used to
describe a specific evidential approach for justifying a specific
measure-construct link (i.e., the predictor-performance link-
age portrayed in Figure 2) by documenting underlying con-
struct-construct and construct-measure links (Schwab, 1980).
The inferences implicated in this latter meaning are described
in detail in the next section. Perhaps the most important issue
at this juncture is to realize that these various meanings of the
term construct validity are nothing more than different views
of the same logical system, with varying emphasis on different
inferences.
Examining Traditional Conceptions of Validity
A common conception of the personnel selection process in-
volves (a) analysis of the job to determine (b) a performance
domain, defined in terms of job behaviors or outcomes, which
then guides (c) the selection or development of certain assess-
ment procedures, which make possible (d) predictions about
the likelihood that applicants will perform the job with a certain
degree of proficiency, and then subsequently (e) evaluating indi-
vidual performance by some operational criterion measure
(e.g., Cascio, 1987; Muchinsky, 1987; Society for I/O Psychol-
ogy, 1987; APA, 1985; Uniform Guidelines, 1978). This pro-
cess implies a framework, presented in Figure 2, which parallels
Figure 1 in many respects. The framework represented in Fig-
ure 2 portrays the following inferences:
5. Predictor measurements relate to criterion measurements.
6. The predictor measure is an adequate sample from a psychological construct domain.
7. The predictor construct domain overlaps with the performance domain.
8. The criterion measure is an adequate sample from the performance domain.
9. The predictor measure is related to the performance domain.
These inferences serve to link the components in Figure 2 anal-
ogously to the inferences in Figure 1. It is important to realize,
however, that in the transition from Figure 1 to Figure 2, two
important differences have arisen. First, an additional mea-
sure-construct linkage (Inference 9) has been created, linking
the predictor measure and the performance domain. Second,
rather than equal emphasis being placed on all inferences, this
additional measure-construct link (Inference 9) has taken on
greater relative importance.
The implications of this way of thinking for understanding
the validation process are explored in the discussion that fol-
lows. Before detailing how validation of personnel selection de-
cisions is merely a special case of the more general construct
validation process, it would be helpful to discuss the process of
conceptualizing and constructing behavioral domains.
Figure 2. A common conception of the inferences for personnel selection.
Contrasting Predictor Construct Domains and
Performance Domains
In an attempt to simplify the virtually infinite number of be-
haviors that can be exhibited by human beings, psychologists
attempt to identify naturally occurring clusters, then construct
labels for them, and investigate the covariance between them.
Predictor constructs, therefore, represent those clusters of co-
variant behaviors identified through psychological research and
constructed to enhance our general understanding of behavior.
In contrast to the psychologists' search for naturally occur-
ring behavioral construct domains across myriad situations, or-
ganizational designers in effect create behavioral domains to en-
hance their understanding and prediction of job behavior. In
fact, it is important to realize that from our pragmatic view-
point, a job performance domain is a construct, albeit in a con-
ceptually different sense than is usually implied in the psycho-
logical literature. Nonetheless, the performance of any job in
any organization is a cluster of interlocked and covariant behav-
iors, and this cluster consists of a subset of all possible behaviors
necessary for the organization to accomplish its broader goals
and objectives (Weick, 1979). Just as psychological constructs
represent behavioral domains, performance associated with a
job (or distinguishable aspects of job performance) represents
a behavioral domain.
Performance domains are conceptually distinct from predic-
tor constructs in that the universe to be sampled is delineated
differently. Construct domains on the predictor side are con-
ceived of by the research psychologist with reference to some
theoretical framework developed to explain general regularities
in human behavior. Performance domains are determined, or
at least influenced, by organizational decision makers and selec-
tion specialists collaborating to translate broad organizational
objectives into normative statements of valued behaviors and
outcomes.
The overriding reason for constructing behavioral domains
on both the predictor and the performance side is the parceling
of myriad behaviors into meaningful clusters to enhance under-
standing and communication. However, this parceling process
is different on the predictor versus the performance side because
of differences in (a) the conceptualization of predictor domains
versus performance domains, (b) the specific purposes for specifying
behavioral domains, (c) the methods used to cluster behaviors,
and often (d) the language system used to communicate
about the resulting construct systems.
First, the source of covariance between job behaviors is de-
signed by the organization and induced by various external con-
trol and coordination mechanisms (Mintzberg, 1983). This can
be contrasted with the naturally occurring covariance resulting
from the individual's personal predispositions or the interaction
of these predispositions with untold environmental influences,
as is often conceptualized for psychological constructs. Second,
predictor constructs are clusters of behaviors created by re-
search psychologists to capture general regularities in behavior.
Performance domains are designed more or less rationally to
interlock in such a way as to maximize efficient attainment of
organizational goals. As a result, performance domains are
clusters of behavior-outcome units that are differentially val-
ued by the organization. Depending on which goals are most
operative, clustering systems can vary considerably (Campion
& Thayer, 1985; Griffin, 1982; Harvey, 1986).
Third, through psychologists' use of both correlational (e.g.,
factor analytic) and experimental methods, behavior is empiri-
cally examined, and reliable regularities are conceptualized and
assigned construct labels. In contrast, organizational designers
typically rely on rational and relatively informal methods for
delineating performance domains. Finally, construct terminol-
ogy on the predictor side reflects the concern for identifying do-
mains of behavior caused by personal dispositions (e.g., "she is
extraverted"). Terminology used in organizations to describe
performance domains is more often goal-related (e.g., "she is
customer-service oriented"). These different terminologies for
describing behavioral domains have been discussed extensively
elsewhere (Fleishman, 1982; Pearlman, 1980).
On the predictor side, behaviors can be clustered hierarchi-
cally in varying levels of inclusiveness. For example, Dunnette
(1976) reviewed attempts to conceptualize intellectual func-
tioning by pointing out that behavior can be grouped into a
single, global construct (i.e., the g factor), several less inclusive
constructs (e.g., Thurstone's, 1938, seven primary mental abili-
ties), or literally hundreds of constructs (e.g., Guilford's, 1967,
structure of intellect model).
Organizations also conceptualize job behavior at different
levels of inclusiveness, depending on the purpose at hand. For
purposes of administrative decision making, global "construc-
tions" of job behavior are evoked, because the overriding im-
perative is the comparison of employees' overall contribution
to the organization. In this situation, the constructed system of
performance domains can be conceptualized as merely a system
of different job titles, each connoting a different domain of be-
haviors. On the other hand, when remedial feedback about job
behavior is required, organizations often cluster performance
into a number of behaviorally meaningful dimensions. For ex-
ample, the process of constructing behaviorally anchored rating
scales (Bernardin & P. C. Smith, 1981; P. C. Smith & Kendall,
1963) can be viewed as resulting in a performance construct
system that enhances intraorganizational communication and
decision making about job performance in a specific organiza-
tion (Feldman, 1986). A source of difficulty for many job ana-
lysts and performance appraisal system designers is the fact that
the organization's conceptual system for describing perfor-
mance is ineffective for certain purposes.
Performance domains result from the division of labor funda-
mental to organizing human activity. The conceptualization
and resulting terminology used in an organization to describe
performance differences (both within and between jobs) serve
to make behavior in organizations more understandable and or-
derly. Building on Weick's (1979) basic tenet that organizing is
a "consensually validated grammar for reducing equivocality
by means of sensible interlocked behavior" (p. 3), we propose
that performance construct systems are an important part of
the grammar and culture of a given organization. The manner
in which performance behavior is clustered and labeled is part
of the consensually validated conceptual scheme that helps
make sense out of the complex stream of interlocked behaviors
in the organization.
Viewed from this perspective, selection decisions represent
attempts to identify regularities in applicants' behavior, but
only those behaviors identified by the organization as valuable
for coordination with others' behavior that are necessary for
goal attainment. Personnel selection, then, is the process of
identifying and mapping predictor samples of behavior to
effectively overlap with performance domains. Validity, there-
fore, can be viewed as the extent to which these two construct
systems overlap.
The "Unitarian" Conception of Validity
The trilogy of construct, content, and criterion-related validi-
ties was first articulated in the "Technical Recommendations
for Psychological Tests and Diagnostic Techniques" (American
Psychological Association, American Educational Research As-
sociation, & National Council of Measurement Used in Educa-
tion, 1954). As noted by Landy (1986), this trilogy was quite
valuable in that it enhanced the clarity with which validity con-
cepts were typically discussed at the time. As is the case with
many popular conceptualizations, however, its initial usefulness
was replaced by growing confusion. This confusion is due, in
part, to the tendency for certain erroneous interpretations, mis-
conceptions, and legal mandates to become crystallized as part
of professional psychology's conventional wisdom (G. V. Bar-
rett, 1972) or tenets of orthodoxy (Guion, 1976). It was many
years before this conventional wisdom was questioned in a sys-
tematic way (Dunnette & Borman, 1979; Guion, 1977, 1978,
1980; Messick, 1975, 1980; Tenopyr, 1977; Tenopyr & Oeltjen,
1982) and, by then, confusion was running rampant.
For many years, the concepts of construct, content, and crite-
rion-related validity have been described as different types of
validity. Some recent descriptions have gone so far as to suggest
that each of these validity analysis strategies (Lawshe, 1985)
should be chosen according to the kinds of inferences or conclu-
sions one wishes to make about job applicants (e.g., Lawshe,
1985; Saal & Knight, 1988) or the nature of the selection proce-
dure (e.g., R. S. Barrett, 1980). Although the latter view has to
some degree been induced by the prevailing opinion in Title VII
litigation, this linking of different validities to different infer-
ences or types of predictors is logically problematic because of
the implication that in any given decision situation, only one of
the three validity concepts is useful. On the contrary, an inference drawn from currently available information about some aspect of future job performance (Inference 9) is the single overriding inference; and content-, construct-, and criterion-related considerations are all quite relevant for justifying its validity. These three concepts are more appropriately viewed as labels for three evidential bases (Messick, 1980) from which inferences about future job performance can be supported or justified.
The applied decision maker is concerned about the extent to which test or assessment information will allow accurate predictions about subsequent job performance (Inference 9). One general approach to justifying Inference 9 would be to generate direct empirical evidence that assessment scores relate to valid measurements of job performance (Sussmann & Robertson, 1986). Inference 5 represents this linkage, which has historically been of primary pragmatic concern to personnel psychologists. The term criterion-related has traditionally been used to denote this type of evidence and, in fact, often implies the unnecessary restriction that only correlational evidence is appropriate. As Landy (1986) ably pointed out, substantive theories are seldom, if ever, built solely on correlational evidence. Viewed in this way, criterion-related evidence can be experimental and quasi-experimental in nature.
Why, therefore, have personnel specialists relied so heavily on correlational evidence of validity? This bias might derive from the fact that in personnel selection situations, the constructs of interest are conceived of as enduring person characteristics (e.g., abilities) on the predictor side and fixed job performance measures on the criterion side. Neither of these is typically thought to be amenable to experimental manipulation, except under the most contrived laboratory conditions. Perhaps another factor contributing to this bias in favor of correlational evidence was the conventionally held belief in the situational specificity of validity (Schmidt & Hunter, 1981), which would preclude the use of laboratory analogues to real work settings.
Logically, then, one approach for justifying Inference 9 is to empirically link the predictor and the criterion. However, this results in only partial justification. Analogous to the validation of inferences in Figure 1, to have complete confidence in the validity of Inference 9, both Inferences 5 and 8 must be justified. The relative neglect of Inference 8 by those collecting criterion-related evidence represents a critical truncation of the validation process. Suffice it to say that for criterion-related evidence to be a compelling argument for Inference 9, strong evidence of both Inferences 5 and 8 is required.
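A brief simulation may help show why evidence for Inference 5 alone gives only partial justification. The sketch below is hypothetical throughout: the contaminating factor, the weights, and the function name `simulate_contaminated_criterion` are assumptions made for illustration, not part of the framework itself. It constructs a criterion measure that samples the performance domain poorly (weak support for Inference 8) and shares an irrelevant factor with the predictor.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_contaminated_criterion(n=5000, seed=1):
    """Return (r for Inference 5, r for Inference 9) under assumed weights.

    The criterion measure mostly reflects a hypothetical contaminant that the
    predictor also picks up, and only weakly reflects true performance.
    """
    rng = random.Random(seed)
    predictor, criterion, performance = [], [], []
    for _ in range(n):
        true_perf = rng.gauss(0, 1)     # standing on the performance domain
        contaminant = rng.gauss(0, 1)   # hypothetical job-irrelevant factor
        predictor.append(contaminant + rng.gauss(0, 0.5))
        criterion.append(0.3 * true_perf + contaminant + rng.gauss(0, 0.5))
        performance.append(true_perf)
    r_inference_5 = pearson_r(predictor, criterion)    # predictor vs. criterion measure
    r_inference_9 = pearson_r(predictor, performance)  # predictor vs. performance domain
    return r_inference_5, r_inference_9


if __name__ == "__main__":
    r5, r9 = simulate_contaminated_criterion()
    print(f"Inference 5 (predictor-criterion): r = {r5:.2f}")
    print(f"Inference 9 (predictor-performance domain): r = {r9:.2f}")
```

In this contrived case the predictor-criterion correlation is large while the predictor is essentially unrelated to the performance domain. In practice the performance domain cannot be observed directly, which is why support for Inference 8 has to be argued separately rather than read off the validity coefficient.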
What personnel specialists have traditionally implied by the label construct validity is tied to Inferences 6 and 7. Analogous to the logic presented earlier, it is assumed that if Inferences 6 and 7 can be supported by sound evidence, then one can confidently believe Inference 9 to be true. The difference is merely one of focus. Therefore, the general conception of construct validity is merely viewed differently in the context of validating personnel selection decisions. In a selection context, Inference 9 is most critical. If it can be shown that a test measures a specific construct (Inference 6) that has been determined to be critical for job performance (Inference 7), then inferences about job performance from test scores (Inference 9) are, by logical implication, justified.
How does a personnel selection specialist support Inferences 6 and 7? Evidence supporting Inference 6 primarily takes the form of empirically based relationships and judgments that are both convergent and discriminant in nature (D. T. Campbell & Fiske, 1959; Cook & D. T. Campbell, 1979; Cronbach & Meehl, 1955; Drasgow & Miller, 1982; Rezmovic & Rezmovic, 1981). Convergent evidence exists when (a) test scores relate to scores on other tests of the same construct, (b) test scores from people who differ in the extent to which they possess the focal construct also differ in a predictable way, or (c) test scores relate to scores on tests of other constructs that are theoretically expected to be related. Discriminant evidence occurs when test scores do not relate to scores on tests of theoretically independent constructs. Note that this discussion can apply equally to criterion measurement (Inference 8).
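The pattern of correlations implied by convergent and discriminant evidence can be sketched with simulated scores. Everything in the example is assumed for illustration (the construct relations, the error variances, and the name `simulate_convergent_discriminant`); it covers conditions (a) and (c) and the discriminant case, but not the known-groups comparison in (b).

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_convergent_discriminant(n=5000, seed=2):
    """Correlate a focal test with three other tests under assumed relations."""
    rng = random.Random(seed)
    focal, same, related, independent = [], [], [], []
    for _ in range(n):
        construct = rng.gauss(0, 1)                                   # focal construct
        related_construct = 0.6 * construct + 0.8 * rng.gauss(0, 1)   # theoretically related
        independent_construct = rng.gauss(0, 1)                       # theoretically independent
        focal.append(construct + rng.gauss(0, 0.5))
        same.append(construct + rng.gauss(0, 0.5))                    # second test, same construct
        related.append(related_construct + rng.gauss(0, 0.5))
        independent.append(independent_construct + rng.gauss(0, 0.5))
    return {
        "same_construct": pearson_r(focal, same),                 # convergent (a)
        "related_construct": pearson_r(focal, related),           # convergent (c)
        "independent_construct": pearson_r(focal, independent),   # discriminant
    }


if __name__ == "__main__":
    for label, r in simulate_convergent_discriminant().items():
        print(f"{label}: r = {r:.2f}")
```

Under these assumptions the focal test correlates most strongly with the other test of the same construct, moderately with the test of the related construct, and negligibly with the test of the independent construct, which is the ordering that would count as support for Inference 6.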
Inference 7, because it links two hypothetical behavioral domains, cannot be examined empirically. Analogous to Inference 3 in Figure 1, Inference 7 must be justified theoretically and logically on the basis of accumulated knowledge of construct-construct relations. On closer conceptual scrutiny, however, the analogy loses its relevance because the two constructs being related do not share common nomological (Margenau, 1950) status. The unique conceptual issues that arise when relating predictor and performance domains are examined in detail later. For now, to the extent that Inferences 6 and 7 are supported, the use of the predictor test to predict job performance is construct valid.
A third approach for justifying Inference 9 involves demonstrating that the predictor is isomorphic and obviously interchangeable with the performance domain. This line of reasoning is particularly defensible when it is realized that predictor tests are always samples of behavior from which we infer something about behavior on a job (Dunnette, 1963). The behaviors sampled may be dissimilar or similar ("sign vs. sample") to the criterion behaviors being predicted (Wernimont & Campbell, 1968). If an applicant performs behaviors as part of the assessment phase that closely resemble behaviors in the performance domain, then many personnel specialists feel that, logically, the inference about future job performance is better justified. This line of reasoning underlies the type of evidence traditionally labeled content validity. Of course, various specific procedures for analyzing the degree of isomorphism between predictors and criteria have been proposed (Doran, 1987; Faley & Sundstrom, 1985; Hamilton, 1981; Lawshe, 1975; Schmitt & Ostroff, 1986; Trattner, 1982), but the same basic logic underlies each.
Content-related evidence of validity has traditionally involved justifying Inference 9 by rational examination of the manner in which the performance domain is sampled by the predictor. Analogous to statistical sampling theory, if a predictor sample is constructed in congruence with certain principles (e.g., ensuring representativeness as well as relevance of the sample), one can assume that scores from that sample will accurately estimate the universe from which the sample is drawn. It is this emphasis on operationalization and sample construction that motivated Tenopyr (1977) to refer to content validation as "content-oriented test construction" (p. 52). Therefore, when a
selection specialist can rationally defend the strategy for sam-
pling the performance domain used in a given testing situation,
content-related validity evidence supports the inference that
scores from the test are valid for predicting future performance.
Decision Validity Versus Predictor Development
Thus far, the concepts of construct-, content-, and criterion-
related evidence have been discussed solely as evidential bases
for justifying decision validity. However, the implications of
differences between the three can be traced back in the decision-
making process. By doing so, their differences can be more
clearly appreciated.
Personnel decision making involves two fundamental phases:
(a) constructing the predictor as a sample of some behavioral
domain and (b) using this behavioral information to make pre-
dictions about future job behavior. This latter data combination
phase is the immediate precursor to employment decisions and
has therefore received considerable legal and professional scru-
tiny. Yet, the data collection phase, which involves specifying
the behavioral data base, has equally important implications for
subsequent decision quality (Sawyer, 1966). The respective roles
of the construct-, content-, and criterion-related concepts in the
development of predictor samples of behavior deserve
conceptual scrutiny.
With reference to Figure 2, the point of departure for the
development of any personnel selection system is the perfor-
mance domain. From this delineation of desirable job behaviors
or outcomes, selection specialists "work backwards" to specify
which behaviors or outcomes should be sampled by the predic-
tors. There are three routes from the performance domain to
predictor development: The construct-related approach in-
volves identifying psychological construct domains that overlap
significantly with the performance domain (Inference 7) and
then developing predictors that adequately sample these con-
struct domains (Inference 6). The content-related approach in-
volves developing predictors that directly sample the perfor-
mance domain. The criterion-related approach involves
developing some operational measure of behaviors in the perfor-
mance domain (Inference 8) and then identifying or developing
predictors that will relate empirically with the operational crite-
rion measure (Inference 5).
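The three routes just described can be written down as a small data structure, which makes it easy to see which inferences each route depends on. This is only an illustrative sketch: the class and function names are hypothetical, and pairing the content-related route with Inference 9 follows the earlier statement that content-related evidence justifies Inference 9 directly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Route:
    """One route from the performance domain to predictor development."""
    name: str
    inferences: Tuple[int, ...]  # inference numbers, in the order described
    guided_by: str


ROUTES = (
    # Identify overlapping construct domains (7), then sample them (6).
    Route("construct-related", (7, 6), "psychological construct domain"),
    # Sample the performance domain directly (assumed here: Inference 9).
    Route("content-related", (9,), "performance domain"),
    # Build an operational criterion (8), then relate predictors to it (5).
    Route("criterion-related", (8, 5), "operational criterion measure"),
)


def routes_relying_on(inference: int) -> List[str]:
    """Names of the routes that depend on a given inference."""
    return [route.name for route in ROUTES if inference in route.inferences]


for route in ROUTES:
    chain = " -> ".join(f"Inference {i}" for i in route.inferences)
    print(f"{route.name}: {chain} (guided by the {route.guided_by})")
```

For example, `routes_relying_on(8)` returns only the criterion-related route, so doubts about the criterion measure bear on that route alone.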
We would like to draw attention to a fundamental difference
between the criterion-related approach and the other two ap-
proaches. The criterion is merely an operational sample of the
performance domain. At its best—that is, being neither defi-
cient nor contaminated—it taps the entire performance do-
main, and the criterion-related approach reduces logically to
the content-related approach. At its worst, it represents an
atheoretical and circuitous, if not entirely misleading, route
to predictor development (e.g., "dust-bowl empiricism"). From
this perspective, we propose that the construct-related and con-
tent-related approaches represent the two fundamental predic-
tor sampling strategies. Construct-related implies that predictor
sampling is guided by evoking a psychological construct do-
main. Content-related implies that predictor sampling is guided
by evoking a performance domain. To the extent that the two
domains are derived differently and relations between the two
are not well understood, construct- and content-related ap-
proaches can lead to substantive differences in predictor devel-
opment and consequent decision validity (R. S. Barrett, 1980).
In contrast with the construct- and content-related approaches,
the criterion-related approach is best characterized as a re-
search strategy for empirically assessing the quality of either
predictor sampling strategy. Viewed from this perspective, judg-
ments of validity are tantamount to judgments about the ade-
quacy of behavior sampling (construct- and content-related) or
empirical indexes of such adequacy (criterion-related).
Generating Evidence for Decision Validity
There has been considerable debate over the years regarding
whether the construct-related or the content-related viewpoint
provides the more fruitful model for guiding predictor development
and subsequent decision making. For instance, a
fundamental conceptual issue was raised when Wernimont and
J. P. Campbell (1968) argued that the classic validity model and
its emphasis on predictor tests as signs of underlying constructs
should be replaced by the behavioral consistency approach in
which predictors represent samples of job behavior. Upon
closer examination, this issue is really one of how predictor do-
mains should best be specified. If the predictor test is labeled a
sign, it implies that the behavior domain was specified by the
theory surrounding a psychological construct. If the predictor
test is labeled a sample, it implies that the behavior domain was
specified by the "theory" surrounding job performance. Ulti-
mately, the resolution of this controversy depends on one's real-
izing that the two approaches are inextricably intertwined in
the inferential system portrayed in Figure 2. Interestingly, this
distinction parallels in certain respects the long-standing con-
troversy in personality psychology between traditional trait ver-
sus situationalist approaches for predicting behavior (Mischel,
1973). The issue of whether intrinsic attributes or environmental
characteristics are the more potent influences on behavior
is certainly not fully resolved; however, advances have been
made to integrate the trait-situation perspectives (Kenrick &
Funder, 1988; Mischel, 1984; Schneider, 1987).
In our view, personnel psychologists should never avidly rec-
ommend the abandonment of construct-based theory develop-
ment, because it is the hallmark of fruitful scientific inquiry.
Tenopyr (1977) pointed out that for a test to have high predic-
tive value, it must share the same psychological constructs that
underlie job behavior. This view recognizes that content speci-
fication is part of the construct validation process. That is, part
of justifying that a test measures a given construct is the exami-
nation of the internal structure of the test to assess the extent to
which it is consistent with the theory surrounding the construct.
Irrespective of this conceptual unity, the construct versus con-
tent perspectives are explicitly recognized in the Uniform
Guidelines (1978), Standards (1985), and Principles (1987),
and therefore it is pragmatically important to draw clear opera-
tional and semantic distinctions between them.
Criterion-related evidence is by its nature empirical, whereas
content-related and construct-related evidence are typically
conceived of as relying more on human judgment and thus are
used differently to justify inferences from test scores. Perhaps
because the precision introduced by careful quantification of
psychological phenomena is fundamental to scientific inquiry,
criterion-related evidence has been endorsed as legally superior
to the other two forms of evidence (Uniform Guidelines, 1978).
However, the scientific superiority of criterion-related evidence
has not received this endorsement in professional guidelines
(e.g., Society for I/O Psychology, 1987; APA, 1985).
Although this issue has been debated for years, the present
framework makes it clear that there is no inherent or immutable
superiority of criterion-related evidence (especially when re-
stricted in form to predictor-criterion correlations) over other
lines of evidence. An uncritical bias in favor of criterion-related
evidence can have deleterious effects on theoretical understand-
ing. Validation research for assessment centers provides a case
in point. After reviewing the empirical validity evidence, Kli-
moski and Brickner (1987) concluded that despite consistent
empirical evidence, the theoretical explanations of "the predic-
tive validity of assessment centers remains a puzzle" (p. 256).
G. V. Barrett, Alexander, O'Connor, and Forbes (1978) argued
that coincidental empirical relationships can be discovered
when relying on a "dust-bowl empiricism" approach to valida-
tion. These coincidental relationships, when atheoretically dis-
covered and interpreted, will detract from our ultimate under-
standing of complex criterion behavior, because they lack what
Guion (1980) has called job-relatedness. The work of Schmidt
and Hunter (1981) also casts suspicion on the reliability of evi-
dence from individual criterion-related validity studies due to
the excessive sampling error that results from the use of small
validation samples.
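The sampling-error point can be made concrete with a short calculation. The sketch below uses the standard Fisher z approximation to put a 95% confidence interval around an observed validity coefficient. The observed value of .25 and the sample sizes are hypothetical, chosen only to show how wide the interval is for a small validation sample.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.96):
    """Approximate confidence interval for a correlation via Fisher's z.

    Transforms r with atanh, uses the standard error 1 / sqrt(n - 3),
    and transforms the interval limits back with tanh.
    """
    if n <= 3:
        raise ValueError("n must exceed 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


OBSERVED_R = 0.25  # hypothetical observed validity coefficient
intervals = {n: fisher_ci(OBSERVED_R, n) for n in (30, 100, 1000)}

for n, (lower, upper) in intervals.items():
    print(f"N = {n:4d}: 95% CI = [{lower:+.2f}, {upper:+.2f}]")
```

Under these assumptions, the interval for N = 30 spans zero, so a single small study cannot distinguish a useful predictor from a useless one, whereas the interval for N = 1,000 is far narrower.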
One could reasonably argue that content-related and con-
struct-related evidence, when based on sound professional
judgment about appropriate test use, are often superior to crite-
rion-related evidence. Research does indicate that pooled esti-
mates of criterion-related validity, based on the opinions of per-
sonnel psychologists, are more accurate than empirical evi-
dence obtained from small-sample validity studies (Hirsch,
Schmidt, & Hunter, 1986; Schmidt, Hunter, Croll, & McKen-
zie, 1983). The traditional emphasis placed on criterion-related
evidence may suggest only that evidence largely based on judg-
ment is more likely to be questioned because of the widely held
belief that judgments are inherently fallible. It may also suggest
that people do not fully realize the subjective nature of judg-
ments about the relevance of criterion measures.
In some discussions of validity, an appeal is made to follow
"professionally accepted procedures" for generating evidence
(Lawshe, 1985). It is easily argued that there is nothing ap-
proaching a specific, unambiguous set of professionally ac-
cepted standards for determining the validity of inferences from
test scores (Landy, 1986). Even casual examination of legal tes-
timony and professional literature indicates that no consensus
exists regarding issues of content-related validation (Kleiman
& Faley, 1978); concurrent versus predictive validation (G. V.
Barrett, Phillips, & Alexander, 1981; Guion & Cranny, 1982;
Schmitt, Gooding, Noe, & Kirsch, 1984); adequate sample
sizes (Monahan & Muchinsky, 1983; Schmidt, Hunter, & Urry,
1976); validity generalization (Burke, 1984; Callender & Osburn,
1980; Gutenberg, Arvey, Osburn, & Jeanneret, 1983;
James, Demaree, & Mulaik, 1986; Schmitt & Noe, 1986); and
criterion development (G. V. Barrett & Kernan, 1987; Kleiman
& Durham, 1981), to mention only a few. As a matter of fact,
the validation of certain inferences is actually being made by
the courts rather than by professionals who regularly use the
measuring instruments upon which these inferences are based.
For instance, from a breathalyzer reading of .10, the courts infer
that driving ability is impaired. This may not be a valid infer-
ence, even though the test, by all psychometric and professional
standards, may be valid for measuring blood alcohol levels. A
number of the same issues will likely arise regarding the use of
polygraphs, drug tests, and genetic screening. Similarly, some
of the early courts attempted to mandate that a certain validity
coefficient, per se, made a test valid or not. Clearly, the process
of drawing inferences from test scores is a very complex one,
particularly if one considers the interrelated roles of technical,
practical, and legal opinion.
Which Inference Is Which?
Throughout this article, the goal has been to delineate the
inferences that logically underlie the process of validating psy-
chological constructs and measurement procedures. However,
there is another sense in which multiple inferences are discussed
by personnel decision makers. There are many potential infer-
ences about future job behaviors that may be drawn from the
same test scores. These multiple inferences about future job be-
haviors should not be confused with the inferences represented
in Figures 1 and 2. The many different inferences about future
job behavior are merely specific examples of Inference 9 in Fig-
ure 2. It is possible to conceive of tests that yield scores that are
unquestionably valid indicators of some underlying, theoreti-
cally meaningful construct (see Ebel's, 1961, discussion of mea-
surement in the physical sciences). Of course, what is valid is
the inference that test scores reflect differences in the construct
(e.g., Inference 6), but this is conceptually quite different from
inferences about future job behavior drawn from test scores (In-
ference 9). Guion (1974, 1987) highlighted this distinction by
differentiating between job relevance and validity of trait mea-
surement. With enough theoretical and empirical corrobora-
tion, it can be confidently concluded that test scores and con-
struct differences covary systematically. Therefore, test scores
make valid inferences about the construct possible. Yet, one
may attempt to infer whether a person who scores in a particu-
lar way on the test will perform in a certain way on a job, train-
ing program, and so on, but these are quite different kinds of
inferences. The test may not be valid for some or all of these
purposes, because each implies a different criterion to be pre-
dicted. Likewise, the inferences about job performance may
be valid, but inferences about other outcomes (e.g., tenure)
may not.
It is important to make this distinction to emphasize that a
test can be construct valid (in the sense that it validly measures
a given construct) and yet certain inferences about future be-
havior may not be valid. For selection purposes, then, this test
would not be construct valid in the traditional Title VII sense.
It is this differential past usage of the term construct validity
that motivated Guion (1980) to "identify the unifying concept
of validity as similar, but not necessarily identical, to what has
been meant by construct validity" (p. 393). Perhaps some future
confusion could be allayed by using the term validity in
the unifying sense to refer to the justifiable confidence in our
selection decision. Construct-related validity should be reserved
for references to a particular evidential approach to demonstrating
validity that focuses on justifying certain critical
construct-measure and construct-construct inferences (Inferences
6 and 7).
Figure 3. A modified framework detailing the inferences for criterion development.
Construct-Related Validity of the Criterion
The discussion thus far has reflected a common view of vali-
dation. That is, personnel specialists generally place more em-
phasis on validity of the predictors because the overriding or-
ganizational imperative is to gather predictor information as
the basis for important and inevitable selection decisions. How-
ever, we would like to call attention to Cascio's (1987) statement
that "in order to emerge from the 'dark ages' we need clear
thinking, in-depth theorizing about criteria, and identification
of the goals of criterion measurement" (p. 51). The importance
of this statement should become clearer on examination of Fig-
ure 3, which represents an adaptation of the validation para-
digm presented in Figure 2. The model presented in Figure 3
was designed specifically to link several traditional concepts
unique to personnel decision making and to highlight the con-
ceptual differences between predictor construct and perfor-
mance domains.
Note that the systems of inferences detailed in Figures 1, 2,
and 3 are logically symmetrical. Therefore, the issues raised
when discussing validity of the predictor are equally important
for validating criterion measures (Frederiksen, 1986). The ca-
veat proposed here is that criterion measures must be validated
analogously to predictors (Guion, 1961, 1976, 1987; James,
1973), with reference to the inferential linkages being sup-
ported by evidence. The importance of this point is often under-
estimated.
In a typical selection situation, again, Inference 9 is the criti-
cal inference for which confirming evidence is required. The
validation process involves accumulating evidence of various
forms to justify Inference 9 in either a direct empirical way (e.g.,
validity coefficients, contrasted groups, or test construction
analyses) or more judgmentally by confirmation of Inferences
6, 7, and 8. Inference 7 represents whether a specific psychological
construct underlies job performance, whereas Inference 8
represents whether the operational criterion samples the perfor-
mance domain. Generating evidence for Inferences 7 and 8 is
the process of accumulating construct-related evidence of crite-
rion validity.
The present framework helps to identify possible loci for the
criterion problem. The problem results from a tendency to truncate the
nomological network (specifically, Inferences 7, 8, and 10), which
in turn leads to a myopic view of criterion validity. Two interre-
lated effects of this myopia are likely to result. First, the develop-
ment of criterion measures is likely to be less psychometrically
rigorous than predictor development. Wiggins (1973) stated
that "basically, the 'problem' resides in the considerable dis-
crepancy that typically exists between our intuitive standards
of what criteria of performance should entail and the measures
that are currently employed for evaluating such criteria" (p.
39). Second, performance criteria are likely to be less deeply or
richly embedded in networks of theoretical relationships than
are constructs on the predictor side. Perhaps this state of affairs
has resulted partially from the differences between research for
administrative prediction and research for scientific under-
standing (Anderson & Shanteau, 1977; Loevinger, 1957). Re-
search for prediction tends to ignore the importance of multide-
terminant functional relationships between variables.
The value of an employee's behavior or accomplishments to
an organization is ultimately a relative value judgment by some
member or members of the organization (Fiske, 1951). Stated
strongly, Campbell (1983) maintains that "the meaning of per-
formance is not something to be 'discovered'; it should be im-
posed" (p. 286). As such, it is amenable to different interpre-
tations depending on who is making the judgment. As dominant
coalitions or critical alliances (Weick, 1979) shift and the orga-
nization's values change, so do normative judgments of an em-
ployee's worth (Guion, 1961). As a result, it is less likely that
the systematic procedures that characterize professional test
development will be applied to criterion development (Banks
& Roberson, 1985). Once a test is rigorously developed, it has
perceived potential for long-term usefulness in various predic-
tion and assessment applications. The same kind of rigor in the
development of criterion measures, even assuming that the or-
ganization would "foot the bill," might quickly lose its utility
with a change in values that often occurs with a change in the
organization's leadership. Also, idiosyncratic values regarding
behavior in different organizations logically require customized
criterion measurement systems. Still another factor contribut-
ing to a lack of substantive criterion development is the long-
held belief in the dynamic nature of criteria (Ghiselli, 1956).
This conventional, yet perhaps erroneous (G. V. Barrett, Cald-
well, & Alexander, 1985), belief that performance determinants
change significantly over time logically mandates a greater ex-
penditure of resources for criterion development than many are
willing to accept. One result is that little emphasis is likely to
be placed on research as a means of accumulating knowledge
about the appraisal system (Smith, 1976). For these and many
other economic and logistic reasons, it can be assumed that cri-
terion measures are not generally likely to command the con-
cern for rigor necessary for optimal development. Also note that
this issue of rigor in behavioral criterion development is not
unique to personnel selection research (O'Grady, 1982). Rushton,
Brainerd, and Pressley (1983) analyzed behavioral criterion
deficiency in 12 major areas of psychological research and con-
cluded that it is a formidable and pervasive problem.
There is a more fundamental conceptual basis for assuming
that criterion development is likely to be deficient. As Figure 3
illustrates, there are three inferences linking the psychological
construct required for job performance and the operational cri-
terion measure. The truth of Inference 8 is to some extent em-
pirically testable by the construct-related validation procedures
discussed earlier. However, Inference 7, linking the performance
construct with the underlying psychological construct, is justi-
fiable only through rational deductive analysis (Cascio, 1987).
This inference must be based on the judgments of certain peo-
ple. On the one hand, the criterion must be defined by organiza-
tion leaders who are responsible for formulating and translating
valued organizational outcomes. On the other hand, selection
specialists are required to infer from job analytic data the pre-
dictor constructs required for job performance. Incidentally,
Smith (1976) pointed out that the translation of goals to valued
behavior should also be validated (Inference 11). This mandate
for collaborative decision making between various professional
groups has obvious implications for the quality of the resulting
criterion measurement system.
When considering construct-related validation of the crite-
rion, unique conceptual and practical issues do arise. On the
predictor side, a test is constructed to sample certain criterial
behaviors (Messick, 1980) that are specified by the psychologi-
cal construct theory and judged to be indicators of a specific
construct or set of constructs. Criterion measures, likewise, are
developed to be samples of an underlying behavior domain. No-
tice that the relative position of the psychological construct do-
main and performance domain has been changed in Figure 3.
The framework proposed here portrays psychological constructs
as being more deeply embedded in the nomological network;
they are more fruitfully conceptualized as labels for behavioral
regularities that underlie both the behavior sampled by the
predictor and the behavior in the performance domain as sampled
by the criterion.
Delineating the Performance Domain
Two prevalent ways of conceptualizing performance domains
are discussed in the performance appraisal literature. One
school of thought places relative emphasis on a conceptualiza-
tion of performance domains as collections of overt job behav-
iors (e.g., Borman, 1983), whereas the other places relative em-
phasis on outcomes or results (e.g., Kane, 1986). The former is
motivated by concern for developing psychological theories that
capture behavioral regularities important to organizational
functioning. The latter recognizes the importance of goal at-
tainment to organizational functioning.
We join others in stressing the inextricable relationship be-
tween job behaviors and outcomes. We propose that perfor-
mance domains are composed of behavior-outcome units. Out-
comes are valued by the organization, and behaviors are the
means to these valued ends. As a result, behaviors take on
different value, depending on the value of their consequent out-
comes. Therefore, optimal description of the performance do-
main for a given job requires careful and complete delineation
of valued outcomes and the accompanying requisite behaviors
(Fine, 1986; James, 1973).
The behavior versus outcome distinction is reflected in the
distinction between composite and multiple criterion models.
The relative merits of these models have been examined in de-
tail elsewhere (Brogden & Taylor, 1950; J. P. Campbell, Dun-
nette, Lawler, & Weick, 1970; Carroll & Schneier, 1982; Dun-
nette, 1963; Guion, 1965; James, 1973; Schmidt & Kaplan,
1971; Smith, 1976; Thorndike, 1949). The important differ-
ence between these models is often viewed as whether different
types of operational criterion information should be combined
or not. For this analysis, however, a more fundamental differ-
ence between the two models is the way in which performance
domains are conceptualized. The composite criterion model im-
plies a unitary (and often economic) conception reflecting an
employee's total worth to an organization. As a result, opera-
tional criteria are designed to reflect the underlying domain by
sampling the "(economic) end products of job behaviors"
(Schmidt & Kaplan, 1971, p. 424; parentheses added). In con-
trast, the multiple criterion model conceptualizes performance
as a behavioral domain within which some behaviors are more
valuable than others for achieving organizational goals. Opera-
tional criteria developed to tap this domain are more behavior-
ally oriented, focusing on individual incidents or dimensions
of actual job behaviors that lead to the attainment of valued
organizational outcomes.
It is important to emphasize that these notions of total eco-
nomic worth versus performance behavior domain are no less
hypothetical constructions than, for instance, intelligence or
reading ability (Schwab, 1980). The fact that job performance
can be described in both ways is reflected in the job analysis
literature by reference to job-oriented (what is accomplished)
versus worker-oriented (what is done to accomplish) bases for
job description (Cummings & Schwab, 1973; McCormick,
1976). However, the relevance of this distinction for criterion
development has generally not been examined systematically. In
the discussion that follows, we examine in greater detail the procedures
used to justify Inferences 7, 8, 10, and 11.
Generating Evidence of Criterion Validity
Job analysis provides the evidential basis for justifying Infer-
ences 7, 8, 10, and 11. Most personnel professionals are quick
to agree that systematic job analysis provides the prerequisite
data base for all subsequent selection activities. Yet, perhaps no
other professional activity is better characterized by the idio-
syncratic use of unstandardized procedures and lack of general
principles to guide data collection (Tenopyr, 1986). Clearly, the
proliferation of job analysis procedures is ample testimony to
the conclusion that very little in the way of standard job analytic
procedures exists. Regardless of the reasons for this lack of stan-
dard practice, one result is a relative dearth of both conceptual
and empirical guidelines for adequately justifying the critical
inferential linkages (Inferences 7, 8, 10, and 11).
Job analysis involves examining job demands and translating
them into behavior-outcome units that define the performance
domain and that subsequently make optimal person-job
matches possible. Inference 10 represents the extent to which
the actual job demands have been adequately analyzed, result-
ing in a valid description of the performance domain. The pro-
cess of substantiating Inference 10 is commonly referred to as
job description. There are at least two fundamental reasons to
suspect the validity of Inference 10 in most applied selection
situations. First, fully adequate taxonomies of job characteris-
tics, which are required for proper delineation of the perfor-
mance domain, have yet to be developed (Fleishman, 1975;
Fleishman & Quaintance, 1984). Second, most jobs are accu-
rately characterized as collections of demands with associated
behavioral universes with fuzzy, if not indeterminate, boundaries
(Weick, 1979), making their unequivocal delineation logically
impossible.
Inference 11 represents the extent to which behavior-out-
come links have been substantiated. Again, job analysis is the
process of discovering and specifying these links. Some job
analysis procedures more systematically delineate behavior-
outcome links than do others. For example, the critical inci-
dents technique (Flanagan, 1954) formally elicits organization-
ally valued outcomes and systematically ties job behaviors to
these. Functional job analysis also formally assesses these link-
ages through group interviews of subject matter experts (Fine,
1986). Regardless of which method is used, to the extent that
job analysis is conducted without explicating behavior-out-
come links, the validity of Inference 11 is suspect.
Inference 8 links an operational criterion with the perfor-
mance domain. As such, it represents the inference that the op-
erational criterion validly measures the performance domain.
This process is what is commonly referred to as criterion devel-
opment. When the multiple criterion model is used to guide
criterion development, job analysis data in the form of worker-
oriented (what is done to accomplish) behavior requirements
are useful for justifying this inferential linkage. When the com-
posite criterion model is used to guide criterion development,
job analysis data in the form of job-oriented (what is accom-
plished) behavior requirements are most useful. In either case,
justification of Inference 8 typically takes the form of (a) claims
on the part of the job analyst that all major behavioral dimen-
sions or outcomes have been identified and are represented in
the operational criterion measure (e.g., performance rating in-
strument or objective indexes) and, occasionally, (b) psycho-
metric evidence of accuracy or lack of bias in indexes of job
performance (Dickinson, 1987; Kleiman & Durham, 1981). In
other words, criterion measures are usually validated (i.e., evi-
dence for Inference 8 is generated) by rational, albeit tacit,
claims about the content-related evidence of validity. The posi-
tion advanced in this article is that sole reliance on content-
related evidence of criterion validity necessarily means that the
evidential base is deficient relative to the numerous other forms
of evidence available. This is particularly evident in light of
Feldman's (1986) call for a taxonomy of appraisal tasks. He
pointed out that different types of tasks influence the manner
in which appraisal judgments are made, and he went on to examine
the differences in how these judgments are validated.
The conventional practice of relying solely on single criterion
measures and methods, whose content is often rather unsystem-
atically determined, inevitably leads to many validation efforts
with questionable criterion validity (Guion, 1976). This issue
has been addressed over the years and labeled the criterion
problem. However, except for James's (1973) exposition, little
in-depth analysis of the conceptual issues surrounding the crite-
rion problem has been advanced. Suffice it to say that in many,
if not most, validation situations, the validity of Inference 8 is
suspect, which in turn weakens conclusions about the validity
of other inferences in the system.
Inference 7 is likewise typically justified by the job analyst's
claim that from a specific job analysis, he or she has inferred the
requisite psychological constructs that underlie performance.
This process is commonly referred to as deriving job specifica-
tions. Job analysis data in the form of ability requirements
(Dunnette, 1976; Fleishman, 1982; Pearlman, 1980) are useful
for justifying this inference. Some important strides
have been made in establishing both theoretical and empirical
linkages between job behaviors and underlying attributes (e.g.,
Fine, 1986; Fine & Wiley, 1971; Fleishman, 1978, 1982; Lopez,
Kesselman, & Lopez, 1981; McCormick, 1976). However, in
practice, it is not uncommon for Inference 7 to be informally
justified by job analysts' judgments. Sole reliance on this induc-
tive approach (Bass & G. V. Barrett, 1981) means that the valid-
ity of Inference 7 is suspect whenever these judgments are not
based on current knowledge of construct-behavior relations
and sound reasoning about criterion development. Clearly,
Dunnette's (1976) call to link "the two worlds of behavioral
taxonomies" (p. 514) is still operative.
In addition, personnel specialists must adopt a broader view
of what qualifies as relevant empirical evidence for criterion
linkages. For example, a program designed to train critical job
skills, which is then evaluated by using job performance cri-
teria, provides criterion-related evidence for Inference 7. The
point here is that a wealth of empirical evidence supporting cri-
terion inferences might be more systematically derived from the
extant training literature. This is particularly relevant in those
cases in which training has altered psychological attributes gen-
erally regarded as enduring and less amenable to change. These
cases would be more relevant because selection programs are
more often designed to assess relatively stable constructs be-
cause of the perceived impracticality of trying to change them
through training.
Given that Inferences 7 and 8 may often be justified on tenu-
ous evidential bases, the model presented in Figure 3 leads logi-
cally to an intriguing conclusion regarding the relative superior-
ity of criterion-related versus construct-related evidence of va-
lidity for selection decisions. From this perspective, Inference
5 is a surrogate for the fundamental Inference 9, which links
predictor information to an applicant's true performance in the
organization. It is this inference for which validity evidence is
ultimately sought. Taking this logic one step further, to the ex-
tent that Inference 8 is questionable, empirical evidence of In-
ference 5 is not as relevant to Inference 9. Sound evidence of
Inferences 6 and 7 would provide a much more substantive jus-
tification of Inference 9, in this instance. Most selection special-
ists would find it rather easy to recall situations in which their
confidence in Inference 5 (as an index of Inference 9) was se-
verely weakened by information about the deficiency or con-
tamination of a poorly developed criterion measure. Yet, in the
same situation, use of an established assessment instrument, in
combination with a rigorous rationale for why performance re-
quires certain psychological constructs, would provide a firmer
evidential basis on which to conclude the validity of resulting
decisions. This is a case in which construct evidence of validity
is superior to criterion-related evidence. Although this has been
suggested by others, we contend that the lack of critical analysis
of Inferences 5 and 8, which characterizes most validation re-
search, has caused a dramatic underestimate of the frequency
with which construct-related evidence is judged superior to cri-
terion-related evidence.
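The argument above can be illustrated with a small numerical sketch. It is not part of the framework itself; it assumes a simple linear model in which the criterion measure correlates with true performance at some value (standing in for Inference 8) and the criterion's irrelevant variance is unrelated to the predictor. Under that assumption, the observed predictor-criterion correlation (Inference 5) equals the predictor-performance correlation (Inference 9) multiplied by the criterion-performance correlation. The function names and all numbers are hypothetical.

```python
def observed_validity(r_predictor_performance, r_criterion_performance):
    """Predictor-criterion correlation (Inference 5) implied by the model.

    Assumes the criterion measure's irrelevant variance is uncorrelated
    with the predictor, so the two correlations simply multiply.
    """
    for r in (r_predictor_performance, r_criterion_performance):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    return r_predictor_performance * r_criterion_performance


def implied_performance_validity(r_observed, r_criterion_performance):
    """Invert the same model: the Inference 9 value implied by Inference 5."""
    if r_criterion_performance == 0:
        raise ValueError("criterion carries no performance information")
    return r_observed / r_criterion_performance


if __name__ == "__main__":
    r_inference_9 = 0.50  # hypothetical predictor-performance correlation
    for r_inference_8 in (0.9, 0.6, 0.3):  # hypothetical criterion quality
        r_inference_5 = observed_validity(r_inference_9, r_inference_8)
        print(f"Inference 8 = {r_inference_8:.1f} -> "
              f"Inference 5 = {r_inference_5:.2f}")
```

In this sketch the same predictor looks progressively less valid as the criterion measure degrades, even though its relation to true performance never changes. If contamination were correlated with the predictor, the observed coefficient could instead overstate Inference 9; the sketch does not model that case.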
In the next section, the personnel selection framework pre-
sented in Figure 3 is broadened. Recommendations for future
conceptions of the validation process are then discussed in this
context. It is important to remember that although terminology
will change somewhat and different inferences will be empha-
sized, the process is essentially one of traditional construct vali-
dation. It should become clear from this perspective that the
science of psychology as applied to personnel decision making
involves the development of theories, validation of constructs,
and generation of evidence to support important inferences
about people and their behavior at work.
The Psychological Science of Personnel
Decision Making
Our contention is that the validation process discussed thus
far, if adequately adapted to the unique needs of personnel se-
lection, provides a broader framework for expanding concep-
tions of validity. Thus far, however, our discussion of critical
inferential linkages has resulted in a focus on a more narrow
conception of the validation process than is ultimately desir-
able. We now present a broader view of the nomological frame-
work relevant for developing theory within personnel psychol-
ogy. This framework is schematically presented in Figure 4. The
left side of Figure 4 represents the more traditional notion of
construct validity described by Cronbach and Meehl (1955).
The center of Figure 4 represents the focus of greatest interest
to applied decision makers. Inferences 5, 6, 7, and 8 are of par-
ticular concern because of their direct relevance for justifying
Inference 9. That is, they are relevant for determining the extent
to which inferences from scores on a test of some predictor con-
struct allow predictions of actual job behavior.
Inference 9 is of utmost importance to applied decision mak-
ers. Empirical evidence of Inference 5 provides partial support
of Inference 9 and can be conceived of as a special case of Infer-
ence 13. Evidence supporting Inference 5 can be direct and take
the form of empirically observed relationships. Messick (1980),
when discussing construct-related validity, stated that "some of
the construct's nomological relations thus become criterial
when made specific to the applied setting" (p. 1019). He added
that these predictive relationships are singled out for special
attention under the rubric of criterion-related validity and differ
from general nomological relations in being more narrowly
focused on specific sets of data and specific applied settings.
Similarly, Cook and D. T. Campbell (1979) explicitly stated that
priorities regarding validity issues are fundamentally different
between theoretical and applied researchers. From this perspec-
tive, it can be seen that criterion-related validity evidence is ap-
propriately viewed as a special type of convergent (or discrimi-
nant) evidence of construct-related validity. One can also gener-
ate various indexes of content overlap to support Inference 9.
Evidence of Inference 9 can also be indirect and take the form
of convergent and discriminant relationships between the com-
ponents linked by Inferences 6, 7, and 8. Extending this logic
beyond the original focus, Inference 6 is strengthened by evi-
dence of Inferences 11, 12, 13, and so forth.
The right side of Figure 4 represents construct-related validity
of the criterion. As mentioned earlier, performance domains
have traditionally not been as deeply embedded in networks of
theoretical relationships as constructs on the predictor side.
Theorists and practitioners need to be increasingly aware of the
need to empirically investigate linkages of Inferences 5, 16, 17,
and 18 as evidence to support Inferences 7, 8, 14, and 15 (Vance,
MacCallum, Coovert, & Hedge, 1988).
Traditionally, the focus has been almost exclusively on Infer-
ence 5 through correlations between test scores and a criterion
measure, occasionally on Inference 17 through development of
alternative criterion measures (e.g., Alexander & Wilkins, 1982;
Cascio & Valenzi, 1978; Holzbach, 1978; Lee, Malone, &
Greco, 1981), and on Inference 8 through assessment of rating
accuracy by "true scores" of performance domains (e.g., Ber-
nardin & Buckley, 1981; Borman, 1979; Hedge & Kavanagh,
1988). Recently, Heneman (1986), in a comparison of supervi-
sory ratings and results-oriented performance indexes, called
for greater emphasis on convergent evidence of criterion valid-
ity. Similarly, James (1973) called for more emphasis on the
three levels of criterion measurement proposed by J. P. Camp-
bell et al. (1970), namely, job behaviors, results, and organiza-
tional outcomes (Smith, 1976).
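As a rough illustration of what convergent evidence among alternative criterion measures might look like in practice, the sketch below computes pairwise Pearson correlations among three measures of the same performance domain. The measure names and scores are hypothetical, chosen only to mirror the three levels of criterion measurement mentioned above; this is a minimal sketch, not a recommended analysis plan.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant measure")
    return sxy / sqrt(sxx * syy)


def convergence_table(measures):
    """Correlate every pair of criterion measures (dict of name -> scores)."""
    return {
        (a, b): pearson(measures[a], measures[b])
        for a, b in combinations(measures, 2)
    }


if __name__ == "__main__":
    # Hypothetical scores for six employees on three criterion measures.
    criteria = {
        "supervisory_rating": [3.0, 4.0, 2.5, 5.0, 3.5, 4.5],
        "results_index": [110, 130, 95, 150, 120, 125],
        "organizational_outcome": [0.8, 1.1, 0.9, 1.4, 1.0, 1.2],
    }
    for pair, r in convergence_table(criteria).items():
        print(pair, round(r, 2))
```

High correlations among measures intended to tap the same performance domain would count as convergent evidence; a fuller multitrait-multimethod design would also require discriminant comparisons, which this sketch omits.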
A Social-Cognitive Perspective on Job Analysis and
Criterion Development
We join others in calling for renewed interest in more rigor-
ous, conceptually coherent criterion development. One impor-
tant issue is that job analysis efforts need to be directed more
at capturing the reality of the organizational context in which
criterion judgments are actually made (Feldman, 1986; Stern,
Stein, & Bloom, 1956; Wiggins, 1973). Our contention is that
typical job analyses produce information that is useful for de-
veloping explicit performance criteria, yet is potentially irrele-
vant to the implicit criteria that often may be used to evaluate
day-to-day performance or promotability (Turnage & Muchin-
sky, 1984). To the extent that the validity of Inference 10 is ques-
tionable, all other inferences in the system are questionable.
Figure 4. An elaborated model for personnel decision research. (Figure not reproduced; the only surviving panel label is "Underlying Psychological Construct Domain.")
Increased concern for the validity of Inference 10 has motivated considerable research in recent years on a variety of social and cognitive factors affecting job analysis data (Arvey, Davis, McGowen, & Dipboye, 1982; Cornelius, DeNisi, & Blencoe, 1984; DeNisi, Cornelius, & Blencoe, 1987; Friedman & Harvey, 1986; Green & Stutzman, 1986; J. E. Smith & Hakel, 1979). The validity of both Inferences 7 and 8 is dependent on job analytic data used to translate performance behavior into measurable criterion elements and to delineate overlap with predictor constructs. Better understanding the judgments that underlie perceptions of jobs and performance may contribute to improved criterion development (Guion, 1986). In other words, the basic cognitive processes that underlie perceptions of people may also underlie the perceptions of jobs, and therefore the vast research on person perceptions can be integrated and generalized to enhance our understanding of the determinants of job perceptions (e.g., Binning, Zaba, & Whattam, 1986; Cantor & Mischel, 1979; Cleveland & Landy, 1983; Cooper, 1981; Feldman, 1981, 1986; Funder, 1987; Lord, 1985a, 1985b; Swann, 1984).
A particularly integrative approach is exemplified by Cantor, Mischel, and Schwartz's (1982) prototype assessment of psychological situations. They assessed people's prototypical beliefs about person-action combinations in common situations and found considerable consistency across people. This approach could be adapted to the study of job prototypes and their effects on job analysts' perceptions of jobs as well as incumbent performance. Similarly, a considerable amount of job design research exists showing that social information affects perceptions of task characteristics (Salancik & Pfeffer, 1978; Thomas & Griffin, 1983; Weiss & Shaw, 1979).
At a more molar level, criterion development is largely a sociopolitical process and therefore deserves greater attention from this perspective (e.g., Katz & Kahn, 1966; Longenecker, Sims, & Gioia, 1987; Mitchell & Linden, 1982; Weick, 1979). Programs designed to train raters of performance could also benefit from the integration of research described earlier (Bernardin & Buckley, 1981; Borman, 1979; Landy & Farr, 1980).
Expanding on the framework presented, Inference 16 might involve relating specific performance criteria to measures of nonjob behaviors that are theoretically expected to relate to performance behaviors in some specified way (e.g., Blau, 1985; Rousseau, 1978; Youngblood, 1984). Youngblood's (1984) study of work and nonwork explanations for absenteeism might exemplify this approach. He found that absenteeism could be explained by the importance of leisure activities engaged in away from the work setting. In a similar vein, perhaps successful managers are more proficient at organizing successful family vacations than are their less competent organizational counterparts. The point we are making here is that the delineation of behavioral domains can be conceptualized by reference to theory surrounding psychological constructs (predictor side) or "theory" surrounding job performance (criterion side). Meaningful behavioral regularities may be discovered by investigating relationships between either type of behavioral domain.
Methods used by personality and social psychologists could be adapted for the study of these criterion relationships. For instance, Mischel (1984), Funder (1987), and Funder and Colvin (1988) reviewed studies relating both lay perceptions and objective measures of personality to independent measures of behavior gathered from peers and family members. Relationships between work and nonwork behavior could be investigated in an analogous manner. Similarly, the logic of biographical data could be adapted to criterion research. Although biodata instruments are typically used for prediction purposes, data about nonwork behavior could be collected using an analogous questionnaire format. For example, in attempting to verify the accuracy of biographical information, Shaffer, Saunders, &
Owens (1986) compared respondents' and parents' descrip-
tions. Extending this to criterion validation, family and peer
descriptions of nonwork behavior could be compared with
performance behavior. We call attention to Weick's (1979)
contention that "events inside organizations resemble events outside
organizations; sensitivities of the worker inside are continuous
with sensitivities of the worker outside" (p. 31).
In the present framework, the systematic discovery of replica-
ble relationships between measures of performance and non-
work behavior (or reactions) would strengthen the construct-
related validity of performance measurement. In a similar vein,
multivariate studies relating alternative performance criteria to
alternative predictors (Inference 18) are also desirable. We have
now come logically full circle. Figure 4 should eventually be-
come spherical in shape as the nomological network linking
psychological constructs and performance domains becomes
more fully articulated and interwoven.
Another Call for Experimenting Organizations
Perhaps the greatest advancement for the science of person-
nel psychology will come only when the values driving organiza-
tional administrators' decisions about behavioral science re-
search are changed. For many reasons, the behavioral sciences
have what Staw (1977) described as a "center-periphery" rela-
tionship to the administrative users of scientific knowledge. He
contends that new knowledge is created by researchers "who
are presumably at the center of knowledge" (p. 426) and it is
considered their responsibility to disseminate expertise to or-
ganizational users in a prepackaged, "formula-like" fashion.
Failures of behavioral science interventions are thus more likely
to be attributed to deficiencies in science and knowledge rather
than inappropriate expectations for generalizability. Staw
(1977) envisioned a much healthier relationship in which the
seat of innovation is at the local organizational level. This shift
in values reorients the role of behavioral science research so
that it lies in the periphery as a resource to guide organizational
experimentation. Concomitant with this reorientation is a shift
in the educational role of behavioral science and the manner
in which knowledge is disseminated. Rather than persuading
practitioners to adopt a particular theory or planned interven-
tion, psychologists' efforts should be directed more toward sell-
ing the benefits of experimenting organizations, where ongoing,
systemwide, multivariate research is made an integral part of
organizational functioning. Consequently, the psychologist's
role would increasingly involve training practitioners in re-
search evaluation skills.
A concomitant shift from summative to formative evaluation
(Staw, 1977) is also desirable. The process of inferring whether
a specific program or intervention has worked or has had a posi-
tive effect is referred to as summative evaluation. A more itera-
tive and ongoing process of selecting program goals and build-
ing organizational interventions is referred to as formative
evaluation. Formative evaluation implies the successive approx-
imation of desired organizational systems, built through a series
of trials in which failures are considered as informative as suc-
cesses. In this new research context, the term failure merely
implies some unpredicted outcome or result, equally useful for
refinement in the next stage of program development. It is nec-
essary to change typical organizational values so that systems
can be developed to effectively monitor, provide feedback, and
utilize negative as well as positive data.
The creation of experimenting organizations could have vast
implications for personnel selection research. Greater emphasis
would be placed on large-scale, programmatic research involv-
ing the melding of laboratory and field settings (G. V. Barrett,
1972; Flanagan & Dipboye, 1981). This could lead not only to
richer, more efficient theory development but also better under-
standing of longitudinal changes in employee and job character-
istics. For instance, Helmreich, Sawin, and Carsrud (1986)
demonstrated the effects of predictor-criterion time lags on the
predictive power of personality characteristics. They argued
that personality traits have their most potent effects on job per-
formance only after considerable time on a job. Longitudinal
data gathering is important for criterion measurement as well.
Meyer (1987) presented data suggesting that cognitive ability
tests are more predictive of managerial promotional progress
over time than of supervisory ratings of performance at a given
point in time. The same general issue can be raised regarding
social influence effects that result from any organizational inter-
vention. Administrators' preoccupation with one-shot, short-
term identification of successful selection procedures has most
likely masked many useful approaches to employee selection.
Experimenting organizations also would create an environ-
ment in which macro and micro issues could be more systemat-
ically integrated. For example, little work has been done to cre-
ate contingency models relating organizational structures with
job design and criterion development processes. Yet, Mintzberg
(1983) describes in detail how organizations' structural param-
eters affect individual-level control mechanisms and job charac-
teristics. Some organizational structures (e.g., machine bureau-
cracies) contain jobs that are more amenable to the multiple
criterion model. The primary coordination mechanism in ma-
chine bureaucracies is standardization of work processes. This
is possible because jobs are highly routine, and behavior-out-
come links are explicitly understood and programmed. In other
types of organizations (e.g., professional bureaucracies, adhoc-
racies) the composite criterion model may be more appropriate.
In these types of organizations (or parts of organizations) where
work processes are more complex and less programmable, co-
ordination is achieved through other mechanisms such as stan-
dardization of work output, necessitating a performance do-
main that comprises outcomes rather than behaviors. In still
other types of jobs such as higher level managerial jobs, neither
work processes nor work outputs are specifiable a priori (J. P.
Campbell, 1983; Feldman, 1986; Palermo, 1983). In these in-
stances, coordination is achieved primarily through standard-
ization of input skills and knowledge. In these jobs, an individu-
al's worth to the organization might be more fruitfully indexed
by assessment of changes in job-related knowledge. This logic
underlies trait-based and competency-based assessment (Sokol
& Oresick, 1986). We note that it is conceptually flawed to refer
to trait-based performance appraisal, because job performance
is not being appraised. Personal characteristics are being as-
sessed with the assumption that they will reflect an individual's
worth to the organization. Rather than assessing an individual's
contribution to organizational goal attainment, the potential of
such attainment is assessed by using the same logic as personnel
selection.
Micro-macro integration is desirable on the predictor side as
well. A basic tenet of Schneider's (1987) attraction-selection-
attrition framework is that macro-organizational structure
differences can be best understood through types or profiles of
individual employee characteristics. We believe that the cre-
ation of experimenting organizations will ultimately do more
to enhance our movement from test validation to selection re-
search (Guion, 1976), making it more likely that dynamic, mul-
tivariate relationships can be fruitfully understood and used to
enhance the quality of staffing decisions and ultimate organiza-
tional effectiveness.
References
Alexander, E. R., & Wilkins, R. D. (1982). Performance rating validity:
The relationship of objective and subjective measures of
performance. Group and Organization Studies, 7, 485-496.
American Psychological Association. (1985). Standards for educational
and psychological testing. Washington, DC: Author.
American Psychological Association, American Educational Research
Association, & National Council of Measurement Used in Education
(joint committee). (1954). Technical recommendations for psycho-
logical tests and diagnostic techniques. Psychological Bulletin, 51,
201-238.
Anderson, N. H., & Shanteau, J. (1977). Weak inference with linear
models. Psychological Bulletin, 84, 1155-1170.
Arvey, R. D., Davis, G. A., McGowen, S. L., & Dipboye, R. L. (1982).
Potential sources of bias in job analytic processes. Academy of Man-
agement Journal, 25, 618-629.
Banks, C. G., & Roberson, L. (1985). Performance appraisers as test
developers. Academy of Management Review, 10, 128-142.
Barrett, G. V. (1972). Symposium: Research models of the future for
industrial and organizational psychology. Personnel Psychology, 25,
1-17.
Barrett, G. V., Alexander, R. A., O'Connor, E. J., & Forbes, J. B. (1978).
Values and professional judgment in validating and litigating tests for
civil service positions. Professional Psychology, 9, 137-144.
Barrett, G. V., Caldwell, M. S., & Alexander, R. A. (1985). The concept
of dynamic criteria: A critical reanalysis. Personnel Psychology, 38,
41-56.
Barrett, G. V., & Kernan, M. C. (1987). Performance appraisal and ter-
minations: A review of court decisions since Brito v. Zia with implica-
tions for personnel practices. Personnel Psychology, 40, 489-504.
Barrett, G. V., Phillips, J. S., & Alexander, R. A. (1981). Concurrent and
predictive validity designs: A critical reanalysis. Journal of Applied
Psychology, 66, 1-6.
Barrett, R. S. (1980). Is the test content-valid: Or, does it really measure
a construct? Employee Relations Law Journal, 6, 459-475.
Bass, B. M., & Barrett, G. V. (1981). People, work, and organizations
(2nd ed.). Boston: Allyn & Bacon.
Bernardin, H. J., & Buckley, M. R. (1981). A consideration of strategies
in rater training. Academy of Management Review, 6, 205-212.
Bernardin, H. J., & Smith, P. C. (1981). Clarification of some issues
regarding the development and use of behaviorally anchored rating
scales (BARS). Journal of Applied Psychology, 66, 458-463.
Binning, J. F., Zaba, A. J., & Whattam, J. C. (1986). Explaining the
biasing effects of performance cues in terms of cognitive categoriza-
tion. Academy of Management Journal, 29, 521-535.
Blau, G. J. (1985). Relationship of extrinsic, intrinsic, and demographic
predictors to various types of withdrawal behaviors. Journal of
Applied Psychology, 70, 442-450.
Borman, W. C. (1979). Format and training effects on rating accuracy
and rating errors. Journal of Applied Psychology, 64, 410-421.
Borman, W. C. (1983). Implications of personality theory and research
for the rating of work performance in organizations. In F. Landy, S.
Zedeck, & J. Cleveland (Eds.), Performance measurement and theory
(pp. 127-172). Hillsdale, NJ: Erlbaum.
Brogden, H. E., & Taylor, E. K. (1950). The dollar criterion. Applying
the cost accounting concept to criterion construction. Personnel
Psychology, 3, 133-167.
Burke, M. J. (1984). Validity generalization: A review and critique of
the correlational model. Personnel Psychology, 37, 93-116.
Callender, J. C., & Osburn, H. G. (1980). Development and test of a
new model of validity generalization. Journal of Applied Psychology,
65, 664-670.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant
validation by the multitrait-multimethod matrix. Psychological Bul-
letin, 56, 81-105.
Campbell, J. P. (1983). Some possible implications of "modeling" for
the conceptualization of measurement. In F. Landy, S. Zedeck, & J.
Cleveland (Eds.), Performance measurement and theory (pp. 277-
298). Hillsdale, NJ: Erlbaum.
Campbell, J. P., Dunnette, M. D., Lawler, E. E., & Weick, K. E. (1970).
Managerial behavior, performance, and effectiveness. New York: Mc-
Graw-Hill.
Campion, M. A., & Thayer, P. W. (1985). Development and field evalua-
tion of an interdisciplinary measure of job design. Journal of Applied
Psychology, 70, 29-43.
Cantor, N., & Mischel, W. (1979). Prototypes in person perception. In
L. Berkowitz (Ed.), Advances in experimental social psychology (Vol.
12, pp. 3-52). New York: Academic Press.
Cantor, N., Mischel, W., & Schwartz, J. C. (1982). A prototype analysis
of psychological situations. Cognitive Psychology, 14, 45-77.
Carroll, S. J., & Schneier, C. E. (1982). Performance appraisal and re-
view systems. Glenview, IL: Scott, Foresman.
Cascio, W. F. (1987). Applied psychology in personnel management (3rd
ed.). Englewood Cliffs, NJ: Prentice-Hall.
Cascio, W. F., & Valenzi, E. R. (1978). Relations among criteria of police
performance. Journal of Applied Psychology, 63, 22-28.
Cleveland, J. N., & Landy, F. J. (1983). The effects of person and job
stereotypes on two personnel decisions. Journal of Applied Psychol-
ogy, 68, 609-619.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design
& analysis issues for field settings. Chicago: Rand McNally.
Cooper, W. H. (1981). Ubiquitous halo. Psychological Bulletin, 90,
218-244.
Cornelius, E. T., III, DeNisi, A. S., & Blencoe, A. G. (1984). Expert and
naive raters using the PAQ: Does it matter? Personnel Psychology, 37,
453-464.
Cronbach, L. J. (1970). Essentials of psychological testing (3rd ed.).
New York: Harper & Row.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychologi-
cal tests. Psychological Bulletin, 52, 281-302.
Cummings, L. L., & Schwab, D. P. (1973). Performance in organiza-
tions: Determinants & appraisal. Glenview, IL: Scott, Foresman.
DeNisi, A. S., Cornelius, E. T., III, & Blencoe, A. G. (1987). Further
investigation of common knowledge effects on job analysis ratings.
Journal of Applied Psychology, 72, 262-268.
Dickinson, T. L. (1987). Designs for evaluating the validity and accu-
racy of performance ratings. Organizational Behavior and Human
Performance, 40, 1-21.
Doran, R. (1987). How to examine construct validity of item banks. Quality and Quantity, 21, 139-151.
Drasgow, F., & Miller, H. E. (1982). Psychometric and substantive issues in scale construction and validation. Journal of Applied Psychology, 67, 268-279.
Dunnette, M. D. (1963). A note on the criterion. Journal of Applied Psychology, 47, 251-254.
Dunnette, M. D. (1976). Aptitudes, abilities, and skills. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 473-520). Chicago: Rand McNally.
Dunnette, M. D., & Borman, W. C. (1979). Personnel selection and classification. Annual Review of Psychology, 30, 477-525.
Ebel, R. L. (1961). Must all tests be valid? American Psychologist, 16, 640-647.
Ebel, R. L. (1977). Comments on some problems of employment testing. Personnel Psychology, 30, 55-63.
Faley, R. H., & Sundstrom, E. (1985). Content representativeness: An empirical method of evaluation. Journal of Applied Psychology, 70, 567-571.
Feldman, J. M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66, 127-148.
Feldman, J. M. (1986). Instrumentation and training for performance appraisal: A perceptual-cognitive viewpoint. In K. M. Rowland & G. R. Ferris (Eds.), Research in personnel and human resources management (pp. 45-99). Greenwich, CT: JAI Press.
Fine, S. A. (1986). Job analysis. In R. A. Berk (Ed.), Performance assessment: Methods and applications (pp. 53-81). Baltimore, MD: Johns Hopkins University Press.
Fine, S. A., & Wiley, W. W. (1971). An introduction to functional job analysis. Kalamazoo, MI: W. E. Upjohn Institute for Employment Research.
Fiske, D. W. (1951). Values, theory, and the criterion problem. Personnel Psychology, 4, 93-98.
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327-358.
Flanagan, M. F., & Dipboye, R. L. (1981). Research settings in industrial and organizational psychology: Facts, fallacies, and the future. Personnel Psychology, 34, 37-47.
Fleishman, E. A. (1975). Toward a taxonomy of human performance. American Psychologist, 30, 1127-1149.
Fleishman, E. A. (1978). Relating individual differences to the dimensions of human tasks. Ergonomics, 21, 1007-1019.
Fleishman, E. A. (1982). Systems for describing human tasks. American Psychologist, 37, 821-834.
Fleishman, E. A., & Quaintance, M. K. (1984). The description of human tasks. New York: Academic Press.
Frederiksen, N. (1986). Construct validity and construct similarity: Methods for use in test development and test validation. Multivariate Behavioral Research, 21, 3-28.
Friedman, L., & Harvey, R. J. (1986). Can raters with reduced job descriptive information provide accurate Position Analysis Questionnaire (PAQ) ratings? Personnel Psychology, 39, 779-789.
Funder, D. C. (1987). Errors and mistakes: Evaluating the accuracy of social judgment. Psychological Bulletin, 101, 75-90.
Funder, D. C., & Colvin, C. R. (1988). Friends and strangers: Acquaintanceship, agreement, and the accuracy of personality judgment. Journal of Personality and Social Psychology, 55, 149-158.
Ghiselli, E. E. (1956). Dimensional problems of criteria. Journal of Applied Psychology, 40, 374-377.
Green, S. B., & Stutzman, T. (1986). An evaluation of methods to select respondents to structured job-analysis questionnaires. Personnel Psychology, 39, 543-564.
Griffin, R. W. (1982). Task design: An integrative approach. Glenview, IL: Scott, Foresman.
Guilford, J. P. (1967). The nature of human intelligence. New York: McGraw-Hill.
Guion, R. M. (1961). Criterion measurement and personnel judgments. Personnel Psychology, 14, 141-149.
Guion, R. M. (1965). Personnel testing. New York: McGraw-Hill.
Guion, R. M. (1974). Open a new window: Validities and values in psychological measurement. American Psychologist, 29, 287-296.
Guion, R. M. (1976). Recruiting, selection, and job placement. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 777-828). Chicago: Rand McNally.
Guion, R. M. (1977). Content validity, the source of my discontent. Applied Psychological Measurement, 1, 1-10.
Guion, R. M. (1978). Content validity in moderation. Personnel Psychology, 31, 205-214.
Guion, R. M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11, 385-398.
Guion, R. M. (1986). Personnel evaluation. In R. A. Berk (Ed.), Performance assessment: Methods and applications (pp. 345-375). Baltimore, MD: Johns Hopkins University Press.
Guion, R. M. (1987). Changing views for personnel selection research. Personnel Psychology, 40, 199-213.
Guion, R. M., & Cranny, C. J. (1982). A note on concurrent and predictive validity designs: A critical reanalysis. Journal of Applied Psychology, 67, 239-244.
Gutenberg, R. L., Arvey, R. D., Osburn, H. G., & Jeanneret, P. R. (1983). Moderating effects of decision-making/information-processing job dimensions on test validities. Journal of Applied Psychology, 68, 602-608.
Hamilton, J. W. (1981). Options for small sample sizes in validation: A case for the J-coefficient. Personnel Psychology, 34, 805-816.
Harvey, R. J. (1986). Quantitative approaches to job classification: A review and critique. Personnel Psychology, 39, 267-289.
Hedge, J. W., & Kavanagh, M. J. (1988). Improving the accuracy of performance evaluations: Comparison of three methods of performance appraiser training. Journal of Applied Psychology, 73, 68-73.
Helmreich, R. L., Sawin, L. L., & Carsrud, A. L. (1986). The honeymoon effect in job performance: Temporal increases in the predictive power of achievement motivation. Journal of Applied Psychology, 71, 185-188.
Heneman, R. L. (1986). The relationship between supervisory ratings and results-oriented measures of performance: A meta-analysis. Personnel Psychology, 39, 811-826.
Hirsch, H. R., Schmidt, F. L., & Hunter, J. E. (1986). Estimation of employment validities by less experienced judges. Personnel Psychology, 39, 337-344.
Holzbach, R. L. (1978). Rater bias in performance ratings: Superior-, self-, and peer ratings. Journal of Applied Psychology, 63, 579-588.
James, L. R. (1973). Criterion models and construct validity for criteria. Psychological Bulletin, 80, 75-83.
James, L. R., Demaree, R. G., & Mulaik, S. A. (1986). A note on validity generalization procedures. Journal of Applied Psychology, 71, 440-450.
Kane, M. T. (1982). A sampling model for validity. Applied Psychological Measurement, 6, 125-160.
Kane, J. S. (1986). Performance distribution assessment. In R. A. Berk (Ed.), Performance assessment: Methods and applications (pp. 237-273). Baltimore, MD: Johns Hopkins University Press.
Katz, D., & Kahn, R. L. (1966). The social psychology of organizations. New York: Wiley.
Kenrick, D. T., & Funder, D. C. (1988). Profiting from controversy: Lessons from the person-situation debate. American Psychologist, 43, 23-34.
Kleiman, L. S., & Durham, R. L. (1981). Performance appraisal, promotion and the courts: A critical review. Personnel Psychology, 34, 103-121.
Kleiman, L. S., & Faley, R. H. (1978). Assessing content validity: Standards set by the courts. Personnel Psychology, 31, 701-713.
Klimoski, R., & Brickner, M. (1987). Why do assessment centers work? The puzzle of assessment center validity. Personnel Psychology, 40, 243-260.
Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist, 41, 1183-1192.
Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72-107.
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.
Lawshe, C. H. (1985). Inferences from personnel tests and their validity. Journal of Applied Psychology, 70, 237-238.
Lee, R., Malone, M., & Greco, S. (1981). Multitrait-multimethod-multirater analysis of performance ratings for law enforcement personnel. Journal of Applied Psychology, 66, 625-632.
Loevinger, J. (1957). Objective tests as instruments of psychological theory [Monograph No. 9]. Psychological Reports, 3, 635-694.
Longenecker, C. O., Sims, H. P., Jr., & Gioia, D. A. (1987). Behind the mask: The politics of employee appraisal. Academy of Management Executive, 1, 183-193.
Lopez, F. M., Kesselman, G. A., & Lopez, F. E. (1981). An empirical test of a trait-oriented job analysis technique. Personnel Psychology, 34, 479-502.
Lord, R. G. (1985a). Social information processing and behavioral measurement: Application to leadership measurement. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior (Vol. 7, pp. 87-128). Greenwich, CT: JAI Press.
Lord, R. G. (1985b). Accuracy in behavioral measurement: An alternative definition based on raters' cognitive schema and signal detection theory. Journal of Applied Psychology, 70, 66-71.
Margenau, H. (1950). The nature of physical reality. New York: McGraw-Hill.
McCormick, E. J. (1976). Job and task analysis. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 651-696). Chicago: Rand McNally.
Messick, S. (1975). Meaning and values in measurement and evaluation. American Psychologist, 30, 1012-1027.
Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012-1027.
Messick, S. (1981). Constructs and their vicissitudes in educational and psychological measurement. Psychological Bulletin, 89, 575-588.
Meyer, H. H. (1987). Predicting supervisory ratings versus promotional progress in test validation studies. Journal of Applied Psychology, 72, 696-697.
Mintzberg, H. (1983). Structure in fives: Designing effective organizations. Englewood Cliffs, NJ: Prentice-Hall.
Mischel, W. (1973). Toward a cognitive social learning reconceptualization of personality. Psychological Review, 80, 252-283.
Mischel, W. (1984). Convergences and challenges in search for consistency. American Psychologist, 39, 351-364.
Mitchell, T. R., & Linden, R. C. (1982). The effects of the social context on performance evaluations. Organizational Behavior and Human Performance, 29, 241-256.
Monahan, C. J., & Muchinsky, P. M. (1983). Three decades of personnel selection research: A state-of-the-art analysis and evaluation. Journal of Occupational Psychology, 56, 215-225.
Muchinsky, P. M. (1987). Psychology applied to work (2nd ed.). Chicago: Dorsey Press.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
O'Grady, K. E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin, 92, 766-777.
Palermo, D. S. (1983). Cognition, concepts, and an employee's theory of the world. In F. Landy, S. Zedeck, & J. Cleveland (Eds.), Performance measurement and theory (pp. 97-115). Hillsdale, NJ: Erlbaum.
Pearlman, K. (1980). Job families: A review and discussion of their implications for personnel selection. Psychological Bulletin, 87, 1-27.
Rezmovic, E. L., & Rezmovic, V. (1981). A confirmatory factor analysis approach to construct validation. Educational and Psychological Measurement, 41, 61-72.
Rousseau, D. M. (1978). Relationship of work to nonwork. Journal of Applied Psychology, 63, 513-517.
Rushton, J. P., Brainerd, C. J., & Pressley, M. (1983). Behavioral development and construct validity: The principle of aggregation. Psychological Bulletin, 94, 18-38.
Saal, F. E., & Knight, P. A. (1988). Industrial/organizational psychology: Science and practice. Pacific Grove, CA: Brooks/Cole.
Salancik, G. R., & Pfeffer, J. (1978). A social information processing approach to job attitudes and task design. Administrative Science Quarterly, 23, 224-252.
Sawyer, J. (1966). Measurement and prediction, clinical and statistical. Psychological Bulletin, 66, 178-200.
Schmidt, F. L., & Hunter, J. E. (1981). Employment testing: Old theories and new research. American Psychologist, 36, 1128-1137.
Schmidt, F. L., Hunter, J. E., Croll, P. R., & McKenzie, R. C. (1983). Estimation of employment test validities by expert judgment. Journal of Applied Psychology, 68, 590-601.
Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in criterion-related validation studies. Journal of Applied Psychology, 61, 473-485.
Schmidt, F. L., & Kaplan, L. B. (1971). Composite vs. multiple criteria: A review and resolution of the controversy. Personnel Psychology, 24, 419-434.
Schmitt, N., & Noe, R. A. (1986). On shifting standards for conclusions regarding validity generalization. Personnel Psychology, 39, 849-851.
Schmitt, N., & Ostroff, C. (1986). Operationalizing the "behavioral consistency" approach: Selection test development based on a content-oriented strategy. Personnel Psychology, 39, 91-108.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analysis of validity studies published between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37, 407-422.
Schneider, B. (1987). The people make the place. Personnel Psychology, 40, 437-454.
Schwab, D. P. (1980). Construct validity in organizational behavior. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior (Vol. 2, pp. 3-44). Greenwich, CT: JAI Press.
Shaffer, G. S., Saunders, V., & Owens, W. A. (1986). Additional evidence for the accuracy of biographical data: Long-term retest and observer ratings. Personnel Psychology, 39, 791-809.
Smith, J. E., & Hakel, M. D. (1979). Convergence among data sources, response bias, and reliability and validity of a structured job analysis questionnaire. Personnel Psychology, 32, 677-692.
Smith, P. C. (1976). Behaviors, results, and organizational effectiveness: The problem of criteria. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 745-776). Chicago: Rand McNally.
Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149-155.
Society for Industrial and Organizational Psychology. (1987). Principles for the validation and use of personnel selection procedures (3rd ed.). Washington, DC: Author.
Sokol, M., & Oresick, R. (1986). Managerial performance appraisal. In R. A. Berk (Ed.), Performance assessment: Methods & applications. Baltimore, MD: Johns Hopkins University Press.
Staw, B. M. (1977). The experimenting organization: Problems and prospects. In B. M. Staw (Ed.), Psychological foundations of organizational behavior (2nd ed.; pp. 421-437). Santa Monica, CA: Goodyear.
Stern, G. G., Stein, M. I., & Bloom, B. S. (1956). Methods in personality assessment. Glencoe, IL: Free Press.
Sussmann, M., & Robertson, D. U. (1986). The validity of validity: An analysis of validation study designs. Journal of Applied Psychology, 71, 461-468.
Swann, W. B. (1984). Quest for accuracy in person perception: A matter of pragmatics. Psychological Review, 91, 457-477.
Tenopyr, M. L. (1977). Content-construct confusion. Personnel Psychology, 30, 47-54.
Tenopyr, M. L. (1986). Needed directions for measurement in work settings. In B. S. Plake & J. C. Witt (Eds.), The future of testing (pp. 269-288). Hillsdale, NJ: Erlbaum.
Tenopyr, M. L., & Oeltjen, P. D. (1982). Personnel selection and classification. Annual Review of Psychology, 33, 581-618.
Thomas, J., & Griffin, R. (1983). The social information processing model of task design: A review of the literature. Academy of Management Review, 8, 672-682.
Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. New York: Wiley.
Thurstone, L. L. (1938). Primary mental abilities. Psychometric Monographs, No. 4. Chicago: University of Chicago Press.
Trattner, M. H. (1982). Synthetic validity and its application to the Uniform Guidelines validation requirements. Personnel Psychology, 35, 383-397.
Turnage, J. L., & Muchinsky, P. M. (1984). A comparison of the predictive validity of assessment center evaluations versus traditional measures in forecasting supervisory job performance: Interpretive implications of criterion distortion for the assessment paradigm. Journal of Applied Psychology, 69, 595-602.
Uniform Guidelines on Employee Selection Procedures. 43 Federal Register 38290-38315 (1978).
Vance, R. J., MacCallum, R. C., Coovert, M. D., & Hedge, J. W. (1988). Construct validity of multiple job performance measures using confirmatory factor analysis. Journal of Applied Psychology, 73, 74-80.
Weick, K. E. (1979). The social psychology of organizing. New York: Random House.
Weiss, H. M., & Shaw, J. B. (1979). Social influences on judgments about tasks. Organizational Behavior and Human Performance, 24, 126-140.
Wernimont, P. F., & Campbell, J. P. (1968). Signs, samples, and criteria. Journal of Applied Psychology, 52, 372-376.
Wiggins, J. S. (1973). Personality and prediction: Principles of personality assessment. Reading, MA: Addison-Wesley.
Youngblood, S. A. (1984). Work, nonwork, and withdrawal. Journal of Applied Psychology, 69, 106-117.
Received December 8, 1987
Revision received September 23, 1988
Accepted October 4, 1988