Assessment Guide
https://doi.org/10.1177/0091026020935582
Public Personnel Management 2021, Vol. 50(2) 232–257
© The Author(s) 2020 Article reuse guidelines:
sagepub.com/journals-permissions DOI: 10.1177/0091026020935582
journals.sagepub.com/home/ppm
Article
A Critical Examination of Content Validity Evidence and Personality Testing for Employee Selection
David M. Fisher1, Christopher R. Milane2, Sarah Sullivan3, and Robert P. Tett1
Abstract
Prominent standards/guidelines concerning test validation provide contradictory information about whether content-based evidence should be used as a means of validating personality test inferences for employee selection. This unresolved discrepancy is problematic considering the prevalence of personality testing, the importance of gathering sound validity evidence, and the deference given to these standards/guidelines in contemporary employee selection practice. As a consequence, test users and practitioners are likely to be reticent or uncertain about gathering content-based evidence for personality measures, which, in turn, may cause such evidence to be underutilized when personality testing is of interest. The current investigation critically examines whether (and how) content validity evidence should be used for measures of personality in relation to employee selection. The ensuing discussion, which is especially relevant in highly litigious contexts such as personnel selection in the public sector, sheds new light on test validation practices.
Keywords
test validation, content validity, personality testing, employee selection
1The University of Tulsa, OK, USA
2Qualtrics, Provo, UT, USA
3Rice University, Houston, TX, USA
Corresponding Author: David M. Fisher, Assistant Professor of Psychology, The University of Tulsa, 800 S. Tucker Drive, Tulsa, OK 74104, USA. Email: [email protected]
An essential consideration when using any test or measurement tool for employee selection is gathering and evaluating relevant validity evidence. In the contemporary employee selection context, validity evidence is generally understood to mean evidence that substantiates inferences made from test scores. Various sources provide standards and guidelines for gathering validity evidence, including the Uniform Guidelines on Employee Selection Procedures (Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice, 1978; hereafter, Uniform Guidelines, 1978), the Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology [SIOP], 2003; hereafter, SIOP Principles, 2003), and the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999/2014; hereafter, Joint Standards, 1999/2014), as well as the academic literature (e.g., Aguinis et al., 2001). Having such a variety of sources available is beneficial, but challenges arise when the various sources provide ambiguous or contradictory information. Such ambiguity can be particularly troublesome in highly litigious contexts, such as the public sector, where adherence to regulations governing selection is of paramount importance.
The current investigation attempts to shed light on one such area of ambiguity: whether evidence based on test content should be used as a means of validating personality test inferences for employee selection. Rothstein and Goffin (2006) noted, “It has been estimated that personality testing is a $400 million industry in the United States and it is growing at an average of 10% a year” (Hsu, 2004, p. 156). Given this reality, it is important to carefully consider appropriate validation procedures for such measures. However, the various sources mentioned above present conflicting directions on this issue, specifically in relation to content-based validity evidence. On one hand, evidence based on test content is one of five potential sources of validity evidence described by the Joint Standards (1999/2014), which is similarly endorsed by the SIOP Principles (2003). This form of evidence has further been suggested by some to be particularly relevant to personality tests (e.g., Murphy et al., 2009; O’Neill et al., 2009), and especially under challenging validation conditions, such as small sample sizes, test security concerns, or lack of a reliable criterion measure (Landy, 1986; Tan, 2009; Thornton, 2009). On the other hand, the Uniform Guidelines (1978) assert that “. . . a content strategy is not appropriate for demonstrating the validity of selection procedures which purport to measure traits or constructs, such as intelligence, aptitude, personality, commonsense, judgment, leadership, and spatial ability [emphasis added]” (Section 14.C.1). Other sources similarly convey reticence toward content validity for measures of traits or constructs (e.g., Goldstein et al., 1993; Lawshe, 1985; Wollack, 1976). Thus, there appears to be conflicting guidance on the use of content validity evidence to support personality measures.
In light of this discrepancy, the current investigation offers a critical examination of content validity evidence and personality testing for employee selection. Such an investigation is valuable for several reasons. First, an important consequence of the inconsistency noted above is that content-based evidence may be overlooked as a valuable approach to validation when personality testing is of interest. Evidence for this can be seen in the fact that other approaches such as criterion-related validation are sometimes viewed as the only option for personality measures (Biddle, 2011). Similarly, prominent writings on personality testing in the workplace (e.g., Morgeson et al., 2007b; O’Neill et al., 2013; Ones et al., 2007; Rothstein & Goffin, 2006; Tett & Christiansen, 2007) have tended to ignore the applicability of content validation to personality measures. Furthermore, considering the deference given to the various standards and guidelines in contemporary employee selection practice (Schmit & Ryan, 2013), those concerned about strict adherence to such standards/guidelines are likely to be reticent or uncertain about gathering content-based evidence for personality measures, in no small part due to conflicting or ambiguous recommendations. The above circumstances tend to relegate content-based evidence to a less desirable status or to an afterthought. In turn, this represents a missed opportunity for valuable insight into the use of personality measures.
Second, the neglect or underutilization of content-based evidence is, in many ways, antithetical to the broader goal of developing a theory-based and scientifically grounded understanding of tests and measures used for employee selection (Binning & Barrett, 1989). For example, as elaborated below, there are various situations in which content-based evidence may be preferable to criterion-based evidence, not the least of which is an insufficient sample size for a criterion-based investigation (McDaniel et al., 2011). Similarly, an exclusive focus on empirical prediction ignores the importance of underlying theory, which is critical for advancing employee selection research. Of relevance, the examination of content validity evidence forces one to carefully consider the correspondence between selection measures and underlying construct domains, as informed by theoretical considerations. Evidence for the value of content validity can also be found in trait activation theory (Tett & Burnett, 2003; Tett et al., 2013), which highlights the importance of a clear conceptual linkage between the content of personality traits/constructs and the job domain in question. Thus, content validity evidence should be of primary importance for personality test validation.
Third, it is useful to acknowledge that the prohibition against content validity evidence in relation to personality measures noted in the Uniform Guidelines (1978) appears to be at odds with contemporary thinking on validation (Joint Standards, 1999/2014). The focal passage quoted above from the Uniform Guidelines has been described as being “. . . as destructive to the interface of psychological theory and practice as any that might have been conceived” (Landy, 1986, p. 1189). Although there have been well-argued critiques of the Uniform Guidelines (e.g., McDaniel et al., 2011), in addition to thoughtful elaboration of issues surrounding content validity (e.g., Binning & LeBreton, 2009), a direct attempt at resolving the noted contradiction remains conspicuously absent from the literature. This contradiction, in conjunction with the absence of a satisfactory explanation, is problematic given the importance of gathering sound validity evidence pertaining to psychological test use. As such, a critical examination of this issue is warranted.
Finally, the findings of the current investigation are likely to have broad applicability. Namely, although focused on personality testing, the discussion below is relevant to measures of other commonly assessed attributes classified under the Uniform Guidelines (1978) as “traits or constructs” (Section 14.C.1). Similarly, while we address the Uniform Guidelines, which some argue are outdated (e.g., Jeanneret & Zedeck, 2010) and further limited by their applicability to employee selection in the United States, we believe the value of this discussion extends far beyond these guidelines. It is important to carefully consider appropriate validation strategies in all circumstances where psychological tests are used. Hence, the discussion presented herein is likely to be of relevance for content-based validation efforts in other areas beyond employee selection in the United States (e.g., educational testing, clinical practice, international employee selection efforts).
Following a brief overview of validity and content-based validation, our investigation is organized around three fundamental questions. Question 1 asks whether current standards and guidelines support the use of content validity evidence for validation of personality test inferences in an employee selection context. Based on the concerns raised above, a preliminary answer to this question is that it is unclear. Question 2 then asks about the underlying bases of the inconsistency. Building on the identified causes of disagreement, Question 3 asks how one might actually gather evidence based on test content for personality measures. Ultimately, our goal in this effort is to reduce ambiguity and promote clarity regarding content-based validation of personality measures.
Overview of Validity and Evidence Based on Test Content
Broadly speaking, validity in measurement refers to how well an assessment device measures what it is supposed to (Schmitt, 2006). The focus of measurement is typically described as a construct (Joint Standards, 1999/2014), which represents a latent attribute on which individuals can vary (e.g., cognitive ability, diligence, interpersonal skill, knowledge, the capacity to complete a given task). Importantly, a person’s level or relative standing with regard to the construct of interest is inferred from the test scores (SIOP Principles, 2003). As such, the notion of validity addresses the simple yet fundamental issue of whether test scores actually reflect the attribute or construct that the test is intended to measure. However, this succinct characterization of validity also belies the true complexity of this topic (Furr & Bacharach, 2014). Two particular complexities bear discussion in light of our current aims.
First, contemporary thinking holds that validity is not the property of a test per se, but rather of the inferences made from test scores (Binning & Barrett, 1989; Furr & Bacharach, 2014; Joint Standards, 1999/2014; Landy, 1986; SIOP Principles, 2003). The value of this approach can be seen when the same test is used for two different purposes (for example, when an interpersonal skills test developed for the selection of sales personnel is used for hiring both sales representatives and accountants). Notably, the test itself does not change, but the inferences made from the test scores regarding the job performance potential of the applicants may be more or less valid given the focal job in question. In accord with this perspective, the Joint Standards (1999/2014) describe validity as “the degree to which evidence and theory support the interpretations of test scores for proposed uses of the test” (p. 11). Inherent in this view is the idea that validity is difficult to fully assess without a clear explication of the intended interpretation of scores and corresponding purpose of testing. Thus, substantiating relevant inferences in terms of the intended purpose of the test is of primary concern in the contemporary view of validity.
Second, validity has come to be understood as a unitary concept, as compared with the dated notion of distinct types of validity (Binning & Barrett, 1989; Furr & Bacharach, 2014; Joint Standards, 1999/2014; Landy, 1986; SIOP Principles, 2003). The older trinitarian view (Guion, 1980) posits three different types of validity, including criterion-related, content, and construct validity, each relevant for different test applications (Lawshe, 1985). By contrast, the more recent unitarian perspective (Landy, 1986) emphasizes that all measurement attempts are ultimately about assessing a target construct, and validation entails the collection of evidence to support the argument that test scores actually reflect the construct (and that the construct is relevant to the intended use of the test). Consistent with this latter perspective, the Joint Standards (1999/2014) espouse a unitary view of validity and identify five sources of validity evidence, including evidence based on test content, response processes, internal structure, relations to other variables, and consequences of testing. In summary, the contemporary view of validity suggests that measurement efforts ultimately implicate constructs, and different sources of evidence can be marshaled to substantiate the validity of inferences based on test scores.
Drawing on the above discussion, evidence based on test content represents one of several potential sources of evidence for validity judgments. The collection of content-based evidence has become well-established as an important and viable validation strategy, as can be seen in the common discussion and endorsement of content validity in the academic literature (e.g., Aguinis et al., 2001; Binning & Barrett, 1989; Furr & Bacharach, 2014; Haynes et al., 1995; Landy, 1986) as well as in legal, professional, and technical standards or guidelines (e.g., Joint Standards, 1999/2014; SIOP Principles, 2003; Uniform Guidelines, 1978). The specific manner in which evidence based on test content can substantiate the validity of test score inferences is via an informed and judicious examination of the match between the content of an assessment tool (e.g., test instructions, item wording, response format) and the target construct in light of the assessment purpose (Haynes et al., 1995). For the sake of simplicity and ease of exposition, throughout this article, we use various terms interchangeably to represent the concept of evidence based on test content, such as content validity evidence, content validation strategy, content-based strategy, or simply content validity. However, each reference to this concept is intended to reflect contemporary thinking regarding validity as described above. Specifically, content validity evidence is not a separate “type” of validity but rather a category of evidence that can be used to substantiate the validity of inferences regarding test scores.
Do Current Standards Support Content Validity for Personality?
Having introduced the concepts of validity and evidence based on test content, we now turn to our primary purpose of discussing whether a content validation strategy should be used as a means of validating personality test inferences for employee selection purposes. In doing so, a preliminary question becomes whether current standards and guidelines support this practice. The following four sources/areas are considered: (a) the Uniform Guidelines (1978), (b) the SIOP Principles (2003), (c) the Joint Standards (1999/2014), and (d) a general review of relevant academic literature. A summary of information derived from these sources is shown in Table 1.
The Uniform Guidelines (1978)
The Uniform Guidelines (1978) are federally endorsed standards pertaining to employee selection procedures, which were jointly developed by the Equal Employment Opportunity Commission, the Civil Service Commission, the Department of Labor, and the Department of Justice in the United States. Regarding content validation, the guidelines state that,
Evidence of the validity of a test or other selection procedure by a content validity study should consist of data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated. (Section 5.B)
The guidelines go on to describe specific technical standards and requirements for content validity studies. For example, a content validity study should include a review of information about the job under consideration (Section 14.A; Section 14.C.2). Furthermore, when the selection procedure focuses on work tasks or behaviors, it must be shown that the selection procedure includes a representative sample of on-the-job behaviors or work products (Section 14.C.1; Section 14.C.4). Conversely, under certain circumstances, the guidelines also permit content validation where the selection procedure focuses on worker requirements or attributes, including knowledge, skills, or abilities (KSAs). In such cases, beyond showing that the selection procedure reflects a representative sample of the implicated KSA, it must additionally be documented that the KSA is needed to perform important work tasks (Section 14.C.1; Section 14.C.4), and the KSA must be operationally defined in terms of observable work behaviors (Section 14.C.4).
The above notwithstanding, the Uniform Guidelines (1978) explicitly prohibit content validity for tests focusing on traits or constructs, including personality (Section 14.C.1). The logic underlying this restriction appears to be based on the seemingly reasonable notion that content-based validation becomes increasingly difficult as the focus of the selection test is farther removed from actual work behaviors (Section 14.C.4; Landy, 1986; Lawshe, 1985). This logic was confirmed in a subsequent “Questions and Answers” document, where it is stated that,
The Guidelines emphasize the importance of a close approximation between the content of the selection procedure and the observable behaviors or products of the job, so as to minimize the inferential leap between performance on the selection procedure and job performance [emphasis added]. (See http://www.uniformguidelines.com/questionandanswers.html)
Table 1. Review of Various Sources Regarding Content Validity and Personality Testing.

Source: Uniform Guidelines (1978)
Description of content validity: “Evidence of the validity of a test or other selection procedure by a content validity study should consist of data showing that the content of the selection procedure is representative of important aspects of performance on the job for which the candidates are to be evaluated” (Section 5.B)
Position on personality measures: Explicit prohibition related to the use of content validity for tests that focus on traits or constructs, such as personality: “. . . a content strategy is not appropriate for demonstrating the validity of selection procedures which purport to measure traits or constructs, such as intelligence, aptitude, personality, commonsense, judgment, leadership, and spatial ability” (Section 14.C.1)

Source: SIOP Principles (2003)
Description of content validity: “Evidence for validity based on content typically consists of a demonstration of a strong linkage between the content of the selection procedure and important work behaviors, activities, worker requirements, or outcomes on the job” (p. 21)
Position on personality measures: Approval of content validity approach for personality measures can be inferred from [1] the absence of an explicit prohibition against the use of content validity evidence for tests that focus on traits or constructs and [2] the stated scope of applicability for content-based evidence, which includes tests that focus on knowledge, skills, abilities, and other personal characteristics

Source: Joint Standards (1999/2014)
Description of content validity: “Important validity evidence can be obtained from an analysis of the relationship between the content of a test and the construct it is intended to measure” (p. 14)
Position on personality measures: Approval of content validity approach for personality measures can be inferred from [1] the absence of an explicit prohibition against the use of content validity evidence for tests that focus on traits or constructs, [2] the explicit description of content validity as pertaining to “the relationship between the content of a test and the construct it is intended to measure” (p. 14), and [3] the broad definition of the term construct (see p. 217), which makes it clear that personality variables would fall under the definition of a construct

Source: General review of academic literature
Description of content validity: Most, if not all, descriptions of content validity found in the literature embody the core notion of documenting the linkage between the content of a test and a particular domain that represents the target of measurement and/or purpose of testing (Haynes et al., 1995)
Position on personality measures: The sources that specifically discuss this issue collectively indicate mixed opinions. While some authors have expressed reticence toward the use of content-based evidence for measures of personality (e.g., Goldstein et al., 1993; Lawshe, 1985; Wollack, 1976), others consider this restriction to be problematic (e.g., Landy, 1986; McDaniel et al., 2011) or view content validity as particularly relevant to personality testing (e.g., Murphy et al., 2009; O’Neill et al., 2009)

Note. SIOP = Society for Industrial and Organizational Psychology.
Interestingly, in an apparent application of this logic, the guidelines permit content validation for selection procedures focusing on KSAs (as noted in the preceding paragraph). In such cases, the inferential leap necessary to link KSAs to job performance is ostensibly greater than if the selection procedures were to focus directly on work behaviors, which explains why the guidelines include additional requirements related to the content validation of tests focusing on these worker attributes (see Sections 14.C.1 and 14.C.4). Presumably, these additional requirements serve to bridge the larger inferential leap made when the test does not directly focus on work behaviors. Thus, the Uniform Guidelines do not limit the use of content validity to actual samples of work behavior, but additional evidence is needed to help bridge the larger inferential leap made when selection tests target worker attributes (i.e., KSAs). Yet this same reasoning is not extended to what the guidelines characterize as traits or constructs.
The SIOP Principles (2003)
The SIOP Principles (2003) embody the formal pronouncements of the Society for Industrial and Organizational Psychology pertaining to appropriate validation and use of employee selection procedures. For content validation, the principles state that, “Evidence for validity based on content typically consists of a demonstration of a strong linkage between the content of the selection procedure and important work behaviors, activities, worker requirements, or outcomes on the job” (p. 21). Like the Uniform Guidelines (1978), the SIOP Principles stress the importance of capturing a representative sample of the target of measurement and further establishing a close correspondence between the selection procedure and the work domain. The principles also acknowledge that content validity evidence can be either “logical or empirical” (p. 6), highlighting the role of job analysis and expert judgment in generating content-based evidence. However, unlike the Uniform Guidelines, the SIOP Principles do not make a substantive distinction between work tasks/behaviors and worker requirements/attributes in relation to content-based evidence but rather, collectively, consider selection procedures that focus on “work behaviors, activities, and/or worker KSAOs” (p. 21). Importantly, the addition of “O” to the KSA acronym represents “other personal characteristics,” which are generally understood to include “interests, preferences, temperament, and personality characteristics [emphasis added]” (Brannick et al., 2007, p. 62). Accordingly, although not explicitly stated, the use of content validity evidence as a means of validating personality test inferences for employee selection purposes appears to be consistent with the SIOP Principles.
The Joint Standards (1999/2014)
The Joint Standards (1999/2014) are a set of guidelines for test development and validation in the areas of psychological and educational testing, which were developed by a joint committee including representatives from the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education. According to the standards, content validity is examined by specifying the content domain to be measured and then conducting “logical or empirical analyses of the adequacy with which the test content represents the content domain and of the relevance of the content domain to the proposed interpretation of test scores” (p. 14). In other words, content validity is described as pertaining to “the relationship between the content of a test and the construct it is intended to measure” (p. 14), where “construct” is defined as “The concept or characteristic that a test is designed to measure” (p. 217). Because personality traits are easily understood as constructs, the Joint Standards suggest that personality test inferences may be subject to content-based validation.
Academic Literature
It is also informative to examine the academic literature regarding validation and personality testing. In doing so, several general observations can be made. First, most if not all definitions of content validity share the core notion of documenting the linkage between the content of a test and a particular domain that represents the target of measurement and/or purpose of testing (e.g., Aguinis et al., 2001; Goldstein et al., 1993; Haynes et al., 1995; Sireci, 1998). Second, as noted previously, prominent writings on personality testing in the workplace (e.g., Morgeson et al., 2007b; O’Neill et al., 2013; Ones et al., 2007; Rothstein & Goffin, 2006; Tett & Christiansen, 2007) have tended to ignore the applicability of content validation to personality measures. Third, the sources that do specifically address this issue present mixed opinions. While some have expressed reticence about content-based evidence for measures of personality (e.g., Goldstein et al., 1993; Lawshe, 1985; Wollack, 1976), others consider this restriction to be problematic (e.g., Landy, 1986; McDaniel et al., 2011) or view content validity as particularly relevant to personality testing (e.g., Murphy et al., 2009; O’Neill et al., 2009). Thus, as with the technical standards and guidelines discussed above, those turning to the academic literature for guidance might similarly come away uncertain regarding the use of content validity evidence to support personality measures in an employee selection context.
What Are the Bases of Inconsistency?
This section attempts to identify the conceptual issues that form the bases for disagreement/misunderstanding regarding the use of content validity evidence for personality measures. Making these underlying matters explicit will help to identify some common ground and the potential for a way forward. Based on the review of documents and literature above, the primary areas to be addressed include (a) vestiges of the trinitarian view of validity, (b) the focus of the content match, and (c) a clear understanding of the inferences to be substantiated.
Vestiges of the Trinitarian View of Validity
Although it is now well-established that validity should be characterized in a manner consistent with the contemporary perspective described above (Binning & Barrett, 1989; Joint Standards, 1999/2014; Landy, 1986; SIOP Principles, 2003), the outdated trinitarian view continues to exert substantial influence. Perhaps the most prominent example of this can be found in the Uniform Guidelines (1978), which clearly reflect a trinitarian view of validity yet remain an important document that holds considerable weight in contemporary employee selection practice (McDaniel et al., 2011). There are at least two important concerns related to this residual influence of the trinitarian perspective.
First, the trinitarian view of validity suggests that constructs represent a separate category of measurement that is somehow distinct from other types of measurement efforts. This can most readily be seen in the simple fact that there is a separate label for construct validity, as compared with other categories of validity. As a result, the determination of which “type” of validity to focus on rests on whether or not a construct, as opposed to some other type of attribute, is the target of measurement (Landy, 1986). This same logic is embodied in the Uniform Guidelines (1978), where it is indicated that certain validation strategies are appropriate when measuring traits or constructs (e.g., construct validity studies), whereas other strategies are not (e.g., content validity studies). Importantly, this perspective is in direct opposition to contemporary thinking regarding validity, which suggests that all measurement efforts ultimately implicate constructs (Joint Standards, 1999/2014). This notion is illuminated in a telling example provided by Landy (1986), where he contrasts a hypothetical typing ability test with a measure of reasoning ability. Of particular relevance is the idea that the typing ability test more readily implicates observable behaviors and, thus, could be subject to content validation according to the Uniform Guidelines. Conversely, the reasoning ability test is more easily described as trait- or construct-focused, in turn, precluding the use of content validation according to the Uniform Guidelines. Landy’s point was that both of these tests actually focus on constructs (i.e., typing ability, reasoning ability), neither of which is directly observable. Rather, in both cases, one must infer the level of the construct possessed by an individual via the administration of a test.
The above example is intended to highlight that all variables measured by psychological tests can be characterized as constructs. As such, the notion that some variables are constructs while others are not is problematic at best and also in direct opposition to prevailing conceptions of test validation. However, this is not meant to imply that all constructs are the same. In the example above, it is clear that the typing ability construct might more easily be linked to job performance, as contrasted with the reasoning ability construct, given that typing ability has a more direct and obvious behavioral manifestation (i.e., typing). Conversely, one might say that a greater inferential leap is required when validating a measure of reasoning ability, as reasoning ability is relatively farther removed from actual job behavior (although not irreconcilably far removed). This critical distinction will be discussed further in the sections to follow. For now, the important conclusion is that every psychological variable that might be measured in an employee selection context, including personality, is no more and no less a construct than any other.
A second concern related to the residual influence of the trinitarian perspective is that there appears to be a de facto preference given to criterion-related validity evidence when considering employee selection procedures (Binning & Barrett, 1989). Nevertheless, it is critical to acknowledge that there are various circumstances where criterion-related validity evidence is far from optimal. It has been suggested that a criterion-related study include no fewer than 250 participants (Biddle, 2011). Yet, McDaniel et al. (2011) assert that most employers simply do not have enough employees or applicants to conduct such a study. The outcome of a criterion-related study is also highly contingent on the quality of the criterion measure. In this regard, it is useful to note that the appropriate development and validation of criterion measures is often given far less attention than the predictor side of the equation (Binning & Barrett, 1989). Furthermore, in the context of a high-stakes testing situation, conducting a criterion-related study might represent a test security risk, as sensitive test content will be presented to study participants who might subsequently share confidential test information. None of the above is meant to suggest that criterion-related validity evidence is always bad. However, it is similarly important to realize that there are various situations in which evidence based on test content might well be the preferred approach toward validation.
Taken together, once one debunks the notion that personality measures somehow represent a different category of construct measurement than other “non-construct” variables, and further acknowledges that criterion-related validity evidence (although extremely useful in many situations) should not be de facto treated as the strategy of choice, the idea of gathering content validity evidence for personality measures becomes much more relevant. In the words of Binning and Barrett (1989), “One could reasonably argue that content-related and construct-related evidence, when based on sound professional judgment about appropriate test use, are often superior to criterion-related evidence” (p. 484).
Focus of the Content Match
Another issue that may be fueling misunderstanding regarding the use of content validity evidence has to do with the focus of the content match. This issue becomes apparent when one carefully examines the various definitions for content validity discussed previously and shown in Table 1. For example, the Uniform Guidelines (1978) indicate that content validity is applicable when it can be shown that “the content of the selection procedure is representative of important aspects of performance on the job” (Section 5.B). This suggests that the primary focus of content match should be between the content of the selection procedure and representative elements of job performance, such as work tasks or behaviors. In contrast, the Joint Standards (1999/2014) indicate that content validity is applicable whenever it can be shown that there is an overlap between the content of a test and the construct that is the focus of measurement, which for employee selection is often some worker attribute or requirement. Hence, there appears to be a duality of focus when it comes to content validity evidence, where some would argue such evidence is derived from documenting overlap with the job performance domain of application (e.g., specific tasks), whereas others would argue that content-based evidence is based on content overlap with the construct domain the test is intended to measure (e.g., some personal attribute).
Clarity regarding this issue can be found by considering the distinction between work samples versus signs (Binning & Barrett, 1989; Wernimont & Campbell, 1968). In the context of employee selection procedures, samples refer to those assessments that directly implicate or elicit behaviors relevant to performance on the job, such as the typing test from Landy’s (1986) example discussed above, where the test elicits behavior (i.e., typing) that can be seen as interchangeable with relevant on-the-job behavior. In contrast, employee selection measures characterized as signs refer to those assessments that do not directly target behaviors from the performance domain, but nonetheless attempt to assess attributes or capabilities that are thought to be relevant for job performance. An illustration of this would be the reasoning test from Landy’s example, where the actual behaviors elicited by the assessment (e.g., reading logic problems, completing multiple choice questions) may be less obviously relevant to primary job functions, but the test nevertheless measures an attribute that is undoubtedly critical for effective performance in many occupations (i.e., the capacity for effective reasoning).
In light of this distinction, employee selection measures that fall on the work-sample end of the spectrum are likely to focus on work tasks or behaviors, while those that fall on the sign end of the spectrum are likely to focus on worker attributes and requirements. By extension, it appears that the Uniform Guidelines (1978) primarily permit the use of content validity evidence in relation to work samples that target important tasks and behaviors, whereas the SIOP Principles (2003) and Joint Standards (1999/2014) would additionally permit the use of content validity evidence for sign-based measures that focus on job-relevant personal capacities and worker requirements. Importantly, although both work samples and signs can be used as predictors for employee selection, these two meaningfully differ in terms of whether the behaviors implicated and/or elicited by the test are isomorphic with, or functionally similar to, the performance construct domain (Binning & Barrett, 1989; Binning & LeBreton, 2009). More specifically, work samples tend to implicate or elicit behaviors that exhibit a high degree of isomorphism with performance behaviors, whereas this tends to be less true (although not necessarily untrue) for sign-based measures.
The above discussion helps to clarify the divergent perspectives concerning the focus of the content match. First, work samples tend to be constructed with the intention of sampling from the performance domain, as exhibited in the increased isomorphism with performance behaviors (Binning & Barrett, 1989). As such, the appropriate focus of content validity in the case of work-sample-based measures should be the degree of match between the content of the measure and the job performance domain (Binning & LeBreton, 2009). Indeed, this appears to be the central logic espoused in the Uniform Guidelines (1978), which primarily permit the use of content validity evidence in relation to work samples that target important tasks and behaviors. Second, sign-based measures tend to be constructed with the intention of sampling from a separate construct domain that is technically distinct from, yet conceptually relevant to, the job performance domain (Binning & Barrett, 1989). Therefore, the appropriate focus of content validity in the case of sign-based measures should be the degree of match between the content of the selection measure and whatever distinct construct domain represents the target of measurement (Binning & LeBreton, 2009). The logic behind this latter assertion appears consistent with the SIOP Principles (2003) and Joint Standards (1999/2014), which additionally permit the use of content validity evidence for sign-based measures that focus on job-relevant worker requirements and attributes.
Understanding the Inferences to Be Substantiated
As described previously, the collection of validity evidence is ultimately aimed at substantiating the inferences made from test scores. However, a careful consideration of the validation process indicates that there are several potential inferences that might be of relevance, especially when the intended purpose of testing is taken into account (Binning & Barrett, 1989). For example, one might aim to substantiate the inference that test scores accurately reflect varying levels of the underlying construct being measured. While this is, indeed, a crucial inference with regard to validation, it does not necessarily capture the intended use of the test. As such, it is additionally relevant to substantiate the inference that test scores (and corresponding levels of the target construct) have relevance with regard to the purpose of testing. In the case of employee selection tests, this purpose typically informs inferences about job performance. This, in turn, suggests the importance of understanding the performance domain, which may additionally implicate other inferences, such as substantiating the degree to which operational measures of job performance accurately reflect an underlying performance construct. Given these various potential inferences, it becomes important to clearly understand the specific inferences that must be addressed as part of the validation process. Toward this end, a better understanding of such inferences can be achieved by visually depicting the relevant inferences that are necessary for linking the test in question to the construct it is intended to measure, in addition to the purpose of testing. This can be seen in Figure 1a, which is based on the seminal work of Binning and Barrett (1989; also see Arthur & Villado, 2008; Binning & LeBreton, 2009; Guion, 2004; Joint Standards, 1999/2014).
The framework depicted in Figure 1a allows one to clearly discuss the specific inferences involved in various approaches to validation. First and foremost, given that the ultimate purpose of most (if not all) employee selection measures is to understand job performance, it has been argued that the inference linking the predictor measure to the job performance construct domain represents the most critical inference in the employee selection context (Binning & Barrett, 1989). This inference is depicted in Figure 1a as Inference 1. To the extent that the predictor measure is work-sample-based, and, thus, exhibits a high degree of isomorphism with the job performance domain, this primary link can be directly substantiated by documenting the degree of overlap (or content match) between the predictor assessment and job performance (Binning & LeBreton, 2009). This is consistent with what the Uniform Guidelines (1978) describe as a content validity study for a selection measure that focuses on work tasks or behaviors, and is further consistent with what the SIOP Principles (2003) and Joint Standards (1999/2014) would consider evidence for validity based on test content. However, to the degree that the selection measure in question is sign-based, and, thus, departs from isomorphism with job performance, then the direct substantiation of Inference 1 becomes more tenuous, in turn, requiring indirect substantiation via the pairing of additional inferences, which nonetheless represents an appropriate means of validation (Binning & Barrett, 1989; Binning & LeBreton, 2009; Joint Standards, 1999/2014).
Figure 1. Inferences in the validation process: (a) common framework for depicting inferences in the validation process, (b) modified framework for depicting inferences in the validation process.
A second viable approach for validation would be to collectively substantiate Inference 2 and Inference 3, as shown in Figure 1a. Inference 2 represents the degree to which the operational predictor measure reflects the underlying construct it is purported to measure, while Inference 3 represents the degree to which the underlying predictor construct is relevant to the job performance domain. Using the nomenclature of the traditional trinitarian view of validity, this approach might be labeled as construct validity (Binning & Barrett, 1989; Joint Standards, 1999/2014), given that reference is made to underlying constructs. However, in light of the fact that contemporary conceptualizations of validity eschew the notion of a distinct form of “construct validity,” Binning and LeBreton (2009) argue that this approach is better characterized as content-based evidence. Specifically, although the indirect approach discussed here is distinct from the direct substantiation of Inference 1 described above, both approaches rely heavily on comparing predictor content to some underlying construct domain. The difference lies in the fact that the substantiation of Inference 1 compares the predictor content with the job performance construct domain, while the substantiation of Inference 2 compares the predictor content with the underlying predictor construct domain. Furthermore, to account for the less direct route of substantiation in the latter approach, additional evidence is required to bridge the larger inferential leap to the job performance domain, which is reflected in Inference 3. Importantly, the collective examination of Inference 2 and Inference 3 represents an appropriate means for deriving content validity evidence for sign-based selection measures that exhibit less isomorphism with the job performance domain (Binning & LeBreton, 2009). Indeed, this logic is explicitly included in the Joint Standards (1999/2014; see pp. 172–173) and also implicitly described in the Uniform Guidelines, where it is stated that,
For any selection procedure measuring a knowledge, skill, or ability [i.e., sign-based measure] the user should show that (a) the selection procedure measures and is a representative sample of that knowledge, skill, or ability [i.e., Inference 2]; and (b) that knowledge, skill, or ability is used in and is a necessary prerequisite to performance of critical or important work behavior(s) [i.e., Inference 3]. (Section 14.C.4)
Another approach to validation suggested by Figure 1a would be to collectively substantiate Inference 4 and Inference 5. Here, Inference 4 represents the empirical relationship between the predictor measure and a job performance/criterion measure, while Inference 5 represents the degree to which the performance/criterion measure reflects the underlying performance construct domain it is intended to capture. This approach is analogous to what the Uniform Guidelines (1978) would characterize as a criterion-related validity study, and is further consistent with what the SIOP Principles (2003) and Joint Standards (1999/2014) would describe as evidence for validity based on relations to other variables. In practice, however, criterion-related validity studies often focus primarily and/or exclusively on Inference 4, to the exclusion of Inference 5. Unfortunately, this means that validation efforts of this nature are typically completed with only cursory reference to underlying theory or consideration for the implicated construct domains. This again highlights the fact that criterion-related validation should not necessarily or always be seen as the optimal strategy, especially from the perspective of generating a theory-based and scientifically grounded (as opposed to purely empirical) understanding of tests and measures used for employee selection. Conversely, when both Inference 4 and Inference 5 are given due consideration, one can see that this approach mirrors the approach pertaining to Inferences 2 and 3, suggesting two different but comparably informative approaches to understanding the relevance of the focal predictor measure to the underlying job performance domain.
Returning to the topic of content validity, the above discussion suggests two potentially viable approaches for examining content-based evidence pertaining to an employee selection test. First, to the degree that the measure is work-sample-based, and, thus, exhibits a high degree of isomorphism with the job performance domain, then the focus should be on Inference 1. This can be referred to as validity evidence based on test content for work-sample-based measures. Second, to the extent that the measure is sign-based, and, thus, departs from isomorphism with job performance, then the focus should be collectively on Inference 2 and Inference 3. This can be referred to as validity evidence based on test content for sign-based measures. Figure 1b presents a modified framework that accounts for these divergent approaches to validation. Importantly, this framework is consistent with a contemporary conceptualization of validity that places primary emphasis on inferences in the validation process and also explicitly acknowledges the purpose of testing. At the same time, this framework also explicitly represents the degree of inferential leap necessary for linking the predictor measure with the performance construct domain, an issue that is of paramount importance in the Uniform Guidelines (1978). Specifically, as sign-based measures exhibit lower isomorphism with the job performance domain as compared with work-sample-based measures, additional evidence is needed to bridge the larger inferential leap, which is manifested in the requirement to substantiate two inferences (i.e., Inferences 2 and 3), as opposed to just one (i.e., Inference 1). As aptly summarized by Binning and LeBreton (2009), “content validation involves either (a) directly matching predictor content to criterion CDs [construct domains] or (b) matching predictor content to psychological CDs which are in turn related to criterion CDs (i.e., delineating psychological traits believed to influence job behavior)” (p. 489).
How to Gather Content Evidence for Personality Measures?
Building on the preceding sections, our view is that evidence based on test content can and should be used as a means of validating personality test inferences for employee selection purposes. This practice is consistent with contemporary conceptualizations of validity, as embodied in the SIOP Principles (2003) and Joint Standards (1999/2014), which both place primary emphasis on inferences in the validation process and further view content-based evidence as one of several viable approaches to validating such inferences. Furthermore, the explicit prohibition against this practice in the Uniform Guidelines (1978) appears to be largely based on outdated notions of validity and the fallacious idea that tests focusing on constructs somehow represent a different category of measurement than other “non-construct” variables. Thus, content validity evidence should be treated as an appropriate means of validating personality test inferences.
Accordingly, we can now make informed recommendations regarding how best to collect content validity evidence for personality measures. There are at least two primary conduits through which one might maximize the content-relevance of personality tests and generate appropriate content-based evidence, including (a) maximizing isomorphism with the job performance domain during initial test development, and (b) substantiating the appropriate inferences via expert judgment after test development, but prior to operational use.
Maximizing Isomorphism During Test Development
One way to increase the content-relevance of personality measures used for employee selection is to draw directly from the job performance domain while developing the test. The difference between work-sample-based measures (that sample primarily from the job performance domain) and sign-based measures (that sample primarily from a distinct yet related construct domain) is a matter of degree, as opposed to a strict categorical distinction. In other words, psychological tests can move along this continuum by sampling predominantly from one domain or the other, in addition to a combination of both (Spengler et al., 2009). Of relevance, sampling from the performance domain has the effect of increasing the degree of isomorphism between the test content and job performance, in turn, moving the measure toward the work-sample end of the spectrum.
By way of illustration, traditional measures of personality such as those that might be found in the International Personality Item Pool (http://ipip.ori.org/; also see Goldberg et al., 2006) sample primarily from the personality construct domain of interest. As such, an example item for the trait of extraversion might include “I am the life of the party.” Conversely, personality measures specifically designed to be relevant in the workplace (e.g., Ellingson et al., 2013) sample both from the personality construct domain as well as the intended domain of application (i.e., work). Here, an example item for extraversion would be “I involve my coworker in what I am doing.” Notably, the personality statement in the latter example is more obviously and directly applicable to the job performance domain. Related to this, there is a growing body of literature that supports the practice of contextualizing personality scale content to specifically reference the domain of work (e.g., Hunthausen et al., 2003; Lievens et al., 2008; Schmit et al., 1995; Shaffer & Postlethwaite, 2012; also see Ones & Viswesvaran, 2001). While this research on contextualization does not primarily focus on the issue of content validity, the practice of modifying personality scale content to explicitly reference the work context nonetheless has the consequence of extending content-relevance to the job performance domain. Indeed, it has been suggested that the use of custom-developed personality tests based on work-contextualization “is extremely valuable because it may open up content validation as a potential validation strategy” (Morgeson et al., 2007a, p. 1043).
In summary, the content-relevance of personality measures used for employee selection can be improved by sampling both from the personality construct domain and the job performance domain. This practice is ultimately manifested in the creation and use of personality scale items that directly reference and/or implicate the performance domain in question. Practically, this can be accomplished by first identifying critical features of performance via job analysis efforts and subsequently creating personality-based items that explicitly reflect the identified performance elements. If this is done, the personality construct of interest will ultimately be operationalized in terms of relevant behaviors and experiences that collectively comprise important aspects of performance on the job. Consequently, the more the personality items directly implicate or reference job behaviors (e.g., “I show up to work on time”), in addition to cognitive and affective performance-related experiences (e.g., “I get nervous when I talk to clients”), the smaller the inferential leap necessary for linking the content of the test directly to the job performance domain.
Substantiation of Appropriate Inferences via Expert Judgment
The considerations discussed in the preceding section are only applicable if a new test is being created or if the opportunity to modify an existing general-focused personality test is available. The current section addresses how to generate content-based evidence for existing or unmodifiable personality measures to be used for employee selection. For this, the primary mechanism of generating such evidence involves eliciting expert judgment regarding content-relevance. More specifically, when the personality measure in question exhibits a high degree of isomorphism with the performance domain of interest (e.g., contextualized personality scales), content validity evidence can be generated by eliciting expert judgment regarding the direct overlap (or content match) between the personality measure content and the job performance domain (i.e., Inference 1 from Figure 1). However, to the extent that this is not the case, as with noncontextualized or general-focused personality measures, content-based validation would proceed via expert judgment regarding Inference 2 and Inference 3. Example scales that might be used to substantiate these various inferences via expert judgment are shown in Figure 2. These scales can be used to assist in the collection of content-based validity evidence.
The first approach involves substantiating the direct link between the predictor measure and the job performance construct domain (i.e., Inference 1). In other words, expert judges are asked to indicate the degree to which the specific items that comprise the test are directly relevant to performance on the job. A common method for quantifying such ratings is the content validity ratio (Lawshe, 1975), which, as originally conceptualized, asks subject-matter experts (SMEs) whether a particular skill or knowledge area measured by test items is “Essential,” “Useful but not essential,” or “Not necessary” for performance of the job in question. Although the content validity ratio was originally intended for tests that focus on knowledge or skills, with slight modifications, it can be applied to measures of personality as well. Examples of this can be seen in Figure 2. Other scales may also be created to serve the same purpose as long as they adequately capture the degree to which the test content is relevant to the job performance domain of interest.
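As a rough illustration of how such ratings might be quantified, the sketch below computes Lawshe's (1975) content validity ratio for a single item, CVR = (n_e − N/2) / (N/2), where n_e is the number of SMEs rating the item “Essential” and N is the total number of SMEs. The function name, the five-person panel, and the ratings themselves are hypothetical and serve only as an example; the two item statements are taken from the text above.

```python
# Minimal sketch of Lawshe's (1975) content validity ratio (CVR).
# The panel data below are hypothetical and for illustration only.

ESSENTIAL = "Essential"


def content_validity_ratio(ratings):
    """Return the CVR for one item: (n_e - N/2) / (N/2).

    ratings: one label per SME, e.g. "Essential",
    "Useful but not essential", or "Not necessary".
    """
    n = len(ratings)
    if n == 0:
        raise ValueError("at least one rating is required")
    n_essential = sum(1 for r in ratings if r == ESSENTIAL)
    return (n_essential - n / 2) / (n / 2)


# Hypothetical ratings from a five-person SME panel for two items.
ratings_by_item = {
    "I show up to work on time": ["Essential"] * 5,
    "I am the life of the party": [
        "Essential",
        "Useful but not essential",
        "Useful but not essential",
        "Not necessary",
        "Not necessary",
    ],
}

cvr_by_item = {
    item: content_validity_ratio(ratings)
    for item, ratings in ratings_by_item.items()
}

for item, cvr in cvr_by_item.items():
    print(f"{item!r}: CVR = {cvr:+.2f}")
```

The ratio ranges from −1 (no SME rates the item as essential) to +1 (every SME does), with 0 indicating an even split. Any cutoff for retaining items depends on panel size and is deliberately left out of this sketch.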
Figure 2. Example scales for substantiating relevant inferences related to content validation.
The second approach involves substantiating the link between the test in question and the underlying construct that it is purported to measure (i.e., Inference 2), in addition to the link between the predictor construct and the underlying job performance domain (i.e., Inference 3). Aguinis et al.’s (2001) discussion of the content validity ratio suggests that it can also be modified to serve the purpose of substantiating Inference 2, as was the case with Inference 1 above. As noted by Aguinis et al., expert judges can be asked to “rate whether each item is essential, useful but not essential, or not necessary for measuring the attribute [emphasis added]” (p. 38). Thus, in accordance with the above discussion, the primary distinction between the methods for substantiating Inference 1 and Inference 2 is that the former focuses on the job performance domain whereas the latter focuses on the predictor construct domain. Again, other scales may be created to serve the same purpose as long as they adequately capture the degree to which the test content is reflective of the underlying predictor construct (see Figure 2).
In terms of Inference 3, Arthur and Villado (2008) indicate that this inference “is established via job analysis processes that are intended to identify the predictor constructs deemed requisite for the successful performance of the specified job (or performance) behaviors in question” (p. 436). In this regard, personality-oriented job analysis efforts can be used to identify and substantiate the relevance of particular traits for the job under investigation (O’Neill et al., 2013; Raymark et al., 1997; Tett & Burnett, 2003; Tett & Christiansen, 2007). For example, as shown in Figure 2, Raymark et al. (1997) adopted the following stem for their Personality-Related Position Requirements Form: “Effective performance in this position requires the person to . . .” (p. 724). Taken together, expert judgment regarding the above inferences constitutes evidence for validity based on test content (Binning & LeBreton, 2009).
Discussion
There are conflicting recommendations regarding the use of content validity evidence to support personality test inferences for employee selection. Unfortunately, inconsistencies of this nature may be the inevitable result of different constituencies (e.g., legal, professional, scientific) jointly vying to determine standards and guidelines within the collective enterprise of test use and validation (see Binning & Barrett, 1989; Landy, 1986; McDaniel et al., 2011). As such, it may be an unrealistic ideal to achieve a perfect resolution that fully addresses any and all potential inconsistencies. Nonetheless, it is our hope that the discussion presented above has ameliorated the ambiguity to some meaningful degree. In particular, we believe that the recommendations above concerning content validity evidence and personality testing are, to the greatest extent possible, consistent with the spirit and intention of prevailing standards and guidelines, even if not with certain technical proscriptions (Uniform Guidelines, 1978). That being said, there remain a few caveats to be discussed below.
First, aside from the explicit prohibition against content validity for traits and constructs, another requirement in the Uniform Guidelines (1978) is that tests validated via a content-based strategy should focus on attributes that are operationally defined in terms of observable job behaviors (Section 14.C.4). This is potentially problematic for measures of personality, as personality constructs implicate not only observable behavioral manifestations (e.g., “I show up to work on time”), but also various cognitive and affective experiences (e.g., “I get nervous when I talk to clients”). As a potential solution, when considering the cognitive and affective experiences implicated by a particular construct, it is helpful to think about the work-relevant behavioral consequences of such experiences (as informed by job analysis efforts), and correspondingly develop items to reflect such behaviors. Despite this potential solution, we view the strict requirement to operationalize all measures that are subject to content validation in terms of observable behaviors as problematic. Specifically, in an information-based economy, many critical features of job performance may not be directly or obviously observable. As a result, measures that are limited to observable behaviors may suffer from construct deficiency to the extent that non-observable experiences represent important aspects of the construct domain. Furthermore, the recommendations above provide viable paths for content validation regardless of whether the test in question exhibits complete isomorphism with performance in terms of observable job behaviors. To the extent that it does, the substantiation of Inference 1 represents an appropriate focus. Conversely, to the extent that it does not, the collective substantiation of Inference 2 and Inference 3 becomes an appropriate alternative (Binning & LeBreton, 2009). Regardless, those concerned about strict adherence to the Uniform Guidelines (1978) should consider limiting the personality items used to those that directly reference observable behavior.
Second, considering the critical attention given to the Uniform Guidelines (1978), it might come across as though our goal is to somehow vilify these guidelines. This is certainly not the case. The guidelines were created with admirable intentions regarding standardization of validation efforts and the prevention of discrimination in employment practices. These are extremely important concerns, and we support efforts that strive to accomplish these noble ideals. At the same time, it is important that efforts of this nature are concordant with contemporary scientific understanding pertaining to the subject matter. Unfortunately, as described by McDaniel et al. (2011), the Uniform Guidelines have “not been revised in over 3 decades [and as a result are] substantially inconsistent with scientific knowledge and professional guidelines and practice” (p. 494). As such, our intention is not to disparage these guidelines, but rather to ensure that validation practices are consistent with contemporary conceptualizations of validity. We also believe that the value of the above discussion extends far beyond the Uniform Guidelines and employee selection in the United States. As noted in the introduction, it is important to carefully consider appropriate validation strategies so that accurate inferences are made about all test-takers, regardless of when, where, or why testing and validation efforts occur. Accordingly, the discussion presented herein is likely to be of relevance for content-based validation efforts in all areas that utilize psychological testing, including, for example, educational testing, clinical practice, and international employee selection efforts.
Third, not everyone may agree with our choice of terminology in relation to the two paths for content-based validation described above. Specifically, we opted to label the substantiation of Inference 1 as validity evidence based on test content for work-sample-based measures and the collective substantiation of Inferences 2 and 3 as validity evidence based on test content for sign-based measures (see Figure 1). In considering our choice for terminology, we explored three possible avenues. First, one possibility was to adopt terminology that is consistent with the historical trinitarian view of validity, which places primary emphasis on three “types” of validity: content, criterion, and construct. From this perspective, Inference 1 might be labeled as content validity and Inferences 2 and 3 as construct validity. However, as is clearly outlined above, this trinitarian view is considered problematic and inconsistent with contemporary thinking regarding validation. Second, another possibility is to create a new label to replace construct validity for Inferences 2 and 3, given that a primary criticism of the historical trinitarian view is the notion of a separate category of construct validity. While we understand the appeal of this second approach, we also believe that creating a new label might have the unintended consequence of adding further confusion to the literature, especially considering that there is already much potential for confusion in the arena of validation terminology. This brings us to our third and favored approach, which is to adopt terminology that is consistent with a contemporary perspective on validity, wherein validity is a unitary concept with various sources of supporting evidence. From this perspective, the labeling of different inferential pathways should be based on a thoughtful consideration of which form of validity evidence best aligns with the validation activities implicated by the paths of interest. Regarding validity evidence based on test content, this broadly refers to an analysis of whether the content of a measure adequately reflects (or samples from) a relevant underlying construct domain, which is precisely what is involved in both Inference 1 and the combination of Inferences 2/3.
Finally, and perhaps most critically, we are by no means arguing that a content-based strategy should necessarily be the preferred or only method of validation. The important point is that content-based evidence is not inherently more or less appropriate than other sources of validity evidence. Rather, evidence based on test content represents one of several potential sources (Joint Standards, 1999/2014), and the particular circumstances of the validation effort should be carefully considered before determining which source(s) of evidence are most appropriate. Validity generalization, validity transport, and synthetic validity also represent viable options. Furthermore, validation efforts need not (and should not) be limited to just one form of evidence. For example, content-based evidence can be combined with evidence pertaining to an empirical relationship with a criterion measure, which, in turn, would result in a stronger validity argument. Or, to the extent that faking or response distortion is a concern (Morgeson et al., 2007b; Ones et al., 2007), evidence pertaining to response processes might be gathered as well. In a similar vein, although we discuss two potential paths for content validation (i.e., Inference 1 vs. Inferences 2 and 3; see Figure 1), this does not preclude efforts at validating all three inferences, which again would result in a stronger validity argument. Ultimately, “the process of validation involves accumulating relevant evidence to provide a sound scientific basis for the proposed score interpretations” (Joint Standards, 1999/2014, p. 11). To the extent feasible, the more evidence the better.
Authors’ Note
An earlier version of this manuscript was presented as a poster session at the 31st Annual Conference of the Society for Industrial and Organizational Psychology in Anaheim, California.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was, in part, supported by a Faculty Development Summer Fellowship Grant awarded by The University of Tulsa to the first author.
ORCID iD
David M. Fisher https://orcid.org/0000-0002-7810-3494
References
Aguinis, H., Henle, C. A., & Ostroff, C. (2001). Measurement in work and organizational psychology. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), Handbook of industrial, work and organizational psychology (Vol. 1, pp. 27–50). SAGE.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for Educational and Psychological Testing. (Original work published 1999)
Arthur, W., Jr., & Villado, A. J. (2008). The importance of distinguishing between constructs and methods when comparing predictors in personnel selection research and practice. Journal of Applied Psychology, 93, 435–442. https://doi.org/10.1037/0021-9010.93.2.435
Biddle, D. A. (2011). Adverse impact and test validation: A practitioner’s handbook (3rd ed.). Infinity Publishing.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential and evidential bases. Journal of Applied Psychology, 74, 478–494. https://doi.org/10.1037/0021-9010.74.3.478
Binning, J. F., & LeBreton, J. M. (2009). Coherent conceptualization is useful for many things, and understanding validity is one of them. Industrial and Organizational Psychology, 2, 486–492. https://doi.org/10.1111/j.1754-9434.2009.01178.x
Brannick, M. T., Levine, E. L., & Morgeson, F. P. (2007). Job and work analysis: Methods, research, and applications for human resource management (2nd ed.). SAGE.
Ellingson, J. E., Heggestad, E. D., & Myers, H. (2013). The workplace IPIP: A contextualized measure of personality [Unpublished manuscript].
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43, 38290–38315.
Furr, R. M., & Bacharach, V. R. (2014). Psychometrics: An introduction (2nd ed.). SAGE.
Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, R., Ashton, M. C., Cloninger, C. R., & Gough, H. G. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40, 84–96. https://doi.org/10.1016/j.jrp.2005.08.007
Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An exploration of the job analysis–content validity process. In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 3–34). Jossey-Bass.
Guion, R. M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11, 385–398. https://doi.org/10.1037/0735-7028.11.3.385
Guion, R. M. (2004). Validity and reliability. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 57–76). Blackwell Publishing.
Haynes, S. N., Richard, D., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 238–247. https://doi.org/10.1037/1040-3590.7.3.238
Hsu, C. (2004). The testing of America. U.S. News and World Report, 137, 68–69.
Hunthausen, J. M., Truxillo, D. M., Bauer, T. N., & Hammer, L. B. (2003). A field study of frame-of-reference effects on personality test validity. Journal of Applied Psychology, 88, 545–551. https://doi.org/10.1037/0021-9010.88.3.545
Jeanneret, P. R., & Zedeck, S. (2010). Professional guidelines/standards. In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection (pp. 593–625). Routledge.
Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist, 41, 1183–1192. https://doi.org/10.1037/0003-066X.41.11.1183
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563–575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
Lawshe, C. H. (1985). Inferences from personnel tests and their validity. Journal of Applied Psychology, 70, 237–238. https://doi.org/10.1037/0021-9010.70.1.237
Lievens, F., De Corte, W., & Schollaert, E. (2008). A closer look at the frame-of-reference effect in personality scale scores and validity. Journal of Applied Psychology, 93, 268–279. https://doi.org/10.1037/0021-9010.93.2.268
McDaniel, M. A., Kepes, S., & Banks, G. C. (2011). The uniform guidelines are a detriment to the field of personnel selection. Industrial and Organizational Psychology, 4, 494–514. https://doi.org/10.1111/j.1754-9434.2011.01382.x
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007a). Are we getting fooled again? Coming to terms with limitations in the use of personality tests for personnel selection. Personnel Psychology, 60, 1029–1047. https://doi.org/10.1111/j.1744-6570.2007.00100.x
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007b). Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60, 683–729. https://doi.org/10.1111/j.1744-6570.2007.00089.x
Murphy, K. R., Dzieweczynski, J. L., & Zhang, Y. (2009). Positive manifold limits the relevance of content-matching strategies for validating selection test batteries. Journal of Applied Psychology, 94, 1018–1031. https://doi.org/10.1037/a0014075
O’Neill, T. A., Goffin, R. D., & Rothstein, M. (2013). Personality and the need for personality-oriented work analysis. In N. D. Christiansen & R. P. Tett (Eds.), Handbook of personality at work (pp. 226–252). Routledge.
O’Neill, T. A., Goffin, R. D., & Tett, R. P. (2009). Content validation is fundamental for optimizing the criterion validity of personality tests. Industrial and Organizational Psychology, 2, 509–513. https://doi.org/10.1111/j.1754-9434.2009.01184.x
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in organizational settings. Personnel Psychology, 60, 995–1027. https://doi.org/10.1111/j.1744-6570.2007.00099.x
Ones, D. S., & Viswesvaran, C. (2001). Integrity tests and other criterion-focused occupational personality scales (COPS) used in personnel selection. International Journal of Selection and Assessment, 9, 31–39. https://doi.org/10.1111/1468-2389.00161
Raymark, P. H., Schmit, M. J., & Guion, R. M. (1997). Identifying potentially useful personality constructs for employee selection. Personnel Psychology, 50, 723–736. https://doi.org/10.1111/j.1744-6570.1997.tb00712.x
Rothstein, M. G., & Goffin, R. D. (2006). The use of personality measures in personnel selection: What does current research support? Human Resource Management Review, 16, 155–180. https://doi.org/10.1016/j.hrmr.2006.03.004
Schmit, M. J., & Ryan, A. M. (2013). Legal issues in personality testing. In N. D. Christiansen & R. P. Tett (Eds.), Handbook of personality at work (pp. 525–542). Routledge.
Schmit, M. J., Ryan, A. M., Stierwalt, S. L., & Powell, A. B. (1995). Frame-of-reference effects on personality scale scores and criterion-related validity. Journal of Applied Psychology, 80, 607–620. https://doi.org/10.1037/0021-9010.80.5.607
Schmitt, M. (2006). Conceptual, theoretical, and historical foundations of multimethod assessment. In M. Eid & E. Diener (Eds.), Handbook of multimethod measurement in psychology (pp. 9–25). American Psychological Association.
Shaffer, J. A., & Postlethwaite, B. W. (2012). A matter of context: A meta-analytic investigation of the relative validity of contextualized and noncontextualized personality measures. Personnel Psychology, 65, 445–494. https://doi.org/10.1111/j.1744-6570.2012.01250.x
Sireci, S. G. (1998). The construct of content validity. Social Indicators Research, 45, 83–117. https://doi.org/10.1023/A:1006985528729
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th ed.).
Spengler, M., Gelléri, P., & Schuler, H. (2009). The construct behind content validity: New approaches to a better understanding. Industrial and Organizational Psychology, 2, 504–508. https://doi.org/10.1111/j.1754-9434.2009.01183.x
Tan, J. A. (2009). Babies, bathwater, and validity: Content validity is useful in the validation process. Industrial and Organizational Psychology, 2, 514–516. https://doi.org/10.1111/j.1754-9434.2009.01185.x
Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job performance. Journal of Applied Psychology, 88, 500–517. https://doi.org/10.1037/0021-9010.88.3.500
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion, Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60, 967–993. https://doi.org/10.1111/j.1744-6570.2007.00098.x
Tett, R. P., Simonet, D. V., Walser, B., & Brown, C. (2013). Trait activation theory: Applications, developments, and implications for person–workplace fit. In N. D. Christiansen & R. P. Tett (Eds.), Handbook of personality at work (pp. 71–100). Routledge.
Thornton, G. C., III. (2009). Evidence of content matching is evidence of validity. Industrial and Organizational Psychology, 2, 469–474. https://doi.org/10.1111/j.1754-9434.2009.01175.x
Wernimont, P. F., & Campbell, J. P. (1968). Signs, samples and criteria. Journal of Applied Psychology, 52, 372–376. https://doi.org/10.1037/h0026244
Wollack, S. (1976). Content validity: Its legal and psychometric basis. Public Personnel Management, 5, 397–408.
Author Biographies
David M. Fisher is an assistant professor of psychology at The University of Tulsa. Prior to his academic position, he did consulting work that focused on selection and testing for public safety agencies. His research interests include employee selection, organizational work teams, and occupational health/resilience.
Christopher R. Milane is a senior project manager of research services at Qualtrics. The majority of this manuscript was written while he was a graduate student at The University of Tulsa. His research interests include employee selection, organizational work teams, and leadership development.
Sarah Sullivan is the department coordinator at the Doerr Institute for New Leaders at Rice University. The majority of this manuscript was written while she was a graduate student at The University of Tulsa. Her research interests include leadership development, employee selection, and organizational work teams.
Robert P. Tett is professor of Industrial-Organizational (I-O) Psychology and director of the I-O Graduate Program at The University of Tulsa, where he teaches courses in personnel selection, psychometrics, statistics, personality at work, and evolutionary psychology. His research targets personality trait-situation interactions, meta-analysis, leadership competencies, and trait-emotional intelligence.