Journal of Occupational and Organizational Psychology (2015), 88, 387–414
© 2014 The British Psychological Society
www.wileyonlinelibrary.com
Does rater personality matter? A meta-analysis of rater Big Five–performance rating relationships
Michael B. Harari1*, Cort W. Rudolph2 and Andrew J. Laginess3
1Florida Atlantic University, Boca Raton, Florida, USA
2Saint Louis University, Saint Louis, Missouri, USA
3Florida International University, Miami, Florida, USA
We examined rater personality traits consistent with the Five-Factor Model as sources of
systematic non-performance variance in job performance ratings using meta-analysis
(k = 28). Several personality factors, including agreeableness, extraversion, and emotional stability, were related to performance ratings (ρ = .25, .12, and .12, respectively), and features of the rating context (e.g., study setting, appraisal purpose, accountability)
moderated these relationships. Cumulatively, the Big Five accounted for between 6% and
22% of the variance in performance ratings. Implications for performance appraisal
research and practice are discussed.
Practitioner points
• Performance ratings serve a number of important functions in organizations, and their construct validity is a central issue.
• We identified rater personality traits, consistent with the Five-Factor Model, as sources of non-performance variance in performance ratings.
• To the extent that job performance ratings are contaminated by rater personality traits, requiring raters to justify their ratings may result in criterion scores that reflect greater levels of criterion relevance.
Performance ratings serve central functions in organizations, with implications for
performance management and employee development (den Hartog, Boselie, & Paauwe, 2004; Smither, London, & Reilly, 2005), administrative decision-making (Cleveland,
Murphy, & Williams, 1989), and other human resource functions (e.g., strategic,
informational, maintenance, and documentation; see Aguinis, 2013, p. 18). Considering
the importance of performance ratings, their construct validity has been a crucial issue
and a large body of research on the subject exists (Austin & Villanova, 1992; Levy &
Williams, 2004; Murphy & Cleveland, 1995). While it is desirable for ratings of job
performance to be saturated by the ratee’s job-relevant behaviours, research suggests that
other sources contribute variance to performance ratings (Murphy, 2008). To identify and account for performance-irrelevant sources of variance, research has focused on the role
of the organization’s social context (Levy & Williams, 2004). At present, this research
*Correspondence should be addressed to Michael B. Harari, Department of Management, Florida Atlantic University, 777 Glades Rd., Boca Raton, FL 33431, USA (email: [email protected]). An earlier version of this paper was presented at the 2014 Society for Industrial and Organizational Psychology Conference, Honolulu, HI.
DOI:10.1111/joop.12086
indicates that features of the evaluation context can influence performance rating
processes and outcomes (Ferris, Munyon, Basik, & Buckley, 2008). By examining
performance evaluation in situ, this research has provided useful avenues for
understanding why performance ratings may be inaccurate and means by which the quality of ratings can be improved.
While features of the rating context have been highlighted as determinants of
performance ratings in several systematic reviews (e.g., Jawahar & Williams, 1997; Levy &
Williams, 2004), much less research has emphasized how rater personality traits impact
performance assessments. Personality traits reflect one’s propensities to behave in
characteristic ways in response to situational demands (e.g., Mischel & Shoda, 1995;
Roberts, 2009), and research suggests that personality traits guide behaviours in contexts
that are relevant for trait expression (Tett & Guterman, 2000). We argue that the situational demands posed by the performance rating context provide ample opportunity
for raters to behave in trait-relevant ways (Ferris et al., 2008; Tett & Burnett, 2003; Tziner,
Murphy, & Cleveland, 2005), and as a result, rater traits will influence performance rating
processes and outcomes. This issue bears on the validity of performance ratings, as rater
personality traits may act as a source of performance-irrelevant variance. Thus, research
into rater personality–performance rating relationships can have important implications for performance evaluation in organizations.
A number of researchers and theorists have speculated that rater traits play a role in the performance appraisal process. For instance, Landy and Farr (1980) proposed a model of
job performance ratings where rater individual differences influenced rater performance
appraisal strategies, which influenced performance ratings. Kane, Bernardin, Villanova,
and Peyrefitte (1995) suggested that performance ratings could ‘reflect personality or
information-processing differences among raters’ (p. 1047). In a review article, Tziner
et al. (2005) noted that personality traits ‘likely play a part in shaping rating behavior’ (p.
94). Furthermore, a number of empirical investigations along these lines have taken place
(e.g., Bernardin & Orban, 1990; Fried, Levi, Ben-David, Tiegs, & Avital, 2000; Randall & Sharples, 2012; Yun, Donahue, Dudley, & McFarland, 2005). However, despite the
research conducted thus far, there are several issues in this literature that preclude the
ability to draw meaningful conclusions.
One issue is that primary studies have examined the effects of a wide variety of rater
personality traits on performance ratings with little coherency across studies. For
instance, research has examined the effects of rater hostility (Phillips, 1960), need for
achievement (Kovacs & Kapel, 1976), cognitive complexity (Bernardin & Orban, 1990;
Schneider, 1977), self-esteem (Guven, 2007; Wexley & Youtz, 1985), and ego concern (Chambers, 2003), among others. Thus, this literature has proceeded in a fragmented
fashion and has lacked a systematic framework for organizing personality traits.
Another issue is that there are many inconsistent findings in this literature. For
instance, some research indicates a positive relationship between conscientiousness and
performance ratings (Roch, Ayman, Newhouse, & Harris, 2005), while other research
indicates a negative relationship (Bernardin, Cooke, & Villanova, 2000; Bernardin, Tyler,
& Villanova, 2009), and still other research indicates virtually no relationship (Tziner,
Murphy, & Cleveland, 2002). The same state of affairs is evident for other personality traits as well (e.g., extraversion; Bernardin et al., 2000, 2009; Strauss, Barrick, & Connerley,
2001).
Compounding this issue is that features of the rating context vary across studies. It is
well recognized that personality traits influence one’s characteristic manner of respond-
ing to certain situations, rather than a tendency to behave similarly across situations
(Mischel & Shoda, 1995; Roberts, 2009; Stewart & Barrick, 2004; Tett & Guterman, 2000).
Thus, features of the rating context should influence the effect of rater personality on
performance ratings and variation in contextual features likely contributes to the
inconsistent findings that are evident in this literature. As such, contextual features should be examined as moderators of the rater personality–performance rating relationships.
The purpose of this study is to address these issues by conducting a quantitative
literature review. Meta-analytic methods allow us to assess the extent to which mixed
findings are due to sampling error as opposed to the presence of moderators (Hunter &
Schmidt, 2004). Furthermore, our analytic approach allows us to test the effects of
theoretically relevant features of the rating context as moderators of the rater personality–performance rating relationships. Therefore, our approach will not only clarify the
existing literature, but may also result in a number of novel conclusions concerning the conditional influences of rater personality traits on performance ratings.
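To make the logic of separating sampling error from moderator variance concrete, the following is a minimal Python sketch of a 'bare-bones' meta-analysis in the spirit of Hunter and Schmidt (2004). It is illustrative only: it omits corrections for measurement error and range restriction, the function name and input values are hypothetical, and it is not presented as the exact procedure used in this study.

```python
import math


def bare_bones_meta(rs, ns):
    """Bare-bones (Hunter-Schmidt style) meta-analysis of correlations.

    rs: observed correlations, one per independent sample
    ns: sample sizes of the same samples
    No corrections for unreliability or range restriction are applied.
    """
    if not rs or len(rs) != len(ns):
        raise ValueError("rs and ns must be non-empty and of equal length")
    k = len(rs)
    total_n = sum(ns)
    # Sample-size-weighted mean observed correlation
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Sample-size-weighted variance of the observed correlations
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the mean sample size
    n_bar = total_n / k
    var_e = (1 - r_bar ** 2) ** 2 / (n_bar - 1)
    # Residual variance left over for moderators (floored at zero)
    var_res = max(var_obs - var_e, 0.0)
    # Share of observed variance attributable to sampling error
    pct_se = 100.0 * var_e / var_obs if var_obs > 0 else float("inf")
    return {
        "k": k,
        "N": total_n,
        "r_bar": r_bar,
        "var_obs": var_obs,
        "var_e": var_e,
        "sd_res": math.sqrt(var_res),
        "pct_sampling_error": pct_se,
    }


# Hypothetical correlations and sample sizes, for illustration only
result = bare_bones_meta([0.30, 0.10, 0.25, -0.05], [100, 150, 80, 120])
print(round(result["r_bar"], 3), round(result["pct_sampling_error"], 1))
```

If sampling error accounts for most of the observed variance (a commonly cited Hunter and Schmidt heuristic is 75% or more), inconsistent findings across studies are plausibly attributable to statistical artefacts; a smaller share, as with the hypothetical inputs here (roughly 48%), leaves room for moderators.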
To classify the personality traits examined in prior studies consistent with a well-
accepted taxonomy, we draw upon the Five-Factor Model of personality (cf. Costa &
McCrae, 1992; Digman, 1990). This approach has been useful for quantifying general-
izable relationships between personality traits and workplace outcomes (e.g., Barrick &
Mount, 1991; Judge & Ilies, 2002). Note that we are not suggesting that traits that fall
outside of the Five-Factor Model are not useful for understanding rating behaviour.
Indeed, individual differences such as political skill and emotional intelligence may very well play a role as determinants of rating behaviour and outcomes (cf. Ferris et al., 2008).
However, the Five-Factor Model is a useful typology for organizing the existing
fragmented literature.
In the following paragraphs, we discuss the social context features of organizations
that are relevant to performance evaluation. Following this, we review the Five-Factor
Model of personality, discuss the relevance of these traits as determinants of performance
ratings in the light of the social context of performance appraisal, and form hypotheses
concerning rater personality–performance rating relationships. Finally, we discuss potential moderators of these relationships.
The social context of performance ratings
Research has recognized that the construct validity of performance ratings might be
problematic, and several approaches to dealing with this issue have subsequently
emerged (Murphy, 2008). Research first focused on measurement issues, suggesting that
the construct validity of performance ratings could be improved by, for example, developing better rating scales (cf. Austin & Villanova, 1992). However, this line of
research had limited success in improving performance ratings (Landy & Farr, 1980).
Following this, research began to focus on the cognitive processes involved in
performance appraisal, noting that raters were imperfect information processors and
that this could lead to errors and biases in the rating process and thus poor construct
validity of performance ratings (Feldman, 1981). However, this approach also had only
limited success in improving performance ratings in organizations (Ilgen, Barnes-Farrell,
& McKellin, 1993). In response to the limited utility of these approaches, research began to shift focus
towards the performance evaluation context. That is, raters were perceived as actors in a
rich and complex organizational context that could influence their motivations such that
accuracy may be only a subsidiary goal (Murphy, 2008; Murphy & Cleveland, 1995).
By recognizing the rich context in which performance ratings occur, research has
developed a clearer understanding of the non-performance factors that influence
performance ratings in organizations (Ferris et al., 2008; Levy & Williams, 2004).
Murphy and Cleveland (1995) shed light on the role of the organizational
context in performance evaluation by noting that the rating context influences performance judgments, performance ratings, and the evaluation process. Murphy and
Cleveland’s model distinguished between distal variables (i.e., those that influence
performance evaluation indirectly) and proximal variables (i.e., those that influence
performance evaluation directly). Distal variables include, for example, the organiza-
tional structure, the legal climate, and workforce composition. Proximal variables
include, for example, the rater–ratee relationship, consequences of performance ratings, and rater accountability.
Levy and Williams (2004) conducted a cumulative review of the literature on the role of the organizational social context in performance ratings. In doing so, they expanded
upon Murphy and Cleveland’s (1995) typology to build a framework for summarizing the
existing literature. Similar to Murphy and Cleveland’s framework, Levy and Williams
distinguished between distal and proximal contextual determinants of performance
rating behaviour. However, they also distinguished between two types of
proximal variables – process proximal variables (i.e., how the appraisal process is conducted) and structural proximal variables (i.e., formal design characteristics). Levy
and Williams’ review highlighted evidence suggesting that process and structural proximal variables influenced performance evaluation processes and rating behaviours.
More recently, Ferris et al. (2008) provided an integrative framework for understand-
ing the contextual backdrop in organizations as it pertains to performance evaluation.
Specifically, Ferris et al.’s review noted that the context in which performance evaluation
occurs encompasses ‘cognitive, social and relationship, affective and emotional, and
political and relationship context features’ (p. 150). Rather than acting in isolation,
these context features interact with one another continuously and dynamically in
shaping the evaluation context in an organization. Cognitive context refers to the rater’s cognitive processes involved in observing, encoding, storing, retrieving, and integrating
observations of a ratee’s performance. Social and relationship context concerns features
of the rater–ratee dyadic work relationship. Social influence and politics concerns the influence of deliberate manipulation by raters and ratees on performance ratings. Finally,
affect and emotion concerns the role of affective regard for a ratee (i.e., liking) on
performance ratings. Ferris et al. reviewed evidence suggesting that each of these
contextual features influenced performance ratings in organizations (see also Fletcher,
2001; Levy & Williams, 2004).
Personality and performance ratings
Consistent with Roberts (2009), we define personality as ‘relatively enduring patterns of
thoughts, feelings, and behaviors that reflect the tendency to respond in certain ways
under certain circumstances’ (p. 140). While personality traits themselves remain
relatively stable within person across appreciable amounts of time (Roberts &
DelVecchio, 2000), their influence on thoughts, feelings, and behaviours is shaped by environmental influences – features of the context in which the actor is embedded (Roberts & Jackson, 2008; Roberts, Lejuez, Krueger, Richards, & Hill, 2014). For example,
Tett and Guterman (2000) discussed the principle of trait activation. Trait activation
holds that ‘the behavioral expression of a trait requires arousal of that trait by trait-relevant
situational cues’ (Tett & Guterman, 2000, p. 398). Similarly, Mischel and Shoda (1995)
characterized personality as ‘a system of mediating processes, conscious and uncon-
scious, whose interactions are manifested in predictable patterns of situation-behavior
relations’ (p. 247).
We argue that the contextual features that serve as a backdrop to performance evaluation provide a landscape for rater personality traits to manifest themselves in the
rating process (Tett & Guterman, 2000; Tziner et al., 2005). That is, in response to certain
contextual features, rater personality traits would influence their thoughts, feelings, and
rating behaviours, and thus, rater personality traits should influence performance rating
scores (Tett & Burnett, 2003; Tett & Guterman, 2000). This in situ theoretical perspective
helps to explain the dynamic interplay between rating context and rater personality, and
the joint influence of context and personality in the performance rating process.
Considering this, rater personality traits have the potential to influence the validity of performance ratings, which bears on critical decisions that are guided by performance
rating scores. Examining and understanding these relationships can ultimately improve
the construct validity of performance ratings by spurring research into interventions that
can reduce their influence. Thus, a focus on rater personality will enrich our
understanding of the determinants of performance ratings in organizations beyond the
influence of the rating context alone (Ferris et al., 2008).
As already noted, the Five-Factor Model of personality (also referred to as the Big
Five) has been proposed as an integrative framework for studying individual differences in personality and is among the most well-accepted taxonomies of personality in the
literature (Costa & McCrae, 1992; Digman, 1990). According to this perspective, the
latent structure of personality is hierarchical with five factors at the highest level
(Goldberg, 1992; Hough & Ones, 2001). The five factors are as follows: Agreeableness,
extraversion, emotional stability, conscientiousness, and openness. We review each of
these factors in the following sections and discuss features of the rating context that are
relevant for the expression of each of the five factors in a performance evaluation setting.
Note that positive personality–performance rating relationships are interpreted as reflecting rating elevation (i.e., higher scores on the personality trait are associated with
higher, or more elevated, performance ratings), while negative personality–performance rating relationships are interpreted as reflecting rating stringency (i.e., higher scores on
the personality trait are associated with lower, or more stringent, performance ratings;
Bernardin et al., 2000).
Agreeableness
Agreeableness reflects traits such as trust, altruism, and cooperation (Costa & McCrae,
1992). Agreeableness is related to a tendency to favour positive social relationships and to
avoid conflict (Jensen-Campbell & Graziano, 2001). Considering this, we suggest that the
social and relationship context features as reviewed by Ferris et al. (2008) provide
opportunities for rater agreeableness to influence performance ratings. For example,
delivering negative feedback to an employee regarding his or her performance can have a
negative influence on the rater–ratee work relationship (Murphy & Cleveland, 1995) and raters who are high in agreeableness may be motivated to avoid disturbing this social relationship. Along these lines, research suggests that raters may respond to a motivation
to avoid negative exchanges with ratees by inflating performance rating scores (Shore &
Tashchian, 2002).
Agreeableness is also associated with a tendency to be sympathetic and concerned
for the welfare of others (Costa & McCrae, 1992). Such tendencies are relevant for
performance evaluation behaviours. Performance appraisal has both short- and long-
term consequences for employees. In terms of the former, Ferris et al. (2008) argued
that poor performance evaluations could negatively influence ratee affect, which in
turn could result in withdrawal and decreased relationship satisfaction. In terms of the latter, poor performance ratings could influence an employee’s career (e.g., opportunities for promotions or salary increases; Aguinis, 2013; Cleveland et al., 1989).
Raters who are high in agreeableness may inflate performance ratings to protect ratees
from the consequences associated with poor performance evaluations. Considering
this, we propose that agreeableness is positively related to performance ratings.
Hypothesis 1: Rater agreeableness is positively related to performance ratings.
Extraversion
Extraversion reflects traits such as sociability, assertiveness, and gregariousness and is
associated with a tendency to be friendly and to prefer the company of others (Costa &
McCrae, 1992). Research suggests that extraversion is positively related to networking
behaviours in organizations as well as social network size (Forret & Dougherty, 2001).
Such tendencies are relevant when considering the social and relationship context
features of performance evaluation (Ferris et al., 2008), as raters who are high in
extraversion would be likely to form favourable dyadic work relationships with their ratees (John, Naumann, & Soto, 2008). Along these lines, research suggests that rater–ratee relationship quality is positively related to performance ratings, beyond the
influence of objective performance levels (Duarte, Goodson, & Klich, 1994). Thus, as
extraverted raters are likely to form favourable relationships with their ratees, and
favourable rater–ratee relationships are associated with elevated performance ratings, raters who are high in extraversion may be likely to provide elevated performance ratings.
This line of reasoning suggests a positive rater extraversion–performance rating relationship.
Hypothesis 2: Rater extraversion is positively related to performance ratings.
Emotional stability
Emotional stability reflects traits such as calmness and even-temperedness (Costa &
McCrae, 1992). In discussing the proposed relationship between rater emotional
stability and performance ratings, it is useful to consider low emotional stability – neuroticism. Neuroticism encompasses traits such as anxiety and depression (John
et al., 2008) and is associated with a tendency to become easily angered and frustrated
by others (Costa & McCrae, 1992). Along these lines, research suggests that neuroticism
inhibits cooperation with others and may result in poor working relationships (George,
1990). This is relevant for performance rating behaviours, as neurotic raters may be
likely to have poor relationships with their ratees, which could result in lower
performance ratings (Duarte et al., 1994). As this perspective suggests that raters who
are low in emotional stability (i.e., high in neuroticism) should rate performance low, we propose that rater emotional stability (i.e., low neuroticism) is positively related to
performance ratings.
Hypothesis 3: Rater emotional stability is positively related to performance ratings.
Conscientiousness
Conscientiousness reflects traits such as dependability, thoroughness, and achievement
orientation (Costa & McCrae, 1992). Ferris et al.’s (2008) review noted that the
standards against which performance is evaluated could influence performance ratings
(see also Murphy & Cleveland, 1995). Employees who are high in conscientiousness
generally display superior job performance as compared to employees who are lower in this trait (Barrick & Mount, 1991; Hurtz & Donovan, 2000). As such, raters who are high
in conscientiousness may have relatively high standards for performance and may
therefore expect exceptional performance, which could result in lower performance
ratings. As a result, rater conscientiousness should be negatively related to performance
ratings.
Hypothesis 4: Rater conscientiousness is negatively related to performance ratings.
Openness
Openness reflects traits such as imagination, creativity, curiosity, and originality (Costa &
McCrae, 1992). Openness is relevant for performance appraisal in organizations in the light of the cognitive context features discussed in Ferris et al. (2008). Cognitive models
of performance appraisal indicate that rating errors result from the rater’s finite
information-processing capabilities. As openness is positively associated with cognitive
functioning (DeYoung, Peterson, & Higgins, 2005), raters who are high in openness may
be better able to integrate larger numbers of performance episodes into their
performance judgments (Murphy & Cleveland, 1995). That is, raters who are high in
openness may be less prone to cognitive biases that influence performance evaluation
(Feldman, 1981). Openness is also related to a tendency to form more complex attributions for the
behaviours of others (Brookings, Zembar, & Hochstetler, 2003). In their review of the
performance rating context literature, Levy and Williams (2004) noted that ‘attributional
processing is an important element of the rating process and these attributions, in part,
determine raters’ reactions and ratings’ (p. 887). Thus, raters who are high in openness
may react to the performance rating context by deeply considering the causes and
meaning of a wide range of observed performance episodes. Considering this, rater
openness may facilitate accurate performance ratings that are neither elevated nor suppressed. Thus, there is little reason to predict a rater openness–performance rating relationship, and we therefore hypothesize the following:
Hypothesis 5: Rater openness is not related to performance ratings.
Moderators
Study setting
We have noted that the contextual landscape that characterizes the social system of
organizations provides the opportunity for personality traits to influence performance
rating behaviours (Ferris et al., 2008; Levy & Williams, 2004; Tett & Guterman, 2000).
While laboratory studies may be able to simulate some naturalistic context features of
performance appraisal, it is likely that the full spectrum of contextual features and
their dynamic interactions (cf. Ferris et al., 2008) would not be well represented. To
the extent that rater personality–performance rating relationships depend on such features, we would expect the magnitude of these effects to be larger in field studies
as compared to laboratory studies. For example, if a positive rater agreeableness–performance rating relationship occurs in response to interpersonal consequences
associated with poor performance ratings (Jensen-Campbell & Graziano, 2001;
Murphy & Cleveland, 1995), this effect may be reduced in laboratory settings where
there is no continued interaction among participants and therefore scant possibility
for long-term interpersonal consequences (see Bell, 2007 for an application of this
logic to research involving personality and team performance). As a result, we propose that study setting will moderate the rater personality–performance rating relationships such that the effects are stronger in field studies as compared to
laboratory studies.
Hypothesis 6: The rater personality–performance rating relationships are moderated by study setting such that the relationships are stronger in field settings
and weaker in laboratory settings.
Situational strength
Beyond the relevance of social context features for personality trait expression in
performance appraisal, it is also important to note that features of the situation can act to inhibit the effect of traits, regardless of the relevance of other contextual features for trait
expression (Tett & Burnett, 2003). Situational strength refers to the extent to which
demands placed on individuals by a situation act to constrain the influence of one’s
personality on their behaviour (Cooper & Withey, 2009; Mischel, 1977). A strong situation
is one in which situational demand characteristics pose pressure to behave in a certain
way, inducing conformity and reducing the influence of personality traits on behaviours
(Mischel, 1973). On the other hand, a weak situation is one in which demand
characteristics posed by the environment are low, allowing the individual to determine their own course of action in a given situation. Thus, weak situations better allow for
personality traits to influence behaviour. A number of features of a performance appraisal
context may influence situational strength, and we focus on two: Appraisal purpose and
accountability.
Appraisal purpose. Broadly speaking, research has considered two classes of purposes
for which performance ratings are collected: Administrative purposes (e.g., promotion, raises) and research/developmental purposes (e.g., feedback, training; Cleveland et al.,
1989; Levy & Williams, 2004). When rating performance for administrative purposes,
raters are making decisions that will potentially have a vast impact on the ratee’s career.
The results of the performance appraisal may be permanently included in the ratee’s
records and could influence their chances at receiving a promotion or raise and may
contribute towards their termination. Thus, when providing administrative ratings,
raters face strong situational pressures to, for example, inflate or distort performance
ratings in some way (Jawahar & Williams, 1997). Considering this, a situation in which a
rater is evaluating performance for administrative purposes could be characterized as
strong. In such a situation, the rater personality–performance rating relationships may be reduced. On the other hand, when ratings are collected for research or develop-
mental purposes, many of the aforementioned situational pressures are alleviated. Thus, performance appraisals conducted under research or developmental conditions
represent a weak situation, allowing rater personality traits to influence performance
ratings.
Hypothesis 7: The rater personality–performance rating relationships are moderated by appraisal purpose such that the relationships are attenuated when
ratings are collected for administrative purposes and strengthened when
ratings are collected for research or developmental purposes.
Accountability. Accountability refers to the extent to which raters are held responsible
for their ratings of an employee (Levy & Williams, 2004). For instance, when raters must
justify their ratings to others, accountability would be high (Wherry, 1952). Research
suggests that holding raters accountable for their ratings represents a strong situation that
influences performance rating outcomes (Klimoski & Inks, 1990; Mero & Motowidlo, 1995). Indeed, as raters are pressed to explain their performance ratings, they face
stronger situational pressures to rate performance accurately. As a result, high
accountability could be characterized as representing a strong situation that reduces
the influence of rater traits on performance ratings. On the other hand, when
accountability is low, raters are free from situational pressures that necessitate accurate
ratings. Therefore, we consider low accountability as reflecting a weak situation that
would allow for rater personality traits to drive rating behaviour. Thus, we suggest that the
rater personality–performance rating relationships will be attenuated when accountability is high and strengthened when accountability is low.
Hypothesis 8: The rater personality–performance rating relationships are moderated by accountability such that the relationships are attenuated when
accountability is high and strengthened when accountability is low.
In summary, the purpose of this study is to use meta-analytic methods to assess the
relationships between rater personality traits and performance ratings. In doing so, we
invoke the Five-Factor Model of personality as an organizing framework for classifying personality traits examined in the existing literature. We also examine the influence of a
number of theoretically relevant moderators on these relationships, including study
setting, appraisal purpose, and accountability.
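One common way to examine categorical moderators such as study setting, appraisal purpose, and accountability is a subgroup analysis: samples are coded on the moderator, and the meta-analytic estimate is computed separately within each level. The Python sketch below shows only the grouping and weighted-mean step under that assumption; the function name, field names, and values are hypothetical, and a full analysis would also compare residual variance and confidence intervals across subgroups.

```python
from collections import defaultdict


def subgroup_summary(samples, moderator):
    """Group independent samples by a coded moderator level and report,
    for each level, k, total N, and the sample-size-weighted mean r.

    samples: list of dicts with keys "r", "n", and the moderator key
    moderator: name of the (hypothetical) coded moderator field
    """
    groups = defaultdict(list)
    for s in samples:
        groups[s[moderator]].append(s)
    summary = {}
    for level, members in groups.items():
        total_n = sum(m["n"] for m in members)
        r_bar = sum(m["r"] * m["n"] for m in members) / total_n
        summary[level] = {"k": len(members), "N": total_n, "r_bar": r_bar}
    return summary


# Hypothetical coded samples; "setting" mirrors the study-setting moderator
samples = [
    {"r": 0.30, "n": 100, "setting": "field"},
    {"r": 0.20, "n": 200, "setting": "field"},
    {"r": 0.05, "n": 60, "setting": "lab"},
    {"r": 0.15, "n": 40, "setting": "lab"},
]
for level, stats in subgroup_summary(samples, "setting").items():
    print(level, stats["k"], stats["N"], round(stats["r_bar"], 3))
```

Under Hypotheses 6 to 8, the pattern of interest would be larger subgroup means in field settings, for research or developmental ratings, and under low accountability; the numbers above are invented and say nothing about the actual results.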
Method
Literature search
To identify relevant studies, we first conducted a literature search using ProQuest,
PsycINFO, Google Scholar, and ABI-Inform. We searched for articles that included any
of the following: Rater individual differences or rater personality. We also searched
these databases for articles that contained various combinations of rater, rating,
individual differences, Big Five, five-factor model, personality, openness, conscien-
tiousness, extraversion, agreeableness, neuroticism, and emotional stability.
Furthermore, we identified additional studies by reviewing the reference sections of all
relevant articles returned by the literature search. This process yielded a total of 304
articles.
Criteria for inclusion
To determine which of these 304 articles would be included in our analyses, we employed
the following decision rules. First, the dependent variable in the study had to be a rating of
another person’s performance, including performance on the job, work samples or
simulations (either in a field or laboratory setting), and vignettes describing a hypothetical
employee’s performance. Studies that reported ratings of traits, clinical assessments,
problematic behaviour, leadership style, and other non-performance variables were excluded, as were any studies that examined self-ratings of performance. Second, the
study had to include as an independent variable a personality measure that could be
classified as openness, conscientiousness, extraversion, agreeableness, or emotional
stability. Third, the study had to report a zero-order correlation between the personality
variable and mean performance ratings across ratees or enough statistical information so
that it could be computed. A total of 21 studies (and 28 independent samples) met these
inclusion criteria and were included in our analyses.
Coding scheme
Prior to coding, the first and third authors independently classified each of the personality
measures used in each study as an indicator of openness, conscientiousness, extraversion,
agreeableness, or emotional stability. Many studies included explicit measures of the Big
Five personality factors (e.g., the NEO-PI-R; Costa & McCrae, 1992), and in these cases, no
judgment calls were made on the part of the authors. Where explicit measures of the
Five-Factor Model were not used, the authors consulted the taxons proposed by Hough and Ones (2001). The Hough and Ones taxons were developed to guide meta-analytic research
that involves personality traits by explicating the Big Five personality factor assessed by a
variety of commercial personality scales. Since the introduction of the taxons, they have
been drawn upon in a number of meta-analyses involving personality (e.g., Dudley, Orvis,
Lebiecki, & Cortina, 2006; Foldes, Duehr, & Ones, 2008). When a measure was not
explicitly developed to assess one of the Big Five factors and was not included in the
Hough and Ones taxons, the two authors (1) reviewed the scale development papers and
scale items to determine what (if any) Big Five factor the measure was assessing and (2) searched the literature for statistical evidence that supported a classification of the
measure within the Big Five framework (e.g., factor analytic studies that included the
measure and established measures of the Big Five, correlations between the measure and
established measures of the Big Five). Agreement between the two authors was initially
94%, and discrepancies were resolved jointly by the two coders and the second
author. Ultimately, 100% agreement for all variables was reached.
Next, the first and third authors independently coded each study for sample size, effect
size, measurement reliability, and moderators. Measurement reliability for both the predictor and criterion measures was operationalized as internal consistency (i.e.,
coefficient alpha). Where studies reported effect sizes between multiple measures of the
same predictor and criterion, composites were formed according to the methods outlined
by Hunter and Schmidt (2004). If the information needed to compute composites was
unavailable, we computed composites by averaging the effect sizes. We computed
composite reliabilities using the Spearman–Brown formula or, where the needed information was not available, by averaging the coefficient alpha reliability estimates
reported for each measure.
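The composite and reliability steps can be illustrated with a short sketch. It assumes unit-weighted composites of standardized measures (the standard composite-correlation formula) and one plausible reading of the Spearman–Brown step: the mean intercorrelation among the component measures is stepped up to the composite length, with averaged alphas as the fallback. The function names are hypothetical, and the numbers are illustrative rather than taken from the coded studies.

```python
from statistics import mean


def composite_correlation(r_xy, r_yy_bar):
    """Correlation of x with a unit-weighted sum of k standardized measures.

    r_xy     -- correlations of x with each of the k measures
    r_yy_bar -- mean intercorrelation among the k measures
    """
    k = len(r_xy)
    # The variance of a sum of k standardized variables is k + k(k - 1) * mean r.
    return sum(r_xy) / (k + k * (k - 1) * r_yy_bar) ** 0.5


def spearman_brown(r, k):
    """Spearman-Brown reliability of a composite of k parallel parts, each with reliability r."""
    return k * r / (1 + (k - 1) * r)


def composite_reliability(k, r_yy_bar=None, alphas=None):
    """Hypothetical helper mirroring the decision rule described in the text."""
    if r_yy_bar is not None:
        # Assumed reading: treat the mean intercorrelation as the reliability of one part.
        return spearman_brown(r_yy_bar, k)
    # Fallback when the intercorrelation is not reported: average the alphas.
    return mean(alphas)


# Illustrative numbers only: two criterion measures correlating .20 and .30
# with a predictor and .50 with each other.
r_composite = composite_correlation([0.20, 0.30], 0.50)
rel_composite = composite_reliability(2, r_yy_bar=0.50)
```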
In terms of the moderators, performance appraisal purpose and accountability were coded only if the primary study was explicit about these features. Agreement
between the two authors was initially 97%, and all discrepancies were resolved jointly by
the two coders and the second author.1
Analyses
We conducted the meta-analysis according to the procedures outlined by Hunter and
Schmidt (2004). This method allowed us to estimate construct-level relationships by correcting observed relationships for statistical artefacts, specifically sampling error and
measurement unreliability. First, we calculated the sample size-weighted mean correlation
for performance ratings with each of the personality traits. We then estimated the
population correlation (i.e., ρ) by correcting each mean correlation for measurement unreliability in both the predictor and criterion using artefact distributions, as reliability
information was only sporadically available. Direct range restriction was not evident in
any studies, and therefore, we made no such corrections. Finally, we estimated 95%
confidence intervals (using the methods outlined by Viswesvaran, Schmidt, & Ones, 2002) and 80% credibility intervals around each estimate of ρ, as well as the percentage of variance accounted for by statistical artefacts (i.e., sampling error and unreliability in the
predictor and criterion measures) for each estimate of ρ. To evaluate the presence of moderators, we employed the 75% rule (i.e., a moderator is likely to be present when the
percentage of variance accounted for by statistical artefacts is <75%; Hunter & Schmidt, 2004). The same analytic procedure was repeated for each level of each moderator in the
moderator analyses.
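The core Hunter and Schmidt quantities described above can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes reliability corrections use only the means of the artefact distributions (the full artefact-distribution method also uses their variances), counts only sampling error toward the percentage of variance due to artefacts, and omits the Viswesvaran et al. (2002) confidence intervals. The function name `hunter_schmidt` and the input data are hypothetical.

```python
import math


def hunter_schmidt(rs, ns, mean_rxx=1.0, mean_ryy=1.0):
    """Simplified Hunter-Schmidt summary for one predictor-criterion pair.

    Simplifications (assumptions, not the authors' exact procedure): reliability
    corrections use only the means of the artefact distributions, and only
    sampling error is counted toward the "% variance due to artefacts".
    """
    k, total_n = len(rs), sum(ns)
    # Sample size-weighted mean and variance of the observed correlations.
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Expected sampling error variance, based on the average sample size.
    var_e = (1 - r_bar ** 2) ** 2 / (total_n / k - 1)
    # Compound attenuation factor for predictor and criterion unreliability.
    a = math.sqrt(mean_rxx * mean_ryy)
    rho = r_bar / a
    sd_rho = math.sqrt(max(var_obs - var_e, 0.0)) / a
    pct = 100.0 if var_obs == 0 else min(100.0, 100.0 * var_e / var_obs)
    return {
        "r_bar": r_bar,
        "rho": rho,
        "sd_rho": sd_rho,
        "pct_var_artefacts": pct,
        # 75% rule: a moderator is likely when artefacts explain < 75% of the variance.
        "moderator_likely": pct < 75.0,
        # 80% credibility interval: rho +/- 1.28 SD_rho.
        "cv80": (rho - 1.28 * sd_rho, rho + 1.28 * sd_rho),
    }


# Illustrative data only (not from the meta-analysis).
summary = hunter_schmidt([0.10, 0.30], [100, 100])
```

For this illustrative input, sampling error alone accounts for about 93% of the observed variance, so the 75% rule would not flag a moderator.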
While the moderators examined in our study were conceptually distinct, it is important to establish that they are empirically distinct as well. Therefore, we assessed the
relationships between the moderators using the phi coefficient. The phi coefficient
indexes the association between two dichotomous variables (i.e., a two-by-two contingency table) and can
be interpreted by the same standards as a correlation coefficient. The study setting–accountability and appraisal purpose–accountability relationships were minimal (Φ = .22 and .18, respectively). The study setting–appraisal purpose relationship was stronger (Φ = .40), but still did not suggest that these two moderators were redundant. Thus, we concluded that the moderators examined here were sufficiently distinct to merit inclusion in our analyses.
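As a sketch of how such a phi value is obtained, the function below computes phi from the four cell counts of a two-by-two table; it equals the Pearson correlation between the two 0/1-coded moderators. The function name and the counts are hypothetical, chosen only for illustration.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]] of two dichotomous variables.

    Phi equals the Pearson correlation between the two 0/1-coded variables.
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when any row or column total is zero")
    return (a * d - b * c) / denom


# Hypothetical counts: rows = field/laboratory, columns = administrative/research.
phi = phi_coefficient(8, 3, 2, 7)
```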
An important consideration for our moderator analyses is the minimum number of
studies that must be included in any given analysis to provide informative results. For
instance, several meta-analytic reviews in the organizational sciences have required at least k = 3 effect sizes to proceed with analyses (e.g., Eby, Allen, Evans, Ng, & DuBois, 2008;
Viswesvaran et al., 2002). However, Valentine, Pigott, and Rothstein (2010) noted that
meta-analytic methods can be applied effectively to as few as two effect sizes and that this
approach is superior to other means by which findings from a small number of studies can be interpreted (e.g., so-called cognitive algebra, whereby one tries to mentally integrate
findings across studies). Therefore, we determined that we would proceed with our
moderator analyses so long as at least two effect sizes were available (however, all of our
analyses involved at least k = 3 effect sizes).

1 The complete code sheet is available upon request from the first author.

While estimating bivariate rater personality–performance rating relationships is
informative, we also used our bivariate meta-analytic estimates to assess the multiple correlation of the Five-Factor Model of personality, as a set, with performance ratings.
Doing so required us to build a correlation matrix involving the relationships estimated in
this study as well as the intercorrelations among the Big Five factors of personality.
Consistent with other meta-analyses (e.g., Munyon, Summers, Thompson, & Ferris, 2014;
Ones, Dilchert, Viswesvaran, & Judge, 2007), we used the estimates derived by Ones
(1993; also reported in Ones, Viswesvaran, & Reiss, 1996) for these analyses. Consistent
with recommendations (Viswesvaran & Ones, 1995), we used the harmonic mean across
cells as the sample size in our analyses. We conducted these analyses by regressing performance ratings onto the five personality factors. We repeated these analyses for each
level of each moderator.
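Because only correlations are available, the regression can be computed directly from the correlation matrix: the standardized weights are β = R_xx⁻¹ r_xy and R² = β′r_xy. The sketch below shows this with an illustrative two-predictor matrix (not the Ones, 1993, intercorrelations) and computes a harmonic mean sample size. The function name is hypothetical, and the harmonic mean here uses only the five overall predictor–criterion Ns from Table 1, because the Ns behind the Big Five intercorrelations are not reported in this section.

```python
import numpy as np
from statistics import harmonic_mean


def regress_from_correlations(R_xx, r_xy):
    """Standardized regression weights and R^2 computed from correlations alone.

    R_xx -- predictor intercorrelation matrix (here, the Big Five intercorrelations)
    r_xy -- predictor-criterion correlations (here, meta-analytic rho values)
    """
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    betas = np.linalg.solve(R_xx, r_xy)  # beta = R_xx^{-1} r_xy
    r_squared = float(betas @ r_xy)  # R^2 = beta' r_xy
    return betas, r_squared


# Illustrative two-predictor case (not the Ones, 1993, intercorrelations).
betas, r_squared = regress_from_correlations([[1.0, 0.5], [0.5, 1.0]], [0.3, 0.3])

# Sample size for significance tests: harmonic mean of the cell Ns. Only the five
# overall predictor-criterion Ns from Table 1 are used here, for illustration.
n_harmonic = harmonic_mean([1899, 1109, 1311, 2974, 829])
```

The harmonic mean is pulled toward the smallest cells, which gives somewhat more conservative standard errors than the arithmetic mean would.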
Note that when the predictors included in a regression model are correlated, the
relative contribution of each predictor to the model R² cannot be accurately determined
by examining the beta weights alone (LeBreton, Ployhart, & Ladd, 2004). Therefore, we
conducted relative importance analyses to determine the relative contribution of each of
the Big Five factors to the prediction of performance ratings (Tonidandel & LeBreton,
2011). This approach has recently been adopted in meta-analyses to provide a more nuanced perspective of how predictors in a regression model operate in concert with one
another in influencing the criterion (e.g., Munyon et al., 2014; O’Boyle, Humphrey,
Pollack, Hawver, & Story, 2010). We conducted these analyses using a relative weight
analysis framework (Johnson, 2000) and repeated these analyses for each level of
each moderator examined. Relative weight analysis produces two types of coefficients – relative weights and rescaled relative weights. The former reflects the proportion of
variance in the outcome (i.e., performance ratings) that is attributed to each of the
predictor variables, while the latter reflects the percentage of predicted variance that is accounted for by each predictor variable (calculated by dividing the relative weights by
the model R²; LeBreton, Hargis, Griepentrog, Oswald, & Ployhart, 2007).
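A minimal sketch of Johnson's (2000) relative weight computation from correlation inputs is shown below. The predictors are replaced by their closest orthogonal counterparts using the symmetric square root of R_xx; the criterion is regressed on those counterparts, and each predictor's raw weight combines its squared loadings with the squared orthogonal weights. The raw weights sum to R², and dividing by R² gives the rescaled weights. The function name and inputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def relative_weights(R_xx, r_xy):
    """Johnson's (2000) relative weights computed from correlation inputs.

    Returns raw weights (they sum to the model R^2) and rescaled weights
    (each raw weight as a percentage of R^2).
    """
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    # Symmetric square root of R_xx; entry (j, k) is the correlation between
    # predictor X_j and its orthogonal counterpart Z_k.
    evals, evecs = np.linalg.eigh(R_xx)
    lam = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    # Correlations of the criterion with the orthogonal variables Z.
    beta_z = np.linalg.solve(lam, r_xy)
    # Raw weight for X_j: sum over k of lam[j, k]^2 * beta_z[k]^2.
    raw = (lam ** 2) @ (beta_z ** 2)
    return raw, 100.0 * raw / raw.sum()


# Illustrative inputs (not estimates from this meta-analysis).
R = np.array([[1.0, 0.3], [0.3, 1.0]])
r = np.array([0.4, 0.2])
raw, rescaled = relative_weights(R, r)
```

With uncorrelated predictors, the raw weights reduce to the squared validities, which is a quick sanity check on any implementation.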
Results
The bivariate meta-analytic results are reported in Table 1, the results of our multiple regression analyses are reported in Table 2, and the results of our relative weight analyses
are reported in Table 3. The results of our overall multiple regression analysis indicated
that the Big Five personality factors accounted for 8% of the variance in performance
ratings. Hypothesis 1 predicted a positive relationship between rater agreeableness and
performance ratings. The results of our bivariate analysis indicated a corrected correlation
of ρ = .25. Furthermore, the results of our multiple regression analysis indicated a statistically significant beta weight of .22. Finally, the results of our relative weight analysis
indicated that agreeableness accounted for 65.9% of the variance in performance ratings explained by the Big Five personality factors. Cumulatively, these findings support
hypothesis 1.
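As a consistency check, the overall R² can be approximately recovered from the rounded values in Tables 1 and 2, because for standardized variables R² equals the sum of each beta weight multiplied by that predictor's correlation with the criterion. The snippet assumes that the overall regression used the corrected ρ values in Table 1, since the bivariate meta-analytic estimates were used to build the correlation matrix. Because both inputs are rounded to two decimals, the match can only be approximate.

```python
# Rounded overall estimates: corrected correlations (rho) from Table 1 and
# standardized beta weights from Table 2.
rho = {"agreeableness": 0.25, "extraversion": 0.12, "emotional_stability": 0.14,
       "conscientiousness": 0.10, "openness": -0.01}
beta = {"agreeableness": 0.22, "extraversion": 0.08, "emotional_stability": 0.08,
        "conscientiousness": 0.02, "openness": -0.06}

# For standardized variables, R^2 = sum over predictors of beta_j * rho_j.
r_squared = sum(beta[trait] * rho[trait] for trait in rho)
print(f"R^2 = {r_squared:.4f}")  # R^2 = 0.0784
```

The result, about .078, rounds to the reported R² of .08.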
Hypothesis 2 predicted a positive rater extraversion–performance rating relationship. The results of our bivariate analysis indicated a corrected rater extraversion–performance rating correlation of ρ = .12. Furthermore, the results of our multiple regression analysis indicated a statistically significant beta weight of .08, and the results of our relative weight analysis indicated that extraversion accounted for 12.5% of the explained variance in performance ratings. Overall, these results supported hypothesis 2.

Hypothesis 3 predicted a positive rater emotional stability–performance rating relationship. The results of our bivariate analysis indicated a corrected correlation of ρ = .14. Furthermore, the results of our regression analysis indicated a statistically significant beta weight of .08, while our relative weight analysis indicated that emotional stability accounted for 14.2% of the explained variance in performance ratings. These results provided support for hypothesis 3.

Hypothesis 4 predicted a negative rater conscientiousness–performance rating relationship. The results of our bivariate analysis indicated a corrected correlation of ρ = .10. Thus, counter to our prediction, the effect of conscientiousness on performance ratings was positive. Therefore, hypothesis 4 was not supported. The results of our multiple regression analysis indicated a small and non-significant beta weight of .02 (p > .05). Furthermore, our relative weight analysis indicated that conscientiousness accounted for only 5.5% of the explained variance in performance ratings. These results suggest that, once the effects of the other Big Five factors are accounted for, conscientiousness contributed relatively little variance to performance ratings.

Table 1. Meta-analysis of rater personality–performance rating correlations
Moderator                     k      N     r     ρ   SDρ   % Var   CVL   CVU   CIL   CIU
Agreeableness
  Overall                    12  1,899   .20   .25   .17   25.65   .03   .46   .15   .35
  Study setting
    Field                     6    874   .26   .33   .00     100   .33   .33   .33   .33
    Laboratory                6  1,025   .14   .17   .21   17.05  −.09   .44   .00   .34
  Appraisal purpose
    Administrative            4    378   .28   .36   .00     100   .36   .36   .36   .36
    Research/developmental    4    582   .18   .22   .23   15.93  −.08   .52  −.01   .45
  Accountability
    Low                       5    864   .22   .27   .09   50.78   .15   .39   .19   .35
    High                      5    792   .16   .20   .24   14.65  −.10   .50  −.01   .41
  Publication status
    Published                 9  1,015   .22   .27   .17   30.85   .05   .49   .16   .38
    Unpublished               3    884   .17   .22   .15   17.91   .02   .42   .05   .39
Extraversion
  Overall                    11  1,109   .10   .12   .11   56.74  −.02   .26   .05   .19
  Study setting
    Field                     5    503   .08   .10   .08   69.70  −.00   .21   .03   .17
    Laboratory                6    606   .11   .14   .12   49.80  −.02   .29   .04   .24
  Appraisal purpose
    Administrative            4    414  −.02  −.03   .08   72.06  −.13   .07  −.11   .05
    Research/developmental    5    566   .17   .21   .00     100   .21   .21   .21   .21
  Accountability
    Low                       6    656   .17   .22   .00     100   .22   .22   .22   .22
    High                      –      –     –     –     –       –     –     –     –     –
  Publication status
    Published                 8    771   .05   .06   .08   70.20  −.04   .17   .00   .12
    Unpublished               3    338   .20   .25   .00     100   .25   .25   .25   .25
Emotional stability
  Overall                    13  1,311   .11   .14   .08   69.83   .03   .24   .10   .18
  Study setting
    Field                     7    582   .15   .20   .12   56.99   .04   .35   .11   .29
    Laboratory                6    729   .07   .09   .00     100   .09   .09   .09   .09
  Appraisal purpose
    Administrative            5    438   .15   .19   .00     100   .19   .19   .19   .19
    Research/developmental    4    357   .22   .28   .00     100   .28   .28   .28   .28
  Accountability
    Low                       6    502   .11   .14   .14   48.58  −.04   .32   .03   .25
    High                      3    467   .06   .07   .00     100   .07   .07   .07   .07
  Publication status
    Published                10    850   .16   .20   .08   75.00   .10   .30   .15   .25
    Unpublished               3    461   .02   .03   .00     100   .03   .03   .03   .03
Conscientiousness
  Overall                    18  2,974   .08   .10   .17   22.99  −.12   .32   .02   .18
  Study setting
    Field                    11  1,780   .16   .19   .09   53.36   .08   .31   .14   .24
    Laboratory                7  1,194  −.03  −.04   .18   20.87  −.27   .20  −.17   .09
  Appraisal purpose
    Administrative            3    354  −.09  −.11   .21   21.76  −.39   .16  −.35   .13
    Research/developmental    8    993   .10   .12   .16   32.11  −.08   .33   .01   .23
  Accountability
    Low                      11  1,886   .13   .16   .16   25.46  −.04   .36   .07   .25
    High                      4    637  −.03  −.04   .20   18.90  −.29   .22  −.24   .16
  Publication status
    Published                13  1,711   .07   .09   .20   22.69  −.16   .34  −.02   .20
    Unpublished               5  1,263   .10   .12   .13   24.41  −.05   .29   .01   .23
Openness
  Overall                     9    829  −.01  −.01   .00     100  −.01  −.01  −.01  −.01
  Publication status
    Published                 8    643  −.01  −.01   .00     100  −.01  −.01  −.01  −.01
    Unpublished               –      –     –     –     –       –     –     –     –     –
Note. k, number of independent correlations; N, total sample size for all studies combined; r, sample size-weighted average observed correlation; ρ, sample size-weighted average corrected correlation; SDρ, standard deviation of ρ; % Var, percentage of variance accounted for by statistical artefacts; CV, 80% credibility interval (CVL, lower bound; CVU, upper bound); CI, 95% confidence interval (CIL, lower bound; CIU, upper bound); –, fewer than two independent samples available.

Table 2. Results of regression analyses
                                      Study setting              Purpose              Accountability
                          Overall       Field  Laboratory       Admin         R/D         Low        High
Agreeableness               .22**       .28**       .17**       .42**       .14**       .21**       .25**
Extraversion                .08**        .04*       .11**      −.12**       .16**       .19**      −.18**
Emotional stability         .08**       .11**       .06**       .20**       .22**         .04         .08
Conscientiousness             .02       .08**      −.11**      −.23**         .02       .09**      −.13**
Openness                   −.06**      −.06**      −.06**      −.08**      −.09**      −.07**        −.03
R²                            .08         .14         .06         .22         .13         .12         .09
Note. Admin, administrative; R/D, research/developmental. Regression coefficients are in standardized form.
*p < .05; **p < .01.
Hypothesis 5 predicted that rater openness would not be related to performance ratings.
Our bivariate analysis indicated a corrected correlation of ρ = −.01. Our analysis also indicated that this relationship was consistent across study populations, as artefacts
accounted for 100% of the variance in observed correlations. That is, our results suggested
that there were no moderators of the rater openness–performance rating relationship and that the population correlation was ρ = −.01 across rating contexts. Thus, we concluded that the effect of rater openness on performance ratings across contexts is of little
practical value (Cohen, 1992). This notion is bolstered by the results of our relative weight
analysis, which indicated that openness accounted for only 1.9% of the explained variance in
performance ratings. Overall, this pattern of results provided support for hypothesis 5, suggesting that rater openness does not influence performance ratings.
Moderator analyses
We proceeded with moderator analyses for those relationships where our results
suggested that moderators might be present – conscientiousness, agreeableness, extraversion, and emotional stability. The results of the moderator analyses are presented
in Tables 1–3. We review these results in the following sections.
Study setting
Hypothesis 6 predicted that the effect of rater personality on performance ratings would
be stronger in field settings as compared to laboratory settings. The results of our bivariate
analyses indicated that study setting moderated the rater personality–performance rating relationship for agreeableness, conscientiousness, and emotional stability. Furthermore,
and consistent with hypothesis 6, these relationships were stronger in field settings as opposed to laboratory settings. On the other hand, the relationship between rater
extraversion and performance ratings did not vary as a function of study setting.
The results of our regression analyses indicated that the Big Five as a set accounted for 14%
of the variance in performance ratings in field studies, but only 6% of the variance in
performance ratings in laboratory studies. Overall, these results suggested that features of
a naturalistic rating context (cf. Ferris et al., 2008) might be necessary for trait expression
in a performance appraisal context (Tett & Guterman, 2000) and provided support for
hypothesis 6.

Table 3. Results of relative weight analyses
                                      Study setting (%)            Purpose (%)            Accountability (%)
                       Overall (%)        Field   Laboratory        Admin          R/D          Low         High
Agreeableness           .05 (65.9)   .09 (63.7)   .03 (48.8)   .14 (62.9)   .03 (23.4)   .05 (45.3)   .05 (55.4)
Extraversion            .01 (12.5)    .01 (3.9)   .02 (27.6)    .01 (3.1)   .03 (25.1)   .04 (33.4)   .02 (28.2)
Emotional stability     .01 (14.2)   .02 (16.9)   .01 (10.5)   .04 (15.9)   .06 (44.6)    .01 (6.9)    .01 (7.0)
Conscientiousness        .00 (5.5)   .02 (14.2)   .01 (10.3)   .04 (16.9)    .01 (4.5)   .02 (12.8)    .01 (8.7)
Openness                 .00 (1.9)    .00 (1.3)    .00 (2.8)    .00 (1.2)    .00 (2.4)    .00 (1.7)    .00 (0.7)
R²                             .08          .14          .06          .22          .13          .12          .09
Note. Admin, administrative; R/D, research/developmental.
Raw relative weights are shown, with rescaled relative weights (%) in parentheses.
Appraisal purpose
Hypothesis 7 predicted that the effect of rater personality on performance ratings would
be stronger when ratings were collected for research/developmental purposes (i.e., a
weak situation) and attenuated when ratings were collected for administrative purposes
(i.e., a strong situation). The results of our bivariate analyses provided support for this
prediction with regard to conscientiousness, extraversion, and emotional stability, as their relationships with performance ratings were stronger when ratings were collected
for research/developmental purposes as opposed to administrative purposes. On the
other hand, for rater agreeableness, the opposite was observed. That is, rater agreeableness
accounted for a greater amount of variance in performance ratings when ratings were
collected for administrative purposes as opposed to research/developmental purposes.
Also, the results of our multiple regression analyses indicated that personality accounted
for a greater amount of variance in ratings that were collected for administrative purposes
(R² = .22) as compared to research or developmental purposes (R² = .13). However, inspection of the relative weights (Table 3) indicated that this effect was predominantly
driven by agreeableness (RW = .14, rescaled RW = 62.9%).

This surprising finding could perhaps be accounted for by study setting. Specifically,
no field studies of the rater agreeableness–performance rating relationship included ratings collected for research/developmental purposes. All studies that assessed
agreeableness–research/developmental performance rating relationships were conducted in laboratory settings, while all but one study involving administrative ratings
were conducted in field settings. As discussed earlier with respect to hypothesis 6, the effect of rater personality on performance ratings should be lower in studies conducted in
laboratory as opposed to field settings. Therefore, this study setting–rating purpose confound observed with respect to agreeableness could have accounted for the pattern of
results observed here. It is possible that for research/developmental ratings collected in
field settings, the effect of rater agreeableness would be more pronounced. Therefore, it is
premature to interpret these effects concerning agreeableness too strongly. When
removing agreeableness from our multiple regression model, we found that the remaining
four personality traits accounted for 7% of the variance in administrative ratings as compared to 11% of the variance in research/developmental ratings, which is consistent
with hypothesis 7. Overall, the pattern of results observed here suggested that rater
personality–performance rating relationships were more robust when ratings were collected for research/developmental purposes (i.e., a weak situation) as opposed to
administrative purposes (i.e., a strong situation), supporting hypothesis 7.
Accountability
Hypothesis 8 predicted that the effect of rater personality on performance ratings would
be strengthened when accountability was low (i.e., a weak situation) and attenuated
when accountability was high (i.e., a strong situation). Only one study reported a rater
extraversion–performance rating relationship where accountability was high. Therefore, we could not examine accountability as a moderator of the rater extraversion–performance rating relationship in our bivariate analyses. However, consistent with
existing recommendations (Viswesvaran & Ones, 1995), we included the single estimate
in the correlation matrix used to conduct our multiple regression and relative weight
analyses. In terms of our bivariate analyses, for rater agreeableness, conscientiousness, and
emotional stability, the 95% confidence intervals associated with the correlations at each
level of the moderator (i.e., high vs. low) overlapped. However, in all cases, the point
estimates were lower when accountability was high as opposed to low. Furthermore, for
conscientiousness and agreeableness, we failed to detect non-zero relationships with
performance ratings when accountability was high (95% CI = −.24 to .16 for conscientiousness and −.01 to .41 for agreeableness). However, when accountability was low, we observed evidence suggesting that these effects were non-zero (95% CI = .07 to .25 for conscientiousness and .19 to .35 for agreeableness). In terms of our multiple regression analyses, the Big Five as a set accounted for 9% of the variance in performance ratings when
accountability was high and 12% of variance in performance ratings when accountability
was low. Overall, these results suggested that the effect of rater personality on
performance ratings was stronger when accountability was low and weaker when
accountability was high, providing support for hypothesis 8.
Publication bias
To assess the extent to which our findings may have been influenced by publication bias,
we compared effect size estimates derived from published and unpublished studies. If
correlations derived from published studies were significantly larger than those derived
from unpublished studies, this would provide some evidence suggesting that publication
bias may have influenced our findings. A comparison between sample size-weighted
corrected correlations and their associated 95% confidence intervals suggested limited
evidence for publication bias (see Table 1). In terms of agreeableness, conscientiousness, and extraversion, correlations from unpublished studies were not significantly lower than
those derived from published studies. In terms of openness, only one unpublished study
was identified. The correlation reported in this study (r = .00) differed from the sample size-weighted mean correlation derived from published studies (r = −.01) by only .01, and therefore, it does not appear that publication bias has influenced our findings with
respect to openness either. However, in terms of emotional stability, publication status
did moderate the relationship with performance ratings such that the effect was
significantly smaller in unpublished studies as compared to published studies. Therefore, publication bias may have influenced our estimate of the rater emotional stability–performance rating relationship.
Considering this, we conducted an additional test of publication bias for the rater
emotional stability–performance rating relationship using the trim-and-fill method. The trim-and-fill method is regarded as one of the most powerful tests of publication bias
available (Aguinis, Pierce, Bosco, Dalton, & Dalton, 2011). Trim and fill is a funnel plot-
based method that uses an iterative procedure to estimate the degree of asymmetry in the
funnel plot (i.e., with effect sizes plotted on the x-axis and standard errors plotted on the y-axis) to identify the number and magnitude of missing correlations, impute those missing
values, and recalculate the sample size-weighted mean correlation (Duval & Tweedie,
2000a,b). The influence of publication bias on study findings can be ascertained by
comparing the original sample size-weighted mean correlation with the one derived once
the missing studies have been imputed.
However, as this test is based on a funnel plot, it must be assumed that the only source
of variance influencing study findings is sampling error (Duval & Tweedie, 2000a,b). Note
that in our overall analyses involving emotional stability, artefacts accounted for nearly
70% of the variance across studies. Therefore, the trim-and-fill procedures may be appropriate in this case. We further assessed the tenability of the trim-and-fill method’s
homogeneity assumption by testing for heterogeneity using the Q statistic, Q(12) = 20.66, n.s., which failed to detect the presence of heterogeneity. This provided additional support for the appropriateness of the trim-and-fill method as a means of
detecting publication bias. We conducted these analyses using the metafor package
(Viechtbauer, 2010) for R (R Development Core Team, 2010). Our results indicated no
funnel plot asymmetry and no missing studies and provided evidence suggesting that
publication bias did not influence our findings concerning the rater emotional stability–performance rating relationship.2 Overall, the pattern of results observed here indicated
that publication bias has not likely influenced the relationships estimated in this study.
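The heterogeneity test reported above can be sketched in a few lines. The sketch assumes Cochran's Q computed on Fisher-z transformed correlations with weights n − 3, a common setup; the exact metafor settings used by the authors are not reported here. The p-value uses the closed-form chi-square upper tail for even degrees of freedom, which covers df = 12. Applied to the reported Q(12) = 20.66, it gives p of roughly .056, consistent with the non-significant result. Function names are hypothetical.

```python
import math


def q_statistic(rs, ns):
    """Cochran's Q for correlations on the Fisher-z scale (assumed weights n - 3)."""
    zs = [math.atanh(r) for r in rs]
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return q, len(rs) - 1  # Q and its degrees of freedom (k - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability via the closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# Reported test for emotional stability: Q(12) = 20.66.
p_value = chi2_sf_even_df(20.66, 12)
```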
Discussion
Considering the centrality of performance ratings for research and practice in Human Resource Management and Industrial, Work, and Organizational Psychology (Aguinis,
2013), research into the construct validity of performance ratings has been critical (Austin
& Villanova, 1992; Murphy, 2008). Research has largely considered the influence of rating
context features on performance ratings (Ferris et al., 2008; Fletcher, 2001; Levy &
Williams, 2004; Murphy & Cleveland, 1995). We argued that rater personality traits are
relevant determinants of performance ratings as well, and we found support for this
notion in our cumulative meta-analysis of this literature. Across 21 studies and 28
independent samples, we found evidence suggesting that rater personality traits consistent with the Five-Factor Model accounted for between 6% and 22% of the variance
in performance ratings. Throughout our manuscript, we argued that the manner in which
raters respond to various features of the performance evaluation context may be
determined, at least in part, by their personalities, and that this would have implications
for the construct validity of performance ratings. Our results supported this notion.
Our bivariate meta-analytic results, as well as our multiple regression
analyses, largely supported our predictions. Specifically, and consistent with our
hypotheses, rater agreeableness, extraversion, and emotional stability were all positively related to performance ratings. In terms of conscientiousness, although we predicted a
negative relationship, the results of our bivariate analyses indicated that conscientiousness
was, overall, positively related to performance ratings. However, the effect of
conscientiousness in our multiple regression model was small and not statistically
significant (β = .02, n.s.). Furthermore, conscientiousness accounted for only a small portion of the explained variance in performance ratings (RW = .00, rescaled RW = 5.5%). Thus, while we found evidence for an overall positive rater conscientiousness–performance rating relationship in isolation, this effect seemed to be trivial when taking the other Big Five factors into account.
Ferris et al. (2008) described performance appraisal as a formal accountability system
that organizations use as a means of holding employees accountable and aligning their
performance with organizational objectives. Given their tendency to follow organizational
rules and procedures (John et al., 2008), raters who are high in conscientiousness
may be sympathetic towards the organization’s need to hold employees accountable and
may understand the role of accurate ratings in this process. Also, as conscientiousness is
indicative of a tendency to approach tasks in a meticulous and well-organized fashion (Saucier & Ostendorf, 1999), raters who are high in conscientiousness may react to the
importance of accurate performance ratings for organizational success by approaching
the rating task with great care and consideration, resulting in accurate performance
ratings. This line of reasoning could potentially account for our findings concerning
conscientiousness.

2 The complete results of this analysis are available from the first author.
In terms of openness, we predicted a nil relationship with performance ratings. Our
results supported this prediction. The effect of openness on performance ratings was
small and not practically significant, as we estimated a ρ of −.01 across nine independent samples and a total sample size of N = 829. Furthermore, no moderators of the rater openness–performance rating relationship were detected. This finding was consistent with our argument that raters who are high in openness may respond to the
performance rating context by deeply considering a wide range of performance
episodes and the appropriate inferences that could be drawn about the employee’s
performance based on them.
The results of our moderator analyses were also generally consistent with our
predictions. We argued that rater personality likely played a role in performance ratings because personality would, in part, determine the rater’s characteristic responses to
features of the rating context, which would ultimately influence performance rating
behaviours (Roberts, 2009; Tett & Guterman, 2000). Based on this notion, we predicted
that the effect of rater personality on performance ratings should be stronger in field
studies as compared to laboratory studies. Our results supported this prediction. An
important implication is that while it may be possible to study various performance
appraisal processes in laboratory settings (Murphy, Herr, Lockhart, & Maguire, 1986), it
appears that to best understand the influence of rater personality on performance ratings, field studies should be preferred. However, it is also important to note that as laboratory
studies are more likely to make use of inexperienced raters than field studies, variation in
findings between laboratory and field studies could potentially be due to variation in
experience as opposed to variation in the presence of rating context features alone.
Future research may wish to examine rating experience as a moderator of the
relationships examined here.
We also noted that situational strength was an important consideration, as strong
situations act to mute the influence of personality traits on behaviour (Cooper & Withey, 2009; Mischel, 1973, 1977). We considered two characteristics of the rating context that
speak to situational strength – appraisal purpose and accountability (Levy & Williams, 2004). In terms of appraisal purpose, we proposed that the effect of rater personality on
performance ratings should be strengthened when ratings were collected for research/
developmental purposes (i.e., a weak situation) and attenuated when ratings were
collected for administrative purposes (i.e., a strong situation). Our findings were largely
consistent with this hypothesis, but our results concerning agreeableness ran counter to
it. As discussed in the Results section, it is possible that these results
were due to a study setting–appraisal purpose confound among the studies assessing agreeableness. However, it is also possible that our finding concerning appraisal purpose
as a moderator of the rater agreeableness–performance rating relationship was meaningful. For instance, the traits encompassed by agreeableness include altruism, a tendency to
be concerned for the welfare of others (Costa & McCrae, 1992). As administrative ratings
have far-reaching implications for employees (e.g., pay and promotion decisions), raters
who are high in agreeableness may be especially likely to inflate administrative ratings to
protect employees from the negative effects of poor administrative ratings. From this perspective, rather than speaking exclusively to situational strength, conducting
administrative ratings may provide opportunities for agreeable raters to behave in trait-
relevant ways. Unfortunately, our study could not tease apart these two competing
explanations, and future research into this issue is needed.
In terms of accountability, we proposed that the effect of rater personality would be
strengthened when accountability was low (i.e., a weak situation) and attenuated when
accountability was high (i.e., a strong situation). Our results supported this prediction.
Overall, therefore, our findings suggest that the effect of rater personality on performance ratings was attenuated in strong situations and strengthened in weak
situations.
Implications
Research into the construct validity of performance ratings has largely focused on the
influence of the rating context on performance rating processes and outcomes (Ferris
et al., 2008; Fletcher, 2001; Levy & Williams, 2004). This approach has been useful and has led to a number of important insights. However, we argued that rater personality traits
might also play a role as determinants of performance ratings, and we found evidence
supporting this assertion. Research shifted its focus to the social context of ratings after
a number of reviews recognized that, to be understood, performance appraisal had to be
studied in the context in which it occurred (e.g., Bretz, Milkovich, & Read, 1992; Ilgen et
al., 1993). Similarly, because our results suggest that rater personality influences
performance evaluation in applied contexts, considering rater personality is also critical
when attempting to study performance evaluation as it occurs in organizations: all raters
have some standing on the Big Five personality factors, and this standing has implications
for their reactions to features of the rating context and for their rating behaviours.
Our findings are also relevant for performance appraisal practices. Specifically, once
sources of extraneous variance in performance ratings have been identified, the construct
validity of ratings can be improved by identifying conditions that reduce their influence
(Murphy, 2008). We not only identified rater personality as a source of extraneous
variance but also examined moderators of these relationships that may be relevant to performance appraisal practices. Of most practical value, the relationships
between rater personality and performance ratings were generally lower when raters
were held accountable for their ratings. Agency theory notes that firm owners (i.e.,
principals) and managers (i.e., agents) may at times have conflicting goals, and therefore,
monitoring mechanisms must be in place to ensure that the manager is acting in the best
interest of the firm’s shareholders (Fama, 1980; Fama & Jensen, 1983). The presence of a
monitoring system that allows the principal to be aware of the agent’s ratings and requires
the agent to justify those ratings to the principal is one means by which raters could be held accountable for their performance ratings, reducing the influence of rater traits on
performance rating scores and improving the construct validity of performance ratings.
For example, part of a rater’s performance evaluation (one of the most ubiquitous control
mechanisms; Ferris et al., 2008) could include the rater’s ability to justify the ratings that
he or she has documented for his or her subordinates.
We also note that a number of additional performance appraisal practices may help to
mitigate the influence of rater personality traits on performance ratings, although not
explicitly examined here. Research has argued that involving multiple raters in the
performance appraisal process (e.g., through 360-degree performance rating systems; Smither et al., 2005) can improve the psychometric properties of performance ratings
(Viswesvaran, Ones, & Schmidt, 2008). Such an approach may also be useful in
accounting for the effects of rater personality on performance ratings, as pooling ratings
across raters would limit the effect of any single rater’s personality traits on composite
performance ratings.
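The pooling argument can be made concrete under a simple additive model, which is an illustrative assumption here rather than a model estimated in this meta-analysis. Suppose rater i's rating of ratee j is y_ij = t_j + b_i + e_ij, where t_j is the ratee's true performance, b_i is a rater-specific shift (for example, leniency associated with the rater's personality) with variance σ_b², and e_ij is residual error, with the three components mutually independent and raters sampled independently. Averaging the ratings of k raters then gives:

```latex
\bar{y}_{j} = \frac{1}{k}\sum_{i=1}^{k} y_{ij}
            = t_j + \underbrace{\frac{1}{k}\sum_{i=1}^{k} b_i}_{\bar{b}}
                  + \frac{1}{k}\sum_{i=1}^{k} e_{ij},
\qquad
\operatorname{Var}\left(\bar{b}\right) = \frac{\sigma_b^{2}}{k}.
```

Under these assumptions, the variance that rater-specific shifts contribute to a composite rating shrinks in proportion to 1/k; with four raters, for instance, it falls to one quarter of its single-rater value. The reduction would be smaller if raters' shifts were positively correlated with one another.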
Rating scale format may also influence the strength of the rater personality–performance rating relationships. For example, rating formats that do not allow raters
to inflate the ratings given to all employees, such as forced ranking distributions, may be more robust to the effect of rater personality. To illustrate, raters who are high in
agreeableness may be motivated to inflate performance rating scores across ratees to avoid
disturbing the quality of their dyadic work relationships with their ratees (Jensen-Campbell & Graziano, 2001; Shore & Tashchian, 2002). However, a forced ranking
distribution system would require the rater to discriminate between employees; thus, the
effect of rater agreeableness in this example may be reduced.
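The forced-ranking argument can be illustrated with a toy example. The sketch below uses hypothetical scores and assumes the simplest case, in which a lenient rater adds the same constant to every ratee's rating; the function names are illustrative only. A uniform shift of this kind raises every absolute rating but leaves the rank order, and hence any forced ranking, unchanged.

```python
"""Toy illustration: a uniform leniency shift does not change a forced ranking."""
from __future__ import annotations


def forced_ranking(scores: dict[str, float]) -> list[str]:
    """Order ratee IDs from highest to lowest score, breaking ties by ID."""
    return sorted(scores, key=lambda ratee: (-scores[ratee], ratee))


def apply_uniform_leniency(scores: dict[str, float], shift: float) -> dict[str, float]:
    """Add the same (hypothetical) leniency shift to every ratee's rating."""
    return {ratee: score + shift for ratee, score in scores.items()}


# Hypothetical ratings on an absolute scale for three ratees.
unbiased = {"A": 4.0, "B": 2.5, "C": 3.0}
lenient = apply_uniform_leniency(unbiased, shift=1.5)

print("absolute ratings (lenient rater):", lenient)
print("ranking (unbiased):", forced_ranking(unbiased))
print("ranking (lenient): ", forced_ranking(lenient))
```

The sketch covers only uniform inflation. If a rater inflated ratings for some ratees more than for others (for instance, only for ratees with whom the rater has close relationships), a forced ranking would not remove that bias.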
This study has implications for the personality literature as well. A major initiative in
the personality literature has been identifying the situations under which personality traits influence behaviour (Blickle et al., 2008; Hogan, 1991; Mischel & Shoda, 1995). Research
along these lines has long recognized the role of situational strength (Mischel, 1973),
and many empirical investigations into the effect of situation strength on trait–behaviour relationships exist (cf. Cooper & Withey, 2009). However, the trait activation approach
has emerged more recently. While evidence has been steadily mounting for the role of trait
activation in trait–behaviour relationships (Tett & Guterman, 2000), more research along these lines is needed. Our study provided continued support for the trait activation
framework. Specifically, our study setting moderator analysis was consistent with trait activation, as the effect of rater personality on performance ratings was stronger in field
settings (where situation trait relevance was expected to be stronger) and weaker in
laboratory settings (where situation trait relevance was expected to be weaker).
Limitations and future directions
Several of our moderator analyses were based on a small number of independent samples, with some
including as few as three independent correlations. A small number of studies in a meta-analysis is not problematic for interpreting the sample size-weighted mean correlation.
Indeed, pooling findings across even a small number of studies yields a more stable
estimate of the effect size than interpreting the findings of any single study in
isolation (Valentine et al., 2010). However, a small number of studies is likely to
yield biased estimates of the standard deviation of correlations because of second-order
sampling error (Hunter & Schmidt, 2004). As a result, we cannot draw strong
conclusions about the stability (or lack thereof) of the effect sizes observed in our
moderator analyses across samples. Nonetheless, the point estimates observed in our moderator analyses are readily interpretable and provide useful information to performance
appraisal scholars.
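To make the second-order sampling error point concrete, the sketch below implements a "bare-bones" analysis in the spirit of Hunter and Schmidt (2004): the sample size-weighted mean correlation, the sample size-weighted variance of the observed correlations, the variance expected from sampling error alone, (1 − r̄²)²/(N̄ − 1), and the residual between them. It is an illustrative sketch, not the analysis code used in this study: it omits corrections for measurement error and other artifacts, and the function names and input correlations are hypothetical.

```python
"""Bare-bones meta-analysis sketch (sampling-error correction only)."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BareBonesResult:
    k: int                     # number of independent correlations
    total_n: int               # total sample size across studies
    mean_r: float              # sample size-weighted mean correlation
    observed_var: float        # sample size-weighted variance of observed correlations
    sampling_error_var: float  # variance expected from sampling error alone
    residual_var: float        # observed minus expected variance, floored at zero


def bare_bones_meta(rs: list[float], ns: list[int]) -> BareBonesResult:
    """Hunter-Schmidt style bare-bones estimates, without artifact corrections."""
    if not rs or len(rs) != len(ns):
        raise ValueError("need non-empty, equal-length lists of correlations and sample sizes")
    k = len(rs)
    total_n = sum(ns)
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    observed_var = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    mean_n = total_n / k
    sampling_error_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    residual_var = max(observed_var - sampling_error_var, 0.0)
    return BareBonesResult(k, total_n, mean_r, observed_var, sampling_error_var, residual_var)


# Hypothetical inputs: three studies, as in the smallest moderator subgroups.
result = bare_bones_meta(rs=[0.30, 0.10, 0.20], ns=[100, 60, 80])
print(result)
```

With these hypothetical inputs, the weighted mean correlation is 52/240 ≈ .217, but the observed variance (≈ .0064) is smaller than the variance expected from sampling error alone (≈ .0115), so the residual variance is truncated to zero. With only three correlations, the observed variance is itself an unstable estimate, which is the second-order sampling error problem described in the text.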
While our review included many studies of rater personality–performance rating relationships, many relevant moderators were not addressed in the available sample of
studies. For instance, we considered situational strength as a moderator of rater
personality–performance rating relationships, operationalized as appraisal purpose and accountability. However, treatments or interventions can also be considered strong to the
extent that they ‘lead all persons to construe the particular events the same way, induce
uniform expectancies regarding the most appropriate response pattern, provide adequate incentives for the performance of that response pattern, and instill the skills
necessary for its satisfactory construction and execution’ (Mischel, 1973, p. 276). Thus,
interventions such as rater training may be considered strong and therefore may also be
likely to reduce rater personality–performance rating relationships. However, few studies included any form of rater training, and thus, we could not examine this as a moderator.
Future research should consider how rater training and other relevant interventions might
reduce the influence of rater personality traits on performance ratings.
A related issue is that some levels of the moderators were poorly represented, or not represented at all, in our analyses. For example, we identified only one study of the rater
extraversion–performance rating relationship in which accountability was high. We also faced a study setting–appraisal purpose confound in our analyses involving agreeableness; therefore, the results of the appraisal purpose moderator analysis
pertaining to agreeableness are not readily interpretable. Thus, while the present review
summarized the extant literature, it also points to the need for a number of future primary
studies. For instance, it would be useful for future research to explicitly test appraisal
purpose as a moderator of the rater personality–performance rating relationships to clarify the effects observed here.
We focused exclusively on the Big Five factors as opposed to their facets. While a
number of studies assessed narrow personality facets, very few assessed the same facets,
and therefore, we could not cumulate results across studies for the same narrow trait.
However, we believe that narrow personality facets likely play a role in driving the rater’s
reaction to the rating context and in shaping rating behaviour. For instance, in terms of
extraversion, the sociability facet may drive high-performance ratings as raters who are
high in sociability may be likely to form favourable work relationships with their ratees (John et al., 2008), which research suggests can positively influence performance ratings
(Duarte et al., 1994). By contrast, the dominance facet is relevant to different
contextual features. For instance, raters who are high in dominance may react to political
context features of performance ratings (Ferris et al., 2008) and may use performance
ratings as an opportunity to punish ratees for insubordinate behaviours (Longenecker,
Sims, & Gioia, 1987; Murphy & Cleveland, 1995), resulting in negative rater dominance–performance rating relationships under some circumstances. This line of reasoning
suggests that the effect of rater personality on performance ratings may be more nuanced and complex when dealing with narrow personality traits (as opposed to the broader
personality factors, which may influence behaviour through broader mechanisms; Barrick,
Stewart, & Piotrowski, 2002).
Similarly, not all traits are well represented within the Five-Factor Model, and research
is needed to consider traits that fall outside of this framework yet bear theoretical
relevance when considered in relation to the rating context. For instance, Ferris et al.
(2008) noted that rater emotional intelligence and political skill may have implications for
performance rating behaviours, and we agree with this notion. Furthermore, several studies have indicated that task-specific individual differences, such as rating self-efficacy
(Bernardin & Villanova, 2005) and performance appraisal discomfort (Villanova,
Bernardin, Dahmus, & Sims, 1993), can influence performance rating behaviour. We
urge future research to continue investigating the role of these traits as predictors of
performance ratings.
While our focus was on the effect of rater personality traits on performance ratings, it
would be useful for research to consider the effect of rater–ratee personality similarity on performance ratings as well. Rater–ratee personality similarity could potentially influence performance ratings through its effect on psychological distance (e.g., perceptions of similarity; Ferris et al., 2008). Furthermore, Bauer and Green (1996) found support for a
model whereby leader–follower trait affectivity congruence influenced the leader's perceptions of follower performance through affect and perceptions of trustworthiness.
However, in a study that drew on the Five-Factor Model of personality, Strauss et al.
(2001) found that only perceived personality similarity, not actual personality
similarity, had a significant effect on performance ratings. Overall, only a handful of studies
have investigated the role of rater–ratee personality similarity in performance ratings, and more research along these lines is needed.
Conclusion
While a sizeable literature has examined the effect of rater personality on
performance ratings, few definitive answers existed because the findings were mixed.
We used meta-analytic methods to clarify this literature, finding evidence for generalizable
relationships between several of the Big Five personality factors and performance ratings. Consistent with the trait activation and situational strength literatures, we found that
the relationships were more robust when (1) studies were conducted in field settings as
opposed to laboratory settings, (2) ratings were collected for research/developmental
purposes as opposed to administrative purposes, and (3) accountability was low as
opposed to high. Cumulatively, based on our comprehensive meta-analytic review of this
literature, we conclude that when it comes to accounting for variance in performance
ratings, rater personality does indeed matter.
References
References marked with an asterisk (*) were included in the meta-analysis.
Aguinis, H. (2013). Performance management (3rd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. (2011). Debunking myths and urban legends about meta-analysis. Organizational Research Methods, 14, 306–331. doi:10.1177/1094428110375720
Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917–1992. Journal of Applied Psychology, 77, 836–874. doi:10.1037/0021-9010.77.6.836
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26. doi:10.1111/j.1744-6570.1991.tb00688.x
Barrick, M. R., Stewart, G. L., & Piotrowski, M. (2002). Personality and job performance: Test of the mediating effects of motivation among sales representatives. Journal of Applied Psychology, 87(1), 43–51. doi:10.1037/0021-9010.87.1.43
Bauer, T. N., & Green, S. G. (1996). Development of leader-member exchange: A longitudinal test. Academy of Management Journal, 39, 1538–1567. doi:10.2307/257068
Bell, S. T. (2007). Deep-level composition variables as predictors of team performance: A meta-analysis. Journal of Applied Psychology, 92, 595–615. doi:10.1037/0021-9010.92.3.595
*Bernardin, H. J., Cooke, D. K., & Villanova, P. (2000). Conscientiousness and agreeableness as predictors of rating leniency. Journal of Applied Psychology, 85, 232–234. doi:10.1037/0021-9010.85.2.232
*Bernardin, H. J., & Orban, J. A. (1990). Leniency effect as a function of rating format, purpose for
appraisal, and rater individual differences. Journal of Business and Psychology, 5, 197–211. doi:10.1007/BF01014332
*Bernardin, H. J., Tyler, C. L., & Villanova, P. (2009). Rating level and accuracy as a function of rater personality. International Journal of Selection and Assessment, 17, 300–310. doi:10.1111/j.1468-2389.2009.00472.x
Bernardin, H. J., & Villanova, P. (2005). Research streams in rater self-efficacy. Group and Organization Management, 30(1), 61–88. doi:10.1177/1059601104267675
Blickle, G., Meurs, J. A., Zettler, I., Solga, J., Noethen, D., Kramer, J., & Ferris, G. R. (2008). Personality, political skill, and job performance. Journal of Vocational Behavior, 72, 377–387. doi:10.1016/j.jvb.2007.11.008
*Bravo, I. M. (1994). Rater’s personality as a moderator of context effects in performance
appraisals (Unpublished master’s thesis). Florida International University, Miami, FL.
Bretz, R. D., Milkovich, G. T., & Read, W. (1992). The current state of performance appraisal research and practice: Concerns, directions, and implications. Journal of Management, 18, 321–352. doi:10.1177/014920639201800206
Brookings, J. B., Zembar, M. J., & Hochstetler, G. M. (2003). An interpersonal circumplex/five-factor analysis of the Rejection Sensitivity Questionnaire. Personality and Individual Differences, 34, 449–461. doi:10.1016/S0191-8869(02)00065-X
*Chambers, B. A. (2003). Leniency bias in the performance appraisal process: A person × situation perspective (Doctoral dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 3092122).
Cleveland, J. N., Murphy, K. R., & Williams, R. E. (1989). Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74(1), 130–135. doi:10.1037/0021-9010.74.1.130
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. doi:10.1037/0033-2909.112.1.155
Cooper, W. H., & Withey, M. J. (2009). The strong situation hypothesis. Personality and Social Psychology Review, 13, 62–72. doi:10.1177/1088868308329378
Costa, P. T. Jr, & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor (NEO-FFI) Inventory professional manual. Odessa, FL: Psychological Assessment Resources.
DeYoung, C. G., Peterson, J. B., & Higgins, D. M. (2005). Sources of openness/intellect: Cognitive and neuropsychological correlates of the fifth factor of personality. Journal of Personality, 73, 825–858. doi:10.1111/j.1467-6494.2005.00330.x
Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41, 417–440. doi:10.1146/annurev.ps.41.020190.002221
Duarte, N. T., Goodson, J. R., & Klich, N. R. (1994). Effects of dyadic quality and duration on performance appraisal. Academy of Management Journal, 37, 499–521. doi:10.2307/256698
Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of conscientiousness in the prediction of job performance: Examining the intercorrelations and the incremental validity of narrow traits. Journal of Applied Psychology, 91(1), 40–57. doi:10.1037/0021-9010.91.1.40
Duval, S. J., & Tweedie, R. L. (2000a). A nonparametric “trim and fill” method for accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89–98. doi:10.1080/01621459.2000.10473905
Duval, S. J., & Tweedie, R. L. (2000b). Trim and fill: A simple funnel plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463. doi:10.1111/j.0006-341X.2000.00455.x
Eby, L. T., Allen, T. D., Evans, S. C., Ng, T., & DuBois, D. (2008). Does mentoring matter? A
multidisciplinary meta-analysis comparing mentored and non-mentored individuals. Journal of
Vocational Behavior, 72, 254–267. doi:10.1016/j.jvb.2007.04.005
Fama, E. F. (1980). Agency problems and the theory of the firm. The Journal of Political Economy, 88, 288–307.
Fama, E. F., & Jensen, M. (1983). Separation of ownership and control. Journal of Law and Economics, 26, 301–325.
Feldman, J. M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66, 127–148. doi:10.1037/0021-9010.66.2.127
Ferris, G. R., Munyon, T. P., Basik, K., & Buckley, M. R. (2008). The performance evaluation context: Social, emotional, cognitive, political, and relationship components. Human Resource Management Review, 18, 146–163. doi:10.1016/j.hrmr.2008.07.006
Fletcher, C. (2001). Performance appraisal and management: The developing research agenda. Journal of Occupational and Organizational Psychology, 74, 473–487. doi:10.1348/096317901167488
Foldes, H. J., Duehr, E. E., & Ones, D. S. (2008). Group differences in personality: Meta-analyses comparing five U.S. racial groups. Personnel Psychology, 61, 579–616. doi:10.1111/j.1744-6570.2008.00123.x
Forret, M. L., & Dougherty, T. W. (2001). Correlates of networking behavior for managerial and professional employees. Group and Organization Management, 26, 283–311. doi:10.1177/1059601101263004
*Fried, Y., Levi, A. S., Ben-David, H. A., Tiegs, R. B., & Avital, N. (2000). Rater positive and negative mood predisposition as predictors of performance ratings of ratees in simulated and real organizational settings: Evidence from US and Israeli samples. Journal of Occupational and Organizational Psychology, 73, 373–378. doi:10.1348/096317900167083
George, J. M. (1990). Personality, affect, and behavior in groups. Journal of Applied Psychology, 75, 107–116. doi:10.1037/0021-9010.75.2.107
Goldberg, L. R. (1992). The development of markers for the big-five factor structure. Psychological Assessment, 4, 26–42. doi:10.1037/1040-3590.4.1.26
*Griffin, B. N. (2010). Interviewer bias in medical school selection. Medical Journal of Australia, 193, 343–346.
*Guven, L. (2007). The effect of positive core self and external evaluations on performance appraisal (Unpublished master's thesis). Middle East Technical University, Ankara, Turkey.
den Hartog, D. N., Boselie, P., & Paauwe, J. (2004). Performance management: A model and research agenda. Applied Psychology: An International Review, 53, 556–569. doi:10.1111/j.1464-0597.2004.00188.x
Hogan, R. (1991). Personality and personality assessment. In M. D. Dunnette & L. Hough (Eds.),
Handbook of industrial and organizational psychology (pp. 873–919). Chicago, IL: Rand McNally.
Hough, L. M., & Ones, D. S. (2001). The structure, measurement, validity, and use of personality variables in industrial, work, and organizational psychology. In N. R. Anderson, D. S. Ones, H. K. Sinangil & C. Viswesvaran (Eds.), International handbook of work and organizational psychology (pp. 233–377). Thousand Oaks, CA: Sage.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Newbury Park, CA: Sage.
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of Applied Psychology, 85, 869–879. doi:10.1037/0021-9010.85.6.869
Ilgen, D. R., Barnes-Farrell, J. L., & McKellin, D. B. (1993). Performance appraisal process research in the 1980s: What has it contributed to appraisals in use? Organizational Behavior and Human Decision Processes, 54, 321–368. doi:10.1006/obhd.1993.1015
Jawahar, I. M., & Williams, C. R. (1997). Where all the children are above average: The performance appraisal purpose effect. Personnel Psychology, 50, 905–925. doi:10.1111/j.1744-6570.1997.tb01487.x
Jensen-Campbell, L. A., & Graziano, W. G. (2001). Agreeableness as a moderator of interpersonal
conflict. Journal of Personality, 69, 323–361. doi:10.1111/1467-6494.00148
John, O. P., Naumann, L. P., & Soto, C. J. (2008). Paradigm shift to the integrative big-five trait
taxonomy: History, measurement, and conceptual issues. In O. P. John, R. W. Robins & L. A.
Pervin (Eds.), Handbook of personality: Theory and research (pp. 114–158). New York, NY: Guilford Press.
Johnson, J. W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate Behavioral Research, 35, 1–19. doi:10.1207/S15327906MBR3501_1
Judge, T. A., & Ilies, R. (2002). Relationship of personality to performance motivation: A meta-analytic review. Journal of Applied Psychology, 87, 797–807. doi:10.1037/0021-9010.87.4.797
Kane, J. S., Bernardin, H. J., Villanova, P., & Peyrefitte, J. (1995). Stability of rater leniency: Three studies. Academy of Management Journal, 38, 1036–1051. doi:10.2307/256619
Klimoski, R., & Inks, L. (1990). Accountability forces in performance appraisal. Organizational Behavior and Human Decision Processes, 45, 194–208. doi:10.1016/0749-5978(90)90011-W
*Kovacs, R., & Kapel, D. E. (1976). Personality correlates of faculty and course evaluations. Research in Higher Education, 5, 335–344. doi:10.1007/BF00993432
*Kuang, D. C. Y. (2004). Assessing performance: Investigation of the influence of item context and rater personality (Doctoral dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 3150616).
Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87(1), 72–107. doi:10.1037/0033-2909.87.1.72
LeBreton, J. M., Hargis, M. B., Griepentrog, B., Oswald, F. L., & Ployhart, R. E. (2007). A multidimensional approach for evaluating variables in organizational research and practice. Personnel Psychology, 60, 475–498. doi:10.1111/j.1744-6570.2007.00080.x
LeBreton, J. M., Ployhart, R. E., & Ladd, R. T. (2004). A Monte Carlo comparison of relative importance methodologies. Organizational Research Methods, 7, 258–282. doi:10.1177/1094428104266017
Levy, P. E., & Williams, J. R. (2004). The social context of performance appraisal: A review and framework for the future. Journal of Management, 30, 881–905. doi:10.1016/j.jm.2004.06.005
Longenecker, C. O., Sims, H. P., & Gioia, D. A. (1987). Behind the mask: The politics of employee appraisal. The Academy of Management Executive, 1, 183–193. doi:10.5465/AME.1987.4275731
Mero, N. P., & Motowidlo, S. J. (1995). Effects of rater accountability on the accuracy and the favorability of performance ratings. Journal of Applied Psychology, 80, 517–524. doi:10.1037/0021-9010.80.4.517
Mischel, W. (1973). Toward a cognitive social learning reconceptualization of personality. Psychological Review, 80, 252–283. doi:10.1037/h0035002
Mischel, W. (1977). The interaction of person and situation. In D. Magnusson & N. S. Endler (Eds.), Personality at the crossroads: Current issues in interactional psychology (pp. 333–352). Hillsdale, NJ: Lawrence Erlbaum.
Mischel, W., & Shoda, Y. (1995). A cognitive-affective system theory of personality: Reconceptualizing situations, dispositions, dynamics, and invariance in personality structure. Psychological Review, 102, 246–268. doi:10.1037/0033-295X.102.2.246
Munyon, T. P., Summers, J. K., Thompson, K. M., & Ferris, G. R. (2014). Political skill and work outcomes: A theoretical extension, meta-analytic investigation, and agenda for the future. Personnel Psychology, 68(1), 143–184. doi:10.1111/peps.12066
Murphy, K. R. (2008). Explaining the weak relationship between job performance and ratings of job performance. Industrial and Organizational Psychology, 1, 148–160. doi:10.1111/j.1754-9434.2008.00030.x
Murphy, K. R., & Cleveland, J. N. (1995). Understanding performance appraisal: Social,
organizational, and goal-based perspectives. Thousand Oaks, CA: Sage.
Murphy, K. R., Herr, B. M., Lockhart, M. C., & Maguire, E. (1986). Evaluating the performance of paper people. Journal of Applied Psychology, 71, 654–661. doi:10.1037/0021-9010.71.4.654
*Naffziger, D. W. (1983). Personality and biodata correlates of rater effectiveness (Doctoral
dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 8400923).
O’Boyle, E. H. Jr, Humphrey, R. H., Pollack, J. M., Hawver, T. H., & Story, P. A. (2010). The relation between emotional intelligence and job performance: A meta-analysis. Journal of Organizational Behavior, 32, 788–818. doi:10.1002/job.714
Ones, D. S. (1993). The construct validity of integrity tests (Doctoral dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 9404524).
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in organizational settings. Personnel Psychology, 60, 995–1027. doi:10.1111/j.1744-6570.2007.00099.x
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679. doi:10.1037/0021-9010.81.6.660
*Phillips, B. N. (1960). Authoritarian, hostile, and anxious students’ ratings of an instructor. California Journal of Educational Research, 11(1), 19–23.
R Development Core Team (2010). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
*Randall, R., & Sharples, D. (2012). The impact of rater agreeableness and rating context on the evaluation of poor performance. Journal of Occupational and Organizational Psychology, 85, 42–59. doi:10.1348/2044-8325.002002
Roberts, B. W. (2009). Back to the future: Personality and assessment and personality development. Journal of Research in Personality, 43, 137–145. doi:10.1016/j.jrp.2008.12.015
Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychological Bulletin, 126(1), 3–25. doi:10.1037/0033-2909.126.1.3
Roberts, B. W., & Jackson, J. J. (2008). Sociogenomic personality psychology. Journal of Personality, 76, 1523–1544. doi:10.1111/j.1467-6494.2008.00530.x
Roberts, B. W., Lejuez, C., Krueger, R. F., Richards, J. M., & Hill, P. L. (2014). What is conscientiousness and how can it be assessed? Developmental Psychology, 50(5), 1315–1330. doi:10.1037/a0031109
*Roch, S. G., Ayman, R., Newhouse, N., & Harris, M. (2005). Effect of identifiability, rating audience, and conscientiousness on rating level. International Journal of Selection and Assessment, 13(1), 53–62. doi:10.1111/j.0965-075X.2005.00299.x
Saucier, G., & Ostendorf, F. (1999). Hierarchical subcomponents of the Big Five personality factors: A cross-language replication. Journal of Personality and Social Psychology, 76, 613–627. doi:10.1037/0022-3514.76.4.613
*Schneider, C. E. (1977). Operational utility and psychometric characteristics of behavioral
expectations scales: A cognitive reinterpretation. Journal of Applied Psychology, 62, 541–548. doi:10.1037/0021-9010.62.5.541
Shore, T. H., & Tashchian, A. (2002). Accountability forces in performance appraisal: Effects of self-appraisal information, normative information, and task performance. Journal of Business and Psychology, 17, 261–274. doi:10.1023/A:1019689616654
Smither, J. W., London, M., & Reilly, R. R. (2005). Does performance improvement follow multisource feedback? A theoretical model, meta-analysis, and review of empirical findings. Personnel Psychology, 58, 33–66. doi:10.1111/j.1744-6570.2005.514_1.x
*Stephens, O. D. (2001). Capturing the policy that air force raters use when writing performance appraisals on junior officers (Unpublished master’s thesis). Air Force Institute of Technology, Wright-Patterson Air Force Base, OH.
Stewart, G. G., & Barrick, M. R. (2004). Four lessons learned from the person-situation debate: A review and research agenda. In B. Schneider & D. B. Smith (Eds.), Personality and organizations (pp. 61–85). Mahwah, NJ: Erlbaum.
*Strauss, J. P., Barrick, M. R., & Connerley, M. L. (2001). An investigation of personality similarity effects (relational and perceived) on peer and supervisor ratings and the role of familiarity and liking. Journal of Occupational and Organizational Psychology, 74, 637–657. doi:10.1348/096317901167569
Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job performance. Journal of Applied Psychology, 88, 500–517. doi:10.1037/0021-9010.88.3.500
Tett, R. P., & Guterman, H. A. (2000). Situation trait relevance, trait expression, and cross-situational consistency: Testing a principle of trait activation. Journal of Research in Personality, 34, 397–423. doi:10.1006/jrpe.2000.2292
Tonidandel, S., & LeBreton, J. M. (2011). Relative importance analysis: A useful supplement to regression analysis. Journal of Business and Psychology, 26, 1–9. doi:10.1007/s10869-010-9204-3
*Tziner, A., Murphy, K. R., & Cleveland, J. N. (2002). Does conscientiousness moderate the relationship between attitudes and beliefs regarding performance appraisal and rating behavior? International Journal of Selection and Assessment, 10, 218–224. doi:10.1111/1468-2389.00211
*Tziner, A., Murphy, K. R., & Cleveland, J. N. (2003). Personality moderates the relationship between context factors and rating behavior. Advances in Psychology Research, 22, 107–120.
Tziner, A., Murphy, K. R., & Cleveland, J. N. (2005). Contextual and rater factors affecting rating behavior. Group and Organization Management, 30, 89–98. doi:10.1177/1059601104267920
Valentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How many studies do you need? A primer on statistical power for meta-analysis. Journal of Educational and Behavioral Statistics, 35, 215–247. doi:10.3102/1076998609346961
Viechtbauer, W. (2010). Conducting meta-analysis in R with the metafor package. Journal of Statistical Software, 36, 1–48.
Villanova, P., Bernardin, H. J., Dahmus, S. A., & Sims, R. L. (1993). Rater leniency and performance appraisal discomfort. Educational and Psychological Measurement, 53, 789–799. doi:10.1177/0013164493053003023
Viswesvaran, C., & Ones, D. S. (1995). Theory testing: Combining psychometric meta-analysis and structural equation modeling. Personnel Psychology, 48, 865–885. doi:10.1111/j.1744-6570.1995.tb01784.x
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (2008). No new terrain: Reliability and construct
validity of job performance ratings. Industrial and Organizational Psychology, 1, 174–179. doi:10.1111/j.1754-9434.2008.00033.x
Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2002). The moderating influence of job performance dimensions on convergence of supervisory and peer ratings of job performance: Unconfounding construct level congruence and rating difficulty. Journal of Applied Psychology, 87, 345–354. doi:10.1037/0021-9010.87.2.345
*Wexley, K. N., & Youtz, M. A. (1985). Rater beliefs about others: Their effects on rating errors and rater accuracy. Journal of Occupational Psychology, 58, 265–275. doi:10.1111/j.2044-8325.1985.tb00200.x
Wherry, R. J. (1952). The control of bias in ratings: A theory of ratings. Columbus, OH: The Ohio
State Research.
*Yun, G. J., Donahue, L. M., Dudley, N. M., & McFarland, L. A. (2005). Rater personality, rating
format, and social context: Implications for performance appraisal ratings. International
Journal of Selection and Assessment, 13, 97–107. doi:10.1111/j.0965-075X.2005.00304.x
Received 5 November 2013; revised version received 28 July 2014