
Journal of Occupational and Organizational Psychology (2015), 88, 387–414

© 2014 The British Psychological Society

www.wileyonlinelibrary.com

Does rater personality matter? A meta-analysis of rater Big Five–performance rating relationships

Michael B. Harari¹*, Cort W. Rudolph² and Andrew J. Laginess³

¹Florida Atlantic University, Boca Raton, Florida, USA
²Saint Louis University, Saint Louis, Missouri, USA
³Florida International University, Miami, Florida, USA

We examined rater personality traits consistent with the Five-Factor Model as sources of

systematic non-performance variance in job performance ratings using meta-analysis

(k = 28). Several personality factors, including agreeableness, extraversion, and emotional stability, were related to performance ratings (ρ = .25, .12, and .12, respectively), and features of the rating context (e.g., study setting, appraisal purpose, accountability)

moderated these relationships. Cumulatively, the Big Five accounted for between 6% and

22% of the variance in performance ratings. Implications for performance appraisal

research and practice are discussed.

Practitioner points

• Performance ratings serve a number of important functions in organizations, and their construct validity is a central issue.

• We identified rater personality traits, consistent with the Five-Factor Model, as sources of non-performance variance in performance ratings.

• To the extent that job performance ratings are contaminated by rater personality traits, requiring raters to justify their ratings may result in criterion scores that reflect greater levels of criterion relevance.

Performance ratings serve central functions in organizations, with implications for

performance management and employee development (den Hartog, Boselie, & Paauwe, 2004; Smither, London, & Reilly, 2005), administrative decision-making (Cleveland,

Murphy, & Williams, 1989), and other human resource functions (e.g., strategic,

informational, maintenance, and documentation, see Aguinis, 2013, p. 18). Considering

the importance of performance ratings, their construct validity has been a crucial issue

and a large body of research on the subject exists (Austin & Villanova, 1992; Levy &

Williams, 2004; Murphy & Cleveland, 1995). While it is desirable for ratings of job

performance to be saturated by the ratee’s job-relevant behaviours, research suggests that

other sources contribute variance to performance ratings (Murphy, 2008). To identify and account for performance-irrelevant sources of variance, research has focused on the role

of the organization’s social context (Levy & Williams, 2004). At present, this research

*Correspondence should be addressed to Michael B. Harari, Department of Management, Florida Atlantic University, 777 Glades Rd., Boca Raton, FL 33431, USA (email: [email protected]). An earlier version of this paper was presented at the 2014 Society for Industrial and Organizational Psychology Conference, Honolulu, HI.

DOI:10.1111/joop.12086


indicates that features of the evaluation context can influence performance rating

processes and outcomes (Ferris, Munyon, Basik, & Buckley, 2008). By examining

performance evaluation in situ, this research has provided useful avenues for

understanding why performance ratings may be inaccurate and means by which the quality of ratings can be improved.

While features of the rating context have been highlighted as determinants of

performance ratings in several systematic reviews (e.g., Jawahar & Williams, 1997; Levy &

Williams, 2004), much less research has emphasized how rater personality traits impact

performance assessments. Personality traits reflect one’s propensities to behave in

characteristic ways in response to situational demands (e.g., Mischel & Shoda, 1995;

Roberts, 2009), and research suggests that personality traits guide behaviours in contexts

that are relevant for trait expression (Tett & Guterman, 2000). We argue that the situational demands posed by the performance rating context provide ample opportunity

for raters to behave in trait-relevant ways (Ferris et al., 2008; Tett & Burnett, 2003; Tziner,

Murphy, & Cleveland, 2005), and as a result, rater traits will influence performance rating

processes and outcomes. This issue bears on the validity of performance ratings, as rater

personality traits may act as a source of performance-irrelevant variance. Thus, research

into rater personality–performance rating relationships can have important implications for performance evaluation in organizations.

A number of researchers and theorists have speculated that rater traits play a role in the performance appraisal process. For instance, Landy and Farr (1980) proposed a model of

job performance ratings where rater individual differences influenced rater performance

appraisal strategies, which influenced performance ratings. Kane, Bernardin, Villanova,

and Peyrefitte (1995) suggested that performance ratings could ‘reflect personality or

information-processing differences among raters’ (p. 1047). In a review article, Tziner

et al. (2005) noted that personality traits ‘likely play a part in shaping rating behavior’ (p.

94). Furthermore, a number of empirical investigations along these lines have taken place

(e.g., Bernardin & Orban, 1990; Fried, Levi, Ben-David, Tiegs, & Avital, 2000; Randall & Sharples, 2012; Yun, Donahue, Dudley, & McFarland, 2005). However, despite the

research conducted thus far, there are several issues in this literature that preclude the

ability to draw meaningful conclusions.

One issue is that primary studies have examined the effects of a wide variety of rater

personality traits on performance ratings with little coherency across studies. For

instance, research has examined the effects of rater hostility (Phillips, 1960), need for

achievement (Kovacs & Kapel, 1976), cognitive complexity (Bernardin & Orban, 1990;

Schneider, 1977), self-esteem (Guven, 2007; Wexley & Youtz, 1985), and ego concern (Chambers, 2003), among others. Thus, this literature has proceeded in a fragmented

fashion and has lacked a systematic framework for organizing personality traits.

Another issue is that there are many inconsistent findings in this literature. For

instance, some research indicates a positive relationship between conscientiousness and

performance ratings (Roch, Ayman, Newhouse, & Harris, 2005), while other research

indicates a negative relationship (Bernardin, Cooke, & Villanova, 2000; Bernardin, Tyler,

& Villanova, 2009), and still other research indicates virtually no relationship (Tziner,

Murphy, & Cleveland, 2002). The same state of affairs is evident for other personality traits as well (e.g., extraversion; Bernardin et al., 2000, 2009; Strauss, Barrick, & Connerley,

2001).

Compounding this issue is that features of the rating context vary across studies. It is

well recognized that personality traits influence one’s characteristic manner of responding to certain situations, rather than a tendency to behave similarly across situations

(Mischel & Shoda, 1995; Roberts, 2009; Stewart & Barrick, 2004; Tett & Guterman, 2000).

Thus, features of the rating context should influence the effect of rater personality on

performance ratings and variation in contextual features likely contributes to the

inconsistent findings that are evident in this literature. As such, contextual features should be examined as moderators of the rater personality–performance rating relationships.

The purpose of this study is to address these issues by conducting a quantitative

literature review. Meta-analytic methods allow us to assess the extent to which mixed

findings are due to sampling error as opposed to the presence of moderators (Hunter &

Schmidt, 2004). Furthermore, our analytic approach allows us to test the effects of

theoretically relevant features of the rating context as moderators of the rater personality–performance rating relationships. Therefore, our approach will not only clarify the

existing literature, but may also result in a number of novel conclusions concerning the conditional influences of rater personality traits on performance ratings.
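To illustrate the logic of separating sampling error from genuine between-study variation, the following is a minimal Python sketch of a "bare-bones" meta-analysis in the spirit of Hunter and Schmidt (2004). It is not the analysis reported in this article: it applies no corrections for unreliability or other artefacts, so it yields a sample-size-weighted mean observed correlation rather than a corrected ρ. The function name `bare_bones_meta` and the input values are hypothetical and chosen only for illustration.

```python
from math import fsum


def bare_bones_meta(correlations, sample_sizes):
    """Summarize k observed correlations without artefact corrections.

    Returns the sample-size-weighted mean correlation, the weighted observed
    variance, the variance expected from sampling error alone, the residual
    variance, and the percentage of observed variance due to sampling error.
    """
    if not correlations or len(correlations) != len(sample_sizes):
        raise ValueError("Provide one sample size per correlation.")

    k = len(correlations)
    total_n = fsum(sample_sizes)
    pairs = list(zip(correlations, sample_sizes))

    # Sample-size-weighted mean observed correlation.
    mean_r = fsum(n * r for r, n in pairs) / total_n

    # Sample-size-weighted variance of the observed correlations.
    var_observed = fsum(n * (r - mean_r) ** 2 for r, n in pairs) / total_n

    # Variance expected from sampling error, based on the average sample size.
    mean_n = total_n / k
    var_error = (1 - mean_r ** 2) ** 2 / (mean_n - 1)

    # Variance left over after removing sampling error (floored at zero).
    var_residual = max(var_observed - var_error, 0.0)

    # Share of observed variance that sampling error can account for.
    if var_observed == 0:
        pct_error = 100.0
    else:
        pct_error = min(100.0, 100.0 * var_error / var_observed)

    return {
        "k": k,
        "mean_r": mean_r,
        "var_observed": var_observed,
        "var_error": var_error,
        "var_residual": var_residual,
        "pct_error": pct_error,
    }


# Hypothetical primary-study results (not data from this article).
summary = bare_bones_meta([0.31, 0.12, 0.25, -0.04], [120, 85, 210, 60])
print(summary)
```

When sampling error accounts for most of the observed variance, mixed findings across primary studies are plausibly artefactual; when a substantial share remains unexplained, a search for moderators is warranted.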

To classify the personality traits examined in prior studies consistent with a well-accepted taxonomy, we draw upon the Five-Factor Model of personality (cf. Costa &

McCrae, 1992; Digman, 1990). This approach has been useful for quantifying generalizable relationships between personality traits and workplace outcomes (e.g., Barrick &

Mount, 1991; Judge & Ilies, 2002). Note that we are not suggesting that traits that fall

outside of the Five-Factor Model are not useful for understanding rating behaviour.

Indeed, individual differences such as political skill and emotional intelligence may very well play a role as determinants of rating behaviour and outcomes (cf. Ferris et al., 2008).

However, the Five-Factor Model is a useful typology for organizing the existing

fragmented literature.

In the following paragraphs, we discuss the social context features of organizations

that are relevant to performance evaluation. Following this, we review the Five-Factor

Model of personality, discuss the relevance of these traits as determinants of performance

ratings in the light of the social context of performance appraisal, and form hypotheses

concerning rater personality–performance rating relationships. Finally, we discuss potential moderators of these relationships.

The social context of performance ratings

Research has recognized that the construct validity of performance ratings might be

problematic, and several approaches to dealing with this issue have subsequently

emerged (Murphy, 2008). Research first focused on measurement issues, suggesting that

the construct validity of performance ratings could be improved by, for example, developing better rating scales (cf. Austin & Villanova, 1992). However, this line of

research had limited success in improving performance ratings (Landy & Farr, 1980).

Following this, research began to focus on the cognitive processes involved in

performance appraisal, noting that raters were imperfect information processors and

that this could lead to errors and biases in the rating process and thus poor construct

validity of performance ratings (Feldman, 1981). However, this approach also had only

limited success in improving performance ratings in organizations (Ilgen, Barnes-Farrell,

& McKellin, 1993).

In response to the limited utility of these approaches, research began to shift focus

towards the performance evaluation context. That is, raters were perceived as actors in a

rich and complex organizational context that could influence their motivations such that

accuracy may be only a subsidiary goal (Murphy, 2008; Murphy & Cleveland, 1995).

By recognizing the rich context in which performance ratings occur, research has

developed a clearer understanding of the non-performance factors that influence

performance ratings in organizations (Ferris et al., 2008; Levy & Williams, 2004).

Murphy and Cleveland (1995) brought light to the role of the organizational

context in performance evaluation by noting that the rating context influences performance judgments, performance ratings, and the evaluation process. Murphy and

Cleveland’s model distinguished between distal variables (i.e., those that influence

performance evaluation indirectly) and proximal variables (i.e., those that influence

performance evaluation directly). Distal variables include, for example, the organizational structure, the legal climate, and workforce composition. Proximal variables

include, for example, the rater–ratee relationship, consequences of performance ratings, and rater accountability.

Levy and Williams (2004) conducted a cumulative review of the literature into the role of the organizational social context on performance ratings. In doing so, they expanded

upon Murphy and Cleveland’s (1995) typology to build a framework for summarizing the

existing literature. Similar to Murphy and Cleveland’s framework, Levy and Williams

distinguished between distal and proximal contextual determinants of performance

rating behaviour. However, the research also distinguished between two types of

proximal variables – process proximal variables (i.e., how the appraisal process is conducted) and structural proximal variables (i.e., formal design characteristics). Levy

and Williams’ review highlighted evidence suggesting that process and structural proximal variables influenced performance evaluation processes and rating behaviours.

More recently, Ferris et al. (2008) provided an integrative framework for understanding the contextual backdrop in organizations as it pertains to performance evaluation.

Specifically, Ferris et al.’s review noted that the context in which performance evaluation

occurs encompasses ‘cognitive, social and relationship, affective and emotional, and

political and relationship context features’ (p. 150). Rather than acting in isolation, each of

these context features interacts with one another continuously and dynamically in

shaping the evaluation context in an organization. Cognitive context refers to the rater’s cognitive processes involved in observing, encoding, storing, retrieving, and integrating

observations of a ratee’s performance. Social and relationship context concerns features

of the rater–ratee dyadic work relationship. Social influence and politics concerns the influence of deliberate manipulation by raters and ratees on performance ratings. Finally,

affect and emotion concerns the role of affective regard for a ratee (i.e., liking) on

performance ratings. Ferris et al. reviewed evidence suggesting that each of these

contextual features influenced performance ratings in organizations (see also Fletcher,

2001; Levy & Williams, 2004).

Personality and performance ratings

Consistent with Roberts (2009), we define personality as ‘relatively enduring patterns of

thoughts, feelings, and behaviors that reflect the tendency to respond in certain ways

under certain circumstances’ (p. 140). While personality traits themselves remain

relatively stable within person across appreciable amounts of time (Roberts &

DelVecchio, 2000), their influence on thoughts, feelings, and behaviours is shaped by environmental influences – features of the context in which the actor is embedded (Roberts & Jackson, 2008; Roberts, Lejuez, Krueger, Richards, & Hill, 2014). For example,

Tett and Guterman (2000) discussed the principle of trait activation. Trait activation

holds that ‘the behavioral expression of a trait requires arousal of that trait by trait-relevant

situational cues’ (Tett & Guterman, 2000, p. 398). Similarly, Mischel and Shoda (1995)

characterized personality as ‘a system of mediating processes, conscious and unconscious, whose interactions are manifested in predictable patterns of situation-behavior

relations’ (p. 247).

We argue that the contextual features that serve as a backdrop to performance evaluation provide a landscape for rater personality traits to manifest themselves in the

rating process (Tett & Guterman, 2000; Tziner et al., 2005). That is, in response to certain

contextual features, rater personality traits would influence their thoughts, feelings, and

rating behaviours, and thus, rater personality traits should influence performance rating

scores (Tett & Burnett, 2003; Tett & Guterman, 2000). This in situ theoretical perspective

helps to explain the dynamic interplay between rating context and rater personality, and

the joint influence of context and personality in the performance rating process.

Considering this, rater personality traits have the potential to influence the validity of performance ratings, which bears on critical decisions that are guided by performance

rating scores. Examining and understanding these relationships can ultimately improve

the construct validity of performance ratings by spurring research into interventions that

can reduce their influence. Thus, a focus on rater personality will enrich our

understanding of the determinants of performance ratings in organizations beyond the

influence of the rating context alone (Ferris et al., 2008).

As already noted, the Five-Factor Model of personality (also referred to as the Big

Five) has been proposed as an integrative framework for studying individual differences in personality and is among the most well-accepted taxonomies of personality in the

literature (Costa & McCrae, 1992; Digman, 1990). According to this perspective, the

latent structure of personality is hierarchical with five factors at the highest level

(Goldberg, 1992; Hough & Ones, 2001). The five factors are as follows: Agreeableness,

extraversion, emotional stability, conscientiousness, and openness. We review each of

these factors in the following sections and discuss features of the rating context that are

relevant for the expression of each of the five factors in a performance evaluation setting.

Note that positive personality–performance rating relationships are interpreted as reflecting rating elevation (i.e., higher scores on the personality trait are associated with

higher, or more elevated, performance ratings), while negative personality–performance rating relationships are interpreted as reflecting rating stringency (i.e., higher scores on

the personality trait are associated with lower, or more stringent, performance ratings;

Bernardin et al., 2000).

Agreeableness

Agreeableness reflects traits such as trust, altruism, and cooperation (Costa & McCrae,

1992). Agreeableness is related to a tendency to favour positive social relationships and to

avoid conflict (Jensen-Campbell & Graziano, 2001). Considering this, we suggest that the

social and relationship context features as reviewed by Ferris et al. (2008) provide

opportunities for rater agreeableness to influence performance ratings. For example,

delivering negative feedback to an employee regarding his or her performance can have a

negative influence on the rater–ratee work relationship (Murphy & Cleveland, 1995) and raters who are high in agreeableness may be motivated to avoid disturbing this social relationship. Along these lines, research suggests that raters may respond to a motivation

to avoid negative exchanges with ratees by inflating performance rating scores (Shore &

Tashchian, 2002).

Agreeableness is also associated with a tendency to be sympathetic and concerned

for the welfare of others (Costa & McCrae, 1992). Such tendencies are relevant for

performance evaluation behaviours. Performance appraisal has both short- and long-term consequences for employees. In terms of the former, Ferris et al. (2008) argued

that poor performance evaluations could negatively influence ratee affect, which in

turn could result in withdrawal and decreased relationship satisfaction. In terms of the latter, poor performance ratings could influence an employee’s career (e.g., opportunities for promotions or salary increases; Aguinis, 2013; Cleveland et al., 1989).

Raters who are high in agreeableness may inflate performance ratings to protect ratees

from the consequences associated with poor performance evaluations. Considering

this, we propose that agreeableness is positively related to performance ratings.

Hypothesis 1: Rater agreeableness is positively related to performance ratings.

Extraversion

Extraversion reflects traits such as sociability, assertiveness, and gregariousness and is

associated with a tendency to be friendly and to prefer the company of others (Costa &

McCrae, 1992). Research suggests that extraversion is positively related to networking

behaviours in organizations as well as social network size (Forret & Dougherty, 2001).

Such tendencies are relevant when considering the social and relationship context

features of performance evaluation (Ferris et al., 2008), as raters who are high in

extraversion would be likely to form favourable dyadic work relationships with their ratees (John, Naumann, & Soto, 2008). Along these lines, research suggests that rater–ratee relationship quality is positively related to performance ratings, beyond the

influence of objective performance levels (Duarte, Goodson, & Klich, 1994). Thus, as

extraverted raters are likely to form favourable relationships with their ratees, and

favourable rater–ratee relationships are associated with elevated performance ratings, raters who are high in extraversion may be likely to provide elevated performance ratings.

This line of reasoning suggests a positive rater extraversion–performance rating relationship.

Hypothesis 2: Rater extraversion is positively related to performance ratings.

Emotional stability

Emotional stability reflects traits such as calmness and even-temperedness (Costa &

McCrae, 1992). In discussing the proposed relationship between rater emotional

stability and performance ratings, it is useful to consider low emotional stability – neuroticism. Neuroticism encompasses traits such as anxiety and depression (John

et al., 2008) and is associated with a tendency to become easily angered and frustrated

by others (Costa & McCrae, 1992). Along these lines, research suggests that neuroticism

inhibits cooperation with others and may result in poor working relationships (George,

1990). This is relevant for performance rating behaviours, as neurotic raters may be

likely to have poor relationships with their ratees, which could result in lower

performance ratings (Duarte et al., 1994). As this perspective suggests that raters who

are low in emotional stability (i.e., high in neuroticism) should rate performance low, we propose that rater emotional stability (i.e., low neuroticism) is positively related to

performance ratings.

Hypothesis 3: Rater emotional stability is positively related to performance ratings.

Conscientiousness

Conscientiousness reflects traits such as dependability, thoroughness, and achievement

orientation (Costa & McCrae, 1992). Ferris et al.’s (2008) review noted that the

standards against which performance is evaluated could influence performance ratings

(see also Murphy & Cleveland, 1995). Employees who are high in conscientiousness

generally display superior job performance as compared to employees who are lower in this trait (Barrick & Mount, 1991; Hurtz & Donovan, 2000). As such, raters who are high

in conscientiousness may have relatively high standards for performance and may

therefore expect exceptional performance, which could result in lower performance

ratings. As a result, rater conscientiousness should be negatively related to performance

ratings.

Hypothesis 4: Rater conscientiousness is negatively related to performance ratings.

Openness

Openness reflects traits such as imagination, creativity, curiosity, and originality (Costa &

McCrae, 1992). Openness is relevant for performance appraisal in organizations in the light of the cognitive context features discussed in Ferris et al. (2008). Cognitive models

of performance appraisal indicate that rating errors result from the rater’s finite

information-processing capabilities. As openness is positively associated with cognitive

functioning (DeYoung, Peterson, & Higgins, 2005), raters who are high in openness may

be better able to integrate larger numbers of performance episodes into their

performance judgments (Murphy & Cleveland, 1995). That is, raters who are high in

openness may be less prone to cognitive biases that influence performance evaluation

(Feldman, 1981).

Openness is also related to a tendency to form more complex attributions for the

behaviours of others (Brookings, Zembar, & Hochstetler, 2003). In their review of the

performance rating context literature, Levy and Williams (2004) noted that ‘attributional

processing is an important element of the rating process and these attributions, in part,

determine raters’ reactions and ratings’ (p. 887). Thus, raters who are high in openness

may react to the performance rating context by deeply considering the causes and

meaning of a wide range of observed performance episodes. Considering this, rater

openness may facilitate accurate performance ratings that are neither elevated nor suppressed. Thus, there is little reason to predict a rater openness–performance rating relationship, and we therefore hypothesize the following:

Hypothesis 5: Rater openness is not related to performance ratings.

Moderators

Study setting

We have noted that the contextual landscape that characterizes the social system of

organizations provides the opportunity for personality traits to influence performance

rating behaviours (Ferris et al., 2008; Levy & Williams, 2004; Tett & Guterman, 2000).

While laboratory studies may be able to simulate some naturalistic context features of

performance appraisal, it is likely that the full spectrum of contextual features and

their dynamic interactions (cf. Ferris et al., 2008) would not be well represented. To

the extent that rater personality–performance rating relationships depend on such features, we would expect the magnitude of these effects to be larger in field studies

as compared to laboratory studies. For example, if a positive rater agreeableness–performance rating relationship occurs in response to interpersonal consequences

associated with poor performance ratings (Jensen-Campbell & Graziano, 2001;

Murphy & Cleveland, 1995), this effect may be reduced in laboratory settings where

there is no continued interaction among participants and therefore scant possibility

for long-term interpersonal consequences (see Bell, 2007 for an application of this

logic to research involving personality and team performance). As a result, we propose that study setting will moderate the rater personality–performance rating relationships such that the effects are stronger in field studies as compared to

laboratory studies.

Hypothesis 6: The rater personality–performance rating relationships are moderated by study setting such that the relationships are stronger in field settings

and weaker in laboratory settings.

Situational strength

Beyond the relevance of social context features for personality trait expression in

performance appraisal, it is also important to note that features of the situation can act to inhibit the effect of traits, regardless of the relevance of other contextual features for trait

expression (Tett & Burnett, 2003). Situational strength refers to the extent to which

demands placed on individuals by a situation act to constrain the influence of one’s

personality on their behaviour (Cooper & Withey, 2009; Mischel, 1977). A strong situation

is one in which situational demand characteristics pose pressure to behave in a certain

way, inducing conformity and reducing the influence of personality traits on behaviours

(Mischel, 1973). On the other hand, a weak situation is one in which demand

characteristics posed by the environment are low, allowing the individual to determine their own course of action in a given situation. Thus, weak situations better allow for

personality traits to influence behaviour. A number of features of a performance appraisal

context may influence situational strength, and we focus on two: Appraisal purpose and

accountability.

Appraisal purpose. Broadly speaking, research has considered two classes of purposes

for which performance ratings are collected: Administrative purposes (e.g., promotion, raises) and research/developmental purposes (e.g., feedback, training; Cleveland et al.,

1989; Levy & Williams, 2004). When rating performance for administrative purposes,

raters are making decisions that will potentially have a vast impact on the ratee’s career.

The results of the performance appraisal may be permanently included in the ratee’s

records and could influence their chances at receiving a promotion or raise and may

contribute towards their termination. Thus, when providing administrative ratings,

raters face strong situational pressures to, for example, inflate or distort performance

ratings in some way (Jawahar & Williams, 1997). Considering this, a situation in which a

rater is evaluating performance for administrative purposes could be characterized as

strong. In such a situation, the rater personality–performance rating relationships may be reduced. On the other hand, when ratings are collected for research or developmental purposes, many of the aforementioned situational pressures are alleviated. Thus, performance appraisals conducted under research or developmental conditions

represent a weak situation, allowing rater personality traits to influence performance

ratings.

Hypothesis 7: The rater personality–performance rating relationships are moderated by appraisal purpose such that the relationships are attenuated when

ratings are collected for administrative purposes and strengthened when

ratings are collected for research or developmental purposes.

Accountability. Accountability refers to the extent to which raters are held responsible

for their ratings of an employee (Levy & Williams, 2004). For instance, when raters must

justify their ratings to others, accountability would be high (Wherry, 1952). Research

suggests that holding raters accountable for their ratings represents a strong situation that

influences performance rating outcomes (Klimoski & Inks, 1990; Mero & Motowidlo, 1995). Indeed, as raters are pressed to explain their performance ratings, they face

stronger situational pressures to rate performance accurately. As a result, high

accountability could be characterized as representing a strong situation that reduces

the influence of rater traits on performance ratings. On the other hand, when

accountability is low, raters are free from situational pressures that necessitate accurate

ratings. Therefore, we consider low accountability as reflecting a weak situation that

would allow for rater personality traits to drive rating behaviour. Thus, we suggest that the

rater personality–performance rating relationships will be attenuated when accountability is high and strengthened when accountability is low.

Hypothesis 8: The rater personality–performance rating relationships are moderated by accountability such that the relationships are attenuated when

accountability is high and strengthened when accountability is low.

In summary, the purpose of this study is to use meta-analytic methods to assess the

relationships between rater personality traits and performance ratings. In doing so, we

invoke the Five-Factor Model of personality as an organizing framework for classifying personality traits examined in the existing literature. We also examine the influence of a

number of theoretically relevant moderators on these relationships, including study

setting, appraisal purpose, and accountability.

Method

Literature search

To identify relevant studies, we first conducted a literature search using ProQuest,

PsycINFO, Google Scholar, and ABI-Inform. We searched for articles that included any

of the following: Rater individual differences or rater personality. We also searched

these databases for articles that contained various combinations of rater, rating,

individual differences, Big Five, five-factor model, personality, openness, conscientiousness, extraversion, agreeableness, neuroticism, and emotional stability.

Furthermore, we identified additional studies by reviewing the reference sections of all

relevant articles returned by the literature search. This process yielded a total of 304

articles.

Criteria for inclusion

To determine which of these 304 articles would be included in our analyses, we employed

the following decision rules. First, the dependent variable in the study had to be a rating of

another person’s performance, including performance on the job, work samples or

simulations (either in a field or laboratory setting), and vignettes describing a hypothetical

employee’s performance. Studies that reported ratings of traits, clinical assessments,

problematic behaviour, leadership style, and other non-performance variables were excluded, as were any studies that examined self-ratings of performance. Second, the

study had to include as an independent variable a personality measure that could be

classified as openness, conscientiousness, extraversion, agreeableness, or emotional

stability. Third, the study had to report a zero-order correlation between the personality

variable and mean performance ratings across ratees or enough statistical information so

that it could be computed. A total of 21 studies (and 28 independent samples) met these

inclusion criteria and were included in our analyses.

Coding scheme

Prior to coding, the first and third authors independently classified each of the personality

measures used in each study as an indicator of openness, conscientiousness, extraversion,

agreeableness, or emotional stability. Many studies included explicit measures of the Big

Five personality factors (e.g., the NEO-PI-R; Costa & McCrae, 1992), and in these cases, no

judgment calls were made on the part of the authors. Where explicit measures of the Five-Factor

Model were not used, the authors consulted the taxons proposed by Hough and Ones (2001). The Hough and Ones taxons were developed to guide meta-analytic research

that involves personality traits by explicating the Big Five personality factor assessed by a

variety of commercial personality scales. Since the introduction of the taxons, they have

been drawn upon in a number of meta-analyses involving personality (e.g., Dudley, Orvis,

Lebiecki, & Cortina, 2006; Foldes, Duehr, & Ones, 2008). When a measure was not

explicitly developed to assess one of the Big Five factors and was not included in the

Hough and Ones taxons, the two authors (1) reviewed the scale development papers and

scale items to determine what (if any) Big Five factor the measure was assessing and (2) searched the literature for statistical evidence that supported a classification of the

measure within the Big Five framework (e.g., factor analytic studies that included the

measure and established measures of the Big Five, correlations between the measure and

established measures of the Big Five). Agreement between the two authors was initially

94%, and the discrepancies were resolved between the two authors and the second

author. Ultimately, 100% agreement for all variables was reached.

Next, the first and third authors independently coded each study for sample size, effect

size, measurement reliability, and moderators. Measurement reliability for both the predictor and criterion measures was operationalized as internal consistency (i.e.,

coefficient alpha). Where studies reported effect sizes between multiple measures of the

same predictor and criterion, composites were formed according to the methods outlined

by Hunter and Schmidt (2004). If the information needed to compute composites was

unavailable, we computed composites by averaging the effect sizes. We computed

composite reliabilities using the Spearman–Brown formula or, where the needed information was not available, by averaging the coefficient alpha reliability estimates

reported for each measure.
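The composite steps described above can be illustrated with a minimal sketch. It assumes the standard formula for the correlation between a criterion and a unit-weighted sum of standardized measures, and the textbook Spearman–Brown formula. The function names, the choice of the mean intercorrelation as the Spearman–Brown input, and all numeric inputs are hypothetical and are not taken from the primary studies.

```python
from math import sqrt


def composite_correlation(r_xy, mean_r_xx):
    """Correlation between y and a unit-weighted sum of n standardized measures.

    r_xy      : correlations of each measure with the criterion y
    mean_r_xx : average intercorrelation among the n measures
    """
    n = len(r_xy)
    return sum(r_xy) / sqrt(n + n * (n - 1) * mean_r_xx)


def spearman_brown(mean_r, n):
    """Spearman-Brown reliability of a composite of n components."""
    return n * mean_r / (1 + (n - 1) * mean_r)


def simple_average(values):
    """Fallback when the intercorrelations needed above are not reported."""
    return sum(values) / len(values)


# Hypothetical primary study: two scales of the same trait, one rating criterion.
r_xy = [0.20, 0.30]
print(composite_correlation(r_xy, mean_r_xx=0.50))  # composite effect size
print(simple_average(r_xy))                         # fallback effect size
print(spearman_brown(mean_r=0.50, n=2))             # composite reliability
```

With positively correlated components, the composite correlation exceeds the simple average, which is why the averaging fallback is the more conservative of the two options.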

In terms of the moderators, performance appraisal purpose and accountability were coded only if the primary study was explicit about these features. Agreement

between the two authors was initially 97%, and all discrepancies were resolved between

the two authors and the second author.1

Analyses

We conducted the meta-analysis according to the procedures outlined by Hunter and

Schmidt (2004). This method allowed us to estimate construct-level relationships by correcting observed relationships for statistical artefacts, specifically sampling error and

measurement reliability. First, we calculated the sample size-weighted mean correlation

for performance ratings with each of the personality traits. We then estimated the

population correlation (i.e., ρ) by correcting each mean correlation for measurement reliability in both the predictor and criterion using artefact distributions, as reliability

information was only sporadically available. Direct range restriction was not evident in

any studies, and therefore, we made no such corrections. Finally, we estimated 95%

confidence intervals (using the methods outlined by Viswesvaran, Schmidt, & Ones, 2002) and 80% credibility intervals around each estimate of ρ, as well as the percentage variance accounted for by statistical artefacts (i.e., sampling error and unreliability in the

predictor and criterion measures) for each estimate of ρ. To evaluate the presence of moderators, we employed the 75% rule (i.e., a moderator is likely to be present when the

percentage variance accounted for by statistical artefacts is <75%; Hunter & Schmidt, 2004). The same analytic procedure was repeated for each level of each moderator in the

moderator analyses.
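The analytic sequence described above can be sketched in simplified form. The sketch assumes a multiplicative artefact-distribution correction in the spirit of Hunter and Schmidt (2004). The function name, the dictionary keys, and all input values are hypothetical. It omits the confidence interval computation and should be read as an illustration of the logic, not as the procedure used to produce the reported estimates.

```python
from math import sqrt
from statistics import mean, pvariance


def hunter_schmidt(rs, ns, rxx, ryy):
    """Simplified artefact-distribution meta-analysis of correlations.

    rs, ns   : observed correlations and sample sizes, one per sample
    rxx, ryy : available reliability estimates for predictor and criterion
    """
    k, total_n = len(rs), sum(ns)
    # Sample size-weighted mean and variance of the observed correlations
    r_bar = sum(n * r for r, n in zip(rs, ns)) / total_n
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone (uses the average sample size)
    var_err = (1 - r_bar ** 2) ** 2 / (total_n / k - 1)

    # Artefact distributions: square roots of the reliabilities
    a = [sqrt(v) for v in rxx]
    b = [sqrt(v) for v in ryy]
    atten = mean(a) * mean(b)          # compound attenuation factor
    rho = r_bar / atten                # corrected mean correlation
    cv_sq = pvariance(a) / mean(a) ** 2 + pvariance(b) / mean(b) ** 2
    var_art = rho ** 2 * atten ** 2 * cv_sq  # variance due to artefact variation

    if var_obs == 0:
        pct_artefact = 100.0
    else:
        pct_artefact = min(100.0, 100 * (var_err + var_art) / var_obs)
    sd_rho = sqrt(max(0.0, var_obs - var_err - var_art)) / atten
    return {
        "r_bar": r_bar,
        "rho": rho,
        "sd_rho": sd_rho,
        "pct_artefact": pct_artefact,
        "cv80": (rho - 1.28 * sd_rho, rho + 1.28 * sd_rho),
        "moderator_likely": pct_artefact < 75.0,  # the 75% rule
    }


# Hypothetical inputs: six samples and sporadically reported reliabilities.
result = hunter_schmidt(
    rs=[0.31, 0.05, 0.22, 0.12, 0.28, 0.02],
    ns=[150, 90, 210, 120, 80, 300],
    rxx=[0.78, 0.84, 0.81],
    ryy=[0.88, 0.92],
)
for key, value in result.items():
    print(key, value)
```

When artefacts account for all of the observed variance, the residual standard deviation is zero and the credibility interval collapses onto ρ, which is the pattern shown by the rows of Table 1 that report 100 in the % Var column.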

While the moderators examined in our study were conceptually distinct, it is important to establish that they are empirically distinct as well. Therefore, we assessed the

relationships between the moderators using the phi coefficient. The phi coefficient is

used to test the relationship between variables in a two-by-two contingency table and can

be interpreted by the same standards as a correlation coefficient. The study setting–accountability and appraisal purpose–accountability relationships were minimal (Φ = .22 and .18, respectively). The study setting–appraisal purpose relationship was stronger (Φ = .40), but still did not suggest that these two moderators were redundant. Thus, we concluded that the moderators examined here were sufficiently distinct to merit inclusion in our analyses.
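For readers unfamiliar with the phi coefficient, a minimal sketch of its computation from a two-by-two contingency table follows. The cell counts are hypothetical and do not correspond to the coded studies.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table with rows (a, b) and (c, d)."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denominator


# Hypothetical cross-classification of samples by two moderators, e.g.
# rows = study setting (field, laboratory),
# columns = appraisal purpose (administrative, research/developmental).
print(phi_coefficient(a=6, b=3, c=2, d=7))
```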

An important consideration for our moderator analyses is the minimum number of

studies that must be included in any given analysis to provide informative results. For

instance, several meta-analytic reviews in the organizational sciences have required k = 3 effect sizes to proceed with analyses (e.g., Eby, Allen, Evans, Ng, & DuBois, 2008;

Viswesvaran et al., 2002). However, Valentine, Pigott, and Rothstein (2010) noted that

meta-analytic methods can be applied effectively to as few as two effect sizes and that this

approach is superior to other means by which findings from a small number of studies can be interpreted (e.g., so-called cognitive algebra whereby one tries to mentally integrate

findings across studies). Therefore, we determined that we would proceed with our

1 The complete code sheet is available upon request from the first author.

moderator analyses so long as at least two effect sizes were available (however, all of our

analyses involved at least k = 3 effect sizes). While estimating bivariate rater personality–performance rating relationships is

informative, we also used our bivariate meta-analytic estimates to assess the multiple correlation of the Five-Factor Model of personality as a set on performance ratings.

Doing so required us to build a correlation matrix involving the relationships estimated in

this study as well as the intercorrelations among the Big Five factors of personality.

Consistent with other meta-analyses (e.g., Munyon, Summers, Thompson, & Ferris, 2014;

Ones, Dilchert, Viswesvaran, & Judge, 2007), we used the estimates derived by Ones

(1993; also reported in Ones, Viswesvaran, & Reiss, 1996) for these analyses. Consistent

with recommendations (Viswesvaran & Ones, 1995), we used the harmonic mean across

cells as the sample size in our analyses. We conducted these analyses by regressing performance ratings onto the five personality factors. We repeated these analyses for each

level of each moderator.
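The regression step can be sketched as follows, assuming standardized regression weights are obtained directly from a meta-analytic correlation matrix. The trait–rating correlations are the overall corrected values reported in Table 1. The Big Five intercorrelations are hypothetical placeholders, not the Ones (1993) estimates, so the output will not reproduce the values in Table 2. The harmonic mean shown uses only the trait–rating cell sample sizes, because the intercorrelation cell sizes are not given here.

```python
from statistics import harmonic_mean

import numpy as np


def regress_from_correlations(R_xx, r_xy):
    """Standardized betas and R^2 from predictor intercorrelations and validities."""
    beta = np.linalg.solve(R_xx, r_xy)
    r_squared = float(r_xy @ beta)
    return beta, r_squared


traits = ["agreeableness", "extraversion", "emotional stability",
          "conscientiousness", "openness"]

# Overall corrected trait-rating correlations (Table 1)
r_xy = np.array([0.25, 0.12, 0.14, 0.10, -0.01])

# HYPOTHETICAL Big Five intercorrelations, for illustration only
R_xx = np.array([
    [1.00, 0.15, 0.25, 0.25, 0.10],
    [0.15, 1.00, 0.20, 0.10, 0.20],
    [0.25, 0.20, 1.00, 0.25, 0.10],
    [0.25, 0.10, 0.25, 1.00, 0.05],
    [0.10, 0.20, 0.10, 0.05, 1.00],
])

beta, r2 = regress_from_correlations(R_xx, r_xy)
for name, weight in zip(traits, beta):
    print(f"{name}: beta = {weight:.3f}")
print(f"R^2 = {r2:.3f}")

# Harmonic mean of the trait-rating cell sample sizes (overall rows of Table 1)
print(harmonic_mean([1899, 1109, 1311, 2974, 829]))
```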

Note that when the predictors included in a regression model are correlated, the

relative contribution of each predictor to the model R² cannot be accurately determined

by examining the beta weights alone (LeBreton, Ployhart, & Ladd, 2004). Therefore, we

conducted relative importance analyses to determine the relative contribution of each of

the Big Five factors to the prediction of performance ratings (Tonidandel & LeBreton,

2011). This approach has recently been adopted in meta-analyses to provide a more nuanced perspective of how predictors in a regression model operate in concert with one

another in influencing the criterion (e.g., Munyon et al., 2014; O’Boyle, Humphrey,

Pollack, Hawver, & Story, 2010). We conducted these analyses using a relative weight

analysis framework (Johnson, 2000) and repeated these analyses for each level of

each moderator examined. Relative weight analysis produces two types of coefficients – relative weights and rescaled relative weights. The former reflects the proportion of

variance in the outcome (i.e., performance ratings) that is attributed to each of the

predictor variables, while the latter reflects the percentage of predicted variance that is accounted for by each predictor variable (calculated by dividing the relative weights by

the model R²; LeBreton, Hargis, Griepentrog, Oswald, & Ployhart, 2007).
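A minimal sketch of relative weight analysis computed from a correlation matrix is given below, following the general logic described by Johnson (2000): the predictors are replaced by their closest orthogonal counterparts, and the variance explained is then apportioned back to the original predictors. The function name is hypothetical, and the intercorrelation matrix is a hypothetical placeholder, so the resulting weights will not match Table 3.

```python
import numpy as np


def relative_weights(R_xx, r_xy):
    """Raw and rescaled relative weights from correlations alone."""
    eigenvalues, eigenvectors = np.linalg.eigh(R_xx)
    # Symmetric square root of R_xx: correlations between the original
    # predictors and their orthogonal counterparts
    lam = eigenvectors @ np.diag(np.sqrt(eigenvalues)) @ eigenvectors.T
    # Standardized weights of the criterion on the orthogonal counterparts
    beta_star = np.linalg.solve(lam, r_xy)
    raw = (lam ** 2) @ (beta_star ** 2)   # raw weights sum to R^2
    rescaled = 100 * raw / raw.sum()      # percentage of explained variance
    return raw, rescaled


# Overall corrected trait-rating correlations (Table 1), in the order
# agreeableness, extraversion, emotional stability, conscientiousness, openness
r_xy = np.array([0.25, 0.12, 0.14, 0.10, -0.01])

# HYPOTHETICAL Big Five intercorrelations, for illustration only
R_xx = np.array([
    [1.00, 0.15, 0.25, 0.25, 0.10],
    [0.15, 1.00, 0.20, 0.10, 0.20],
    [0.25, 0.20, 1.00, 0.25, 0.10],
    [0.25, 0.10, 0.25, 1.00, 0.05],
    [0.10, 0.20, 0.10, 0.05, 1.00],
])

raw, rescaled = relative_weights(R_xx, r_xy)
print(np.round(raw, 3))
print(np.round(rescaled, 1))
```

Because the raw weights sum to the model R², the rescaled weights always sum to 100%, which makes them convenient for comparing predictors across models with different R² values.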

Results

The bivariate meta-analytic results are reported in Table 1, the results of our multiple regression analyses are reported in Table 2, and the results of our relative weight analyses

are reported in Table 3. The results of our overall multiple regression analysis indicated

that the Big Five personality factors accounted for 8% of the variance in performance

ratings. Hypothesis 1 predicted a positive relationship between rater agreeableness and

performance ratings. The results of our bivariate analysis indicated a corrected correlation

of ρ = .25. Furthermore, the results of our multiple regression analysis indicated a statistically significant beta weight of .22. Finally, the results of our relative weight analysis

indicated that agreeableness accounted for 65.9% of the variance in performance ratings explained by the Big Five personality factors. Cumulatively, these findings support

hypothesis 1.

Hypothesis 2 predicted a positive rater extraversion–performance rating relationship. The results of our bivariate analysis indicated a corrected rater extraversion–performance rating correlation of ρ = .12. Furthermore, the results of our multiple regression analysis indicated a statistically significant beta weight of .08, and the results of our relative weight

Table 1. Meta-analysis of rater personality–performance rating correlations

Moderator                    k      N     r     ρ   SDρ  % Var   CVL   CVU   CIL   CIU
Agreeableness
  Overall                   12  1,899   .20   .25   .17  25.65   .03   .46   .15   .35
  Study setting
    Field                    6    874   .26   .33   .00    100   .33   .33   .33   .33
    Laboratory               6  1,025   .14   .17   .21  17.05  −.09   .44   .00   .34
  Appraisal purpose
    Administrative           4    378   .28   .36   .00    100   .36   .36   .36   .36
    Research/developmental   4    582   .18   .22   .23  15.93  −.08   .52  −.01   .45
  Accountability
    Low                      5    864   .22   .27   .09  50.78   .15   .39   .19   .35
    High                     5    792   .16   .20   .24  14.65  −.10   .50  −.01   .41
  Publication status
    Published                9  1,015   .22   .27   .17  30.85   .05   .49   .16   .38
    Unpublished              3    884   .17   .22   .15  17.91   .02   .42   .05   .39
Extraversion
  Overall                   11  1,109   .10   .12   .11  56.74  −.02   .26   .05   .19
  Study setting
    Field                    5    503   .08   .10   .08  69.70  −.00   .21   .03   .17
    Laboratory               6    606   .11   .14   .12  49.80  −.02   .29   .04   .24
  Appraisal purpose
    Administrative           4    414  −.02  −.03   .08  72.06  −.13   .07  −.11   .05
    Research/developmental   5    566   .17   .21   .00    100   .21   .21   .21   .21
  Accountability
    Low                      6    656   .17   .22   .00    100   .22   .22   .22   .22
    High                     –      –     –     –     –      –     –     –     –     –
  Publication status
    Published                8    771   .05   .06   .08  70.20  −.04   .17   .00   .12
    Unpublished              3    338   .20   .25   .00    100   .25   .25   .25   .25
Emotional stability
  Overall                   13  1,311   .11   .14   .08  69.83   .03   .24   .10   .18
  Study setting
    Field                    7    582   .15   .20   .12  56.99   .04   .35   .11   .29
    Laboratory               6    729   .07   .09   .00    100   .09   .09   .09   .09
  Appraisal purpose
    Administrative           5    438   .15   .19   .00    100   .19   .19   .19   .19
    Research/developmental   4    357   .22   .28   .00    100   .28   .28   .28   .28
  Accountability
    Low                      6    502   .11   .14   .14  48.58  −.04   .32   .03   .25
    High                     3    467   .06   .07   .00    100   .07   .07   .07   .07
  Publication status
    Published               10    850   .16   .20   .08  75.00   .10   .30   .15   .25
    Unpublished              3    461   .02   .03   .00    100   .03   .03   .03   .03
Conscientiousness
  Overall                   18  2,974   .08   .10   .17  22.99  −.12   .32   .02   .18
  Study setting
    Field                   11  1,780   .16   .19   .09  53.36   .08   .31   .14   .24
    Laboratory               7  1,194  −.03  −.04   .18  20.87  −.27   .20  −.17   .09
  Appraisal purpose

Continued

analysis indicated that extraversion accounted for 12.5% of the explained variance in

performance ratings. Overall, these results supported hypothesis 2.

Hypothesis 3 predicted a positive rater emotional stability–performance rating relationship. The results of our bivariate analysis indicated a corrected correlation of

ρ = .14. Furthermore, the results of our regression analysis indicated a statistically significant beta weight of .08, while our relative weight analysis indicated that emotional

stability accounted for 14.2% of the explained variance in performance ratings. These

results provided support for hypothesis 3. Hypothesis 4 predicted a negative rater conscientiousness–performance rating

relationship. The results of our bivariate analysis indicated a corrected correlation of

ρ = .10. Thus, counter to our prediction, the effect of conscientiousness on performance ratings was positive. Therefore, hypothesis 4 was not supported. The results of our

Table 1. (Continued)

Moderator                    k      N     r     ρ   SDρ  % Var   CVL   CVU   CIL   CIU
    Administrative           3    354  −.09  −.11   .21  21.76  −.39   .16  −.35   .13
    Research/developmental   8    993   .10   .12   .16  32.11  −.08   .33   .01   .23
  Accountability
    Low                     11  1,886   .13   .16   .16  25.46  −.04   .36   .07   .25
    High                     4    637  −.03  −.04   .20  18.90  −.29   .22  −.24   .16
  Publication status
    Published               13  1,711   .07   .09   .20  22.69  −.16   .34  −.02   .20
    Unpublished              5  1,263   .10   .12   .13  24.41  −.05   .29   .01   .23
Openness
  Overall                    9    829  −.01  −.01   .00    100  −.01  −.01  −.01  −.01
  Publication status
    Published                8    643  −.01  −.01   .00    100  −.01  −.01  −.01  −.01
    Unpublished              –      –     –     –     –      –     –     –     –     –

Note. k, number of independent correlations; N, total sample size for all studies combined; r, sample size-weighted average observed correlation; ρ, sample size-weighted average-corrected correlation; SDρ, standard deviation of sample size-weighted average-corrected correlation; % Var, percentage of variance accounted for by statistical artefacts; CV, 80% credibility interval; CI, 95% confidence interval.

Table 2. Results of regression analyses

                                 Study setting         Purpose         Accountability
                      Overall   Field   Laboratory   Admin    R/D      Low     High
Agreeableness          .22**    .28**     .17**      .42**   .14**    .21**    .25**
Extraversion           .08**    .04*      .11**     −.12**   .16**    .19**   −.18**
Emotional stability    .08**    .11**     .06**      .20**   .22**    .04      .08
Conscientiousness      .02      .08**    −.11**     −.23**   .02      .09**   −.13**
Openness              −.06**   −.06**    −.06**     −.08**  −.09**   −.07**   −.03
R²                     .08      .14       .06        .22     .13      .12      .09

Note. Admin, administrative; R/D, research/developmental.

Regression coefficients are in standardized form.

*p < .05; **p < .01.

multiple regression analysis indicated a small and non-significant beta weight of .02

(p > .05). Furthermore, our relative weight analysis indicated that conscientiousness accounted for only 5.5% of explained variance in performance ratings. These results

suggest that, once accounting for the effect of the other Big Five factors, conscientiousness

contributed relatively little variance to performance ratings.

Hypothesis 5 predicted that rater openness was not related to performance ratings.

Our bivariate analysis indicated a corrected correlation of ρ = −.01. Our analysis also indicated that this relationship was consistent across study populations, as artefacts

accounted for 100% of the variance in observed correlations. That is, our results suggested

that there were no moderators of the rater openness–performance rating relationship and that the population correlation was ρ = −.01 across rating contexts. Thus, we concluded that the effect of rater openness on performance ratings across contexts is of little

practical value (Cohen, 1992). This notion is bolstered by the results of our relative weight

analysis, which indicated that openness accounted for only 1.9% of explained variance in

performance ratings. Overall, this pattern of results provided support for hypothesis 5, suggesting that rater openness does not influence performance ratings.

Moderator analyses

We proceeded with moderator analyses for those relationships where our results

suggested that moderators might be present – conscientiousness, agreeableness, extraversion, and emotional stability. The results of the moderator analyses are presented

in Tables 1–3. We review these results in the following sections.

Study setting

Hypothesis 6 predicted that the effect of rater personality on performance ratings would

be stronger in field settings as compared to laboratory settings. The results of our bivariate

analyses indicated that study setting moderated the rater personality–performance rating relationship for agreeableness, conscientiousness, and emotional stability. Furthermore,

and consistent with hypothesis 6, these relationships were stronger in field settings as opposed to laboratory settings. On the other hand, the relationship between rater

extraversion and performance ratings did not vary as a function of study setting.

The results of our regression analyses indicated that the Big Five as a set accounted for 14%

of the variance in performance ratings in field studies, but only 6% of the variance in

Table 3. Results of relative weight analyses

                                    Study setting (%)           Purpose (%)               Accountability (%)
                      Overall (%)   Field        Laboratory     Admin        R/D          Low          High
Agreeableness         .05 (65.9)    .09 (63.7)   .03 (48.8)     .14 (62.9)   .03 (23.4)   .05 (45.3)   .05 (55.4)
Extraversion          .01 (12.5)    .01 (3.9)    .02 (27.6)     .01 (3.1)    .03 (25.1)   .04 (33.4)   .02 (28.2)
Emotional stability   .01 (14.2)    .02 (16.9)   .01 (10.5)     .04 (15.9)   .06 (44.6)   .01 (6.9)    .01 (7.0)
Conscientiousness     .00 (5.5)     .02 (14.2)   .01 (10.3)     .04 (16.9)   .01 (4.5)    .02 (12.8)   .01 (8.7)
Openness              .00 (1.9)     .00 (1.3)    .00 (2.8)      .00 (1.2)    .00 (2.4)    .00 (1.7)    .00 (0.7)
R²                    .08           .14          .06            .22          .13          .12          .09

Note. Admin, administrative; R/D, research/developmental.

Raw relative weights appear in cells with rescaled relative weights appearing in parentheses.

performance ratings in laboratory studies. Overall, these results suggested that features of

a naturalistic rating context (cf. Ferris et al., 2008) might be necessary for trait expression

in a performance appraisal context (Tett & Guterman, 2000) and provided support for

hypothesis 6.

Appraisal purpose

Hypothesis 7 predicted that the effect of rater personality on performance ratings would

be stronger when ratings were collected for research/developmental purposes (i.e., a

weak situation) and attenuated when ratings were collected for administrative purposes

(i.e., a strong situation). The results of our bivariate analyses provided support for this

prediction with regard to conscientiousness, extraversion, and emotional stability, as their relationships with performance ratings were stronger when ratings were collected

for research/developmental purposes as opposed to administrative purposes. On the

other hand, for rater agreeableness, the opposite was observed. That is, rater agreeableness

accounted for a greater amount of variance in performance ratings when ratings were

collected for administrative purposes as opposed to research/developmental purposes.

Also, the results of our multiple regression analyses indicated that personality accounted

for a greater amount of variance in ratings that were collected for administrative purposes

(R² = .22) as compared to research or developmental purposes (R² = .13). However, inspection of the relative weights (Table 3) indicated that this effect was predominantly

driven by agreeableness (RW = .14, rescaled-RW = 62.9%). This surprising finding could perhaps be accounted for by study setting. Specifically,

no field studies into the rater agreeableness–performance rating relationship included ratings collected for research/developmental purposes. All studies that assessed

agreeableness–research/developmental performance rating relationships were conducted in laboratory settings, while all but one study involving administrative ratings

were conducted in field settings. As discussed earlier with respect to hypothesis 6, the effect of rater personality on performance ratings should be lower in studies conducted in

laboratory as opposed to field settings. Therefore, this study setting–rating purpose confound observed with respect to agreeableness could have accounted for the pattern of

results observed here. It is possible that for research/developmental ratings collected in

field settings, the effect of rater agreeableness would be more pronounced. Therefore, it is

premature to interpret these effects concerning agreeableness too strongly. When

removing agreeableness from our multiple regression model, we found that the remaining

four personality traits accounted for 7% of the variance in administrative ratings as compared to 11% of the variance in research/developmental ratings, which is consistent

with hypothesis 7. Overall, the pattern of results observed here suggested that rater

personality–performance rating relationships were more robust when ratings were collected for research/developmental purposes (i.e., a weak situation) as opposed to

administrative purposes (i.e., a strong situation), supporting hypothesis 7.

Accountability

Hypothesis 8 predicted that the effect of rater personality on performance ratings would

be strengthened when accountability was low (i.e., a weak situation) and attenuated

when accountability was high (i.e., a strong situation). Only one study reported a rater

extraversion–performance rating relationship where accountability was high. Therefore, we could not examine accountability as a moderator of the rater extraversion–

performance rating relationship in our bivariate analyses. However, consistent with

existing recommendations (Viswesvaran & Ones, 1995), we included the single estimate

in the correlation matrix used to conduct our multiple regression and relative weight

analyses. In terms of our bivariate analyses, for rater agreeableness, conscientiousness, and

emotional stability, the 95% confidence intervals associated with the correlations at each

level of the moderator (i.e., high vs. low) overlapped. However, in all cases, the point

estimates were lower when accountability was high as opposed to low. Furthermore, for

conscientiousness and agreeableness, we failed to detect non-zero relationships with

performance ratings when accountability was high (95% CI = −.24 to .16 for conscientiousness and −.01 to .41 for agreeableness). However, when accountability was low, we observed evidence suggesting that these effects were non-zero (95% CI = .07 to .25 for conscientiousness and .19 to .35 for agreeableness). In terms of our multiple regression analyses, the Big Five as a set accounted for 9% of variance in performance ratings when

accountability was high and 12% of variance in performance ratings when accountability

was low. Overall, these results suggested that the effect of rater personality on

performance ratings was stronger when accountability was low and weaker when

accountability was high, providing support for hypothesis 8.
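The interval comparisons described above reduce to two simple checks, sketched here with the 95% confidence intervals reported in the text for conscientiousness and agreeableness. The helper names are hypothetical.

```python
def excludes_zero(ci):
    """True when a confidence interval does not contain zero."""
    lower, upper = ci
    return lower > 0 or upper < 0


def intervals_overlap(ci_a, ci_b):
    """True when two confidence intervals share at least one value."""
    return max(ci_a[0], ci_b[0]) <= min(ci_a[1], ci_b[1])


# 95% confidence intervals by accountability level, as reported in the text
accountability_ci = {
    "conscientiousness": {"low": (0.07, 0.25), "high": (-0.24, 0.16)},
    "agreeableness": {"low": (0.19, 0.35), "high": (-0.01, 0.41)},
}

for trait, ci in accountability_ci.items():
    print(
        trait,
        intervals_overlap(ci["low"], ci["high"]),
        excludes_zero(ci["low"]),
        excludes_zero(ci["high"]),
    )
# conscientiousness True True False
# agreeableness True True False
```

For both traits the intervals overlap across accountability levels, yet only the low-accountability interval excludes zero, which matches the pattern described in the text.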

Publication bias

To assess the extent to which our findings may have been influenced by publication bias,

we compared effect size estimates derived from published and unpublished studies. If

correlations derived from published studies were significantly larger than those derived

from unpublished studies, this would provide some evidence suggesting that publication

bias may have influenced our findings. A comparison between sample size-weighted

corrected correlations and their associated 95% confidence intervals suggested limited

evidence for publication bias (see Table 1). In terms of agreeableness, conscientiousness, and extraversion, correlations from unpublished studies were not significantly lower than

those derived from published studies. In terms of openness, only one unpublished study

was identified. The correlation reported in this study (r = .00) differed from the sample size-weighted mean correlation derived from published studies (r = −.01) by only .01, and therefore, it does not appear that publication bias has influenced our findings with

respect to openness either. However, in terms of emotional stability, publication status

did moderate the relationship with performance ratings such that the effect was

significantly smaller in unpublished studies as compared to published studies. Therefore, publication bias may have influenced our estimate of the rater emotional stability– performance rating relationship.

Considering this, we conducted an additional test of publication bias for the rater

emotional stability–performance rating relationship using the trim-and-fill method. The trim-and-fill method is regarded as one of the most powerful tests of publication bias

available (Aguinis, Pierce, Bosco, Dalton, & Dalton, 2011). Trim and fill is a funnel plot-

based method that uses an iterative procedure to estimate the degree of asymmetry in the

funnel plot (i.e., with effect sizes plotted on the x-axis and standard errors plotted on the y-axis) to identify the number and magnitude of missing correlations, impute those missing

values, and recalculate the sample size-weighted mean correlation (Duval & Tweedie,

2000a,b). The influence of publication bias on study findings can be ascertained by

comparing the original sample size-weighted mean correlation with the one derived once

the missing studies have been imputed.

However, as this test is based on a funnel plot, it must be assumed that the only source

of variance influencing study findings is sampling error (Duval & Tweedie, 2000a,b). Note

that in our overall analyses involving emotional stability, artefacts accounted for nearly

70% of the variance across studies. Therefore, the trim-and-fill procedures may be appropriate in this case. We further assessed the tenability of the trim-and-fill method’s

homogeneity assumption by testing for heterogeneity using the Q statistic,

Q(12) = 20.66, n.s., which failed to detect the presence of heterogeneity. This provided additional support for the appropriateness of the trim-and-fill method as a means of

detecting publication bias. We conducted these analyses using the metafor package

(Viechtbauer, 2010) for R (R Development Core Team, 2010). Our results indicated no

funnel plot asymmetry and no missing studies and provided evidence suggesting that

publication bias did not influence our findings concerning the rater emotional stability– performance rating relationship.2 Overall, the pattern of results observed here indicated

that publication bias has not likely influenced the relationships estimated in this study.
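The homogeneity check reported above can be sketched as follows. The Q function assumes Fisher z-transformed correlations weighted by n − 3, which is one common formulation and may differ from the computation used by the software cited in the text. The example correlations and sample sizes are hypothetical. The critical value is the upper 5% point of the chi-square distribution with 12 degrees of freedom.

```python
from math import atanh

CHI2_CRIT_DF12_05 = 21.026  # chi-square critical value, df = 12, alpha = .05


def q_statistic(rs, ns):
    """Cochran's Q for correlations, computed on Fisher z values."""
    zs = [atanh(r) for r in rs]
    ws = [n - 3 for n in ns]  # inverse sampling variances of z
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    return sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))


# Hypothetical correlations and sample sizes, to show the function in use
print(q_statistic(rs=[0.15, 0.05, 0.20, 0.10], ns=[120, 90, 150, 60]))

# The value reported in the text, compared against the critical value
reported_q = 20.66
print(reported_q > CHI2_CRIT_DF12_05)  # False: heterogeneity is not detected
```

The reported value falls just below the critical value, so the non-significant result rests on a fairly narrow margin.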

Discussion

Considering the centrality of performance ratings for research and practice in Human Resource Management and Industrial, Work, and Organizational Psychology (Aguinis,

2013), research into the construct validity of performance ratings has been critical (Austin

& Villanova, 1992; Murphy, 2008). Research has largely considered the influence of rating

context features on performance ratings (Ferris et al., 2008; Fletcher, 2001; Levy &

Williams, 2004; Murphy & Cleveland, 1995). We argued that rater personality traits are

relevant determinants of performance ratings as well, and we found support for this

notion in our cumulative meta-analysis of this literature. Across 21 studies and 28

independent samples, we found evidence suggesting that rater personality traits consistent with the Five-Factor Model accounted for between 6% and 22% of the variance

in performance ratings. Throughout our manuscript, we argued that the manner in which

raters respond to various features of the performance evaluation context may be

determined, at least in part, by their personalities, and that this would have implications

for the construct validity of performance ratings. Our results supported this notion.

Our bivariate meta-analytic results as well as our multiple regression

analyses largely supported our predictions. Specifically, and consistent with our

hypotheses, rater agreeableness, extraversion, and emotional stability were all positively related to performance ratings. In terms of conscientiousness, although we predicted a

negative relationship, the results of our bivariate analyses indicated that conscientiousness

was overall positively related to performance ratings. However, the effect of

conscientiousness in our multiple regression model was small and not statistically

significant (β = .02, n.s.). Furthermore, conscientiousness accounted for only a small portion of the explained variance in performance ratings (RW = .00, rescaled-RW = 5.5%). Thus, while we found evidence for an overall positive rater conscientiousness–performance rating relationship in isolation, this effect seemed to be trivial when taking the other Big Five factors into account.

Ferris et al. (2008) described performance appraisal as a formal accountability system

that organizations use as a means of holding employees accountable and aligning their

performance with organizational objectives. Given their tendency to follow organizational

rules and procedures (John et al., 2008), raters who are high in conscientiousness

2 The complete results of this analysis are available from the first author.

may be sympathetic towards the organization’s need to hold employees accountable and

may understand the role of accurate ratings in this process. Also, as conscientiousness is

indicative of a tendency to approach tasks in a meticulous and well-organized fashion (Saucier & Ostendorf, 1999), raters who are high in conscientiousness may react to the

importance of accurate performance ratings for organizational success by approaching

the rating task with great care and consideration, resulting in accurate performance

ratings. This line of reasoning could potentially account for our findings concerning

conscientiousness.

In terms of openness, we predicted a nil relationship with performance ratings. Our

results supported this prediction. The effect of openness on performance ratings was

small and not practically significant, as we estimated a ρ of −.01 across nine independent samples and a total sample size of N = 829. Furthermore, moderators did not act upon the rater openness–performance rating relationship. This finding was consistent with our argument that raters who are high in openness may respond to the

performance rating context by deeply considering a wide range of performance

episodes and the appropriate inferences that could be drawn about the employee’s

performance based on them.

The results of our moderator analyses were also generally consistent with our

predictions. We argued that rater personality likely played a role in performance ratings because personality would, in part, determine the rater’s characteristic responses to

features of the rating context, which would ultimately influence performance rating

behaviours (Roberts, 2009; Tett & Guterman, 2000). Based on this notion, we predicted

that the effect of rater personality on performance ratings should be stronger in field

studies as compared to laboratory studies. Our results supported this prediction. An

important implication is that while it may be possible to study various performance

appraisal processes in laboratory settings (Murphy, Herr, Lockhart, & Maguire, 1986), it

appears that to best understand the influence of rater personality on performance ratings, field studies should be preferred. However, it is also important to note that as laboratory

studies are more likely to make use of inexperienced raters than field studies, variation in

findings between laboratory and field studies could potentially be due to variation in

experience as opposed to variation in the presence of rating context features alone.

Future research may wish to examine rating experience as a moderator of the

relationships examined here.

We also noted that situational strength was an important consideration, as strong

situations act to mute the influence of personality traits on behaviour (Cooper & Withey, 2009; Mischel, 1973, 1977). We considered two characteristics of the rating context that

speak to situational strength – appraisal purpose and accountability (Levy & Williams, 2004). In terms of appraisal purpose, we proposed that the effect of rater personality on

performance ratings should be strengthened when ratings were collected for research/

developmental purposes (i.e., a weak situation) and attenuated when ratings were

collected for administrative purposes (i.e., a strong situation). Our findings were largely

consistent with our hypothesis, but our results concerning agreeableness were

contradictory. As discussed earlier in the Results section, it is possible that the contradictory results

were due to a study setting–appraisal purpose confound with respect to studies assessing agreeableness. However, it is also possible that our finding concerning appraisal purpose

as a moderator of the rater agreeableness–performance rating relationship was meaningful. For instance, the traits encompassed by agreeableness include altruism, a tendency to

be concerned for the welfare of others (Costa & McCrae, 1992). As administrative ratings

have far-reaching implications for employees (e.g., pay and promotion decisions), raters

who are high in agreeableness may be especially likely to inflate administrative ratings to

protect employees from the negative effects of poor administrative ratings. From this perspective, rather than speaking exclusively to situational strength, conducting

administrative ratings may provide opportunities for agreeable raters to behave in trait-

relevant ways. Unfortunately, our study could not tease apart these two competing

explanations, and future research into this issue is needed.

In terms of accountability, we proposed that the effect of rater personality would be

strengthened when accountability was low (i.e., a weak situation) and attenuated when

accountability was high (i.e., a strong situation). Our results supported this prediction.

Therefore, overall, our findings did suggest that the effect of rater personality on performance ratings was attenuated in strong situations and strengthened in weak

situations.

Implications

Research into the construct validity of performance ratings has largely focused on the

influence of the rating context on performance rating processes and outcomes (Ferris

et al., 2008; Fletcher, 2001; Levy & Williams, 2004). This approach has been useful and has led to a number of important insights. However, we argued that rater personality traits

might also play a role as determinants of performance ratings, and we found evidence

supporting this assertion. Research has shifted focus to the role of the social context in

ratings because of a number of reviews that had recognized that to understand

performance appraisal, it had to be studied in the context in which it occurred (e.g., Bretz,

Milkovich, & Read, 1992; Ilgen et al., 1993). Similarly, as our results suggested that rater

personality influences performance evaluation in applied contexts, consideration of rater

personality is also critical when attempting to study performance evaluation as it occurs in organizations (as all raters have some standing on the Big Five personality factors and this

has implications for their reactions to features of the rating context and their rating

behaviours).

Our findings are also relevant for performance appraisal practices. Specifically, once

sources of extraneous variance in performance ratings have been identified, the construct

validity of ratings can be improved by identifying conditions that reduce their influence

(Murphy, 2008). We not only identified rater personality as a source of extraneous

variance, but also examined moderators that influenced these relationships that may bear relevance for performance appraisal practices. Of most practical value, the relationships

between rater personality and performance ratings were generally lower when raters

were held accountable for their ratings. Agency theory notes that firm owners (i.e.,

principals) and managers (i.e., agents) may at times have conflicting goals, and therefore,

monitoring mechanisms must be in place to ensure that the manager is acting in the best

interest of the firm’s shareholders (Fama, 1980; Fama & Jensen, 1983). The presence of a

monitoring system that allows the principal to be aware of the agent’s ratings and requires

the agent to justify those ratings to the principal is one means by which raters could be held accountable for their performance ratings, reducing the influence of rater traits on

performance rating scores and improving the construct validity of performance ratings.

For example, part of a rater’s performance evaluation (one of the most ubiquitous control

mechanisms; Ferris et al., 2008) could include the rater’s ability to justify the ratings that

he or she has documented for his or her subordinates.

We also note that a number of additional performance appraisal practices may help to

mitigate the influence of rater personality traits on performance ratings, although not

explicitly examined here. Research has argued that involving multiple raters in the

performance appraisal process (e.g., through 360-degree performance rating systems; Smither et al., 2005) can improve the psychometric properties of performance ratings

(Viswesvaran, Ones, & Schmidt, 2008). Such an approach may also be useful in

accounting for the effects of rater personality on performance ratings, as pooling ratings

across raters would limit the effect of any single rater’s personality traits on composite

performance ratings.
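
The pooling argument can be sketched with a simple additive model. This is an illustrative assumption, not a model estimated in this review: suppose that rater j's rating of a ratee is X_j = T + b_j + e_j, where T is the ratee's true performance, b_j is a rater-specific component (e.g., a personality-driven tendency towards leniency) with variance \sigma_b^2, and e_j is random error, with b_j and e_j independent across raters. Averaging the ratings of k raters then gives:

```latex
\bar{X} = T + \frac{1}{k}\sum_{j=1}^{k} b_j + \frac{1}{k}\sum_{j=1}^{k} e_j,
\qquad
\operatorname{Var}\!\left(\frac{1}{k}\sum_{j=1}^{k} b_j\right) = \frac{\sigma_b^2}{k}.
```

Under these assumptions, the variance in the composite that is attributable to rater-specific tendencies shrinks in proportion to 1/k. It would shrink more slowly if raters' tendencies were correlated (e.g., if raters were selected for similar traits).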

Rating scale format may also influence the strength of the rater personality–performance rating relationships. For example, rating formats that do not allow raters

to inflate the ratings given to all employees, such as forced ranking distributions, may be more robust to the effect of rater personality. As an example, raters who are high in

agreeableness may be motivated to inflate performance rating scores across ratees to avoid

disturbing the quality of their dyadic work relationships with their ratees (Jensen-

Campbell & Graziano, 2001; Shore & Tashchian, 2002). However, a forced ranking

distribution system would require the rater to discriminate between employees; thus, the

effect of rater agreeableness in this example may be reduced.

This study has implications for the personality literature as well. A major initiative in

the personality literature has been identifying the situations under which personality traits influence behaviour (Blickle et al., 2008; Hogan, 1991; Mischel & Shoda, 1995). Research

along these lines has long since recognized the role of situational strength (Mischel, 1973),

and many empirical investigations into the effect of situation strength on trait–behaviour relationships exist (cf. Cooper & Withey, 2009). However, the trait activation approach

has emerged more recently. While evidence has been steadily mounting for the role of trait

activation in trait–behaviour relationships (Tett & Guterman, 2000), more research along these lines is needed. Our study provided continued support for the trait activation

framework. Specifically, our study setting moderator analysis was consistent with trait activation, as the effect of rater personality on performance ratings was stronger in field

settings (where situation trait relevance was expected to be stronger) and weaker in

laboratory settings (where situation trait relevance was expected to be weaker).

Limitations and future directions

Several of our moderator analyses were based on small sample sizes, with some

including as few as three independent correlations. Note that small sample sizes in meta-analyses are not problematic for interpreting the sample size-weighted mean correlation.

Indeed, pooling findings across even a small number of studies results in a more stable

estimate of the effect size as compared to interpreting the findings of any single study in

isolation (Valentine et al., 2010). However, small sample sizes in meta-analysis are likely to

result in biased estimates of the standard deviation of correlations due to second-order

sampling error (Hunter & Schmidt, 2004). As a result, we cannot make any strong

conclusions regarding the stability (or lack thereof) of effect sizes observed in our

moderator analyses across samples. Nonetheless, the point estimates observed in our moderator analyses are readily interpretable and provide useful information to performance

appraisal scholars.
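
To make the distinction between the weighted mean and the dispersion estimate concrete, the following minimal sketch computes a sample size-weighted mean correlation, the weighted observed standard deviation, and the expected sampling-error variance in the "bare-bones" style described by Hunter and Schmidt (2004). It is an illustration only: the function name `bare_bones` and the three correlations and sample sizes are hypothetical, no artifact corrections are applied, and it is not the code used in this review.

```python
from math import sqrt


def bare_bones(rs, ns):
    """Bare-bones meta-analytic summary (illustrative sketch, no artifact corrections).

    rs: observed correlations, one per independent sample
    ns: corresponding sample sizes
    Returns (weighted mean r, weighted observed SD, residual SD).
    """
    if len(rs) != len(ns) or not rs:
        raise ValueError("rs and ns must be non-empty and of equal length")

    total_n = sum(ns)
    # Sample size-weighted mean correlation.
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    # Sample size-weighted variance of the observed correlations.
    var_obs = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Variance expected from sampling error alone, using the average sample size.
    mean_n = total_n / len(ns)
    var_err = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    # Residual variance is floored at zero when sampling error exceeds observed variance.
    var_res = max(var_obs - var_err, 0.0)
    return mean_r, sqrt(var_obs), sqrt(var_res)


# Hypothetical input: three independent samples (values invented for illustration).
rs = [0.10, 0.20, 0.30]
ns = [100, 200, 100]
mean_r, sd_obs, sd_res = bare_bones(rs, ns)
print(f"mean r = {mean_r:.3f}, observed SD = {sd_obs:.3f}, residual SD = {sd_res:.3f}")
```

With only three samples, as in this hypothetical input, the mean is a pooled estimate over all 400 cases, whereas the SD rests on just three data points. This is the sense in which the dispersion estimate is more vulnerable to second-order sampling error than the mean.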

While our review included many studies into the rater personality–performance rating relationships, many relevant moderators were not addressed in the available sample of

studies. For instance, we considered situational strength as a moderator of rater

personality–performance rating relationships, operationalized as appraisal purpose and accountability. However, treatments or interventions can also be considered strong to the

extent that they ‘lead all persons to construe the particular events the same way, induce

uniform expectancies regarding the most appropriate response pattern, provide adequate incentives for the performance of that response pattern, and instill the skills

necessary for its satisfactory construction and execution’ (Mischel, 1973, p. 276). Thus,

interventions such as rater training may be considered strong and therefore may also be

likely to reduce rater personality–performance rating relationships. However, few studies included any form of rater training, and thus, we could not examine this as a moderator.

Future research should consider how rater training and other relevant interventions might

reduce the influence of rater personality traits on performance ratings.

A related issue is that in some cases, levels of the moderators examined were not represented in our analyses. For example, we identified only one study into the rater

extraversion–performance rating relationship where accountability was high. We also faced a study setting–appraisal purpose confound with respect to our analyses involving agreeableness, and therefore, the results of the appraisal purpose moderator analysis

pertaining to agreeableness are not readily interpretable. Thus, while the present review

summarized the extant literature, it also suggested the need for a number of future primary

studies. For instance, it would be useful for future research to explicitly test appraisal

purpose as a moderator of the rater personality–performance rating relationships to clarify the effects observed here.

We focused exclusively on the Big Five factors as opposed to their facets. While a

number of studies assessed narrow personality facets, very few assessed the same facets,

and therefore, we could not cumulate results across studies into the same narrow trait.

However, we believe that narrow personality facets likely play a role in driving the rater’s

reaction to the rating context and in shaping rating behaviour. For instance, in terms of

extraversion, the sociability facet may drive high-performance ratings as raters who are

high in sociability may be likely to form favourable work relationships with their ratees (John et al., 2008), which research suggests can positively influence performance ratings

(Duarte et al., 1994). However, the dominance facet bears relevance to different

contextual features. For instance, raters who are high in dominance may react to political

context features of performance ratings (Ferris et al., 2008) and may use performance

ratings as an opportunity to punish ratees for insubordinate behaviours (Longenecker,

Sims, & Gioia, 1987; Murphy & Cleveland, 1995), resulting in negative rater dominance–performance rating relationships under some circumstances. This line of reasoning

suggests that the effect of rater personality on performance ratings may be more nuanced and complex when dealing with narrow personality traits (as opposed to the broader

personality factors which may influence behaviour through broader mechanisms; Barrick,

Stewart, & Piotrowski, 2002).

Similarly, not all traits are well represented within the Five-Factor Model and research

is needed to consider traits that fall outside of this framework yet bear theoretical

relevance when considered in relation to the rating context. For instance, Ferris et al.

(2008) noted that rater emotional intelligence and political skill may have implications for

performance rating behaviours, and we agree with this notion. Furthermore, several studies have indicated that task-specific individual differences, such as rating self-efficacy

(Bernardin & Villanova, 2005) and performance appraisal discomfort (Villanova,

Bernardin, Dahmus, & Sims, 1993), can influence performance rating behaviour. We

urge future research to continue investigating the role of these traits as predictors of

performance ratings.

While our focus was on the effect of rater personality traits on performance ratings, it

would be useful for research to consider the effect of rater–ratee personality similarity on performance ratings as well. Rater–ratee personality similarity could potentially influence performance ratings through its effect on psychological distance (e.g., perceptions of similarity; Ferris et al., 2008). Furthermore, Bauer and Green (1996) found support for a

model whereby leader–follower trait affectivity congruence influenced the leader’s perceptions of follower performance through affect and perceptions of trustworthiness.

However, in a study that drew on the Five-Factor Model of personality, Strauss et al.

(2001) found that only perceived personality similarity as opposed to actual personality

similarity had a significant effect on performance ratings. Overall, only a handful of studies

have investigated the role of rater–ratee personality similarity on performance ratings, and more research along these lines is needed.

Conclusion

While a sizeable literature is dedicated to examining the effect of rater personality on

performance ratings, few definitive answers existed, as there were many mixed findings.

We used meta-analytic methods to clarify this literature, finding evidence for generalizable

relationships between several of the Big Five personality factors and performance ratings. Consistent with the trait activation and situational strength literatures, we identified that

the relationships were more robust when (1) studies were conducted in field settings as

opposed to laboratory settings, (2) ratings were collected for research/developmental

purposes as opposed to administrative purposes, and (3) accountability was low as

opposed to high. Cumulatively, based on our comprehensive meta-analytic review of this

literature, we conclude that when it comes to accounting for variance in performance

ratings, rater personality does indeed matter.

References

References marked with an asterisk (*) were included in the meta-analysis.

Aguinis, H. (2013). Performance management (3rd ed.). Upper Saddle River, NJ: Pearson Prentice Hall.

Aguinis, H., Pierce, C. A., Bosco, F. A., Dalton, D. R., & Dalton, C. M. (2011). Debunking myths and urban legends about meta-analysis. Organizational Research Methods, 14, 306–331. doi:10.1177/1094428110375720

Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917–1992. Journal of Applied Psychology, 77, 836–874. doi:10.1037/0021-9010.77.6.836

Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26. doi:10.1111/j.1744-6570.1991.tb00688.x

Barrick, M. R., Stewart, G. L., & Piotrowski, M. (2002). Personality and job performance: Test of the mediating effects of motivation among sales representatives. Journal of Applied Psychology, 87(1), 43–51. doi:10.1037/0021-9010.87.1.43

Bauer, T. N., & Green, S. G. (1996). Development of leader-member exchange: A longitudinal test. Academy of Management Journal, 39, 1538–1567. doi:10.2307/257068

Bell, S. T. (2007). Deep-level composition variables as predictors of team performance: A meta-analysis. Journal of Applied Psychology, 92, 595–615. doi:10.1037/0021-9010.92.3.595

*Bernardin, H. J., Cooke, D. K., & Villanova, P. (2000). Conscientiousness and agreeableness as predictors of rating leniency. Journal of Applied Psychology, 85, 232–234. doi:10.1037/0021-9010.85.2.232

*Bernardin, H. J., & Orban, J. A. (1990). Leniency effect as a function of rating format, purpose for appraisal, and rater individual differences. Journal of Business and Psychology, 5, 197–211. doi:10.1007/BF01014332

*Bernardin, H. J., Tyler, C. L., & Villanova, P. (2009). Rating level and accuracy as a function of rater personality. International Journal of Selection and Assessment, 17, 300–310. doi:10.1111/j.1468-2389.2009.00472.x

Bernardin, H. J., & Villanova, P. (2005). Research streams in rater self-efficacy. Group and Organization Management, 30(1), 61–88. doi:10.1177/1059601104267675

Blickle, G., Meurs, J. A., Zettler, I., Solga, J., Noethen, D., Kramer, J., & Ferris, G. R. (2008). Personality, political skill, and job performance. Journal of Vocational Behavior, 72, 377–387. doi:10.1016/j.jvb.2007.11.008

*Bravo, I. M. (1994). Rater’s personality as a moderator of context effects in performance appraisals (Unpublished master’s thesis). Florida International University, Miami, FL.

Bretz, R. D., Milkovich, G. T., & Read, W. (1992). The current state of performance appraisal research and practice: Concerns, directions, and implications. Journal of Management, 18, 321–352. doi:10.1177/014920639201800206

Brookings, J. B., Zembar, M. J., & Hochstetler, G. M. (2003). An interpersonal circumplex/five-factor analysis of the Rejection Sensitivity Questionnaire. Personality and Individual Differences, 34, 449–461. doi:10.1016/S0191-8869(02)00065-X

*Chambers, B. A. (2003). Leniency bias in the performance appraisal process: A person × situation perspective (Doctoral dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 3092122).

Cleveland, J. N., Murphy, K. R., & Williams, R. E. (1989). Multiple uses of performance appraisal: Prevalence and correlates. Journal of Applied Psychology, 74(1), 130–135. doi:10.1037/0021-9010.74.1.130

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. doi:10.1037/0033-2909.112.1.155

Cooper, W. H., & Withey, M. J. (2009). The strong situation hypothesis. Personality and Social Psychology Review, 13, 62–72. doi:10.1177/1088868308329378

Costa, P. T. Jr, & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO-PI-R) and NEO Five-Factor (NEO-FFI) Inventory professional manual. Odessa, FL: Psychological Assessment Resources.

DeYoung, C. G., Peterson, J. B., & Higgins, D. M. (2005). Sources of openness/intellect: Cognitive and neuropsychological correlates of the fifth factor of personality. Journal of Personality, 73, 825–858. doi:10.1111/j.1467-6494.2005.00330.x

Digman, J. M. (1990). Personality structure: Emergence of the five-factor model. Annual Review of Psychology, 41, 417–440. doi:10.1146/annurev.ps.41.020190.002221

Duarte, N. T., Goodson, J. R., & Klich, N. R. (1994). Effects of dyadic quality and duration on performance appraisal. Academy of Management Journal, 37, 499–521. doi:10.2307/256698

Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of conscientiousness in the prediction of job performance: Examining the intercorrelations and the incremental validity of narrow traits. Journal of Applied Psychology, 91(1), 40–57. doi:10.1037/0021-9010.91.1.40

Duval, S. J., & Tweedie, R. L. (2000a). A nonparametric “trim and fill” method for accounting for publication bias in meta-analysis. Journal of American Statistical Association, 95, 89–98. doi:10.1080/01621459.2000.10473905

Duval, S. J., & Tweedie, R. L. (2000b). Trim and fill: A simple funnel plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463. doi:10.1111/j.0006-341X.2000.00455.x

Eby, L. T., Allen, T. D., Evans, S. C., Ng, T., & DuBois, D. (2008). Does mentoring matter? A multidisciplinary meta-analysis comparing mentored and non-mentored individuals. Journal of Vocational Behavior, 72, 254–267. doi:10.1016/j.jvb.2007.04.005

Fama, E. F. (1980). Agency problems and the theory of the firm. The Journal of Political Economy, 88, 288–307.

Fama, E. F., & Jensen, M. (1983). Separation of ownership and control. Journal of Law and Economics, 26, 301–325.

Feldman, J. M. (1981). Beyond attribution theory: Cognitive processes in performance appraisal. Journal of Applied Psychology, 66, 127–148. doi:10.1037/0021-9010.66.2.127

Ferris, G. R., Munyon, T. P., Basik, K., & Buckley, M. R. (2008). The performance evaluation context: Social, emotional, cognitive, political, and relationship components. Human Resource Management Review, 18, 146–163. doi:10.1016/j.hrmr.2008.07.006

Fletcher, C. (2001). Performance appraisal and management: The developing research agenda. Journal of Occupational and Organizational Psychology, 74, 473–487. doi:10.1348/096317901167488

Foldes, H. J., Duehr, E. E., & Ones, D. S. (2008). Group differences in personality: Meta-analyses comparing five U.S. racial groups. Personnel Psychology, 61, 579–616. doi:10.1111/j.1744-6570.2008.00123.x

Forret, M. L., & Dougherty, T. W. (2001). Correlates of networking behavior for managerial and professional employees. Group and Organization Management, 26, 283–311. doi:10.1177/1059601101263004

*Fried, Y., Levi, A. S., Ben-David, H. A., Tiegs, R. B., & Avital, N. (2000). Rater positive and negative mood predisposition as predictors of performance ratings of ratees in simulated and real organizational settings: Evidence from US and Israeli samples. Journal of Occupational and Organizational Psychology, 73, 373–378. doi:10.1348/096317900167083

George, J. M. (1990). Personality, affect, and behavior in groups. Journal of Applied Psychology, 75, 107–116. doi:10.1037/0021-9010.75.2.107

Goldberg, L. R. (1992). The development of markers for the big-five factor structure. Psychological Assessment, 4, 26–42. doi:10.1037/1040-3590.4.1.26

*Griffin, B. N. (2010). Interviewer bias in medical school selection. Medical Journal of Australia, 193, 343–346.

*Guven, L. (2007). The effect of positive core self and external evaluations on performance appraisal (Unpublished master’s thesis). Middle East Technical University, Ankara, Turkey.

den Hartog, D. N., Boselie, P., & Paauwe, J. (2004). Performance management: A model and research agenda. Applied Psychology: An International Review, 53, 556–569. doi:10.1111/j.1464-0597.2004.00188.x

Hogan, R. (1991). Personality and personality assessment. In M. D. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology (pp. 873–919). Chicago, IL: Rand McNally.

Hough, L. M., & Ones, D. S. (2001). The structure, measurement, validity, and use of personality variables in industrial, work, and organizational psychology. In N. R. Anderson, D. S. Ones, H. K. Sinangil & C. Viswesvaran (Eds.), International handbook of work and organizational psychology (pp. 233–377). Thousand Oaks, CA: Sage.

Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Newbury Park, CA: Sage.

Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of Applied Psychology, 85, 869–879. doi:10.1037/0021-9010.85.6.869

Ilgen, D. R., Barnes-Farrell, J. L., & McKellin, D. B. (1993). Performance appraisal process research in the 1980s: What has it contributed to appraisals in use? Organizational Behavior and Human Decision Processes, 54, 321–368. doi:10.1006/obhd.1993.1015

Jawahar, I. M., & Williams, C. R. (1997). Where all the children are above average: The performance appraisal purpose effect. Personnel Psychology, 50, 905–925. doi:10.1111/j.1744-6570.1997.tb01487.x

Jensen-Campbell, L. A., & Graziano, W. G. (2001). Agreeableness as a moderator of interpersonal conflict. Journal of Personality, 69, 323–361. doi:10.1111/1467-6494.00148

John, O. P., Naumann, L. P., & Soto, C. J. (2008). Paradigm shift to the integrative big-five trait taxonomy: History, measurement, and conceptual issues. In O. P. John, R. W. Robins & L. A. Pervin (Eds.), Handbook of personality: Theory and research (pp. 114–158). New York, NY: Guilford Press.

Johnson, J. W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate Behavioral Research, 35, 1–19. doi:10.1207/S15327906MBR3501_1

Judge, T. A., & Ilies, R. (2002). Relationship of personality to performance motivation: A meta-analytic review. Journal of Applied Psychology, 87, 797–807. doi:10.1037/0021-9010.87.4.797

Kane, J. S., Bernardin, H. J., Villanova, P., & Peyrefitte, J. (1995). Stability of rater leniency: Three studies. Academy of Management Journal, 38, 1036–1051. doi:10.2307/256619

Klimoski, R., & Inks, L. (1990). Accountability forces in performance appraisal. Organizational Behavior and Human Decision Processes, 45, 194–208. doi:10.1016/0749-5978(90)90011-W

*Kovacs, R., & Kapel, D. E. (1976). Personality correlates of faculty and course evaluations. Research in Higher Education, 5, 335–344. doi:10.1007/BF00993432

*Kuang, D. C. Y. (2004). Assessing performance: Investigation of the influence of item context and rater personality (Doctoral dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 3150616).

Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87(1), 72–107. doi:10.1037/0033-2909.87.1.72

LeBreton, J. M., Hargis, M. B., Griepentrog, B., Oswald, F. L., & Ployhart, R. E. (2007). A multidimensional approach for evaluating variables in organizational research and practice. Personnel Psychology, 60, 475–498. doi:10.1111/j.1744-6570.2007.00080.x

LeBreton, J. M., Ployhart, R. E., & Ladd, R. T. (2004). A Monte Carlo comparison of relative importance methodologies. Organizational Research Methods, 7, 258–282. doi:10.1177/1094428104266017

Levy, P. E., & Williams, J. R. (2004). The social context of performance appraisal: A review and framework for the future. Journal of Management, 30, 881–905. doi:10.1016/j.jm.2004.06.005

Longenecker, C. O., Sims, H. P., & Gioia, D. A. (1987). Behind the mask: The politics of employee appraisal. The Academy of Management Executive, 1, 183–193. doi:10.5465/AME.1987.4275731

Mero, N. P., & Motowidlo, S. J. (1995). Effects of rater accountability on the accuracy and the favorability of performance ratings. Journal of Applied Psychology, 80, 517–524. doi:10.1037/0021-9010.80.4.517

Mischel, W. (1973). Toward a cognitive social learning reconceptualization of personality. Psychological Review, 80, 252–283. doi:10.1037/h0035002

Mischel, W. (1977). The interaction of person and situation. In D. Magnusson & N. S. Endler (Eds.), Personality at the crossroads: Current issues in interactional psychology (pp. 333–352). Hillsdale, NJ: Lawrence Erlbaum.

Mischel, W., & Shoda, Y. (1995). A cognitive-affective system theory of personality: Reconceptualizing situations, dispositions, dynamics, and invariance in personality structure. Psychological Review, 102, 246–268. doi:10.1037/0033-295X.102.2.246

Munyon, T. P., Summers, J. K., Thompson, K. M., & Ferris, G. R. (2014). Political skill and work outcomes: A theoretical extension, meta-analytic investigation, and agenda for the future. Personnel Psychology, 68(1), 143–184. doi:10.1111/peps.12066

Murphy, K. R. (2008). Explaining the weak relationship between job performance and ratings of job performance. Industrial and Organizational Psychology, 1, 148–160. doi:10.1111/j.1754-9434.2008.00030.x

Murphy, K. R., & Cleveland, J. N. (1995). Understanding performance appraisal: Social, organizational, and goal-based perspectives. Thousand Oaks, CA: Sage.

Murphy, K. R., Herr, B. M., Lockhart, M. C., & Maguire, E. (1986). Evaluating the performance of paper people. Journal of Applied Psychology, 71, 654–661. doi:10.1037/0021-9010.71.4.654

*Naffziger, D. W. (1983). Personality and biodata correlates of rater effectiveness (Doctoral

dissertation). Available from ProQuest Dissertations and Theses database (UMI No. 8400923).

O’Boyle, E. H. Jr, Humphrey, R. H., Pollack, J. M., Hawver, T. H., & Story, P. A. (2010). The relation

between emotional intelligence and job performance: A meta-analysis. Journal of

Organizational Behavior, 32, 788–818. doi:10.1002/job.714 Ones, D. S. (1993). The construct validity of integrity tests (Doctoral dissertation). Available from

ProQuest Dissertations and Theses database (UMI No. 9404524).

Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment

in organizational settings.Personnel Psychology, 60, 995–1027. doi:10.1111/j.1744-6570.2007. 00099.x

Ones, D. S., Viswesvaran, C., &Reiss, A. D. (1996). Role of social desirability in personality testing for

personnel selection: The redherring. Journal ofAppliedPsychology,81, 660–679. doi:10.1037/ 0021-9010.81.6.660

*Phillips, B. N. (1960). Authoritarian, hostile, and anxious students’ ratings of an instructor.

California Journal of Educational Research, 11(1), 19–23. R Development Core Team (2010). R: A language and environment for statistical computing.

Vienna, Austria: R Foundation for Statistical Computing.

*Randall, R., & Sharples, D. (2012). The impact of rater agreeableness and rating context on the evaluation of poor performance. Journal of Occupational and Organizational Psychology, 85, 42–59. doi:10.1348/2044-8325.002002

Roberts, B. W. (2009). Back to the future: Personality and assessment and personality development. Journal of Research in Personality, 43, 137–145. doi:10.1016/j.jrp.2008.12.015

Roberts, B. W., & DelVecchio, W. F. (2000). The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychological Bulletin, 126(1), 3–25. doi:10.1037/0033-2909.126.1.3

Roberts, B. W., & Jackson, J. J. (2008). Sociogenomic personality psychology. Journal of Personality, 76, 1523–1544. doi:10.1111/j.1467-6494.2008.00530.x

Roberts, B. W., Lejuez, C., Krueger, R. F., Richards, J. M., & Hill, P. L. (2014). What is conscientiousness and how can it be assessed? Developmental Psychology, 50(5), 1315–1330. doi:10.1037/a0031109

*Roch, S. G., Ayman, R., Newhouse, N., & Harris, M. (2005). Effect of identifiability, rating audience, and conscientiousness on rating level. International Journal of Selection and Assessment, 13(1), 53–62. doi:10.1111/j.0965-075X.2005.00299.x

Saucier, G., & Ostendorf, F. (1999). Hierarchical subcomponents of the Big Five personality factors: A cross-language replication. Journal of Personality and Social Psychology, 76, 613–627. doi:10.1037/0022-3514.76.4.613

*Schneider, C. E. (1977). Operational utility and psychometric characteristics of behavioral expectations scales: A cognitive reinterpretation. Journal of Applied Psychology, 62, 541–548. doi:10.1037/0021-9010.62.5.541

Shore, T. H., & Tashchian, A. (2002). Accountability forces in performance appraisal: Effects of self-appraisal information, normative information, and task performance. Journal of Business and Psychology, 17, 261–274. doi:10.1023/A:1019689616654

Smither, J. W., London, M., & Reilly, R. R. (2005). Does performance improvement follow multisource feedback? A theoretical model, meta-analysis, and review of empirical findings. Personnel Psychology, 58, 33–66. doi:10.1111/j.1744-6570.2005.514_1.x

*Stephens, O. D. (2001). Capturing the policy that air force raters use when writing performance appraisals on junior officers (Unpublished master’s thesis). Air Force Institute of Technology, Wright-Patterson Air Force Base, OH.

Stewart, G. G., & Barrick, M. R. (2004). Four lessons learned from the person-situation debate: A review and research agenda. In B. Schneider & D. B. Smith (Eds.), Personality and organizations (pp. 61–85). Mahwah, NJ: Erlbaum.

*Strauss, J. P., Barrick, M. R., & Connerley, M. L. (2001). An investigation of personality similarity effects (relational and perceived) on peer and supervisor ratings and the role of familiarity and liking. Journal of Occupational and Organizational Psychology, 74, 637–657. doi:10.1348/096317901167569

Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job performance. Journal of Applied Psychology, 88, 500–517. doi:10.1037/0021-9010.88.3.500

Tett, R. P., & Guterman, H. A. (2000). Situation trait relevance, trait expression, and cross-situational consistency: Testing a principle of trait activation. Journal of Research in Personality, 34, 397–423. doi:10.1006/jrpe.2000.2292

Tonidandel, S., & LeBreton, J. M. (2011). Relative importance analysis: A useful supplement to regression analysis. Journal of Business and Psychology, 26, 1–9. doi:10.1007/s10869-010-9204-3

*Tziner, A., Murphy, K. R., & Cleveland, J. N. (2002). Does conscientiousness moderate the relationship between attitudes and beliefs regarding performance appraisal and rating behavior? International Journal of Selection and Assessment, 10, 218–224. doi:10.1111/1468-2389.00211

*Tziner, A., Murphy, K. R., & Cleveland, J. N. (2003). Personality moderates the relationship between context factors and rating behavior. Advances in Psychology Research, 22, 107–120.

Tziner, A., Murphy, K. R., & Cleveland, J. N. (2005). Contextual and rater factors affecting rating behavior. Group and Organization Management, 30, 89–98. doi:10.1177/1059601104267920

Valentine, J. C., Pigott, T. D., & Rothstein, H. R. (2010). How many studies do you need? A primer on statistical power for meta-analysis. Journal of Educational and Behavioral Statistics, 35, 215–247. doi:10.3102/1076998609346961

Viechtbauer, W. (2010). Conducting meta-analysis in R with the metafor package. Journal of Statistical Software, 36, 1–48.

Villanova, P., Bernardin, H. J., Dahmus, S. A., & Sims, R. L. (1993). Rater leniency and performance appraisal discomfort. Educational and Psychological Measurement, 53, 789–799. doi:10.1177/0013164493053003023

Viswesvaran, C., & Ones, D. S. (1995). Theory testing: Combining psychometric meta-analysis and structural equation modeling. Personnel Psychology, 48, 865–885. doi:10.1111/j.1744-6570.1995.tb01784.x

Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (2008). No new terrain: Reliability and construct validity of job performance ratings. Industrial and Organizational Psychology, 1, 174–179. doi:10.1111/j.1754-9434.2008.00033.x

Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2002). The moderating influence of job performance dimensions on convergence of supervisory and peer ratings of job performance: Unconfounding construct level congruence and rating difficulty. Journal of Applied Psychology, 87, 345–354. doi:10.1037/0021-9010.87.2.345

*Wexley, K. N., & Youtz, M. A. (1985). Rater beliefs about others: Their effects on rating errors and rater accuracy. Journal of Occupational Psychology, 58, 265–275. doi:10.1111/j.2044-8325.1985.tb00200.x

Wherry, R. J. (1952). The control of bias in ratings: A theory of ratings. Columbus, OH: The Ohio State Research.

*Yun, G. J., Donahue, L. M., Dudley, N. M., & McFarland, L. A. (2005). Rater personality, rating format, and social context: Implications for performance appraisal ratings. International Journal of Selection and Assessment, 13, 97–107. doi:10.1111/j.0965-075X.2005.00304.x

Received 5 November 2013; revised version received 28 July 2014
