summary
Rating Issues
Arturo Covarrubias-Paniagua
Hello, I’m Arturo and I’m here to present on Rating issues.
1
Overview
Model of Rating Behavior
3 Articles
Cambon & Steiner (2015)
Harari & Rudolph (2017)
Jelley, Goffin, Powell, & Heneman (2012)
Case Study
How to reduce rating error
Today we will briefly review the model of rating behavior again, go over 3 articles covering issues in rating, look at a case study of the performance management (PM) system implemented at Yahoo, and finish with some tips on how to reduce rating error.
2
Model for Rating Behavior
A model – Rating behavior is influenced by both the motivation to provide accurate ratings and the motivation to distort ratings.
3
Article 1
Incentives and Alternative Rating Approaches: Roads to Greater Accuracy in Job Performance Assessment?
Jelley, Goffin, Powell, & Heneman (2012)
4
Background
Providing an accuracy incentive prior to observing performance can affect raters.
Format of rating (serial vs. parallel) can affect ratings.
5
Types of Rating Accuracy & Serial vs Parallel
Elevation Accuracy (EL) – The rater's overall level-of-rating accuracy
Differential Elevation (DE) – The differential main effects of ratees
Stereotype Accuracy (SA) – Concerns the rater's accuracy in differentiating among items
Differential Accuracy (DA) – Represents the differential ratee-by-item interaction
Serial – Encourages raters to consider relative judgments of specific behavior (one subject at a time)
Parallel – Rate all ratees on a given behavior (all subjects at once)
Poor EL reflects a rater's tendency to evaluate ratees too high or too low
DE indicates a rater's accuracy in differentiating among ratees in terms of their general level of performance
SA – Is relevant to group-level training needs assessment
DA – Is relevant for individual-level identification of patterns of strengths and weaknesses in ratees' behaviors
Now, there are also two different ways to go about rating multiple people. In serial rating, you rate one person on all dimensions before moving on to the next person, while parallel rating has you rate all ratees on one dimension before moving on to the next dimension.
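To make the four accuracy components concrete, here is a minimal Python sketch of Cronbach's decomposition. It assumes one rater's ratings and a matching set of "true" scores (in practice usually expert consensus ratings), both arranged as ratees × items matrices. The function name and example data are hypothetical, and published studies often report the square roots of these values rather than the squared components returned here. Lower values mean more accurate ratings.

```python
import numpy as np


def cronbach_accuracy(ratings, true_scores):
    """Return Cronbach's four accuracy components (squared form).

    ratings, true_scores: arrays shaped (n_ratees, n_items).
    Lower values indicate more accurate ratings.
    """
    x = np.asarray(ratings, dtype=float)
    t = np.asarray(true_scores, dtype=float)
    if x.shape != t.shape or x.ndim != 2:
        raise ValueError("ratings and true_scores must be matching 2-D arrays")

    # Work with rating errors; each component is one piece of their mean square.
    d = x - t
    grand = d.mean()
    ratee_effects = d.mean(axis=1) - grand   # per-ratee deviations from grand mean
    item_effects = d.mean(axis=0) - grand    # per-item deviations from grand mean
    interaction = (d - d.mean(axis=1, keepdims=True)
                   - d.mean(axis=0, keepdims=True) + grand)

    return {
        "EL": float(grand ** 2),                   # overall too high / too low
        "DE": float(np.mean(ratee_effects ** 2)),  # differentiating among ratees
        "SA": float(np.mean(item_effects ** 2)),   # differentiating among items
        "DA": float(np.mean(interaction ** 2)),    # ratee-by-item patterns
    }


true = [[3, 4, 5], [2, 3, 4]]
lenient = [[4, 5, 6], [3, 4, 5]]   # hypothetical rater: +1 on every item
print(cronbach_accuracy(lenient, true))
```

Because the four components partition the mean squared rating error, they always sum to the mean of (ratings − true)². A rater who is uniformly one point too lenient, as in the example, shows up purely as elevation error (EL = 1, the other three components 0).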
6
Hypothesis
H1: The serial approach will be more accurate on all components (DA, SA, EL, DE) than the parallel approach in the no-incentive condition.
H2A: The parallel approach will be more accurate (EL & DE) than the serial approach in the recall-incentive condition.
H2B: The parallel approach will be more accurate on all components (DA, SA, EL, DE) than the serial approach in the observation-incentive condition.
7
Methods
147 student participants
Viewed recordings of lectures and rated the lecturers 48 hours later.
Some students were told they would receive $20.00 for accurate ratings, either before viewing (observation incentive) or after viewing (recall incentive)
Others were not told of any incentive (no-incentive condition)
8
Results & Implications
Serial rating approach was more accurate than the parallel approach.
Accuracy incentive before the observation mitigated the effects of the parallel approach.
9
Article 2
The effect of rater accountability on performance ratings: A meta-analytic review
Harari & Rudolph (2017)
The second article I’ve chosen covers the effects of rater accountability on performance ratings, by Harari & Rudolph.
10
Background & Hypothesis
Rater Accountability
Accountability is a norm in most appraisal systems, but not universal.
Sometimes ratings are not shared with ratees.
The effect of rater accountability mechanisms remains unclear
First, rater accountability is when another individual holds raters responsible for their performance ratings. People experience accountability when they believe that their performance ratings have implications for their social image within their organization. The article gives as an example a situation in which raters must explain the results of, and the reasons for, their ratings to the ratee. Accountability of this kind is the norm in most appraisal systems. I think a real-life example would be the SOTEs.
The article notes, however, that the effects of rater accountability on ratings are still somewhat unclear, so the authors undertook a meta-analysis.
11
Methods
Initial search yielded 138 articles.
Manipulation of rater accountability
Enough data to compute d-values
Performance ratings as a dependent variable
Analyzed 35 independent samples.
When they began their literature search, they originally found 138 articles that met their search criteria. However, they narrowed these down by requiring that each study manipulate rater accountability, report enough information for the reviewers to calculate d-values, and use performance ratings as the dependent variable.
After narrowing down their results, they had 35 independent samples on which to conduct the analyses.
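The d-values mentioned above are what a meta-analysis combines. As a rough illustration only, the sketch below pools d-values with a simple fixed-effect, inverse-variance weighting and the common large-sample variance approximation for d. This is an assumption: Harari and Rudolph's actual estimator (for example a random-effects model) may differ, and the studies in the example are made-up numbers, not samples from their review.

```python
import math


def d_variance(d, n1, n2):
    """Large-sample approximation to the sampling variance of d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def pool_d(studies):
    """Fixed-effect, inverse-variance weighted mean d with a 95% CI.

    studies: iterable of (d, n_accountable, n_control) tuples.
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for d, n1, n2 in studies:
        w = 1.0 / d_variance(d, n1, n2)   # more precise samples count more
        total_weight += w
        weighted_sum += w * d
    if total_weight == 0:
        raise ValueError("need at least one study")
    mean_d = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return mean_d, (mean_d - 1.96 * se, mean_d + 1.96 * se)


# Hypothetical samples: (d, n in accountable group, n in control group)
example = [(0.30, 40, 40), (0.05, 120, 110), (-0.10, 60, 65)]
mean_d, ci = pool_d(example)
print(f"pooled d = {mean_d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Weighting by inverse variance means that a few large samples can dominate the pooled estimate, which is one reason meta-analysts also examine moderators such as accountability source.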
12
Results & Implications
Cohen’s d = .12, very small effect size
When accounting for accountability source (ratee vs. superior) as a moderator, effect sizes changed
Ratee Cohen’s d = .32, small to medium
Supervisor Cohen’s d = -0.06.
Performance ratings were higher when raters were accountable to ratees.
Being held accountable by a ratee results in inflated performance ratings.
Their initial result for the effect of accountability alone on performance ratings was small, indicating that accountability slightly increases ratings. However, when they treated the accountability source as a moderator, they found a small-to-medium effect when raters were accountable to ratees, while the effect when raters were accountable to superiors was essentially zero. Overall, being held accountable by a ratee results in inflated ratings.
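For reference, Cohen's d is the difference between two group means divided by their pooled standard deviation. A minimal sketch follows; the ratings are invented, and the labels use Cohen's conventional rough benchmarks (about 0.2 small, 0.5 medium, 0.8 large), with values under 0.2 labeled "very small" to match the wording on the slide. On this scale, the 0.32 reported for ratee accountability sits between the small and medium benchmarks.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b), pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def effect_label(d):
    """Rough verbal label based on Cohen's conventional benchmarks."""
    size = abs(d)
    if size < 0.2:
        return "very small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical ratings from raters accountable to ratees vs. a control group.
accountable = [4, 5, 6]
control = [3, 4, 5]
d = cohens_d(accountable, control)
print(d, effect_label(d))
```

A positive d here means the accountable group gave higher ratings, which is the direction of the inflation effect described on the slide.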
13
Article 3
When Rating Format Induces Different Rating Processes
Cambon & Steiner (2015)
14
Background
Rating Format
Descriptive Behavior
Evaluative Behavior
Big Five Personality
Conscientiousness and Extraversion.
“Agency”
Cronbach’s distinct forms of accuracy
DA
SA
EL
DE
The main goal of the article is to examine different rating formats, their interactions with the purpose of rating (administrative vs. developmental), the rating processes they induce, and the effects of those processes on rating accuracy.
Now let's take a step back and cover some of these concepts. First, the authors state that different rating formats should serve different functions in performance appraisal and in psychological assessment generally. The authors focus on rating scales that use either descriptive behaviors (DBs), which describe the characteristics of the target, or evaluative behaviors (EBs), which entail evaluating what one can do with the target while saying nothing about its character.
This leads to their first hypothesis: a rating mode based on DBs will produce more within-ratee discriminability than a mode based on EBs, and a rating mode based on EBs will produce more between-ratee discriminability than a mode based on traits.
Second, the article focuses on the dimensions of performance. The authors note that earlier research distinguished between judgments of communion and agency. Communion refers to interpersonal behaviors related to socialization and friendship, whereas agency refers to behaviors related to power and personal growth. In current terms, communion matches up with the Big Five dimensions of agreeableness and neuroticism, whereas agency matches up well with conscientiousness and extraversion. This leads to their next hypothesis: ratings made on the agency dimension will produce more between-ratee discriminability than ratings made on the communion dimension.
15
Methods
Two experiments
Participants rated seven targets presented via videotapes
Given descriptive knowledge, evaluative knowledge, or a mix of the two.
Indexes of discriminability and of accuracy.
16
Results & Implications
Results indicated that EB rating scales produced higher between-ratee discriminability
DB rating scales produced higher within-ratee discriminability
Use EB scales to compare different ratees.
Use DB scales to identify a ratee's strengths and weaknesses.
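One simple way to picture the two kinds of discriminability is sketched below, using a ratee × item matrix of one rater's scores. These indices (the spread of ratee means for between-ratee discriminability, and the average spread across items within each ratee for within-ratee discriminability) are illustrative assumptions; Cambon and Steiner's actual indexes may be computed differently, and the data are hypothetical.

```python
from statistics import mean, pstdev


def between_ratee_discriminability(ratings):
    """Spread of ratee mean scores: how sharply ratees are separated."""
    return pstdev([mean(row) for row in ratings])


def within_ratee_discriminability(ratings):
    """Average spread across items within each ratee: how sharply each
    ratee's strengths are separated from their weaknesses."""
    return mean(pstdev(row) for row in ratings)


# Rows are ratees, columns are items (hypothetical 1-5 ratings).
comparing_rater = [[5, 5, 5], [3, 3, 3], [1, 1, 1]]   # separates people
profiling_rater = [[5, 1, 3], [1, 5, 3], [3, 3, 3]]   # separates items

for name, ratings in [("comparing", comparing_rater),
                      ("profiling", profiling_rater)]:
    print(name,
          round(between_ratee_discriminability(ratings), 2),
          round(within_ratee_discriminability(ratings), 2))
```

Read against the results above, an EB scale would push raters toward the "comparing" pattern (high between-ratee, low within-ratee spread), while a DB scale would push them toward the "profiling" pattern.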
17
Case study – Yahoo’s Performance Management System
Yahoo CEO Marissa Mayer introduced a system known as QPRs (Quarterly Performance Reviews). Through this popular performance review technique, managers would set and communicate goals and results to departments, teams, and individual employees. Employees would get a score from one to five every quarter. A one meant the employee was consistently missing their goals, while a five meant that they were greatly exceeding their goals. In essence, the system is similar to stack ranking (as seen at GE and Adobe). The target distribution system, or stack ranking, put employees into five buckets: the top ten percent of employees would go into “greatly exceeds,” twenty-five percent into “exceeds,” fifty percent into “achieves,” ten percent into “occasionally misses,” and five percent into “misses.” Under these shares, 85 percent of every team landed in the top three buckets, while 15 percent had to go into the bottom two. This resulted in an incredibly competitive work environment, where teammates competed directly with each other to make sure they didn't end up in the bottom two buckets.
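The forced-distribution step described above can be sketched as a simple allocation: rank a team from best to worst, then fill the buckets in order until each holds its fixed share. The function name, the rounding rule (cumulative shares rounded to whole people), and the example team are assumptions for illustration; nothing here describes Yahoo's actual tooling.

```python
from collections import Counter
from itertools import accumulate

# Bucket shares (percent of a team), best to worst, as described for the QPR curve.
QPR_CURVE = [
    ("greatly exceeds", 10),
    ("exceeds", 25),
    ("achieves", 50),
    ("occasionally misses", 10),
    ("misses", 5),
]


def force_distribution(ranked_employees, curve=QPR_CURVE):
    """Assign a bucket to each employee, given a best-to-worst ranking."""
    percents = [share for _, share in curve]
    if sum(percents) != 100:
        raise ValueError("bucket shares must sum to 100 percent")
    n = len(ranked_employees)
    # Round cumulative cut points so bucket sizes always add up to n.
    cuts = [round(c * n / 100) for c in accumulate(percents)]
    buckets = {}
    start = 0
    for (label, _), end in zip(curve, cuts):
        for employee in ranked_employees[start:end]:
            buckets[employee] = label
        start = end
    return buckets


team = [f"employee_{i:02d}" for i in range(1, 21)]  # hypothetical team of 20
print(Counter(force_distribution(team).values()))
```

For a team of 20 this yields 2, 5, 10, 2 and 1 people per bucket. The bottom two buckets are filled no matter how well the whole team performed, which is exactly the situation the manager in the following quote describes.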
“I was forced to give an employee an occasionally misses, was very uncomfortable with it. Now, I have to have a discussion about it when I have my QPR meetings. I feel so uncomfortable because in order to meet the bell curve, I have to tell the employee that they missed when I truly don’t believe it to be the case. I understand we want to weed out mishires/people not meeting their goals, but this practice is concerning. I don’t want to lose the person mentally. How do we justify?”
With this in mind, what are some of the big problems, and how would you rectify them?
Get rid of the forced distribution. Implement possible
18
Summary
Serial ratings are generally better than parallel approaches, but accuracy incentives mitigate the issue.
Being held accountable by a ratee results in inflated performance scores
EB scales should be used to compare amongst ratees, DB scales should be used for developmental purposes.
Serial rating methods are better for developmental purposes.
19
References
Jelley, R., Goffin, R., Powell, D., & Heneman, R. (2012). Incentives and Alternative Rating Approaches: Roads to Greater Accuracy in Job Performance Assessment? Journal of Personnel Psychology, 11(4), 159-168.
Cambon, L., & Steiner, D. (2015). When Rating Format Induces Different Rating Processes: The Effects of Descriptive and Evaluative Rating Modes on Discriminability and Accuracy. Journal of Business and Psychology, 30(4), 795-812.
Harari, M. B., & Rudolph, C. W. (2017). The effect of rater accountability on performance ratings: A meta-analytic review. Human Resource Management Review, 27(1), 121-133.
http://allthingsd.com/20131108/because-marissa-said-so-yahoos-bristle-at-mayers-new-qpr-ranking-system-and-silent-layoffs/
https://blog.impraise.com/360-feedback/is-this-the-end-of-yahoo-and-employee-stack-ranking-performance-review
20