Implementing a Performance Management System
206 Part II System Implementation
7-3 TRAINING PROGRAMS FOR MINIMIZING UNINTENTIONAL RATING ERRORS
Training the raters is another necessary step to prepare for the rollout of the performance management system. Training not only provides participants in the performance management system with needed skills and tools to do a good job implementing it, but also helps increase satisfaction with the system.9
In Chapter 6, we discussed what to do to minimize intentional rating distortion. But unintentional errors also affect the accuracy of ratings. Specifically, before rolling out the performance management system, we should consider implementing rater training programs that address how to identify and rank job activities and how to observe, record, and measure performance.
7-3-1 Rater Error Training
Many performance management systems can be plagued with rating errors. In fact, rating errors are usually the reason why so many performance management systems are criticized.10 Accordingly, the goal of rater error training (RET) is to make raters aware of what rating errors they are likely to make and to help them develop strategies to minimize those errors. In other words, the goal of RET is to increase rating accuracy by making raters aware of the unintentional errors they are likely to make.
RET programs generally include definitions of the most typical errors and a description of possible causes for those errors. Such programs also allow trainees to view examples of common errors and to review suggestions on how to avoid making errors. This can be done by showing video vignettes designed to elicit rating errors and asking trainees to fill out appraisal forms regarding the situations they observed on the video clips. Finally, a comparison is made between the ratings provided by the trainees and the correct ratings. The trainer then explains why the errors took place, which specific errors were made, and ways to overcome the errors in the future.
RET does not guarantee increased accuracy. Raters do become aware of the possible errors they can make, but precisely because many of the errors are unintentional, simple awareness of the errors does not mean that errors will not be made. Nevertheless, it may be useful to expose raters to the range of possible errors. These errors include the following:
• Similar to me error. Similarity leads to attraction, so we tend to favor those who are similar to us. Consequently, in some cases, raters are more likely to give higher performance ratings to those employees who are perceived to be more similar to them in terms of attitudes, preferences, personality, and demographic variables, including race and gender.
• Contrast error. Contrast error occurs when, even if an absolute measurement system is in place, raters compare individuals with one another, instead of against predetermined standards. For example, when a rater rates an individual of only average performance, the rating may actually be higher than deserved if the other individuals rated by the same rater display substandard performance levels: the average performer may seem to be much better in comparison to the others. This error is most likely to occur when raters complete multiple appraisal forms at the same time because, in such situations, it is difficult to ignore the ratings given to other employees.
• Halo error. Halo error occurs when raters fail to distinguish between the different aspects of performance being rated. Recall, we described this error in Chapter 6 in the context of peer evaluations. If an employee receives a high score on one dimension, she also receives a high score on all other dimensions, even though performance may not be even across all dimensions. For example, if an employee has a perfect attendance record, then the rater may give her a high mark on dedication and productivity. The perfect attendance record, however, may be caused by the fact that the employee has large loan payments to make and cannot afford to miss work, not because the employee is actually an excellent overall performer. In other words, being present at work is not the same as being a productive employee. This error is typically caused by the rater's assigning performance ratings based on an overall impression about the employee instead of evaluating each performance dimension independently.
• Primacy error. Primacy error occurs when performance evaluation is influenced mainly by information collected during the initial phases of the review period. For example, in rating communication skills, the rater gives more weight to incidents involving communication that took place toward the beginning of the review period, as opposed to incidents taking place at all other times.
• Recency error. Recency error occurs when performance evaluation is influenced mainly by information gathered during the last portion of the review period. This is the opposite of the primacy error: raters are more heavily influenced by behaviors taking place toward the end of the review period, instead of giving equal importance and paying attention to incidents occurring throughout the entire review period.
• Negativity error. Negativity error occurs when raters place more weight on negative information than on positive or neutral information. For example, a rater may have observed one negative interaction between the employee and a customer and several positive interactions in which customers' expectations were surpassed. The rater may focus on the one negative incident in rating the "customer service" dimension. The negativity error explains why most people have a tendency to remember negative rather than positive news that they read online or watch on television.
• First impression error. First impression error occurs when raters make an initial favorable or unfavorable judgment about an employee and then ignore subsequent information that does not support the initial impression. This type of error can be confounded with the "similar to me error" because first impressions are likely to be based on the degree of similarity: the more similar the person is to the rater, the more positive the first impression will be.
• Spillover error. Spillover error occurs when scores from previous review periods unjustly influence current ratings. For example, a rater makes the assumption that an employee who was an excellent performer in the previous period ought to be an excellent performer during the current period also, and provides performance ratings consistent with this belief.
• Stereotype error. Stereotype error occurs when a rater has an oversimplified view of individuals, based on group membership. That is, a rater may have a belief that certain groups of employees (e.g., women) are unassertive in their communication style. In rating women, therefore, he may automatically describe communication as being "unassertive" without actually having any behavioral evidence to support the rating.11 This type of error can also lead to biased evaluations of performance when an individual (e.g., a woman) violates stereotypical norms by working in an occupation that does not fit the stereotype (e.g., assembly of airplane parts).12 This type of error can also result in consistently lower performance ratings for members of certain groups. For example, a study including an identical sample of black and white workers found that white raters gave higher ratings to white workers relative to black workers than did black raters. In other words, if a white worker is rated, then it does not really matter whether the rater is black or white; however, if a black worker is rated, the rater's ethnicity matters because this worker is likely to receive a higher rating from a black rater than from a white rater.13
• Attribution error. The attribution error takes place when a rater attributes poor performance to an employee's dispositional tendencies (e.g., personality, abilities) instead of features of the situation (e.g., malfunctioning equipment). In other words, different raters may place different relative importance on the environment in which the employee works in making performance evaluations. If raters make incorrect inferences about the employees' dispositions and ignore situational characteristics, actions taken to improve performance may fail because the same situational constraints may still be present (e.g., obsolete equipment).14
As a recap, Table 7-2 includes a summary list of unintentional errors that raters may make in assigning performance ratings. RET exposes raters to the different errors and their causes; however, being aware of unintentional errors does not mean that raters will no longer make these errors.15 Awareness is certainly a good first step, but we need to go further if we want to minimize unintentional errors. One fruitful possibility is the implementation of frame of reference training.
TABLE 7-2 Unintentional Errors Likely to Be Made in Providing Performance Ratings
Similar to me
Contrast
Halo
Primacy
Recency
Negativity
First impression
Spillover
Stereotype
Attribution
7-3-2 Frame of Reference Training
Frame of reference (FOR) training helps improve rater accuracy by thoroughly familiarizing raters with the various performance dimensions to be assessed.16 The overall goal is to give raters skills so that they can minimize unintentional errors and provide accurate ratings on each performance dimension by developing a common FOR.
A typical FOR training program includes a discussion of the job description for the individuals being rated and the duties involved. Raters are then familiarized with the performance dimensions to be rated by reviewing the definitions for each dimension and discussing examples of good, average, and poor performance. Raters are then asked to use the appraisal forms to be used in the actual performance management system to rate fictitious employees usually shown in video practice vignettes. The trainees are also asked to write a justification for the ratings. Finally, the trainer informs trainees of the correct ratings for each dimension and the reasons for such ratings and discusses differences between the correct ratings and those provided by the trainees. Typically, FOR training programs include the following formal steps:17
1. Raters are told that they will evaluate the performance of three employees on three separate performance dimensions.
2. Raters are given an appraisal form and instructed to read it as the trainer reads aloud the definition for each of the dimensions and the scale anchors.
3. The trainer discusses various employee behaviors that illustrate various performance levels for each rating scale included in the form. The goal is to create a common "performance theory" (frame of reference) among raters so that they will agree on the appropriate performance dimension and effectiveness level for different behaviors.
4. Participants are shown a video clip of a practice vignette, including behaviors related to the performance dimensions being rated, and are asked to evaluate the employee's performance using the scales provided.
5. Ratings provided by each participant are shared with the rest of the group and discussed. The trainer seeks to identify which behaviors participants used to decide on their assigned ratings and to clarify any discrepancies among the ratings.
6. The trainer provides feedback to participants, explaining why the employee should receive a certain rating (target score) on each dimension, and shows discrepancies between the target score and the score given by each trainee.
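The comparison in step 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any FOR program described here: the dimension names, the 1-to-5 scale, the sample scores, and the function names `rating_discrepancies` and `mean_absolute_discrepancy` are all hypothetical. The sketch subtracts the trainer's target score from each trainee's rating on each dimension, so a positive value means the trainee rated higher than the target.

```python
from statistics import mean

# Hypothetical target scores set by the trainer for one practice vignette
# (dimension names and the 1-5 scale are assumptions for illustration).
TARGET_SCORES = {"communication": 3, "teamwork": 4, "productivity": 2}

# Hypothetical ratings given by each trainee for the same vignette.
TRAINEE_RATINGS = {
    "rater_a": {"communication": 4, "teamwork": 4, "productivity": 4},
    "rater_b": {"communication": 3, "teamwork": 3, "productivity": 2},
}


def rating_discrepancies(targets, ratings):
    """Return trainee rating minus target score for every dimension."""
    return {
        rater: {dim: scores[dim] - targets[dim] for dim in targets}
        for rater, scores in ratings.items()
    }


def mean_absolute_discrepancy(discrepancies):
    """Average size of the gap per trainee, ignoring its direction."""
    return {
        rater: mean(abs(gap) for gap in gaps.values())
        for rater, gaps in discrepancies.items()
    }


if __name__ == "__main__":
    gaps = rating_discrepancies(TARGET_SCORES, TRAINEE_RATINGS)
    for rater, by_dimension in gaps.items():
        print(rater, by_dimension)
    print(mean_absolute_discrepancy(gaps))
```

A trainer might use the per-dimension gaps to decide which behaviors in the vignette to revisit with the group, and the per-trainee average to see who is furthest from the common frame of reference.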
Consider how the Canadian military uses FOR training.18 First, the training program includes a session regarding the importance of performance management systems in the military. In the next session, raters are told that they will be evaluating the performance of four direct reports. They are given the appraisal form to be used and information on each of the scales included in the form. As the trainer reads through each of the scales, participants are encouraged to ask questions. At the same time, the trainer gives examples of behaviors associated with each level of performance. The trainer thus makes sure that the trainees come to a common FOR concerning what behaviors constitute the different levels of performance. Participants are shown a video clip of a soldier and are asked to evaluate the performance using the appraisal form explained earlier. Next, the ratings are discussed as a group, focusing on the behaviors exhibited in the video clip and the ratings that would be most appropriate in each case. This process is repeated several times. Finally, the participants are given three more samples of behavior to rate, as displayed by three hypothetical soldiers, and they receive feedback on how well they evaluated each soldier.
It should be evident by now that FOR training can take quite a bit of time and effort to develop and administer, but it is well worth it. Specifically, as a consequence of implementing this type of training, raters not only are more likely to provide consistent and more accurate ratings, but they are also more likely to help employees design effective development plans. This is because sharing a common view of what constitutes good performance allows supervisors to provide employees with better guidelines for reaching such performance levels.19
7-3-3 Behavioral Observation Training
Behavioral observation (BO) training is another type of program implemented to minimize unintentional rating errors. BO training focuses on how raters observe, store, recall, and use information about performance. Fundamentally, this type of training improves raters' skills at observing performance.
For example, one type of BO training involves showing raters how to use observational aids such as notes or diaries. These observational aids help raters record a preestablished number of behaviors on each performance dimension. Using these aids helps raters increase the sample of incidents observed and recorded during a specific time period. In addition, an aid such as a diary is an effective way to standardize the observation of behavior and the recording of critical incidents throughout the review period. It also serves as a memory aid when filling out evaluation forms. Memory aids are beneficial because ratings based on memory alone, without notes or diaries, are likely to be distorted due to factors of social context (e.g., friendship bias) and time (i.e., duration of supervisor-direct report relationship).20
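The idea of recording a preestablished number of behaviors on each performance dimension can be sketched as a simple diary structure. The Python below is a hypothetical illustration only: the class name `DiaryEntry`, the function `missing_observations`, the two dimension names, and the quota of three entries per dimension are assumptions made for the example, since the text says only that the number is preestablished.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Assumed quota: the text says only "a preestablished number",
# so the value 3 is purely illustrative.
MIN_ENTRIES_PER_DIMENSION = 3
DIMENSIONS = ("communication", "customer service")  # hypothetical dimensions


@dataclass(frozen=True)
class DiaryEntry:
    observed_on: date
    employee: str
    dimension: str
    behavior: str  # what the rater actually observed, not an inference


def missing_observations(entries, employee):
    """Return how many more entries each dimension needs for one employee."""
    counts = Counter(e.dimension for e in entries if e.employee == employee)
    return {
        dim: max(0, MIN_ENTRIES_PER_DIMENSION - counts[dim])
        for dim in DIMENSIONS
    }


if __name__ == "__main__":
    diary = [
        DiaryEntry(date(2024, 2, 5), "employee_1", "communication",
                   "Summarized client requirements in writing after the call."),
        DiaryEntry(date(2024, 6, 18), "employee_1", "communication",
                   "Interrupted a colleague twice during the team briefing."),
        DiaryEntry(date(2024, 10, 2), "employee_1", "customer service",
                   "Resolved a billing complaint during the first contact."),
    ]
    print(missing_observations(diary, "employee_1"))
```

Dating each entry also makes it easy to check that observations are spread across the whole review period rather than clustered at its beginning or end.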
Consider how BO training is also implemented by the Canadian military. The Canadian military has found that a combination of FOR and BO training works best. Earlier, we described how the Canadian military uses FOR training. BO training is added to the FOR training program. In addition to FOR training, there are sessions on the importance of BO and common BO errors, including first impression, stereotypes, and halo effects. Finally, the participants are trained in the importance of keeping diaries and taking notes on their direct reports throughout the year. Furthermore, the trainer explains the criteria for each performance dimension and provides written descriptions of the different levels of performance. The participants are given a chance to practice keeping a diary while watching the video clips used in the FOR training section of the training program. After watching each video clip, participants are given tips on note-taking and recording behaviors as well as the resulting outcomes.
In summary, raters are likely to make several types of unintentional errors when providing performance information. Unintentional errors are the product of the complex tasks of observing, encoding, storing, and retrieving performance information, and resistance to change exacerbates these errors. Through the implementation of three different types of training programs, these errors can be substantially minimized. Training programs focus on describing the errors that raters usually make (i.e., RET programs). In addition, they should allow raters to generate a common FOR to be used in evaluating performance as well as offer raters tools to improve observation and memory skills and help mitigate the discomfort generated by the interpersonal demands of the performance management process. FOR training is particularly beneficial when performance measurement emphasizes behaviors. In contrast, BO training is particularly beneficial when performance measurement emphasizes results because raters learn not only how to observe behaviors, but also how these behaviors are linked to results.
Thus far, this chapter has described how to prepare for the launching of a performance management system by designing a communication plan and an appeals process and by delivering training programs that will minimize unintentional rating distortions. Next, we turn to the final set of activities required before the performance management system is put into practice: pilot testing.
7-4 PILOT TESTING
Before the performance management system is fully rolled out, it is a good idea to test a version of the entire system so that adjustments and revisions can be made as needed.21 In the pilot test of the system, evaluations are not recorded in employee files; however, the system is implemented in its entirety from beginning to end, including all the steps that would be included if the system had actually been implemented. In other words, meetings take place between supervisor and employee, performance data are gathered, developmental plans are designed, and feedback is provided. The most important aspect of the pilot test is that all participants maintain records, noting any difficulties they encountered, ranging from problems with the appraisal form and how performance is measured to the feedback received. The pilot test allows for the identification and early correction of any flaws before the system is implemented throughout the organization.
We should not assume that the performance management system will necessarily be executed or that it will produce the anticipated results. The pilot test allows us to gain information from the perspective of users on how well the system works, to learn about any difficulties and unforeseen obstacles, to collect recommendations on how to improve all aspects of the system, and to understand personal reactions to it. In addition, conducting a pilot test is yet another way to achieve early acceptance from a small group who can then act as champions for the performance management system, rather than putting the burden on the HR department to sell the idea. A final reason for conducting a pilot test is that users are likely to have a higher system acceptance rate, knowing that stakeholders in the company had a say in its design, rather than feeling that the system was created by the HR department alone.
In larger organizations, an important decision to be made is the selection of the group of employees with whom the system will be tested. In choosing this group, we need to understand that the managers who will be participating should be willing to invest the resources required to do the pilot test. In addition, this group should be made up of managers who are flexible and willing to try new things. Thus, managers should know what the system will look like and receive a realistic preview before they decide to participate in the pilot test.