PERSONALITY THEORIES : PSYC-322 LEARNING JOURNAL 2
CHAPTER 3
How Is Personality Assessed?
In the previous chapters, we talked about the nature of personality, covering broad themes in personality research. This included different theories dealing with its structure, its causes, and its development throughout the life span. The research reviewed above spans more than a century, and thousands of studies have been carried out to address questions concerning the nature of personality. The large amount of empirical evidence is impressive, and considerable progress appears to have been made in this field. While we have laid out empirical findings regarding the discovery of the basic structure of personality, its biological causes, and its change and development, very little discussion has been devoted to how these discoveries have actually been made. Are we to take these claims for granted? Should we even take them seriously? After all, personality is an abstract and unobservable concept; in essence, one could argue that it is not something that is subject to measurement, or even worse, that it is an illusory construct. So how do we know that we can measure this construct of personality? How do we know that we have? And how do we know that we have done so accurately?
Chamorro-Premuzic, T., & Ahmetoglu, G. (2012). Personality 101. Retrieved from http://ebookcentral.proquest.com Created from claflin on 2018-05-22 11:12:40.
Copyright © 2012. Springer Publishing Company. All rights reserved.
These are very important and very loaded questions, because they challenge the very foundations of personality psychology and psychometric testing as a whole. What’s more, psychological tests, such as personality inventories, are used extensively in a variety of settings, from selection of applicants to university and organizational positions to psychiatric diagnosis. Accordingly, people’s careers and even lives can be changed (and this is not an overstatement) as a result of psychometric test results. Given this, it is clearly an imperative task of psychologists in this field to provide satisfactory answers to such issues. It is their job to establish that personality is measurable, to establish that they have measured personality, and importantly, to establish that they have accurately done so. So, how can psychologists convincingly address these concerns?
To establish credibility, personality research must first demonstrate scientifically acceptable methodology. The scientific method is recognized to be the best method for obtaining valuable knowledge about any phenomenon (Kline, 1988). Accordingly, researchers have to establish some basic criteria, or standards, which personality measurement needs to satisfy for its claims to be considered scientifically acceptable. Given that personality psychology as a discipline has scientific objectives, several steps have been taken by psychologists to ensure that these standards are indeed satisfied. These include both theoretical and statistical steps, and we will discuss these throughout this chapter; however, before we embark on that, we want to ask the first basic question: Can personality actually be measured?
CAN PERSONALITY BE MEASURED AND HOW?
It is not uncommon, when we talk to people outside academia about our research and the methods of personality assessment, to be confronted with a bewildered look. Even if some don’t disclose it, there generally seems to be a question mark about the way this is done, or about the accuracy or validity of this research. It seems to many an unfeasible prospect. How could one possibly “measure” a person’s personality? People are complex, dynamic, and one could say, chaotic. Many are difficult to read, understand, or predict. Some are worried about how others will perceive them and try hard to manage their impressions. Others are simply deceptive. Given these complexities, there is always a sense of skepticism about attempts to quantify personality, just as with things like love or art. You often hear people say that love is not something you can describe or put numbers on: “It is just something you feel.” The same seems to apply to the measurement of personality, at least in the mind of laypeople, but is this mainly wishful thinking?
Unlike laypeople, psychologists believe you can measure personality using reliable scientific tools. After all, if something exists, it should be subject to study. If it varies in detectable ways, it should be quantifiable. Therefore, the only thing needed is to find a valid way (or ways) of doing so. This is not a matter of faith but is derived from our understanding of the scientific method. For instance, we would all agree that not all people are exactly the same; some people are taller, some are heavier, and some are stronger. We would also agree that people differ in more than just a physical sense; that is, they differ in the way they think, feel, and behave. Some people are friendlier, some are more aggressive, and others more assertive. When we compare people, we often say they have more or less of an attribute (e.g., friendliness, aggression, determination, etc.). These differences are differences in degree. If we acknowledge therefore that people differ in psychological aspects in terms of degree, there is no reason why these aspects should not be quantifiable. All we need is to assign numbers to these degrees or variations.
Strictly speaking then, the skepticism about the prospect of measuring personality isn’t justified. Indeed, the whole field of psychometrics (literally meaning mind measurement) is dedicated to measuring differences between people in various psychological concepts (or constructs), including personality. Accordingly, within this field (and psychology in general), the question is not really whether personality can be measured, but rather, whether it can be accurately done. So, are personality psychologists able to measure personality well? Are they able to really capture a person’s character?
The answer to this question is “yes!” (with a big exclamation mark). We will review evidence for this claim below, but first, we should clarify a few points. First, there is no magic in personality assessment. Personality psychologists are not mind readers or telepathists. They cannot look at your palm or forehead and tell you who you are. Personality assessment is not standard or simple. It is not like measuring height with a tape measure. Personality assessment combines a variety of theories and methods, including common sense, probability theory, and statistical testing. The methods are often not much different from what anyone given the task of assessing personality would eventually discover and try.
To clarify this point, suppose for instance that you (and perhaps a few of your friends) were given the task of devising a way of assessing the personality of, say, the richest person in the world (who at the time this book is being written is, according to Forbes rankings, Carlos Slim; Forbes.com, 2011). How would you go about doing it? In fact, if you have a minute, you may actually want to try this exercise. So, the question is as follows: How would you measure the personality of the richest man in the world? What methods would you use?
Regardless of whether you actually attempted this task or not, it is safe to say that you would eventually have had a number of ideas, or options, in front of you as methods. Indeed, there are several ways you could do this. One way, of course, is simply to talk to them and find out as much as you could about their personality. Another way could be to ask them questions simply by using a questionnaire. You may, however, be skeptical. They may distort their story, either to self-enhance, or because they are simply delusional. So, you might decide to interview other people who know this person well and get their views of what he/she is like. Finally, you may decide that it would be useful to observe how this person behaves in a variety of situations. For instance, you could put them in various scenarios or role plays and see how they react; or put them in experimental conditions. You could, alternatively, observe their day-to-day behavior, and so on.
FOUR TYPES OF DATA
One problem (but this is also an advantage) is that there are lots of ways you could get information, or “data,” about a person, and personality psychologists have divided the numerous data sources into a few categories. These data categories are (a) life record data (L-data), (b) observer data (O-data), (c) test data (T-data), and (d) self-report data (S-data). An easy way to remember these categories, or types of data, is by the acronym LOTS. Below, we briefly describe each of these categories.
L-data basically deal with a person’s life history or biographical information. They involve collecting data from the individual’s natural, or everyday-life, behaviors, measuring their characteristic behavior patterns in the real world. Rather than asking a person about their past tendencies, L-data often consist of actual (objective) records. These can be obtained by looking at past school/college grades, criminal records, educational attainments, and so on. The approach is based on the idea that past behavior is the best predictor of future behavior.
O-data consist of evaluations and information obtained from relevant others or observers. Observers can include parents, friends, colleagues, teachers, and so on. There are a variety of ways in which observers can provide information. A common way is to provide ratings through questionnaires similar to (or the same as) those used in self-reports. The benefit of this method is that one can obtain data from different observers who see and interact with the subject in different contexts. For instance, in organizational settings, 360-degree multisource feedback is a common way to obtain O-data. This involves obtaining ratings from subordinates, peers, bosses, and customers (in addition to self-ratings). Other observational methods include observing participants as they go about their daily lives. This is similar to anthropological research, where people are observed in their natural environments. The observer is usually someone who is trained to make systematic observations. They attempt to obtain as much data as possible, which are then scored on a predefined set of criteria.
T-data are based on objective tests. They consist of standard stimulation situations in which the individual is unaware of what is being measured (Cattell & Kline, 1977). They basically involve examining participants’ reactions to standardized experimental situations (often created in a lab), where a person’s behavior can be objectively observed and measured. One example of such tests (called the Fidgetometer) involves instructing examinees to sit on a chair that is wired up to detect any movements. In this scenario, the person being examined often does not know that their movement is being measured. Even if they did, they would not know that personality traits are inferred from moving more or less (let alone which particular personality trait).
Finally, S-data involve responses based on introspection by the individual about his or her own behavior and feelings. Here, the information about the individual is provided by the individual. This can take several forms, the most common of which are self-report questionnaires and interviews (but there are others, such as written essays). The methods employed in S-data are the most convenient form of gathering information. This is particularly the case today, with the advent of the Internet, where surveys posted online give simple and quick access to a vast number of people.
There are also several methods that do not easily fit into the LOTS categories. For instance, some research employs diary methods, where participants are required to report specific events, or specific behaviors at specific times, or feelings and thoughts. Thus, participants in such studies are asked to keep diaries of their thoughts, feelings, and behaviors, over a set time period. This can sometimes be a highly useful method because it can provide data that may otherwise be difficult to obtain. Indeed, this was brilliantly demonstrated by studies conducted by Fleeson and Gallagher (2009), discussed in Chapter 1, which provided clear information about behavior variation across situations and time.
There are many other methods available, of course, to assess someone’s personality. As you review these options, however, you are also likely to recognize that each has shortcomings. For instance, life records (L-data) are not always easy to obtain and can be biased, or even inaccurate. Asking people direct questions either through interviews or self-reports (S-data) may be subject to impression management or socially desirable answers (i.e., lying). Asking others (O-data) may seem to remedy the latter concern, but others may not actually know the person very well. In any case, it certainly cannot eliminate the possibility that they are also distorting their ratings (e.g., because they are best friends with the person being rated). Observational methods may be subject to common cognitive biases in human observers. Finally, experiments may not generalize to the outside world.
As you can see, none of these measures is perfect. Each individual data source, and each method within it, has its own flaws. However, despite this, it is also easy to see the benefits of each method. Clearly, the information gathered from these sources will not be totally useless. It will surely tell you something about the personality of a person. In addition, there is nothing to say that a researcher needs to be confined to one data source. In principle, they can combine as many of these methods as they wish. Indeed, it is desirable, and advisable, that multiple sources are used when obtaining information about people. In this way, one can be more confident about one’s conclusions, particularly if the different methods end up providing very similar profiles of the same person, that is, if they paint a similar picture.
Nevertheless, it is not always easy to expose people to hours or days of research. Personality psychologists have therefore often had to resort to fewer, and commonly, single sources of data. A critical task for a researcher therefore is to determine which assessment method to choose. In order to do this, of course, the researcher must know which assessment method assesses personality most accurately. There are several scientific methods to establish this and researchers today generally agree on which tests are most accurate. Nevertheless, not all well-established methods are suitable for all researchers. That is, there is no universal agreement. One major reason for this is that judging which method is best is not always a question of research data, but also a matter of theoretical perspective. Thus, psychologists may employ different assessment methods because their research often rests on different theoretical ideas about what personality is, what its underlying causes are, and how it is expressed. As mentioned before, a psychoanalyst would rarely employ self-reports simply because the theory asserts that many of the causes of a person’s thoughts, feelings, and behavior are not available to the person on a conscious level.
While there are different schools of thought, however, today the most commonly used source of data in personality psychology is self-reports. This may strike you as surprising. After all, self-reports, on the surface, appear to have several limitations (some of which have already been mentioned). Before you are tempted to draw any conclusions, however, let us remind you that scientific research on personality testing spans almost 100 years (even if efforts to assess personality have an even longer history that predates psychology; Boyle, 2008). Accordingly, the methods found today have gone through nearly a century of research and scientific evaluation. Therefore, the current inclination of researchers to use self-report does not reflect personal preferences, but rather years of empirical data.
For you to better understand the current status of personality assessment, it is perhaps useful to look at its evolution, that is, the different phases and stages it has passed through to arrive at its current position.
A BRIEF HISTORICAL PERSPECTIVE OF PERSONALITY ASSESSMENT
Interestingly, the earliest personality measures were in the form of questionnaires not unlike those found in psychological research and practice today. However, as with intelligence measurement, the impetus for the development of personality tests came from the world of practical affairs. Indeed, a major reason for this was the success of early standardized intelligence tests. Alfred Binet had successfully introduced tests of intelligence (in 1904), which were used to classify children according to their ability, first in France and later in the United States. During World War I, adjusted adult versions of these tests were brought into the U.S. Army, to aid in the selection of recruits (in terms of their cognitive ability) for military service. Soon after, however, a need was also recognized for tests that would identify recruits who were prone to psychological instability. To ensure army recruits were also emotionally healthy, therefore, the first standardized personality test was devised. It consisted of questions that dealt with various symptoms or problem areas (for instance, whether a person had frequent daydreams, or wet their bed, etc.), to which “yes–no” responses could be made (Woodworth, 1919).
The measures were a success in the army. As a result, they quickly spurred interest in the application of personality assessment in other domains also. The most noteworthy of these was within clinical practices, where some researchers aspired to design a test that would provide an “objective” basis for psychiatric diagnosis. That is, they wanted to develop a test with items (i.e., test questions) that could distinguish between groups known to have different psychiatric disorders, as well as patients from people in general. The best-known test of this sort was the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1940), which appeared in 1940. Similar to the tests used in the army, the MMPI was a self-report inventory consisting of true–false statements. The achievement and popularity of the MMPI were enormous and still remain today (in fact, with its revised version [MMPI-2], the MMPI is thought to be the most widely used personality inventory in history; Boyle et al., 2008).
Given the practical usefulness of the MMPI, similar tests with nonclinical populations soon followed. The aim of researchers again was to distinguish between groups, but this time on the basis of nonclinical personality characteristics. One of the best known of these was the California Psychological Inventory (CPI; Gough, 1957). As with the MMPI, the CPI was designed with practical purposes in mind. Specifically, it was aimed at high school and college students. The CPI tested for various personality traits such as dominance, sociability, tolerance, and so on and was useful for categorizing people into different groups, for instance, dominant versus submissive pupils. As with the MMPI, the CPI was (and still is) a highly popular inventory in applied settings (even if the academic community has largely dismissed the test on empirical grounds). This is one reason why it has long been known as “the sane man’s MMPI” (Thorndike, 1959).
An interesting historical observation is that around the time these self-report measures were being constructed, other personality tests that differed substantially in their methodological and theoretical foundations were also appearing. These included the now infamous Rorschach Inkblot Test (Rorschach, 1921), the Thematic Apperception Test (Murray, 1943), and the Objective Analytic Test Battery (Cattell, 1950). These tests contrasted radically with self-report inventories, such as the MMPI and the CPI. While self-reports clearly relied on people’s subjective evaluations of themselves (as well as their honesty and self-knowledge), these latter tests were designed to eliminate the subjective element in testing. The notion here was to design tests that would actually measure a person’s personality rather than asking him or her about it. Thus, akin to intelligence tests, advocates of this line of research wanted to design objective measures of personality. This division between subjective and objective measurement of personality has a long-standing history, and tension between advocates of each remains even today. Nevertheless, the desire to create convincing objective measures of personality traits is no doubt shared by all researchers in the field (even the skeptics of such measurement).
Perhaps deriving from this tradition, more recent assessments of personality have increasingly included objective measures. This “renaissance” of objective measures has no doubt come as a result of technological improvements in various areas such as software and computers, medical equipment, and experimental settings. For instance, psychologists are now able to, and increasingly do, use physiological measures to assess personality, including examining genetic, biological, and neurological markers (see Chapter 2 section on Genes and Personality for more details). Positron emission tomography (PET) scans, functional magnetic resonance imaging (fMRI) scans, and electroencephalography (EEG) measures are becoming increasingly popular in psychological research on personality.
Nonetheless, as mentioned before, today self-reports remain at the forefront of personality assessment. With so many other options, you may be wondering why that is. As mentioned above, however, the choice of self-reports is not a matter of personal preference but rather reflects the simple fact that this method has met the scientific criteria better than other methods. The criteria in question are the cornerstone of personality assessment; they are the means by which personality psychologists determine whether they have assessed personality accurately or not (or rather how accurately). Accordingly, the next section will be devoted to their nature and description.
HOW CAN WE DETERMINE WHETHER PERSONALITY IS ASSESSED ACCURATELY?
We mentioned earlier that a fundamental aim, or one might even say obligation, for personality psychologists is to determine whether they, through the various methods mentioned above, are actually able to capture people’s personalities. That is, they need to determine and demonstrate that they are assessing personality accurately. This is, of course, an empirical question. It is not down to your or our opinion about whether a certain data source or assessment method seems to be more suitable than another for assessing personality. There are ways to demonstrate this scientifically. The criteria that any personality test (or indeed any other psychometric test) needs to meet to be considered useful, or fit for purpose, are reliability and validity. A good test of personality is one that is both highly reliable and valid. Given their importance for psychological research of personality, we will discuss the meaning of these terms in detail below.
Reliability
Suppose you went on a weighing scale in the morning and found that you were 165 pounds (or around 75 kg). Imagine that you did the same the morning after and found that you were now 130 pounds (or around 60 kg). The following morning the scale seems to suggest that you are now 200 pounds (or around 90 kg). You would obviously throw the scale away. It is clearly unreliable. At least two of these values have to be wrong, which means you cannot take any of them at face value. Even if the scale got it right on one of the occasions, you would still not find it fit for purpose. This is because it is inconsistent: it does not consistently show the same values (which it, of course, should).
In psychometric testing, similarly, reliability refers to the consistency of a measure. It refers to the extent to which observations are dependable. This consistency (or reliability) can refer to three aspects of the test: the test’s internal consistency, test–retest reliability, and interrater reliability. Should a test be lacking in (or score low on) one or more of these components, it would be considered to be unreliable. The extent to which the measure is lacking in one of these components is reflected by the “measurement error,” that is, the amount of error there is in the measure’s ability to capture (or assess) what it is supposed to capture (be it weight, height, or personality). Thus, for a measure to be considered reliable, it has to have high internal consistency, high test–retest reliability, and high interrater reliability (although this latter point can depend on other factors, as you will see below). We will deal with each of these components below.
The first aspect of reliability, internal consistency, refers to how much association there is between the items that are measuring the construct in question. It is essentially the correlations between the items of the test. For instance, imagine we wanted to measure a person’s intelligence and gave the person a five-item measure to complete. Of the first three items, one is on arithmetic, one on vocabulary, and one on picture completion. However, suppose that the last two items consisted of measuring how fast the person runs and how much weight this person can lift. Now, regardless of whether these two items (running speed and power) are part of your concept of intelligence or not, we are likely to find little or no correlation between these and the other three items of the test (which do correlate with one another). In this scenario, we have little internal consistency between the test items. Similarly, in a personality questionnaire that aims to measure, say, ambition, we may have items (or questions) asking about a person’s tendency to persist on tasks, their reactions to competition, or their eagerness to advance. If we then included questions about the person’s tendency to visit art galleries, to daydream, to read scientific magazines, and so on, we would not expect the test to be internally consistent. Many of the items of the test would not be associated with one another. If different parts of the test are measuring different variables, it is hard to see how it could be a good test. Thus, internal consistency is necessary for a test to be reliable and therefore useful.
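The idea that internal consistency is "essentially the correlations between the items" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item scores are invented, the function names are our own, and Cronbach's alpha is a widely used summary index that the passage itself does not name.

```python
from itertools import combinations
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


def mean_inter_item_correlation(items):
    """Average Pearson correlation over every pair of items."""
    return mean(pearson(a, b) for a, b in combinations(items, 2))


def cronbach_alpha(items):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    item_variance = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance / pvariance(totals))


# Hypothetical data: each list is one item, answered on a 1-5 scale
# by the same six respondents (invented for illustration only).
ambition_items = [
    [5, 4, 2, 1, 4, 3],  # persists on tasks
    [5, 5, 2, 1, 3, 3],  # enjoys competition
    [4, 5, 1, 2, 4, 3],  # eager to advance
]
off_topic_item = [2, 4, 4, 2, 1, 5]  # visits art galleries

r_consistent = mean_inter_item_correlation(ambition_items)
r_mixed = mean_inter_item_correlation(ambition_items + [off_topic_item])
alpha_consistent = cronbach_alpha(ambition_items)
alpha_mixed = cronbach_alpha(ambition_items + [off_topic_item])

print(f"mean inter-item r: {r_consistent:.2f} vs {r_mixed:.2f}")
print(f"Cronbach's alpha:  {alpha_consistent:.2f} vs {alpha_mixed:.2f}")
```

With these made-up numbers, adding the art-gallery item lowers both the average inter-item correlation and alpha, which mirrors the ambition example in the text: an item that measures a different variable reduces the internal consistency of the scale.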
The second aspect of reliability, test–retest reliability, refers to the extent to which observations (for instance, scores on a personality scale) can be replicated. It is what we illustrated with the weight scale example. Does the scale give identical readings each time you are on it, or does it give different values? In terms of personality measurement, test–retest reliability refers to the similarity of test scores between different occasions. That is, the correlation between test and retest scores is the index of the test’s reliability. If a group of people, for example, take a test of Extraversion on two different occasions, say, a few weeks apart, it is expected that they should score approximately the same on the second occasion as they did on the first (assuming that there will be no change in the construct being measured, i.e., in people’s actual Extraversion levels). If there was no consistency between the scores, that is, if the test gave different scores to people on these different occasions, then no trust could be placed in the scores. It should be noted that we are referring to test–retest reliability in terms of rank-order consistency (see section on Stability Versus Change for a reminder of rank-order stability). So, a person scoring similarly means that they score similarly relative to others. That is, it is perfectly possible to have high test–retest reliability even when a significant mean-level (or absolute) change has occurred in that particular trait (for instance, test–retest reliability of IQ scores remains high, even if people become more intelligent as they move from childhood to adolescence).
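The distinction between rank-order consistency and mean-level change can be sketched with a small calculation. The scores below are hypothetical, and the Pearson correlation is written out by hand only so that the example is self-contained.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical Extraversion scores for five people on the first occasion.
time1 = [30, 42, 55, 61, 78]

# Retest A: everyone scores 10 points higher, so the rank order is unchanged.
retest_shifted = [score + 10 for score in time1]
# Retest B: the same scores, but attached to different people.
retest_scrambled = [78, 30, 61, 42, 55]

r_shifted = pearson_r(time1, retest_shifted)
r_scrambled = pearson_r(time1, retest_scrambled)
mean_change = sum(retest_shifted) / len(time1) - sum(time1) / len(time1)
print(round(r_shifted, 2), round(r_scrambled, 2), round(mean_change, 1))
```

In retest A the group mean rises by 10 points, yet the test–retest correlation stays at its maximum because everyone keeps the same position relative to everyone else. In retest B the mean is unchanged but the ordering is not, and the correlation collapses.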
The final component of reliability concerns interrater reliability. Interrater reliability refers to the amount of agreement, or consensus, between two or more raters. For instance, if you rated yourself as a very funny person but your partner completely disagreed, interrater reliability would be low. Commonly, interrater reliability involves ratings by others (often referred to as judges). For instance, how much agreement is there between your friends that Brad Pitt or Angelina Jolie is beautiful? If beauty is in the eye of the beholder, of course, there should be low interrater reliability between your friends’ judgments. That is, there would be little consensus on how attractive they found Brad or Angelina. If, on the other hand, they mostly seemed to agree (and a guess would be that they do), we would say that the interrater reliability is high.
This is also the case with regard to personality. In job interviews, a panel of interviewers generally wants to rate you on different personality traits: how ambitious you are, how agreeable you are, how socially skilled you are, and so on. If the interviewers disagreed on how reliable or friendly you are, for instance, there would be little value in this method of assessment. As before, at least one judge would be wrong, which in essence would render all the ratings futile. Thus, interrater reliability is crucial for a personality test to be considered reliable.
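Agreement between two interviewers can be quantified in several ways. As a hedged illustration, the sketch below uses Cohen's kappa, a chance-corrected agreement index that the passage does not itself mention; the interviewer names and the "high"/"low" friendliness judgments for six candidates are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical judgments.

    Undefined (division by zero) if chance agreement is exactly 1,
    i.e. both raters use a single identical category throughout.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if both raters judged independently at their own base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical friendliness judgments for six candidates.
interviewer_1 = ["high", "high", "low", "low", "high", "low"]
interviewer_2 = ["high", "high", "low", "low", "high", "low"]
interviewer_3 = ["high", "low", "high", "low", "low", "high"]

kappa_agree = cohens_kappa(interviewer_1, interviewer_2)
kappa_disagree = cohens_kappa(interviewer_1, interviewer_3)
print(round(kappa_agree, 2), round(kappa_disagree, 2))
```

A value near 1 indicates consensus; a value near or below 0 indicates that the raters agree no more often than chance would produce, which is the situation in which the interview ratings would carry little value.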
It should be noted, however, that there could be value in having low interrater reliability. In some settings, such as 360 multisource feedback, interrater reliability is not necessarily expected. This is because different people may have seen different parts of the ratee’s personality, and thus, the ratings would be considered to be complementary and provide different sources of information (rather than reflecting disagreement).
Sources of Unreliability (“Error”)
Several factors can influence the reliability of a measure. In technical terms, we would say that there are many sources of “error” in measurement. These can include factors internal to the person, such as mood state, motivation, physical state, and so on, or external factors, such as test-taking conditions and the nature of the test itself. It is easy to see how internal factors could influence tests’ reliability. For instance, if a person completes a test on subjective well-being just after he or she won the lottery, they would probably respond differently than they would have just a few hours before (and even some time after the win, as positive effects may wear off with time). Thus, in this situation, we could not rely on the test score of subjective well-being. The test score would be unreliable.
Sometimes error in measurement may result from factors inherent to the test. One source of error is the number of items included in the test. For instance, if you wanted to test whether a person is conscientious, you could simply pose the question “Are you a conscientious person?” (“Yes” or “No”) in a questionnaire. This makes sense; however, you would probably also recognize that there would be several sources of error in this type of questionnaire (even if the person was being honest). A person may respond “Yes” because they see themselves as more conscientious than, well, “not conscientious.” However, in essence, they may still not be very conscientious. The test doesn’t tell us how conscientious they are. Thus, although the test is not wrong, there will be quite a bit of error in the test because it has not been able to capture how conscientious that person actually is.
If we, on the other hand, asked two or three questions (e.g., “Are you conscientious with regard to your university assignments, at work, at home, etc.?”), and perhaps gave them a rating scale from 1 to 5 on which they could indicate their agreement, then we would have a better estimate of how conscientious they are. In essence, the more questions you ask about different behaviors related to Conscientiousness, the fewer sources there would be for error. This is why personality questionnaires usually comprise dozens and sometimes hundreds of questions. However, we should note that the remedy isn’t always simply to include more items to increase reliability. A large number of items may lead to fatigue, boredom, anger, or other possible mood states in test-takers, which, as mentioned before, are themselves sources of error.
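The passage gives no formula for how reliability grows with test length. One standard way to approximate it, brought in here purely as an outside illustration, is the Spearman-Brown formula, which assumes the added items are as good as the existing ones. The starting reliability of 0.5 is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`,
    assuming the new items are comparable to the existing ones."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

short_test = 0.5  # hypothetical reliability of a very short scale
doubled = spearman_brown(short_test, 2)
quadrupled = spearman_brown(short_test, 4)
print(round(doubled, 2), round(quadrupled, 2))
```

The gains shrink as the test gets longer, and the formula ignores the fatigue and boredom effects noted above, so it is best read as an upper bound on what extra items can buy.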
The Practical Importance of Eliminating Error
Error, and thus unreliability, in measurement can have serious implications, particularly in applied settings. It is therefore critical that psychologists and practitioners try to minimize the sources of it. Suppose for instance that we are selecting applicants for a job, based on their scores on Conscientiousness. Here, we inevitably need to decide a cut-off score for selection versus rejection. Imagine that the average Conscientiousness score for all candidates is 50 and a first candidate scores 75, whereas a second scores 55. Should we hire the first candidate based on this score? Initially, this would certainly seem like a reasonable decision. However, crucially, the decision would also be dependent on the reliability of the scale that we are using. If we have a scale that is highly reliable and people score the same each time they complete it (so the first candidate would score 75 and the second 55 on every occasion), then this decision would be correct. However, if we have a test with low reliability and people score rather differently on different occasions (for instance, if there is a reasonable chance that the scores reverse, i.e., the first candidate scores 55 and the second 75, on a second occasion), then this decision would be incorrect (because we do not know what the candidates’ “true” Conscientiousness levels are).
As you can tell, this issue is a worrying one. If you are applying for a job, you would certainly not want to be the one having an “unlucky” test day. Such a revelation may also evoke serious skepticism in you toward psychometric testing in general. However, you do not need to draw any such conclusions. Psychometricians have found a statistical way of dealing with this issue. In plain language, to be able to make a selection decision based on psychometric scores, we need to know the standard variability of the scores, that is, how much disparity we can expect in individuals’ test scores in general, or, on average. This is what psychologists call the standard error of measurement. We do not need to dwell on the statistics for calculating the standard error. Instead, we will give you a simple explanation. First, we know that in a normal (bell-curve) distribution, about 68% of all scores fall within ±1 standard deviation of the mean (the standard deviation being, roughly, the typical departure of scores from the mean). About 95% of scores fall within ±2 standard deviations of the mean (note that we can generalize this observation because most, if not all, human traits show a distribution that looks like a bell curve). For instance, IQ tests have a mean score of 100 and a standard deviation of 15. This means that about 68% of all people score between 85 and 115 and about 95% of all people score between 70 and 130.
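The 68% and 95% figures for IQ can be checked directly against a normal distribution with mean 100 and standard deviation 15. The short sketch below uses Python's standard-library `statistics.NormalDist`; it assumes IQ scores are exactly normal, which is an idealization.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Share of people expected between 85 and 115 (within 1 standard deviation).
within_1_sd = iq.cdf(115) - iq.cdf(85)
# Share of people expected between 70 and 130 (within 2 standard deviations).
within_2_sd = iq.cdf(130) - iq.cdf(70)

print(f"{within_1_sd:.1%} {within_2_sd:.1%}")
```

The two shares come out at roughly 68% and 95%, which is why those round figures are used as rules of thumb.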
We can then apply this same formula to our example above. Imagine that our test of Conscientiousness has a standard error of 5. This would mean that there is a 68% chance that a candidate’s score will fall within ±5 points of their current score, and a 95% chance that it will fall within ±10 points of their current score on another occasion. Thus, in the above scenario, a standard error of 5 would mean that the first candidate’s score would, 68% of the time, fall between 70 and 80, and 95% of the time between 65 and 85. The second candidate’s score would, 68% of the time, fall between 50 and 60, and 95% of the time between 45 and 65. Thus, as you can see, if we know that the standard error in the sample is 5, and we know that the first candidate scored 75, and the second 55, we can be confident that there is about a 95% chance that our decision to select candidate 1 would be correct. This is a pretty good bet.
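The arithmetic in this example can be reproduced in a few lines. The scores (75 and 55) and the standard error (5) come from the text; the function names are hypothetical helpers introduced only for this sketch.

```python
def score_band(observed, standard_error, n_errors):
    """Range of scores within `n_errors` standard errors of the observed score."""
    return (observed - n_errors * standard_error,
            observed + n_errors * standard_error)

def bands_overlap(band_a, band_b):
    """True if the two ranges share more than a single end point."""
    return band_a[0] < band_b[1] and band_b[0] < band_a[1]

STANDARD_ERROR = 5
candidate_1, candidate_2 = 75, 55

band_68_first = score_band(candidate_1, STANDARD_ERROR, 1)   # about 68% of the time
band_95_first = score_band(candidate_1, STANDARD_ERROR, 2)   # about 95% of the time
band_68_second = score_band(candidate_2, STANDARD_ERROR, 1)
band_95_second = score_band(candidate_2, STANDARD_ERROR, 2)

print(band_68_first, band_95_first)
print(band_68_second, band_95_second)
print(bands_overlap(band_95_first, band_95_second))
```

The two 95% ranges meet only at 65 and do not otherwise overlap, which is the basis for treating the choice of candidate 1 as a good bet.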
The good news is that standard error can be calculated fairly easily. Interpretations of scores (and selection decisions) can, subsequently, be made by taking this formula into account. Once calculated, an employer can decide that differences between two candidates’ scores would have to be beyond the specific error boundaries to be considered enough to make a selection decision. This would certainly minimize the probabilities of getting it wrong. The takeaway message from this example is, however, that for all practical testing, it is essential that the test be highly reliable. A test with low reliability can be highly detrimental in practical use and would, in general, be considered to give a poor indication of a person’s real, or true, personality.
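The text does not spell out the calculation. A common textbook formula, offered here as an assumption rather than as the authors' own method, is the standard deviation of the test scores multiplied by the square root of one minus the reliability. The standard deviation of 10 and reliability of 0.75 below are hypothetical values, chosen so that the result matches the standard error of 5 used in the example.

```python
from math import sqrt

def standard_error_of_measurement(score_sd, reliability):
    """Standard error of measurement from the score standard deviation
    and a reliability coefficient between 0 and 1."""
    return score_sd * sqrt(1 - reliability)

sem_example = standard_error_of_measurement(score_sd=10, reliability=0.75)
sem_perfect = standard_error_of_measurement(score_sd=10, reliability=1.0)
print(sem_example, sem_perfect)
```

The formula makes the link between the two ideas explicit: as reliability approaches 1 the standard error shrinks toward 0, and as reliability falls the error band around any single score widens.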
We should finally note that even a highly reliable measure can sometimes be of little use. This is because very high levels of reliability may indicate that we are measuring rather narrow and psychologically trivial variables. For instance, it is easy to increase the internal consistency of a measure by asking an essentially paraphrased version of the same question over and over. This may give us very little information about the overall trait we are trying to assess. Thus, very high internal consistency may simply indicate that the test is so specific that it is of little psychological interest (Cattell & Kline, 1977).
In addition to being consistent, therefore, we need to determine whether the test in question is, in fact, also a useful one. This is where the next criterion for establishing the quality of a test comes in, namely, validity.
Validity
Validity essentially refers to the usefulness of the test. It assesses whether the test measures what it claims to measure. As mentioned above, the main task of personality psychologists is to demonstrate that the assessment methods they use are, in fact, measuring specific personality traits, and that they are accurately doing so. These concerns are evaluated by looking at tests’ validity.
To illustrate, suppose that a woman wanted to investigate whether she was pregnant or not, but used a thermometer to do so. Now, this measure could be perfectly reliable, but it would no doubt be useless. It measures some variables consistently but those variables are not related to pregnancy; it is an invalid measure of pregnancy. Thus, although reliability is an essential component of a test, validity is arguably even more important (although, of course, if a test is not reliable, or consistent, it is difficult to see how it could be valid).
Given this definition of validity, however, you may be wondering how one could actually demonstrate what a personality measure is measuring. It is easy to validate a thermometer or a pregnancy test. We can feel as the temperature gets higher and lower; if a thermometer doesn’t respond to these temperature changes, it is invalid. We can see whether a woman is pregnant or not (well, we can tell for certain after birth); if the pregnancy test repeatedly tells you something different from what you observe, it is invalid.
However, how do you demonstrate whether a personality test measures what it claims to measure? This is an important psychometric issue, since it is often difficult to even agree on what a psychological construct is in the first place (think of the notorious difficulty in agreeing on what intelligence is). Furthermore, things in psychology are not as observable as things in physics or chemistry. This difficulty has led psychologists to develop a number of different ways for assessing validity.
A very basic, and obvious, way to determine whether a measure is valid is to examine its face validity, that is, to simply check whether it looks valid. For instance, in personality questionnaires, this would involve verifying that the questions look relevant. As the old proverb goes: “If it walks like a duck, quacks like a duck, and looks like a duck, it must be a duck.” Of course, face validity has no inherent connection to “true” validity. You may have a toy thermometer that looks exactly like a real thermometer, but is useless for indicating temperature. Face validity is good, insofar as it increases the probability that test-takers will take the test seriously (well, also that they will take it in the first place). However, very high face validity can also mean too obvious a test, or, a test easier to fake.
A related type of validity is content validity. This refers to whether the test measures all aspects of the construct being measured. For instance, a test of intelligence shouldn’t just measure mathematical ability. Content validity can be examined, for instance, by having experts in the field (of, say, intelligence) inspect whether the test seems to represent a comprehensive measure of the construct. If the test doesn’t cover all domains of a construct, it cannot be valid. Although mathematical ability is one aspect of intelligence, few would equate intelligence with merely a talent for mathematics. We should note, however, that with psychological constructs, no expert can tell us that a test is, in fact, valid just by looking at it, even if the content looks valid.
A more rigorous, and arguably more scientific, way of assessing validity is by looking at tests’ concurrent validity. Concurrent validity can be empirically demonstrated by showing that the measure correlates highly with other measures of the same construct. For instance, if you have designed a new measure of impulsivity, you want to make sure that this measure is related to other already empirically established measures of impulsivity. If it does not correlate with any of these well-established measures, it is difficult to argue that this is in fact a measure of impulsivity. If there are substantial correlations between the new test and other tests of impulsivity, then it could be said that there is convincing evidence of validity. Of course, although concurrent validity involves statistical evidence for the validity of a new measure, there is an obvious question: If there already is an established measure for a construct, then why design a new one? The question is a fair one. However, one reason for new tests can be that they are simply better able to predict relevant outcomes.
In essence, quite separate from face, content, and concurrent validity, a key question relating to all psychometric tests is as follows: Does it provide practically useful information? A test may appear good, seem to encompass all the right components of a construct, and correlate well with other measures assessing the construct. However, if it does not yield information that enables psychologists to predict criteria that the construct should predict, then of what use is it? For a test to be considered worthy of attention, it needs to be associated with certain external criteria or real-life outcomes. For instance, if we have a measure of trait anxiety, we should, in essence, be able to use it to distinguish between people with anxiety disorders and people suffering from depression. If we have a measure of Agreeableness, we should be able to predict who will be less likely to have conflicts at university or work. This is called criterion-related or predictive validity. If we cannot predict outcomes, then clearly these tests do not capture the variables that we want to capture, and they serve no real purpose. If we, on the other hand, can distinguish between groups based on the test, or predict who will act in what way, then the test would be of great practical value.
There are two other aspects of validity. These relate to the practical usefulness of a test beyond other available tests. If, for instance, we develop a new test of, say, emotional intelligence (EI), we would want (or need) it to yield information that is different from what we can obtain from existing ability or personality tests. This is called discriminant validity; it assesses whether a test measures a construct that is actually distinct from existing constructs. If the test has no discriminant validity (i.e., its correlations with existing tests are very high), then we would have to conclude that the test either does not capture the distinct construct in question, or that this construct does not exist.
In addition to discriminant validity, we would also want the test to be able to predict external criteria beyond existing tests. To keep with the example above, if we, for instance, gave job applicants a test of intelligence (IQ test) and a test of personality (say the Big Five), would it matter if we also give them an EI measure? That is, will the EI measure give us information about how the person will perform at the job, beyond what is already obtained by the IQ and personality measures? This latter type of validity is referred to as incremental validity and is, as we will see in the next chapter, of vital importance in applied contexts, and particularly within occupational psychology.
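Incremental validity is often expressed as the extra share of variance in the outcome that a new test explains over an existing one. The sketch below is a simplified illustration under stated assumptions: it uses a single existing predictor (IQ) rather than IQ plus the Big Five, it relies on the standard formula for the squared multiple correlation with two predictors, and all correlation values are hypothetical.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """Share of outcome variance explained jointly by two predictors,
    given their correlations with the outcome (r_y1, r_y2) and with
    each other (r_12). Requires abs(r_12) < 1."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

def incremental_validity(r_y_old, r_y_new, r_old_new):
    """Extra variance explained by adding the new test to the old one."""
    return r_squared_two_predictors(r_y_old, r_y_new, r_old_new) - r_y_old**2

# Hypothetical correlations: IQ with job performance = 0.5, EI with job performance = 0.3.
# Case 1: EI is only weakly related to IQ (0.2).
gain_distinct = incremental_validity(0.5, 0.3, 0.2)
# Case 2: EI overlaps strongly with IQ (0.6).
gain_redundant = incremental_validity(0.5, 0.3, 0.6)

print(round(gain_distinct, 3), round(gain_redundant, 3))
```

In the first case the EI measure adds a few percentage points of explained variance. In the second, its relation to performance is fully accounted for by its overlap with IQ, so it adds essentially nothing, which also shows why discriminant and incremental validity are closely linked.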
It should be clear from the above that various types of validity criteria are needed to establish whether a personality measure is accurately assessing the construct it is attempting to assess. Any one of these methods individually will not do. Thus, Cronbach and Meehl (1955) suggested an inclusive way of measuring tests’ validity, namely, through construct validity. Construct validity refers to the extent to which results obtained from the tests fit into a theoretical network; it is the combination of all the findings. For instance, to establish the construct validity of a self-report measure of general anxiety, we would correlate it not only with individuals’ feelings of anxiousness before a test, but also with a range of other criteria, such as the likelihood of receiving treatment for an anxiety disorder, the likelihood of being stressed in dangerous situations, physiological indices of anxious arousal, and so on. We would also have to show that the measure is related to other self-report measures of anxiety, is not correlated with tests not measuring anxiety, and is not correlated with ability tests (such as IQ). If all these criteria have been satisfied, however, we would be confident that we are indeed measuring anxiety, and we are accurately doing so.
As you can see, there is a highly rigorous and scientific process involved in demonstrating that personality traits (or indeed any other psychological traits) can be measured, are measured, and are measured accurately. This process is derived from scientific methodology. If we want to have confidence in personality tests, and make practical judgments based on these tests, we need to first establish that they show good psychometric properties. If a test shows good reliability and validity, we can be confident that we are indeed able to measure the personality trait in question, and that the test is practically valuable. Thus, although personality may be a complex, unobservable, and even chaotic, phenomenon, to the extent that tests show high reliability and validity, they can be said to be measuring personality with good accuracy.
Of course, so far we have only specified that a test would be considered valuable and an accurate estimate of a personality trait (or traits) if it shows good reliability and validity. We have not actually shown that assessment methods are, in fact, reliable and valid, nor have we specified the types of measures that may be more or less valid in assessing personality. Many of the criterion-related validity estimates, that is, what personality predicts, will be discussed with reference to self-report inventories in the next chapter. In the rest of this chapter, we will focus on describing in detail the different kinds of assessment methods in personality research, and critically evaluating the strengths and weaknesses of each. This will, in essence, also give you a better idea as to why personality psychologists so often prefer to employ self-report inventories.
TYPES OF PERSONALITY ASSESSMENT
In our example above, in which we asked you to consider methods for assessing the personality of the world’s richest man, we illustrated the numerous options that are potentially available for personality psychologists. We also noted that the choice of personality assessment is guided not only by common sense or preference, but also by theoretical and statistical considerations. Thus, psychologists may employ different assessment methods depending on their theoretical perspective of what personality is, what its underlying causes are, and how it is expressed. Here, we will describe the various methods that personality psychologists have tried and tested throughout the years. Some of these methods have only recently emerged; many others have been used from the very early days of personality assessment.
Traditionally, it has been useful to divide the various assessment methods into three different categories: projective methods, self-report methods, and objective methods; however, as will be seen, not all methods fit into one of these categories, and we will review several other methods of assessment beyond this classification. Below, we will start by describing some of the more intriguing measures for assessing personality, namely, projective tests.
Projective Tests
If you have seen a Hollywood movie featuring psychological testing, chances are that it was the Rorschach Inkblot Test. The Rorschach Inkblot Test is one of the most well-known psychological tests, and indeed, one of the few (if not the only) personality tests known to the general public. It is from the family of projective tests, which at their core are designed to obtain information about a person’s innermost thoughts and feelings that he or she may or may not be consciously aware of. They do so by presenting people with ambiguous stimuli, or open-ended and unstructured tasks, such as completing unfinished sentences, interpreting events presented in pictures, or describing what they see in inkblots. They are referred to as “projective tests” because they require participants to project their feelings, thoughts, and perceptions in their interpretations of these stimuli. The assumption is that when people are presented with an ambiguous stimulus, whose meaning is not clear, differences in interpretations must reflect differences in inner psychological properties.
These tests are based on a theoretical perspective quite opposite to that of self-reports. Unlike self-report assessment, which in essence must assume that people know themselves and their own personalities (hence, why they can report it), projective tests assume that people are often not conscious of, or otherwise not willing to admit, many of their underlying psychological tendencies. They would (evidently) therefore not be able to report these to interviewers or on questionnaires. Accordingly, the only way to access these parts is through techniques that require projection. There are numerous projective techniques; however, given space limitations, we will only review the most widely used (and the most popular) of these: the Rorschach Inkblot Test (or simply Rorschach; Rorschach, 1921) and the Thematic Apperception Test (TAT; Murray, 1943).
The Rorschach Inkblot Test
The Rorschach was designed by Swiss psychiatrist Hermann Rorschach (1884–1922) and consists of 10 symmetrical inkblots presented on cards. These come in black and white, or colored. Respondents are asked to report what they see, and subsequently (after all 10 blots have been presented) are asked which parts of the blot they used and which attributes were most important.
The most important part of the test from a methodological perspective is the scoring. Numerous different scoring systems have been developed, focusing on a large number of elements of the response. These include the portion of the blot that is used (e.g., the whole blot or just parts), the specific attributes of the blot used (e.g., the color), and the content of the response (e.g., images of death, sex, animals, etc.). Some systems may, for instance, interpret (score) using parts as opposed to the whole blot as indicating rigidity and compulsiveness. Using the colored parts of the blot may indicate emotionality and impulsivity. Scoring some content, such as seeing sex organs or people engaging in sexual acts, may be obvious; however, others are inferred. For instance, seeing a wild animal may be interpreted (scored) as reflecting an examinee’s conflict with his or her father.
While fascinating in its appearance, several problems have been identified with the Rorschach (some of which are common to all projective tests). An obvious problem is that scoring often requires subjective judgment, or interpretation, by the examiner. This means that different examiners may interpret the same responses very differently. Second, there are major inconsistencies among all the various scoring systems (Kline, 1992), which complicates the issue of interrater reliability even further. Finally, critics have pointed out that there is no psychological theory that justifies the assumptions made in the Rorschach test, or indeed any other projective method (Kline, 2000; we will consider this issue further when we evaluate the TAT).
These criticisms directed at the Rorschach have been empirically confirmed. Research has found little or no relation between individual Rorschach indices and external criteria. Indeed, in a seminal review of the validity of this test, Hunsley and Bailey (1999) noted that it is “the most reviled of all psychological assessment instruments” (p. 266). Furthermore, even when specific scoring systems have been able to improve the reliability and validity of the Rorschach (e.g., Exner, 1986), they have not done so beyond quicker and cheaper alternatives (such as self-reports), making it difficult to justify their use.
Thematic Apperception Test
The other major projective test is the TAT, developed by Murray (1938). Like the Rorschach, TAT involves the presentation of cards to participants. However, rather than inkblots, TAT consists of pictures depicting human beings interacting in various scenes. The scenes are ambiguous and open to interpretation (for instance, a man with a firm grip of a woman’s arm), but they are less ambiguous than mere inkblots. The examinee taking the TAT is then asked to describe what is happening, what the characters are thinking and feeling, what may have led up to the scene, and what the outcome will be. The responses are interpreted and analyzed for recurrent themes that supposedly reflect important aspects of a respondent’s personality, motives, and conflicts. For instance, an image depicting a man having a firm grip of a woman’s arm may be interpreted as a scene of violence or a scene of passionate love. From participants’ responses, different personal characteristics, such as power motivation, achievement aspiration, or affiliation orientation, can be extracted. What is extracted will depend on the nature of the response: for instance, the type of relationship depicted, motives and feelings attributed to characters, assumed causes of event, expectations of positive/negative ending to story, and so on.
Interpretations made by psychologists can often be appealing, intriguing, and even make complete sense (more so than with the Rorschach, because the latter does not provide the obvious reference points that the TAT images offer). However, as with the Rorschach, a major problem with these interpretations is their subjective nature. Given that there is no standard way of interpreting responses (several scoring systems are available, but these do not eliminate the subjectivity of interpretations), different raters may often interpret responses differently. Given that at least one of these interpretations will be wrong, we cannot attribute credibility to any one of them. This also makes it difficult to determine whether the personality and motivational facets suggested by these interpretations, and by the TAT, are actually real.
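To make the problem of rater disagreement concrete, the following is a minimal sketch of one common way to quantify agreement between two raters, Cohen's kappa, which corrects raw agreement for the agreement expected by chance. Kappa is not named in this chapter; it is offered here only as an illustration. The theme labels and both raters' codings are hypothetical and are not taken from any TAT scoring system.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of responses")
    n = len(rater_a)
    # Proportion of responses on which the two raters chose the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater assigned codes at their own base rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical theme codes assigned by two raters to the same six stories.
rater_1 = ["power", "achievement", "affiliation", "power", "achievement", "affiliation"]
rater_2 = ["power", "affiliation", "affiliation", "achievement", "achievement", "power"]

kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on half of the stories, yet kappa is much lower than 0.5 because a third of that agreement would be expected by chance alone. Values near 1 indicate strong agreement and values near 0 indicate agreement no better than chance.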
As with the Rorschach, the TAT has not fared well in studies assessing its reliability and validity. Although some studies have found the TAT to be useful for identifying some personality and motivational characteristics (e.g., Atkinson, 1958), correlations with several relevant motives as well as external criteria have often been poor. For instance, one key motive measured by the TAT is Need for Achievement (nAch), a variable strongly associated with entrepreneurial activity (McClelland, 1961; Rauch & Frese, 2007). However, studies have found little correlation between the TAT and objective or psychometric measures of nAch. Furthermore, while psychometric measures of nAch show significant associations with entrepreneurial activity (Rauch & Frese, 2007), the TAT has no validity at all in the prediction of this criterion (Hansemark, 2000).
The number and variety of projective techniques are remarkable (cf. Lilienfeld, Wood, & Garb, 2000). Overall, however, it is clear that evidence for the reliability and validity of these tests is scarce. Although the tests show some validity for limited purposes, when systematic research has been carried out, the results have been disappointing. In a recent review of the validity of all projective techniques, Lilienfeld et al. (2000) reached a damning conclusion. They wrote:
We conclude that there is empirical support for the validity of a small number of indexes derived from the Rorschach and TAT. However, the substantial majority of Rorschach and TAT indexes are not empirically supported. The validity evidence of human figure drawings is even more limited. With a few exceptions, projective indexes have not consistently demonstrated incremental validity above and beyond other psychometric data. (p. 27)
Given this review, and given the ethical implications of relying in applied settings on projective indexes that are not well validated, it is clear that projective techniques do not meet the criteria for personality assessment we discussed above. That is, they are neither reliable nor valid methods for assessing personality; they do not capture personality accurately. Some researchers point out that this finding is not surprising. For instance, according to Kline (1993), projective techniques are based on thin theoretical grounds. While these tests share the psychoanalytic notion of projection of unconscious and inaccessible motives and drives, the use of projection (e.g., the scoring systems) and the assumption that subjects identify with the main characters in images (e.g., project onto inkblots) are quite different from those of psychoanalysis. Furthermore, there is no indication in general psychological theory to suggest that the inherent assumptions of projective techniques should be correct (i.e., that unconscious feelings and thoughts should “jump out” when a person is confronted with ambiguous stimuli, such as inkblots, or that a person should identify with characters in drawings). Thus, despite their inherent appeal and continued popularity in some circles of psychological assessment, the lack of reliability, validity, and theoretical rationale behind projective techniques has meant that there is now a rather general skepticism in personality as well as mainstream psychology about their use.
Objective Tests
Apart from their objective of discovering hidden sexual drives or dark repressed motivations, projective tests also appeal to laypeople because they have one important objective element: They attempt to assess personality without asking the person directly for his or her own opinions about it. The desire to assess personality objectively is not in need of any rationale. Anyone interested in assessing personality, be it academics and practitioners applying such assessments in research or practice or examinees being assessed in organizational or clinical settings, would rather that this assessment be done objectively. After all, the best form of knowledge is, as mentioned before, objective knowledge. Objectivity is the essence of science. If personality assessment is to be considered scientific, then objectivity should be at the forefront of this endeavor. This is so obvious that it does not really need mentioning.
Despite this, however, one does feel that this sometimes needs repeating. After all, psychologists’ preference for self-reports in personality assessment is notable. With this in mind, it is necessary to ask why this is so. To address this question, below we will describe and critically evaluate various objective tests and discuss the current state and projected future of this field.
As mentioned before, attempts to find objective measures of personality stretch back to the very beginnings of psychology. Although the efforts of some early pioneers in this nonquestionnaire area are well documented (e.g., Thornton and Thurstone; in Hundleby, Pawlik, & Cattell, 1965), the most important protagonist of this field of research was undoubtedly Raymond Cattell. Cattell believed that self-reports (as well as other sources of data) have several serious limitations: They are susceptible to both voluntary (e.g., faking) and involuntary (little, or biased, introspective knowledge) biases. He argued that it is therefore a vital task for personality psychologists to make progress with objective tests or T-data.
The Objective Analytic Test Batteries
Cattell (1957) defined an objective test as a test that can be objectively scored and in which individuals being measured are unaware of what is being measured. This description of tests reflects Cattell’s concept of personality as the totality of behavior. According to Cattell, personality characteristics should be inferred from what a person does rather than from what the person says. In the 1950s, with this perspective in mind, Cattell and his colleagues launched a research program, assembling a broad sample of nonquestionnaire behaviors that could be used in tests as indicative of personality traits. Once these behaviors were composed and tested, the researchers attempted to elucidate by factor analysis the number of objectively derived personality dimensions. By the mid-1960s, Cattell had listed over 500 objective tests comprising more than 2,000 personality variables (Cattell & Warburton, 1967). These were later combined into a battery (collection) of tests known as the Objective Analytic Test Batteries (Cattell & Schuerger, 1978).
It is, of course, beyond the scope of this chapter to describe even a small percentage of these tests. Suffice it to say that advocates of such techniques attempted to derive certain personality characteristics based on how a person reacts to, or responds in, these tasks. One example is Thornton’s early attempt to measure, objectively, individual differences in persistence. To do this, Thornton (as reported by Hundleby et al., 1965), in addition to self-reports, measured factors such as amount of time spent on an unsolvable perceptual ability test, time spent in breath holding, strength of hand grip, number of familiar words written in a word-building test, and more. This may give you an idea of the variety of tests now classified as part of the Objective Analytic Test Batteries. Other examples of these tests include speed of tapping, speed of reading when asked to read at one’s usual rate, selection of acceptable versus deviating book titles, faster speed of social judgment, larger number of things disliked, number of admissions of minor wrongdoings or frailties, longer estimates of time to do tasks, and amount of movement when seated in a chair.
Although these tests seem to provide an appealing alternative to self-report assessment, their use has been rather limited outside a small group of researchers (mostly colleagues of Cattell or those otherwise influenced by him). Indeed, Cattell’s research program was practically abandoned throughout subsequent decades. Kline (1993) notes that, as a consequence of this, the validity of these tests is virtually unknown. So, why were these tests abandoned? Several reasons have been listed, including low face validity (see section Validity), difficulties in constructing objective tasks, and difficulties in administering these tests. These explanations essentially indicate that the tests included in the Objective Analytic Test Batteries were, quite simply, impractical. There were, nevertheless, other, and perhaps more serious, problems. One major concern, for instance, was the personality structure extracted by these tests. Schuerger (2008) recently noted that “even under the very good conditions under which the [Objective Analytic] Kit research was conducted, the factor structure might be considered well demonstrated for only six of the hypothesized factors” (p. 543). In addition, research has shown that the factors that were extracted, in some cases, loaded on ability tests.
Recent Developments in Objective Measures of Personality
In recent years, there has been an increased interest in alternative methods for objectively assessing personality. One compelling example is the Implicit Association Test (IAT). The IAT is based on recent approaches to the study of “implicit” cognition. According to social cognition research, people have two different systems of information processing, the reflective and the impulsive system (Strack & Deutsch, 2004). While the reflective system is rational and consciously accessible (for instance, consciously reflecting on whether one is extraverted or not when asked), in the impulsive system, information is processed automatically. Specifically, this processing occurs by spread of activation between concepts that are linked in memory. For instance, for you, the concept of “party” may be closely linked, in memory, with the concept of, say, “fun,” and less closely linked with the concept of “anxiety”; the concept of “exam,” on the other hand, may be closely linked with the concept of “anxiety,” and less closely linked with the concept of “fun.” Similarly, people also have personality trait–self-concept linkages in memory, such as, for instance, the concept of “I” being closely linked with the concept of “talkative” and less closely linked with the concept of “shy,” and so on.
The IAT is a computer-based method for assessing these automatic associations that exist between concepts in a person’s memory. The strength (or closeness) of linkages is measured by the time it takes for people to react to associations between specified concepts. For instance, during the test, participants may be asked to associate as fast as possible a target, such as “me” or “others,” with an attribute, such as “outgoing” or “shy,” by pressing a response key. They are exposed to such associations through a series of blocks that alternate in the presentation of the target and attribute (e.g., “me” + “outgoing,” “me” + “shy,” “others” + “outgoing,” “others” + “shy,” etc.). The notion is that some personality attributes will be more strongly associated with the “self” in memory than others (Asendorpf et al., 2002). This should therefore be “detected” in the IAT by observed differences in respondents’ reaction times. If a person, for instance, is quicker in combining “me + outgoing” and “others + shy” than he or she is in combining “me + shy” and “others + outgoing,” then one could conclude that the person’s self-concept is more closely associated with the personality attribute of “outgoing” than “shy.”
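The reaction-time comparison described above can be sketched in a few lines of code. This is a deliberately simplified illustration and not the scoring procedure used in published IAT research: it compares mean latencies between the two pairings and, as an assumption made here, scales the difference by the standard deviation of all trials. The function names and the latencies are hypothetical.

```python
from statistics import mean, stdev


def latency_difference_ms(first_pairing_ms, second_pairing_ms):
    """Mean latency of the second pairing minus mean latency of the first."""
    return mean(second_pairing_ms) - mean(first_pairing_ms)


def scaled_effect(first_pairing_ms, second_pairing_ms):
    """Latency difference divided by the SD of all trials (a simplification)."""
    all_trials = list(first_pairing_ms) + list(second_pairing_ms)
    return latency_difference_ms(first_pairing_ms, second_pairing_ms) / stdev(all_trials)


# Hypothetical reaction times (milliseconds) for one respondent.
me_outgoing_others_shy = [600, 650, 620, 640]   # "me + outgoing" / "others + shy"
me_shy_others_outgoing = [800, 780, 820, 790]   # "me + shy" / "others + outgoing"

raw_diff = latency_difference_ms(me_outgoing_others_shy, me_shy_others_outgoing)
effect = scaled_effect(me_outgoing_others_shy, me_shy_others_outgoing)
print(f"difference = {raw_diff:.1f} ms, scaled effect = {effect:.2f}")
```

With these made-up latencies the respondent is faster when “me” shares a key with “outgoing,” so the difference is positive, which under the logic above would be read as a closer link between the self-concept and “outgoing” than “shy.”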
It is worth noting that IAT concepts presented to participants are not dissimilar to those presented in self-reports. For instance, rather than asking a person how assertive he or she is on a scale from 1 to 5, IATs may present the concepts of “me” and “assertive” and record the person’s reaction time to this combination, compared to, say, the reaction time for the combination “me” and “compliant.” The fundamental difference between these techniques (and in essence what makes one a subjective and the other an objective test) is that the speed at which the IAT tasks are carried out (often milliseconds) does not allow for introspection. This makes reflective, and therefore subjective, judgments (including faking) highly unlikely. One could, therefore, argue that IATs are essentially an objective (or less subject to conscious “intrusion”) form of gathering self-reports. Another way of putting it could be to say that they are like self-reports with a lie detector attached to them. This is, of course, the gold standard for personality assessment. This raises the question of whether there is evidence that IATs are indeed as valid an alternative as the theory on which they are based claims.
First, it should be noted that the application of IATs in personality assessment (and in general) is relatively new (Greenwald et al., 1998). Nevertheless, some recent studies have provided support for the validity of IAT measures. For instance, Steffens and Konig (2006) found that an implicit measure of Conscientiousness predicted conscientious behavior in an experimental task better than the questionnaire measures of the same trait. Interestingly, this study did not find any association between the IAT measure of Conscientiousness and the self-report measure of Conscientiousness. Furthermore, in a recent meta-analysis, Greenwald, Poehlman, Uhlmann, and Banaji (2009) found compelling evidence for the predictive validity of IATs. In particular, IATs outperformed explicit measures (e.g., self-reports) in the prediction of behaviors, or “judgments,” such as stereotyping and prejudice. Such judgments are assumed to be made below conscious awareness; therefore, the validity of IATs to predict them makes theoretical sense. Finally, research has shown that IATs have additive (Schnabel et al., 2006), interactive (McGregor et al., 2005), and double dissociative or “distinct” (i.e., IATs predicting spontaneous behavior, while explicit measures predict controlled behavior; Asendorpf et al., 2002) validity in predicting criteria.
It is clear that implicit measures of personality are intriguing research tools for personality psychologists. They also seem to have practical usefulness in predicting behavior, sometimes beyond explicit measures. Their main appeal, as with other objective measures, is that they seem to tap into the inaccessible parts of personality, that is, aspects of a person’s self-concept of which he or she may not even be aware. As yet, however, they have not been widely researched. Furthermore, there remain several critical points these tests need to address. For instance, while IATs usually meet internal consistency standards, investigations of their test–retest reliability are less encouraging. Furthermore, meta-analysis has demonstrated that IATs have low concurrent validity. For instance, studies find little convergence between supposedly identical constructs measured by IATs and self-reports (Schnabel, Asendorpf, & Greenwald, 2008). IATs also show low association with other implicit measures, such as priming procedures. This indicates that IATs may predict unique variance but are unlikely to capture the broadest aspect of personality. Given that they do show evidence for incremental validity, one can argue that IATs may be complementary to self-reports, though they are unlikely to replace them. For instance, while Greenwald et al.’s meta-analysis found IATs to predict implicit judgments made below the level of consciousness (e.g., stereotyping and prejudice), explicit measures outperformed IATs in domains requiring reflection, such as brand preferences or political candidate preferences. Finally, there are doubts about the explanations of IAT effects. For instance, it remains unclear whether IAT participants respond to the actual (semantic) meaning of the attributes presented or simply to their positive and negative “tone” (e.g., shy = negative, confident = positive; cf. Schnabel et al., 2006).
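Both test–retest reliability and convergence with self-reports are typically expressed as correlation coefficients. The sketch below, offered only as an illustration, computes a Pearson correlation between two administrations of the same measure and between an implicit and an explicit measure. All scores are hypothetical and do not come from any of the studies cited above.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return covariation / (spread_x * spread_y)


# Hypothetical scores for six respondents.
iat_session_1 = [0.42, 0.10, 0.75, 0.31, 0.58, 0.22]  # implicit measure, time 1
iat_session_2 = [0.35, 0.25, 0.50, 0.40, 0.45, 0.15]  # same measure, time 2
self_report = [3.5, 2.0, 4.0, 4.5, 2.5, 3.0]          # explicit 1-to-5 rating

test_retest = pearson_r(iat_session_1, iat_session_2)
convergence = pearson_r(iat_session_1, self_report)
print(f"test-retest r = {test_retest:.2f}, implicit-explicit r = {convergence:.2f}")
```

A high first coefficient would indicate that respondents keep their rank order across sessions, and a low second coefficient would indicate that the implicit and explicit measures are not capturing the same thing.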
Psychophysiological Measures
Other significant steps in objective testing have been made within the field of psychophysiology. Psychophysiological measurement attempts to increase our scientific understanding of the physiological functions that contribute to personality differences. One aim of psychophysiological measurement is to elucidate the biological processes underlying factor-analytically derived dimensions of personality. We have already outlined some of the research in this area in Chapter 2. However, as with other objective tests, a second objective of these measures is to actually eradicate subjectivity and cognitive biases completely. In a sense, they are attempts at measuring personality directly, rather than indirectly through self-reports or investigations of explicit behavior (which is essentially what other objective tests reviewed above attempt). Furedy (2008, p. 295) uses the metaphor “Psychophysiological Window on Personality” to exemplify this field. Given that people have limited control over much of the recorded biological signals in physiological measures, this seems a promising method for achieving the latter objective.
Although many techniques in psychophysiology have emerged only in the past couple of decades, today there are a large number of physiological tests available. Neuroimaging techniques are among the most popular of these; they include MRI, fMRI, PET, and EEG. Personality researchers often have to make informed (and cost-effective) choices among these measures, and the specific neuroimaging technique used in a particular study will often depend on the specific theoretical question. For instance, a researcher interested in how various brain structures relate to individual differences may employ MRI in order