CHAPTER 8
Foundations of Personality Testing
In psychological testing a fundamental distinction often is drawn between ability tests and personality tests. Defined in the broadest sense, ability tests include a plethora of instruments for measuring intelligence, achievement, and aptitude. In the preceding seven chapters we have explored the nature, construction, application, reliability, and validity of ability tests. In the next two chapters we shift the emphasis to personality tests and related matters. Personality tests seek to measure one or more of the following: personality traits, dynamic motivation, symptoms of distress, personal strengths, and attitudinal characteristics. Measures of spirituality, creativity, and emotional intelligence also fall within this realm.
Theories of personality provide an underpinning for the multiplicity of instruments available in the field. For this reason, we begin this chapter with a survey of prominent personality theories. The many ways in which theorists conceptualize personality clearly have influenced the design of personality tests and assessments. This is especially evident with projective techniques such as the Rorschach inkblot method, which emanated from psychoanalytic conceptions of personality. Thus, in Topic 8A, Theories of Personality and Projective Techniques, in addition to the survey of personality theories, we have included an introduction to several instruments based on the turn-of-the-twentieth-century psychoanalytic hypothesis
Topic 8A Theories of Personality and Projective Techniques
Personality: An Overview
Psychoanalytic Theories of Personality
Type Theories of Personality
Phenomenological Theories of Personality
Behavioral and Social Learning Theories
Trait Conceptions of Personality
The Projective Hypothesis
Association Techniques
Completion Techniques
Construction Techniques
Expression Techniques
Case Exhibit 8.1 Projective Tests as Ancillary to the Interview
in personality psychology, which he defines as “those questions that are simple, important, and central to many people’s lives.” He identifies 20 big questions, only a few of which can be addressed through testing and assessment. These questions involve existential matters such as the purpose of life, the nature of personhood, and the difficulties encountered in seeking self-knowledge. His captivating article is a reminder that some vital issues can be approached through the empiricism of psychological research and testing, whereas other crucial matters remain elusive and are amenable mainly to philosophical and phenomenological inquiry.
In addition to understanding personality, psychologists also seek to measure it. Literally hundreds of personality tests are available for this purpose; we will review historically prominent instruments and also discuss some promising new approaches. However, in order that the reader can better comprehend the diversity of instruments and approaches, we begin with a more fundamental question: How is personality best conceptualized? As the reader will discover, in order to measure personality we must first envision what it is we seek to measure. The reader will better appreciate the multiplicity of tests and procedures if we also briefly describe the personality theories that comprise the underpinnings for these instruments.
Psychoanalytic Theories of Personality
Psychoanalysis was the original creation of Sigmund Freud (1856–1939). While it is true that many others have revised and adapted his theories, the changes have been slight in comparison to the substantial foundations that can be traced to this singular genius of the Victorian and early-twentieth-century era. Freud was enormously prolific in his writing and theorizing. We restrict our discussion to just those aspects of psychoanalysis that have influenced psychological testing. In particular, the Rorschach, the Thematic Apperception Test, and most of the projective techniques critiqued in the next topic dictate a psychoanalytic framework for interpretation. Readers who wish a more thorough review of Freud’s contributions can start with the New Introductory
where responses to ambiguous stimuli reveal the innermost, unconscious mental processes of the examinee. The coverage of personality assessment continues in Topic 8B, Self-Report and Behavioral Assessment of Psychopathology, which includes a review of structured tests and procedures, including self-report inventories and behavioral assessment approaches. These time-honored topics of Chapter 8 (theories of personality, projective techniques, and structured personality tests) are followed by the relatively new focus of Chapter 9: the Evaluation of Normality and Individual Strengths.
Personality: An Overview
Although personality is difficult to define, we can distinguish two fundamental features of this vague construct. First, each person is consistent to some extent; we have coherent traits and action patterns that arise repeatedly. Second, each person is distinctive to some extent; behavioral differences exist between individuals. Consider the reactions of three graduate students when their midterm examinations were handed back. Although all three students received nearly identical grades (solid B’s), personal reactions were quite diverse. The first student walked off sullenly and was later overheard to say that a complaint to the departmental administrator was in order. The second student was pleased, stating out loud that a B was, after all, a respectable grade. The third student was disappointed but stoical. He blamed himself for not studying harder.
How are we to understand the different reactions of these three persons, each of whom was responding to an identical stimulus? Psychologists and laypersons alike invoke the concept of personality to make sense out of the behavior and expressed feelings of others. The notion of personality is used to explain behavioral differences between persons (for example, why one complains and another is stoical) and to understand the behavioral consistency within each individual (for example, why the complaining student noted previously was generally sour and dissatisfied).
Why people differ is just one of many key issues in the study of personality. Mayer (2007–8) provides a thoughtful discussion of the big questions
techniques, it is evident from Rorschach’s view that the psychoanalytic conception of the unconscious had a strong influence on testing practices.
The Structure of the Mind
Freud divided the mind into three structures: the id, the ego, and the superego. The id is the obscure and inaccessible part of our personality that Freud likened to “a chaos, a cauldron of seething excitement.” Because the id is entirely unconscious, we must infer its characteristics indirectly by analyzing dreams and symptoms such as anxiety. From such an analysis, Freud concluded that the id is the seat of all instinctual needs such as for food, water, sexual gratification, and avoidance of pain. The id has only one purpose, to obtain immediate satisfaction for these needs in accordance with the pleasure principle. The pleasure principle is the impulsion toward immediate satisfaction without regard for values, good or evil, or morality. The id is also incapable of logic and possesses no concept of time. The chaotic mental processes of the id are, therefore, unaltered by the passage of time, and impressions that have been pushed down into the id “are virtually immortal and are preserved for whole decades as though they had only recently occurred” (Freud, 1933).
If our personality consisted only of an id striving to gratify its instincts without regard for reality, we would soon be annihilated by outside forces. Fortunately, soon after birth, part of the id develops into the ego or conscious self. The purpose of the ego is to mediate between the id and reality. The ego is part of the id and servant to it, but the ego “interpolates between desire and action the procrastinating factor of thought” (Freud, 1933). Thus, the ego is largely conscious and obeys the reality principle; it seeks realistic and safe ways of discharging the instinctual tensions that are constantly pushing forth from the id.
The ego must also contend with the superego, the ethical component of personality that starts to emerge in the first five years of life. The superego is roughly synonymous with conscience and comprises the societal standards of right and wrong that are conveyed to us by our parents. The superego is partly conscious, but a large part of it is unconscious; that is, we are not always aware of its existence or
Lectures on Psychoanalysis (Freud, 1933). Reviews and interpretations of Freud’s theories can be found in Stafford-Clark (1971) and Fisher and Greenberg (1984).
Origins of Psychoanalytic Theory
Freud began his professional career as a neurologist but was soon specializing in the treatment of hysteria, an emotional disorder characterized by histrionic behavior and physical symptoms of psychic origin such as paralysis, blindness, and loss of sensation. With his colleague Joseph Breuer, Freud postulated that the root cause of hysteria was buried memories of traumatic experiences such as childhood sexual molestation. If these memories could be brought forth under hypnosis, a release of emotion called abreaction would take place and the hysterical symptoms would disappear, at least briefly (Studies on Hysteria, Breuer & Freud, 1893–1895).
From these early studies Freud developed a general theory of psychological functioning with the concept of the unconscious as its foundation. He believed that the unconscious was the reservoir of instinctual drives and a storehouse of thoughts and wishes that would be unacceptable to our conscious self. Thus, Freud argued that our most significant personal motivations are largely beyond conscious awareness. The concept of the unconscious was discussed in elaborate detail in his first book (The Interpretation of Dreams, Freud, 1900). Freud believed that dreams portray our unconscious motives in a disguised form. Even a seemingly innocuous dream might actually have a hidden sexual or aggressive meaning, if it is interpreted correctly.
Freud’s concept of the unconscious penetrated the very underpinnings of psychological testing early in the twentieth century. An entire family of projective techniques emerged, including inkblot tests, word association approaches, sentence completion techniques, and storytelling (apperception) techniques (Frank, 1939, 1948). Each of these methods was predicated on the assumption that unconscious motives could be divined from an examinee’s responses to ambiguous and unstructured stimuli. In fact, Rorschach (1921) likened his inkblot test to an X ray of the unconscious mind. Although he patently overstated the power of projective
Of course, because defense mechanisms distort reality, their rigid, excessive application may create more problems than it solves.
Assessment of Defense Mechanisms and Ego Functions
Although Freud introduced the concept of defense mechanisms, it was left to his followers to elucidate these unconscious mental strategies in more detail (Paulhus, Fridhandler, & Hayes, 1997). Vaillant (1971) developed a hierarchy of ego defense mechanisms based on the assumption that some mechanisms are healthier or more adaptive than others. He suggested four broad types, listed here in ascending level of maturity: psychotic, immature, neurotic, mature. Each type includes specific defense mechanisms such as denial, projection, repression, and altruism, described below. Perry and Henry (2004) proposed a similar hierarchy of adaptation in defense mechanisms. They also developed a sophisticated rating scale, which, as we will see, is of value in clinical practice. A hierarchy of types of defense mechanisms (least mature to most mature) is provided in Table 8.1.
Psychotic defense mechanisms are the least healthy because they distort reality to an extreme degree. One example is gross denial of external reality such as the refusal to acknowledge the death of a loved one. Another example is delusional projection, which consists of frank delusions about external reality, usually of a persecutory nature. The second grouping, Acting Out, comprises several forms of maladaptive action such as passive-aggressive behavior (e.g., intentional lateness to aggravate a partner), impulsive behavior designed to reduce tension, and complaining while simultaneously rejecting help.
Borderline defense mechanisms include patterns of behavior often found in persons with a diagnosis of Borderline Personality Disorder (American Psychiatric Association, 2000). The specific mechanisms include splitting, in which the images of others (or self) alternate rapidly from all good to all bad, and projective identification, which is the projection of an unwanted, unrecognized trait (like anger) onto others. Neurotic defense mechanisms, the fourth group, are found to some degree
operation. The function of the superego is to restrict the attempts of the id and ego to obtain gratification. Its main weapon is guilt, which it uses to punish the wrongdoings of the ego and id. Thus, it is not enough for the ego to find a safe and realistic way for the gratification of id strivings. The ego must also choose a morally acceptable outlet, or it will suffer punishment from its overseer, the superego. This explains why we may feel guilty for immoral behavior such as theft even when getting caught is impossible. Another part of the superego is the ego ideal, which consists of our aims and aspirations. The ego measures itself against the ego ideal and strives to fulfill its demands for perfection. If the ego falls too far short of meeting the standards of the ego ideal, a feeling of guilt may result. We commonly interpret this feeling as a sense of inferiority (Freud, 1933).
The Role of Defense Mechanisms
The ego certainly has a difficult task, acting as mediator and servant to three tyrants: id, superego, and external reality. It may seem to the reader that the task would be essentially impossible and that the individual would, therefore, be in a constant state of anxiety. Fortunately, the ego has a set of tools at its disposal to help carry out its work, namely, mental strategies collectively labeled defense mechanisms.
Defense mechanisms come in many varieties, but they all share three characteristics. First, their exclusive purpose is to help the ego reduce anxiety created by the conflicting demands of id, superego, and external reality. In fact, Freud felt that anxiety was a signal telling the ego to invoke one or more defense mechanisms on its own behalf. Defense mechanisms and anxiety are, therefore, complementary concepts in psychoanalytic theory, one existing as a counterforce to the other. The second common feature of defense mechanisms is that they operate unconsciously. Thus, even though defense mechanisms are controlled by the ego, we are not aware of their operation. The third characteristic of defense mechanisms is that they distort inner or outer reality. This property is what makes them capable of reducing anxiety. By allowing the ego to view a challenge from the id, superego, or external reality in a less-threatening manner, defense mechanisms help the ego avoid crippling levels of anxiety.
forms of humor that do not distort reality but that can ease the burden of matters “too terrible to be borne” (Vaillant, 1977). Specific kinds of mature mechanisms include:
Altruism: Vicarious but constructive and gratifying service to others.
Humor: Playful acknowledgment of ideas and feelings without discomfort and without unpleasant effects on others; does not include sarcasm.
Suppression: Conscious or semiconscious decision to postpone paying attention to a conscious conflict or impulse.
Anticipation: Realistic anticipation of or planning for future inner discomfort; for example, realistic anticipation of surgery or separation.
Sublimation: Indirect expression of instinctual wishes without adverse consequences or loss of pleasure; for example, channeling aggression into sports.
An example of humor as a mature defense mechanism would be former president Ronald Reagan’s quip to doctors in 1981 as he entered surgery for a bullet wound from his attempted assassination. He is reported to have said, “I hope you’re all Republicans.”
Perry and colleagues developed the Defense Mechanism Rating Scales (DMRS) as a basis for assessing the level, type, and severity of defense mechanisms encountered in psychotherapy patients (Perry, 1990; Perry & Henry, 2004). The DMRS was devised for rating the presence of 30 discrete defense mechanisms (e.g., acting out, splitting, denial, projection, repression, intellectualization, altruism) in a 50-minute dynamically oriented interview. In the original scale, a 3-point qualitative rating of absent, probably present, or definitely present was obtained for each defense mechanism identified in a review of a videotaped session.
Subsequently, the test developers adopted a simple quantitative scoring approach in which defense mechanisms were isolated and identified in short, meaningful segments of the taped interview. They found that a typical therapy session includes anywhere from 15 to 75 illustrations of the various
in most persons and include repression (inexplicable memory lapses or failure to acknowledge information, such as “forgetting” a dental appointment) and displacement, which comprises the transfer of feelings from the real object onto someone or something else, such as kicking the dog when angry with the boss.
Obsessive defense mechanisms also are very common and consist of mental patterns like isolation of affect or intellectualization. Isolation of affect involves the superficial acknowledgement of a feeling in the absence of a full emotional experience. In intellectualization, threatening matters are acknowledged but explored in bland terms that are relatively devoid of feelings. For example, Vaillant (1971) describes a physician whose mother had died recently of cancer. The doctor talked at length about the medical characteristics of her illness, thereby easing his sense of loss.
Mature defense mechanisms appear to the beholder as convenient virtues. An example is certain
Table 8.1 A Hierarchy of Types of Defense Mechanisms (Least Mature to Most Mature)
Type: Description and Examples
Psychotic: Gross denial of external reality such as frank delusions; includes denial and distortion
Acting Out: Maladaptive behaviors such as impulsive actions; includes passive-aggressiveness
Borderline: Splitting the image of others into good and bad; includes splitting and schizoid fantasy
Neurotic: Mechanisms that involve minor reality distortion; includes repression and displacement
Obsessive: Somewhat adaptive mechanisms; includes isolation of affect and intellectualization
Mature: Mature forms of defense with minor reality distortion; includes humor and sublimation
Source: Based on Perry and Henry (2004) and Vaillant (1977).
two drawbacks: The practitioner needs specialized training to identify defense mechanisms, and the process of collecting relevant information from patients is very time-consuming.
Type Theories of Personality
The earliest personality theories attempted to sort individuals into discrete categories or types. For example, the Greek physician Hippocrates (ca. 460–377 B.C.) proposed a humoral theory with four personality types (sanguine, choleric, melancholic, and phlegmatic) that was too simplistic to be useful. In the 1940s, Sheldon and Stevens (1942) proposed a type theory based on the relationship between body build and temperament. Their approach stimulated a flurry of research and then faded into obscurity. Nonetheless, typological theories have continued to capture intermittent interest among personality researchers. We will illustrate type theories by reviewing contemporary research on coronary-prone personality types.
Type A Coronary-Prone Behavior Pattern
Friedman and Rosenman (1974) investigated the psychological variables that put individuals at higher risk of coronary heart disease. They were the first to identify a Type A coronary-prone behavior pattern, which they described as “an action–emotion complex that can be observed in any person who is aggressively involved in a chronic, incessant struggle to achieve more and more in less and less time, and if required to do so, against the opposing efforts of other things or persons” (Friedman & Rosenman, 1974). At the opposite extreme is the Type B behavior pattern, characterized by an easygoing, noncompetitive, relaxed lifestyle. Of course, people vary along a continuum from “pure” Type A to “pure” Type B.
Friedman and Ulmer (1984) have provided a detailed description of the full-fledged Type A behavior pattern, and it is not an appealing picture. These individuals display a deep insecurity, regardless of their achievements. They desire to dominate others, and typically are indifferent to the feelings of competitors. They exhibit a free-floating hostility,
defense mechanisms. Based on prior research, each defense mechanism receives a score from 1 (highly immature and maladaptive) to 7 (highly mature and adaptive). Although the scale offers a number of scoring options, the most useful score is the Overall Defensive Functioning (ODF) score, which is the simple average of the ratings of the observed defense mechanisms. The theoretical range of scores is 1.0 to 7.0, although scores of 3.0 and below are rare. Scores below 5.0 indicate significant personality disorder or severe depression. Scores of 6.0 and higher indicate normal or healthy functioning. Interrater reliabilities from six studies were mostly in the mid- to high .80s for the ODF scores. The stability coefficient for a small sample of patients over a one-month interval was a respectable .75 (Perry & Henry, 2004).
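The ODF computation described above can be illustrated with a brief sketch. The Python code below is a minimal illustration and is not part of the DMRS materials: the function names and the sample ratings are hypothetical, and the label for scores from 5.0 up to 6.0 is a placeholder, because the text does not characterize that range.

```python
from statistics import mean

def overall_defensive_functioning(ratings):
    """Return the ODF score: the simple average of the 1-7 maturity ratings."""
    if not ratings:
        raise ValueError("at least one rated defense mechanism is required")
    for rating in ratings:
        if not 1 <= rating <= 7:
            raise ValueError(f"rating {rating} is outside the 1-7 scale")
    return mean(ratings)

def interpret_odf(score):
    """Map an ODF score onto the two ranges described in the text."""
    if score < 5.0:
        return "significant personality disorder or severe depression"
    if score >= 6.0:
        return "normal or healthy functioning"
    # Assumption: the text does not describe scores from 5.0 up to 6.0.
    return "intermediate (not characterized in the text)"

# Hypothetical ratings for the defense mechanisms observed in one session.
session_ratings = [5, 6, 4, 7, 5, 6, 3, 6]
odf = overall_defensive_functioning(session_ratings)
print(f"ODF = {odf:.2f}: {interpret_odf(odf)}")
```

Because the ODF is an unweighted mean, every observed defense mechanism contributes equally, however long it occupies the session.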
The ODF scores tend to improve over the course of dynamically oriented therapy, which supports the validity of the construct being measured, namely, maturity of defense mechanisms. In four studies involving one-month to one-year follow-up with small samples, the within-group effect sizes for gains in ODF scores ranged from .02 to 1.05, with most in the range of .41 to .82 (Perry & Henry, 2004, Table 9.5). Effect sizes of this magnitude are considered moderate to large; that is, meaningful gains are being accomplished, as registered by the increased maturity of the defense mechanisms emerging in the therapy sessions. The authors observe:
Defenses can be viewed as both process phenomena (psychological mechanisms in action) and as a measure of adaptive outcome, when aggregated across sessions and time. This gives the study of defenses great potential clinical relevance. To develop and test predictive hypotheses about treatment will make the study of defense very relevant to daily clinical work, and both scientifically promising and exciting (Perry & Henry, 2004, p. 190).
The meaningful assessment of defense mechanisms largely has eluded clinical researchers, but instruments like the DMRS show promise of making key elements of psychoanalytic theory accessible to empirical validation (Perry, Beck, Constantinides, & Foley, 2009). However, this approach does have
Type A behavior can also be detected by paper-and-pencil tests (Jackson & Gray, 1987). However, the questionnaire approach is limited because it cannot reveal the facial, vocal, and psychomotor indices of hostility and time urgency that are usually evident in an interview (Friedman & Ulmer, 1984).
Early studies indicated that persons who exhibited the Type A behavior pattern were at greatly increased risk of coronary disease and heart attack. In one 9-year study of more than 3,000 healthy men, persons with the Type A behavior pattern were 2½ times more likely to suffer heart attacks than those with Type B behavior pattern (Friedman & Ulmer, 1984). In fact, not one of the “pure” Type B’s—the extremely relaxed, easygoing, and noncompetitive members of the study—had suffered a heart attack. In the famous Framingham longitudinal study, Type A men ages 55 to 64 were about twice as likely at 10-year follow-up to develop coronary heart disease as Type B men (Haynes, Feinleib, & Eaker, 1983). In this study, the link between Type A behavior and coronary heart disease (CHD) was especially strong for white-collar workers.
Phenomenological Theories of Personality
Phenomenological theories of personality emphasize the importance of immediate, personal, subjective experience as a determinant of behavior. Some of the theoretical positions subsumed under this title have been given other labels also, such as humanistic theories, existential theories, construct theories, self-theories, and fulfillment theories (Maddi, 2000). Nonetheless, these approaches share a common focus on the person’s subjective experience, personal world view, and self-concept as the major wellsprings of behavior.
Origins of the Phenomenological Approach
The orientation briefly reviewed in this section has numerous sources that reach back to turn-of-the-twentieth-century European philosophy and literature. Nonetheless, two persons, one a philosopher and the other a writer, stand out as seminal
and easily find things that irritate them. They also suffer from a sense of urgency about getting things done. Type A persons often engage in multitasking, such as reviewing correspondence while making a phone call. Almost beyond belief, one patient confessed to using two electric shavers, one for each hand (Friedman & Ulmer, 1984).
In other studies, researchers have found only a weak relationship, or no relationship at all, between Type A behavior and CHD (e.g., Eaker & Castelli, 1988; Smedslund & Rundmo, 1999). In the most comprehensive review of its kind, Myrtek (2007) conducted a meta-analysis of 25 prospective studies of Type A behavior and CHD and concluded flatly that “Type A behavior is not an independent risk factor for CHD.” Effect sizes in this review were not just small, they were effectively zero, on the order of .003. It did not matter whether structured interviews or questionnaires were used to assess Type A behavior. Myrtek (2007) also warns that the existence of the concept itself can be dangerous because it provides patients an “external causal attribution” and relieves them of the responsibility for behavior change. The Type A concept also gives false benefit to physicians when they work with CHD patients who lack the usual risk factors (smoking, poor diet, lack of exercise). Blaming Type A behavior is easier than admitting that the causes of CHD sometimes are unknown.
Other researchers have found that CHD is linked not so much with the full-blown Type A behavior pattern as with specific components such as being anger-prone (Dembroski, MacDougall, Williams, & Haney, 1985) or possessing time urgency (Wright, 1988). Wielgosz and Nolan (2000) identified hostility, cynicism, and suppression of anger, as well as stress, depression, and social isolation as significant risk factors in Type A behavior. Certainly there continues to be a need to sort out the specific risk factors in this area of investigation. What we do know with certainty is that the simple equation “Type A behavior causes CHD” no longer is convincing.
Type A behavior can be diagnosed from a short interview consisting of questions about habits of working, talking, eating, reading, and thinking (Friedman, 1996). The more flagrant cases of
I make strong demands on myself
I am a submissive person
I am likeable
The examinee is asked to sort a hundred or so statements into nine piles, putting a prescribed number of cards into each, thus forcing a near-normal distribution. The instructions specify that the examinee put the cards most descriptive of him or her at one end, those least descriptive at the opposite end, and those about which he or she is indifferent or undecided around the middle of the distribution. The required distribution might look like this:
                Least Like Me                   Most Like Me
Pile No.         1    2    3    4    5    6    7    8    9
No. of cards     1    4   11   21   26   21   11    4    1
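The forced distribution can be checked mechanically. The Python sketch below is illustrative only: the function names and the card-numbering scheme are hypothetical, and the pile counts are simply those shown in the distribution above. It confirms that the prescribed counts sum to 100 cards and reports any pile in a completed sort that departs from its prescribed count.

```python
from collections import Counter

# Prescribed cards per pile, from "least like me" (pile 1) to "most like me" (pile 9).
REQUIRED_COUNTS = [1, 4, 11, 21, 26, 21, 11, 4, 1]

def check_q_sort(pile_of_card):
    """Return a list of problems for a sort given as {card_id: pile_number}."""
    counts = Counter(pile_of_card.values())
    problems = []
    for pile, required in enumerate(REQUIRED_COUNTS, start=1):
        actual = counts.get(pile, 0)
        if actual != required:
            problems.append(f"pile {pile}: {actual} cards, {required} required")
    for pile in counts:
        if pile not in range(1, len(REQUIRED_COUNTS) + 1):
            problems.append(f"pile {pile} is not one of the nine piles")
    return problems

def example_sort():
    """Build a hypothetical valid sort by filling the piles in card order."""
    sort, card = {}, 0
    for pile, required in enumerate(REQUIRED_COUNTS, start=1):
        for _ in range(required):
            card += 1
            sort[card] = pile
    return sort

print(sum(REQUIRED_COUNTS))          # total number of cards in the deck
print(check_q_sort(example_sort()))  # an empty list means the sort is valid
```

Fixing the counts in advance means that every examinee produces the same distribution shape, so two sorts differ only in which statements land in which pile.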
The nature of the items is determined by the needs of the researcher or practitioner. Rogers used a set of items devised by Butler and Haigh (Rogers & Dymond, 1954, chap. 4) to tap the self-concept. These statements were taken at random from available therapeutic protocols; their Q-sort items represented actual client statements, reworded for clarity. But a special virtue of the Q-technique is that other researchers or practitioners are free to craft their own items. For example, Marks and Seeman (1963) used a psychodynamic perspective in devising items for the therapist description of patient groups. Examples of their items include the following:
Utilizes acting out as a defense mechanism
Tends to be flippant in both word and gesture
Genotype has paranoid features
Appears to be poised, self-assured, socially at ease
Exhibits depression (manifest sad mood)
Scoring a Q-sort is usually a matter of compar- ing or correlating the distribution of items against an established norm. For example, well-adjusted
contributors to the modern phenomenological viewpoint. The German philosopher Edmund Husserl (1859–1938) invented a complex philosophy of phenomenology that was concerned with the description of pure mental phenomena. Husserl’s approach was heavily introspective and nearly inscrutable. More approachable was the Danish writer Søren Kierkegaard (1813–1855), well known for his contributions to existentialism. Existentialism is the literary and philosophical movement concerned with the meaning of life and an individual’s freedom to choose personal goals. The phenomenology of Husserl and the existentialism of Kierkegaard influenced dozens of prominent philosophers and psychologists. Vestiges of these early viewpoints are evident in virtually every contemporary phenomenological personality theory (Maddi, 2000).
Carl Rogers, Self-Theory, and the Q-Technique
The most influential phenomenological theorist was Carl Rogers (1902–1987). His contributions to personality theory, known as self-theory, are extensive and generally well appreciated by students of psychology (Rogers, 1951, 1961, 1980). But it is also true, albeit little recognized, that Rogers helped shape a small part of psychological testing by popularizing the Q-technique.
The Q-technique is a procedure for studying changes in the self-concept, a key element in Rogers’s self-theory. The technique was developed by Stephenson (1953), but a series of studies by Rogers and his colleagues served to popularize this measurement approach (Rogers & Dymond, 1954). Also known as a Q-sort, the Q-technique is a generalized procedure that is especially useful for studying changes in self-concept.1 The Q-sort consists of a large number of cards, each containing a printed statement such as the following:
I am poised
I put on a false front
1The Q-technique has additional applications as well. Marks and Seeman (1963) employed Q-sorts by therapists to describe patients with specific MMPI profiles. Bem and Funder (1978) recommend a Q-sort to derive a profile of characteristics associated with successful performance of a specific task. Persons whose self-descriptions match the derived profile can be predicted to succeed at the selected task.
controlling a person’s behavior. The behavioral approach to personality has produced a variety of direct assessment methods, which we discuss in the next chapter.
Behavioral theorists disagree mainly on the role that cognitions play in determining behavior. Cognitions are inferred mental processes such as problem solving, judging, or reasoning. Radical behaviorists believe that resorting to mentalistic explanations of any kind is futile: “When what a person does is attributed to what is going on inside him, investigation is brought to an end” (Skinner, 1974). By contrast, social learning theorists make cautious reference to cognitions in explaining what it is, specifically, that a person learns. A social learning theorist might argue that we learn expectations or rules about the environment, not just stimulus and response connections.
Modern social learning theory can be viewed as a cognitive variant of the strict behaviorism that was dominant in U.S. psychology early in the twentieth century. Social learning theorists accept the Skinnerian premise that external reinforcement is an important determinant of behavior. But they also maintain that cognitions have a critical influence on our actions as well. For example, Rotter (1972) has popularized the view that our expectations about future outcomes are the primary determinants of behavior. The probability that a person will behave self-assertively, for example, depends on his or her expectations about the likely results of self-assertiveness. If the expected outcome is valued by the person, the behavior is more likely. Of course, expectations are a function of the person’s history of reinforcement, so Rotter’s social learning perspective is similar to the behavioral viewpoint. But the implication of social learning theory is that behavior is the result of a belief, in particular, a belief that the behavior will result in a desired outcome. Thus, cognitions are assumed to affect actions.
Based on his social learning views, Rotter (1966) developed the Internal-External (I-E) Scale, an interesting measure of internal versus external locus of control. The construct of locus of control refers to the perceptions that individuals have about the source of things that happen to them. In particular, the I-E Scale seeks to assess the examinee’s generalized expectancies for internal versus
persons might be asked to sort the items so as to derive an average pile placement number (ranging from 1 to 9) for each item. An individual examinee would be considered more or less adjusted according to the resemblance between his or her sortings and the average sorting for adjusted persons. We will refer the reader to Block (1961, 2008) for details.
Another way to use the Q-sort is to compare an examinee’s self-sort with his or her ideal sort. Rogers used the discrepancy between these two sortings as an index of adjustment. His subjects were required to sort the items twice, according to the following instructions:
1. Self-sort. Sort these cards to describe yourself as you see yourself today, from those that are least like you to those that are most like you.
2. Ideal sort. Now sort these cards to describe your ideal person—the person you would most like within yourself to be (Rogers & Dymond, 1954).
Using the item pile numbers, Rogers then correlated the two sorts for each subject separately. Consider what these data mean: If the self-sort and the ideal sort are highly similar, the correlation of Q-sort data will approach 1.0; if the two sorts are opposite one another, the correlation will approach –1.0. Of course, most sorts will be somewhere in between but typically on the positive side. Butler and Haigh found that psychotherapy clients increased their congruence between self and ideal (Rogers & Dymond, 1954, chap. 4). Even so, adjusted control subjects possessed a greater congruence.
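The self-ideal comparison can be illustrated with a minimal Python sketch. It assumes that each sort is recorded as a list of pile numbers (1 to 9), one per card and in the same card order. The function name and the sample placements are hypothetical and are not taken from Rogers and Dymond (1954). The index computed is the ordinary Pearson correlation between the two lists.

```python
from math import sqrt

def q_sort_correlation(self_sort, ideal_sort):
    """Pearson correlation between two Q-sorts given as pile numbers per card."""
    if len(self_sort) != len(ideal_sort) or len(self_sort) < 2:
        raise ValueError("both sorts must cover the same cards (at least two)")
    n = len(self_sort)
    mean_self = sum(self_sort) / n
    mean_ideal = sum(ideal_sort) / n
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((s - mean_self) * (i - mean_ideal)
              for s, i in zip(self_sort, ideal_sort))
    var_self = sum((s - mean_self) ** 2 for s in self_sort)
    var_ideal = sum((i - mean_ideal) ** 2 for i in ideal_sort)
    if var_self == 0 or var_ideal == 0:
        raise ValueError("a sort with no variation has no defined correlation")
    return cov / sqrt(var_self * var_ideal)

# Hypothetical pile placements (1 = least like me, 9 = most like me) for nine cards.
self_sort = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ideal_same = list(self_sort)                   # ideal sort identical to self-sort
ideal_opposite = [10 - p for p in self_sort]   # ideal sort reversed

print(q_sort_correlation(self_sort, ideal_same))
print(q_sort_correlation(self_sort, ideal_opposite))
```

Identical sorts give the upper limit of the index and fully reversed sorts give the lower limit; real self and ideal sorts would fall between these extremes.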
Behavioral and Social Learning Theories
Behavioral and social learning theories have their origins in laboratory studies on operant learning and classical conditioning. A fundamental assumption of all behavioral theorists is that many of the behaviors that make up personality are learned. To understand personality, then, we must know about the learning history of the individual. Behavioral theorists also believe that the environment is of supreme importance in shaping and maintaining behavior. Behavioral inquiry, therefore, seeks to identify the specific components of the current environment that are
runs out the door. These differences in behavior illustrate the role of self-referential thought as a mediator between knowledge and action. The boy who ran out the door did not believe he could deal with the situation effectively. He had little perceived self-efficacy for snake handling. Bandura would argue that the primary determinant of the boy’s behavior is a self-judgment about personal capabilities. Cognitions are, therefore, assumed to be a major determinant of behavior.
Bandura (1997, 2006) has developed an appealing approach to the assessment of self-efficacy expectations outlined below. But he warns against the idea that there can be one all-purpose measure of perceived self-efficacy:
One cannot be all things, which would require mastery of every realm of human life. People differ in the areas in which they cultivate their efficacy and in the levels to which they develop it even within their given pursuits. For example, a business executive may have a high sense of organizational efficacy but low parenting efficacy. Thus, the efficacy belief system is not a global trait but a differentiated set of self-beliefs linked to distinct realms of functioning (Bandura, 2006, p. 307).
As a consequence, scales of self-efficacy need to be adapted to the particular domain of functioning of interest to the practitioner or researcher.
Fortunately, Bandura (2006) has outlined a strategy for developing self-efficacy scales. The starting point is a simple rating format, which resembles the following hypothetical example of a scale that school administrators might use with teachers to gauge classroom self-efficacy:
Classroom Questionnaire: We are interested in the areas of challenge that teachers face in the classroom. Your answers will not be disclosed to others. Please rate your degree of confidence for doing the things below using this scale:
0     10     20     30     40     50     60     70     80     90     100
Can’t Do        Mildly Uncertain        Moderately Certain        Completely Certain

Confidence (0 to 100):
external control of reinforcement. The purpose of the I-E Scale is to determine the extent to which the examinee believes that reinforcement is contingent upon his or her behavior (internal locus of control) as opposed to the outside world (external locus of control). The instrument is a forced-choice self-report inventory. For each item, the examinee chooses the single statement (from a pair) with which he or she more strongly concurs. Items resemble the following:
In general, most people get the respect they deserve.
OR
In reality, a person’s worth often passes unrecognized.
For the preceding item, the first alternative indicates an internal locus of control, whereas the second alternative signifies an external locus of control. The balance of internal to external responses determines the overall score on the scale. The I-E Scale is a reliable and valid instrument that has stimulated a huge body of research on the nature and meaning of locus of control and related variables. Research indicates that locus of control has a strong relationship to occupational success, physical health, academic achievement, and numerous other variables. As the reader might suspect, an internal locus of control generally predicts a more positive outcome than an external locus of control. The interested reader can consult Lefcourt (1991) for further details.
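The idea of a balance between internal and external choices can be made concrete with a small Python sketch. This is only an illustration under stated assumptions: each forced-choice response is assumed to be coded "I" or "E", the function name and the sample responses are hypothetical, and the published scoring key for the I-E Scale is not reproduced here.

```python
def tally_ie_responses(choices):
    """Count internal ('I') and external ('E') choices on forced-choice items."""
    invalid = [c for c in choices if c not in ("I", "E")]
    if invalid:
        raise ValueError(f"unexpected response codes: {invalid}")
    external = sum(1 for c in choices if c == "E")
    internal = len(choices) - external
    return {"internal": internal, "external": external}

# Hypothetical responses to ten forced-choice items (not actual I-E Scale data).
responses = ["I", "E", "I", "I", "E", "I", "I", "E", "I", "I"]
summary = tally_ie_responses(responses)
print(summary)
```

An examinee with this pattern chose the internal alternative more often than the external one, which would place him or her toward the internal end of the dimension.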
Important contributions to social learning theory have also been made by Albert Bandura. In his early studies, Bandura examined the role of observational learning and vicarious reinforcement in the development of behavior (Bandura, 1965, 1971; Bandura & Walters, 1963). More recently, he has proposed that perceived self-efficacy is a central mechanism in human action (Bandura, 1982; Bandura, Taylor, Ewart, Miller, & DeBusk, 1985). Self-efficacy is a personal judgment of “how well one can execute courses of action required to deal with prospective situations” (Bandura, 1982). The concept of self-efficacy is useful in explaining why correct knowledge does not necessarily predict efficient action. For example, two boys may be equally convinced that a garden snake in the bathtub presents no hazard, but one will pick it up while the other
primarily in terms of whether traits are split off into finely discriminable variants or grouped together into a small number of broad dimensions:
1. Cattell’s factor-analytic viewpoint identifies 16 to 20 bipolar trait dimensions.
2. Eysenck’s trait-dimensional approach coalesces dozens of traits into two overriding dimensions.
3. Goldberg and others have sought a modern synthesis of all trait approaches by proposing a five-factor model of personality.
For readers who desire a more detailed discussion of this topic, Pervin (1993) and Wiggins (1997) provide an excellent review of trait approaches to personality theory.
Cattell’s Factor-Analytic Trait Theory
Cattell (1950, 1973) refined existing methods of factor analysis to help reveal the basic traits of personality. He referred to the more obvious aspects of personality as surface traits. These would typically emerge in the first stages of factor analysis when individual test items were correlated with each other. For example, true–false items such as “I enjoy a good prize fight,” “Getting stuck behind a slow driver really bothers me,” and “It’s important to let people know who is in charge” might be answered similarly by subjects, revealing a surface trait of aggressiveness.
But surface traits themselves tended to come in clusters, as revealed by Cattell’s more sophisticated application of factor analysis. For Cattell, this was evidence of the existence of source traits, the stable and constant sources of behavior. Source traits are, therefore, less visible than surface traits but are more important in accounting for behavior.
Cattell (1950) was unrivaled in his use of factor analysis to discover how traits were organized and how they were related to each other. One approach was to have persons rate others they knew well by checking various adjectives such as aggressive, thoughtful, and dominating from a list of 171 choices. When the results from 208 subjects were subsequently factor analyzed, about 20 underlying personality factors or traits were tentatively identified. Another approach was to have thousands of
Maintain control of the classroom when lecturing _____
Keep students on track during hard assignments _____
Deal with individuals who keep talking out of turn _____
Teach students who don’t want to be in class _____
Teach students who have no parental support _____
Motivate students who resist doing homework _____
Keep the brightest students interested in class _____
This is only a preliminary and generic example. A complete scale would be longer and would undergo a few iterative cycles of revision before the final draft. In a recent and helpful chapter, Bandura (2006) also gives advice on how to construct the best self-efficacy scales, starting with issues of content validity, response bias, and item analysis, and ending with strategies for validating scales. Yet, regardless of their psychometric excellence, self-efficacy scales need to be practical. They should be judged by the extent to which, ultimately, they enable people to fulfill desired personal and social transformations (Bandura, 2006).
Trait Conceptions of Personality
A trait is any “relatively enduring way in which one individual differs from another” (Guilford, 1959). Psychologists developed the concept of trait from the ways people describe other people in everyday life. As language evolved, people found words to portray the consistencies and differences they encountered in their daily interactions with others. Thus, when we say one person is sociable and another is shy we are using trait names to describe consistencies within individuals and also differences between them (Goldberg, 1981a; Fiske, 1986).
Trait conceptions of personality have been enormously popular throughout the history of psychological testing, so the coverage here is necessarily selective. We will review two prominent and influential positions from the dozens of trait theories that have been proposed. These approaches differ
as single terms in some or all of the world’s languages. (Goldberg, 1990)
When trait terms in English are distilled down to a reasonably distinct and nonoverlapping set of adjectives, a few hundred characteristics typically emerge (Allport, 1937). For decades, researchers have been asking individuals to rate themselves or others on these or similar traits. When these ratings are subjected to factor analysis, the “Big Five” dimensions previously listed usually appear in one guise or another. In sum, a mounting body of research indicates that the five-factor model captures a valid and useful representation of the structure of human traits.
The five-factor approach also possesses evolutionary plausibility. Specifically, the five factors of personality previously listed capture individual differences that relate to such basic evolutionary functions as survival and reproductive success (Buss, 1997; Pervin, 1993). Goldberg (1981b) has theorized that people implicitly ask the following questions in their interactions with others:
1. Is X active and dominant or passive and submissive? (Can I bully X or will X try to bully me?)
2. Is X agreeable (warm and pleasant) or disagreeable (cold and distant)?
3. Can I count on X? (Is X responsible and conscientious or undependable and negligent?)
4. Is X crazy (unpredictable) or sane (stable)?
5. Is X smart or dumb? (How easy will it be for me to teach X?)
Directly or indirectly, each of these evaluations has a bearing on survival and reproductive success. For example, point 3 (conscientiousness) involves a trait that might ensure group survival in a hostile world. A person low on this trait (undependable) would be a poor choice for guarding the food supply. The ability to discern conscientiousness in others therefore has adaptive value. Not surprisingly, the five points previously listed correspond to the five-factor personality model.
The five-factor model of personality has inspired several personality scales and other systems for assessment (deRaad & Perugini, 2002). For example, Costa and McCrae have developed
persons answer questions about themselves and then factor-analyze their responses. Sixteen of the original 20 personality traits were independently confirmed by this second approach (Cattell, 1973). These 16 source traits have been incorporated into the Sixteen Personality Factor Questionnaire (16PF), a trait-based paper-and-pencil test of personality that is discussed in the next chapter.
The Five-Factor Model of Personality
The five-factor model of personality has its origins in a review chapter by Goldberg (1981b). In his analysis of factor-analytic trait research, Goldberg identified several consistencies, which he referred to as the “Big Five” dimensions. Although researchers have used slightly different terms for these factors, the most common labels are
Neuroticism
Extraversion
Openness to Experience
Agreeableness
Conscientiousness
Rearranging the factors yields a simple acronym: OCEAN. The five-factor model is rapidly becoming the consensus model of personality. Support for the five-factor approach comes from several sources, including factor analysis of trait terms in language and the analysis of personality from an evolutionary perspective. We discuss these perspectives in the following.
The use of trait terms in the analysis of personality is based upon the fundamental lexical hypothesis. The essential point of this hypothesis is that trait terms have survived in language because they convey important information about our dealings with others:
The variety of individual differences is nearly boundless, yet most of these differences are insignificant in people’s daily interactions with others and have remained largely unnoticed. Sir Francis Galton may have been among the first scientists to recognize explicitly the fundamental lexical hypothesis, namely that the most important individual differences in human transactions will come to be encoded
basis of trait scores and also attempted to distinguish the kinds of situations in which behavior is largely determined by traits (e.g., Mischel, Shoda, & Mendoza-Denton, 2002; Wasylkiw & Fekken, 2002). These efforts met with modest success, raising the validity of some trait questionnaires (in some contexts with some persons) substantially beyond the ominous r = .30 barrier posited by Mischel (1968). But gone forever are the days of simplistic, generalized assertions such as “trait X predicts behavior Y.”
The Projective Hypothesis
Frank (1939, 1948) introduced the term projective method to describe a category of tests for studying personality with unstructured stimuli. In a projective test the examinee encounters vague, ambiguous stimuli and responds with his or her own constructions. Disciples of projective testing are heavily vested in psychoanalytic theory and its postulation of unconscious aspects of personality. These examiners believe that unstructured, vague, ambiguous stimuli provide the ideal circumstance for revelations about inner aspects of personality. The central assumption of projective testing is that responses to the test represent projections from the innermost unconscious mental processes of the examinee. We introduce this topic with some preliminary concepts and distinctions relevant to projective testing.
The assumption that personal interpretations of ambiguous stimuli must necessarily reflect the unconscious needs, motives, and conflicts of the examinee is known as the projective hypothesis. Frank (1939) is generally credited with popularizing the projective hypothesis:
When we scrutinize the actual procedures that may be called projective methods we find a wide variety of techniques and materials being employed for the same general purpose, to obtain from the subject, “what he cannot or will not say,” frequently because he does not know himself and is not aware what he is revealing about himself through his projections.
The challenge of projective testing is to decipher underlying personality processes (needs, motives, and conflicts) based on the individualized,
two personality tests based on the five-factor model (Costa, 1991; McCrae & Costa, 1987). The Revised NEO Personality Inventory (NEO-PI-R) contains 240 items rated on a five-point scale. In addition to the five major domains of personality, the inventory measures six specific traits (called facets) within each domain. A shortened 60-item version known as the NEO Five-Factor Inventory (NEO-FFI) also is available. Trull, Widiger, Useda, and others (1998) have published a semistructured interview for the assessment of the five-factor model of personality. These tests are discussed in the next chapter.
Comment on the Trait Concept
All trait approaches to personality share certain problems in common. First, there is disagreement whether traits cause behavior or merely describe behavior (Fiske, 1986). It can be persuasively argued that invoking traits as causes is an empty form of circular reasoning. For example, a person with extremely high standards might be said to possess the trait of perfectionism. But when asked to explain what is meant by perfectionism, we invariably end up referring to a pattern of extremely high standards. Thus, when we assert that someone is perfectionistic, are we really doing anything more than providing a shorthand description of their past behavior? Miller (1991) has voiced this criticism of the five-factor approach, noting that the model merely describes psychopathology but does not explain it.
A second problem with traits is their apparently low predictive validity. Mischel (1968) is credited with the first effective disparagement of the trait concept in his influential book Personality and Assessment. He stated that “while trait theory predicts behavioral consistency, it is behavior inconsistency that is typically observed” (Mischel, 1968). In a wide-ranging review of existing research, Mischel noted that trait scales produced validity coefficients with an upper limit of r = .30. He coined the term personality coefficient to describe these low correlations. Undoubtedly significant for large samples of subjects, correlations of r = .30 are of minimal value in the prediction of individual behavior.
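The practical weight of such a coefficient can be checked with a line of arithmetic: the squared correlation gives the proportion of variance in the criterion that is linearly associated with the predictor. The following minimal Python sketch (the function name is hypothetical) applies this standard identity to the r = .30 ceiling.

```python
def variance_explained(r):
    """Proportion of criterion variance accounted for by a correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r ** 2

# Mischel's reported upper limit for trait-scale validity coefficients.
share = variance_explained(0.30)
print(f"r = .30 accounts for about {share:.0%} of the variance")
```

On this reckoning, a trait score correlating .30 with a behavior leaves roughly nine-tenths of the variability in that behavior unaccounted for, which is why the coefficient helps little when predicting what a single individual will do.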
Trait researchers responded to Mischel’s attack by refining and limiting the trait concept. Researchers sought to identify subgroups of persons whose behavior could be accurately predicted on the
of the inkblots are black or shades of gray, while five contain color; each is displayed on a white background. An inkblot of the type employed by Rorschach is shown in Figure 8.1. The Rorschach is suited to persons age 5 and up but is most commonly used with adults.
Regrettably, Rorschach died before he could complete his scoring methods, so the systematization of Rorschach scoring was left to his followers. Five American psychologists produced overlapping but independent approaches to the test: Samuel Beck, Marguerite Hertz, Bruno Klopfer, Zygmunt Piotrowski, and David Rapaport (Erdberg, 1985). Predictably, the nuances of scoring varied from one scoring method to another. Beginning in the 1990s, John Exner and his colleagues began to codify and synthesize the scoring approaches into a single method known as the Rorschach Comprehensive System (Exner, 1991, 1993; Exner & Weiner, 1994). The Comprehensive System (CS) supplanted all previous methods and became the preferred scoring system because it was more clearly grounded in empirical research. Even so, reservations about the Rorschach in general and the CS in particular persisted in the trade (Lilienfeld, Wood, & Garb, 2000, 2001).
unique, subjective responses of each examinee. In the sections that follow we will examine how well projective tests have met this portentous assignment.
A Classification of Projective Techniques
Lindzey (1959) has offered a classification of projective techniques that we will follow here. Based on the response required, he divided projectives into five categories:
• Association to inkblots or words
• Construction of stories or sequences
• Completions of sentences or stories
• Arrangement/selection of pictures or verbal choices
• Expression with drawings or play
Association techniques include the widely used Rorschach inkblot test and its psychometrically superior cousin the Holtzman Inkblot Technique, as well as word association tests. Construction techniques include the Thematic Apperception Test and the many variations upon this early instrument. Completion techniques consist mainly of sentence completion tests, discussed later. Arrangement/selection procedures such as the Szondi test (discussed in the first chapter) are currently seldom used. Finally, expression techniques such as the Draw-A-Person or House-Tree-Person test are very popular among clinicians in spite of dubious validity data.
We will review prominent techniques within each category except the antiquated arrangement/selection approaches, which are almost never used. However, the literature on major projective techniques is simply overwhelming, running to perhaps tens of thousands of articles on the Rorschach alone. We can suggest major trends in the research, but the reader will need to consult other sources for comprehensive reviews.
Association Techniques
The Rorschach
The Rorschach consists of 10 inkblots devised by Hermann Rorschach (1884–1922) in the early 1900s. He formed the inkblots by dribbling ink on a sheet of paper and folding the paper in half, producing relatively symmetrical bilateral designs. Five
Figure 8.1 An Inkblot Similar to Those Found on the Rorschach
that requires significant training. We can only refer to highlights here. Responses are scored for a number of variables such as location, content, form quality, thought processes, and determinants. Determinants are different aspects of the blot such as color, shading, and form, which appear to have influenced examinee responses (Table 8.2).
Interrater reliability of R-PAS scores is excellent. Using a diverse sample of 50 Rorschach records randomly selected from ongoing research, the median intraclass correlation coefficient (an index of agreement between raters) for 60 variables was .92 (Viglione, Blume-Marcovici, Miller, Giromini, & Meyer, 2012). Another useful feature of this new approach to Rorschach scoring is the availability of an international reference sample for standardization of scoring variables. This sample of 1,396 protocols was obtained from 15 nations, including Australia, Brazil, Japan, Israel, and Spain, just to give a sense of the global distribution.
The validity of the Rorschach as scored with the R-PAS (or any other scoring system) is difficult to summarize in any simple manner. Individual studies indicate good validity for some purposes, but limited validity for other applications. For example, with the R-PAS, Complexity scores were correlated with functional capacity (r = .30) and social skills capacity (r = .34) in a sample of 72 middle-aged and older outpatients with schizophrenia (Moore, Viglione, Rosenfarb, Patterson, & Mausbach, 2012). Psychological complexity, as measured by the Complexity score, assesses the mental effort, intricacy, and integration evident in responses, with higher scores indicating better coping skills. Thus, it makes theoretical and empirical sense that psychological complexity would show positive correlations with functional and social capacities. These results support the validity of this Rorschach variable.
Once the entire protocol has been coded, the examiner computes a number of summary scores that form the primary basis for hypothesizing about the personality of the examinee. For example, the F+ percent is the proportion of the total responses that uses pure form as a determinant. A voluminous literature exists on the meaning of this index, but it seems safe to hypothesize that when the F+ percentage falls below 70 percent, the examiner should consider the possibility
Beginning in about 2010, a new system for administration, scoring, and interpretation of the Rorschach emerged as the clear choice for practitioners. The Rorschach Performance Assessment System (R-PAS) represents an extension and improvement of the CS (Meyer, Viglione, Mihura, Erard, & Erdberg, 2011). Erard (2012) provides a succinct summary of its appeal:
Despite its recent formal introduction to the professional assessment community, R-PAS takes advantage of decades of research in peer reviewed publications (including the insights of Rorschach critics) and builds on established validity and general acceptance for most of its procedures and features (p. 122).
In using the R-PAS, the examiner first establishes rapport and then sits to the side of the client or patient to minimize body language communication. For each card, the examiner asks the respondent to look at the stimulus and to answer “What might this be?” Before the test, the examiner asks for “two, maybe three responses” per card. During the test, if only one reply is given, the examiner prompts for additional response(s), and pulls the card after four responses are provided. This is called response optimization, which elicits a typical range of 18 to 28 responses. This technique greatly reduces short and long records (protocols with upwards of 100 responses have been encountered), which affords a better fit with norms. The R-PAS incorporates several laudable improvements (www.r-pas.org):
• Evidence-based selection of scoring variables
• Detailed guidelines for test administration
• Methods to optimize the number of responses
• Guidelines for clarifying coding uncertainties
• Normative reference values for international samples
• Form quality tables for accuracy and conventionality
• Inexpensive scoring with a web-based program
• Easy-to-read graphs with standard scores
• Translations into several languages
Once the test is administered and the responses recorded, scoring begins. This is an intricate process
capacity to deal effectively with stress. Meyer and Eblin (2012) discuss R-PAS variables and composites.
Frank (1990) has emphasized that formal scoring of the Rorschach is insufficient for some
of severe psychopathology, brain impairment, or intellectual deficit in the examinee (Exner, 1993). The F+ percent is also considered to be an index of ego strength, with higher scores indicating a greater
Table 8.2 Summary of Major Rorschach Scoring Criteria

Location: Where on the blot was the percept located?
  W    Whole                    Entire inkblot used
  D    Common detail            Well-defined part used
  Dd   Unusual detail           Unusual part used

White Space: Was white space used in the response?
  SR   Space reversal           White space as the figure
  SI   Space integration        White space integrated in percept

Content: What is seen, and is it synthesized or vague?
  H    Human                    Percept of a whole human form
  Hd   Human detail             Human form incomplete in any way
  Ex   Explosion                An actual explosion
  Sy   Synthesis                Objects are meaningfully related
  Vg   Vagueness                Objects in the percept are vague
  2    Pair                     Two identical, mirror-image percepts

Form Quality: How well does the percept fit the blot?
  o    Ordinary                 Obvious and easily seen
  u    Unusual                  Unusual but still a good fit
  -    Minus                    Distorted and unrealistic percept
  P    Popular                  Designated high frequency percept

Determinants: What feature of the blot determined the response?
  M    Movement                 Movement seen or implied in percept
  C    Color                    Color helped determine the response
  F    Form                     Form a major determinant of percept
  T    Texture                  Shading involved in the response

Thought Processes: Are there issues with thought processes or themes?
  DV1  Deviant Verbalization-1  Odd or unusual verbalization
  DV2  Deviant Verbalization-2  Clearly bizarre verbalization
  MOR  Morbid                   Response has a clearly dysphoric tone

Note: This list is incomplete and illustrative only. The full scoring system is complex and allows for blends. For example, the determinant FC means that both form and color were used to determine the percept, but form was more important than color.
Source: Based on Exner (1993) and Meyer et al. (2011).
Meyer and Handler (1997) used meta-analysis to synthesize the results of 18 validity studies of the RPRS, comprising a total sample of 752 participants. Their results translated to a 78 percent success rate in psychotherapy for clients with high scores on the RPRS, but only a 22 percent success rate for clients with low scores on the scale. The RPRS is a promising scale that should receive wider use in clinical practice.
Another useful scoring system for the Rorschach is the Thought Disorder Index (TDI), which assesses formal thought disorder (Holzman, Levy, & Johnston, 2005). Thought disorder exists on a continuum from mild slippage to bizarre disorganization and is especially characteristic of patients with schizophrenia. Thus, the assessment of thought disorder is pivotal in the diagnosis and treatment of individuals with schizophrenia or other serious mental illness.
The following examples of thought disorder are from Holzman et al. (2005). Mild examples would include clients with peculiar language that employs stilted, inappropriate, or odd expressions. For example, in responding to the Rorschach, a patient with mild thought disorder might use expressions such as “He’s organizing in his organs” or “There’s a segregation between mouth and nose” or “Red is trouble, and Africa being red symbolizes that maybe the origin of man was in Africa and that’s why it looks red.” As thought disorder becomes more prominent, Rorschach responses reveal increasingly queer and confused qualities. The patient might describe portions of the blot as being “A foxed comic dog” or “The adhesive adjunctive extensions” or “These are the posterior pronunciations.” Extreme examples of thought disorder show an incoherent quality such as “Blood, and break their neck, you know, reject” or the invention of words, for example, “The property is more closely centulated to the trailroads.”
The TDI is calculated by scoring each response for the severity level of thought disorder from none to extreme, with possible scores of 0, .25, .50, .75, and 1.0. Then the average score is computed across all responses. This number is multiplied by 100 to yield the final score on a range from 0 to 100. Thus, an overall score of 0 would mean that not one response revealed any thought disorder, whereas a score of 100 would signify that, without exception, every response was highly bizarre and disorganized.
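The arithmetic of the TDI can be illustrated with a minimal sketch. It assumes only what is stated above: each response carries one of five severity levels, the levels are averaged, and the mean is multiplied by 100. The function name and the input format are hypothetical conveniences for illustration; they are not part of the published scoring manual, which involves considerable clinical judgment in assigning the severity levels themselves.

```python
# Illustrative sketch only: the names below are hypothetical, not from the TDI manual.
ALLOWED_SEVERITIES = (0.0, 0.25, 0.50, 0.75, 1.0)


def thought_disorder_index(severities):
    """Return the mean severity across all responses, multiplied by 100."""
    if not severities:
        raise ValueError("at least one scored response is required")
    for level in severities:
        if level not in ALLOWED_SEVERITIES:
            raise ValueError(f"invalid severity level: {level}")
    return 100 * sum(severities) / len(severities)


# Four responses: two without thought disorder, one mild, one moderate.
example_tdi = thought_disorder_index([0.0, 0.25, 0.50, 0.0])
```

Because the final score is an average, a protocol with a few severely disordered responses and one with many mildly disordered responses can produce similar totals.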
purposes such as the diagnosis of schizophrenia. He stresses that an analysis of the patient’s thinking for the presence of highly personal, illogical, and bizarre associations to the blots is essential for psychodiagnosis. In his approach, the Rorschach is really an adjunct to the interview, and not a test per se.
Bornstein and Masling (2005) have reminded us that neither the CS nor the R-PAS should be confused with being “the Rorschach.” After all, there are many other helpful and validated approaches to scoring the test. Their book, Scoring the Rorschach: Seven Validated Systems (2005), is a wonderful compendium of alternative scoring systems that can be used to answer specialized assessment questions. A case in point is the Rorschach Prognostic Rating Scale (RPRS; Handler & Clemence, 2005), a promising and validated system for predicting who will be successful in psychotherapy and who will not. Scoring the RPRS is complex and consists of assigning or subtracting points for various categories of clearly defined responses. For example, a positive score is given if a response depicts a human as dancing, running, talking, or pointing, whereas a score of zero is coded if humans are seen as sleeping, lying down, sitting, or balancing. The meaningful use of color in the response also contributes to a positive score, whereas using color to depict explosions or diseases results in points being subtracted. Several categories are scored, yielding a total score that ranges from −12 to +17. The following interpretations are then assigned to different ranges of the RPRS score:
17 to 13: The person is almost able to help himself. A very promising case that just needs a little help.
12 to 7: Not quite so capable as the previous case to work out his problems himself but with some help is likely to do pretty well.
6 to 2: Better than 50–50 chance; any treatment will be of some help.
1 to −2: 50–50 chance.
−3 to −6: A difficult case that may be helped somewhat but is generally a poor treatment prospect.
−7 to −12: A hopeless case. (Handler & Clemence, 2005, p. 54)
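The interpretive bands listed above amount to a simple lookup from a total score to a prognosis. The following sketch shows that lookup under stated assumptions: the band limits are taken from the list above, the labels are abbreviated paraphrases of the quoted interpretations, and the function and variable names are hypothetical. It does not attempt the category-by-category scoring that produces the total.

```python
# Illustrative sketch only: band limits follow the ranges quoted above;
# the labels are abbreviated and the names are hypothetical.
RPRS_BANDS = [
    (13, 17, "Very promising case that just needs a little help"),
    (7, 12, "Likely to do pretty well with some help"),
    (2, 6, "Better than 50-50 chance"),
    (-2, 1, "50-50 chance"),
    (-6, -3, "Difficult case, generally a poor treatment prospect"),
    (-12, -7, "A hopeless case"),
]


def interpret_rprs(total_score):
    """Map an integer RPRS total (-12 to +17) to its interpretive band."""
    for low, high, label in RPRS_BANDS:
        if low <= total_score <= high:
            return label
    raise ValueError("RPRS totals range from -12 to +17")
```

For instance, interpret_rprs(9) falls in the 12 to 7 band, whereas a total outside the published range raises an error rather than returning a guess.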
• Informed fakers who listened to a detailed audiotape about paranoid schizophrenia
• Normal controls who took the test under standard instructions
The uninformed fakers, informed fakers, and normal controls were students who had passed an MMPI screening and were judged reasonably normal during interview. Each protocol was rated by six to nine judges, all fellows of the Society for Personality Assessment. The judges were told to provide a psychiatric diagnosis as well as other information not reported here. The judges were not informed as to the purpose of the study but were told to assess whether any profiles appeared to be malingered.
The informed fakers must have done an excellent job, for they were more likely to be diagnosed psychotic than the real patients themselves (72 percent versus 48 percent, respectively). The uninformed fakers were fairly convincing, too, with a 46 percent rate of diagnosed psychosis. The normal controls were diagnosed as psychotic 24 percent of the time. Granted that the diagnostic challenge in this study was immense, it is still disturbing to find that the expert judges rated 24 percent of the normal protocols as psychotic, while correctly identifying psychosis in only 48 percent of the actual psychotic protocols. A more recent study by Netter and Viglione (1994) also concluded that the Rorschach was susceptible to the faking of psychosis.
In general, critics portray the test as possessing low reliability and a general lack of predictive validity (Carlson, Kula, & St. Laurent, 1997; Wood, Nezworski, & Stejskal, 1996; Lilienfeld, Wood, & Garb, 2000). In their meta-analytic review, Garb, Florio, and Grove (1998) concluded that the Rorschach explained a dismal 8 to 13 percent of the variance in client characteristics, as compared to the MMPI, which explained 23 to 30 percent of the variance.
Supporters of the test cite improvements in scoring offered by the R-PAS approach and are more optimistic in their outlook (Meyer & Eblin, 2012). A recent study by McGrath, Pogge, Stokes, and others (2005) found that the Rorschach could be scored with respectable reliability, even in the less controlled conditions typical of real-world testing. This was an important finding because virtually all prior
The reliability of the TDI is reasonably good, with split-half correlations around .80 and interrater reliability coefficients of .90 and higher. Validity has been supported from a number of directions, such as huge improvements in scores when patients with schizophrenia are tested before and after comprehensive interventions including drug therapies (Holzman et al., 2005). Mastering the TDI scoring criteria is far easier than learning the Comprehensive System. Insofar as the TDI provides valuable information about the extent of thought disorder, one of the foremost reasons that practitioners use the Rorschach, we can expect to see increased reliance on this approach to test scoring.
Space does not permit us to summarize validated scoring systems. These scales are derived largely from psychoanalytic theory and include an index of object relations, a measure of oral dependency, barrier and penetration indices based on body image, a measure of primary process thinking, and a scale that assesses primitive psychological defenses (Bornstein & Masling, 2005).
Comment on the Rorschach
The Rorschach has provoked more controversy in the field of assessment than any other personality test or instrument. Opinions tend to be polarized, and both proponents and detractors cite studies and analyses to support their case. For example, critics of the test refer to a fascinating study by Albert, Fox, and Kahn (1980) on the susceptibility of the Rorschach to faking. We remind the reader that literally thousands of Rorschach research studies have been published. In fact, a search of PsycINFO using the key title word Rorschach yielded 5,324 articles dating back to 1925 (the test was published in 1921). The majority of these studies are positive in tone. But the skeptical results reported by Albert, Fox, and Kahn (1980) are not isolated. They submitted the Rorschach protocols of 24 persons to a panel of experts, asking for psychiatric diagnoses of each examinee. The 24 Rorschach protocols consisted of results from four groups of six persons each:
• Mental hospital patients with a diagnosis of paranoid schizophrenia
• Uninformed fakers given instructions to fake the responses of a paranoid schizophrenic
Completion Techniques
Sentence Completion Tests
In a sentence completion test, the respondent is presented with a series of stems consisting of the first few words of a sentence, and the task is to provide an ending. As with any projective technique, the examiner assumes that the completed sentences reflect the underlying motivations, attitudes, conflicts, and fears of the respondent. Usually, sentence completion tests can be interpreted in two different ways: subjective-intuitive analysis of the underlying motivations projected in the subject’s responses, or objective analysis by means of scores assigned to each completed sentence.
An example of a sentence completion test is shown in Figure 8.2. This test is quite similar to existing instruments in that the stems are very short and restricted to a small number of basic themes. The reader will notice that three topics recur in this short test (the respondent’s self-concept, mother, and father). In this manner the examinee has multiple opportunities to reveal underlying motivations about each topic. Of course, most sentence completion tests are much longer (anywhere from 40 to 100 stems) and contain more themes (anywhere from 4 to 15 topics).
Dozens of sentence completion tests have been developed; most are unpublished and unstandardized instruments produced to meet a specific clinical need. Some representative sentence completion tests in current use are outlined in Table 8.3. Of these
studies of reliability have been conducted in research settings. In response to the ongoing controversy, the prestigious Society for Personality Assessment requisitioned external reviews by an independent panel of “blue ribbon” experts, who concluded that the Rorschach possesses reliability and validity similar to other accepted tests like the MMPI-2. The trustees of the society assert that the continued use of the Rorschach, therefore, is appropriate and justified (Board of Trustees for the Society for Personality Assessment, 2005).
The controversy over the Rorschach probably will subside for a while, but it is not likely to disappear entirely. Even if the test continues to prevail because of studies supporting the reliability of scoring and the validity of inferences, there are other concerns seldom mentioned by skeptics. One liability is that learning the scoring system is an arduous and time-consuming task that requires dozens of hours of practice and years of supervised experience. Some doctoral programs offer an entire course (or two) on the Rorschach, and this is just the beginning of the training needed. A second problem is that administering and scoring the Rorschach requires a few hours of professional time from a licensed psychologist. This time is a precious and expensive commodity. Someone has to pay for it. These practical issues are daunting. In regard to learning the test in the first place, and devoting the time to administer and score it in the second place, many clinical training directors and practitioners (and not a few insurance companies) are asking “Is it worth it?”
Directions: Finish these sentences to indicate how you feel.
1. My best characteristic is _____________________________________________________
2. My mother _______________________________________________________________
3. My father ________________________________________________________________
4. My greatest fear is _________________________________________________________
5. The best thing about my mother was ___________________________________________
6. The best thing about my father was ____________________________________________
7. I am proudest about ________________________________________________________
8. I only wish my mother had ___________________________________________________
9. I only wish my father had ____________________________________________________
Figure 8.2 Example of a Short Sentence Completion Test
adult—each containing 40 sentence stems written mostly in the first person (Rotter & Rafferty, 1950; Rotter, Lah, & Rafferty, 1992). Although the test can be subjectively interpreted in the usual manner through qualitative analysis of needs projected in the subject’s responses, it is the objective and quantitative scoring of the RISB that has drawn the most attention.
In the objective scoring system each completed sentence receives an adjustment score from 0 (good adjustment) to 6 (very poor adjustment). These scores are based initially on the categorizing of each response as follows:
• Omission—no response or response too short to be meaningful
• Conflict response—indicative of hostility or unhappiness
• Positive response—indicative of positive or hopeful attitude
• Neutral response—declarative statement with neither positive nor negative affect
Examples of the last three categories include:
I hate . . . the entire world. (conflict response)
The best . . . is yet to come. (positive response)
Most girls . . . are women. (neutral response)
Conflict responses are scored 4, 5, or 6, from lowest to highest degree of the conflict expressed. Positive responses are scored 2, 1, or 0, from least to most positive response. Neutral responses and omissions receive no score. The manual gives examples of each scoring category. The overall adjustment score is obtained by adding the weighted ratings in the conflict and positive categories. The adjustment score can vary from 0 to 240, with higher scores indicating greater maladjustment.
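The summation just described can be sketched in a few lines. This is an illustration of the arithmetic only, following the description above: conflict responses carry ratings of 4 to 6, positive responses carry ratings of 0 to 2, and neutral responses and omissions are left out of the sum. Representing an unscored stem as None, and the function name itself, are assumptions made for the example; the RISB manual, not this sketch, governs how individual sentences are rated.

```python
# Illustrative sketch only: the input format and names are hypothetical.
SCORABLE_RATINGS = (0, 1, 2, 4, 5, 6)  # positive: 0-2, conflict: 4-6


def risb_adjustment_score(ratings):
    """Sum the ratings for one protocol; None marks a neutral response or omission."""
    total = 0
    for rating in ratings:
        if rating is None:
            continue  # per the description above, these receive no score
        if rating not in SCORABLE_RATINGS:
            raise ValueError(f"invalid rating: {rating}")
        total += rating
    return total


# Forty stems, each rated 6, give the maximum score cited above.
maximum_score = risb_adjustment_score([6] * 40)
```

With 40 stems and a top rating of 6 per stem, the ceiling of 240 follows directly (40 × 6).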
The reliability of the adjustment score is exceptionally good, even when derived by assistants with minimal psychological expertise. Typically, interscorer reliabilities are in the .90s and split-half coefficients are in the .80s (Rotter et al., 1992; Rotter, Rafferty, & Schachtitz, 1965). The validity of this index has been investigated in numerous studies using the RISB as a screening device with a “maladjustment” cutoff score. For example, a cutoff score of 135 has been found to correctly screen delinquent
instruments, Loevinger’s Washington University Sentence Completion Test is the most sophisticated and theory-bound (e.g., Weiss, Zilberg, & Genevro, 1989). However, the Rotter Incomplete Sentences Blank has the strongest empirical underpinnings and is the most widely used in clinical settings. We examine this instrument in more detail.
Rotter Incomplete Sentences Blank
The Rotter Incomplete Sentences Blank (RISB) consists of three similar forms—high school, college, and
Table 8.3 Brief Outline of Representative Sentence Completion Tests
Sentence Completion Series, Psychological Assessment Resources
The SCS consists of 50 sentence stems designed to aid the clinician in identifying underlying concerns and specific areas of client distress. A unique feature of this instrument is the publication of eight different forms, parallel in content, which allow for repeated testing.
Forer Structured Sentence Completion Test, Western Psychological Services
This instrument is available in separate forms for men, women, adolescent boys, and adolescent girls. Each form contains 100 sentence stems designed to cover attitude–value systems, evasiveness, and defense mechanisms.
Geriatric Sentence Completion Form, Psychological Assessment Resources
The GSCF is a 30-item form specifically developed for use with older adult clients. The GSCF elicits personal responses to four content domains: physical, psychological, social, and temporal orientation. The test manual includes a number of clinical case illustrations.
Washington University Sentence Completion Test, privately published by Loevinger
The WUSC uses separate forms for men, women, and younger male and female subjects. This test is highly theory-bound; responses are classified according to seven stages of ego development: presocial and symbiotic, impulsive, self-protective, conformist, conscientious, autonomous, integrated.
The TAT was developed by Henry Murray and his colleagues at the Harvard Psychological Clinic (Morgan & Murray, 1935; Murray, 1938). The test was originally designed to assess constructs such as needs and press, elements central to Murray’s personality theory. According to Murray, needs organize perception, thought, and action and energize behavior in the direction of their satisfaction. Examples of needs include the needs for achievement, affiliation, and dominance. In contrast, press refers to the power of environmental events to influence a person. Alpha press is objective or “real” external forces, whereas beta press concerns the subjective or perceived components of external forces. Murray (1938, 1943) developed an elaborate TAT scoring system for measuring 36 different needs and various aspects of press, as revealed by the examinee’s stories.
Almost as soon as Murray released the TAT, other clinicians began to develop alternative scoring systems (e.g., Dana, 1959; Tomkins, 1947). Literature on the administration, scoring, and interpretation of the TAT burgeoned extensively, as documented by reviews (Aiken, 1989, chap. 12; Groth-Marnat, 1997; Weiner & Kuehnle, 1998). By the 1950s, there was no single preferred mode of administration, no single preferred system of scoring, and no single preferred
youths 60 percent of the time while identifying nondelinquent youths correctly 73 percent of the time (Fuller, Parmelee, & Carroll, 1982). The same cutoff identifies heavy drug users 80 to 100 percent of the time (Gardner, 1967). These and similar findings support the construct validity of the adjustment index but also indicate that classification rates are much lower than needed for individual decision making or effective screening. It also appears that the norms for the adjustment index are outdated. Lah and Rotter (1981) found that student scores differ significantly from those obtained in the original study by Rotter and Rafferty (1950). Lah (1989) and Rotter et al. (1992) provide new normative, scoring, and validity data for the RISB.
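To see why such classification rates fall short for individual decision making, it can help to ask how often a youth flagged by the cutoff would actually be delinquent. The sketch below applies Bayes' rule to the two rates reported by Fuller et al. (60 percent of delinquent youths and 73 percent of nondelinquent youths correctly classified). The base rate of 10 percent is purely an assumption chosen for illustration; it does not come from any study cited here, and the function name is likewise hypothetical.

```python
# Illustrative sketch only: the 10 percent base rate is an assumed value.
def positive_predictive_value(sensitivity, specificity, base_rate):
    """Proportion of flagged cases that are true cases, by Bayes' rule."""
    for value in (sensitivity, specificity, base_rate):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be proportions between 0 and 1")
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# 60 percent of delinquent and 73 percent of nondelinquent youths classified correctly.
ppv = positive_predictive_value(sensitivity=0.60, specificity=0.73, base_rate=0.10)
```

Under this assumed base rate, only about one in five flagged youths would be correctly identified, and the proportion shifts substantially as the base rate changes.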
As discussed by P. Goldberg (1965), the simplicity of the single adjustment score is both the test’s strength and weakness. True, the test provides a quick and efficient method for obtaining an overall index of how respondents are functioning on a day-to-day basis. However, a single score cannot possibly capture any nuances of personality functioning. In addition, the RISB is subject to the same types of bias as other self-report measures, namely, the information will reflect mainly what the respondent wants the examiner to know.
Construction Techniques
The Thematic Apperception Test (TAT)
The TAT consists of 30 pictures that portray a variety of subject matters and themes in black-and-white drawings and photographs; one card is blank. Most of the cards depict one or more persons engaged in ambiguous activities. Some cards are used for adult males (M), adult females (F), boys (B), or girls (G), or some combination (e.g., BM). As a consequence, exactly 20 cards are appropriate for every examinee.
A picture similar to those on the TAT is shown in Figure 8.3. In administering the TAT, the examiner requests the examinee to make up a dramatic story for each picture, telling what led up to the current scene, what is happening at the moment, how the characters are thinking and feeling, and what the outcome will be. The examiner writes down the story verbatim for later scoring and analysis.
Figure 8.3 A Picture Similar to Those on the Thematic Apperception Test
This person just had a hard physical workout. I guess it’s a her. She’s just tired. No trauma happened or anything. She was sitting around a table with friends and she got real tired. She’s not in a health danger or anything. These are her keys. Her friends drag her back to her room and put her to bed. She’s O.K. the next day. No trauma. She’s tired physically, not mentally. (Ryan, 1987)
What stands out in this response is the repetitive denial of danger or trauma. But later in the testing, the denial of trauma is no longer maintained. Read how the examinee responded to the blank card, relating a story of a young man, traumatized at school, who takes his car down to the river:
He sees the bridge, he’s really down. He remembers that he’s heard stories about people jumping off and killing themselves. He could never understand why they did that. Now he understands, he jumps and dies . . . he should have waited ’cause things always get better sometime. But he didn’t wait, he died. (Ryan, 1987)
Most clinicians would conclude that the examinee who produced these stories had been traumatized and was defending against self-destructive impulses. Correspondingly, the clinician would be well advised to explore these issues in psychotherapy.
The psychometric adequacy of the TAT is difficult to evaluate because of the abundance of scoring and interpretation methods. Clinicians defend the test on an anecdotal basis, pointing out remarkable and confirmatory findings such as illustrated here. However, data-minded researchers are more cautious. One problem is that formally scored TAT protocols possess very low test–retest reliability, with a reported median value of r = .28 (Winter & Stewart, 1977). Furthermore, an astonishing 97 percent of test users employ subjective and “personalized” procedures for interpreting the TAT; that is, only a tiny fraction of clinical practitioners rely on
method of interpretation, a predicament that still endures today. Clinicians even vary the wording of the instructions and commonly select an individualized subset of TAT cards for each client. Indeed, the absence of standardized procedures is such that we should rightly regard the TAT as a method, not a test.
It is worth mentioning that Murray’s instructions included a statement that the TAT was “a test of imagination, one form of intelligence” and further stipulated:
I am going to show you some pictures, one at a time; and your task will be to make up as dramatic a story as you can for each. Tell what has led up to the event shown in the picture, describe what is happening at the moment, what the characters are feeling and thinking; and then give the outcome. Speak your thoughts as they come to your mind. Do you understand? Since you have fifty minutes for ten pictures, you can devote about five minutes to each story. Here is the first picture. (Murray, 1943)
Currently, clinicians downplay the emphasis on imagination and intelligence when giving instructions. Surely, this omission must influence the quality of the stories produced.
Even though more than a dozen scoring systems have been proposed, interpretation of the TAT is usually based on a clinical-qualitative analysis of the story productions. A central consideration harks back to Murray’s “hero” assumption. According to this viewpoint, the hero is the protagonist of the examinee’s story. It is assumed that the examinee clearly identifies with this character and projects his or her own needs, strivings, and feelings onto the hero. Conversely, thoughts, feelings, or actions avoided by the hero may represent areas of conflict for the examinee. A specific example will help clarify these points. Consider the response to Card 3BM given by a depressed examinee²:
Looks like . . . I can’t tell if it’s a girl or boy. Could be either. I guess it doesn’t matter.
² Card 3BM depicts one person—arguably male or female—kneeling or slumped over on a couch with head bowed on one arm. In the corner is a vaguely drawn object interpreted by some examinees to be a handgun or other weapon.
• About half of the pictures had to depict humans showing positive affective expression (e.g., smiling, embracing, dancing).
• About half of the pictures had to depict humans in active poses, not simply standing, sitting, or lying down.
In an initial pilot study, the authors compared TAT and PPT story productions of eight undergraduates on several variables such as length of stories, emotional tone, and activity level (Ritzler, Sharkey, & Chudy, 1980). Compared to the TAT productions, the PPT stories were of comparable length but were much more positive in thematic content and emotional tone. The PPT stories were also much more active, meaning that the central character had an active, self-determined effect on the situation in the story. Furthermore, the PPT stories placed greater emphasis on interpersonal rather than intrapersonal themes. In other words, the PPT stories placed more emphasis on “healthy,” adaptive aspects of personality adjustment than did the TAT productions.
The PPT developers also compared their instrument against the TAT in a diagnostic validity study (Sharkey & Ritzler, 1985). PPT and TAT story productions of 50 subjects were compared: normals, nonhospitalized depressives, hospitalized depressives, hospitalized psychotics with good premorbid histories, and hospitalized psychotics with poor premorbid histories (10 subjects in each group). Although the TAT and PPT were essentially equal in their capacity to discriminate normal from depressed subjects, the PPT was superior in differentiating psychotics from normals and depressives. On the PPT, depressives told stories with gloomier emotional tone and psychotics made more perceptual distortions and thematic/interpretive deviations. The PPT appears to be a very promising instrument, although it is obvious that further research is needed on its psychometric qualities. One noteworthy feature is that anyone can purchase the PPT stimuli at their local bookstore. The requisite materials are found in the Family of Man photo collection (Museum of Modern Art, 1955).
Children’s Apperception Test
Designed as a direct extension of the TAT, the Children’s Apperception Test (CAT) consists of 10
a standardized scoring system (Lilienfeld, Wood, & Garb, 2001). This is troubling because a consistent theme in research on projective testing is that intuitive interpretations are likely to overdiagnose psychological disturbance.
In addition to clinical applications, the TAT has received considerable use for research purposes. For example, Turk, Brown, Symington, and Paul (2010) examined the content of TAT stories from 22 persons with agenesis of the corpus callosum (ACC), a congenital brain disorder in which the pathways connecting the two cerebral hemispheres are partially or completely absent. They used the linguistic inquiry software of James Pennebaker (Tausczik & Pennebaker, 2010) to count words in psychologically meaningful categories. Compared to age- and IQ-matched controls, the ACC individuals used fewer words pertaining to emotionality, cognitive processes, and social processes, indicating that they experienced greater difficulty imagining and inferring the mental and emotional states of others. In this research application, the TAT proved helpful for enhancing our understanding of the unique qualities of persons with ACC.
The Picture Projective Test
The Picture Projective Test (PPT) is an attempt to construct a general-purpose instrument with improved psychometric qualities (Ritzler, Sharkey, & Chudy, 1980; Sharkey & Ritzler, 1985). The developers of the PPT note that the majority of the TAT pictures exert a strong negative stimulus “pull” on storytelling. The TAT cards are cast in dark, shaded tones and most scenes portray persons in low-key or gloomy situations. It is not surprising, then, that projective responses to the TAT are strongly channeled toward negative, melancholic stories (Goldfried & Zax, 1965).
In contrast, the PPT uses a set of pictures taken from the Family of Man photo essay published by the Museum of Modern Art (1955). The following criteria were used in selecting 30 pictures:
• The pictures had to show promise of eliciting meaningful projective material.
• Most but not all of the pictures had to include more than one human character.
pictures and is suitable for children 3 to 10 years of age. The preferred version for younger children (CAT-A) depicts animals in unmistakably human social settings (Bellak & Bellak, 1991). The test developers used animal drawings on the assumption that young children would identify better with animals than humans. A human figure version (CAT-H) is available for older children (Bellak & Bellak, 1994). No formal scoring system exists for the CAT and no statistical information is provided on reliability or validity. Instead, the examiner prepares a diagnosis or personality description based on a synthesis of 10 variables recorded for each story: (1) main theme; (2) main hero; (3) main needs and drives of hero; (4) conception of
Table 8.4 Thematic Apperception Tests for Specific Populations
Family Apperception Test
For children ages 6 and older, the Family Apperception Test consists of 21 cards depicting a family in various situations. For example, one card shows a family sitting around a table with parents talking while the children eat. As with the TAT, the examinee is asked to describe what led up to the scene, what is happening now, what will happen next, and what the main characters are feeling. The test is based on family systems theory. The manual provides a scoring guide for categories such as limit-setting, conflict resolution, boundaries, quality of relationships, and emotional tone (Sotile, Julian, Henry, & Sotile, 1988).
Blacky Pictures
For children ages 5 and older, the Blacky Pictures test was also based on the premise that children identify more readily with animals than humans. The 11 cartoon stimuli depict the adventures of the dog Blacky and his family (Mama, Papa, and sibling Tippy). In addition to requesting a story for each card, the examiner also presents multiple-choice questions based on stages of psychosexual development derived from psychoanalytic theory (Blum, 1950). Although the test was originally developed with adults, children enjoy taking the Blacky and are quite responsive to the pictures. Problems with this test include the absence of norms, especially for children, and poor stability of scores (LaVoie, 1987).
Michigan Picture Test-Revised
For older children ages 8 to 14 years, the MPT-R consists of 15 pictures and a blank card. Responses are scored for Tension Index (e.g., portrayal of personal adequacy), Direction of Force (whether the central figure acts or is acted upon), and Verb Tense (e.g., past, present, future). These three scores can be combined to yield a Maladjustment Index. Reliability and norms are adequate, although evidence of validity is unsatisfactory. A major problem with this test is that the cards portray interpersonal relationships so vividly that little is left to the child’s imagination (Aiken, 1989).
Senior Apperception Test (SAT)
Although the 16 situations depicted on the SAT cards include some positive circumstances, the majority of pictures were designed to reflect themes of helplessness, abandonment, disability, family problems, loneliness, dependence, and low self-esteem (Bellak, 1992). Critics complain that the SAT stereotypes the elderly and therefore discourages active responding (Schaie, 1978).
environment (or world); (5) perception of parental, contemporary, and junior figures; (6) conflicts; (7) anxieties; (8) defenses; (9) adequacy of superego; (10) integration of ego (including originality of story and nature of outcome) (Bellak, 1992). The lack of attention to psychometric issues of scoring, reliability, and validity of the CAT is troublesome to most testing specialists.
Other Variations on the TAT
The TAT has inspired a number of similar tests designed for children and older adults (Table 8.4). In addition, modifications and variations of the TAT have been developed for ethnic, racial,
Expression Techniques
The Draw-A-Person Test
As the reader will recall from an earlier chapter, Goodenough (1926) used the Draw-A-Man task as a basis for estimating intelligence. Subsequently, psychodynamically minded psychologists adapted the procedure to the projective assessment of personality. Karen Machover (1949, 1951) was the pioneer in this new field. Her procedure became known as the Draw-A-Person Test (DAP). Her test enjoyed early popularity and is still widely used as a clinical assessment tool. Watkins, Campbell, Nieberding, and Hallmark (1995) report that projective drawings such as the DAP rank eighth in popularity among clinicians in the United States.
The DAP is administered by presenting the examinee with a blank sheet of paper and a pencil with eraser, then asking the examinee to “draw a person.” When the drawing is completed, the examinee usually is directed to draw another person of the sex opposite that of the first figure. Finally, the examinee is asked to “make up a story about this person as if he [or she] were a character in a novel or a play” (Machover, 1949).
Interpretation of the DAP proceeds in an entirely clinical-intuitive manner, guided by a number of tentative psychodynamically based hypotheses (Machover, 1949, 1951). For example, Machover maintained that examinees were likely to project acceptable impulses onto the same-sex figure and unacceptable impulses onto the opposite-sex figure. She also believed that the relative sizes of the male and female figures revealed clues about the sexual identification of the examinee. For example, drawing a man with large eyes and lashes was thought to indicate a homosexually inclined male.
These interpretive premises are colorful, interesting, and plausible. However, they are based entirely on psychodynamic theory and anecdotal observations. Machover made little effort to validate the interpretations. The empirical support for her hypotheses is somewhere between meager and nonexistent (Swensen, 1968). In favor of the DAP, the overall quality of drawings does weakly predict psychological adjustment (Lewinsohn, 1965; Yama,
and linguistic minorities. One of the first was the Thompson TAT (T-TAT) in which 21 of the original TAT pictures were redrawn with African American figures (Thompson, 1949). This TAT modification incorporated certain unintended changes, for example, in facial expressions and the situations portrayed. As a result, the T-TAT should be considered a new test and not just a TAT translation suited to African American individuals (Aiken, 1989).
Another specialized TAT-like test is the TEMAS, which consists of 23 colorful drawings that depict Hispanic persons interacting in contemporary, inner-city settings (Aiken, 1989; Constantino, Malgady, & Rogler, 1988). TEMAS is Spanish for themes and an acronym for “tell me a story.” The thematic content of TEMAS stories is scored for 18 cognitive functions, 9 personality (ego) functions, and 7 affective functions. The test can also be scored for various objective indices such as reaction time, fluency, unanswered inquiries, and stimulus transformations (e.g., a letter is transformed into a bomb). Hispanic children respond well to the TEMAS, even though they may be inarticulate in response to traditional projective tests.
The inconsistent reliability of the TEMAS is a source of concern, because reliability constrains validity. The manual reports that Cronbach’s alpha for the 34 scoring functions ranged from .31 to .98 with half below .70. Test–retest reliabilities were even lower; the highest correlation was r = .53 and for 26 of the 34 functions the correlations were near zero! In spite of the questionable reliability of the instrument, several studies provide support for its concurrent and predictive validity. For example, in a clinical sample of 210 Puerto Rican children, TEMAS scale scores predicted independent criteria of ego development, trait anxiety, and adaptive behavior reasonably well, with correlations ranging from .27 to .51 (Malgady, Constantino, & Rogler, 1984). A steady stream of research has continued to bolster the utility of this instrument, as surveyed by Constantino and Malgady (1996). Flanagan and di Guiseppe (1999) provide a critical review of the TEMAS; Constantino and Malgady (2000) describe recent developments with the test.
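The statement that reliability constrains validity can be made concrete. Under classical test theory, an observed validity coefficient cannot exceed the square root of the product of the test's reliability and the criterion's reliability. The short Python sketch below applies that ceiling to the lowest and highest alpha values quoted above, plus the .70 benchmark. The function name and the default assumption of a perfectly reliable criterion are illustrative choices, not material from the TEMAS manual.

```python
import math

def max_validity(test_reliability, criterion_reliability=1.0):
    """Ceiling on an observed validity coefficient under classical test theory."""
    return math.sqrt(test_reliability * criterion_reliability)

# Alpha values taken from the range reported for the TEMAS scoring functions.
for alpha in (0.31, 0.70, 0.98):
    print(f"alpha = {alpha:.2f}  validity ceiling = {max_validity(alpha):.2f}")
```

A scoring function with very low reliability therefore has a sharply limited chance of correlating with any outside criterion, however well conceived the function may be.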
clinicians soon abandoned the use of the H-T-P as a measure of intelligence, and it is now used almost exclusively as a projective measure of personality.
Although we will not delve into any details here, the interpretation of the H-T-P rests on three general assumptions: the House drawing mirrors the examinee’s home life and intrafamilial relationships; the Tree drawing reflects the manner in which the examinee experiences the environment; and the Person drawing echoes the examinee’s interpersonal relationships. Buck (1981) provides numerous interpretive hypotheses for both quantitative and qualitative aspects of the three drawings.
The H-T-P is an alluring test that has fascinated clinicians for more than 40 years. Unfortunately, Buck (1948, 1981) has never provided any evidence to support the reliability or validity of this instrument. Indeed, he is perhaps his own worst critic. At one point in his test manual, he even asserts that validational research is not possible with the H-T-P (Buck, 1981, p. 164).
In general, attempts to validate the H-T-P as a personality measure have failed miserably (for reviews see Krugman, 1970; Killian, 1987). Thoughtful reviewers have repeatedly recommended the abandonment of the H-T-P and similar figure-drawing approaches to personality assessment. The popularity of the H-T-P has dropped off in recent years. A search of PsycINFO revealed only nine articles on the test since 2000, including four dissertations.
Many clinicians do not use projective methods as tests at all but as auxiliary approaches to the clinical interview. These practitioners use projective techniques as clinical tools to derive tentative hypotheses about the examinee. Most of these hypotheses will turn out to be false when examined more closely. However, the few that are confirmed may have important implications for the clinical management of the examinee. Furthermore, we suspect that these fruitful hypotheses might not emerge (or might emerge more slowly) if the practitioner relied entirely on the interview or used only formal tests with established reliability and validity (Case Exhibit 8.1). However, this assertion is difficult to test empirically.
1990). However, judged by contemporary standards of evidence, the sweeping and cavalier assessments of personality so often derived from the DAP are embarrassing. Some reviewers have concluded that the DAP is an unworthy test that should no longer be used (Gresham, 1993; Motta, Little, & Tobin, 1993).
Rather than using the DAP to infer nuances of personality, a more appropriate application of this test is in the screening of children suspected of behavior disorder and emotional disturbance. For this purpose, Naglieri, McNeish, and Bardos (1991) developed the Draw A Person: Screening Procedure for Emotional Disturbance (DAP:SPED). In one study, diagnostic accuracy of problem children was significantly improved by application of the DAP:SPED scoring approach (Naglieri & Pfeiffer, 1992).
The House-Tree-Person Test (H-T-P)
The H-T-P is a projective test that uses freehand drawings of a house, tree, and person (Buck, 1948, 1981). The examinee is given almost complete freedom in sketching the three objects; separate pencil and crayon drawings are requested. Although the examiner can improvise an H-T-P Test with mere blank pieces of paper, Buck (1981) recommends the use of a four-page drawing form with identification information on the first page. Pages two, three, and four are titled House, Tree, and Person. Two drawing forms are needed for each examinee, one for pencil drawings and the other for crayon drawings. Buck (1981) also provides a separate four-page form for a postdrawing interrogation phase, which consists of 60 questions designed to elicit the examinee’s opinions about elements of the drawings. Many practitioners feel the postdrawing interrogation phase is not worth the extended effort. Also, the value of separate crayon drawings is questioned (Killian, 1987).
The House-Tree-Person Test has much the same familial lineage as the Draw-A-Person Test. Like the DAP Test, the H-T-P Test was originally conceived as a measure of intelligence, complete with a quantitative scoring system to appraise an approximate level of ability (Buck, 1948). However,
drawings. In one drawing he depicted himself as a helicopter gunner, spraying bullets indiscriminately into the jungle below. When questioned about this drawing, he became quite animated and confessed that he relished combat. Guided by the possible implications of the morbid drawing, the psychologist sought to learn more about the veteran’s attitudes toward combat. In the course of several interviews, the veteran revealed that he particularly enjoyed firing on moving objects (animals, soldiers, civilians); it made no difference to him. Gradually, it became clear that the young veteran was an incipient war criminal who was depressed because his injury would prevent him from returning to the front lines. Needless to say, this information had quite an impact on the tenor of the psychological report.
CASE EXHIBIT 8.1
Projective Tests as Ancillary to the Interview
A specific example may help to clarify the role of projective techniques as ancillary to the clinical interview. During the Vietnam War, a Veterans Administration psychologist tested a young soldier who had accidentally shot himself in the leg with a .45-caliber pistol while practicing quick draw in the jungle. Surgeons found it necessary to amputate the soldier’s leg from the knee down. He was quite depressed, and everyone assumed that he suffered from grief and guilt over his great personal tragedy. He was virtually mute and nearly untestable. However, he was persuaded to complete a series of figure
Topic 8B • Self-Report and Behavioral Assessment of Psychopathology 333
This review takes in a variety of personality tests, including the Minnesota Multiphasic Personality Inventory-2, arguably the most famous personality test ever published. We also examine contemporary approaches that rely upon structured interview, behavioral observation, and ratings.
The self-report approaches to testing discussed in the following sections are steeped in the details of psychometric methodology. These tests feature prominent references to reliability indices, criterion keying, factor analysis, construct validation, and other forms of technical craftsmanship. For this reason, the approaches discussed here often are considered objective, as contrasted with projective. However, whether they are objective in any meaningful sense is really an empirical question that must be answered on the basis of research. Perhaps it is more accurate to call these methods structured. They are structured in the sense that highly specific rules are followed in the administration, scoring, interpretation, and narrative reporting of results. In fact, some of the approaches are so completely structured that an examinee can answer questions presented on a computer screen and observe a computer-generated narrative report spewed forth from the printer, literally seconds later.³
We begin our discussion of structured assessment by reviewing several prominent personality tests. Contemporary psychometricians have relied mainly
Although there are many methods for the assessment of personality and related qualities, broadly speaking two approaches have dominated the field: unstructured and structured. Unstructured methods such as the Rorschach, TAT, and sentence completion blanks permit broad latitude in the responses of the examinee. These approaches dominated personality testing in the early twentieth century but then slowly faded in standing. In contrast, structured approaches such as self-report inventories and behavior rating scales gained prominence in the mid-twentieth century and have continued to expand in popularity to the present time. Whereas only a handful of unstructured techniques has ever risen to distinction, the number of structured instruments for assessment has grown almost exponentially.
In the previous topic we introduced the reader to the many varieties of unstructured tests such as inkblots, stimulus cards, and sentence completion blanks. These methods are resplendent in the richness of the hypotheses they yield; however, projective techniques largely lack the approval of psychometrically oriented clinicians. In this topic, we focus on the more structured, objective methods for personality assessment favored by measurement-minded psychologists. We review a wide variety of true–false, rating scale, and forced-choice instruments for assessing personality and other qualities.
Topic 8B Self-Report and Behavioral Assessment of Psychopathology
Theory-Guided Inventories
Factor-Analytically Derived Inventories
Criterion-Keyed Inventories
Behavioral Assessment
Behavior Therapy and Behavioral Assessment
Structured Interview Schedules
Assessment by Systematic Direct Observation
Analogue Behavioral Assessment
Ecological Momentary Assessment
³ Computerized narrative reports may not be altogether a positive development. We discuss the benefits and pitfalls of computer-generated reports in the next chapter.
form E, which consists of all 22 scales in a modified 352-item test.
In constructing the PRF form E, Jackson first formulated rigorous and theoretically based definitions of the traits to be measured, following Murray’s (1938) system for personality description. Next, for each scale over 100 items were written to tap the traits underlying the hypothesized needs. After editorial review, these items were administered to large samples of college students. Item selection was based on simplicity of wording, high biserial correlations with total scale scores, low correlations with other scales (maximizing scale independence), and low correlations with the Desirability scale (minimizing social desirability bias). Convergent and discriminant validity was considered throughout. For the original long forms AA and BB, 20 items were selected for each scale, resulting in 20 × 22 or 440 items. For the PRF form E, about four items were dropped from each scale, yielding a 352-item test.
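The statistical part of this item-selection procedure can be illustrated with a small sketch. The Python code below is an illustration only, not Jackson's actual procedure: the function names, the cutoffs of .30, and the four-person toy data are all assumptions made for the example. An ordinary Pearson correlation with the sum of the remaining items stands in for the biserial correlation named in the text, and the criterion of low correlations with other scales is omitted for brevity.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def select_items(responses, desirability, min_item_total=0.30, max_desirability=0.30):
    """Keep items that track their own scale but not the Desirability scale.

    responses: dict mapping item name -> list of 0/1 answers (one per person)
    desirability: list of Desirability scale scores (one per person)
    Both cutoffs are hypothetical values chosen for illustration.
    """
    n_people = len(desirability)
    kept = []
    for name, answers in responses.items():
        # Total of the other items, so the item is not correlated with itself.
        rest = [sum(responses[other][p] for other in responses if other != name)
                for p in range(n_people)]
        if (pearson(answers, rest) >= min_item_total
                and abs(pearson(answers, desirability)) <= max_desirability):
            kept.append(name)
    return kept

# Toy data: four respondents, three candidate items for one scale.
toy_responses = {
    "item_a": [1, 1, 0, 0],
    "item_b": [1, 1, 0, 0],
    "item_c": [1, 0, 1, 0],
}
toy_desirability = [1, 0, 1, 0]
print(select_items(toy_responses, toy_desirability))
```

In this toy example the third item is unrelated to the other two and mirrors the Desirability scores exactly, so it fails both criteria. The item counts in the text follow the same logic at scale: 20 items on each of 22 scales gives 440 items, and 352 items across 22 scales works out to 16 items per scale on form E.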
Unlike many other personality inventories, the PRF scales have no item overlap. As a result, the scales are unusually independent, with most intercorrelation coefficients in the vicinity of ±.30 (Gynther & Gynther, 1976). Furthermore, the rigorous scale construction procedures employed by Jackson (1970) yielded scales with good internal consistency, with a median coefficient alpha of .70. Test–retest reliabilities are exceptionally strong, ranging from .80 to .96 for a two-week interval, with a median of .91 (Jackson, 1999). Norms are based on thousands of college students from North America, and also include subgroup norms for psychiatric inpatients and criminal offenders. A desirable feature of the PRF is its readability: The test requires only a fifth- or sixth-grade reading level (Reddon & Jackson, 1989).
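For readers who wish to see how an internal consistency figure such as coefficient alpha is obtained, the following Python sketch computes it from item-level data using the standard formula: alpha = [k / (k − 1)] × [1 − (sum of item variances) / (variance of total scores)], where k is the number of items. The function name and the five-respondent data set are hypothetical and serve only to illustrate the arithmetic; they are not PRF data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha from a list of items, each a list of respondent scores."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score for each respondent across all items.
    totals = [sum(person) for person in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical true-false answers (1 = keyed response) from five respondents.
toy_items = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 0],
]
print(round(cronbach_alpha(toy_items), 2))
```

Alpha rises as items covary more strongly with one another, which is why scales built from items with high item-total correlations tend to show good internal consistency.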
The validity of the PRF rests upon a substantial body of research over many decades. A lengthy bibliography citing more than 300 articles about the test can be found at www.sigmaassessmentsystems.com. For example, correlations between self and roommate ratings on the PRF constructs are reported to range from .27 to .74, with a median of .53.
The construct validity of the PRF rests especially upon confirmatory factor analyses
upon three tactics for personality test development: theory-bounded approaches, factor-analytic approaches, and criterion-key methods. We will organize the discussion of personality inventories around these three categories. Of course, the boundaries are somewhat artificial and many test developers use a combination of methods.
Theory-Guided Inventories
The construction of several self-report inventories was guided closely by formal or informal theories of personality. In these cases, the test developer designed the instrument around a preexisting theory. Theory-guided inventories stand in contrast to factor-analytic approaches that often produce a retrospective theory based upon initial test findings. Theory-guided inventories also differ from the stark atheoretical empiricism found in criterion-key instruments such as the MMPI and MMPI-2. An example of a theory-guided inventory is the Personality Research Form (PRF), based on Murray’s (1938) need-press theory of personality. Some theory-guided inventories such as the State-Trait Anxiety Inventory (STAI) attempt to measure very specific components of personality. We review these tests in more detail in the following.
Personality Research Form
The Personality Research Form (Jackson, 1999) is a true–false inventory based loosely on Murray’s (1938) theory of manifest needs. The reader will recall from an earlier discussion that Murray posited 15 needs and developed a projective test, the Thematic Apperception Test, to tap those needs. Based on factor-analytic approaches, Jackson expanded the number of needs and produced several forms for assessment. The forms differ in the number of scales and number of items per scale. In addition to parallel short tests (forms A and B), the Personality Research Form (PRF) also exists as parallel long forms (forms AA and BB). These forms, used primarily with college students, consist of 440 true–false items. The long forms yield 20 personality-scale scores and two validity scores, Infrequency and Desirability (Table 8.5). The most popular version of the PRF is
Some of the confirmatory correlations between PRF and EPI scales for 218 male and female college students are reported as follows:
Achievement (PRF) × Is a Hard Worker (EPI) .74
Change (PRF) × Likes a Set Routine (EPI) −.54
Nurturance (PRF) × Helps Others (EPI) .64
Succorance (PRF) × Dependent (EPI) .73
Because these instruments were developed independently according to different test construction
corroborating the grouping of the items into 20 scales (Jackson, 1970, 1984b). In addition, research indicates positive correlations with comparable scales on other inventories (Mungas, Trontel, & Weingardner, 1981). For example, Edwards and Abbott (1973) found exceptionally strong and confirmatory correlations between similar scales on the PRF and the Edwards Personality Inventory (EPI; Edwards, 1967). The EPI is a respected but little-used test consisting of 1,200(!) true–false questions.
Table 8.5 Personality Research Form Scales
Scale Interpretation of High Score
Abasement Self-effacing, humble, blame-accepting
Achievement Goal striving, competitive
Affiliation Friendly, accepting, sociable
Aggression Argues, combative, easily annoyed
Autonomy Independent, avoids restrictions
Change Avoids routine, seeks change
Cognitive Structure Prefers certainty, dislikes ambiguity
Defendence On guard, takes offense easily
Dominance Influential, enjoys leading
Endurance Persevering, hard-working
Exhibition Dramatic, enjoys attention
Harm Avoidance Avoids risk and excitement
Impulsivity Impulsive, speaks freely
Nurturance Caring, sympathetic, comforting
Order Organized, dislikes confusion
Play Playful, light-hearted, enjoys jokes
Sentience Notices, remembers sensations
Social Recognition Concern for reputation and approval
Succorance Insecure, seeks reassurance
Understanding Values logical thought
Desirability Validity Scale: favorable presentation
Infrequency Validity Scale: infrequent responses
Source: Based on Personality Research Form Scales and Descriptions from Jackson, D. N. (1989).
Personality research form manual (3rd ed.). Port Huron, MI: Sigma Assessment Systems, Inc.,
Research Psychologists Press division. (800) 265-1285.
military recruits). The STAI has received extensive service in research, and also is used in health-related clinical applications such as gauging anxiety in pregnant women (Gunning, Denison, Stockley, and others, 2010), monitoring improvement in psychotherapy patients (Vautier & Pohl, 2009), and detecting mental disorder in elderly patients (Kvaal, Ulstein, Nordhus, & Engedal, 2005).
State anxiety fluctuates in response to environmental circumstances and may change even from hour to hour. Therefore, we can expect that test–retest reliability will be lower for state anxiety than for trait anxiety. This is precisely what researchers find, with short-range reliability in the .40s and .50s for the A-State scale and in the high .80s for the A-Trait scale (Rule & Traver, 1983; Spielberger et al., 1970). Internal consistency of the scale is excellent, with Cronbach’s alpha of .86 for the total score in a sample of medical patients (Quek, Low, Razack, Loh, & Chua, 2004). Individual alpha values for A-State and A-Trait are robust as well, with results of .95 and .93, respectively, in a sample of 567 patients treated at an anxiety disorders clinic (Grös, Antony, Simms, & McCabe, 2007).
The validity of the STAI is well established from dozens of studies demonstrating content validity, convergent/discriminant validity, and construct validity (Spielberger, 1989). In a factor-analytic study of scores for 205 patients with panic disorder, Oei, Evans, and Crook (1990) found that a two-factor oblique solution was the best fit, accounting for 41 percent of the variance. Notably, 18 of the A-State items revealed salient loadings on factor 1 (state anxiety) and all 20 of the A-Trait items showed prominent loadings on factor 2 (trait anxiety). In sum, the STAI is a brief, reliable, and valid measure of state and trait anxiety. The measure is a mainstay for clinicians and researchers.
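The scoring rules for each STAI scale (20 items, responses coded 1 to 4, reversed scoring for positively stated items, totals from 20 to 80) can be sketched in a few lines of Python. The function name is an assumption, and the set of reverse-scored positions used in the example is purely hypothetical, because the actual scoring key is not reproduced in this chapter.

```python
def score_stai_scale(responses, reversed_items):
    """Sum 20 responses coded 1-4, flipping the positively stated items.

    responses: list of 20 integers from 1 to 4
    reversed_items: set of 0-based positions to reverse-score; the real
        scoring key is not given in the text, so callers must supply it.
    """
    if len(responses) != 20:
        raise ValueError("each STAI scale has 20 items")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("responses must be coded 1 to 4")
    # A reversed item maps 1->4, 2->3, 3->2, 4->1.
    return sum(5 - r if i in reversed_items else r
               for i, r in enumerate(responses))

# Hypothetical key: pretend the first ten items are positively stated.
hypothetical_key = set(range(10))
calm_respondent = [4] * 10 + [1] * 10   # endorses calm items, denies anxious ones
print(score_stai_scale(calm_respondent, hypothetical_key))
```

Reversing the positively stated items ensures that a high total always signifies greater anxiety, regardless of how individual statements are worded.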
Factor-Analytically Derived Inventories
Eysenck Personality Questionnaire
The Eysenck Personality Questionnaire (EPQ) was designed to measure the major dimensions of normal and abnormal personality (Eysenck & Eysenck, 1975). Based on a lifelong program of factor-analytic
philosophies, the findings bolster the validity of both tests. Several recent empirical comparisons also support the validity and utility of the PRF. For example, Goffin, Rothstein, and Johnston (2000) found that the PRF outperformed the more widely used Sixteen Personality Factor Questionnaire (16PF, discussed later in this section) in predicting the job performance of 487 candidates for managerial positions. Vernon (2000) also reports favorably on the validity of the PRF in his review of recent studies.
State-Trait Anxiety Inventory
The State-Trait Anxiety Inventory (STAI) is a popular self-report measure of anxiety, used in research and clinical settings (Spielberger, 1983, 1989). The current version is called Form Y, a minor revision of the original Form X (Spielberger, Gorsuch, & Lushene, 1970). A similar scale for children also is available (Spielberger, 1973). The test has been translated into more than 40 languages. We limit our discussion here to the adult version.
The purpose of the STAI is to differentiate between the temporary condition of state anxiety and the more long-standing quality of trait anxiety. State anxiety is defined as a “transitory emotional state or condition characterized by subjective feelings of tension and apprehension, and by activation of the autonomic nervous system.” Trait anxiety refers to “relatively stable individual differences in anxiety proneness” (Gaudry, Vagg, & Spielberger, 1975, p. 331).
The state scale (A-State scale) consists of 20 items that evaluate how the respondent feels “right now, at this moment.” Items are similar to “I feel at peace” and “I am distressed.” Responses are on a 4-point scale (Not At All, Somewhat, Moderately So, and Very Much So). The trait scale (A-Trait scale) consists of 20 items that assess how the respondent feels “generally.” Items are similar to “I am a stable person” and “I lack confidence.” Responses are on a 4-point scale (Almost Never, Sometimes, Often, and Almost Always). Of course, scoring is reversed for positively stated items. The range of scores for each scale is 20 to 80, with higher scores indicating greater anxiety. Extensive normative data are available, stratified by age and subdivided by setting (employed adults, college students, high school students,
using behavioral, emotional, learning, attentional, and therapeutic criteria (reviewed in Eysenck & Eysenck, 1985). Friedman (1987) provides a short but thorough introduction to other sources on the EPQ.
A major focus of research with the EPQ has been on the empirical correlates of extraversion and its polar opposite, introversion. Eysenck and Eysenck (1975) describe the typical extravert as follows:
The typical extravert is sociable, likes parties, has many friends, needs to have people to talk to, and does not like reading or studying by himself. He craves excitement, takes chances, often sticks his neck out, acts on the spur of the moment, and is generally an impulsive individual.
They describe the typical introvert as follows:
The typical introvert is a quiet, retiring sort of person, introspective, fond of books rather than people; he is reserved and distant except to intimate friends. He tends to plan ahead, “looks before he leaps,” and mistrusts the impulse of the moment.
Eysenck and his followers have linked a number of perceptual and physiological factors to the extraversion/introversion dimension. Because of space limitations, we can only list representative findings here:
• Introverts are more vigilant in watchkeeping.
• Introverts do better at signal-detection tasks.
• Introverts are less tolerant of pain but more tolerant of sensory deprivation.
• Extraverts are more easily conditioned to stimuli associated with sexual arousal.
• Extraverts have a greater need for external stimulation.
Aiken (1989) summarizes additional research on the real-world correlates of the EPQ extraversion/introversion dimension.
In general, the technical characteristics of the EPQ are very strong, certainly stronger than found
questionnaire research and laboratory experiments on learning and conditioning, Eysenck isolated three major dimensions of personality: Psychoticism (P), Extraversion (E), and Neuroticism (N). The EPQ consists of scales to measure these dimensions and also incorporates a Lie (L) scale to assess the validity of an examinee’s responses. The EPQ contains 90 statements answered “yes” or “no” and is designed for persons aged 16 and older. A Junior EPQ containing 81 statements is suitable for children ages 7 to 15.
Items on the P scale resemble the following:
Do you often break the rules? (T)
Would you worry if you were in debt? (F)
Do you take risks just for fun? (T)
High scores on the P scale indicate aggressive and hostile traits, impulsivity, a liking for odd or unusual things, and empathy defects. Antisocial and schizoid patients often obtain high scores on this dimension. In contrast, low scores on P foretell more desirable characteristics such as empathy and interpersonal sensitivity. Items on the E scale resemble the following:
Do you like to meet new people? (T)
Are you quiet when with others? (F)
Do you like lots of excitement? (T)
High scores on the E scale indicate a loud, gregarious, outgoing, fun-loving person. Low scores on the E scale indicate introverted traits such as a preference for solitude and quiet activities. Items on the N scale resemble the following:
Are you a moody person? (T)
Do you feel that life is dull? (T)
Are your feelings easily hurt? (T)
The N scale reflects a dimension of emotionality that ranges from nervous, maladjusted, and over- emotional (high scores) to stable and confident (low scores).
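The (T) and (F) markers beside the sample items show the keyed direction of each item: a point is earned on the scale whenever the examinee's answer matches the key. The Python sketch below illustrates this kind of keyed scoring with the three illustrative items per scale shown above. It is a toy example under stated assumptions: the function name and the respondent's answers are hypothetical, and the actual EPQ key covers 90 statements that are not reproduced here.

```python
def score_keyed_scale(answers, key):
    """Count answers that match the keyed direction.

    answers: list of booleans (True = "yes")
    key: list of booleans giving the keyed answer for each item
    """
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    return sum(1 for given, keyed in zip(answers, key) if given == keyed)

# Keys copied from the three illustrative items per scale in the text.
illustrative_keys = {
    "P": [True, False, True],
    "E": [True, False, True],
    "N": [True, True, True],
}
# Hypothetical answers from one respondent.
answers = {
    "P": [False, False, False],
    "E": [True, False, True],
    "N": [True, False, False],
}
profile = {scale: score_keyed_scale(answers[scale], key)
           for scale, key in illustrative_keys.items()}
print(profile)
```

Mixing true-keyed and false-keyed items within a scale, as on the P and E examples, means that an examinee who simply answers "yes" to everything cannot obtain an extreme score by that strategy alone.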
The reliability of the EPQ is excellent. For example, the one-month test–retest correlations were .78 (P), .89 (E), .86 (N), and .84 (L). Internal consistencies were in the .70s for P and the .80s for the other three scales. The construct validity of the EPQ is also well established through dozens of studies
(C) Social Conformity versus Rebelliousness. Individuals with high scores accept society as it is, resent nonconformity in others, seek the approval of society, and respect the law.
(A) Activity versus Lack of Energy. High-scoring individuals have a great deal of energy and endurance, work hard, and strive to excel.
(S) Emotional Stability versus Neuroticism. High-scoring persons are free from depression, optimistic, relaxed, stable in mood, and confident.
(E) Extraversion versus Introversion. High-scoring individuals meet people easily, seek new friends, feel comfortable with strangers, and do not suffer from stage fright.
(M) Mental Toughness versus Sensitivity. High-scoring individuals tend to be rather tough-minded people who are not bothered by blood, crawling creatures, vulgarity, and who do not cry easily or show much interest in love stories.
(P) Empathy versus Egocentrism. High-scoring individuals describe themselves as helpful, generous, sympathetic people who are interested in devoting their lives to the service of others.
Reflecting its careful factor-analytic derivation, the CPS scales possess exceptional internal consistencies, which range from .91 to .96. These findings indicate that the CPS is most likely a reliable test, but traditional test–retest data are scant. Cross-cultural studies with the CPS are highly supportive of its validity. Brief and Comrey (1993) report that the eight-factor solution to CPS item responses is found in factor analyses with Russian, U.S., Brazilian, Israeli, Italian, and New Zealand samples. Other validational studies with the CPS are not straightforward in their interpretation. On the one hand, the correlations between CPS scale scores and personality-relevant biographical data are very small (Comrey & Backer, 1970; Comrey & Schiebel, 1983). On the other hand, extreme scores on the CPS scales are strongly associated with psychological disturbance (Comrey & Schiebel, 1985). This is particularly true for low scores on Trust versus Defensiveness, Activity versus Lack of Energy, Emotional Stability versus Neuroticism,
in most self-report inventories. The practical utility of the instrument is supported by voluminous research literature. Nonetheless, the EPQ has never caught on among American psychologists, who seem enamored of multiphasic instruments that produce 10, 20, or 30 scores, not a simple trio of basic dimensions.
Comrey Personality Scales
For practitioners who desire a short self-report inventory suitable for college students and other adults, the Comrey Personality Scales (Comrey, 1970, 1980, 2008) would be a good choice. As a protégé of Guilford, Comrey pursued a factor-analytic strategy in developing his 180-item test. Comrey relied exclusively upon college students in the development and standardization of his test, so the CPS is well suited to assessment of personality in this subpopulation.
A special virtue of the CPS is its brevity. Consisting of 180 statements, the test is only one-third as long as competing instruments such as the MMPI-2. The eight CPS personality scales consist of 20 items each, divided equally between positively and negatively worded statements. Another 20 items are devoted to a validity check and the assessment of social desirability response bias.
The following description of CPS scales is based upon Merenda (1985) and Comrey (1995, 2008):
(V) Validity Check. A score of 8 is the expected raw score. Any score on the V scale that gives a T-score equivalent below 70 is still within the normal range, however. Higher scores are suggestive of an invalid record.
(R) Response Bias. High scores indicate a tendency to answer questions in a socially desirable way, making the respondent look like a “nice” person.
(T) Trust versus Defensiveness. High scores indicate a belief in the basic honesty, trustworthiness, and good intentions of other people.
(O) Orderliness versus Lack of Compulsion. High scores are characteristic of careful, meticulous, orderly, and highly organized individuals.
of carefully defined psychiatric patient groups (average N of about 50) with item responses of 724 control subjects. The result was a remarkable test useful both in psychiatric assessment and the description of normal personality. Within a few years, the MMPI became the most widely used personality test in the United States.
At first the MMPI aged gracefully; what appeared to be minor flaws were tolerated by practitioners. But as the MMPI reached middle age, the need for rejuvenation became increasingly obvious. The most serious problem was the original control group, which consisted primarily of relatives and visitors of medical patients at the University of Minnesota Hospital. The narrow choice of control subjects, tested mainly in the 1930s, proved to be a persistent source of criticism for the MMPI. All of the control subjects were white, and most were young (average age about 35), married, and from a small town or rural area. This was a sample of convenience that was significantly unrepresentative of the population at large.
The item content of the MMPI also raised concerns (Graham, 1993). Several items used archaic and obsolete terminology, referring to “drop the handkerchief” (a parlor game from the 1930s), sleeping powders (sleeping pills), and streetcars (electric-powered buses). Other items used sexist language. Examinees found some items objectionable, especially those dealing with Christian religious beliefs. These items were the source of occasional lawsuits alleging invasion of privacy. Finally, a few items dealing with bowel functions and sexual behavior were just downright offensive.
From the standpoint of measurement, a more serious problem with item content was that of omission. The MMPI item pool was not broad enough to assess many important characteristics, including suicidal tendencies, drug abuse, and treatment-related behaviors. An additional motive for MMPI revision was to extend the range of item coverage.
The MMPI-2 was released in 1989 after nearly a decade of revision and restandardization. The new,
Extraversion versus Introversion, and high scores on Orderliness versus Lack of Compulsion. Shen and Comrey (1997) describe the utility of the CPS with medical students, showing that the test is a reasonable predictor of clinical performance and personal suitability. In general, reviewers conclude that the CPS is a promising test that needs updated standardization and additional documentation on its technical qualities. Comrey (1995) summarizes validity studies of his test.
CRITERION-KEYED INVENTORIES
The final self-report inventories that we will review embody a criterion-keyed test development strategy. In a criterion-keyed approach, test items are assigned to a particular scale if, and only if, they discriminate between a well-defined criterion group and a relevant control group. For example, in devising a self-report scale for depression, items endorsed by depressed persons significantly more (or less) frequently than by normal controls would be assigned to the depression scale, keyed in the appropriate direction. A similar approach might be used to develop scales for other constructs of interest to clinicians such as schizophrenia, anxiety reaction, and the like. Notice that the test developer does not consult any theory of schizophrenia, depression, or anxiety reaction to determine which items belong on the respective scales. The essence of the criterion-keyed procedure is, so to speak, to let the items fall where they may.4
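The item-selection logic just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `key_items`, the two-proportion z-test, and the 1.96 cutoff are choices made here for the example, not the procedure used by any particular test developer.

```python
from math import sqrt

def key_items(criterion, control, z_crit=1.96):
    """Return {item_index: keyed_response} for items that separate the groups.

    criterion, control: lists of response vectors (True/False per item).
    The z-test and cutoff are illustrative assumptions, not a published rule.
    """
    n1, n2 = len(criterion), len(control)
    keyed = {}
    for i in range(len(criterion[0])):
        p1 = sum(r[i] for r in criterion) / n1   # endorsement rate, criterion group
        p2 = sum(r[i] for r in control) / n2     # endorsement rate, control group
        pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
        se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        if se == 0:
            continue  # everyone answered alike; the item cannot discriminate
        z = (p1 - p2) / se
        if abs(z) >= z_crit:
            # Keyed "true" if the criterion group endorses it more often,
            # keyed "false" if the criterion group endorses it less often.
            keyed[i] = z > 0
    return keyed

# Hypothetical data: 50 criterion cases, 100 controls, 3 items.
criterion = [[k < 40, k < 5, k < 25] for k in range(50)]
control = [[k < 20, k < 60, k < 50] for k in range(100)]
scale_key = key_items(criterion, control)
```

In this made-up data set the first item is keyed true, the second is keyed false, and the third is dropped because both groups endorse it at the same rate. No theory of the disorder enters the decision at any point.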
Minnesota Multiphasic Personality Inventory-2 (MMPI-2)
First published in 1943, the MMPI was a 566-item true–false personality inventory designed originally as an aid in psychiatric diagnosis (Hathaway & McKinley, 1940, 1943; McKinley & Hathaway, 1940, 1944; McKinley, Hathaway, & Meehl, 1948). The test authors followed a strict empirical keying approach in the construction of the MMPI scales. The clinical scales were developed by contrasting item responses
4We are glossing over certain complexities here. Some items reflecting general psychopathology might discriminate all the contrast groups from the control group. The test developer might discard these in favor of items that are differentially discriminating for just one contrast group but not the others.
were used to construct a scale for social introversion. The MMPI-2 retains the basic clinical scales with only minor item deletions and revisions. Ben-Porath and Butcher (1989) investigated the characteristics of the rewritten items on the MMPI-2 and discovered that they are psychometrically equivalent to the original items.
The MMPI-2 can be scored for four validity scales, 10 standard clinical scales, and dozens of supplementary scales. In practice, clinicians place the greatest emphasis upon the validity and standard clinical scales. The supplementary scales are just that—supplementary. They provide information helpful in fine-tuning the interpretation of the traditional validity and clinical scales. MMPI-2 scale raw scores are converted to T scores, with a mean of 50 and a standard deviation of 10. Scores that exceed T of 65 merit special consideration. These elevated scores are statistically uncommon in the general population and may signify the presence of psychiatric symptomatology. We will concentrate upon the traditional scales here, beginning with a review of the four validity scales, known as Cannot Say (or ?), L, F, and K.
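A T score with a mean of 50 and a standard deviation of 10 can be illustrated with the ordinary linear standard-score formula. The sketch below assumes a simple linear transformation and made-up normative values; published norms are tabled in the test manual and need not follow this formula exactly.

```python
def linear_t(raw, norm_mean, norm_sd):
    """Linear T score: mean 50, SD 10 in the normative sample (assumed linear)."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

def is_elevated(t_score, cutoff=65):
    """True when the T score exceeds the cutoff of 65 mentioned in the text."""
    return t_score > cutoff

# Hypothetical scale with a normative raw-score mean of 20 and SD of 4.
t_a = linear_t(26, norm_mean=20, norm_sd=4)   # 1.5 SD above the mean
t_b = linear_t(28, norm_mean=20, norm_sd=4)   # 2.0 SD above the mean
```

Under these assumed norms a raw score of 26 lands exactly at the cutoff and is not flagged, whereas a raw score of 28 is.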
The Cannot Say score is simply the total number of items omitted or double-marked in completion of the answer sheet. The instructions for the test encourage examinees to mark all items, but omissions or double-marked items will occur. However, this is rare—the modal number of items omitted is zero (Tamkin & Scherer, 1957). Omission of up to 10 items appears to have little effect on the overall test results—one of the benefits of having a huge pool of statements in the MMPI-2. A very high score on this scale may indicate a reading problem, opposition to authority, defensiveness, or indecisiveness caused by depression.
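Because the Cannot Say score is a plain count, it is easy to express in code. The sketch below assumes a hypothetical data format in which each item is recorded as the set of marks the examinee made; the tolerance of 10 simply echoes the observation above and is not an official decision rule.

```python
def cannot_say(responses):
    """Count items that were omitted (no mark) or double-marked (two marks)."""
    return sum(1 for marks in responses if len(marks) != 1)

def exceeds_tolerance(score, tolerance=10):
    """True when more items are unusable than the roughly 10 said to matter little."""
    return score > tolerance

# Hypothetical answer sheet: "T", omitted, double-marked, "F".
sheet = [{"T"}, set(), {"T", "F"}, {"F"}]
cs_score = cannot_say(sheet)
```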
The L Scale is composed of 15 items all scored in the false direction. By answering “false” to L Scale items, the examinee asserts that he or she possesses a degree of personal virtue that is rarely observed in our culture (e.g., never gets angry, likes everyone, never lies, reads every newspaper editorial, and would rather lose than win). The L Scale was
improved MMPI-2 incorporates a contemporary normative sample of 2,600 individuals who are loosely representative of the general population on major demographic variables (geographic location, race, age, occupational level, and income). Although higher educational levels are overrepresented, the MMPI-2 normative sample is still a vast improvement over the MMPI normative sample. The item pool has been significantly improved by revision of obsolete items, deletion of offensive items, and addition of new items to extend content coverage.
The MMPI-2 is a significant improvement upon the MMPI, but maintains substantial continuity with its esteemed predecessor. The test developers retained the same titles and measurement objectives for the traditional validity and clinical scales. The restandardization provides a better calibration for scale elevations, a much-needed improvement (Tellegen & Ben-Porath, 1992). Although dozens of items were rewritten, most of these revisions are cosmetic and do not affect the psychometric characteristics of the test (Ben-Porath & Butcher, 1989). In fact, when large samples of subjects complete the MMPI and the MMPI-2, scores on the individual validity and clinical scales typically correlate near .99.
The MMPI-2 consists of 567 items carefully designed to assess a wide range of concerns. The examinee is asked to mark “true” or “false” for each statement as it applies to himself or herself. Most of the items are self-referential. The items encompass a wide variety of mainly pathological themes (Dahlstrom, Welsh, & Dahlstrom, 1972; Graham, 1993).
The MMPI requires a sixth-grade reading level and is completed by most persons in 1 to 1½ hours.
The original MMPI scales were developed by contrasting item responses of carefully defined psychiatric patient groups (average N of about 50) with item responses of about 700 controls. The psychiatric patient groups included the following diagnostic categories: hypochondriasis, depression, hysteria, psychopathy, male homosexuality, paranoia, psychasthenia,5 schizophrenia, and the early phase of mania (hypomania). In addition, samples of socially introverted and socially extraverted college students
5This outdated diagnostic term is quite similar to what would now be labeled obsessive-compulsive disorder.
for this practice is that elevations on K betoken an artificial reduction of scores on these clinical scales. Portions of the raw score on K are thus added to these clinical scale scores prior to computation of the T scores. The K-corrected scales, discussed later, include Hypochondriasis, Psychopathic Deviate, Psychasthenia, Schizophrenia, and Hypomania. Whether K correction actually improves the MMPI-2 is debatable, but the test publishers continued the tradition from the MMPI for the sake of continuity. Separate norms for non-K-corrected scale score transformations are also available.
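The arithmetic of K correction can be sketched as follows, using the fractions of K listed for each scale in Table 8.6. The function name and data layout are assumptions for illustration, and the rounding conventions of the published scoring procedure are not reproduced here.

```python
# Fraction of the K raw score added to each corrected scale (from Table 8.6).
K_WEIGHTS = {"Hs": 0.5, "Pd": 0.4, "Pt": 1.0, "Sc": 1.0, "Ma": 0.2}

def k_correct(raw_scores, k_raw):
    """Return a copy of raw_scores with the K fraction added where applicable.

    Scales without a K weight (for example D or Hy) pass through unchanged.
    No rounding is applied; that detail is left out of this sketch.
    """
    corrected = dict(raw_scores)
    for scale, weight in K_WEIGHTS.items():
        if scale in corrected:
            corrected[scale] = raw_scores[scale] + weight * k_raw
    return corrected

# Hypothetical raw scores for three scales and a K raw score of 15.
adjusted = k_correct({"Hs": 10, "D": 20, "Ma": 18}, k_raw=15)
```

The corrected raw scores, not the originals, are what would then be converted to T scores.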
In addition to the validity scales, the MMPI-2 is always scored for 10 clinical scales. With the exception of Social Introversion, these clinical scales were constructed in the usual criterion-keyed manner by contrasting responses of clinical subjects and normal controls. As noted previously, Social Introversion was developed by contrasting the responses of college students high and low in social introversion. The 10 clinical scales and common interpretations of elevated scores are outlined in Table 8.6.
Dozens of supplementary scales can also be scored on the MMPI-2. Some of the supplementary scales are based upon rational identification of symptom clusters and subsequent scale purification by empirical means. Fifteen useful MMPI-2 Content Scales were developed in this manner (Butcher, Graham, Williams, & Ben-Porath, 1990). Many of the supplementary scales were developed by independent investigators; these scales vary widely in quality. In practice, only about 30 of the additional scales are routinely scored. Examples of the supplementary scales include Anxiety, Repression, Ego Strength, and the MacAndrew Alcoholism Scale-Revised. Anxiety (A) and Repression (R) are the first two major factors that always emerge from factor analysis of MMPI-2 responses. An interesting supplementary scale is Barron’s (1953) Ego Strength (Es) Scale, which purports to predict positive response to psychotherapy. However, not all studies confirm this use of the scale (Graham, 1987). The MacAndrew Alcoholism Scale-Revised (MAC-R; MacAndrew, 1965) is a useful index of alcohol or other substance abuse. The MAC-R is not only useful
designed to identify a general, deliberate, evasive test-taking attitude. A high score on the L Scale indicates that the examinee is not only defensive, but naively so. Persons with any degree of psychological sophistication can adopt a defensive test-taking attitude and still score in the normal range on the L Scale.
The F Scale consists of 60 items answered by normal subjects in the scored direction no more than 10 percent of the time. These items reflect a broad spectrum of serious maladjustment, including peculiar thoughts, apathy, and social alienation. Even though F Scale items seem to indicate psychiatric pathology, they are seldom endorsed by patients. Fewer than 50 percent of these items appear on the clinical scales. Many persons with significant psychiatric disturbance do produce elevated scores in the range of T = 70 or 80 on the F Scale. On the other hand, exceptionally high scores suggest additional hypotheses: insufficient reading ability, random or uncooperative responding, a motivated attempt to “fake bad” on the test, or an exaggerated “cry for help” in a distressed client.
The K Scale was designed to help detect a subtle form of defensiveness. The 30-item scale is composed, in part, of 22 items that differentiated normal profiles produced by defensive hospitalized psychiatric patients from those produced by normal controls. Additionally, eight items that improved discrimination of depressive and schizophrenic symptoms were added (McKinley, Hathaway, & Meehl, 1948). An elevated score on the K Scale may indicate a defensive test-taking attitude. Normal range elevations on the K Scale suggest good ego strength—the presence of useful psychological defenses that allow the person to function well in spite of internal conflict.
The combined use of F and K may be useful in the detection of MMPI-2 profiles that have been faked or malingered. In one study, 81 percent of fake-good profiles were identified by a simple decision rule (using raw scores) of F−K < −12, whereas 87 percent of fake-bad profiles were identified by a simple decision rule (using raw scores) of F−K > 7 (Bagby, Rogers, Buis, & Kalemba, 1994).
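The F−K decision rule amounts to a subtraction and two comparisons. The sketch below takes the cutoffs as −12 and 7 on the raw-score difference; they are illustrative defaults drawn from the single study cited, and the function name and return labels are hypothetical.

```python
def f_minus_k_flag(f_raw, k_raw, fake_good_cut=-12, fake_bad_cut=7):
    """Classify a profile from the raw-score difference F - K.

    Cutoffs are illustrative; a flag is a hypothesis to examine, not a verdict.
    """
    index = f_raw - k_raw
    if index < fake_good_cut:
        return "possible fake-good"
    if index > fake_bad_cut:
        return "possible fake-bad"
    return "no flag"

# Hypothetical raw scores.
flag_a = f_minus_k_flag(f_raw=3, k_raw=20)    # index = -17
flag_b = f_minus_k_flag(f_raw=25, k_raw=10)   # index = 15
flag_c = f_minus_k_flag(f_raw=8, k_raw=20)    # index = -12, not below the cutoff
```

Because both inequalities are strict, a difference that falls exactly on a cutoff is not flagged.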
Several clinical scales are “K-corrected” to improve their discriminatory power. The rationale
distilled the meaning of various elevations on the Pa or Paranoia scale as follows:
T = 27–44 examinee may be stubborn, touchy, or difficult
T = 45–59 no undue sensitivity and adequate regard for others
T = 60–69 increasing probability of rigidity and oversensitivity
T = 70–79 rigid, touchy, projects blame and hostility
T = 79–100 frankly delusional paranoid features may be present
The configural approach to MMPI-2 interpretation is somewhat more complicated and consists of classifying the profile as belonging to one or another loosely defined code type that has been studied extensively. Code types are usually defined
in assessment of alcoholism but is also helpful in the identification of heavy drinkers and drug-dependent individuals (Wolf, Schubert, Patterson, Grande, & Pendleton, 1990). We cannot possibly review all the useful supplementary scales here. The interested reader should consult Butcher and Williams (1992) and Graham (1993).
MMPI-2 Interpretation
The interpretation of an MMPI-2 profile can proceed along two different paths: scale by scale or configural. In the simplest possible approach, scale by scale, the examiner determines the validity of the test, as discussed previously, by inspecting the four validity scales. If the test appears reasonably valid by these criteria, the examiner consults a relevant resource book and proceeds scale by scale to produce a series of hypotheses. For example, Lachar (1974) has
TABLE 8.6 The 10 Clinical Scales from the Minnesota Multiphasic Personality Inventory-2
Scale No.  Abbreviation  Scale Name               K Correction  Typical Interpretation of Elevation
1          Hs            Hypochondriasis          .5K           Excessive physical preoccupation
2          D             Depression                             Sad feelings, hopelessness
3          Hy            Hysteria                               Immaturity, use of repression, denial
4          Pd            Psychopathic deviate     .4K           Authority conflict, impulsivity
5          Mf            Masculinity-femininity                 Masculine interests [women], feminine interests [men]
6          Pa            Paranoia                               Suspiciousness, hostility
7          Pt            Psychasthenia            1K            Anxiety and obsessive thinking
8          Sc            Schizophrenia            1K            Alienation, unusual thought processes
9          Ma            Hypomania                .2K           High energy, possible agitation
0          Si            Social introversion                    Shyness and introversion
misused by unqualified persons. We discuss the pitfalls of computerized test interpretation in the final chapter of the book.
Technical Properties of the MMPI-2
From the standpoint of traditional psychometric criteria, the MMPI-2 presents a mixed picture. Reliability data are generally positive, with median internal consistency coefficients (alpha) typically in the .70s and .80s, but as low as the .30s for some scales in some samples. One-week test–retest coefficients range from the high .50s to the low .90s, with a median in the .80s (Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989). These are good figures considering that some attributes—such as those measured by the Depression scale—change so quickly that the test–retest methodology is of questionable suitability.
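For readers who want to see what an internal consistency coefficient (alpha) summarizes, the following sketch computes Cronbach's alpha from the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The data are invented and the function name is an assumption; nothing here reproduces the MMPI-2 analyses.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(item_scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([person[i] for person in item_scores]) for i in range(k)]
    # Variance of the total (summed) score across respondents.
    total_var = pvariance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented data: items that move together perfectly, and items that are unrelated.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]
alpha_high = cronbach_alpha(consistent)
alpha_zero = cronbach_alpha(unrelated)
```

Items that rise and fall together push alpha toward 1, whereas unrelated items push it toward 0. This is why a scale assembled purely by criterion keying, with heterogeneous content, can show a modest alpha.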
A shortcoming of the MMPI-2 is that intercorrelations among the clinical scales are extremely high. For example, in the case of scales 7 and 8, the Psychasthenia and Schizophrenia scales, the correlation is commonly in the .70s. In part, this reflects the item overlap between MMPI scales—scales 7 and 8 share 17 items in common. But it is also true that the criterion-keyed approach is not well suited to the development of independent measures. A high intercorrelation of basic scales is one price to be paid for using this test development strategy.
The validity of the MMPI-2 is difficult to summarize, owing to the sheer volume of research on this instrument and its predecessor, the MMPI. As of 1975, over 6,000 studies employing the MMPI had been completed (Dahlstrom, Welsh, & Dahlstrom, 1975). Of course, thousands of additional studies have been published since then. Graham (1993) provides a brief but excellent review of validity studies on the MMPI/MMPI-2. He notes that the average validity coefficient for MMPI studies conducted between 1970 and 1981 was a healthy .46. He also points out the confirming pattern of extratest correlates in dozens of studies of identified patient groups. Research also indicates that the MMPI-2 is highly comparable to the MMPI, for which a substantial body of validity data has been compiled (Hargrave, Hiatt, Ogard, & Karr, 1994). Finally, bias studies comparing MMPI-2 results for Caucasian
by a combination of elevation (two or more clinical scales elevated beyond a certain criterion) and definition (two or more clinical scales clearly standing out from the others). For example, in its full-blown manifestation, the 4–9 code type can be defined by a valid profile in which scale 4 (Psychopathic Deviate) and scale 9 (Hypomania) are the high-point elevations, both exceed T of 65 (elevation), and both exceed the next highest clinical scale by at least 5 T-score points (definition). Here is how Graham (1993) describes persons who fit this code type:
The most salient characteristics of 49/94 individuals is a marked disregard for social standards and values. They frequently get in trouble with the authorities because of antisocial behavior. They have a poorly developed conscience, easy morals, and fluctuating ethical values. Alcoholism, fighting, marital problems, sexual acting out, and a wide array of delinquent acts are among the difficulties in which they may be involved. This is a common code type among persons who abuse alcohol and other substances.
The most likely diagnosis for such individuals is antisocial personality disorder.
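The elevation and definition criteria for a two-point code type, as stated for the 4–9 example, can be checked mechanically. The sketch below assumes the profile has already been judged valid and that T scores for the clinical scales are held in a dictionary keyed by scale number; the function name and defaults are hypothetical.

```python
def two_point_code(t_scores, elevation=65, definition=5):
    """Return the two high-point scales as a tuple, or None if criteria fail.

    t_scores: dict mapping clinical scale number to T score (needs 3+ scales).
    Elevation: both high points exceed the cutoff.
    Definition: both exceed the next highest scale by at least `definition`.
    """
    ranked = sorted(t_scores.items(), key=lambda pair: pair[1], reverse=True)
    (first, t1), (second, t2), (_, t3) = ranked[:3]
    # The second-highest scale is the binding constraint for both criteria,
    # because the highest scale is at least as large.
    if t2 > elevation and t2 - t3 >= definition:
        return (first, second)
    return None

# Hypothetical profile with scales 4 and 9 standing out.
profile = {1: 55, 2: 60, 3: 52, 4: 78, 5: 50, 6: 58, 7: 61, 8: 63, 9: 72, 0: 45}
code = two_point_code(profile)

# Same profile, but scale 8 now crowds scale 9, so definition fails.
crowded = dict(profile)
crowded[8] = 70
no_code = two_point_code(crowded)
```

In the first profile the function returns scales 4 and 9; in the second, scale 9 exceeds scale 8 by only 2 points, so no well-defined code type is reported.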
We should mention briefly that several computerized interpretation systems are available for the MMPI and the MMPI-2 (Fowler, 1985; Butcher, 1987). The Minnesota Report™ (Butcher, 1993) is the best. This system generates a very cautious and methodical 16-page report that includes discussion of profile validity, symptomatic patterns, interpersonal relations, diagnostic considerations, and treatment considerations. The Minnesota Report™ also provides a variety of figures and tables to illustrate test results.
The adequacy of computerized MMPI-2 narrative reports is generally good, but the reader should realize that computer programs are written by fallible human beings. There is a danger that computer-generated test reports will be erroneous. Furthermore, some less-reputable interpretive systems can be purchased on microcomputer diskette for a few hundred dollars. This increases the risk that computer-based test interpretations will be
MMPI-2-RF will rest upon accumulated research in the coming years.
Millon Clinical Multiaxial Inventory-III (MCMI-III)
The MCMI-III is a personality inventory designed for the same purposes as the MMPI-2, namely, to provide useful information for psychiatric diagnosis (Millon, 1983, 1987, 1994). The MCMI-III has two advantages over the MMPI-2. First, it is much shorter (175 true–false items) and, therefore, more palatable to clinical referrals; second, it is planned and organized to identify clinical patterns in a manner that is compatible with the Diagnostic and Statistical Manual (DSM-IV) of the American Psychiatric Association.
The MCMI-III is a highly theory-driven test, incorporating Millon’s elaborate theoretical formulations on the nature of psychopathology and personality disorder (Millon, 1969, 1981, 1986; Millon & Davis, 1996). The test includes 27 scales, listed in Table 8.7. The first 11 scales measure personality
and African American clients indicate that slight racial differences do exist in average profiles. However, these differences validly reflect emotional functioning; that is, the MMPI-2 is not racially biased (McNulty, Graham, Ben-Porath, & Stein, 1997). The MMPI-2 likely will maintain its status as the premier instrument for assessment of psychopathology in adulthood for many years to come.
In 2008, a new version of the MMPI-2 with reduced length and restructured scales was released (Ben-Porath & Tellegen, 2008; Tellegen & Ben-Porath, 2008). Because it embodies a restructured format (RF), the recent entry is called the MMPI-2-RF. This innovative test comprises 338 items carefully selected from the original 567 items of the MMPI-2, using modern psychometric methods for scale construction. Certainly the reduced length is a potential advantage. Patients often tire when completing the MMPI-2, and some find the experience tedious and onerous. Even so, the MMPI-2-RF constitutes a dramatic departure from the parent instrument and is therefore really a new test (Butcher, 2011). The utility of the
TABLE 8.7 Scales of the Millon Clinical Multiaxial Inventory-III
Clinical Personality Patterns              Clinical Syndromes
1   Schizoid                               A   Anxiety
2A  Avoidant                               H   Somatoform
2B  Depressive                             N   Bipolar: Manic
3   Dependent                              D   Dysthymia
4   Histrionic                             B   Alcohol Dependence
5   Narcissistic                           R   Post-Traumatic Stress Disorder
6A  Antisocial
6B  Aggressive (Sadistic)                  Severe Syndromes
7   Compulsive                             SS  Thought Disorder
8A  Passive-Aggressive (Negativistic)      CC  Major Depression
8B  Self-Defeating                         PP  Delusional Disorder

Severe Personality Pathology               Validity (Modifying) Indices
S   Schizotypal                            X   Disclosure
C   Borderline                             Y   Desirability
P   Paranoid                               Z   Debasement
used. Millon offers the arguable justification that a patient sample is adequate for the normative sample because the base rates (in the general population) for specific personality and clinical disorders were consulted to calibrate the cutting points on the individual scales (Millon & Davis, 1996). But this approach is complex, experimental, and difficult to understand. The reliability of the individual scales is good: Internal consistency coefficients average .82 to .90, and test–retest coefficients for one week range from .81 to .87. Support for the validity of the MCMI-III is mixed (Haladyna, 1992; Piersma & Boes, 1997). Craig (1993) has assembled a series of articles that are largely supportive of the MCMI. Jankowski (2002) provides a beginner’s guide to the test.
Personality Inventory for Children-2 (PIC-2)
The PIC-2 (Lachar & Gruber, 2001) is a substantial revision of the PIC-R, a popular instrument that dates back to the late 1950s (Wirt & Broen, 1958; Wirt, Lachar, Klinedinst, & Seat, 1984). The current version, suitable for children 5 through 19 years of age, consists of 275 true–false statements that are completed by a parent or parental surrogate. The PIC-2 is one corner of a triad of instruments developed by David Lachar and colleagues to provide a comprehensive, multiview perspective on children’s emotional and behavioral adjustment in the home, school, and community. The complementary instruments are the Personality Inventory for Youth (PIY), which is filled out by the child, and the Student Behavior Survey (SBS), which is filled out by the teacher. We discuss only the PIC-2 here. Items on the PIC-2 resemble the following:
My child finds it difficult to fall asleep.
My child is a finicky eater.
My child has threatened to kill himself (herself).
Sometimes my child swears at other adults.
Our marriage has been full of turmoil.
The instrument also provides a shorter 96-item version known as the Behavioral Summary, suitable for screening and research purposes.
styles or traits such as narcissism and antisocial tendencies; the next three assess more severe personality pathology (schizotypal, borderline, and paranoid disorders); the following seven scales assess clinical syndromes such as anxiety and depression; the next three scales assess severe clinical syndromes such as thought disorder; the last three scales are validity (response style) indices. Scores on these scales (Disclosure, Desirability, and Debasement) are used to adjust the other scale scores upward or downward, based on defensiveness or exaggeration of symptoms, respectively.
Scale development for the MCMI-III and its precursors was careful and methodical. We can only portray the broad outline here, in which 3,500 initial items were culled to 175 statements in three stages of test development: a theoretical-substantive stage (theory-guided item writing), an internal-structural stage (item-scale correlations), and an external-criterion stage (contrast of diagnostic groups with the reference group). A special feature of the last stage was Millon’s use of general psychiatric patients instead of normal controls as the reference group. The purpose of this strategy was to enhance the capacity of MCMI scales to differentiate specific diagnostic groups from one another. Unfortunately, one side effect of this particular criterion-keyed approach was a rather substantial degree of item overlap for the clinical scales. Millon planned for and expected the item overlap but probably did not anticipate that some pairs of scales on the MCMI would share the majority of their items in common. Some of this overlap was eliminated with the further refinement of the test for the second and third editions. The revised instrument also incorporates an item-weighting procedure. In this approach, individual questions are weighted 2 or 1 to reflect their importance in discriminating the prototype for each scale. The item-weighting approach has been criticized as unnecessary and unwieldy (Streiner, Goldberg, & Miller, 1993).
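The item-weighting idea can be made concrete with a short sketch in which each item on a scale carries a weight of 2 or 1 and the raw score is the sum of the weights of the items answered in the keyed direction. The item numbers, weights, and function name below are invented for illustration and do not come from the MCMI-III scoring key.

```python
def weighted_raw_score(endorsed, item_weights):
    """Sum the weights of the scale's items that were answered in the keyed direction.

    endorsed: set of item numbers the examinee answered in the keyed direction.
    item_weights: dict of item number -> weight (2 = more central, 1 = less central).
    """
    return sum(weight for item, weight in item_weights.items() if item in endorsed)

# Invented key for one hypothetical scale.
scale_key = {1: 2, 5: 1, 9: 2, 12: 1}
answers = {1, 5, 12, 40}            # item 40 is not on this scale and is ignored
score = weighted_raw_score(answers, scale_key)

# An unweighted count of the same answers, for comparison.
unweighted = len(answers & set(scale_key))
```

Here the weighted score is 4 and the unweighted count is 3. Because the same item may appear on several overlapping scales with different weights, scoring requires a separate key for each scale, which is part of why critics call the procedure unwieldy.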
The normative sample for the MCMI-III consisted of about a thousand men and women patients from across the United States. This is an unusual and controversial approach to the collection of a normative sample. More typically, population-proportionate sampling of reasonably normal individuals is
with Oppositional Defiant Disorder showed highly elevated scores (average T scores of 75 to 80) on the following PIC-2 subscales: Disruptive Behavior, Fearlessness, Dyscontrol, and Noncompliance. This is a perfect match to the major clinical features of this DSM-IV diagnostic category. Overall, the test developers have cited an impressive body of research that supports the reliability and validity of their instrument. Although independent studies of this test are yet to be published, it seems clear that the PIC-2 will earn wide usage in the behavioral and emotional assessment of school-aged children.
The test developers of the PIC-2 followed a complex multistage methodology to assign individual items to scales and subscales. The goal was to minimize content overlap between scales and subscales by examining preliminary item × subscale correlations and then retaining only those items for each specific subscale that showed high correlations. As a consequence of this test development strategy, each subscale possesses homogeneous content and the individual statements correlate substantially with one another. The resulting instrument consists of three response validity scales (Inconsistency, Dissimulation, Defensiveness) and nine adjustment scales. Each of the adjustment scales includes two or three subscales (Table 8.8).
Scale raw scores are converted to T scores with a mean of 50 and standard deviation of 10. Higher T scores indicate increased probability of psychopathology or deficit. Norms for children 5 through 19 years of age are based on a nationally representative sample of 2,306 parents of boys and girls in kindergarten through 12th grade.
With the possible exception of the three validity scales (Inconsistency, Dissimulation, and Defensiveness), the PIC-2 scale and subscale names are self-explanatory. The validity scales are (1) Inconsistency, which includes 35 similar pairs of items to determine consistency of responding; (2) Dissimulation, a 35-item scale designed to identify deliberate exaggeration (fake bad) about symptoms or random responding; and (3) Defensiveness, a 24-item scale consisting of improbable virtues (e.g., “my child never has any problems”) and therefore an index of naive defensiveness.
The reliability of PIC-2 scales and subscales is good, with test–retest values in the range of .82 to .92 and internal consistency coefficients in the range of .81 to .92. The test manual (Lachar & Gruber, 2001) summarizes a huge body of criterion-related validity studies such as correlations with independent ratings from clinicians. These correlations are very strong for similar behavioral dimensions (and weak for dissimilar behavioral dimensions), thus supporting the validity of individual scales and subscales. In like manner, PIC-2 subscale scores show theory-consistent relationships with the DSM-IV diagnostic categories of clinic-referred children. For example, 63 children independently diagnosed
TABLE 8.8 Adjustment Scales and Subscales of the Personality Inventory for Children-2
Adjustment Scales                  Subscales
Cognitive Impairment               Inadequate Abilities
                                   Poor Achievement
                                   Developmental Delay
Impulsivity and Distractibility    Disruptive Behavior
                                   Fearlessness
Delinquency                        Antisocial Behavior
                                   Dyscontrol
                                   Noncompliance
Family Dysfunction                 Conflict among Members
                                   Parent Maladjustment
Reality Distortion                 Developmental Deviation
                                   Hallucinations and Delusions
Somatic Concern                    Psychosomatic Preoccupation
                                   Muscular Tension and Anxiety
Psychological Discomfort           Fear and Worry
                                   Depression
                                   Sleep Disturbance/Death Preoccupation
Social Withdrawal                  Social Introversion
                                   Isolation
Social Skills Deficits             Limited Peer Status
                                   Conflict with Peers
momentary assessment in more detail at the end of this chapter.
Behavioral assessment is often, but not always, an integral part of behavior therapy designed to change the duration, frequency, or intensity of a well-defined target behavior. For example, one therapy goal for a shy college student might be that she initiate a minimum of five conversations
Behavioral Assessment
Behavioral assessment concentrates on behavior itself rather than on underlying traits, hypothetical causes, or presumed dimensions of personality. The many methods of behavioral assessment offer a practical alternative to projective tests, self-report inventories, and other unwieldy techniques aimed at global personality assessment.
Typically, behavioral assessment is designed to meet the needs of therapists and their clients in a quick and uncomplicated manner. But behavioral assessment differs from traditional assessment in more than its simplicity. The basic assumptions, practical aspects, and essential goals of behavioral and traditional approaches are as different as night and day. Traditional assessment strategies tend to be complex, indirect, psychodynamic, and often extraneous to treatment. In contrast, behavioral assessment strategies tend to be simple, direct, behavior-analytic, and continuous with treatment.
Behavior therapists use a wide range of modalities to evaluate their clients, patients, and subjects. The methods of behavioral assessment include, but are not limited to, behavioral observations, self-reports, parent ratings, staff ratings, sibling ratings, judges’ ratings, teacher ratings, therapist ratings, nurses’ ratings, physiological assessment, biochemical assessment, biological assessment, structured interviews, semistructured interviews, and analogue tests. In their Dictionary of Behavioral Assessment Techniques, Hersen and Bellack (1988) list 286 behavioral tests used for widely diverse problems and disorders in children, adolescents, adults, and the geriatric population. Dozens more are referenced in a more recent compendium (Hersen & Bellack, 1998). So that the reader can appreciate the diversity of techniques available, we provide a sampling of these tests in Table 8.9.
In recent years, a new form of behavioral assessment known as ecological momentary assessment has become increasingly popular. In ecological momentary assessment, the client carries a wireless handheld device similar to a personal digital assistant and responds in real time to preplanned inquiries from the researcher. This approach is designed to circumvent a number of limitations of traditional self-report techniques. We discuss ecological
Table 8.9 A Sampling of Behavioral Assessment Tests and Techniques
Abnormal Involuntary Movement Scale
Alcohol Dependence Scale
Assertiveness Self-Statement Test
Automatic Thoughts Scale
Behavioral Assessment of Satiety
Behavioral Pain Scale
Blood Alcohol Level
Body Sensation Questionnaire
Compulsive Activity Checklist
Conversational Skills Rating Scale
Current Dieting Questionnaire
Dementia Behavioral Assessment Test
Drinking Context Scale
Gifted Behaviors Rating Scale
Goal Attainment Scaling
Health Risk Attitude Scale
Irrational Beliefs Inventory
McGill Pain Questionnaire
Physical Activity and TV Viewing
Physical Fighting—Youth Risk Survey
Pittsburgh Insomnia Rating Scale
Prosocial Behaviors of Children
Rape Trauma Symptom Rating Scale
Scale for the Assessment and Rating of Ataxia
Scale of Sexual Experience
Six Minute Walk Test
Sleep Assessment Scale
Victimization in Dating Relationships
consequences, it must be cognitively mediated. As a consequence of this paradigm shift, practically all modern-day behavior therapists concern themselves, at least to some extent, with the thoughts and beliefs of their clients. This new emphasis is reflected in a family of very popular treatment procedures known collectively as cognitive behavior therapy (Hofmann & Reinecke, 2010).
Behavior Therapy and Behavioral Assessment
At present, the specific techniques of behavior therapy can be classified into four overlapping categories (Johnston, 1986): exposure-based methods, cognitive behavior therapies, self-control procedures, and social skills training. Behavioral assessment is used in all of these approaches, as reviewed in the following sections. However, there are relatively few behaviorally based tools for the evaluation of social skills, so this category is not discussed. Readers who desire limited coverage of instruments for the behavioral evaluation of social skills training (including assertiveness) should consult Meier and Hope (1998).
Exposure-Based Methods
Exposure-based methods of behavioral therapy are well suited to the treatment of phobias, which include intense and unreasonable fears (e.g., of spiders, blood, public speaking). One approach to phobic avoidance is systematic exposure of the client to the feared situation or object. Wolpe (1973) favored gradual exposure with minimal anxiety in a procedure known as systematic desensitization. In this therapeutic approach, the client first learns total relaxation and then proceeds from imagined exposure to actual or in vivo exposure to the feared stimulus. Another exposure-based method is flooding or implosion, in which the client is immediately and totally immersed in the anxiety-inducing situation.
The therapist needs some type of behavioral assessment to gauge the continuing progress of a client undergoing an exposure-based treatment for a phobia. In the simplest possible assessment approach, known as a behavioral avoidance test (BAT), the therapist measures how long the client
lasting two minutes or more each day. The therapist might recommend that she approach this goal incrementally, beginning with a few brief social exchanges before proceeding to lengthier conversations with strangers. In this example, behavioral assessment might take the form of self-monitoring in which the student uses a wristwatch for timing and a diary for keeping track of conversations.
As noted, behavioral assessment often exists in service of behavior therapy. In many cases, the nature of behavioral assessment is dictated by the procedures and goals of behavior therapy. For this reason, the reader will better appreciate behavioral assessment tools if we interweave this topic with a discussion of behavior therapy methods.
Behavior therapy, also called behavior modification, is the application of the methods and findings of experimental psychology to the modification of maladaptive behavior (Plaud & Eifert, 1998). The roots of behavior therapy can be traced to Skinner’s (1953) seminal book, Science and Human Behavior, which detailed the application of operant conditioning to the problems of human behavior. Skinner shunned any reference to private, nonobservable events such as thoughts or feelings; he emphasized the importance of identifying observable behaviors and methodically altering the environmental consequences of those behaviors.
Research by Wolpe (1958) on the systematic behavioral treatment of phobias also was influential in founding the methods of behavior therapy. Wolpe’s clinical procedures were derived from his laboratory work on the conditioning and counterconditioning of fear in cats. Like Skinner, Wolpe deemphasized the significance of thoughts and beliefs. He viewed fear as a learned phenomenon that could be unlearned by following a strict protocol of graduated exposure to the feared object or situation.
After Skinner, Bandura (1977), Mahoney and Arnkoff (1978), and Meichenbaum (1977) reintroduced cognitive factors into the ever-changing behavioral framework. For example, Bandura (1977) demonstrated that persons are perfectly capable of cognitively based learning. In particular, he showed that individuals can learn from mere observation of the response contingencies experienced by models. Since this learning occurs in the absence of personal
A fear survey schedule is another type of behavioral assessment useful in the identification and quantification of fears. Fear survey schedules are face valid devices that require respondents to indicate the presence and intensity of their fears in relation to various stimuli, typically on a 5- or 7-point Likert scale. Dozens of these instruments have been published, including versions by Wolpe (1973), Ollendick (1983), and Cautela (1977). Tasto, Hickson, and Rubin (1971) used factor analysis to develop a 40-item survey that yields a profile of fear scores in five categories. A generic fear survey schedule is shown in Table 8.10. Fear survey schedules are often used in research projects to screen large samples of persons in search of subjects who share a common fear. Another use of these schedules is to monitor changes in fears, including those that have been targeted for clinical intervention.
Klieger and Franklin (1993) have raised a number of cautions about the use of fear survey schedules in clinical research. These authors note that reliability data for fear surveys are almost nonexistent. A more serious problem has to do with the validity of these instruments. Using the Wolpe and Lang (1977) Fear Survey Schedule-III (FSS-III), a highly respected and widely used schedule, Klieger and Franklin (1993) found no relationship between reported fears on the FSS-III and BAT measures of the same fears. For example, subjects who reported a high fear of blood on the FSS-III were just as likely to approach a bloody white towel and touch it as were subjects who reported no fear of blood. Similar results were found for subjects who feared snakes, spiders, and fire. The researchers concluded that the FSS-III and similar instruments are a poor choice for identifying experimental groups and a poor basis for measuring the outcome of therapeutic interventions. The essential downfall seems to be that fear survey schedules possess such “obvious” validity that few researchers have bothered to evaluate the traditional psychometric characteristics of reliability and validity. Fear survey schedules should be used with caution.
Cognitive Behavior Therapies
The one factor common to all cognitive behavior therapies is an emphasis on changing the belief
can tolerate the anxiety-inducing stimulus. Here is one classic example of a standardized BAT used to evaluate patients with agoraphobia, a disabling fear of open spaces often accompanied by panic attacks:
The standardized Behavioral Avoidance Test (BAT) was conducted a week after intake. All anxiolytics, antidepressants, or other psychotropic medication had been taken away at least 4 days before the test. The test was administered by the first author, who was blind to the patients’ diagnoses [and] not involved in the treatment. The patients were asked to walk alone as far as they could from the hospital along a mildly trafficated road that was 2 km long. The route was divided into eight intervals of equal length, and the patients rated their anxiety level on a 0–10 scale at the end of each interval. Uncompleted intervals were given a score of 10. An avoidance-anxiety score was computed by summing the anxiety scores for all intervals. (Hoffart, Friis, Strand, & Olsen, 1994)
The researchers discovered that the avoidance-anxiety score from the BAT technique was strongly related to self-reports of catastrophic thoughts (e.g., choking to death, having a heart attack, acting foolish, becoming helpless). This finding illustrates that behavioral assessment approaches often encompass a cognitive component as well. Notice, too, the direct relationship between the goal of therapy and the behavioral avoidance test. In agoraphobia, the primary treatment goal is to reduce patients’ anxiety about walking alone in open spaces, which is exactly what the BAT measures.
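The scoring rule in the quoted BAT procedure (eight intervals, each rated 0 to 10, with uncompleted intervals scored 10) can be sketched in a few lines. The function and constant names are hypothetical, and the sketch assumes that the patient completes intervals in order and then stops. It is an illustration of the arithmetic only, not the authors' own scoring program.

```python
N_INTERVALS = 8          # the route was divided into eight intervals
UNCOMPLETED_SCORE = 10   # score assigned to each uncompleted interval


def avoidance_anxiety_score(completed_ratings):
    """Sum the 0-10 anxiety ratings for completed intervals and add
    10 for every interval the patient did not complete.

    completed_ratings: ratings for the intervals actually walked,
    in order (an assumption of this sketch).
    """
    if len(completed_ratings) > N_INTERVALS:
        raise ValueError("more ratings than intervals")
    if any(not 0 <= r <= 10 for r in completed_ratings):
        raise ValueError("ratings must be between 0 and 10")
    uncompleted = N_INTERVALS - len(completed_ratings)
    return sum(completed_ratings) + uncompleted * UNCOMPLETED_SCORE


# A patient who turns back after five intervals.
print(avoidance_anxiety_score([2, 3, 5, 7, 9]))
```

Under this rule the score ranges from 0 (all eight intervals completed with no anxiety) to 80 (no interval completed, or maximum anxiety throughout), so avoidance and reported anxiety both raise the score.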
The BAT approach is predicated on the reasonable assumption that the client’s fear is the main determinant of behavior in the testing situation. Unfortunately, demand characteristics for desirable behavior may exert a strong influence on the client’s behavior. The client’s tolerance of the anxiety-inducing stimulus will bear some relationship to experienced fear but also has much to do with the situational context of assessment (McGlynn & Rose, 1998). The results of BAT assessments may not generalize, and the therapist must be wary of foreclosing treatment too soon.
view of the future. In therapy, he uses a gentle form of cognitive restructuring to help the client perceive his or her problems in alternative, solvable terms.
Cognitive behavior therapists need not use formal assessment tools in their clinical practice. Typically, these therapists monitor the belief structure of their clients on an informal session-to-session basis. Irrational and distorted thoughts are challenged as they arise during therapy. In the end, the client’s self-report of improvement may constitute the main index of therapeutic success. Nonetheless, several straightforward measures of cognitive distortion are available. We have outlined a few prominent instruments in Table 8.11. These instruments are mainly research questionnaires suitable to the testing of group differences, but not sufficiently validated for individual assessment. Clark (1988) faults the developers of cognitive distortion questionnaires for premature release of their instruments. In particular, he notes the absence of research on the concurrent and discriminant validity of most self-statement measures. Another problem is that existing questionnaires were designed to validate constructs in research and consequently do not work well in clinical practice.
structure of the client. The three best-known variants of cognitive behavior therapy are Ellis’s (1962) rational emotive therapy (RET), Meichenbaum’s (1977) self-instructional training, and Beck’s (1976) cognitive therapy. Ellis postulates that most disturbed behavior is caused by irrational beliefs, such as the widespread belief that one must have the love and approval of all significant persons at all times. Ellis attempts to alter such core irrational beliefs, primarily by logical argument and forceful exhortation. Meichenbaum’s self-instructional technique consists of teaching the client to use coping self-statements to combat stressful situations. For example, a college student suffering from intense test-taking anxiety might be taught to use the following self-talk during examinations: “You have a strategy this time. . . . Take a deep breath and relax. . . . Just answer one question at a time. . . .” Beck’s cognitive therapy concentrates mainly on the role of cognitive distortions in the maintenance of depression and other emotional disturbances. Beck (1983) regards depression as primarily a cognitive disorder characterized by the negative cognitive triad: a pessimistic view of the world, a pessimistic self-concept, and a pessimistic
Table 8.10 Example of a Fear Survey Schedule
Please check the column that best describes your current response to these situations or objects.
Response columns (degree to which you would be disturbed): Not at All; Just a Little; Moderate Amount; Very Much; Extremely Bothered
Being in a strange place
Speaking in public
Walking into a party
Getting an injection
People watching me work
Large open spaces
Being fat
Spider on the wall
Cat in the room
Reprimand from the boss
Note: Most fear survey schedules consist of several dozen items.
Table 8.11 Questionnaire Measures of Cognitive Distortion
Anxious Self-Statements Questionnaire (ASSQ) (Kendall & Hollon, 1989)
Examinee rates how often specific anxious thoughts occurred over the last week. Items are of the form:
I can’t stand it anymore. What’s going to happen to me now? I’m not going to make it.
A psychometrically sound instrument, the ASSQ can be used to assess changes in the frequency of anxious self-talk.
Automatic Thoughts Questionnaire (ATQ) (Hollon & Kendall, 1980; Kazdin, 1990)
The ATQ is a frequency measure of depression-related cognitions that assesses personal maladjustment, negative self-concept and expectations, low self-esteem, and giving up/helplessness. The 30-item ATQ correlates very well with the MMPI Depression scale and the Beck Depression Inventory (Ross, Gottfredson, Christensen, & Weaver, 1986).
Cognitive Errors Questionnaire (CEQ) (Lefebvre, 1981)
The CEQ assesses the degree of maladaptive thinking in general situations and also situations related to chronic low back pain. Discrete vignettes concerning chronic back pain and general scenes are each followed by an illogical dysphoric cognition. The respondent indicates on a 5-point scale how similar the cognition is to the thought he or she would have in the same situation. For example: “You just finished spending three hours cleaning the basement. Your spouse, however, doesn’t say anything about it. You think to yourself, ‘S(he) must think I did a poor job.’” Smith, Follick, Ahern, and Adams (1986) found that overgeneralization was the specific CEQ cognitive error most consistently correlated with chronic low back pain disability.
Attributional Style Questionnaire (ASQ) (Seligman, Abramson, Semmel, & Von Baeyer, 1979)
The ASQ measures three attributional dimensions relevant to Seligman’s learned helplessness model of depression: internal-external, stable-unstable, and global-specific. Depressed persons attribute bad outcomes to internal, stable, and global causes; they attribute good outcomes to external, unstable causes. The questionnaire consists of 12 hypothetical situations, 6 describing good outcomes, 6 describing bad outcomes (e.g., “You have been looking for a job unsuccessfully for some time”). The respondents rate each vignette on a 7-point scale for degree of internality, stability, and globality.
Hopelessness Scale (HS) (Beck, 1987; Dyce, 1996)
A 20-item true/false scale, the HS is designed to quantify hopelessness, one component of the negative cognitive triad found in depressed persons. (The triad consists of negative views of self, world, and future.) The scale is sensitive to changes in the patient’s state of depression. In a validational study, Beck, Riskind, Brown, and Steer (1988) found that HS scores had a negligible relationship to anxiety or general psychopathology when the influence of coexisting depression was partialed out. Thus, the HS appears to measure a specific attribute of depression rather than general psychopathology.
A variety of normative results are available, with BDI data for samples of patients with major depression, dysthymia, alcoholism, heroin addiction, and mixed problems. The manual also provides guidelines for degree of depression based upon BDI score (0 to 9, normal; 10 to 19, mild to moderate; 20 to 29, moderate to severe; 30 and above, extremely severe). These ratings are based upon clinical evaluations of patients.
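The BDI scoring rule (21 items scored 0 to 3 and summed, for a maximum of 63) and the score bands quoted from the manual can be expressed as a small sketch. The function names are hypothetical. The code only illustrates the arithmetic described in the text and is not a substitute for the published scoring materials or for clinical judgment.

```python
def bdi_total(item_scores):
    """Total raw score: the sum of the 21 item endorsements (0-3 each)."""
    if len(item_scores) != 21:
        raise ValueError("the BDI has 21 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item is scored 0, 1, 2, or 3")
    return sum(item_scores)


def bdi_severity(total):
    """Map a total score to the degree-of-depression bands quoted above."""
    if not 0 <= total <= 63:
        raise ValueError("total must be between 0 and 63")
    if total <= 9:
        return "normal"
    if total <= 19:
        return "mild to moderate"
    if total <= 29:
        return "moderate to severe"
    return "extremely severe"


# A respondent who endorses the "1" statement on every item.
total = bdi_total([1] * 21)
print(total, bdi_severity(total))
```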
The BDI has been extensively validated against other measures of depression and independent criteria of depression. For example, correlations with clinical ratings and scales of depression such as from the MMPI are typically in the range of .60 to .76 (Conoley, 1992). Sex differences are minimal, although there may be slight differences in the expression of depression between men and women (Steer, Beck, & Brown, 1989). In large college student samples of Whites (N = 838) and Blacks (N = 139), the BDI-II was found to be free of racial bias (Sashidharan, Pawlow, & Pettibone, 2012). Yet, in a comparison of 218 older patients (M = 69.4 years of age) versus 613 younger patients (M = 37.9 years of age), Kim, Pilkonis, Frank, Thase, and Reynolds (2002) found strong evidence of differential item functioning. Specifically, older patients tended to report fewer cognitive symptoms, especially for low to average levels of depression, and tended to report more somatic symptoms, especially for high levels of depression. The authors propose revised cut-off scores for the various levels of depression (mild, moderate, and severe) in older patients.
The BDI-II is particularly useful in primary care medical settings, where the presence of significant depression can be overlooked. Many patients are not aware of their illness, and some physicians may not be trained to examine for it. In a sample of 340 medical outpatients, Arnau, Meagher, Norris, and Bramson (2001) found that 23 percent of the group scored in the range indicative of mild, moderate, or severe depression on the test. The instrument proved helpful in identifying patients with depression who might otherwise be overlooked. Overall, the BDI-II was 92 percent accurate in identifying patients meeting the formal criteria for Major Depressive Disorder.
The only shortcoming of the BDI-II is its transparency. Patients who wish to hide their
An exceptional and well-validated measure not listed in Table 8.11 is the Beck Depression Inventory (BDI). The BDI is a short, simple, self-report questionnaire that focuses, in part, on the cognitive distortions that underlie depression (Beck & Steer, 1987; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961). One reason for its popularity is that most patients can complete the 21 items on the BDI in 10 minutes or less. The test has been widely used: More than 1,900 articles using the BDI have been published (Conoley, 1992). A second edition of the inventory was released in 1996 (Beck, Steer, & Brown, 1996). On the BDI-II, several items were revised so as to bring the inventory into closer conformity with prevailing diagnostic criteria for depression. The 21 items are of the following form:
Check the statement from this group that you feel is most true about you:
0 I am upbeat about the future.
1 I feel slightly discouraged about the future.
2 I feel the future has little to offer for me.
3 I feel that the future is utterly hopeless.
Thirteen items cover cognitive and affective components of depression such as pessimism, guilt, crying, indecision, and self-accusations; eight items assess somatic and performance variables such as sleep problems, body image, work difficulties, and loss of interest in sex. The examinee receives a score of 0 to 3 for each item; the total raw score is the sum of the endorsements for the 21 items; the highest possible score is 63.
In a meta-analysis of BDI research studies, the internal consistency of the scale (coefficient alpha) ranged from .73 to .95, with a mean of .86 in nine psychiatric populations (Beck, Steer, & Garbin, 1988). The BDI-II possesses excellent internal consistency with a coefficient alpha of .92 (Beck, Steer, & Brown, 1996). Test–retest reliability of the BDI is modest, with a range of .60 to .83 in nonpsychiatric samples and .48 to .86 in psychiatric samples. However, the test–retest methodology is not well suited to phenomena such as depression that are naturally unstable. Subjective depression fluctuates dramatically from week to week, day to day, even hour to hour. A lackluster value for test–retest reliability might signify valid change in the construct being measured rather than unwanted measurement error.
colleagues (Lewinsohn, Munoz, Youngren, & Zeiss, 1986).
Lewinsohn observed that depression goes hand in hand with a marked reduction in the experiencing of pleasant events. Depressed persons retreat from engaging in pleasant activities; the behavioral withdrawal only contributes further to their depression, inciting a continuous downward spiral. Fortunately, it is possible to replace the downward spiral with an upward one. To help reverse the downward spiral of depression, Lewinsohn and his colleagues devised the Pleasant Events Schedule (PES; MacPhillamy & Lewinsohn, 1982). The purpose of the PES is twofold. First, in the baseline assessment phase, the PES is used to self-monitor the frequency (F) and pleasantness (P) of 320 largely ordinary, everyday events. Examples of the kinds of events listed on the PES include the following:
reading magazines
going for a walk
being with pets
playing a musical instrument
making food for charity
listening to the radio
reading poetry
attending a church service
watching a sports event
playing catch with a friend
working on my job
The frequency and pleasantness of these everyday events are both rated 0 to 2 (see footnote 6). The mean rate of pleasant activities is then calculated from the sum of the F × P scores; that is, mean rate = Σ(F × P)/320. Normative findings for mean F, mean P, and mean F × P are reported in Lewinsohn, Munoz, Youngren,
despair or exaggerate their depression can do so easily. However, for patients who are motivated to accurately report their cognitive and emotional status, the BDI-II ranks among the best instruments for indexing the presence and degree of depression. Some clinicians ask patients to complete the BDI-II after each therapy session; they use the BDI much as a physician might use a thermometer.
Self-Monitoring Procedures
A common misconception about behavior therapy is that it consists of authoritarian therapists applying powerful rewards and punishments to passive clients. Although this stereotypical model may be true for some impaired clients with limited behavioral repertoires, for the most part behavior therapy consists of humane practitioners teaching their clients methods of self-control. An emphasis upon self-monitoring is fundamental to all forms of behavior therapy. In self-monitoring, the client chooses the goals and actively participates in supervising, charting, and recording progress toward the end point(s) of therapy. According to this model, the therapist is relegated to the status of expert consultant.
Self-monitoring procedures are especially useful in the treatment of depression, a prevalent behavior disorder consisting of sad mood, low activity level, feelings of worthlessness, concentration problems, and physical symptoms (sleep loss, appetite disturbance, reduced interest in sex). Several self-monitoring programs for depression have been reported (Lewinsohn & Talkington, 1979; Rehm, Kornblith, O’Hara, & others, 1981). In order to illustrate the self-monitoring approach to the control of depression, we will summarize one small corner of the program advocated by Lewinsohn and his
6. The Frequency Scale is calibrated as follows:
0 = This has not happened in the past 30 days.
1 = This has happened a few times (1 to 6 times) in the past 30 days.
2 = This has happened often (7 times or more) in the past 30 days.
The Pleasantness Scale is calibrated as follows:
0 = This was not pleasant.
1 = This was somewhat pleasant.
2 = This was very pleasant.
hypothyroidism, heart disease) that may bear upon psychological adjustment. Axis IV is for reporting psychosocial and environmental problems (e.g., loss of friends, unemployment, litigation, no health insurance) that may impact personal functioning. Axis V consists of an anchored rating scale, the Global Assessment of Functioning (GAF) Scale, used to assign a summary score of functioning from 1 (e.g., immobilized, suicidal) to 100 (e.g., thriving, sought out). Of course, intermediate scores are available and clearly operationalized. For example, a GAF score of 70 indicates some mild symptoms but generally good psychological functioning.
Diagnosis is construed by some people as a form of pointless, overconfident pigeonholing. In truth, it serves a number of indispensable functions. As outlined by Andreasen and Black (1995), these key purposes include:
• Reducing the complexity of clinical phenomena
• Facilitating communication between clinicians
• Predicting the outcome of the disorder
• Deciding on an appropriate treatment
• Assisting in the search for etiology
• Determining the prevalence of diseases worldwide
• Making decisions about insurance coverage
Yet, for all of its advantages, there are also problems with DSM-IV. One problem is the sheer amount of time it can take to determine a multiaxial diagnosis. A second and related difficulty is that, although the DSM-IV textbook describes the diagnostic categories and alternatives with great precision, it does not specify a coherent method for arriving at the diagnosis. A third problem flows from the previous two, namely, psychiatric diagnosis is mixed in its reliability (Andreasen & Black, 1995). Interrater agreement for some diagnoses is very high (e.g., Alcohol Use Disorder) but for other diagnoses it is only moderate to low (e.g., Borderline Personality Disorder).
Several interview schedules have been developed to reduce the time needed for diagnosis and also to improve the reliability of the enterprise by standardizing the procedures. Broadly speaking, these instruments are of two types: semistructured approaches that allow for some clinician leeway in follow-up questioning, and structured approaches
and Zeiss (1986) and serve as a basis for treatment planning. Participants in the Lewinsohn program also monitor their daily mood on a simple 1 (worst) to 9 (best) basis.
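The mean rate of pleasant activities described for the PES (the sum of the frequency × pleasantness products divided by the number of items) can be sketched as follows. The function name and the four-item example are hypothetical. On the full schedule the list would hold 320 (F, P) pairs, each rated 0 to 2, so the divisor would be 320.

```python
def pes_mean_rate(ratings):
    """Mean rate of pleasant activities: sum of F x P over all items,
    divided by the number of items.

    ratings: list of (frequency, pleasantness) pairs, each rated 0-2.
    """
    if not ratings:
        raise ValueError("at least one rated item is required")
    total = 0
    for frequency, pleasantness in ratings:
        if frequency not in (0, 1, 2) or pleasantness not in (0, 1, 2):
            raise ValueError("F and P are each rated 0, 1, or 2")
        total += frequency * pleasantness  # cross-product for one event
    return total / len(ratings)


# Four illustrative items: (F, P) for each event.
example = [(2, 2), (1, 2), (0, 1), (2, 1)]
print(pes_mean_rate(example))
```

Because each product ranges from 0 to 4, the mean rate also ranges from 0 to 4. An event contributes nothing if it never occurred or was not experienced as pleasant.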
The second use of the PES is to self-monitor therapeutic progress. Based on the initial PES results, clients identify 100 or so potentially pleasant events and strive to increase the frequency of these events, monitoring daily mood along the way. Clients who increase the frequency of pleasant events generally show an improvement in mood and other depressive symptoms.
The PES is a highly useful tool for clinicians who wish to implement a self-monitoring approach to the assessment and treatment of depression. MacPhillamy and Lewinsohn (1982) report favorably on the technical qualities of the PES and discuss a variety of rational, factorial, and empirical subscales, which we cannot review here. The instrument has fair to good test–retest reliability (one-month correlations in the range of .69 to .86), excellent concurrent validity with trained observers, and promising construct validity. In general, the subscales behave as one would predict on the basis of the constructs they purport to measure; we refer the reader to MacPhillamy and Lewinsohn (1982) for details.
Structured Interview Schedules
An important responsibility for many mental health practitioners is to determine a proper psychiatric diagnosis for their patients, within prevailing guidelines. Almost without exception, practitioners utilize the Diagnostic and Statistical Manual of Mental Disorders, now in its fourth edition (DSM-IV; APA, 2000). The latest version includes a “Text Revision” and for this reason is known technically as DSM-IV-TR. Here we use the less cumbersome acronym DSM-IV. DSM-V is scheduled for release in 2013.
Five axes are included in the DSM-IV classification. Axis I concerns clinical disorders such as Alcohol Use Disorder, Panic Disorder, Major Depressive Disorder, or Schizophrenia. Axis II pertains to personality disorders such as Borderline Personality Disorder, Avoidant Personality Disorder, or Dependent Personality Disorder. Axis III is employed to identify general medical conditions (e.g.,
SCID-NP for nonpatient settings in which a current psychiatric disorder is unlikely. All of the forms follow the same format in which the interviewer reads the SCID questions to the client in sequence, the objective being to elicit sufficient information to determine whether individual DSM-IV criteria are met. The interviewer has the leeway to ask for specific examples of affirmative answers. Thus, SCID is a semistructured interview. A logical flow sheet is followed to determine the appropriate diagnosis. The SCID reveals generally good interrater agreement for DSM-IV diagnosis, but this is variable from one diagnosis to the other. In Table 8.12, we have summarized the average kappas from multiple studies of SCID reliability. Kappa values above .70 are considered good agreement, values from .50 to .69 are deemed fair, and values below .50 indicate poor agreement.
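As a rough illustration, the rule of thumb for interpreting kappa can be written as a small Python function. The function name is hypothetical, and treating a kappa of exactly .70 as good is our assumption, because the stated ranges leave that boundary open. The three sample values are taken from Table 8.12 with the decimals restored.

```python
def classify_kappa(kappa):
    """Label a kappa value using the good / fair / poor rule of thumb.

    Assumption: a value of exactly .70 is counted as "good".
    """
    if kappa >= 0.70:
        return "good"
    if kappa >= 0.50:
        return "fair"
    return "poor"

# Three average SCID kappas from Table 8.12 (decimals restored).
scid_kappas = {
    "Major Depressive Disorder": 0.79,
    "Obsessive Compulsive Disorder": 0.53,
    "Somatoform Disorder": 0.41,
}
labels = {diagnosis: classify_kappa(k) for diagnosis, k in scid_kappas.items()}
```

Under this sketch the three diagnoses are labeled good, fair, and poor, respectively, which shows how widely agreement can vary from one diagnosis to another.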
Assessment by Systematic Direct Observation
Although not a prominent approach with adults, systematic and direct observation is widely used in the evaluation of children, especially by psychologists who work in school systems. In fact, Wilson and Reschly (1996) determined that systematic observation is the single most commonly used assessment method among school-based practitioners, who reported an average of more than 15 student behavioral observations per month.
It is essential to distinguish systematic, direct observation from more casual approaches such as naturalistic observation. Anyone can engage in the informal and anecdotal methods that characterize naturalistic observation, and most people do so every day. These methods typically culminate in formless conclusions such as “Johnny seems to be out of his seat a lot during the school day.” In contrast, systematic and direct observation is highly structured and set apart by five characteristics (Hintze, Volpe, & Shapiro, 2002; Salvia & Ysseldyke, 2001):
1. The goal of observation is to measure specific behaviors.
2. The target behaviors have been operationally defined beforehand.
3. Observations are conducted under objective, standardized procedures.
that mandate a completely scripted approach. Here we will describe two prominent schedules to illustrate this important form of psychological assessment.
The Schedule for Affective Disorders and Schizophrenia (SADS; Spitzer & Endicott, 1978) is a highly respected diagnostic interview for evaluating Axis I mood and psychotic disorders. The SADS is a semistructured inquiry that includes standard questions asked of all patients and optional probes used to clarify patient responses (Rogers, Jackson, & Cashel, 2004). Additional unstructured questions can be asked to augment the optional probes. Part I of the SADS methodically examines Axis I symptoms for the current episode, including the worst period and the current week, whereas Part II provides a survey of past episodes. Through a progression of questions and criteria, the interviewer solicits sufficient information to assess the severity of disturbance and also to elucidate the diagnosis. For example, one item on the SADS addresses prominent signs of depression: pessimism and hopelessness. A standard inquiry for this item might be: “Have you felt discouraged?” An affirmative answer would trigger optional probes such as “How do you see things working out?”
Rogers (2001) has reviewed the voluminous research on reliability and validity of the SADS and offers an encouraging endorsement of the instrument. For example, the consensus from over 21 studies is that the interrater reliability for specific diagnoses is typically strong, with median kappa coefficients of greater than .85. Kappa is the index of interrater agreement, corrected for chance (Cohen, 1960). Validity for the SADS also is robust with moderate predictive validity (e.g., results moderately predict the course and outcome of mood disorders) and strong concurrent validity (e.g., results correlate with other similar schedules). A child’s version of the schedule, known as the “kiddie” SADS or K-SADS, also is available (Ambrosini, 2000).
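To make the phrase "corrected for chance" concrete, the following minimal Python sketch computes Cohen's kappa for two raters as observed agreement minus chance agreement, divided by one minus chance agreement. The function name and the ten pairs of ratings are hypothetical illustrations; they are not SADS data.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's (1960) kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical diagnoses assigned by two interviewers to the same ten patients.
rater_a = ["MDD", "MDD", "none", "MDD", "none", "none", "MDD", "none", "MDD", "none"]
rater_b = ["MDD", "MDD", "none", "none", "none", "none", "MDD", "none", "MDD", "MDD"]
kappa = cohens_kappa(rater_a, rater_b)
```

In this invented example the raters agree on 8 of 10 cases, yet half of that agreement would be expected by chance alone, so kappa is approximately .60 rather than .80.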
Finally, we would be remiss not to mention a family of instruments known as SCID, the Structured Clinical Interview for DSM-IV (First & Gibbon, 2004). SCID comes in numerous editions and variations, including SCID-I for Axis I diagnoses, SCID-II for Axis II diagnoses, SCID-P for determining the differential diagnosis of psychotic symptoms, and
This form of assessment is appealing because of its direct link to intervention. In fact, it is common to employ observational assessment before, during, and after an intervention to determine the impact on the individual student.
Commonly, systematic and direct observation is executed by means of an objective, structured coding system. Many different styles of coding systems have been proposed; we have space here only to illustrate a few popular methods. Sattler (2002) provides an extensive review, devoting two chapters to this topic. One straightforward approach is simple frequency counting of target behaviors. Typically, the target behaviors are undesirable behaviors such as a student leaving his or her seat, calling out, or being off task. Of course, the characteristics of these behaviors would be carefully specified in advance. Then an observer sits off to the side and unobtrusively records the frequency of each behavior within discrete time periods. The purpose of this kind of assessment is to objectify the extent of troublesome actions. This information serves as a baseline for later comparison to determine the effectiveness of any interventions. See Figure 8.4 for an example. In this hypothetical example, it is evident that the student “Sammy” is more out of control in the afternoon than the morning, which may be valuable information when it comes to remediation planning.
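The morning-versus-afternoon comparison for "Sammy" can be checked with a few lines of Python. This is only a sketch: the tallies are our reading of the hypothetical recording sheet in Figure 8.4 (four morning periods followed by four afternoon periods), and the names `tallies` and `session_totals` are illustrative.

```python
# Tally counts per 15-minute period, read from Figure 8.4:
# the first four periods are 9:00-10:00, the last four are 2:00-3:00.
tallies = {
    "Calling Out": [4, 3, 3, 1, 5, 6, 5, 4],
    "Leaving Seat": [2, 3, 3, 2, 5, 4, 3, 4],
    "Off Task": [4, 2, 2, 2, 2, 7, 7, 7],
}

def session_totals(counts, split=4):
    """Return (morning_total, afternoon_total) for one target behavior."""
    return sum(counts[:split]), sum(counts[split:])

totals = {behavior: session_totals(counts) for behavior, counts in tallies.items()}
morning_total = sum(morning for morning, _ in totals.values())
afternoon_total = sum(afternoon for _, afternoon in totals.values())
```

With these counts every target behavior is more frequent in the afternoon, and the afternoon hour accounts for 59 recorded events against 31 in the morning.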
Another approach to systematic, direct observation is to record the duration of target behaviors. Typically, target behaviors are undesirable actions such as temper tantrums, social isolation, or aggressive outbursts, but the focus of assessment also may include desirable behaviors such as staying on task during a designated reading period or vigilantly working on a homework assignment (Hintze, Volpe, & Shapiro, 2002). For some behaviors, duration may be more important than frequency. Consider out-of-seat behavior as an example. A third grader who is out of his seat in a morning for six brief episodes of a few seconds each is far, far less problematic, both to self and others, than a student who leaves his seat once for 10 minutes. See Figure 8.5 for an example of a duration recording sheet. In this hypothetical example, it is evident that “Susan” exhibits a high level of undesirable behavior. The goal of intervention might be to reduce both the frequency and the average duration of her tantrum behaviors.
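The total and average on a duration recording sheet follow from simple arithmetic, sketched here in Python. The five episode lengths are those shown for "Susan" in Figure 8.5; the helper names are hypothetical.

```python
# Elapsed time of each tantrum episode as (minutes, seconds), from Figure 8.5.
episodes = [(3, 0), (2, 30), (1, 15), (4, 30), (2, 45)]

def to_seconds(minutes, seconds):
    """Convert a minutes-and-seconds pair to whole seconds."""
    return minutes * 60 + seconds

def format_duration(total_seconds):
    """Format whole seconds in the style used on the recording sheet."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} min {seconds:02d} s"

durations = [to_seconds(m, s) for m, s in episodes]
total = sum(durations)
average = round(total / len(durations))  # rounded to the nearest second

total_text = format_duration(total)
average_text = format_duration(average)
```

The five episodes sum to 840 seconds (14 min 00 s) and average 168 seconds (2 min 48 s), matching the totals entered on the sheet.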
4. The times and places for observation are carefully specified.
5. Scoring is standardized and does not vary from one observer to another.
TABLE 8.12 Average SCID Interrater Agreement for Psychiatric Diagnosis

Diagnosis                              Weighted Kappa

Axis I Diagnoses
  Major Depressive Disorder            79
  Dysthymic Disorder                   63
  Bipolar Disorder                     77
  Schizophrenia                        80
  Alcohol Dependence/Abuse             90
  Other Substance Dependence/Abuse     86
  Panic Disorder                       75
  Social Phobia                        63
  Obsessive Compulsive Disorder        53
  Generalized Anxiety Disorder         66
  Post-Traumatic Stress Disorder       89
  Somatoform Disorder                  41
  Eating Disorder                      71

Axis II Personality Disorders
  Avoidant                             64
  Dependent                            66
  Obsessive Compulsive                 56
  Passive-Aggressive                   67
  Self-Defeating                       62
  Depressive                           65
  Paranoid                             68
  Schizotypal                          70
  Schizoid                             76
  Histrionic                           64
  Narcissistic                         74
  Borderline                           62
  Antisocial                           72

Note: Decimals omitted.
Source: Average results for multiple studies reported on the SCID website (www.scid4.org).
Although direct observations offer the utmost simplicity in format, it is important to recognize a number of threats to reliability and validity for this genre of assessment (Baer, Harrison, Fradenburg, Petersen, & Milla, 2005). Sattler (2002) has catalogued the sources of unreliability, which include personal qualities of the observer, poor design of instruments, and problems in obtaining a representative sample of behavior. For example, observer drift occurs when an observer becomes fatigued and less vigilant over time, thus failing to notice target behaviors when they occur. Expectations also can influence ratings, such as when the observer has been told that a child is aggressive and then records questionably aggressive acts as aggressive. The primary antidote to observer inaccuracy is careful training and cross-checking of one observer against another to demonstrate a high level of interrater agreement. With regard to poor design of instruments, the most common error is coding
complexity, in which there are too many categories or ill-defined categories. Attention to design of rating scales and pretesting of instruments will avert this problem. Problems also can arise in the suitable sampling of behavior. For example, if a child’s attentional difficulties mainly arise in the afternoon, clearly it is pointless to collect data only in the morning. Ratings
In addition to the individualized forms of direct observation that we have illustrated here, dozens of published forms also are available (e.g., Sattler, 2002, Chapters 4 and 5). For these instruments, the categories of observation and the operational definitions are prespecified, which saves time for the practitioner. For example, Shapiro (1996) has issued the Behavior Observation of Students in Schools (BOSS), a straightforward form that consists of six categories of classroom behavior: five designed for students and one for the teacher. The BOSS classifies behaviors as active engagement, passive engagement, off-task motor, off-task verbal, and off-task passive. Of course, these categories are thoroughly defined in operational terms. Direct instruction by the teacher also is recorded. The BOSS is rated in 15-second intervals for a 15-minute interval. The instrument also allows for the collection of behavioral norms for classmates to determine normative patterns in each category.
FIGURE 8.4 Example of a Frequency Recording Sheet

Date: November 10, 2005        Observer: Judy Jones
Student: Sammy Smith           Age: 8-5        Grade: 3

Target Behaviors
Calling Out: Specific episodes of interrupting teacher, calling to classmates, making noise, yelling
Leaving Seat: Separate event such as standing without permission, leaving the seat, knees on seat
Off Task: Not doing assigned work (e.g., daydreaming, playing with objects, doing other work)

Time Period     Calling Out     Leaving Seat    Off Task
9:00–9:15       ××××            ××              ××××
9:15–9:30       ×××             ×××             ××
9:30–9:45       ×××             ×××             ××
9:45–10:00      ×               ××              ××
2:00–2:15       ×××××           ×××××           ××
2:15–2:30       ××××××          ××××            ×××××××
2:30–2:45       ×××××           ×××             ×××××××
2:45–3:00       ××××            ××××            ×××××××
FIGURE 8.5 Example of a Duration Recording Sheet

Date: November 10, 2005        Observer: Judy Jones
Student: Susan Brown           Age: 8-5        Grade: 3

Time Start: 9:00               Time Stop: 12:00

Tantrum Behavior
Separate Incidents     Elapsed Time in Minutes and Seconds
1                      3 min 00 s
2                      2 min 30 s
3                      1 min 15 s
4                      4 min 30 s
5                      2 min 45 s
Total:                 14 min 00 s
Average Episode        2 min 48 s
couples, including husbands and wives seeking marital therapy (Heyman, 2001). In a standard paradigm, the clinician asks the couple to discuss two conflict areas for 5 to 7 minutes each. The clinician sits to the side observing the interactions and recording communication patterns with a standard form such as the Rapid Couples Interaction Scoring System (RCISS; Krokoff, Gottman, & Hass, 1989). The RCISS consists of 22 codes that address speaker and listener behaviors, both verbal and nonverbal, in such categories as criticism, disagreement, compromise, positive solution, questioning, humor, and smiling. Instruments of this genre typically do not reveal strong interrater agreement for specific constructs (e.g., put-downs), but the more inclusive constructs such as positive affect versus negative affect fare better and provide information that is helpful in characterizing communication patterns (Heyman, 2001). There are little or no data on the test–retest reliability of the RCISS or similar instruments, and some researchers advise caution in their use. For example, King (2001) faults the RCISS because it does not deal adequately with issues of subtext or “reading between the lines” in couples’ communication.
Ecological Momentary Assessment
Recent advances in wireless connectivity have spawned an entirely new approach to assessment known as ecological momentary assessment (EMA). Ecological
momentary assessment is defined as the “real-time measurement of patient experience in the real world, at the point of experience” (Shiffman, Hufford, & Paty, 2001). Consider the research problem of determining whether a new drug treatment is effective in ameliorating the severe pain of migraine headaches. Whereas previous research methods relied upon retrospective questionnaire reports of patients receiving a new drug treatment, an EMA approach instead would consist of patients reporting their instantaneous experiences on a handheld device, with responses immediately transmitted (via the same wireless technology used by cell phones) to a central computer for ultimate analysis with sophisticated software. For example, the handheld device might “beep” to signal that the patient should immediately respond (on a touch-sensitive
should be collected throughout the day or, if this is not possible, during the most salient time periods.
Analogue Behavioral Assessment
The methods of analogue behavioral assessment are closely related to the methods of systematic, direct observation. The main difference has to do with the settings in which the observations occur. In systematic, direct observation, the assessment of clients takes place in a natural setting such as a classroom. In analogue behavioral assessment, clients are observed in a contrived but plausible setting and also are instructed to engage in relevant tasks designed to elicit behaviors of interest (Haynes, 2001). The goal is to create a state of affairs analogous to pivotal situations in real life; hence, the use of the word analogue in describing this form of observational assessment.
Perhaps some examples will help clarify the nature and scope of this approach. One application of analogue behavioral assessment is the evaluation of children referred for assessment of behavior or school problems (Mori & Armendariz, 2001). A specialist who works with these children could dedicate a separate room in his or her clinic to analogue behavioral assessment. The room might resemble a small classroom, complete with blackboard, a few student desks, and bookcases. The referred child would be given a realistic homework assignment and told to work on it for 30 minutes while waiting for the interview. The psychologist then observes through a one-way window and records relevant behaviors using a suitable rating scale.
Analogue behavioral assessment also can be used to evaluate parent–child interactions. For example, in evaluating a 3-year-old referred for behavior problems, the clinician might place the parent and child in a room full of toys with instructions to play for 10 minutes. The psychologist then instructs the parent to tell the child, “Okay, it’s time to go. You have to pick up the toys just like you do at home.” The clinician observes through a one-way window and codes both the parental management style and the nature and degree of child compliance.
In like manner, analogue behavioral assessment has been used in the assessment of adult
expect this new technique to become commonplace in psychological outcome studies with clients.
In addition to practical applications in health care research, the EMA methodology also can be used to test psychological theories, as illustrated by a recent study of emotions. Tong, Bishop, Enkelmann, and others (2005) enlisted the cooperation of 118 police officers in Singapore to wear an ambulatory blood pressure monitor during their work day. This device also beeped at random about every 30 minutes, a signal that the officer should fill out a simple 12-item questionnaire in a palmtop as soon as possible. The items, rated on 5-point scales, included topics such as:
• How pleasant is this event?
• To what extent are you getting what you expected?
• How much personal effort is needed to deal with it?
• How much control do you have over the event?
With practice, it would take less than a minute to fill out a questionnaire of this nature. Of course, the added advantage of the EMA approach is that data are collected in naturalistic settings in real time, and, therefore, not prone to biases in recall.
In some cases, EMA provides for insights that would be difficult to achieve with any other research methodology. Consider the common belief that binge eating is maintained because it reduces negative affect, which is known as the affect regulation model (Polivy & Herman, 1993). Put simply, this is the view that people binge on food because they feel bad, and bingeing helps them feel better, at least in the short run. Because retrospective reports are notoriously untrustworthy, researchers prefer more immediate access to personal experiences in real time. Fortunately, when EMA is used with large samples of binge eaters, it is inevitable that some of the randomly requested mood reports will occur just before and just after episodes of binge eating. In a meta-analysis of 36 EMA studies including 968 participants, Haedt-Matt and Keel (2011) found that negative affect increased prior to episodes of binge eating. But they also discovered that negative affect continued to increase afterward, which fails to support a key prediction of the affect regulation model.
screen) to a series of rating scales for pain, mood, fatigue, and other relevant dimensions. The entire self-rating procedure might take less than a minute. The ratings would be requested several times a day on a randomized schedule.
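One way a randomized prompting schedule might be generated is sketched below in Python. The approach shown, dividing the waking day into equal blocks and drawing one random prompt time within each block, is an assumption made purely for illustration; the text does not describe how any particular EMA system schedules its prompts, and the function name, parameters, and sample day are hypothetical.

```python
import random
from datetime import datetime, timedelta

def random_prompt_times(start, end, n_prompts, seed=None):
    """Return n_prompts datetimes, one drawn at random from each equal block of [start, end)."""
    window = int((end - start).total_seconds())
    if n_prompts < 1 or window < n_prompts:
        raise ValueError("need at least one prompt and at least one second per prompt")
    rng = random.Random(seed)  # a fixed seed makes the illustrative schedule reproducible
    block = window // n_prompts  # length of each block in seconds
    return [
        start + timedelta(seconds=i * block + rng.randrange(block))
        for i in range(n_prompts)
    ]

# Hypothetical day: six prompts between 8:00 a.m. and 10:00 p.m.
day_start = datetime(2005, 11, 10, 8, 0)
day_end = datetime(2005, 11, 10, 22, 0)
prompts = random_prompt_times(day_start, day_end, n_prompts=6, seed=1)
```

Drawing one prompt per block keeps the signals unpredictable for the respondent while still spreading them across the whole day, which is one plausible way to obtain a representative sample of experience.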
Because EMA responses of clients are immediate and based on a schedule determined by the researcher, several biases of human recall are avoided. For example, consider the biasing effects of saliency, in which emotionally charged events dominate recall. For instance, a very brief episode of severe migraine pain may be recalled as lasting much longer than the actual experience because of the emotional valence of the incident. Whereas a retrospective questionnaire report of this pain would be affected by the salience of the event, an EMA analysis, with periodic real-time sampling of the actual pain experiences, would provide a more accurate portrayal of the episode. Recency is another recall bias that is circumvented by EMA. The recency bias refers to the fact that people are more likely to recall recent events than remote events. Potentially, this could lead to underestimation of the therapeutic effects of a drug if retrospective recall coincided with the onset of symptoms. In contrast, with an EMA analysis, client reporting consists of periodic and instantaneous time samples; the results are relatively unaffected by the recency bias.
In general, EMA provides a more accurate and reliable approach to the assessment of patient experience than traditional approaches such as retrospective questionnaires. One advantage is that compliance cannot be faked (as when patients fill out a week’s worth of daily questionnaires minutes before handing them in to the researcher). In fact, because EMA approaches are highly user-friendly, researchers report an astonishing overall compliance of 93 to 99 percent averaged across many studies (Shiffman et al., 2001). EMA has been used in research into treatments for acute pain, alcoholism, arthritis, asthma, depression, eating disorders, headaches, hypertension, gastrointestinal disorders, schizophrenia, smoking, and urinary incontinence (Shiffman & Hufford, 2001; Shiffman, Hufford, Hickcox, and others, 1997; Smyth, Wonderlich, Crosby, and others, 2001). As EMA technology becomes streamlined and more affordable, we can
C H A P T E R 9
Evaluation of Normality and Individual Strengths
In the previous chapter we surveyed tests used by psychologists to evaluate clients for a range of symptoms and life difficulties. These instruments included the mainstays of the profession such as the MMPI-2, MCMI-III, Rorschach, and TAT. Such tests might be referred to
as “clinical” in nature, because they are well suited to the needs of clinical practice. But what are practitioners to do if they want to evaluate someone who is reasonably normal? In other words, assessment does not always entail delving into symptoms, distress level, defense mechanisms, diagnosis, and the like. One example might be a young executive who wants to know about “growth edges” in regard to leadership positions. Another example might be a college student who desires self-knowledge as part of vocational explorations.
Even though clinical tests such as those surveyed in the previous chapter can be employed within the normal spectrum, they do not excel in this application. In fact, the evaluation of normal personality was not the original purpose of tests such as the MMPI or the Rorschach. For example, the initial objective of the MMPI-2 was the diagnosis of psychopathology, which remains the most dominant and effective application of the instrument. Historically, the purpose of the Rorschach has been described by Frank (1939) and others as providing an “X-ray of the mind” to identify themes hidden away from ordinary observation. Currently, the most common application of the test is with clients who display complex psychological symptoms that do not fit neatly into the categories of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV).
When a practitioner wants to assess personality within the normal spectrum, tests designed expressly for that purpose typically provide a more helpful perspective than instruments developed from the standpoint of psychopathology. Instead of measuring concepts such
TOPIC 9A Assessment Within the Normal Spectrum
Broad Band Tests of Normal Personality
Myers-Briggs Type Indicator (MBTI)
California Psychological Inventory (CPI)
NEO Personality Inventory-Revised (NEO-PI-R)
Stability and Change in Personality
Assessment of Moral Judgment
Assessment of Spiritual and Religious Concepts
human potential, an interest that has remained largely dormant in psychology since the early 1900s (Seligman & Csikszentmihalyi, 2000). A special focus in this topic is the assessment of creativity.
Broad Band Tests of Normal Personality
A broad band test is one that measures the full range of functioning, as opposed to limited aspects. Beginning in the 1940s, researchers sought to capture the nuances of normal personality by developing broad-band self-report instruments. The sheer variety of approaches to this task is a testament to the complexity of human functioning. An enduring question, related to the previous topic on theories of personality, is how best to conceptualize the multifaceted notion of personality. For example, is personality best construed as a limited number of types, with most people resembling one type or another with reasonable precision? Or, is personality best interpreted as several dimensions, with each unique individual revealing a specific level of each dimension? If a dimensional approach is preferred, how many dimensions are needed to describe the array of human responses: 5, 16, 20, or more?
There are no definitive answers to these questions, although dimensional approaches generally have prevailed over typological methods in the history of test development. Even so, useful and popular typological approaches do exist. In fact, we begin the discussion of broad-band tests with an instrument that flexibly permits both a typological and a dimensional approach to the understanding of normal personality.
Myers-Briggs Type Indicator (MBTI)
Originally published in 1962, the MBTI is a forced-choice, self-report inventory that attempts to classify persons according to an adaptation of Carl Jung’s theory of personality types (Myers & McCaulley, 1985; Tzeng, Ware, & Chen, 1989). As discussed below, recent adaptations of the test also provide dimensional scores in addition to the well-known four-letter typological codes.
as depression, paranoia, anxiety, narcissism, or suicide potential, the focus in these alternative instruments is on qualities pertinent to the normal range of human functioning. We are referring here to features like responsibility, social presence, intuition, locus of control, attachment style, or faith maturity. This chapter investigates an assortment of instruments suitable for assessment within the normal continuum and beyond.
Normality differs from abnormality by shades of gray rather than revealing a sharp demarcation (Offer & Sabshin, 1966). Understanding the various definitions of normality would involve a lengthy detour; we do not pursue the topic here. In their comprehensive textbook of psychiatry, Sadock and Sadock (2004) provide an excellent overview. Our goal here is to focus on useful tests and measures, including some that have been neglected because of the emphasis on psychopathology within the field of clinical psychology.
In Topic 9A, Assessment Within the Normal Spectrum, we explore the qualities of several tests and discuss their strengths and weaknesses. We feature a few widely used scales in this topic, including the venerable Myers-Briggs Type Indicator (Myers & McCaulley, 1985), one of the most widely employed personality tests of all time, and the California Psychological Inventory (Gough & Bradley, 1996), a measure with strong empirical roots.
In addition to their value in the assessment of client personality, tests also contribute to our understanding of both typical and atypical trajectories of personality across the life span. For this reason, we follow a key research issue in personality psychology, namely, whether personality remains stable or tends to shift in specific directions with age. We close the topic with an evaluation of tools for assessing spiritual and religious constructs.
Other forms of assessment pertinent to the normal spectrum of adult functioning also are covered in Topic 9A. We are referring here to the evaluation of spiritual, religious, and moral constructs. These specialized forms of assessment have received an increasing amount of attention in recent years.
In Topic 9B, Positive Psychological Assessment, we examine a number of relatively new scales that have emerged in response to a reawakening of interest in
Sensing–iNtuition involves two opposite ways of perceiving. Those who prefer sensing (S) rely on the immediate senses, whereas those who prefer intuition (N) rely upon “relationships and/or possibilities that have been worked out beyond the reach of the conscious mind” (Myers & McCaulley, 1985). Of course, the letter N is used to designate intuition because the letter I already is taken to label Introversion. Thinking–Feeling refers to basing conclusions on thinking (T), that is, logic and objectivity, as opposed to feeling (F), which involves a reliance on personal values and social harmony. Finally, Judging–Perceiving indicates a preference for decisiveness and closure (J) or an open-ended flexibility and spontaneity (P). Whereas in common parlance the notion of “judging” often has a negative connotation, this is not the case when the term is applied to this polarity of the MBTI.
The 16 possible four-letter types are not equally represented in the general population, and some types are more common in specific occupational groups. For example, in a sample of 231 education graduate students from a Midwestern university, the ENFP type was by far the most common (N = 43), followed by ENFJ (N = 28) in frequency. Codes beginning with the letter E (Extraversion) constituted nearly two-thirds of this sample, which highlights the importance of Extraversion in the field of education. Paraphrasing from Myers and McCaulley (1985, p. 78), the work expectations for someone who embodies the ENFP type are as follows:
• prefers to work interactively with a succession of people away from the desk
• likes to work with a succession of new problems to be solved
• prefers to provide service that is appreciated
• likes to work in changing situations that require adaptation
These qualities align well with the role expectations for people heading into the field of education.
Standardization data for the MBTI are extensive and based on large samples collected over many decades (Myers & McCaulley, 1985). One particularly useful table is a list of occupations empirically attractive to the sixteen types. For example, 18 percent of attorneys are INTJ in type, whereas only 2 percent of elementary school teachers fit this
According to the publisher, the MBTI is the most widely used individual test in history, taken by approximately 2 million people a year. Proponents of the instrument deem it valuable in vocational guidance and organizational consulting. It comes in a number of versions, including Form M, a 93-item test which can be purchased by qualified psychologists in a self-scoring paper-and-pencil format, or administered on-line. Other forms such as the 126-item Form G and the 144-item Form Q are available on-line and must be authorized by a psychologist who has agreed to a licensing arrangement with the publisher, Consulting Psychologists Press (www.cpp.com).
Regardless of the version employed, the MBTI is scored on four theoretically independent polarities: Extraversion–Introversion, Sensing–iNtuition, Thinking–Feeling, and Judging–Perceiving. The test-taker is categorized on one side or the other of each polarity, which results in a four-letter code such as ENTJ (Extraversion, iNtuition, Thinking, Judging). Because there are two poles to each of the four dimensions, this allows for 2⁴, or 16, different personality types. Each of the 16 types has been studied extensively over the years.
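The count of 16 types follows from choosing one of two letters on each of four polarities, which a short Python sketch can enumerate. The variable names are illustrative only; the letters are those given in the text.

```python
from itertools import product

# One (first pole, second pole) letter pair for each of the four MBTI polarities.
POLARITIES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

# Every four-letter code is one choice of letter from each polarity, in order.
mbti_types = ["".join(letters) for letters in product(*POLARITIES)]
type_count = len(mbti_types)
```

The list contains 2 × 2 × 2 × 2 = 16 distinct codes, including ENTJ and ENFP.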
The four polarities (E-I, S-N, T-F, J-P) do not necessarily correspond to common understandings of the anchor terms and hence require some explanation. It is also important to note that the concepts are intended to be value-neutral and merely descriptive. Thus, it is neither better nor worse to manifest Extraversion or Introversion. Likewise, Thinking and Feeling are simply different modalities and one is not better than the other, and so forth. The opposite ends of each polarity are simply different modes of being that may have a variety of implications for relationships, vocation, leadership, and personal functioning. Possessing the qualities of one polarity or the other may be advantageous (or not) in different situations.
Extraversion–Introversion is probably the easiest to describe. An extravert (E) directs energy outward to people and conversations, whereas an introvert (I) directs energy inward to his or her inner world. A note of clarification: The MBTI retains the original spelling of Extraversion, preferred by Jung, instead of using the synonymous concept of Extroversion, preferred by contemporary psychologists.
Given the stringency of the reliability approach (agreement across three administrations), these are respectable findings.
More than 400 references citing the MBTI were found in PsycINFO from 2000 to 2009, many pertaining to the validity of the instrument. For example, in a study of 177 managers, Higgs (2001) reported a significant relationship between emotional intelligence and the dominant MBTI function of iNtuition. Emotional intelligence involves monitoring the emotions of self and others and using this information to guide thinking and actions (Mayer & Salovey, 1993). A positive relationship with MBTI iNtuition is strong support for the validity of this dimension.
Another recent study also provides support for the validity of the polarities assessed by the MBTI. Furnham, Moutafi, and Crump (2003) tested 900 adults with two instruments: the MBTI and the Revised NEO-Personality Inventory (NEO-PI-R, Costa & McCrae, 1992). The NEO-PI-R is a well-validated measure of personality that evaluates five factors of personality known as the “big five.” These factors are Neuroticism, Extraversion, Openness (to experience), Agreeableness, and Conscientiousness. As predicted by the authors, the MBTI dimensions revealed healthy and appropriate correlations with corresponding factors from the NEO-PI-R. Specifically, the following averaged concurrent validity correlations were found between the MBTI dimensions and the NEO-PI-R scales: E-I correlated .71 with Extraversion; S-N correlated –.65 with Openness; T-F correlated –.35 with Agreeableness; and J-P correlated .46 with Conscientiousness. The negative correlations indicate an inverse relationship, that is, those categorized as S (Sensing) on the MBTI obtained low scores on Openness, whereas those categorized as N (iNtuition) obtained high scores on Openness. In like manner, a T or Thinking type tended to obtain low scores on Agreeableness whereas an F or Feeling type tended to obtain high scores. All of these correlations are consistent with theoretical understandings of the MBTI and hence buttress the validity of the instrument.
As mentioned, recent versions of the MBTI yield additional information beyond the four-letter typological classification. For example, the 144-item Form Q, available on-line, provides a highly detailed and sophisticated summary report that partitions
code. This is useful information for clients who take the test in search of personal or career guidance. Split-half reliabilities for the four scales are in the .80s for the combined subject pool of nearly 56,000 participants. Test–retest reliabilities for the four scales are somewhat lower and depend on the interval between tests. When the interval is short, on the order of a few weeks, results are strong, with coefficients mainly in the .70s and higher. Yet, when the interval is longer, on the order of several years, the coefficients are predictably lower, in the .40s and .50s. With regard to reliability, an important question with the MBTI is the stability of the four-letter code from test to retest. The test manual reports on a dozen studies of code type stability, with retest intervals ranging from 5 weeks to 5 years (most intervals a year or two). On average, about 41 percent of examinees retained their identical code type, that is, all four letters of the code remained the same from test to retest. About 38 percent of examinees remained stable on three of the four letters, that is, one letter changed for them. About 17 percent of examinees retained two of their four letters, but switched on the other two. And 3 percent retained only one letter, switching on the other three. Overall, these are impressive results as to the long-term stability of the MBTI code types.
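The code-stability percentages above can be condensed into two summary figures: the share of examinees who kept at least three letters, and the average number of letters retained. The Python sketch below is a back-of-the-envelope calculation on the rounded percentages quoted in the text, not a figure reported in the test manual. The shares sum to 99 percent, so the result is approximate.

```python
# Share of examinees retaining k of their 4 letters at retest
# (rounded figures quoted in the text; they sum to 0.99).
retained_share = {4: 0.41, 3: 0.38, 2: 0.17, 1: 0.03}

# Proportion keeping at least three of four letters.
at_least_three = retained_share[4] + retained_share[3]

# Approximate mean number of letters retained per examinee.
mean_letters = sum(k * share for k, share in retained_share.items())

print(f"{at_least_three:.2f}")  # 0.79
print(f"{mean_letters:.2f}")    # 3.15
```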
In a review of 17 studies reporting reliability coefficients, Capraro and Capraro (2002) found respectably strong reliability coefficients of .84 (E-I), .84 (S-N), .67 (T-F), and .82 (J-P). Salter, Forney, and Evans (2005) conducted an especially rigorous evaluation of MBTI reliability, looking at the stability of MBTI categories across three administrations with 231 graduate students in education. The three administrations were at the beginning of the first year, beginning of the second year, and end of the second year. Their report included extensive analyses, but of interest here is the percentage of respondents who received the same classification (e.g., Extraversion or Introversion) on all three occasions. The percentage who displayed complete consistency for each dimension was as follows:
• E-I 67%
• S-N 66%
• T-F 69%
• J-P 71%
One concern about the MBTI is that the increasing cost of administering the instrument (in the range of $10 to $30 per individual) provides a disincentive for outside researchers who want to conduct reliability or validity studies. This is an issue not only for the MBTI but also for the most widely used contemporary tests. Understandably, test publishers want to profit from their massive and expensive efforts at test development. But the downside is that scholarly researchers need substantial funding if they desire to administer newer versions of the MBTI to large samples of examinees. Partly in reaction to the paucity of independent research on newer versions of this test, reviewers continue to suggest caution in its use, especially when making simplistic inferences from the four-letter type formulas (Pittenger, 2005).
California Psychological Inventory (CPI)
Originally published in 1957, the CPI is a true–false test designed expressly to measure the dimensions of normal personality (Gough & Bradley, 1996; McAllister, 1988). The instrument is available in two forms, the CPI-434 (Gough, 1995) and the CPI-260 (www.skillsone.com), which is available only online. The component scales and the interpretive strategies are nearly identical for the two versions, which differ mainly in the number of items (434 versus 260). Psychometric properties of both versions are similar and strong. Because of its ease of administration and the immediacy with which the practitioner receives an extensive computer-generated report, the CPI-260 rapidly is gaining favor among psychological practitioners.
The CPI-260 is scored for 20 folk measures of personality, 7 work-related scales, and 3 broad vectors. The purpose of the test is to provide a clear picture of the examinee by using descriptors based on the ordinary language of everyday life (Gough & Bradley, 1996). Three of the basic personality scales also provide information on test-taking attitudes and therefore function as validity scales. These scales are Good Impression (Gi), which assesses the extent to which the individual presents a favorable image to others; Communality (Cm), which
each of the four polarities into five facet scores. Hence the report includes a total of 20 facet scores in addition to the four-letter code. For example, the Thinking-Feeling dimension includes bipolar facets such as Logical-Empathetic, Reasonable-Compassionate, and Tough-Tender. The dimensions and facets of this version of the MBTI are displayed in Table 9.1. The report includes not only the typological classifications (e.g., T or F) but also a rating for each bipolar facet on an 11-point continuum. This kind of nuanced dimensional information appeals to many users.
Table 9.1 Dimensions and Facets of the MBTI, Form Q

Extraversion (E)      (I) Introversion
Initiating            Receiving
Expressive            Contained
Gregarious            Intimate
Active                Reflective
Enthusiastic          Quiet

Sensing (S)           (N) Intuition
Concrete              Abstract
Realistic             Imaginative
Practical             Conceptual
Experiential          Theoretical
Traditional           Original

Thinking (T)          (F) Feeling
Logical               Empathetic
Reasonable            Compassionate
Questioning           Accommodating
Critical              Accepting
Tough                 Tender

Judging (J)           (P) Perceiving
Systematic            Casual
Planful               Open-Ended
Early Starting        Pressure-Prompted
Scheduled             Spontaneous
Methodical            Emergent
The 20 folk measures and 7 work-related scales are listed and briefly described in Table 9.2. These scales are reported as T-scores normed to a mean of 50 and a standard deviation of 10 in the
measures unusual responses that might arise from carelessness or faking bad; and Well-being (Wb), which gauges the portrayal of serious emotional problems.
Table 9.2 Brief Description of Standard and Work-Related CPI-260 Scales

Standard Scales                     Common Interpretation of High Score
Do   Dominance                      dominant, persistent, good leadership ability
Cs   Capacity for Status            personal qualities that underlie and lead to status
Sy   Sociability                    outgoing, sociable, participative temperament
Sp   Social Presence                poise, spontaneity, and self-confidence in social situations
Sa   Self-acceptance                self-acceptance and sense of personal worth
In   Independence                   high sense of personal independence, not easily influenced
Em   Empathy                        good capacity to empathize with other persons
Re   Responsibility                 conscientious, responsible, and dependable
So   Social Conformity              strong social maturity and high integrity
Sc   Self-control                   good self-control, freedom from impulsivity and self-centeredness
Gi   Good Impression                concerned about creating a good impression
Cm   Communality                    valid and thoughtful response pattern
Wb   Sense of Well-being            not worrying or complaining, free from self-doubt
To   Tolerance                      permissive, accepting, and nonjudgmental social beliefs
Ac   Achievement via Conformance    achieves well in settings where conformance is necessary
Ai   Achievement via Independence   achieves well in settings where independence is necessary
Cf   Conceptual Fluency             high degree of personal and intellectual efficiency
Is   Insightfulness                 interested in and responsive to the inner needs, motives, and experiences of others
Fx   Flexibility                    flexible and adaptable in thought and social behavior
Sn   Sensitivity                    sensitive to others’ feelings, personally vulnerable

Work-Related Scales                 Common Interpretation of High Score
Mp   Managerial Potential           good judgment, effective at dealing with people
Wo   Work Orientation               strong work ethic, rarely complains about work
Ct   Creative Temperament           creative thinker who prefers what is new or different
Lp   Leadership                     strong leadership skills, deals well with stress
Ami  Amicability                    collegial and cooperative, a good team player
Leo  Law Enforcement Orientation    practical, well suited to work in law enforcement
Source: Based on Gough, H. G. and Bradley, P. (1996). CPI manual (3rd ed.). Mountain View, CA: Consulting Psychologists Press.
Also, Megargee, E. (1972). The California Psychological Inventory handbook. San Francisco: Jossey-Bass; and McAllister, L. (1988).
A practical guide to CPI interpretation. Palo Alto, CA: Consulting Psychologists Press.
variously referred to as self-realization, psychological competence, or ego integration. In the client feedback report provided by the publisher, v.3 is referred to as Level of Satisfaction and scored 1 (low) to 7 (high). This vector acts as a moderator for each of the lifestyles, with high scores on v.3 leading to a positive expression and low scores leading to a negative expression.
Results from several correlational studies confirm distinctive psychological portraits for the four lifestyles mentioned above (Gough & Bradley, 1996). Briefly, the four lifestyles are as follows:
• Implementers (extroverted and rule-favoring) tend to do well in managerial and leadership roles.
• Supporters (introverted and rule-favoring) function well in supportive or ancillary positions.
• Innovators (extroverted and rule-questioning) are adept at creating change.
• Visualizers (introverted and rule-questioning) work best alone in fields such as art or literature.
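The 2 × 2 typology formed by v.1 and v.2 amounts to a simple lookup. The Python sketch below illustrates the mapping listed above. The function name and its boolean inputs are hypothetical: the actual CPI scoring rules (for example, the cut scores that decide each orientation) are not given here and are not reproduced.

```python
def cpi_lifestyle(toward_people: bool, rule_favoring: bool) -> str:
    """Map the two basic orientations onto one of the four CPI lifestyles.

    toward_people: True for the outward pole of v.1, False for the inward pole.
    rule_favoring: True for the rule-favoring pole of v.2, False for rule-questioning.
    Both inputs are hypothetical simplifications of the vector scores.
    """
    lifestyles = {
        (True, True): "Implementer",
        (False, True): "Supporter",
        (True, False): "Innovator",
        (False, False): "Visualizer",
    }
    return lifestyles[(toward_people, rule_favoring)]


print(cpi_lifestyle(toward_people=True, rule_favoring=False))  # Innovator
```

In the full CPI report the resulting lifestyle is then read against the v.3 level, which this sketch deliberately leaves out.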
The CPI Manual provides a wealth of information about each lifestyle, including adjective correlates obtained from spouses, peers, and professional evaluators. From these empirical sources, a clear portrait of each lifestyle emerges. For example, the summary statement for Innovators is as follows:
Gammas attend to and seek the monetary, prestige, and other rewards offered by society, but are often at odds with the culture concerning the criteria by which these rewards are apportioned. Their values are personal and individual, not traditional or conventional. Gammas [Innovators] are the doubters, the skeptics, those who see and resist the arbitrary and unjustified features of the status quo. At their best, they are innovative and insightful creators of new ideas, new products, and new social forms. At their worst, they are rebellious, intolerant, self-indulgent, and disruptive; and at low levels on the v.3 scale, they often behave in wayward, rule-violating, and narcissistic ways. (Gough & Bradley, 1996, p. 50)
general population. The test developers used an empirical methodology of criterion-keying to develop the majority of the scales. Specifically, extreme groups of participants (mainly college students) were formed on such scale-relevant criteria as school grades, sociability, and participation in curricular activities. Item-endorsement frequencies were then contrasted to ferret out the best statements for each scale. For example, the Sociability (Sy) scale was constructed by contrasting item-endorsement rates for persons reporting a large number of social activities versus those reporting few or no social activities. In constructing four of the folk scales, the authors used a rational basis backed up by indices of internal consistency.
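The logic of criterion-keying described above can be illustrated with a toy computation: for each item, compare the proportion of "true" answers in the high-criterion group with the proportion in the low-criterion group, and favor the items with the largest contrast. The Python sketch below uses invented item labels and invented responses. It shows the general idea only and is not the procedure or the data used by the CPI authors.

```python
def endorsement_rate(responses):
    """Proportion of True ("true") answers to one item."""
    return sum(responses) / len(responses)


def rank_items_by_contrast(high_group, low_group):
    """Rank items by the absolute difference in endorsement rate between groups.

    high_group, low_group: dicts mapping an item label to a list of True/False
    answers from the high- and low-criterion groups, respectively.
    """
    contrasts = {
        item: endorsement_rate(high_group[item]) - endorsement_rate(low_group[item])
        for item in high_group
    }
    return sorted(contrasts.items(), key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical responses: many social activities (high) vs. few or none (low).
high = {
    "item_a": [True, True, True, False],
    "item_b": [True, False, True, False],
    "item_c": [False, False, True, False],
}
low = {
    "item_a": [False, False, True, False],
    "item_b": [True, False, False, True],
    "item_c": [True, True, False, False],
}

ranked = rank_items_by_contrast(high, low)
print(ranked)  # [('item_a', 0.5), ('item_c', -0.25), ('item_b', 0.0)]
```

An item such as the hypothetical item_c, endorsed less often by the high group, could still be retained and keyed in the "false" direction.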
Reflecting the care with which the scales were constructed, reliability data for the CPI are respectable. Most alpha coefficients are in the .70s and .80s, with a median value of .76. The test–retest reliability coefficients tend to be somewhat lower, with a median retest correlation of .68. The authors provide a wealth of normative data, including average test scores for 52 samples of males and 42 samples of females, subdivided by education, occupation, college major, gender, and other variables. The basic normative sample consists of 3,000 males and 3,000 females of varying age, social class, and geographic region (Gough & Bradley, 1996).
In addition to the wealth of information provided by the individual scale scores, the CPI also is scored on three broad dimensions or vectors derived from decades of factor-analytic studies with the instrument. The three vectors include two basic orientations and a third theme reflecting ego integration. The first basic orientation, called vector 1 or v.1, has two polarities: toward people or toward one’s inner life. This vector is similar to the extraversion–introversion dimension found in nearly every personality theory ever proposed. The second basic orientation or v.2 also has two polarities: rule-favoring or rule-questioning. This vector reflects a conventional–unconventional dimension also found in many studies. These first two bipolar orientations, v.1 and v.2, provide a 2 × 2 typology of four lifestyles termed the Implementer, Supporter, Innovator, and Visualizer lifestyles, described below. The third vector or v.3 assesses a 7-point continuum
previous chapter. It is available in two parallel forms consisting of 240 items rated on a five-point dimension. An additional three items are used to check validity. A shorter version, the NEO Five-Factor Inventory (NEO-FFI), is also available (Costa & McCrae, 1989). We limit our discussion to the NEO PI-R. Form S is for self-reports whereas Form R is for outside observers (e.g., the spouse of a client). The item format consists of five-point ratings: strongly disagree, disagree, neutral, agree, strongly agree. The items assess emotional, interpersonal, experiential, attitudinal, and motivational variables.
The five domain scales of the NEO PI-R are each based upon six facet (trait) scales (Table 9.3). The internal consistency of the scales is superb: .86 to .95 for the domain scales, and .56 to .90 for the facet scales. Stability coefficients range from .51 to .83 in three- to seven-year longitudinal studies. Validity evidence for the NEO PI-R is substantial, based on the correspondence of ratings between self and spouse, correlations with other tests and checklists, and the construct validity of the five-factor model itself (Costa & McCrae, 1992; Piedmont & Weinstein, 1993; Trull, Useda, Costa, & McCrae, 1995).
The NEO PI-R is an excellent measure of personality that is especially useful in research. Rubenzer, Faschingbauer, and Ones (2000) describe a particularly fascinating research project with the test in which all U.S. presidents were evaluated by 115 highly informed, expert presidential biographers who filled out the NEO PI-R on behalf of the presidents, from George Washington through George H. W. Bush. The authors developed a typology of presidents from the data and related facets of the test to presidential success (i.e., historical greatness). They also published individual presidential profiles, such as the following results for George Washington (50 is average in the general population):
Neuroticism         47
Extraversion        44
Openness            39
Agreeableness       40
Conscientiousness   72
The portrait that emerges is of a leader who is well-adjusted, slightly introverted, not particu- larly open to experience, markedly disagreeable, and
The reader will notice that the third vector, v.3, moderates the expression of the Innovator lifestyle, for better or for worse. When v.3 is high, the Innovator is innovative and insightful. When v.3 is low, the Innovator is wayward and narcissistic. A similar pattern holds true for the other three lifestyles: each can have a positive or negative expression, depending on the level of personal integration reflected on the v.3 scale.
The CPI is heir to a long history of empirical research that substantiates a number of real-world correlates for distinctive test profiles. Due to space limitations, we can only list several prominent areas in which the value of the test has been empirically confirmed. The CPI is useful for helping predict the following:
• Psychological and physical health
• High school and college achievement
• Effectiveness of student-teachers
• Effectiveness of police and military personnel
• Leadership and management success
The CPI is particularly effective at identifying adolescents or adults who follow a delinquent or criminal lifestyle. For example, Gough and Bradley (1992) studied a sample of 672 delinquent or criminal men and women, contrasting their CPI scale scores with a large sample of controls. Of the 27 scales evaluated, they found significant mean differences on 25 for men and 26 for women. The most discriminating scale was Social Conformity (So), which revealed healthy point-biserial correlations of .54 for men and .58 for women. They also found that low scores on v.3 (a measure of ego integration) were associated with greater incidence of delinquency. The reader can find further details on the real-world empirical correlates of CPI profiles in Groth-Marnat (2003) and Hargrave and Hiatt (1989).
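A point-biserial correlation, such as the values reported above for the So scale, expresses the relationship between a two-category variable (e.g., control versus delinquent) and a continuous score. The Python sketch below implements the standard textbook formula on a handful of invented scores. The data and the function name are hypothetical and have no connection to the Gough and Bradley sample.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(group_flags, scores):
    """Point-biserial correlation between a 0/1 grouping and continuous scores.

    Uses r_pb = (M1 - M0) / s * sqrt(p * q), where s is the standard deviation
    of all scores (population form), p is the proportion of cases flagged 1,
    and q = 1 - p.
    """
    group1 = [s for flag, s in zip(group_flags, scores) if flag]
    group0 = [s for flag, s in zip(group_flags, scores) if not flag]
    p = len(group1) / len(scores)
    q = 1 - p
    return (mean(group1) - mean(group0)) / pstdev(scores) * sqrt(p * q)


# Hypothetical T-scores; flag 1 = control group, flag 0 = delinquent group.
flags = [1, 1, 0, 0]
so_scores = [60, 50, 40, 30]
r_pb = point_biserial(flags, so_scores)
print(round(r_pb, 3))  # 0.894
```

The result is numerically identical to an ordinary Pearson correlation computed with the grouping variable coded 0 and 1.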
NEO Personality Inventory-Revised (NEO PI-R)
The NEO Personality Inventory-Revised (NEO PI-R) embodies decades of factor-analytic research with clinical and normal adult populations (Costa & McCrae, 1992). The test is based upon the five-factor model of personality described in the
usually a safe assumption in research settings but may not hold true in forensic, personnel, or psychiatric settings.
For purposes of education and research, several psychometricians have constructed websites where it is possible to self-administer an equivalent version of the NEO PI-R. Although not identical to the commercial version of the test (Costa & McCrae, 1992), these parallel adaptations do provide estimates of examinee standing on the five broad domains and 30 subdomains of personality tested by the NEO PI-R and also provide useful narrative reports. One such site can be found at www.personalitytest.com. Another useful site is available at http://ipip.ori.org. This location hosts the International Personality Item Pool (IPIP), advertised as a “scientific collaboratory for the development of advanced measures of personality and other individual differences.” The term collaboratory was coined by Finholt and Olson (1997) to describe Internet-based arrangements that facilitate the collaboration of test specialists, regardless of geographical location. For example, the specific mission of IPIP is to bring test
extremely conscientious. After reviewing the specific facet scores (see Table 9.3), the authors concluded that Washington “falls quite short of the modern political commodities of warmth, empathy, and open-mindedness.”
The test also shows promise as a measure of clinical psychopathology. For example, Clarkin, Hull, Cantor, and Sanderson (1993) found that patients diagnosed with borderline personality disorder scored very high on Neuroticism and very low on Agreeableness, which resonates strongly with every clinician’s response to these challenging patients. Ranseen, Campbell, and Baer (1998) determined that 25 adults with attention deficit disorder scored significantly higher than controls in the Neuroticism domain and significantly lower in the Conscientiousness domain, demonstrating the usefulness of the NEO PI-R in understanding attention deficit disorders in adulthood. One minor concern about the instrument is that it lacks substantial validity scales: only three items assess validity. The administration of the NEO PI-R assumes that subjects are cooperative and reasonably honest. This is
Table 9.3 Domain and Facet (Trait) Scales of the NEO PI-R

Domains                  Facets
Neuroticism              Anxiety               Self-Consciousness
                         Angry Hostility       Impulsiveness
                         Depression            Vulnerability
Extraversion             Warmth                Activity
                         Gregariousness        Excitement Seeking
                         Assertiveness         Positive Emotions
Openness to Experience   Fantasy               Actions
                         Aesthetics            Ideas
                         Feelings              Values
Agreeableness            Trust                 Compliance
                         Straightforwardness   Modesty
                         Altruism              Tender-Mindedness
Conscientiousness        Competence            Achievement Striving
                         Order                 Self-Discipline
                         Dutifulness           Deliberation
malleability of personality. What the lay public seldom recognizes, however, is that issues of stability and change in personality can be approached with empiricism through psychological assessment. As we will see, a few tests figure prominently in lifespan developmental research, especially instruments that embody the five-factor approach (Costa & McCrae, 1992).
One question central to the field of personality psychology is whether personality remains stable throughout life, or reveals predictable shifts in certain qualities as we age. On the surface this question appears amenable to straightforward longitudinal research. Simply administer a suitable instrument to a large sample of the general population, and retest every five years or so. Then, chart the trends in dimensions of personality over the life span. But this is not as simple as it seems. One problem is selective attrition, in which less healthy individuals tend to drop out, disappear, or discontinue the project for reasons known and unknown (Barry, 2005). Although there are methodological adjustments for minimizing the impact, selective attrition nonetheless may skew results toward an unrealistically optimistic picture of trends in aging. Another problem with longitudinal research is that decades of time are needed to follow individuals over the life span. Long-term developmental research is difficult and expensive.
An alternative strategy is cross-sectional research in which a large sample of individuals of all ages (from teenagers to persons in their 90s) is tested at one point in time, allowing for immediate age comparisons in personality characteristics. This is an appealing technique but also fraught with methodological concerns. In particular, the cross-sectional strategy is vulnerable to a research problem known as cohort effects (Schaie, 2011). A cohort is a group of individuals born at roughly the same time who therefore share particular life experiences and historical influences. A cohort effect is the inference that differences between age groups (cohorts) are due to disparities in the nature and quality of early developmental or historical experiences rather than caused by the impact of aging. A hypothetical example will serve to illustrate. Suppose we observe in a cross-sectional study of neuroticism
development into the public domain and serve as a forum for the dissemination of research findings and psychometric developments.
Recently, the developers of the NEO-PI-R produced a new version that is more readable and therefore better suited to students as young as 12 years of age. The NEO-PI-3 is a careful and modest revision of the original instrument that addresses a number of problematic items difficult for adolescents and young adults to comprehend (McCrae, Costa, & Martin, 2005). As noted above, the NEO-PI-R consists of 240 items rated on a 5-point Likert scale from Strongly Agree to Strongly Disagree. The authors identified 30 items using words on a par with laissez-faire, fastidious, and adhere that even adults might find challenging. The authors rewrote these items for transparency and carefully tested them for equivalence in a new sample of 500 respondents. Three illustrations of old items and replacement items (in boldface) are shown below. These are representative only, not the actual items and revisions:
1. I feel angst about the future.
1. I feel nervous about the future.
2. I think of myself as laissez-faire.
2. I think of myself as easy-going.
3. I enjoy situations of raucous hilarity.
3. I like to laugh.
An additional 18 items were rewritten because they revealed low item-total correlations with the facet (trait) scale to which they belonged. The resulting instrument, the NEO-PI-3, retained the original five-factor structure and revealed better internal consistency and readability than the previous version. In sum, the authors improved their test, especially for applications with adolescent and college-aged clients (Costa, McCrae, & Martin, 2008).
Stability and Change in Personality
Most of us have heard adages like “People don’t change” or “Personality traits become exaggerated with age” or “You have to hit bottom before change is possible.” Opinions abound on the stability or
Attachment theory (Ainsworth & Bowlby, 1965) broadly distinguishes secure attachment from insecure attachment. In secure attachment, distressed infants seek proximity to caregivers and receive nurturance from them without pause or ambivalence. In insecure attachment, distressed infants are unable to receive a sense of security from caregivers who are themselves limited or erratic (George & Solomon, 1999). Insecure attachment is further subdivided into three types: avoidant, ambivalent, and disorganized (Main & Hesse, 1990). Volumes have been written about these styles; we can provide only the barest of outlines here. In the avoidant style, the distressed infant appears emotionally distant and the caregiver is disengaged. In the ambivalent style, the distressed infant becomes anxious, insecure, and angry, and the caregiver is inconsistent. In the disorganized style, the distressed infant seems depressed, angry, and passive, and the caregiver is extremely erratic.
Attachment theory is relevant to adult personality development because, in the words of Mitchell (2007), “Attachment status becomes personality style” (p. 97). Corresponding to the four styles of infant attachment mentioned above (secure, avoidant, ambivalent, and disorganized), the linked attachment styles in adulthood are described as secure, dismissing of attachment, preoccupied with attachment, and disorganized-fearful (Main & Solomon, 1986). Questionnaires have been developed to assess attachment style in adulthood (e.g., Simpson, Rholes, & Nelligan, 1992), but they are limited and drab in comparison to qualitative analysis of interview materials.
In the case study of Ann, Mitchell (2007) determined that Ann started her journey into adulthood (age 21) with a distinctly insecure attachment of the avoidant style. In narrative statements, Ann described a frightening childhood in which her mother died a prolonged death from cancer. This was bad enough, but compounding the trauma was that her father, previously a source of security, proved incapable of breaking the painful news to Ann, leaving it to her grandfather instead. Then, her father withdrew and became distant, which Ann experienced as even more devastating than the death of her mother. Ann developed an avoidant
(anxiety-proneness) that persons in their 70s score higher than those in their 50s. We might be tempted to attribute the apparent increase in neuroticism to the impact of aging and its attendant concerns. But that inference overlooks the possibility that the older participants in our study were always higher in neuroticism than the younger members, perhaps because their early formative years occurred during the frantic uncertainty of World War II, or for other unknown reasons. In this hypothetical example, the higher level of neuroticism would not be a general trend or result of traversing into old age, but a specific quirk of the older cohort. Again, this is a hypothetical example. Real age trends in neuroticism are reviewed below.
Yet, the proposal that historical forces can shape the personality of an entire cohort is accurate. Elder (1974) has documented historical impacts on personality in a path-breaking longitudinal study of children raised during the Great Depression (1929–1941). Among other findings, these children grew into adults who responded with habits of greater frugality than preceding or subsequent cohorts.
In studying age trends in personality, a certain degree of tentativeness is warranted, because no single study or method is conclusive. Some researchers combine longitudinal and cross-sectional methods in what is known as the cross-sequential approach (Nestor & Schutt, 2012). This method involves the longitudinal retesting of cross-sectional study participants on at least one additional occasion. The beauty of the cross-sequential method is that cohort effects can be distinguished from genuine longitudinal trends. This allows researchers to identify typical changes resulting from intrinsic maturation.
It is important to mention that core issues of personality change may not be wholly amenable to traditional methods of measurement. Consider the case study of Ann, interviewed on videotape five times over a span of 40 years, from age 21 to age 61 (Mitchell, 2007). She was one of more than 100 participants in the monumental Mills Longitudinal Study, conducted by Ravenna Helson (Helson & Soto, 2005). Mitchell (2007) analyzed the videotaped interviews of Ann through the lens of attachment theory, which we summarize briefly before returning to her story.
Individual reports of developmental trends in the Big Five factors over the life course often seem inconclusive or contradictory. In a study of 2,274 participants in their forties retested after 6 to 9 years, Costa, Herbst, McCrae, and Siegler (2000) found minimal or no change in mean level of the Big Five factors, even though popular accounts indicate that midlife is a time of crisis and turmoil. In contrast, others report that personality traits continue to transform in middle and old age, with increases in conscientiousness and agreeableness, and decreases in some elements of extraversion (Helson, Kwan, John, & Jones, 2002).
How can we reconcile these contradictory reports? Perhaps the best approach to this dilemma is a comprehensive synthesis of all relevant studies by means of meta-analysis. Meta-analysis is a sophisticated statistical procedure for combining data from multiple studies. In this method, results from studies using different measurement techniques can be transformed to a common metric, the effect size, and then combined for powerful statistical analyses (Cohen, 1988). One type of effect size is Cohen’s d, which is the mean difference on a variable between two comparison groups divided by the standard deviation of the pooled groups on that variable, or $d = (M_1 - M_2)/s_p$. While effect sizes exist theoretically on an infinite range in positive and negative directions, it is rare in everyday research that they exceed the bounds of +3.0 to −3.0, a value of 0 indicating no difference between groups. The beauty of meta-analysis is that studies using diverse tests, measuring slightly different constructs, based on varying scales of measurement, nonetheless can be transformed to the common metric of effect size and then combined for comprehensive analysis.
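To make the effect size computation concrete, the following minimal Python sketch computes Cohen's d for two groups and then averages several effect sizes across studies. The data are hypothetical. The pooled standard deviation is computed with the usual degrees-of-freedom weighting, which is one common convention and an assumption here. The function `weighted_mean_effect` uses simple sample-size weights for illustration only; published meta-analyses typically rely on more refined weighting schemes.

```python
import math
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)


def weighted_mean_effect(studies):
    """Average effect size across studies, weighting each d by its sample size.

    `studies` is a list of (d, n) pairs. This is a simplified illustration,
    not the weighting used in any particular published meta-analysis.
    """
    total_n = sum(n for _, n in studies)
    return sum(d * n for d, n in studies) / total_n


# Hypothetical scores for two comparison groups on the same scale.
older = [4, 5, 6]
younger = [2, 3, 4]
print(cohens_d(older, younger))  # 2.0

# Hypothetical effect sizes (d, sample size) from two studies.
print(weighted_mean_effect([(0.2, 100), (0.5, 300)]))
```

Because d is expressed in standard deviation units, results from instruments with different raw score scales can be placed side by side before they are combined.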
In regard to shifts in Big Five personality factors over the life course, Roberts, Walton, and Viechtbauer (2006) completed a meta-analysis of 92 longitudinal samples to determine the patterns of normative change. Their findings constitute an authoritative synthesis of research in the field. They sorted the various personality test results into six categories closely resembling the Big Five taxonomy of personality traits. Their categories are effectively identical to the Big Five, except they split extraversion into two subcategories of social dominance and social
attachment style. She feared abandonment for most of her life:
This narrative presents a set of rich characters in a plot that devolves from intimate tenderness to death, abandonment, and benign neglect. The strong-minded girl escaped, but in the process a door was closed that would not open again for nearly 40 years. (Mitchell, 2007, p. 100)
The door opened gradually after the birth of an adored daughter, four years of therapy to deal with attachment concerns, divorce, falling in love again, remarriage, return to school, and a new career. When last interviewed, Ann revealed an amazing shift to a secure attachment style:
At 61, Ann was phasing in retirement and was “much less stressed, much more easy going.” She was learning foreign languages, doing photography, involved in local politics, and often with her partner, family, and friends (p. 113).
The analysis provided by Mitchell (2007) is full of rich detail that we cannot recount here. The point of this somewhat lengthy digression into the case of Ann is that analyses based on average test scores from large groups of research participants, whether longitudinal or cross-sectional, will not capture the depth and vibrancy available from the qualitative study of individual lives in transition. Even so, empirical analyses provide a general framework for understanding stability and change in personality. Thus, we review key studies and conclusions below.
Personality Stability and Change in Middle and Late Life
Do people change in personality traits across the life course? Several researchers have sought to identify mean-level changes or normative changes that are generalizable patterns of development found in most people (Caspi & Roberts, 1999). Most commonly, investigators use the Big Five model of personality as their measurement perspective (Goldberg, 1981b). As the reader will recall, this is the view that personality is best conceived as five factors labeled neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness.
Young adulthood is when most persons leave home, find a career, and integrate with the community. The authors caution that their findings are based entirely on Western samples and generalization to non-Western cultures therefore is unknown.
Soto, John, Gosling, and Potter (2011) pursued the question of age differences in personality traits with an intriguing and massive cross-sectional research project. Their sample consisted of an astonishing 1,267,218 individuals (age 10 to 65) who responded to a Web-based questionnaire on Big Five personality traits. Their assessment instrument was the Big Five Inventory (BFI), a simple 44-item measure with excellent psychometric qualities (John, Donahue, & Kentle, 1991; John, Naumann, & Soto, 2008). The BFI is freely available to researchers for noncommercial purposes. The format of the instrument is depicted in Table 9.4. The test developers isolated two distinctive subscales, called Facet scales, for each of the Big Five domains.
vitality. The six categories they investigated were emotional stability, conscientiousness, agreeableness, social dominance, social vitality, and openness to experience. They summarize their findings as follows:
This study demonstrates that personality traits show a clear pattern of normative change across the life course. People become more socially dominant, conscientious, and emotionally stable mostly in young adulthood, but in several cases also in middle and old age. We found that individuals demonstrated gains in social vitality and openness to experience early in life and then decreases in these two trait domains in old age (Roberts et al., 2006, p. 14).
Further, they note that contrary to popular views about personality development, the biggest shifts occur not in adolescence, but in young adulthood when social role expectations are more taxing.
Table 9.4 The BFI Facet Scales: Names and Example Items
BFI Facet Scale: Example Items
Extraversion
  Assertiveness: 1. Has an assertive personality. 2. Is sometimes shy, inhibited. (R)
  Activity: 3. Is full of energy. 4. Generates a lot of enthusiasm.
Agreeableness
  Altruism: 1. Is helpful and unselfish with others. 2. Is considerate and kind to almost everyone.
  Compliance: 3. Has a forgiving nature. 4. Starts quarrels with others. (R)
Conscientiousness
  Order: 1. Tends to be disorganized. (R) 2. Can be somewhat careless. (R)
  Self-Discipline: 3. Perseveres until the task is finished. 4. Is easily distracted. (R)
Neuroticism
  Anxiety: 1. Worries a lot. 2. Remains calm in tense situations. (R)
  Depression: 3. Is depressed, blue. 4. Can be moody.
Openness to Experience
  Openness to Aesthetics: 1. Values artistic, aesthetic experiences. 2. Has few artistic interests. (R)
  Openness to Ideas: 3. Likes to reflect, play with ideas. 4. Is curious about many things.
Note: Reverse-keyed items are denoted by (R). The common stem for all BFI items is “I see myself as someone who . . . ” BFI = Big Five Inventory.
Source: Reprinted with permission from Soto, C. J., John, O. P., Gosling, S. D., & Potter, J. (2011). Age differences in personality traits from 10 to 65: Big Five domains and facets in a large cross-sectional sample. Journal of Personality and Social Psychology, 100, 330–348.
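The reverse-keyed items marked (R) in Table 9.4 must be rescored before item ratings are averaged into a facet score. The short Python sketch below illustrates the general idea under stated assumptions: a five-point agreement rating from 1 to 5, hypothetical item labels, and hypothetical responses. The table lists only example items, so this is not the complete scoring key for the BFI.

```python
def reverse_key(response, scale_min=1, scale_max=5):
    """Flip a rating so that a high score always indicates more of the trait."""
    return scale_max + scale_min - response


def facet_score(responses, reverse_items):
    """Average the item ratings for one facet after reversing the (R) items.

    `responses` maps a (hypothetical) item label to the rating given;
    `reverse_items` is the set of labels that are reverse-keyed.
    """
    scored = [
        reverse_key(rating) if item in reverse_items else rating
        for item, rating in responses.items()
    ]
    return sum(scored) / len(scored)


# Hypothetical respondent on the two Assertiveness example items.
assertiveness = {"assertive_personality": 4, "shy_inhibited": 2}
print(facet_score(assertiveness, reverse_items={"shy_inhibited"}))  # 4.0
```

A respondent who disagrees with "Is sometimes shy, inhibited" (a rating of 2) therefore contributes a 4 toward Assertiveness, the same as agreeing with "Has an assertive personality."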
level from age 20 to age 50, and then decline moderately to age 65. The higher scores for women compared to men document an established epidemiological trend in which women are more likely to manifest anxiety and depression than men (McLean, Asnaani, Litz, & Hofmann, 2011).
• Women score higher than men at all ages for Agreeableness, Conscientiousness, and Extraversion. Gender differences on Openness to Experience are complex, but at all ages men score higher than women on the Ideas facet. The interpretation of these gender differences is unclear.
The literature on age differences and longitudinal trends in Big Five personality domains is vast. We refer the reader to Helson and Soto (2005), Lüdtke, Roberts, Trautwein, and Nagy (2011), Specht et al. (2011), and Wortman, Lucas, and Donnellan (2012).
The Assessment of Moral Judgment
The Moral Judgment Scale
Kohlberg has proposed one of the few theories of moral development that is both comprehensive and empirically based (Colby, Kohlberg, Gibbs, & Lieberman, 1983; Kohlberg, 1958, 1981, 1984; Kohlberg & Kramer, 1969). Although he was more concerned with theory-based problems of moral development than with the nuances of standardized measurement, Kohlberg did generate a method of assessment that is widely used and intensely debated. We will review the underlying rationale for his measurement tool and discuss the psychometric properties of the instrument as well. In addition, we will take a brief look at a more objectively based adaptation of Kohlberg’s approach known as the Defining Issues Test (Rest, 1979; Rest & Thoma, 1985).
Stages of Moral Development
Kohlberg’s theory grew out of Piaget’s (1932) stage theory of moral development in childhood. Kohlberg extended the stages into adolescence and adulthood. In order to explore reasoning about difficult moral
Their assessment tool, the BFI, is appropriate for children and adults of any age with a fifth-grade reading level. However, for participants younger than 10 and older than 65, sample sizes were too small to provide highly reliable estimates. The minimum sample size for each year of age was 922, and at least 422 persons of each gender were included. Their study is vast and comprehensive in its conclusions. We need to keep in mind that cross-sectional age differences may not mirror longitudinal age trends. As discussed above, cohort effects always could be at play. However, Soto et al. (2011) collected their cross-sectional data over a period of 7 years, and thus were able to examine for cohort effects, which they did not find. We summarize here a few essential and remarkable findings:
• Scores for Agreeableness and Conscientiousness take a nosedive after age 10, reaching their lowest levels by far in the entire life span at age 13 and then climbing sharply into young adulthood at age 20. The popular stereotype that young teenagers are disagreeable and lacking in self-discipline rings true in this study.
• Scores on Agreeableness, Conscientiousness, and Openness to Experience all climb gradually or moderately throughout the entire span of adulthood, from age 20 to 65. Some qualities do appear to improve indefinitely with age (at least to age 65).
• Scores on Extraversion are at their highest level at age 10, drop sharply until age 15, and then remain level across the life span. The contribution of the Activity component (e.g., “Is full of energy”) appears to explain the very high scores on Extraversion at age 10. After age 15 there is essentially no change in Extraversion scores.
• Scores on Neuroticism reveal abrupt gender differences. Women outscore men, sometimes dramatically so, at all age levels. For women, scores rise sharply to their highest levels at age 15–16 and then decline at a moderate pace for the remainder of the life span. It appears that the mid-teen years are especially difficult for girls. For men, scores on Neuroticism decline moderately from age 10 to 20, remain
several different dimensions to scoring, the one element most frequently cited in research studies is the overall stage of moral reasoning that characterizes a respondent.
Critique of the Moral Judgment Scale
Early versions of the Moral Judgment Scale suffered serious shortcomings of scoring and interpretation. For example, in his doctoral dissertation, Kohlberg (1958) proposed two scoring systems: one using the sentence or completed thought as the unit of scoring, the other relying upon a global rating of all the subject’s utterances as the unit of analysis. Neither approach was fully satisfactory, and early reviews of the scale were justifiably critical of its reliability and validity (Kurtines & Greif, 1974).
In response to these criticisms, Kohlberg and his associates developed a scoring system that is
issues, he devised a series of moral dilemmas. One of the most famous is the dilemma of Heinz and the druggist:
In Europe, a woman was near death from a special kind of cancer. There was one drug that the doctors thought might save her. It was a form of radium that a druggist in the same town had recently discovered. The drug was expensive to make, but the druggist was charging ten times what the drug cost him to make. He paid $200 for the radium and charged $2000 for a small dose of the drug. The sick woman’s husband, Heinz, went to everyone he knew to borrow the money, but he could only get together about $1000 which is half of what it cost. He told the druggist that his wife was dying, and asked him to sell it cheaper or let him pay later. But the druggist said, “No, I discovered the drug and I’m going to make money from it.” So Heinz got desperate and broke into the man’s store to steal the drug for his wife. (Kohlberg & Elfenbein, 1975)
After reading or hearing this story, the respondent is asked a series of probing questions. The questions might be as follows: Should Heinz have stolen the drug? What if Heinz didn’t love his wife? Would that change anything? What if the person dying was a stranger? Should Heinz steal the drug anyway? Based on answers to this and other dilemmas, Kohlberg concluded that there are three main levels of moral reasoning, with two substages within each level (Table 9.5). One use of his measurement instrument, the Moral Judgment Scale, is to determine a respondent’s stage of moral reasoning.1
The Moral Judgment Scale consists of several hypothetical dilemmas such as Heinz and the druggist, presented one at a time (Colby, Kohlberg, Gibbs, & others, 1978). In its latest revision, the scale comes in three versions called Forms A, B, and C. Scoring is quite complex, based on the examiner’s judgment of responses in relation to extensive criteria outlined in a detailed scoring manual (Colby & Kohlberg, 1987). Although there are
1Even though the Moral Judgment Scale has been widely used for empirical research, Kohlberg (1981, 1984) suggests that its most valuable application is for the promotion of self-understanding and the development of moral reasoning in the individual respondent.
Table 9.5 Kohlberg’s Levels and Stages of Moral Development
Level 1: Preconventional
Stage 1. Punishment and obedience orientation: The physical consequences determine what is good or bad.
Stage 2. Instrumental relativism orientation: What satisfies one’s own needs is good.
Level 2: Conventional
Stage 3. Interpersonal concordance orientation: What pleases or helps others is good.
Stage 4. “Law-and-order” orientation: Maintaining the social order and doing one’s duty is good.
Level 3: Postconventional or Principled
Stage 5. Social contract-legalistic orientation: Values agreed upon by society determine what is good.
Stage 6. Universal ethical-principle orientation: What is right is a matter of conscience derived from universal principles.
Source: Based on Kohlberg (1984).
overdose of morphine to hasten her death. What should the doctor do? Three options of the following kind are listed:
______He should give the woman a fatal overdose
______Should not give the overdose
______Can’t decide
The examinee’s choice does not enter directly into the determination of the moral judgment score. The real purpose in forcing a choice is to cause the examinee to think about the importance of various factors in making the decision. Following the choice of proper action, the examinee rates the importance of several factors on a five-point Likert scale: great, much, some, little, or no importance. The factors are distinct for each dilemma. The factors differ in the level of moral judgment they signify, ranging from Kohlberg’s stage 1 through stage 6. In the case of the preceding dilemma, the factors include such matters as follows:
______ Whether the doctor can make it look like an accident
______ Can society afford to let people end their lives when they want to
______ Whether the woman’s family favors giving the overdose or not
These ratings form the basis for generating several quantitative scores that pertain to the moral judgment of the examinee. The most widely used score is the P score, which is a percentage of principled thinking. Reliability of the P score ranges from .71 to .82 in test–retest studies (Rest, 1979, 1986). Validity has been studied by contrasting groups known to differ on principled thinking. For example, graduate students in moral philosophy and political science, general college students, high school seniors, and ninth-grade students were found to differ appropriately and systematically on the P score. In longitudinal studies, significant upward trends were found over six years and four testings. Recently, Rest has recommended a new measure of moral judgment, the N2 index, calculated on the basis of several complex formulas that use both ranking and rating data. The two indices are highly correlated in the .90s. Nonetheless, in a retrospective analysis of previous studies, the N2 index outperformed the P index by a substantial margin (Rest, Thoma, Narvaez, & Bebeau, 1997).
unparalleled in its clarity, detail, and sophistication (Rest, 1986). Fortuitously, since the moral dilemmas of the Moral Judgment Scale have remained constant over the years, it is possible to apply the new scoring system to old data. The capacity to reanalyze old data and compare them with new data is invaluable in determining the reliability and validity of an existing scale. A most important study in this regard has been published by Kohlberg and associates (Colby et al., 1983).
This investigation reports the results of using the new scoring system in a longitudinal study spanning more than 20 years. The results are impressive and offer strong support for the reliability and validity of the instrument. Test–retest correlations for the three forms were in the high .90s, as were interrater correlations. Longitudinal scores of subjects tested at three- to four-year intervals over 20 years revealed theory-consistent trends. Fifty-six of 58 subjects showed upward change, with no subjects skipping any stages. Furthermore, only 6 percent of the 195 comparisons showed backward shifts between two testing sessions. The internal consistency of scores was also excellent: about 70 percent of the scores were at one stage, and only 2 percent of the scores were spread further than two adjacent stages. Cronbach’s alpha was in the mid-.90s for the three forms. These findings have been corroborated by Nisan and Kohlberg (1982). Heilbrun and Georges (1990) also report favorably upon the validity of the Moral Judgment Scale, insofar as postconventional development is correlated with higher levels of self-control, as would be predicted from the fact that morally mature persons often oppose social pressure or legal constraints. In sum, the Moral Judgment Scale is reliable, internally consistent, and possesses a theory-confirming developmental coherence.
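As a reminder of how an internal consistency coefficient such as Cronbach's alpha is obtained, the following minimal Python sketch applies the standard formula, alpha = [k/(k − 1)] × [1 − (sum of item variances)/(variance of total scores)], where k is the number of items. The ratings are hypothetical and are not drawn from the Moral Judgment Scale; the function name is ours.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of scores.

    `rows` is a list with one inner list per respondent, each holding
    that respondent's scores on the same k items (k >= 2).
    """
    k = len(rows[0])
    items = list(zip(*rows))  # regroup the scores item by item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical data: three respondents, two items that rise in lockstep.
print(cronbach_alpha([[1, 2], [2, 3], [3, 4]]))  # 1.0

# Hypothetical data: four respondents, three less consistent items.
print(cronbach_alpha([[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 4, 5]]))
```

Alpha approaches 1.0 as the items covary more strongly relative to their individual variances, which is why values in the mid-.90s are read as evidence of excellent internal consistency.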
The Defining Issues Test
The Defining Issues Test (DIT) is similar to the Moral Judgment Scale but incorporates a much simpler and completely objective scoring format (Rest, 1979, 1986). The examinee reads a series of moral dilemmas similar to those designed by Kohlberg and then chooses a proper action for each. For example, one dilemma involves a patient dying a painful death from cancer. In her lucid moments, she requests an
define what is right or moral. (Richards & Davison, 1992, p. 470)
These researchers demonstrate empirically that certain DIT items measure a different construct for conservative religious persons than for the general population. As a consequence, the validity of the test in these groups is open to question.
Relatively few studies have investigated the relationship between level of moral development on the DIT and moral behavior. This is understandable, given that the purpose of the DIT is not directly to predict behavior but to evaluate moral development. Still, it is a reasonable assumption that individuals who receive higher P scores on the DIT should also refrain from moral transgressions such as cheating on tests. A study by Cummings, Maddux, Harlow, and Dyas (2002) investigated this particular relationship by asking 145 college students majoring in education to anonymously fill out both the DIT and the Assessment of Academic Misconduct (AAM). The AAM is a 41-item measure of misbehaviors such as copying test answers, downloading term papers, retrospectively changing test answers, and so forth. Although these individuals reported an average (but prolific!) level of academic misconduct for college students, with fully three-fourths admitting to one or more transgressions, there was absolutely no relationship between scores on the DIT and scores on the AAM. Certainly, more research is needed on the connection (or disconnection) between moral reasoning and moral action.
Another concern about the DIT is the dearth of norms pertinent to minority groups. Finally, Westbrook and Bane (1992) argue that the technical manual for the DIT lacks essential details needed to evaluate the adequacy of the test. In spite of the concerns listed here, the DIT is a widely respected test, particularly for research on moral reasoning. Thoma (2006) provides a thorough review of research on the DIT.
The Assessment of Spiritual and Religious Concepts
Within the field of psychology, transcendent topics such as spiritual well-being or faith maturity never have received mainstream attention. Many years
Over 600 articles have been published on the Defining Issues Test (McCrae, 1985). In general, the instrument is considered a useful alternative to Kohlberg’s Moral Judgment Scale, particularly for research on group differences in moral reasoning. However, reviewers do note several cautions about the DIT (Westbrook & Bane, 1992). First, the test uses two moral dilemmas from the Vietnam War and is, therefore, somewhat dated. Many young examinees have little knowledge of (and perhaps no interest in) this topic and may find it difficult to identify with these questions. Another dilemma, the classic case of whether Heinz should steal a drug to save his wife’s life, is also of dubious value since it has been widely publicized and reprinted in college textbooks. A significant proportion of prospective examinees are no longer naive about this moral dilemma.
Richards and Davison (1992) have pressed the point that the DIT is biased against conservatively religious individuals. Certainly, it is well established that conservative or fundamentalist religious people tend to score lower than average on the P score of the Defining Issues Test (Getz, 1984; Richards, 1991). According to Richards and Davison (1992), the reason for this is that stage 3 and stage 4 items (unintentionally) possess strong theological implications that cause fundamentalist individuals to endorse the items, thereby lowering their score on the test. Consider items that tap stage 4 reasoning, which is the “law and order” orientation that equates “moral” with doing one’s duty and maintaining the social order. Whereas nonreligious persons might support the laws of the land (and endorse stage 4 items) because they believe that legal authorities define what is right and moral, religious minorities such as Mormons believe that supporting the laws of the land is a theological and religious obligation that flows directly from articles of faith in their religion:
While Mormons place a high value on obeying the law and supporting legal authorities, this value is due to their theological belief that God has commanded them to do so, and not because they believe, as do true Stage 4 thinkers, that the laws of the land or legal authorities
at the front door. Consider the experience of Viktor Frankl (1963), a Nazi death camp survivor and founding figure of existential psychology. At one point during World War II he had to surrender his coat with a cherished manuscript in the pockets in exchange for the worn-out rags of an inmate sent to the gas chamber:
Instead of the many pages of my manuscript, I found in a pocket of the newly acquired coat a single page torn out of a Hebrew prayer book, which contained the main Jewish prayer, Shema Yisrael. How should I have interpreted such a “coincidence” other than as a challenge to live my thoughts instead of merely putting them on paper?
In the remainder of this topic, we take the view that spiritual and religious dimensions to life often serve constructive purposes and that assessment within these domains is worthy of additional study.
Challenges and Purposes of Religious and Spiritual Assessment
Other than personal or scholarly curiosity about religious and spiritual matters, what might be the motivation for religious and spiritual assessment? Further, what is spirituality, and how is it distinguished from religiousness? It appears evident that some people can be religious without being spiritual, ghost walking through religious traditions with no involvement of heart. But is it possible to be spiritual without being religious? Before we review specific assessment tools, it will prove helpful to examine the distinction between spirituality and religiousness, and to discuss the reasons for assessment in the first place.
According to the Yearbook of American and Canadian Churches (2012), total church membership has declined steadily for many years, even though some denominations have increased in popularity. Alongside this general decline in traditional forms of worship, spiritual practices have expanded in popularity, as witnessed by the proliferation of meditation, 12-step, Eastern, yoga, and other broadly spiritual practices. For example, mindfulness meditation, with roots in Buddhism, is more
ago, Gordon Allport (1950) lamented that the subject of religion “seems to have gone into hiding” among intellectuals and academic researchers:
Whatever the reason may be, the persistence of religion in the modern world appears as an embarrassment to the scholars of today. Even psychologists, to whom presumably nothing of human concern is alien, are likely to retire into themselves when the subject is broached. (p. 1)
The situation is little improved in contemporary times. For example, except for a few specialty journals, spiritual and religious topics are virtually absent from the psychological literature.
Yet researchers have no right to retire from the field, given its significance to the average person. Consider these statistics on religious belief in the United States, stable since 1944 when national polls first came into use (Hoge, 1996):
• Belief in God has remained constant at about 92 to 95 percent of the population.
• Belief in the divinity of Jesus Christ has been endorsed by 75 to 80 percent of adults.
• Belief in an afterlife has remained at about 75 percent of the population.
Comparable statistics are not available worldwide, but it seems likely that the percentage of believing individuals (whether Muslim, Buddhist, Hindu, Jew, or other) is very high. Most people embrace a spiritual perspective in life, and surely this must have some bearing on their adjustment, behavior, and outlook.
Unfortunately, the field of psychology, including the specialty area of testing, largely has maintained an indifference to this important aspect of human experience. Worse yet, in many intellectual circles the endorsement of spiritual or religious sentiments is seen as evidence of psychopathology. Among others, Sigmund Freud endorsed a cynical view of religion in his aptly titled essay, The Future of an Illusion (1927/1961). Yet for many persons, a connection with the transcendent is essential to meaning in life. This is especially so in times of extreme duress, as when personal annihilation knocks
tools? Richards and Bergin (2005) make a strong case that clinicians need to include spiritual and religious assessment in psychotherapy. They list five reasons for a spiritual-religious assessment of clients, which include: understanding client world view and improving the capacity of the therapist to empathize; establishing the impact of spiritual-religious views on the presenting problem; determining if the spiritual-religious views of the client can be used for growth or coping; identifying which spiritual interventions might be useful in therapy; and, recognizing any spiritual doubts that need to be addressed in therapy. These benefits of spiritual-religious assessment can be extended beyond the therapeutic alliance. Even individuals who are functioning within the normal spectrum of personality will benefit from feedback about their spiritual-religious health.
Historical Overview on Spiritual and Religious Assessment
Interest in the psychology of religion can be traced to the early 1900s when William James (1902) composed his masterpiece, The Varieties of Religious Experience. In this book, James catalogued the manifold ways in which humans reveal their interest in transcendent matters. His overall conclusion was that religion is “an essential organ of our life, performing a function which no other portion of our nature can so successfully fulfill.”
Although many writers have offered psychological analyses of religion since the seminal writings of James, it was not until the 1960s that scales for the assessment of religious variables began to appear (Wulff, 1996). One of the first such measures was the Allport-Ross Religious Orientation scales, which proposed to assess two dimensions of religious expression, the intrinsic and the extrinsic (Allport & Ross, 1967). Intrinsically religious persons were thought to live their religion (e.g., to find meaning, direction, outlook), whereas extrinsically religious persons were believed to use their religion (e.g., to seek security, status, sociability). In his earlier writings on this topic, Allport referred to intrinsic religious expression as a genuine or mature religious orientation, whereas extrinsic religious expression was viewed as immature. Later he dropped the mature–immature designations because the labels seemed overly judgmental.
popular than ever (Williams & Penman, 2011). It is recommended for problems with anxiety, depression, pain, hyperactivity, sleep, parenting, stress, tinnitus, psoriasis, Parkinson’s disease; the list goes on and on. Those who practice mindfulness, for whatever initial purpose, often embrace it as a way of being in the world, a spiritual discipline.
But what is spirituality, and how is it distinguished from religiousness? Certainly the two share broad overlap in many cases, but each must possess unique qualities if assessment is to succeed. Kapuscinski and Masters (2010) review the vexing problem of definition and conclude that the terms continue to be used separately but with little agreement on meaning. Others think we are beginning to see a consensus in the field:
Despite definitional difficulties, there is agreement among researchers that individuals have the capacity to experience spirituality outside the context of religious institutions. Religion is frequently defined by institutional affiliation, whereas spirituality is not. Religion is also often considered more external or mediated by a group, whereas spirituality is more closely associated with personal experience and is less doctrinaire (Masters & Hooker, 2012, p. 2).
Heedful that definitions and distinctions will remain fuzzy, we believe there is merit in developing measures of spirituality and religiousness as separate but overlapping constructs (Hill & Pargament, 2008).
In spite of challenges with definition, efforts to develop measures of spirituality and religiousness have flourished in recent years. For example, Hill and Hood (1999) compiled information on 125 measures of spirituality/religiosity. Dozens of new scales have been developed since the release of their compendium. The Search Institute, which serves educators, parents, youth groups, faith communities, and researchers in efforts to create a better world for children, lists 18 easily accessible measures of spirituality, the majority published in recent years (www.search-institute.org). There is an abundance of available instruments.
The motivations for completing an assessment of spirituality or religiousness might include personal curiosity, but are there other purposes for these
the Intrinsic scale actually reveal higher levels of authoritarianism, close-mindedness, and prejudice toward African Americans, gays, and lesbians. Hunsberger (1995) concludes that it is not religion per se that makes for prejudice, nor is it intrinsic/extrinsic religious orientation. Instead, “it is the way in which religious beliefs are held that seems most directly associated with prejudice, and this is best explained by the tendency for fundamentalism and right-wing authoritarianism to be closely linked.” Specifically, he links prejudice against minorities with authoritarian religious traditions that promote an absolute truth, divide the world into “Good” and “Evil,” and shun complexity or doubt in their belief systems. These aspects of religious expression are not typically measured by paper-and-pencil tests.
Religion as Quest
Increasingly, the conceptual basis for the distinction between intrinsic and extrinsic religious orientation has been questioned. Kirkpatrick and Hood (1990) summarized the major theoretical and methodological criticisms of the scales as follows:
• A lack of conceptual clarity in what the Intrinsic–Extrinsic scales are supposed to be measuring. Are these types of motivation (i.e., the motives associated with religious belief and practice), or personality variables (i.e., pervasive aspects of institutional behavior or involvement), or something else?
• A confusion over the relationship between the Intrinsic–Extrinsic scales. In particular, are these opposite ends of a single bipolar dimension, or do the scales measure separate dimensions (so that conceivably some persons could score high on both)?
Other problems cited include weaknesses in the factorial structure, reliability, and construct validity of the scales; excessive reliance on a “good religion” versus “bad religion” dichotomy; and the folly of defining and studying religiousness independent of belief content (Kirkpatrick & Hood, 1990).
In response to the limitations of the Religious Orientation scales, Batson and his associates (1993) developed a measure of a third religious orientation
The impetus for development of these scales was Allport’s distressing observation of a positive relationship between religiosity (in certain forms) and authoritarian, bigoted, prejudicial attitudes. As a devoutly religious person, Allport was convinced that intrinsically oriented religious individuals rarely would harbor these attitudes. After all, an essential precept of almost every religious faith is an attitude of love toward one’s neighbors. In the Christian faith, this view is summed up in the famous dictum “Love your neighbor as yourself” (Mark 12:31). Yet the evidence was overwhelming to Allport that at least some religious individuals did reveal hatred, bigotry, and prejudice toward their neighbors. The usual targets of these malicious attitudes were racial minorities, Jews, and homosexual persons, among others. He reasoned that religious persons with intolerant attitudes possessed a predominantly extrinsic religious orientation; that is, their faith served external goals such as status in the community, belonging to an in-group, and the like. The investigation of this hypothesis (that extrinsically religious persons would be more authoritarian, bigoted, and prejudiced than intrinsically religious persons) required appropriate tools. For this purpose, Allport and colleagues developed the Religious Orientation scales.
Examples of the kinds of items on the 11-item Extrinsic scale and the 9-item Intrinsic scale are as follows:
• The church is important as a place to develop good social relationships. (Extrinsic)
• Sometimes I find it necessary to compromise my religious beliefs for economic reasons. (Extrinsic)
• I try hard to carry my religion over into other aspects of my life. (Intrinsic)
• My religion is important because it provides meaning to my life. (Intrinsic)
Although originally devised in a yes–no format, modern applications of these scales utilize a nine-point continuum from (1) strongly disagree to (9) strongly agree (Batson, Schoenrade, & Ventis, 1993).
Research on the Religious Orientation scales has not provided strong support for Allport’s original hypothesis (Wulff, 1996). In fact, several studies have shown that persons scoring high on
finding supports the view that the scale is a valid measure of something religious.
• The 32 members of a charismatic Bible study group scored significantly higher (p < .001) on the Quest scale (mean of 5.5) than the 26 members of a traditional Bible study group (mean of 4.6). The charismatic group placed emphasis on religion as a shared search; most prayed with hands raised, and some members spoke in tongues.
Quest is its own dimension of religious expression, and substantial research on the meaning and correlates of this faith orientation has been completed. Batson et al. (1993) summarize research with the Quest scale by noting that it appears to measure a religion of less faith but more works.
Quest arose as a response to the limitations of the Intrinsic and Extrinsic approach to the measurement of religious orientation. But this brief 12-item scale possesses its own limitations, chief among them its brevity and factorial simplicity. Several other instruments have been proposed to measure aspects of religious experience. We survey a few prominent and representative approaches in the following sections.
The Spiritual Well-Being Scale
The concept of spiritual well-being can be traced to a paper by Moberg (1971) that proposed this form of well-being as an essential component of healthy aging. Spiritual well-being was conceptualized as a two-dimensional construct consisting of a vertical dimension and a horizontal dimension. The vertical dimension concerned well-being in relation to God or a higher power, whereas the horizontal dimension involved existential well-being, which is a sense of purpose in life without any specific religious reference. The challenge of developing a scale to measure these components of well-being was taken up by Ellison (1983) and Paloutzian and Ellison (1982).
Their instrument was designated the Spiritual Well-Being Scale (SWB Scale). The SWB Scale consists of two subscales: Religious Well-Being (RWB), which assesses the vertical dimension of well-being in relation to God; and Existential Well-Being (EWB), which measures the horizontal dimension of well-being in relation to life purpose and life
known as Quest. These researchers consider Quest to be a more mature and flexible religious outlook than the intrinsic and extrinsic orientations. Actually, Allport recognized the elements inherent to this orientation but failed to incorporate them in his Intrinsic scale. Religion as Quest is characterized by complexity, doubt, and tentativeness as ways of being religious. Examples of the kinds of items on the 12-item Quest scale are as follows:
• My life experiences have led me to reconsider my religious convictions.
• I find religious doubts upsetting. (reverse scored)
• As I grow and mature, I expect my religious beliefs to change.
• Questions are more important to my religious faith than answers.
Items are scored on the same nine-point continuum from (1) strongly disagree to (9) strongly agree. Results are reported as an average rating. Research with 424 undergraduates interested in religion indicates that Quest is, indeed, a dimension of religious experience independent from both Intrinsic and Extrinsic orientations. Whereas Intrinsic and Extrinsic scores correlated .72, Quest revealed negligible relationships with both scales (–.05 with Intrinsic and .16 with Extrinsic).
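The scoring rule just described (reverse-score the flagged items, then report the average rating) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scale authors' scoring procedure: the function name score_likert_scale, the four-item response set, and the choice of which item is reverse scored are all hypothetical.

```python
from statistics import mean


def score_likert_scale(responses, reverse_items=(), low=1, high=9):
    """Return the mean rating after reverse-scoring the flagged items.

    responses     -- dict mapping item number to the raw rating
    reverse_items -- item numbers whose ratings are flipped
                     (on a 1-9 continuum, 1 becomes 9, 2 becomes 8, ...)
    """
    scored = []
    for item, rating in responses.items():
        if not low <= rating <= high:
            raise ValueError(f"item {item}: rating {rating} is out of range")
        if item in reverse_items:
            rating = low + high - rating  # flip around the scale midpoint
        scored.append(rating)
    return mean(scored)


# Hypothetical respondent: four ratings, with item 2 reverse scored.
example = {1: 7, 2: 3, 3: 8, 4: 6}
print(score_likert_scale(example, reverse_items={2}))
```

Reverse scoring before averaging ensures that a higher mean always points in the same direction (here, a stronger Quest orientation), regardless of how an individual item is worded.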
But exactly what does the Quest scale measure? The intention of its authors was that it assess “the degree to which an individual’s religion involves an open-ended, responsive dialogue with existential questions raised by the contradictions and tragedies of life” (Batson et al., 1993, p. 169). The three components of the Quest orientation are (1) readiness to face existential questions without reducing their complexity, (2) self-criticism and perception of religious doubts as positive, and (3) openness to change. But critics have charged that the scale may not measure anything religious at all, that instead it may assess agnosticism, anti-orthodoxy, religious doubt, or religious conflict.
In response to these criticisms, Batson et al. (1993) note the following:
• Students at Princeton Theological Seminary scored significantly higher (p < .001) on the Quest scale (mean of 6.7) than undergraduates at the same institution (mean of 5.2). This
The 35-item ASPIRES scale measures two dimensions, spiritual transcendence and religious sentiments. Spiritual transcendence is further subdivided into three facets: prayer fulfillment, universality, and connectedness. Religious sentiments consists of two facets: religious involvements and religious crisis. The overall structure of the ASPIRES scale, with descriptions of dimensions and facets, is shown in Table 9.6. Items resemble “I find a sense of peace in the quiet of my prayers” and “I follow the precepts of an organized faith.” Responses are provided on a 5-point Likert scale (strongly agree, agree, neutral, disagree, strongly disagree).
The ASPIRES scale demonstrates strong psychometric qualities. Alpha reliabilities for the facet scales range from .60 (CN) to .95 (PF) with a mean alpha of .82 (Piedmont, 2010). The normative sample consists of nearly 3,000 individuals, ages 17 to 94, from four geographic areas of the Midwestern and East Coast regions of the United States. The STS portion of the scale correlates with religious and spiritual variables and incrementally predicts (above and beyond the Big Five dimensions) relevant outcomes such as social support and prosocial behavior (Piedmont, 1999, 2001). The test holds up well cross-culturally, revealing a robust factor structure
satisfaction. Each subscale consists of 10 items that are scored from 1 (strongly disagree) to 6 (strongly agree). The items from the two subscales are combined on the SWB Scale, with odd-numbered items assessing religious well-being and even-numbered items assessing existential well-being. Some items are worded negatively; these are reverse scored so that a higher score always indicates greater well-being. Examples of SWB-like items include “My relationship with God helps me through hard times” and “Life is inherently without meaning” (reverse scored).
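The layout of the SWB Scale (20 items, odd-numbered items for religious well-being, even-numbered items for existential well-being, negatively worded items reverse scored) lends itself to a short scoring sketch. The code below is only an illustration: the function name score_swb is hypothetical, the set of negatively worded items is invented for the example because the text does not list them, and reporting each subscale as a simple sum is an assumption rather than a documented rule.

```python
def score_swb(responses, negative_items, low=1, high=6):
    """Score a 20-item SWB-style questionnaire.

    responses      -- dict mapping item number (1-20) to a rating from 1 to 6
    negative_items -- item numbers that are negatively worded (hypothetical
                      here; the actual list comes from the test materials)
    Returns subscale sums (an assumed convention) for RWB, EWB, and the total.
    """
    if sorted(responses) != list(range(1, 21)):
        raise ValueError("expected ratings for items 1 through 20")
    rwb = ewb = 0
    for item, rating in responses.items():
        if not low <= rating <= high:
            raise ValueError(f"item {item}: rating {rating} is out of range")
        if item in negative_items:
            rating = low + high - rating  # reverse score: 1 <-> 6, 2 <-> 5, ...
        if item % 2 == 1:
            rwb += rating  # odd-numbered items: Religious Well-Being
        else:
            ewb += rating  # even-numbered items: Existential Well-Being
    return {"RWB": rwb, "EWB": ewb, "SWB": rwb + ewb}


# Hypothetical respondent who answers 6 (strongly agree) to every item,
# including one negatively worded existential item (item 2).
answers = {item: 6 for item in range(1, 21)}
print(score_swb(answers, negative_items={2}))
```

The example also shows why reverse scoring matters: a respondent who agrees with everything does not receive the maximum existential score, because agreement with a negatively worded item counts against well-being.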
The Assessment of Spirituality and Religious Sentiments (ASPIRES) Scale
The Assessment of Spirituality and Religious Sentiments (ASPIRES) scale is a recent and promising measure of spiritual and religious variables (Piedmont, 2010). What makes the test unique is its predictive power above and beyond the Big Five personality factors. In other words, ASPIRES represents an extension of these well-established components into a sixth dimension of personality (Piedmont, 1999). The scale also is robust across cultures and useful within nonreligious samples, including agnostics and atheists.
Table 9.6 Structure and Description of the ASPIRES Scale (Piedmont, 2010)
Scale or Facet Name | Measure of
Spiritual Transcendence Scale (STS) | The motivational capacity to create a broad sense of personal meaning for one’s life
  Prayer Fulfillment (PF) Facet | The ability to create a personal space that enables one to feel a positive connection to some larger reality
  Universality (UN) Facet | The belief in a larger meaning and purpose to life
  Connectedness (CN) Facet | Feelings of belonging and responsibility to a larger human reality that cuts across generations and groups
Religious Sentiments Scale (RSS) | The extent to which an individual is involved in and committed to the precepts, teachings, and practices of a specific religious tradition
  Religious Involvements (RI) Facet | How actively involved a person is in performing various religious rituals and activities
  Religious Crisis (RC) Facet | Extent to which a person may be experiencing problems, difficulties, or conflicts with the God of their understanding
Source: Reprinted with permission from Brown, I. T., Chen, T., Gehlert, N. C., & Piedmont, R. L. (2012, October 8). Age and gender effects on the Assessment of Spirituality and Religious Sentiments (ASPIRES) Scale: A cross-sectional analysis. Psychology of Religion and Spirituality, online publication.
of spiritual maturity ever conceived. The Faith Maturity Scale (FMS) arose as a practical tool to serve three research purposes:
1. Provide baseline data on the vitality of faith in mainstream Protestant congregations
2. Identify the contributions of demographic, personal, and congregational variables to faith development
3. Furnish a criterion variable for evaluating the impact of religious education in mainstream denominations
The development of the scale was a time-consuming and careful process that began with a working definition:
Faith maturity is the degree to which a person embodies the priorities, commitments, and perspectives characteristic of vibrant and life-transforming faith, as they have been understood in “mainline” Protestant traditions. (Benson, Donahue, & Erickson, 1993, p. 3)
Using open-ended questionnaires with a convenience sample of 410 mainline Protestant adults, the test developers next identified eight core dimensions of faith maturity. Three advisory panels provided ongoing counsel during this stage and the next phase of item writing. These interactions assured that the scale possessed face and content validity.
The resulting FMS is a 38-item test that embodies key indicators of faith maturity in eight core areas (Table 9.7). Items are answered on a seven-point scale from 1 = never true to 7 = always true. Based upon the areas assessed, the reader will notice that right belief is only one aspect of a mature faith. In large measure, faith maturity is defined by value and behavioral consequences. As the authors note, the Faith Maturity Scale “parts company with more traditional ways of defining and measuring personal religion.” Yet it does embody the kinds of behaviors and attitudes that derive from a dynamic, life-transforming faith. These behaviors and attitudes are consistent with the theology found in most religious traditions but are especially pertinent for the particular purpose of assessing faith maturity in the Protestant context.
in diverse religious groups and cultures (Nelson & Piedmont, 2008; Piedmont, Werdel, & Fernando, 2009). The STS component of ASPIRES yields incremental validity in the prediction of treatment outcome in spiritually based programs for alcohol and drug abuse (Piedmont, 2004). These findings further support the validity of ASPIRES and also uphold the contention that spirituality supplements the Big Five personality dimensions.
In later writings, Ellison described the SWB Scale as a measure of psychospiritual personality integration and resultant well-being (Ellison & Smith, 1991). According to this view, well-being consists of “the integral experience of a person who is functioning as God intended, in consonant relationship with Him, with others, and within one’s self” (p. 36). This is the biblical notion of shalom, which denotes being harmoniously at peace within and without. If this conceptualization is correct, healthy spirituality as measured by the SWB Scale should show positive relationships with independent measures of health and subjective well-being. Literally dozens of studies have investigated this broad-range hypothesis, with generally positive findings.
The one identified shortcoming of the SWB Scale is an apparent low ceiling, especially in religious samples. Ledbetter, Smith, Vosler-Hunter, and Fischer (1991) caution that the clinical usefulness of the scale is limited to low scores (since high-functioning religious persons tend to “top out” on the scale). They also offer suggestions for revision (e.g., rewording items in more extreme directions) toward the goal of increasing the ceiling level of the SWB Scale. Bufford, Paloutzian, and Ellison (1991) have published norms for the test but caution that in many religious samples the typical individual receives the maximum score. This would indicate that the scale is helpful in research but is not useful for distinguishing among individuals with high levels of spiritual well-being.
The Faith Maturity Scale
In 1987, six major Protestant denominations undertook a national four-year study of personal faith, denominational allegiance, and their determinants (Benson, Donahue, & Erickson, 1993). Funded in part by the Lilly Endowment, this project spawned what is undoubtedly the most sophisticated measure
The FMS is scored as the mean of the 38 items, which yields a potential range of 1 to 7. The average score for 3,040 adults in five Protestant denominations was 4.63, which indicates that the instrument avoids the “ceiling effect” found on other scales such as the Spiritual Well-Being Scale, discussed previously. The estimated reliability of the scale is very robust across age, gender, occupation, and denomination, with typical coefficient alphas of .88 (Benson et al., 1993). Test–retest reliability was not reported.
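Both quantities mentioned above, a scale score computed as the mean of the item ratings and an internal consistency estimate reported as coefficient alpha, are simple to compute. The sketch below uses the standard formula for coefficient alpha, k/(k − 1) × (1 − sum of item variances / variance of total scores). The function names and the tiny three-item data set are hypothetical and serve only to illustrate the arithmetic; they are not FMS data.

```python
from statistics import mean, variance


def scale_mean(ratings, low=1, high=7):
    """Score a respondent as the mean of the item ratings (range 1 to 7)."""
    if any(not low <= r <= high for r in ratings):
        raise ValueError("rating out of range")
    return mean(ratings)


def cronbach_alpha(rows):
    """Coefficient alpha for a respondents-by-items table of ratings.

    rows -- list of lists; each inner list holds one respondent's item ratings
    """
    k = len(rows[0])                                  # number of items
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from four respondents on a three-item scale.
data = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 6],
    [4, 4, 3],
]
print([scale_mean(row) for row in data])
print(round(cronbach_alpha(data), 2))
```

Alpha approaches 1.0 when respondents who rate one item highly tend to rate the others highly as well, which is the sense in which a coefficient of .88 indicates that the items of a scale hang together.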
The validity of the scale is supported by several lines of evidence, beginning with the careful approach to item selection, by which face validity and content validity were built-in. Construct validity was demonstrated in several ways. First, it was predicted and confirmed that groups presumed to differ in levels of faith maturity would obtain significantly different mean scores on the FMS. Indeed, pastors scored the highest (5.3), followed by church education coordinators (4.9), teachers (4.7), adults (4.6), and youth (4.1), with each group in respective order scoring significantly lower than the others. Second, pastors’ ratings of the faith maturity of 123 congregation members on a 1 to 10 scale correlated very substantially (r = .61) with the FMS scores of these persons, indicating a correspondence between independent expert ratings and self-report. The scale also revealed predictive utility. Specifically, FMS scale scores were strongly related to a variety of prosocial behaviors such as donating time to help those who are poor, hungry, or sick; promoting a greater role for women in the church; and endorsing the use of foreign policy to challenge apartheid.
Table 9.7 The Eight Core Dimensions and Sample Items from the Faith Maturity Scale
A. Trusts and believes (5 items)
Every day I see evidence that God is at work in the world
B. Experiences the fruits of faith (5 items)
I feel weighed down by all my responsibilities (reverse scored)
C. Integrates faith and life (5 items)
My faith influences how I think and act every day
D. Seeks spiritual growth (4 items)
I take time to meditate or pray
E. Experiences and nurtures faith in community (4 items)
I talk with others about my faith
F. Holds life-affirming values (6 items)
I tend to be critical of other persons (reverse scored)
G. Advocates social change (4 items)
I believe the churches of this nation should get involved in political issues
H. Acts and serves (5 items)
I offer significant amounts of time to help others
Note: The sample items are similar to those on the Faith Maturity Scale.
Source: Based on Benson, P., Donahue, M., & Erickson, J. (1993). The Faith Maturity Scale: Conceptualization, measurement, and empirical validation. In M. L. Lynn & D. O. Moberg (Eds.), Research in the social scientific study of religion (vol. 5). Greenwich, CT: JAI Press.
the present). At the individual level, it is about positive individual traits: the capacity for love and vocation, courage, interpersonal skill, aesthetic sensibility, perseverance, forgiveness, originality, future-mindedness, spirituality, high talent, and wisdom. (Seligman & Csikszentmihalyi, 2000, p. 5)
Also included in positive psychology are civic virtues such as altruism, tolerance, and work ethic.
With few exceptions, clinical psychology since World War II has focused on what is wrong with people and how to alleviate or diminish a host of symptoms and syndromes. Research abounds on the assessment and treatment of anxiety, depression, serious mental illnesses, dementia, marital discord, drug abuse, mental retardation, and brain damage, to name a few areas of significant inquiry.
There is nothing inherently wrong with this extensive body of research on psychopathology. In fact, huge strides have been made in the understanding and treatment of many conditions that entail serious and crippling emotional pain or other forms of disability. Even so, this one-sided emphasis from the perspective of disease and repair has led to a relative void of positive perspectives. Consider the results of Table 9.8, which compiles the number of PsycINFO listings conjured up for a variety of terms, some pathological and some positive. The reader will notice that pathological concepts like Depression or Dementia are 50 to 100 times more likely to be the topic of inquiry than positive concepts like Resilience or Gratitude.
In recent years, a movement known as positive psychology has emerged to redress this imbalance. A simple definition of positive psychology is the scientific and practical pursuit of optimal human functioning (Lopez & Snyder, 2003). One of the founders of the movement, Martin Seligman, provides a detailed perspective on the movement:
The field of positive psychology at the subjective level is about valued subjective experiences: well-being, contentment, and satisfaction (in the past); hope and optimism (for the future); and flow and happiness (in
TOPIC 9B Positive Psychological Assessment
Assessment of Creativity
Measures of Emotional Intelligence
Assessment of Optimism
Assessment of Gratitude
Sense of Humor: Self-Report Measures
Table 9.8 Number of PsycINFO Listings for a Sampling of Pathological and Positive Terms
Pathological Term Number of Listings
Depression 130,033
Abuse 106,772
Anxiety 113,316
Schizophrenia 74,979
Brain damage 70,235
Addiction 51,969
Mental retardation 39,660
Dementia 29,860
Positive Term Number of Listings
Resilience 5,668
Optimism 4,784
Wisdom 4,712
Altruism 3,502
Genius 1,818
Courage 1,740
Forgiveness 1,667
Gratitude 751
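The imbalance shown in Table 9.8 can be quantified directly from the listing counts. The short sketch below computes the ratio for any pair of terms and for the two columns as a whole; the function names are hypothetical, and the counts are simply those printed in the table. The ratio depends heavily on which pair is compared: it runs from roughly 5 to 1 (Dementia versus Resilience) to well over 100 to 1 (Depression versus Gratitude).

```python
# Listing counts copied from Table 9.8.
PATHOLOGICAL = {
    "Depression": 130_033, "Abuse": 106_772, "Anxiety": 113_316,
    "Schizophrenia": 74_979, "Brain damage": 70_235, "Addiction": 51_969,
    "Mental retardation": 39_660, "Dementia": 29_860,
}
POSITIVE = {
    "Resilience": 5_668, "Optimism": 4_784, "Wisdom": 4_712,
    "Altruism": 3_502, "Genius": 1_818, "Courage": 1_740,
    "Forgiveness": 1_667, "Gratitude": 751,
}


def listing_ratio(pathological_term, positive_term):
    """How many times more listings the pathological term has."""
    return PATHOLOGICAL[pathological_term] / POSITIVE[positive_term]


def overall_ratio():
    """Ratio of all pathological listings to all positive listings."""
    return sum(PATHOLOGICAL.values()) / sum(POSITIVE.values())


for patho, pos in [("Depression", "Gratitude"), ("Dementia", "Resilience")]:
    print(f"{patho} vs. {pos}: {listing_ratio(patho, pos):.1f} to 1")
print(f"All terms combined: {overall_ratio():.1f} to 1")
```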
intelligence or personality (where a few instruments have risen to the top and dominate the field), in the field of creativity there are no acknowledged “gold standards” for assessment. In part, this is because of the criterion problem, that is, the difficulty in defining creativity. Thus, we begin with a foundational question: What is creativity?
Psychologists have sought to understand creativity since at least the early 1900s. For example, John B. Watson, the famous American behaviorist, suggested simplistically that a poem or brilliant essay is the mere product of shifting words around until a new pattern is hit upon (Watson, 1928). Fortunately, Watson’s simplistic views were followed by a large number of more thoughtful formulations. We have quoted below a few perspectives on creativity from eminent researchers:
• If a response is to be called original, it must be to some extent adaptive to reality (Barron, 1955, p. 553).
• We may proceed to define the creative thinking process as the forming of associative elements into new combinations that either meet specified requirements or are in some way useful (Mednick, 1962, p. 221).
• Creativity can be regarded as the quality of products or responses judged to be creative by appropriate observers, and it can also be regarded as the process by which something so judged is produced (Amabile, 1983, p. 31).
• Creativity involves bringing something into being that is original (new, unusual, novel, unexpected) and also valuable (useful, good, adaptive, appropriate) (Ochse, 1990, p. 2).
• Creativity is the ability to produce work that is both novel (i.e., original, unexpected) and appropriate (i.e., useful, adaptive concerning task constraints) (Sternberg & Lubart, 1999, p. 3).
• Creativity is a specific capacity to not only solve problems but to solve them originally and adaptively (Feist & Barron, 2003, p. 63).
• Creativity is the ability to come up with ideas or artifacts that are new, surprising, and valuable (Boden, 2004, p. 1).
These conceptual definitions emphasize novelty and usefulness of the creative product, but also
In sum, positive psychology is a broad movement linked by the focus on life-affirming concepts. The goal is to bring balance to psychology by helping to build human strengths.
An important element of this movement is positive psychological assessment, which can be defined as the measurement of specific human strengths such as those mentioned above. After all, if a psychological movement proposes to increase human strengths and virtues, it is also obligated to develop measurement approaches for purposes of research and assessment. In recent years, psychologists have paid increasing attention to positive forms of assessment, resulting in dozens of new instruments and approaches. In their path-breaking edited book on positive psychological assessment, Lopez and Snyder (2003) compiled 24 chapters, each detailing several instruments. In other words, there are now hundreds of instruments available for positive psychological assessment. Some of the constructs measured with psychological tests include hope, emotional intelligence, optimism, romantic love, empathy, forgiveness, gratitude, and wisdom-related performance, to name just a few.
A comprehensive review of positive psychological assessment would entail a textbook in its own right (if not several). The best we can do here is focus on a few key areas of assessment with a small number of tests that illustrate important or interesting approaches to positive psychological assessment. In particular, we will review issues involved in the assessment of creativity, emotional intelligence, optimism, hope, forgiveness, and gratitude.
Assessment of Creativity
The topic of creativity has fascinated and yet also vexed psychologists and educators for more than a century. Researchers are beginning to understand fundamental elements common to many forms of creativity, yet a simple definition of creativity remains elusive, and its assessment continues to be problematic. It is no exaggeration to state that hundreds of tests of creativity have been published. Some of these instruments possess respectable psychometric qualities, but most are of questionable validity. Unlike other fields of assessment such as
domains typically in the range of 5 to 10 (Carson, Peterson, & Higgins, 2005; Kaufman, Cole, & Baer, 2009; Ivcevic & Mayer, 2009). The study by Kaufman (2012) is representative, and we provide modest details here. His investigation was based on the commonsense view that layperson perceptions of constructs like intelligence, wisdom, personality, or creativity, when analyzed collectively, embody some degree of practical wisdom (Sternberg, 1985). Participants were 2,318 college students asked to rate an initial collection of 94 items as follows:
Compared to people of approximately your age and life experience, how creative would you rate yourself for each of the following acts? For acts that you have not specifically done, estimate your creative potential based on your performance on similar tasks (Kaufman, 2012, p. 300).
Students rated themselves on a 5-point Likert scale from 1 (much less creative) to 5 (much more creative) on each item. The items were gleaned from several prior research projects. The 94 items coalesced into five factors (from factor analysis), which provided a basis for reducing the scale to 50 items organized into 5 domains of about 10 items each. The emergent domains were the following:
• Self/Everyday: Successfully dealing with problems in self and others, teaching creatively. Items resemble “Helping friends deal with difficult problems.”
• Scholarly: Effectively analyzing problems and coming up with new and creative ideas. Items resemble “Finding a new way to think about old problems.”
• Performance: Successfully composing lyrics and singing a new song in public. Items resemble “Making up lyrics and melody for an amusing song.”
• Mechanical/Scientific: Efficiently solving a scientific or mechanical problem. Items resemble “Designing and conducting a scientific experiment.”
• Artistic: Productively drawing or painting a landscape or still life. Items resemble “Crafting a sculpture or piece of pottery.”
suggest that creativity is a particular kind of process as well. On these elements, there is broad agree- ment in the field of creativity research. However, going from conceptual definitions to operational definitions has proved to be difficult, to say the least. Prentky (2001) notes that “what creativity is, and what it is not, hangs as the mythical albatross around the neck of scientific research on creativity” (p. 97).
Relevant to assessment, one controversy over- shadows the study of creativity. This is the question whether creativity is general or domain-specific in nature. Kaufman and Baer (2004) articulate the con- cern as follows:
Is there perhaps something we might label c, analogous to the g of intelligence, that tran- scends domains and enhances the creativ- ity of a person in all fields of endeavor? And does it make sense to call someone “creative,” or should attributions of creativity always be qualified in some way (e.g., “a creative story- teller” or “a creative mathematician,” but not “a creative person”)? (p. 4).
In their lengthy review chapter, Kaufman and Baer (2004) acknowledge the complexity of the spe- cific versus general debate, noting that the answer hinges on the definition of creativity and the assess- ment methods employed. But they also render a fi- nal conclusion that that the evidence for c (general creativity) is weak. We agree with their verdict that creativity appears to be domain-specific.
What, then, is the best way to partition the do- mains of creativity? One answer might be to claim that there are as many domains of creativity as there are fields of inquiry or expression, whether in sci- ence, art, economics, service, leadership, entrepre- neurship, or whatever. But this anarchical response rings hollow. People who are creative in one field typically reveal talent in closely allied fields as well. Gifted writers usually can be good poets, if they choose, and vice versa. A creative scientist might ex- cel at mechanical problem-solving as well. The num- ber of domains must be somewhere between huge (nearly infinite), and small (a handful). But creativ- ity is not a single general factor.
Several investigators have derived empirical classifications of creativity, with the number of
verbal associations. The RAT is a timed, 40-minute paper-and-pencil test with interitem reliability consistently above .90 (Mednick & Mednick, 1966). Some examples of the kinds of items encountered on the RAT follow:
rat–blue–cottage __________________
out–dog–cat __________________
wheel–electric–high __________________
surprise–line–birthday __________________
For each triplet, the examinee must find a fourth word that “fits” in the sense of having reasonable (but often remote) associations to the other three words. (The correct answers above are cheese, house, wire, and party.) Competent performance on this test would appear to require a capacity to examine several novel or remote associations at the same time and to search for the one association that is common to all three stimulus words.
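The scoring logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four items and their keyed answers come from the examples in the text, but the function name `score_rat`, the dictionary layout, and the rule of ignoring case and surrounding spaces are hypothetical conveniences, not features of the published test.

```python
# Keyed answers for the four sample triplets given in the text.
ANSWER_KEY = {
    ("rat", "blue", "cottage"): "cheese",
    ("out", "dog", "cat"): "house",
    ("wheel", "electric", "high"): "wire",
    ("surprise", "line", "birthday"): "party",
}


def score_rat(responses):
    """Count responses that match the keyed fourth word.

    `responses` maps each stimulus triplet to the examinee's answer.
    Case and surrounding whitespace are ignored (an assumption of this sketch).
    """
    correct = 0
    for triplet, keyed_word in ANSWER_KEY.items():
        given = responses.get(triplet, "")
        if given.strip().lower() == keyed_word:
            correct += 1
    return correct


# Hypothetical examinee who misses the third item.
example = {
    ("rat", "blue", "cottage"): "Cheese",
    ("out", "dog", "cat"): "house",
    ("wheel", "electric", "high"): "chair",
    ("surprise", "line", "birthday"): " party ",
}
print(score_rat(example))  # 3
```

Because each item has a single keyed answer, the RAT is scored like a convergent test even though it is intended to tap remote associations.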
Validity studies of the RAT have been mixed in outcome. Early studies were promising and indicated that high RAT-scorers tended to receive higher ratings for the creativity of their products (e.g., architectural designs, research projects, suggestions, and drawings) than low scorers (Mednick & Mednick, 1966). One early study showed that high RAT-scoring scientists tended to write more research proposals, to win more research grants, and to win bigger grants than lower scorers (Gordon & Charanian, 1964). However, later studies indicated complex patterns between RAT scores and other creativity indices. For example, Andrews (1975) found that RAT scores predicted the innovativeness of research for medical sociologists only for a small subsample of the respondents whose environment provided certain “prerequisites” for achieving payoff from creative ability. Specifically, among those researchers who were responsible for initiating new activities, who hired their own research assistants, who had stable employment and low interference from superiors, the correlation between RAT scores and innovativeness of research was a healthy +.55. But these researchers constituted less than a fourth of the sample; for the remainder of the subjects there was no relationship between the RAT and creativity. These complex and contradictory findings are typical of research on the assessment of creativity.
The new instrument, called the Kaufman Domains of Creativity Scale (K-DOCS), demonstrated strong psychometric qualities, with internal consistency coefficients of .83 to .86 and test–retest reliabilities (132 participants retested after two weeks) of .78 to .86. In addition to a clear-cut five-factor structure for the test, further evidence of validity was found in the domain scale correlations with Big Five personality dimensions, which were theoretically sensible. For example, Openness to Experience correlated significantly with all creativity domains except Mechanical/Scientific (Kaufman, 2012).
We turn now to a brief discussion of instruments for the assessment of creativity. Over the years, creativity has been studied in terms of cognitive processes, personal characteristics, and behavioral products (Batey & Furnham, 2006). We will review these approaches in turn and examine the assessment methods that each has spawned.
Creativity as Process
Several theorists and researchers have focused on underlying cognitive processes in their understanding of creativity. Of historical interest is Wertheimer’s (1945) suggestion that creativity arises when the thinker grasps the essential features of a problem and their relation to a final solution, the so-called “aha!” phenomenon. Wallas (1926) theorized that such insights often occur after a period of incubation wherein the unconscious mind rearranges the features of the puzzle even while the conscious mind takes “time off” from the problem.
Mednick (1962) proposed that creativity is the capacity to combine remote associations. According to this view, creativity is a matter of novel arrangements of unusual associations to a given stimulus. Consider the invention of the grain reaper by McCormick, based on the association between grain and hair (Weber, 1969). It occurred to the inventor that grain is like the hair on a person’s head. Since mechanical clippers are used to cut hair, something like hair clippers could be used to cut grain. We see in this example how a creative invention was developed from a remote association.
Based on his process-oriented view of creativity, Mednick (1962) developed the Remote Associates Test (RAT), a clever index of the remoteness of
and sensitive, but also embrace negative terms such as argumentative, cynical, egotistical, impulsive, rebellious, and unconventional. These qualities fit well with the observation of Feist (1999):
One of the most distinguishing characteristics of creative people is their desire and preference to be somewhat removed from regular social-contact, to spend time alone working on their craft . . . to be autonomous and independent of the influence of a group. (p. 158)
In addition to the broad generalizations noted above, the particular link between personality characteristics and creative behavior also depends on the specific domain of investigation. For example, compared with their less creative counterparts, creative artists tend to be more spontaneous, creative writers tend to be more nonconforming, creative architects tend to be less flexible, and creative engineers tend to be better adjusted than other groups (Piirto, 1998). In attempting to predict creative behavior from personality characteristics, one creative personality type may not fit all creative occupations (Kerr & Gagliardi, 2003). Batey and Furnham (2006) provide an excellent review of the complex literature on creativity and personality.
Recently, Sternberg (2002) has proposed that creative individuals are distinguished not so much by specific traits as by the heartfelt decision to be creative:
I believe that, although creative people differ in an astonishing number of ways, there is, in fact, one key attribute that they all possess. . . . This attribute is the decision to be creative. People who create decide that they will forge their own path and follow it, for better or for worse. The path is a difficult one because people who defy convention often are not rewarded. (p. 376)
This perspective suggests that creative individuals will be characterized by a stubborn dedication to their creative endeavors, even when rewards for their activities seem to be lacking.
The opinion that creativity resides within qualities of the person continues to be popular.
Ochse (1990) provides a thorough appraisal of RAT validity. He concludes that the test may predict scores on tests of verbal fluency, but fails to predict creativity in general. In other words, the RAT is not so much a general measure of creativity as a specialized measure of verbal intelligence. Recently, Bowden and Jung-Beeman (2003) published extensive normative data for RAT-type items. Based on 289 university students, their normative data consist of percentage correct for 144 items under four time limits (2, 7, 15, and 30 seconds). They recommend using these normative data to investigate process factors such as incubation, the impact of hints, and techniques to facilitate problem solving.
Creativity as Personal Characteristics
Guilford (1950) was one of the first researchers to define creativity in terms of the person when he asserted that “creativity refers to the abilities that are most characteristic of creative people.” His pronouncement helped inspire an expansion of research on the personal characteristics of creative persons. Much of this research has relied upon contrasts of peer-nominated high- and low-creative persons in various professions (Barron, 1968; Martindale, 1981). In this methodology, colleagues within a field of study nominate other individuals who are high and low in creativity, and their consensus view is used to identify two select groups of individuals (high-creative, low-creative). These groups are then contrasted on personality measures, including self-checked adjectives and standard personality inventories.
Based on hundreds of studies, a fairly stable set of core characteristics of creative persons has emerged (Barron & Harrington, 1981; Dellas & Gaier, 1970). Interestingly, the distinguishing characteristics of creative individuals appear to be largely temperamental, although a certain minimum level of intelligence also is required. Harrington (1975) has captured a not altogether flattering portrait of the creative person in his Composite Creative Personality Scale, which consists of 42 self-checked adjectives (from a larger list) that empirically distinguish creative from noncreative persons. These adjectives include many positive terms such as active, curious, imaginative, inventive, original, resourceful,
• Transcendence of constraints: A product transcends constraints when it goes beyond the traditional.
• Coalescence of meaning: The value of creative products may not be apparent at first; the full significance may only be appreciated with time.
The Jackson and Messick (1968) criteria have proved helpful in delineating the special characteristics of a creative outcome, but they do not constitute a psychological measure of creativity. For measures of creativity based on the product-oriented approach, we must examine the seminal studies of Joy Paul Guilford and the various tests inspired by his factor-analytic research.
As the reader will recall from an earlier chapter, Guilford (1959, 1985) formulated a structure of
From this perspective, self-report measures are the natural and preferred assessment method (Silvia, Wigert, Reiter-Palmon, & Kaufman, 2012). Table 9.9 summarizes a few promising instruments.
Creativity as Product
The most enduring definitions of creativity have used the product as the distinguishing sign of this capacity. According to this approach, creative persons create products (ideas, inventions, writings, artistic outputs, etc.) that meet certain criteria. For example, Jackson and Messick (1968) applied four criteria to creativity:
• Novelty: Creative products are new, or at least represent a new application of the familiar.
• Appropriateness: The product must be appropriate to the context, not merely novel.
Table 9.9 Self-Report Measures of Creativity
Biographical Inventory of Creative Behaviors (BICB) (Batey, 2007)
Based on the implicit assumption that creativity is a general attribute, the BICB consists of 34 items rated yes/no by the respondent. Items consist of behaviorally anchored creative accomplishments that the respondent was “actively involved in” over the last 12 months. Results range from 0 to 34, yielding a single overall score without subscales. Higher scores indicate greater creativity. Domain coverage is broad. Items resemble the following: written a poem, painted a picture, devised a recipe, coached a team, held an office. The scale possesses good internal consistency (α = .74) and correlates appropriately with other measures of creativity (Furnham, Batey, Anand, & Manfield, 2008).
Creative Achievement Questionnaire (CAQ) (Carson, Peterson, & Higgins, 2005)
Innovative in its measurement approach, the CAQ assesses creativity in 10 domains: Visual Arts, Music, Dance, Architectural Design, Creative Writing, Humor, Inventions, Scientific Discovery, Theater and Film, and Culinary Arts. Although an overall score can be obtained, the implicit assumption of the test is that creativity is domain specific. Hence, a high score in one domain is sufficient to demonstrate creativity. Each domain consists of eight items, numbered 0 through 7, representing increasing levels of creative achievement. Most items are binary, but higher numbered items in each domain require a numerical entry. For example, item 7 in Creative Writing might request the number of stories published in literary sources. The entry for this item (for example, “3”) is multiplied by the item number to obtain the score (7 × 3 = 21). This inventive scoring approach allows for the detection of persons with exceptional creativity in one or more domains.
Revised Creative Domain Questionnaire (CDQ-R) (Kaufman, Cole, & Baer, 2009)
Simple but effective in its format, the CDQ-R consists of 21 items in four domains: Drama (e.g., acting, dancing, writing), Math/Science (e.g., chemistry, logic, computers), Arts (e.g., crafts, design, painting), and Interaction (e.g., leadership, selling, teaching). Respondents are asked to self-rate their creativity in each activity. Items are completed on a six-point scale (no midpoint) ranging from Not at all creative to Extremely creative. The four domain scores are averaged to obtain an overall creativity score. The scale possesses reasonable reliability, with internal consistencies of .71 to .76 for the domains and .82 for the overall scale. Unlike measures of creative accomplishments, which are typically skewed, the four domain scores and the overall score reveal approximately normal distributions. Regarding validity, the CDQ-R domain scores reveal theoretically appropriate correlations with Big Five personality dimensions (e.g., Openness to Experience correlates with all four domains; Extraversion correlates with Drama but not Math/Science).
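The scoring rules summarized in Table 9.9 are simple arithmetic, and a short Python sketch may make them concrete. The function names are hypothetical, and two details are assumptions of this sketch rather than statements from the table: that an endorsed binary CAQ item contributes its own item number, and that item scores are summed within a CAQ domain.

```python
def bicb_total(answers):
    """BICB: number of 'yes' answers across the 34 yes/no items (range 0 to 34)."""
    if len(answers) != 34:
        raise ValueError("the BICB has 34 items")
    return sum(1 for answer in answers if answer)


def caq_item_score(item_number, entry=1):
    """CAQ: item number multiplied by the respondent's entry.

    For binary items, entry is 1 if endorsed and 0 if not (assumption);
    for higher-numbered items, entry is the count written in by the respondent.
    """
    return item_number * entry


def caq_domain_score(entries):
    """Sum of item scores in one domain (summing is an assumption of this sketch).

    `entries` maps item number (0 through 7) to the respondent's entry.
    """
    return sum(caq_item_score(number, entry) for number, entry in entries.items())


def cdqr_overall(domain_scores):
    """CDQ-R: the overall score is the mean of the four domain scores."""
    if len(domain_scores) != 4:
        raise ValueError("the CDQ-R has four domains")
    return sum(domain_scores.values()) / 4


# Worked example from the table: item 7 with an entry of 3 scores 7 x 3 = 21.
print(caq_item_score(7, 3))  # 21

# Hypothetical CDQ-R respondent with four domain scores.
print(cdqr_overall({"Drama": 3.0, "Math/Science": 4.0, "Arts": 5.0, "Interaction": 4.0}))  # 4.0
```

The multiplication in `caq_item_score` is what lets a single exceptional accomplishment dominate a CAQ domain score, which fits the table's remark that accomplishment measures are typically skewed while the averaged CDQ-R self-ratings are approximately normal.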
theories and contributions were highly influential in the field of creativity studies. In particular, Guilford’s influence is found in the work of E. Paul Torrance (1915–2003), who developed a group of tests still in use today.
The Torrance Tests of Creative Thinking (TTCT) (Kim, 2006; Torrance, 1966) are based loosely on Guilford’s model, although Torrance was more concerned with the interest level of his measures than with their factorial purity. These tests purport to assess a global cognitive construct of creativity, a style of thinking believed to be essential to creative achievements. The TTCT subtests do not assess motivation, expertise, intelligence, or other capacities that could contribute to creative productivity. The test comes in two parallel forms, A and B, which are highly comparable. The comments below refer to both forms.
The TTCT consists of two parts: the TTCT-Verbal and the TTCT-Figural. Suitable for ages 6 through 18 and beyond, the TTCT-Verbal contains six subtests:
Asking Questions
Guessing Causes
Guessing Consequences
Product Improvement
Unusual Uses
Just Suppose
The first three verbal subtests are based on the same stimulus card, which shows a simple pen-and-ink drawing of one or two human-like figures engaged in ambiguous activity. A TTCT-like drawing is shown in Figure 9.1. In the first activity, Asking Questions, the child is encouraged to ask questions about the picture. In the second activity, Guessing Causes, the child is told to guess the causes of the action in the picture. In the third activity, Guessing Consequences, the child is instructed to speculate about the immediate and long-term consequences. The time limit for each activity is five minutes.
In the fourth activity of the Verbal subtests, Product Improvement, the task is to suggest improvements to a toy that would make it more appealing to children. For example, the child might be shown a picture of a stuffed rabbit and asked to think of ways
intellect model that parceled intelligence into 150 factors aligned upon three dimensions: operations, contents, and products. One of the operations that emerged from Guilford’s factor analyses was divergent thinking:
Divergent thinking is defined as the kind that goes off in different directions. It makes possible changes of direction in problem solving and also leads to a diversity of answers, where more than one answer may be acceptable. (Guilford, 1959)
Divergent thinking is virtually the opposite of convergent thinking. Convergent thinking is the production of a single correct answer determined by facts and reason. Western civilization places such a heavy emphasis on convergent thinking that we are inclined to dismiss the value of divergent thinking, even to mock it as undisciplined and, therefore, unproductive. But divergent thinking is essential to creative discovery. Unconstrained, freewheeling thought is the hallmark of the creative person. Tests of divergent thinking are therefore considered excellent measures of creativity.
Guilford and his colleagues developed about a dozen experimental measures of divergent thinking (Guilford & Hoepfner, 1971), some of which were subsequently standardized and published as the Christensen-Guilford Fluency Tests. Subtests and items similar to his measures include:
• Alternate Uses: List possible but unusual uses for a common object such as a brick (use it as a door stop, hammer, anchor, or wheel stop)
• Consequences: List possible consequences of a specific hypothetical event, for example, “What would happen if clouds had strings hanging down from them?” (macramé would make a comeback, people would be whisked away, air travel would be hazardous, farmers could winch the clouds down for watering, etc.)
• Ideational Fluency: Name things that belong in a given class such as “Long, thin items” (hair, pin, wire, needle, snake, string, spaghetti, pulled taffy)
Although Guilford’s tests never received wide usage and eventually faded into obscurity, his
The TTCT-Figural consists of three activities, which are suitable for ages 5 through 18 and beyond:
Picture Construction
Picture Completion
Repeated Figures
The time limit for each activity is 10 minutes. In the first activity, Picture Construction, the child draws a picture using a simple shape (jelly bean or pear) as a starting point. The stimulus shape must become an integral part of the constructed picture. In the second activity, Picture Completion, the examinee encounters 10 incomplete figures and is asked to complete a drawing from each and then to name each drawing. An example of a TTCT-like drawing (with completion and title) is shown in Figure 9.2. In the last activity, Repeated Figures, the child is provided two or three pages of repeated figures (e.g., circles) and asked to use them in constructing pictures that are then named. For example, the child might draw a rectangle encompassing six circles and name it “swiss cheese.”
Scoring of the TTCT-Figural subtests is based on five norm-referenced measures and 13 criterion-referenced outcomes. The five norm-referenced measures include:
1. Fluency: the raw number of stimuli provided;
2. Originality: the number of statistically infrequent drawings;
to change the toy so that others would have more fun playing with it. Unusual Uses, the fifth activity, is a familiar standby in creativity assessment, namely, thinking of unusual uses for a common object such as a brick. The final Verbal subtest is Just Suppose, which involves asking the examinee to list the problems and benefits that might arise from an improbable situation. For example, the child might be told “Just suppose that clouds had strings hanging down from them. What might be some problems or benefits of this situation?”
The verbal subtests are scored according to three criteria:
1. Fluency: the raw number of relevant ideas;
2. Originality: the inventiveness or creativity of the ideas;
3. Flexibility: the flexibility of categories of ideas.
Of course, the manual for the TTCT, which is periodically updated for normative data, provides significant guidance on scoring (Torrance, 1974, 1998).
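To make the three verbal scoring criteria concrete, here is a minimal Python sketch applied to Unusual Uses responses for a brick. It is not the procedure in the TTCT manual. The function name, the category labels, the `relevant` flag, and the idea of treating responses in "rare" categories as original are all hypothetical simplifications; in practice trained scorers follow the published scoring guides.

```python
def score_verbal(ideas, rare_categories):
    """Score a list of (idea_text, category, relevant) tuples.

    fluency     = number of relevant ideas
    flexibility = number of distinct categories among the relevant ideas
    originality = number of relevant ideas in a category listed as rare
                  (a stand-in for rater judgment; an assumption of this sketch)
    """
    relevant = [(text, category) for text, category, ok in ideas if ok]
    fluency = len(relevant)
    flexibility = len({category for _, category in relevant})
    originality = sum(1 for _, category in relevant if category in rare_categories)
    return {"fluency": fluency, "flexibility": flexibility, "originality": originality}


# Hypothetical responses; the first four uses echo examples given in the text.
ideas = [
    ("door stop", "holding in place", True),
    ("wheel stop", "holding in place", True),
    ("hammer", "tool", True),
    ("anchor", "weight", True),
    ("it is red", "description", False),  # not a use, so it is not counted
]
scores = score_verbal(ideas, rare_categories={"weight"})
print(scores)  # {'fluency': 4, 'flexibility': 3, 'originality': 1}
```

The example shows why fluency and flexibility can diverge: two of the four relevant ideas fall in the same category, so they add to fluency but not to flexibility.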
Figure 9.1 Example Stimulus Card Used for the First Three TTCT-Verbal Subtests
Note: A stimulus card similar to the above is used for the Asking Questions, Guessing Causes, and Guessing Consequences subtests.
Figure 9.2 Example TTCT-Figural Picture Completion Drawing with Title
Note: This sample resembles one of the ten incomplete figures used on the Picture Completion subtest.
grade-norms are available for more than 50,000 participants, kindergarten through high school. Applications of the test are mainly with school-aged children, although norms are provided for adults as well (Kim, 2006).
Comment on Creativity Tests
Tests of creativity have served a useful function in highlighting the diversity of skills that make up the whole of intellectual ability. As a consequence of research on creativity, educators and psychologists now realize that an exclusive emphasis on “correct” thinking (i.e., convergent problem solving) is too narrow a focus for education and assessment alike. However, the validity of creativity tests is still an open question. One problem is that definitions of creativity (e.g., Jackson & Messick, 1968, above) do not lend themselves easily to psychometric measurement; that is, tests of creativity do not operationalize the construct of creativity very well (Chase, 1985). In part, the failure to operationalize creativity stems from the multifactorial nature of this puzzling ability. Consider this observation: whereas a general factor almost always can be extracted from intelligence and ability tests, it seems clear that there is no corresponding factor in the realm of creativity. For example, a creative painter is unlikely to be a creative musician or a creative research scientist. Creativity is almost always specific to the realm in which it is identified. This specificity poses a difficult obstacle to general measures of creativity.
Measures of Emotional Intelligence
In the history of psychology, emotions and intelligence generally have been viewed as distinct capacities of the individual, each capable of influencing the other, but separate nonetheless. For example, Thomas Chalmers (1833) wrote an early chapter titled On the Connection between the Intellect and the Emotions. Chalmers was a Scottish church leader who catalogued the disruptive influence of emotions on clear thinking. In like manner, the American psychologist Henry H. Goddard (1919) proposed a separation of the emotions and intelligence. He argued
3. Abstractness of Titles—the abstraction level of the titles;
4. Elaboration—the provision of details and elaboration;
5. Resistance to Premature Closure—the degree of openness for incomplete figures.
The 13 criterion-referenced measures include a variety of creative strengths expressed in the drawings such as emotional fluency, unusual visual perspective, humor, colorful imagery, and fantasy.
Although scoring of the TTCT is tedious and elaborate, especially for the Figural subtests, experienced testers produce interrater reliabilities in the .90s. Test–retest reliability coefficients are lower, in the range of .50 to .93 (Kim, 2006). Reliability data certainly are strong enough to support the use of the test for group testing and research purposes (Trefflinger, 1985). However, making individual decisions (e.g., admission to a special program for gifted children) solely on the basis of TTCT scores could be problematic.
The validity of the TTCT is a more complicated question, especially in light of the difficulty of defining the criterion: what is creativity? Yet, the instrument is reasonably predictive of later creative accomplishments, even in the long run. For example, in a sample of 80 participants, the correlation between a TTCT creativity index derived from assessment in elementary years and the quality of highest creative achievements in adulthood (40-year follow-up) was a healthy r = .43 (Cramond, Matthews-Morgan, Bandalos, & Zuo, 2005). In this study, the quality of creative achievements was rated blindly from autobiographical materials supplied by the research participants. The correlation, r = .43, was higher than the observed relationship between childhood IQ and adult creativity, r = .32. Creativity as measured by the TTCT appears to be more predictive of certain forms of achievement than intelligence.
Overall, with its 50 years of research and strong psychometric properties, the TTCT is one of the best instruments for creativity assessment. The test has been translated into 35 languages and has spawned more research than any other measure in the field. Among its many strong features, age- and
Mayer et al. (2008) propose that emotional intelligence is a third major subdivision that complements the traditional dichotomy of verbal and perceptual abilities.
To understand how emotional intelligence differs from traditional forms of intelligence, imagine a situation in which you visit a close friend in the hospital. He has just emerged from emergency surgery after a serious head injury from a fall. He lies still in bed with his eyes closed. Standing around your friend are anxious family members and a stern-faced doctor. What would you do or say? Would you press forward to join the family members? Would you leave the room and return later? Would you hug or console others? Would you ask the doctor for an update? You will need to make these and many other choices in a matter of seconds. Adaptive functioning in this complex situation would require you to manage your own emotions (maybe you feel strong relief that you are not the one in the hospital bed), understand the subtle emotional signals conveyed by others (perhaps the glassy stare of the sister indicates that you are not welcome at this time), use your emotions to facilitate thinking (maybe your anguish is so strong that you think it wise to remain quiet), and perceive emotions accurately in others (perhaps everyone is quiet because your friend has just drifted off to sleep). Successful navigation of this difficult and painful situation would require high levels of emotional intelligence.
Because of the subtlety and complexity of the construct, the assessment of emotional intelligence has proved challenging. However, with innovative forms of testing such as embodied in the MSCEIT or Mayer-Salovey-Caruso Emotional Intelligence Test (Mayer, Salovey, & Caruso, 2002), progress is being made. This instrument consists of 141 items that yield a total emotional intelligence score as well as two Area scores, four Branch scores, and eight Task scores. Table 9.10 provides a brief description of the test, which is designed for adults age 17 and older. Normative data are based on a sample of more than 5,000 individuals.
The overall score on the MSCEIT is called the Emotional Intelligence (EI) score. This score is normed to a mean of 100 and standard deviation of 15. The two Area scores (Experiential and Strategic)
that intelligence, properly exercised, can modify and influence emotions for the benefit of the individual.
The first person to hint at a possible union of emotional and intellectual factors was the eminent American psychologist E. L. Thorndike (1920). In a short essay published in Harper’s Magazine for a general audience, Thorndike spoke of three kinds of intelligence: abstract, mechanical, and social. The first two types are well known in assessment and have been validated repeatedly. However, the third kind of intelligence, social intelligence, has proved more elusive. Thorndike defined social intelligence as “the ability to understand and manage people.” An essential part of this ability is the accurate recognition of emotions in others. Unfortunately, early attempts to measure social intelligence proved fruitless (Thorndike & Stein, 1937). The concept gradually fell out of favor.
Recently, the idea that emotions and intellect might constitute a single cluster of intertwined abilities has reemerged in the concept of emotional intelligence, as proposed by Mayer, Salovey, and colleagues (Salovey & Mayer, 1989–90; Mayer, Salovey, & Caruso, 2008). The notion of emotional intelligence has been pursued by other researchers as well (discussed below); however, the Mayer-Salovey model boasts the strongest theoretical and empirical underpinnings, so we begin with their approach. Mayer et al. (2008) define emotional intelligence as follows:
• Managing emotions so as to attain specific goals;
• Understanding emotions, emotional language, and the signals conveyed by emotions;
• Using emotions to facilitate thinking; and
• Perceiving emotions accurately in oneself and others. (p. 507)
These theorists propose that emotional intelligence is an instance of traditional intelligence, not something different from it. In other words, emotional intelligence (EI) is an important and overlooked subset of abilities that contribute to human efficiency and adaptation. Thus, just as prior researchers have documented verbal forms of intelligence (e.g., verbal comprehension) and perceptual forms of intelligence (e.g., perceptual reasoning)
following question, which resembles some found on the MSCEIT:
What emotion(s) might prove helpful to feel when talking with a police officer who has just stopped you for speeding?
Deference not helpful . . . 1 . . . 2 . . . 3 . . . 4 . . . 5 . . . very helpful
Mild anxiety not helpful . . . 1 . . . 2 . . . 3 . . . 4 . . . 5 . . . very helpful
Surprise not helpful . . . 1 . . . 2 . . . 3 . . . 4 . . . 5 . . . very helpful
Irritation not helpful . . . 1 . . . 2 . . . 3 . . . 4 . . . 5 . . . very helpful
The authors of the MSCEIT propose two different scoring methods: consensus scoring and
and the four Branch scores (Perceiving, Facilitating, Understanding, and Managing) likewise are normed to these traditional benchmarks. While scores are provided for the eight Tasks (see Table 9.10), the test developers caution against overinterpretation of these elemental scores because of their lower reliability. The overall EI score demonstrates strong internal reliability, in the low .90s, whereas the reliability of the two Area scores is slightly lower and more variable, typically in the high .80s (Mayer, Salovey, & Caruso, 2002). Test–retest reliability of the overall score is respectable at .86 (Brackett & Mayer, 2003).
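The practical meaning of a mean-100, SD-15 scale and of the reliability figures above can be illustrated with a short Python sketch. Both functions are generic psychometric formulas, not the publisher's scoring algorithm: the linear conversion from a raw score is an assumption made for illustration, the raw-score norms (mean 50, SD 10) are invented, and the reliabilities of .91 and .88 are illustrative values chosen from within the "low .90s" and "high .80s" ranges reported in the text.

```python
import math

SCALE_MEAN = 100.0  # scale mean stated in the text
SCALE_SD = 15.0     # scale standard deviation stated in the text


def to_standard_score(raw, norm_mean, norm_sd):
    """Linearly rescale a raw score to the mean-100, SD-15 metric.

    The norm_mean and norm_sd arguments are hypothetical norm-group values.
    """
    z = (raw - norm_mean) / norm_sd
    return SCALE_MEAN + SCALE_SD * z


def standard_error_of_measurement(reliability, sd=SCALE_SD):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


# A raw score one SD above an invented norm mean maps to 115.
print(to_standard_score(raw=60, norm_mean=50, norm_sd=10))  # 115.0

# Illustrative reliabilities for the overall score and for an Area score.
print(round(standard_error_of_measurement(0.91), 2))  # 4.5
print(round(standard_error_of_measurement(0.88), 2))  # 5.2
```

Under these assumptions, the measurement error band widens as reliability drops, which is the reasoning behind the developers' caution about interpreting the less reliable Task scores.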
An interesting issue with tests of emotional intelligence like the MSCEIT is how to determine the correct answers. After all, the questions involve subtle emotional concepts, for which the “correct” responses are not necessarily obvious. Consider the
Table 9.10 Brief Description of the MSCEIT Tasks
EXPERIENTIAL AREA
Perceiving Branch
Faces: Identify from photographs of faces how each person feels on a 1 to 5 scale (e.g., 1 = no happiness, 5 = extreme happiness).
Pictures: Indicate the extent to which images and photographs express various emotions on a 1 to 5 scale (e.g., 1 = not at all, 5 = very much).
Facilitating Branch
Sensations: Compare different emotions to different sensations such as light, color, and temperature on a 1 to 5 scale (e.g., 1 = not at all, 5 = very much).
Facilitation: Specify how certain moods might assist in responding to social situations (e.g., 1 = not useful, 5 = useful).
STRATEGIC AREA
Understanding Branch
Blends: Indicate which emotion (from 5 choices) tends to occur in the presence of a described emotional situation.
Changes: Indicate which emotion (from 5 choices) tends to be the transition state from a described emotional starting point.
Managing Branch
Emotion Management: Rate the effectiveness of alternative actions in achieving a specified emotional state on a 1 to 5 scale (1 = very ineffective, 5 = very effective).
Emotional Relations: Evaluate the effectiveness of alternative actions in achieving a desired outcome involving other people on a 1 to 5 scale (1 = very ineffective, 5 = very effective).
scores corresponding to low deviance—hold true even after the statistical control of intelligence and personality variables (Rubin, 1999; Trinidad & Johnson, 2002).
In spite of the supportive literature provided by proponents of EI measures, other reviewers maintain a cautious stance about the MSCEIT and similar tests. For example, in a comprehensive review of the psychometrics of emotional intelligence, Zeidner, Roberts, and Matthews (2008, p. 71) concluded that there has been “irrational enthusiasm surrounding the practical utility of emotional intelligence.” They note that evidence regarding the role of EI in occupational success is weak, based largely on anecdotal reports and popular sources like Daniel Goleman’s (1995) book, Emotional Intelligence: Why It Can Matter More than IQ.
Even the developers of the MSCEIT acknowledge the potential for misuse of their instrument. Mayer, Salovey, Caruso, and Sitarenios (2003, p. 104) state flatly that “the applied use of EI testing must proceed with great caution.” The growing trend to use these instruments in selection of employees is, therefore, disquieting. As Conte (2005, p. 438) notes, managers and organizational leaders “should be wary of making this leap unless more rigorous discriminant, predictive, and incremental validity evidence for EI measures is shown.”
In addition to the MSCEIT, a few other measures of emotional intelligence have gained recognition. One of these is the Emotional Competence Inventory (Sala, 2002), based on Goleman’s (1995) conception of emotional intelligence. The Emotional Competence Inventory (ECI) contains 110 items organized into four clusters: (1) Self-Awareness, (2) Social Awareness, (3) Self-Management, and (4) Social Skills. One appealing feature of this instrument is the 360-degree feedback that it yields. In this method, self-ratings, peer ratings, and supervisor ratings are reported separately for comparison and contrast. The ECI is used mainly in large corporate settings for formative evaluation of employees. The publishers have maintained tight proprietary control over the test, which has limited independent research on its psychometric qualities.
Another widely used test is the Bar-On Emotional Quotient Inventory (Bar-On, 2000), which
expert scoring. In consensus scoring, the majority choices of the normative sample are used to identify the correct options. For example, in the example above, if 67% of the general population circled the number “1” for “irritation” (i.e., it is not helpful), this answer would be coded as the correct alternative. Respondents would receive lower scores to the extent they deviated from this alternative. This method is also called general scoring because the reference point is the general, normative sample.
The second approach, expert scoring, relies on the judgment of experts in the field of emotion to determine the correct options. In particular, the authors used 21 experts attending a conference of the International Society for Research on Emotion. Scoring for this approach relies on the consensus of these experts. Fortunately, the two scoring approaches (general and expert) reveal a very high agreement, on the order of .96 to .98 (Mayer, Salovey, & Caruso, 2002).
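The logic of consensus scoring can be illustrated with a brief sketch. It assumes that a response earns credit equal to the proportion of the reference group (general or expert) that chose the same option, which is one plausible reading of "lower scores to the extent they deviated." The function names and the normative data are hypothetical; only the 67% figure comes from the example in the text, and the sketch should not be taken as the published MSCEIT scoring algorithm.

```python
from collections import Counter


def consensus_weights(reference_responses):
    """Map each response option to the proportion of the reference group choosing it."""
    counts = Counter(reference_responses)
    total = len(reference_responses)
    return {option: n / total for option, n in counts.items()}


def score_response(response, weights):
    """Credit a response with the share of the reference group that agreed with it."""
    # An option nobody in the reference group chose earns no credit.
    return weights.get(response, 0.0)


# Hypothetical normative data for one item: 100 ratings of "irritation" on a
# 1-to-5 scale, with 67% of the sample choosing "1" (not helpful).
normative_sample = [1] * 67 + [2] * 20 + [3] * 8 + [4] * 3 + [5] * 2
weights = consensus_weights(normative_sample)

# The majority choice is treated as the keyed ("correct") alternative.
keyed_option = max(weights, key=weights.get)

print(keyed_option)                # the modal rating in the reference group
print(score_response(1, weights))  # credit for agreeing with the majority
print(score_response(5, weights))  # credit for a rarely endorsed rating
```

The same two functions would serve for expert scoring if the reference responses came from the expert panel rather than the normative sample.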
The rationale for consensus scoring, whether based on the general population or experts, is that emotions and their expression possess an evolutionary and social basis. Emotions constitute a “signal system” that conveys important information to those around us. For example, the emotion of sadness signals loss and wanting to be comforted; the emotion of anger indicates the individual feels threatened and could respond forcefully; the emotion of happiness conveys an interest in joining others. Individuals who do not “read” emotions in a consensual manner likely will experience difficulty in a broad range of social situations.
The validity of the MSCEIT has been investigated from numerous perspectives, including factorial, discriminant, and predictive validity. Some results indicate that the instrument measures a unitary skill that can be subdivided into the four branches described above (Mayer, Salovey, Caruso, & Sitarenios, 2003). Further, EI as measured by the MSCEIT reveals generally low correlations with verbal intelligence, general intelligence, and major dimensions of personality; that is, the construct provides something that goes beyond established measures (Mayer, Salovey, & Caruso, 2004). EI is potentially useful because of its inverse relationship with deviant behaviors such as bullying, substance abuse, and violence. These relationships—high EI
physical health, although the differences for health are not substantial (Peterson, 2000). How these individual differences arise in personal development is an important and intriguing question that we do not pursue here. Instead we focus on assessment issues, namely, how is optimism measured?
The most widely used instrument is the revised Life Orientation Test (LOT-R; Scheier, Carver, & Bridges, 1994). This is an intriguingly simple scale that consists of six scored items and four “filler” items (10 items total). Respondents indicate their extent of agreement with the items on a five-point Likert scale ranging from 1 or “strongly disagree” to 5 or “strongly agree.” Items similar to those found on the LOT-R include:
I have a positive outlook and expect the best in life
I don’t expect good things to happen to me (reverse scored)
I enjoy my family life a great deal (filler)
Of course, negatively worded items are reverse scored. Responses on the six scored items are then summed to yield a total from 6 (highly pessimistic) to 30 (highly optimistic). Even though “pessimist” and “optimist” are categories in popular language, the LOT-R instead provides a score on a continuum, without strict cut-offs. In large samples of respondents, the score distribution tends to be skewed toward the optimistic side, but not excessively so (Carver & Scheier, 2003).
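The scoring rule just described (reverse the negatively worded items, ignore the fillers, sum the six scored items) can be sketched in a few lines. The item positions used here are assumptions made for illustration, and the function names are hypothetical; the published scoring key should be consulted for actual use.

```python
# Minimal sketch of LOT-R total scoring on the 1-to-5 scale described in the
# text. Item positions are ASSUMED for illustration only.
POSITIVE_ITEMS = (1, 4, 10)  # assumed positively worded, scored items
NEGATIVE_ITEMS = (3, 7, 9)   # assumed negatively worded, reverse-scored items
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(rating):
    """Flip a rating so that 1 becomes 5, 2 becomes 4, and so on."""
    return SCALE_MIN + SCALE_MAX - rating


def lot_r_total(responses):
    """Sum the six scored items; filler items are ignored.

    responses maps item number (1 to 10) to a rating from 1 to 5.
    """
    for item in POSITIVE_ITEMS + NEGATIVE_ITEMS:
        rating = responses[item]
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"item {item}: rating {rating} is out of range")
    positive = sum(responses[i] for i in POSITIVE_ITEMS)
    negative = sum(reverse_score(responses[i]) for i in NEGATIVE_ITEMS)
    return positive + negative


# A hypothetical respondent who agrees with the positive items, disagrees
# with the negative items, and is neutral on the fillers.
example = {1: 4, 2: 3, 3: 2, 4: 5, 5: 3, 6: 3, 7: 1, 8: 3, 9: 2, 10: 4}
print(lot_r_total(example))  # prints 26
```

Because the total is a simple sum, the minimum of 6 and the maximum of 30 follow directly from six items rated 1 to 5.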
Although the theoretical basis for the LOT-R postulates an optimism-pessimism continuum, psychometric analyses by Herzberg, Glaesmer, and Hoyer (2006) with huge samples of adults (N = 46,133) reveal that the optimism and pessimism items on the test measure two independent constructs rather than a single, bipolar trait. This is a counterintuitive finding which suggests that optimism and pessimism are partly independent. Conceivably, an individual could earn high scores on both (or low scores on both), although these outcomes probably are rare. In practice, many researchers now report three scores from the LOT-R: an optimism score based on the positively worded items, a pessimism score based on
is traditionally known by the acronym EQ-i. This 133-item self-report instrument yields an overall EQ score as well as five composite scores: (1) intrapersonal, (2) interpersonal, (3) adaptability, (4) general mood, and (5) stress management. Reviewers of the EQ-i have noted that the theory behind the test is unclear (Matthews, Zeidner, & Roberts, 2002). Further, the test appears to overlap substantially with major personality constructs. For example, a correlation of r = –.77 with the anxiety scale from Cattell’s 16PF is reported (Newsome et al., 2000). The EQ-i appears to demonstrate strong reliability, with test–retest reliability of .85 after one month (Bar-On, 1997). What remains unclear is whether the test measures emotional intelligence as a construct, as it is understood by others (Conte, 2005).
Assessment of Optimism
Optimism is another fruitful area for psychometric research and assessment. Typically this construct is viewed as one end of a bipolar continuum, optimism–pessimism. The difference between the two ends of the spectrum is captured in the familiar adage about the glass of water that is half-full to the optimist and half-empty to the pessimist. Whether this bipolar depiction is an accurate portrayal of the underlying construct(s) is a topic we take up below. Nonetheless, it is certainly the starting point for many theorists and for the perceptions of the lay public as well. Carver and Scheier capture why this area of assessment is important: “Optimists are people who expect good things to happen to them; pessimists are people who expect bad things to happen to them. Does this difference among people matter? It certainly does. Optimists and pessimists differ in several ways that have a big impact on their lives. They differ in how they approach problems and challenges they encounter, and they differ in the manner and the success with which they cope with life’s difficulties” (2003, p. 75). In short, optimism and pessimism have to do with people’s expectations for the future. Optimists expect a better future than pessimists and generally have more confidence in their ability to manage challenges when they arise. Generally, optimists fare better than pessimists in terms of personal adjustment and even
delving further, difficulties arise. What constitutes a gift? What are the possible sources of a gift? Some gifts are obvious and not debatable, as when neighbors deliver a precooked meal to someone who is grieving a loss. Almost everyone would experience gratitude in this situation. But what about viewing a sunrise, taking a hot shower, or seeing a baby smile in the supermarket? Should we experience gratitude for these opportunities as well? In other words, does gratitude require a personal benefactor, or can it be expanded to the countless ways in which life pleasantly surprises the mindful person?
Regardless of how it is conceptualized, gratitude is universally recognized as a personal virtue because it promotes social cohesion and provides an inner buffer against the toil and pain of everyday life. In general, people with a grateful disposition experience greater well-being than those without this asset (Emmons et al., 2003). The German-French theologian and physician Albert Schweitzer (1969), who founded a hospital in west central Africa and received the Nobel Peace Prize for his philosophy of “Reverence for Life,” referred to gratitude as the “secret of life” (p. 36). Truly, that is a strong statement! In general, gratitude has received less attention as a topic of measurement than it deserves. But recent efforts are beginning to redress this deficiency.
One such effort is the Gratitude Questionnaire-Six Item Form (GQ-6) developed by McCullough, Emmons, and Tsang (2002). The GQ-6 is a simple self-report measure of the disposition to experience gratitude (Figure 9.3). The test consists of the six best items from a longer list of statements that articulate gratitude and appreciation.
The reader will notice that the GQ-6 is based on a Likert-type format with seven alternatives ranging from 1 (strongly disagree) to 7 (strongly agree). Two items are stated in the reverse (and therefore reverse scored) as a way of inhibiting response bias. The development and choice of specific test items was based on a thorough analysis of the many facets of the grateful disposition (McCullough, Emmons, & Tsang, 2002). The authors determined that gratitude reflects intensity (feeling more intensely grateful), frequency (feeling grateful many times a day), span (grateful for many things), and density (grateful to many individuals). Initially, they proposed 39
the negatively worded items, and a total score that combines the two.
An additional finding of the Herzberg et al. study (2006) is that the reliability of the instrument is low (Cronbach alphas of .71 for the Optimism items and .68 for the Pessimism items). Thus, the test is recommended for group research only; it is not suitable for clinical practice with individuals.
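For readers unfamiliar with how a Cronbach alpha is obtained, the following sketch computes the coefficient from a respondents-by-items table of ratings, using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The data are fabricated solely to exercise the function and do not reproduce any values from the Herzberg et al. study; the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Compute Cronbach's alpha.

    rows is a list of respondents, each a list of k item ratings
    (already reverse scored where needed).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Fabricated ratings for five respondents on a three-item subscale.
ratings = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 4, 5],
    [3, 3, 2],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.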
A substantial literature points to the general conclusion that LOT-R optimists fare much better than pessimists on a wide variety of outcome measures (Snyder & Lopez, 2007). For example, in a sample of 275 Japanese college students, LOT-R total scores correlated r = .39 with social support, and r = −.26 with interpersonal conflict (Sumi, 2006). In a sample of 504 Australian high school students, LOT-R scores correlated r = .55 with self-esteem and r = −.38 with psychological distress (Creed, Patton, & Bartrum, 2002). In other words, for both studies LOT-R total scores modestly predicted good social adjustment.
Steptoe, Wright, Kunz-Ebrecht, and Iliffe (2006) investigated the relationship between LOT-R scores and numerous health behaviors in 128 community-dwelling seniors 65 to 80 years old. Dispositional optimism as measured by the LOT-R total score was associated with many healthful behaviors, including moderate alcohol consumption, not smoking, brisk walking, and vigorous physical activities (women only). Self-rated health and physical health status both were associated with optimism, although the direction of influence would be difficult to determine from this cross-sectional study. The full scale was more consistently associated with these positive relationships than either the optimism or pessimism subscales of the test. Carver and Scheier (2002) review additional external correlates of optimism as measured by the LOT-R.
Assessment of Gratitude
As Emmons, McCullough, and Tsang (2003) observe, gratitude is difficult to define. In part, this is because the concept can be viewed as an attitude, an emotion, a disposition, or a personality trait. A simple definition is that gratitude is a response of thankfulness and joy when receiving a gift. But
• Simple appreciation, expressed as gratitude toward non-social sources.
• Sense of abundance, expressed as the absence of general resentment.
The 42 items of the GRAT are rated on a 1 to 5 scale (strongly agree to strongly disagree). The test possesses excellent reliability for the three subscales and the total score (Thomas & Watkins, 2003), and reveals theory-consistent relationships with external criteria such as spirituality and the absence of materialism (Diessner & Lewis, 2007).
Even though the authors of the GRAT hypothesized a multidimensional model in the development of their test, subsequent research indicates that gratitude might actually be a unitary trait. Wood, Maltby, Stewart, and Joseph (2007) conducted a factor analysis of the three GRAT subscales and nine other indices of gratitude (including the GQ-6), and found a clear one-factor solution. The 12 measures were highly intercorrelated, indicating a single latent construct which the researchers called gratitude/appreciation. Gratitude is an essential element of human experience that deserves ongoing psychometric inquiry.
items to measure these qualities. The GQ-6 is composed of the six best items, as determined by factor-analytic procedures performed with test results from two samples: 238 undergraduates and 1,228 adult volunteers surveyed via the Internet. Reliability of the instrument is good, with coefficient alphas between .82 and .87. Validity of the GQ-6 is based on numerous theory-confirming relationships with other measures. For example, self-ratings on the GQ-6 correlated modestly with external observers’ perceptions of gratitude in the participants. Additional studies indicated that the GQ-6 is positively related to optimism, hope, spirituality, religiousness, forgiveness, empathy, and prosocial behavior. The scale is negatively related to depression, anxiety, materialism, and envy (McCullough et al., 2002).
While the GQ-6 conceives of gratitude as a single dimension, other researchers have proposed a multidimensional model. For example, the Gratitude, Resentment, and Appreciation Test (GRAT; Watkins, Woodward, Stone, & Kolts, 2003) proposes three dimensions to gratitude:
• Appreciation of others, expressed as gratitude toward other people.
Using the scale below as a guide, write a number beside each statement to indicate how much you agree with it.
1 = strongly disagree
2 = disagree
3 = slightly disagree
4 = neutral
5 = slightly agree
6 = agree
7 = strongly agree
____1. I have so much in life to be thankful for.
____2. If I had to list everything that I felt grateful for, it would be a very long list.
____3. When I look at the world, I don’t see much to be grateful for.*
____4. I am grateful to a wide variety of people.
____5. As I get older I find myself more able to appreciate the people, events, and situations that have been part of my life history.
____6. Long amounts of time can go by before I feel grateful to something or someone.*
*Items 3 and 6 are reverse scored.
Figure 9.3 The Gratitude Questionnaire-Six Item Form (GQ-6). Source: Reprinted with permission of Michael McCullough and Robert Emmons. Copyright 2002, all rights reserved.
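Scoring the form in Figure 9.3 is straightforward, as the following sketch suggests. It assumes that the total is the simple sum of the six ratings after items 3 and 6 are reverse scored on the 1-to-7 scale, which would yield totals from 6 to 42; the figure itself specifies only the reverse-scored items, so the summation rule and the function name are assumptions.

```python
REVERSED_ITEMS = {3, 6}  # per the footnote to the figure
SCALE_MIN, SCALE_MAX = 1, 7


def gq6_total(ratings):
    """Return an assumed GQ-6 total from six ratings given in item order."""
    if len(ratings) != 6:
        raise ValueError("the GQ-6 has exactly six items")
    total = 0
    for item_number, rating in enumerate(ratings, start=1):
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(f"item {item_number}: rating must be 1 to 7")
        if item_number in REVERSED_ITEMS:
            # On a 1-to-7 scale, reverse scoring maps 1 to 7, 2 to 6, etc.
            rating = SCALE_MIN + SCALE_MAX - rating
        total += rating
    return total


# A hypothetical respondent who endorses the grateful statements and
# rejects the two reverse-keyed statements.
print(gq6_total([6, 7, 2, 6, 5, 1]))  # prints 37
```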
• Researcher ratings of funniness of monologues produced under stress
• Researcher ratings of using laughter and humor before dental surgery
The CHS is a respected instrument in humor research. Nonetheless, it has faded in use because later instruments (discussed below) provide broader measures of sense of humor.
The Situational Humor Response Questionnaire provides a measure of the degree to which the respondent is easily amused and laughs in a wide range of situations (Martin, 1996; Martin & Lefcourt, 1984). The SHRQ consists of 21 items, the first 18 of which describe ordinary life situations such as “You were at a party and the host accidentally spilled a drink on you.” Each item is rated on a scale from 1 (“I would not have been particularly amused”) to 5 (“I would have laughed heartily”). The last three items refer to laughing and being amused in general.
As summarized by Martin (1996), the SHRQ reveals adequate psychometric qualities, including test–retest correlations of around .70 and Cronbach alphas in the vicinity of .70 to .85. An interesting validity criterion used in several studies is the correlation of test scores with observed frequency of laughter, with rs ranging from .30 to .60. As noted by Martin (2003), frequency of laughter is a good validity criterion, but it is not perfect. After all, there is laughter without humor and humor without laughter. Fortunately, the validity evidence for this instrument includes a wide base of diverse studies, such as correlations with rated funniness of monologues produced by participants, and correlations with other humor scales. Another concern about the test is that the humor situations were designed with college students in mind and may not generalize to other groups. The humor situations date to the 1980s and earlier; some are no longer funny. After all, what is deemed funny shifts over time, is specific to cultures, and is sometimes idiosyncratic. For example, some viewers find the video clips featured on the television show America’s Funniest Home Videos to be hilarious, whereas others regard this weekly offering with bewilderment or even downright scorn.
Sense of Humor: Self-Report Measures
Humor is a broad construct that has many meanings. Humor can refer to the characteristics of the material (a funny joke or cartoon) or the responses of the individual (a chuckle or belly laugh). Humor can be constructive when it brings people together, or destructive when it is at someone’s expense. In contemporary Western society, having a sense of humor is generally viewed as a virtue. It is thought that individuals with a “good” sense of humor will more easily befriend others and also will be able to weather the adversities of life with greater balance.
But how do we conceptualize the loose notion of “sense of humor”? Is this an enduring personality trait, an ability to make others laugh, a temperamental feature of good cheer, a world view that life is fundamentally absurd, or something else? Martin (2003, p. 315) argues: “One of the challenges of research on humor in the context of positive psychology is to identify which aspects or components of the humor construct are most relevant to mental health and successful adaptation.” His answer is to conceptualize humor as a way of coping with stress and enhancing relationships. With this approach, Martin has developed three instruments used widely in humor research: the Coping Humor Scale, the Situational Humor Response Questionnaire, and the Humor Styles Questionnaire.
The Coping Humor Scale was designed to assess the extent to which individuals report using humor to cope with stress (Martin & Lefcourt, 1983). The CHS consists of 7 items similar to “When things are tense I look for something funny to say” or “I think humor is a useful way of coping with problems.” These items are rated on a scale from 1 (strongly disagree) to 4 (strongly agree). There is no neutral point on the scale, which forces the respondent to take a position.
The CHS has good test–retest reliability, with r = .80 over a 12-week period, but only fair internal consistency, with coefficient alphas of .60 to .70 (Martin, 1996). Regarding validity, Martin (2003, p. 317) summarizes a number of robust external correlates of the test. CHS total scores correlate strongly with the following constructs:
• Peer ratings of using humor to cope with stress
• Peer ratings of not taking one’s self too seriously
and discriminant correlations of the subscales with appropriate external criteria including well-being, hostility, intimacy, coping, satisfaction with relationships, and major personality variables (Martin et al., 2003).
How do individual differences in humor styles arise? A recent behavioral genetics analysis comparing HSQ scores of identical and fraternal twins found fascinating differences in developmental influences among the four humor styles (Vernon, Martin, Schermer, & Mackie, 2008). In this study of 300 pairs of identical twins and 156 pairs of fraternal twins, the positive forms of humor (Affiliative and Self-enhancing) were found to display significant genetic influences whereas the negative forms of humor (Aggressive and Self-defeating) arose in greater measure from common environmental influences. The authors offer the following conclusion:
These results may have implications for potential therapeutic interventions designed to modify individuals’ sense of humor. Because traits that are mainly influenced by environmental factors may be more malleable than those that are mainly influenced by genetic factors, our findings suggest that it may be easier to help people reduce their levels of aggressive and self-defeating humor styles than to increase their use of affiliative and self-enhancing humor. This is clearly a topic for further experimental study. (Vernon et al., 2008, pp. 1123–1124)
The lesson here for psychological testing is that the development of good measures such as the HSQ often generates far-reaching consequences.
Recently, Martin and colleagues have developed a new humor instrument that represents the culmination of decades of research. The Humor Styles Questionnaire (HSQ; Martin, Puhlik-Doris, Larsen, Gray, & Weir, 2003) assesses four dimensions that convey individual differences in uses of humor:
• Affiliative: Use of humor to entertain others and facilitate relationships.
• Self-enhancing: Use of humor to cope with stress and uphold a positive outlook during difficult times.
• Aggressive: Use of mocking, manipulative, put-down, or disparaging humor.
• Self-defeating: Use of humor for undue self-disparagement, ingratiation, or defensive reply.
The HSQ includes 32 self-descriptive statements (8 for each subscale) that depict specific uses of humor. For example, items on the Affiliative scale might resemble: “I like to tell silly jokes based on word play.” Items on the Aggressive scale might resemble: “I like to poke fun at people when they make mistakes.”
The first two styles, Affiliative and Self-enhancing, embody constructive and healthy uses of humor. The last two styles, Aggressive and Self-defeating, involve unhealthy uses of humor that distance the individual from others. For each item, respondents indicate agreement or disagreement on a 7-point scale ranging from 1 (totally disagree) to 7 (totally agree). The HSQ reveals excellent psychometric properties, with strong internal consistencies of the subscales (around .80), and good test–retest reliabilities (.80 to .85). Validity is based on convergent
- Chapter 8: Foundations of Personality Testing
- Topic 8A: Theories of Personality and Projective Techniques
- Personality: an Overview
- Psychoanalytic Theories of Personality
- Origins of Psychoanalytic Theory
- The Structure of the Mind
- The Role of Defense Mechanisms
- Assessment of Defense Mechanisms and Ego Functions
- Type Theories of Personality
- Type A Coronary-Prone Behavior Pattern
- Phenomenological Theories of Personality
- Origins of the Phenomenological Approach
- Carl Rogers, Self-Theory, and the Q-Technique
- Behavioral and Social Learning Theories
- Trait Conceptions of Personality
- Cattell's Factor-Analytic Trait Theory
- The Five-Factor Model of Personality
- Comment on the Trait Concept
- The Projective Hypothesis
- A Classification of Projective Techniques
- Association Techniques
- The Rorschach
- Comment on the Rorschach
- Completion Techniques
- Sentence Completion Tests
- Rotter Incomplete Sentences Blank
- Construction Techniques
- The Thematic Apperception Test (TAT)
- The Picture Projective Test
- Children's Apperception Test
- Other Variations on the TAT
- Expression Techniques
- The Draw-A-Person Test
- The House-Tree-Person Test (H-T-P)
- Case Exhibit 8.1: Projective Tests as Ancillary to the Interview
- Topic 8B: Self-Report and Behavioral Assessment of Psychopathology
- Theory-Guided Inventories
- Personality Research Form
- State-Trait Anxiety Inventory
- Factor-Analytically Derived Inventories
- Eysenck Personality Questionnaire
- Comrey Personality Scales
- Criterion-Keyed Inventories
- Minnesota Multiphasic Personality Inventory-2 (MMPI-2)
- MMPI-2 Interpretation
- Technical Properties of the MMPI-2
- Millon Clinical Multiaxial Inventory-III (MCMI-III)
- Personality Inventory for Children-2 (PIC-2)
- Behavioral Assessment
- Behavior Therapy and Behavioral Assessment
- Exposure-Based Methods
- Cognitive Behavior Therapies
- Self-Monitoring Procedures
- Structured Interview Schedules
- Assessment by Systematic Direct Observation
- Analogue Behavioral Assessment
- Ecological Momentary Assessment
- Chapter 9: Evaluation of Normality and Individual Strengths
- Topic 9A: Assessment Within the Normal Spectrum
- Broad Band Tests of Normal Personality
- Myers-Briggs Type Indicator (MBTI)
- California Psychological Inventory (CPI)
- NEO Personality Inventory-Revised (NEO PI-R)
- Stability and Change in Personality
- Personality Stability and Change in Middle and Late Life
- The Assessment of Moral Judgment
- The Moral Judgment Scale
- Stages of Moral Development
- Critique of the Moral Judgment Scale
- The Defining Issues Test
- The Assessment of Spiritual and Religious Concepts
- Challenges and Purposes of Religious and Spiritual Assessment
- Historical Overview on Spiritual and Religious Assessment
- Religion as Quest
- The Spiritual Well-Being Scale
- The Assessment of Spirituality and Religious Sentiments (ASPIRES) Scale
- The Faith Maturity Scale
- Topic 9B: Positive Psychological Assessment
- Assessment of Creativity
- Creativity as Process
- Creativity as Personal Characteristics
- Creativity as Product
- Comment on Creativity Tests
- Measures of Emotional Intelligence
- Assessment of Optimism
- Assessment of Gratitude
- Sense of Humor: Self-Report Measures