Psychological Bulletin 2000, Vol. 126, No. 5, 703-726

Gender Differences in Moral Orientation: A Meta-Analysis

Sara Jaffee and Janet Shibley Hyde

University of Wisconsin-Madison

C. Gilligan's (1982) critique of L. Kohlberg's theory of moral reasoning and her assertion that two modes

of moral reasoning (justice and care) exist have been the subject of debate within the field of psychology

for more than 15 years. This meta-analysis was conducted to review quantitatively the work on gender

differences in moral orientation. The meta-analysis revealed small differences in the care orientation

favoring females (d = -.28) and small differences in the justice orientation favoring males (d = .19).

Together, the moderator variables accounted for 16% of the variance in the effect sizes for care reasoning

and 17% of the variance in the effect sizes for justice reasoning. These findings do not offer strong

support for the claim that the care orientation is used predominantly by women and that the justice

orientation is used predominantly by men.
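
The d values in the abstract are standardized mean differences. The following is a minimal sketch of how such a value is commonly computed from group summary statistics, using the pooled standard deviation. The sign convention (male mean minus female mean, so that negative values favor females) is inferred from the abstract; the function name and the input numbers are hypothetical and are not data from any study reviewed here.

```python
import math


def cohens_d(mean_m, sd_m, n_m, mean_f, sd_f, n_f):
    """Standardized mean difference (male minus female) using the pooled SD."""
    pooled_var = ((n_m - 1) * sd_m ** 2 + (n_f - 1) * sd_f ** 2) / (n_m + n_f - 2)
    return (mean_m - mean_f) / math.sqrt(pooled_var)


# Hypothetical care-reasoning scores, invented for illustration only.
d_care = cohens_d(mean_m=10.0, sd_m=5.0, n_m=50, mean_f=11.4, sd_f=5.0, n_f=50)
print(round(d_care, 2))
```

With these invented inputs the female mean is 1.4 points higher and the pooled SD is 5, so the difference is a little over a quarter of a standard deviation in the direction favoring females.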

The 1982 publication of Carol Gilligan's In a Different Voice

marked one of those rare moments when social science research

breaches the ivied walls of academia and captures the public's

imagination. Gilligan's assertion that females and males speak in

different moral voices—a care voice characterized by the desire to

maintain relationships and to respond to others' needs and a justice

voice characterized by considerations of fairness and equity—

resonated with readers' experience (Mednick, 1989). Ms. maga-

zine named Gilligan its 1984 Woman of the Year, lauding her for

research that "created a new appreciation for a previously uncata-

logued female sensibility, as well as possibilities for new under-

standing between the genders" (Van Gelder, 1984, p. 37). In 1996,

Time magazine included Gilligan among its "Time 25"—25 inno-

vative Americans with the ability "to show us the world anew, to

educate and entertain us, to change the way we think about

ourselves and others" ("Time 25," 1996, p. 54). Wrote the editors,

How likely is it that a single book could change the rules of psychol-

ogy, change the assumptions of medical research, change the conver-

sation among parents and teachers and developmental professionals

about the distinctions between men and women, boys and girls? (p. 66)

More than 15 years after the publication of this influential work,

are its theses supported by available scientific data?

Gilligan's (1982) research on female moral reasoning chal-

lenged cognitive-developmental stage theories of moral develop-

ment on two fronts. First, she argued for a broader conceptualiza-

tion of moral reasoning that encompassed a care orientation as well

as a justice orientation. The care orientation is characterized by a

focus on maintaining relationships, responding to the needs of

others, and a responsibility not to cause hurt. The justice orientation

is concerned with principles of fairness and equity such as

those assessed in conventional measures of moral reasoning (Gilligan,

1982). Although acknowledging that males and females

could use either a justice or a care perspective, Gilligan asserted

that care reasoning was used predominantly by females and justice

reasoning was used predominantly by males (Gilligan, 1982; Gilligan

& Attanucci, 1988). Second, she argued that Kohlberg's

Moral Judgment Interview (MJI; Colby et al., 1987), the most

widely used measure of moral reasoning, was gender biased

because it was validated on an all-male sample and because its

scoring scheme characterized considerations of care and response

as less sophisticated than considerations of justice and fairness.

Sara Jaffee and Janet Shibley Hyde, Department of Psychology,

University of Wisconsin-Madison.

We would like to acknowledge the contributions of Rose Jadack and

MaryBeth Nolan to this project. The Graduate School of the University of

Wisconsin provided financial support.

Correspondence concerning this article should be addressed to Janet

Shibley Hyde, Department of Psychology, University of Wisconsin-Madison,

1202 W. Johnson Street, Madison, Wisconsin 53706.

Over the last 15 years, Gilligan's (1982) work has inspired a

wealth of empirical research and criticism, some of which has

supported her claims and some of which has not. At the same time,

researchers have seldom agreed on how Gilligan's care and justice

orientations should be defined or how they should be measured.

Consequently, it has been difficult to compare the results of these

studies. The goal of this article is to evaluate Gilligan's assertions

that (a) there are gender differences in Kohlbergian moral stage

and (b) the care and justice orientations are gender related. The

former assertion has been reviewed meta-analytically by several

investigators, and those results are summarized here. Meta-

analysis was used in the present paper to evaluate the latter

assertion. In addition, a number of complexities surrounding the

study of gender and moral reasoning are reviewed. The following

sections present Gilligan's theory of moral reasoning, as well as

the criticisms of this work.

Gilligan's Theory of Moral Reasoning

Gilligan's (1977, 1982) theory of moral reasoning stemmed

from Kohlberg's cognitive-developmental stage theory of moral

development. Kohlberg (1969, 1976, 1984) proposed that individ-

uals progress through a sequence of invariant and universal stages

of moral reasoning. These six stages are grouped into three levels,

each of which represents a qualitative advance in the individual's

ability to understand and integrate diverse points of view (Kohl-

berg, 1976).

Early research on Kohlbergian moral development reported that

Stage 3 was the modal stage for females and Stage 4 was the modal

stage for males (Fishkin, Keniston, & MacKinnon, 1973; Haan,

Smith, & Block, 1968; Holstein, 1969; Kohlberg & Kramer, 1969;

Poppen, 1974). Stage 3 reasoning is characterized by the desire to

maintain relationships and to meet others' expectations, and

Stage 4 is characterized by a law-and-order mentality in which

laws are upheld so as to maintain the social order. These findings

led some researchers to accuse Kohlberg's theory of gender bias

(Gilligan, 1977, 1982; Haan, 1978). Specifically, Gilligan (1977,

1982) argued that because Kohlberg derived his theory of moral

development from an all-male sample, he neglected to recognize a

distinctively female mode of moral reasoning—one that is char-

acterized by a desire to maintain relationships and a responsibility

not to cause hurt. This care orientation contrasts with a distinc-

tively male mode of moral reasoning—the justice orientation—

that is based on the abstract principles of justice, fairness, and

individualism captured by Kohlberg's MJI. Unlike Kohlberg's

moral stages, the care and justice orientations do not represent

cognitive structures that develop in a stagelike sequence. Instead,

they represent frameworks that can be modified by experience and

through which individuals interpret and resolve moral problems.

However, Gilligan argued that the care orientation's emphasis on

maintaining relationships led to its classification as a Stage 3

response in Kohlberg's scoring scheme.

Origins and Development of Moral Orientations

Gilligan and colleagues asserted that the care and justice orien-

tations are rooted in early childhood experiences of attachment and

inequality that foster, respectively, a relational and an individual-

istic self-concept (Gilligan & Attanucci, 1988; Gilligan & Wig-

gins, 1987). Because children are born into a position of inequality

and cannot survive without an attachment figure, all children are

exposed to the conditions that form the basis of both moral

orientations. Thus, all individuals have the capacity to understand,

experience, or implement either moral orientation. The reason that

males and females come to use one mode of moral reasoning over

the other is that these experiences of attachment and inequality are

differentially reinforced in a society dichotomized by gender.

Working from Chodorow's (1978) neopsychoanalytic account of

gender identity development, Gilligan proposed that because

women are the primary caretakers in most cultures, girls' self-

concepts are based on a definition of themselves as similar to and

connected with their mothers whereas boys' self-concepts are

rooted in their difference and separation from their mothers. More-

over, boys identify with their fathers, who may be perceived as

authority figures who hold power over them. Thus, the experience

of attachment and connection is more salient to girls, whereas the

experience of inequality and separation is more salient to boys.

These initial experiences of attachment and inequality may be

confirmed in later childhood and adolescence, resulting in an

association between gender and moral orientation (Gilligan &

Wiggins, 1987). Gilligan and Wiggins (1987) concluded:

The sex difference question, when framed in this way, does not carry

the implication that one sex is morally superior, nor does it imply that

moral behavior is biologically determined. Instead, it draws attention

to two perspectives on morality. To the extent that biological sex, the

psychology of gender, and the cultural norms and values that define

masculine and feminine behavior affect the experience of equality and

attachment, these factors presumably will influence moral development.

(p. 282)

Empirical Tests of Gilligan's Developmental Theory

These claims about the origins and development of moral voice

are largely untested. Benenson, Morash, and Petrakos (1998) ob-

served 41 mother-child dyads (children were 4 and 5 years old) in

a play setting and found that, compared with boys, girls were

physically closer to their mothers, engaged in more mutual eye

contact with their mothers, and were rated higher on global enjoy-

ment. Although the authors cited these data as supportive of

Chodorow's (1978) theory, it is not clear whether emotional close-

ness, as measured in this study, matches Chodorow's concept of

parental identification. Moreover, because emotional closeness

between fathers and their children was not studied, it is impossible

to determine whether the nature of children's relationships with

their fathers differs from their relationships with their mothers.

Other researchers have interpreted Gilligan to mean that gender

differences in moral orientation stem from parental socialization

practices that foster a relational self-concept in females and an

individualistic self-concept in males (Lollis, Ross, & Leroux,

1996; Walker, 1997). Lollis and colleagues (1996) observed par-

ents' interventions in property conflicts among 2-4-year-old sib-

lings. They found that mothers intervened more often and used

more care reasoning than fathers, although parents did not differ in

their use of justice reasoning. However, there was no evidence that

girls received more care-oriented interventions nor that boys re-

ceived more justice-oriented interventions from either parent.

Thus, the authors concluded that boys and girls were not being

socialized differently in this respect. This argument, however,

cited a social learning explanation for the development of gender

differences in moral reasoning, whereas Gilligan posited a neo-

psychoanalytic account in which the origin of gender differences

in moral reasoning lies in the child's sense of identification with

each parent.

Various researchers have called into question the testability of

Gilligan's claims (Walker, 1995), citing the difficulties inherent in

measuring constructs like parental identification, equality/inequal-

ity, and attachment/detachment. Empirical tests of Gilligan's

claims would indeed prove challenging. Such studies might de-

scribe how the connected and individuated self-concepts are dif-

ferentially reinforced for girls and boys and how the self-concept

changes over time. They also might elucidate the mechanisms by

which parental identification leads to gender differences in the

self-concept and, ultimately, to gender differences in moral

orientation.

Levels of Care Reasoning

Originally, Gilligan (1982) proposed that the care orientation

developed in three phases, the first characterized by an exclusive

focus on one's own needs (i.e., caring for oneself only), the second

characterized by self-sacrifice and a focus on others' needs, and

the third characterized by the ability to balance others' needs and

one's own needs. This sequence was derived from her interviews

with 29 women, diverse in age, ethnic background, and socioeco-

nomic status (SES), who were considering whether to undergo an

abortion. Gilligan (1982) conducted a follow-up study 1 year after

the initial abortion interview. Although many of the women had

traversed at least two of the levels, none of the 21 women had

progressed across all three, and many showed regression. Gilligan

seems to have dropped this developmental sequence in her more

recent work (Gilligan, Brown, & Rogers, 1990).

Skoe and Marcia (1991) developed and validated the Ethic of

Care Interview (ECI) to assess Gilligan's levels of care reasoning.

The ECI is negatively related to measures of authoritarianism and

positively related to measures of ego identity, cognitive complex-

ity, role-taking, and MJI scores (Skoe & Diessner, 1994; Skoe,

Pratt, Matthew, & Curror, 1996). Although some evidence exists

that females score higher than males on the ECI (Skoe & Gooden,

1993; Skoe et al., 1996), this finding has not been replicated

consistently (Skoe & Diessner, 1994; Sochting, Skoe, & Marcia,

1994). Unfortunately, the ECI has not been used to answer impor-

tant questions about the developmental progression of care rea-

soning or mechanisms of change. Although Skoe and Diessner

(1994) reported a positive association between ECI scores and age,

this finding was based on cross-sectional data. Skoe et al. (1996)

reported moderately stable levels of care reasoning across a 4-year

period in mid- to late adulthood, but it would be informative to

explore longitudinally the development of care reasoning at earlier

ages.

There has been little discussion of how or why individuals move

from one level of care to the next. Although Gilligan (1982)

suggested that individuals progress through these levels in times of

crisis, it is not clear if one reaches the highest level and remains

there or if each new crisis causes the cycle to start anew (Sichel,

1985). The latter would suggest that the levels of care reasoning

represent a process by which individuals resolve moral problems

rather than a developmental sequence in which each level repre-

sents an advance in moral maturity. Skoe and Marcia (1991)

suggested that advances in moral thought are based on questioning

previously held beliefs and formulating new, more inclusive po-

sitions, and they compared this to cognitive-developmental pro-

cesses of disequilibration and accommodation. However, such a

cognitive-developmental account of care reasoning has not been

assessed empirically.

Measures of Moral Orientation and Methods of Coding

A number of measures have been developed to assess moral

orientation. These generally fall into two categories: interview

measures and objective measures. The majority of these measures

were designed to assess Gilligan's conceptualization of the care

and justice orientations.

Interview Measures

Interview measures usually ask participants to describe a moral

dilemma from their own lives. Responses are coded for the pres-

ence of justice and care reasoning. A number of coding schemes

have been developed. For instance, Lyons's (1982) coding scheme

calculates the number of care and justice considerations (each idea

the participant presents in discussing the dilemma) and classifies

moral orientation as care or justice on the basis of the modal

response. Moral orientation is recorded as "split" if the participant

voices as many care considerations as justice considerations. Other

researchers have modified this coding scheme to better differenti-

ate care and justice reasoning (Gilligan & Attanucci, 1988; Krebs,

Vermeulen, Denton, & Carpendale, 1994). Responses are classi-

fied as care only or justice only if all the considerations can be

categorized as care- or justice-based. If at least 75% of the con-

siderations can be categorized as care- or justice-based, the re-

sponse is classified as care focus or justice focus, respectively. If

less than 75% of the considerations can be categorized as care- or

justice-based, the response is categorized as mixed. Finally, Brown

and colleagues (1988) developed a coding system in which a

response narrative is read four separate times, each time from a

different standpoint. In the first reading, attention is focused on the

story being told. In the second reading, all references to the self are

underlined. In the third and fourth readings, attention is focused on

moral voice: first the care perspective, then the justice perspective.

Finally, the reader completes a summary worksheet in which

his/her interpretation and summary of the text are substantiated by

quotes from the interviews. These summaries allow the reader to

code for the presence of care and justice reasoning and the pre-

dominance of one or the other moral orientation. Brown et al.

maintained that by reading the narrative from many different

standpoints, the reader remains open to the possibility that the

same statement can be interpreted in different ways.
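
The two count-based classification rules described above can be sketched in a few lines. This is an illustrative sketch only, not the published coding manuals: it assumes each consideration has already been hand-coded as either "care" or "justice", treats the 75% boundary as inclusive (following "at least 75%"), and uses hypothetical function names.

```python
from collections import Counter


def classify_modal(considerations):
    """Modal rule: the more frequent category wins; a tie is 'split'."""
    counts = Counter(considerations)
    care, justice = counts["care"], counts["justice"]
    if care == justice:
        return "split"
    return "care" if care > justice else "justice"


def classify_threshold(considerations, focus_cutoff=0.75):
    """Threshold rule: 'only' if all considerations share one category,
    'focus' if at least focus_cutoff of them do, otherwise 'mixed'."""
    counts = Counter(considerations)
    total = counts["care"] + counts["justice"]
    if total == 0:
        raise ValueError("no care or justice considerations to classify")
    for label in ("care", "justice"):
        if counts[label] == total:
            return f"{label} only"
        if counts[label] / total >= focus_cutoff:
            return f"{label} focus"
    return "mixed"


# Hypothetical coded response: three care considerations, one justice.
response = ["care", "care", "care", "justice"]
print(classify_modal(response), "|", classify_threshold(response))
```

The same response can receive different labels under the two rules, which is one reason results obtained with different coding schemes are hard to compare directly.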

The Fable Interview (Johnston, 1988) is another interview mea-

sure that taps Gilligan's moral orientation construct. However,

instead of asking participants to discuss dilemmas from their own

lives, participants respond to dilemmas embedded within two of

Aesop's fables. The first solution to each fable is considered the

spontaneous solution. Respondents are then probed to determine if

they can provide an alternative solution to the dilemma. If, for

instance, the spontaneous solution is a care solution, respondents

are encouraged to offer a justice solution. If they do so, they are

then asked to decide which is the better solution. A modified

version of Lyons's (1982) coding scheme is used to classify

respondents as care, justice, or split reasoners. The unit of analysis

in the fable coding is the entire solution offered by the respondent

(as opposed to the individual considerations in each response). One

advantage of the Fable Interview is that comparisons can be made

between individuals because all are responding to the same

dilemmas.

Eisenberg and colleagues (Eisenberg, Lennon, & Roth, 1983;

Eisenberg-Berg, 1979) and Kohlberg and his colleagues (Colby et

al., 1987) have developed coding schemes for individuals' re-

sponses to prosocial moral dilemmas and MJI dilemmas, respec-

tively. Eisenberg's scheme calculates the frequency with which an

individual uses the various types of prosocial reasoning in re-

sponding to a prosocial dilemma. In Kohlberg's scheme, orienta-

tion scores are generated on the basis of the content of an indi-

vidual's response to a dilemma. These interview measures are not

meant to tap Gilligan's conceptualization of care or justice

reasoning.

Self-Administered Objective Measures

Objective tests of moral reasoning and moral orientation have

also been developed. Some of these, including the Care/Response

Orientation Scale (CROS; Atunzu, 1986), the Measure of Moral

Orientation (MMO; Liddell, 1990), and the Moral Orientation

Scale (Yacker & Weinberg, 1990), were designed to assess the

care and justice orientations as conceptualized by Gilligan. Others,

such as the Prosocial Moral Reasoning Objective Measure (Carlo,

Eisenberg, & Knight, 1992), were designed for other purposes,

although they have been used to assess care and/or justice reason-

ing. In these tests, respondents are asked to generate their own

moral dilemma or to respond to a hypothetical one. They are then

presented with multiple responses representing examples of care

and justice reasoning and asked to pick the one they would use or

prefer in resolving the dilemma. These tests produce continuously

distributed care and/or justice orientation scores.

Finally, as discussed above, the ECI (Skoe & Marcia, 1991)

assesses Gilligan's levels of care reasoning. The ECI consists of

four dilemmas administered in a structured interview format. One

of these dilemmas is generated by the respondent, and the other

three are standard interpersonal dilemmas. The ECI can be scored

according to level or according to the total score across the four

dilemmas.

Criticisms of the Moral Orientation Construct

Gilligan has been accused of oversimplifying Kohlbergian

moral reasoning in her description of the justice orientation (Puka,

1991; Walker, 1989; Walker, de Vries, & Trevethan, 1987). Critics

have argued that the justice orientation comes closest to the rigid

law-and-order reasoning of Kohlberg's Stage 4, and Gilligan has

been criticized for ignoring the contention that justice and rights

exist in the context of social responsibilities and obligations. Thus,

at the higher stages of moral reasoning, individuals follow rules

only if those rules benefit the common good (Kohlberg, Levine, &

Hewer, 1983).

Critics have also argued that too little work has been done to

characterize or validate the existence of Gilligan's care and justice

orientations other than to demonstrate that these modes of moral

reasoning are present in individuals' responses to moral dilemmas

(Walker, 1989; Walker et al., 1987). Consequently, it is not clear

whether moral orientations are used consistently across situations

and over time or how moral orientations relate to each other or to

moral behavior.

Intraindividual Consistency

Gilligan's assertion that most individuals prefer one mode of

reasoning over the other implies a high level of intraindividual

consistency in their use (Gilligan & Attanucci, 1988; Langdale,

1986). Indeed, Gilligan and Attanucci (1988) reported that two

thirds of their participants focused on only one orientation in their

discussion of a self-generated moral dilemma. Langdale (1986)

demonstrated that, within a single dilemma, approximately 87% of

participants used a single orientation and that the number of

individuals with a predominant justice or care orientation (as

opposed to mixed orientation) was significantly greater than

chance. The number of people who showed a predominant justice

or care orientation across different dilemmas was also significantly

greater than chance.

Other researchers, however, have been unable to replicate these

results (Krebs et al., 1994; Pratt, Golding, & Hunter, 1983; Pratt,

Golding, Hunter, & Sampson, 1988; Walker et al., 1987; Wark &

Krebs, 1996). For example, Wark and Krebs (1996) reported that

only 9% of their college-age participants obtained the same moral

orientation score across three types of dilemmas and only 29%

obtained the same or an adjacent score. Walker and colleagues

(1987) asserted that consistency would be demonstrated by indi-

viduals who used the same orientation 75% of the time or more.

Less than 20% of participants met this consistency criterion across

self-generated and hypothetical dilemmas. Within a single di-

lemma, only about half met the criterion. Similarly, Pratt and

colleagues (1988) reported that only 60% of their participants used

the same orientation across two self-generated dilemmas. In sum-

mary, there is little evidence that moral orientations are used

consistently within or between dilemmas.
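
The 75% consistency criterion attributed to Walker and colleagues (1987) can be illustrated with a short sketch. It assumes each participant is represented by a list of orientation codes, one per dilemma or per scored unit; the function names and the sample data are hypothetical and are not drawn from any of the studies cited.

```python
from collections import Counter


def is_consistent(orientations, cutoff=0.75):
    """True if the most frequent orientation accounts for at least
    `cutoff` of this participant's codes."""
    top_count = Counter(orientations).most_common(1)[0][1]
    return top_count / len(orientations) >= cutoff


def share_consistent(sample, cutoff=0.75):
    """Proportion of participants in `sample` who meet the criterion."""
    return sum(is_consistent(codes, cutoff) for codes in sample) / len(sample)


# Hypothetical sample of five participants, four codes each.
sample = [
    ["care", "care", "care", "care"],
    ["care", "care", "care", "justice"],
    ["care", "justice", "care", "justice"],
    ["justice", "care", "justice", "care"],
    ["justice", "justice", "care", "care"],
]
print(share_consistent(sample))
```

In this invented sample only the first two participants reach the cutoff, so the share meeting the criterion is two in five.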

There is mixed evidence regarding the extent to which moral

orientations are used consistently over time. Walker (1989) fol-

lowed children and adults over a 2-year interval and found that half

the respondents evidenced a different orientation at the follow-up

than at the initial interview. However, Skoe and colleagues (1996)

collected two waves of data, 4 years apart, from a sample of

middle-aged and elderly adults and reported that care reasoning

levels were moderately stable within mid- to late adulthood. It

should be noted that these longitudinal analyses addressed differ-

ent issues. Whereas the Walker analysis addressed the question of

whether individuals use different moral orientations over time, the

Skoe et al. analysis assessed changes in level of care reasoning

over time.

Validity

There is qualified evidence for the construct validity of Gilli-

gan's moral orientations. Pratt, Diessner, Hunsberger, Pancer, and

Savoy (1991) reported that the justice and care orientations were

associated with variations in the self-concept of the sort described

by Gilligan (1982) and Lyons (1983), such that those who had

more individuated self-concepts tended to discuss justice-oriented

personal dilemmas. However, their results did not clarify whether

males were more likely to report an individuated self and females

were more likely to report a connected self. Lyons, however,

reported that women were more likely than men to report a

connected self, men were more likely than women to report an

individuated self, and that men and women also differed in pre-

dicted ways in their use of the care and justice orientations.

Finally, Liddell and colleagues (Liddell, 1998; Liddell, Halpin, &

Halpin, 1992) reported that scores on a measure of connected

self-concept were positively related to a standardized measure of

care reasoning and scores on a measure of a rights-oriented self-

concept were positively related to a standardized measure of

justice reasoning.

There is limited evidence for the convergent validity of the

moral orientation constructs. Liddell (1998) compared the MMO

(Liddell, 1990) with semistructured interviews coded according to

Lyons's (1982) protocol. Care and justice scores, as measured by

the MMO, showed positive but nonsignificant correlations with,

respectively, care and justice as measured by the semistructured

interview.

Other evidence does not support predictions derived from Gil-

ligan's work. Because Gilligan argued that justice reasoning is

favored at the highest stages of Kohlberg's framework, it is ex-

pected that the care orientation would be negatively associated and

the justice orientation would be positively associated with Kohl-

bergian moral stage. Although Krebs and colleagues (1994) re-

ported that moral stage scores were negatively correlated with the

care orientation for men, Pratt and colleagues (1988) reported a

positive relationship between care reasoning and moral stage for

women and no association between the two for men. Walker and

colleagues (1987) found no relationship between moral stage and

moral orientation when the latter was assessed with hypothetical

dilemmas but found a positive relationship between the care ori-

entation and moral stage when orientation was assessed with

self-generated dilemmas. Moreover, individuals at the highest

stages of moral reasoning were more likely to use both care and

justice reasoning in their response to dilemmas.

In summary, there is little evidence to support the notion that

individuals use a particular moral orientation consistently over

time and situations. Although there is some evidence that moral

orientations are associated with related constructs and measures,

these predicted relationships are not obtained consistently. Finally,

there is virtually no evidence regarding the extent to which moral

orientations develop over time. This pattern of evidence contrasts

sharply with Kohlberg's account of moral reasoning and develop-

ment, for which there is ample evidence of reliability, validity, and

stage sequence. Arguably then, these are not comparable phenom-

ena and, if so, accounts of one need not encompass the other, nor

can they be criticized for disregarding the other (Puka, 1991).

What Constitutes Moral Maturity?

As various critics have pointed out, it is not always clear what

Gilligan believes is the relationship between the justice and care

orientations or what constitutes moral maturity (Auerbach, Blum,

Smith, & Williams, 1985; Flanagan & Jackson, 1987; Mason,

1990; Puka, 1991; Sichel, 1985; Walker, 1995). Most commonly,

Gilligan has drawn on the metaphor of the ambiguous figure to

illustrate her discussion of moral reasoning. The care orientation

and the justice orientation are framed as two ways of seeing a

moral problem. Choosing to see the problem from one perspective

may lead one to neglect the ways in which the problem might be

solved from the other perspective (Gilligan & Wiggins, 1987). At

times, Gilligan has asserted that these perspectives are incompat-

ible alternatives to one another but are both adequate from a

normative point of view (Gilligan, 1986b, 1987). In other places,

she has argued that they complement one another and that each is

deficient without the other (Gilligan et al., 1990). Thus, it is not

clear whether moral maturity is characterized by the ability to

integrate and balance the justice and care perspectives, to maintain

them in a complementary tension, or whether there is a morally

mature care orientation in the absence of a morally mature justice

perspective. Notably, her more recent discussions favor the notion

that the justice and care perspectives are maintained in comple-

mentary tension (Gilligan et al., 1990).

Even granting that moral maturity is characterized by the ability

to maintain both the justice and the care perspective in some sort

of tension, this formulation does not resolve the question of how

individuals should solve moral problems—the question of whether

and when either moral voice should take precedence (Sichel,

1985). Although it is possible and, Gilligan might suggest, neces-

sary to perceive moral problems from multiple perspectives, moral

action requires choice, and Gilligan's framework offers few guide-

lines as to which moral voice is more adequate in which situations.

This, however, is an ethical issue and not one that empirical

research can address.

Evidence for and Against Gender Differences in Moral Reasoning

Gender Differences in Kohlbergian Moral Stage

Gilligan's claim that the MJI is gender biased has been con-

vincingly debunked. Critical reviews of the moral development

literature have failed to find evidence that Kohlberg-based mea-

sures yield gender differences in moral reasoning scores. Instead,

these reviews have found that gender differences in moral reason-

ing are small to nonexistent (Rest, 1979; Thoma, 1986; Walker,

1984, 1991, 1995). Rest (1979) conducted a critical review of 17

studies with 20 independent comparisons of male and female

participants on the Defining Issues Test (DIT; Rest, 1979) and

found only two significant gender differences, both favoring

females.

Walker's (1984) critical review and meta-analysis of 79 studies

that measured moral reasoning with the MJI found small gender

differences favoring males only in adulthood. Sex accounted for

only one twentieth of one percent of the variability in moral

reasoning development. Walker noted that gender differences ap-

peared most frequently in studies that confounded gender with

education or occupational status and in studies that used the earlier,

less reliable versions of the Kohlberg scoring manual (see Baumrind,

1986, for a critique of Walker, and Walker, 1986b, for a reply;

see also Walker, 1991, 1995). Walker (1991) updated this review

and obtained similar results.

Thoma (1986) conducted a meta-analysis of gender differences

in moral reasoning in 56 samples that were administered the DIT

(Rest, 1979) and found a small effect favoring females. Gender

differences accounted for less than 0.5% of the variance in DIT

scores, whereas age and education accounted for nearly 53% of the

variance in DIT scores. Thoma concluded that females are not at

a disadvantage when measured by the DIT.

Finally, Colby and Damon (1983) pointed out that Kohlberg's

model has now been validated on a sample of males and females.

They found that females passed through the same stages in the

same order as males (Colby, Kohlberg, Gibbs, & Lieberman,

1983). When occupation and education were controlled, gender

differences in moral reasoning level disappeared. Thus, they con-

cluded that Gilligan's allegations of gender bias in Kohlberg's

theory are unwarranted.

Gilligan's (1986a) response to critics has been to suggest that a

lack of gender differences on the MJI may simply demonstrate that

females learn to use the justice orientation as effectively as males.

Because the MJI codes only those data that fit within its stage

definitions, considerations of care may still be ignored. Although

this is a valid point, it begs the question of what sort of data would

disconfirm Gilligan's hypothesis.

It is important to point out that the studies cited above do not test

Gilligan's assertion that males are more likely than females to use

the justice orientation. The research cited above demonstrates that

there are minimal gender differences in justice reasoning stage, but

does not speak to the question of whether there are gender differ-

ences in the use of the justice orientation.

Gender Differences in Moral Orientation

Although the extant literature fails to find gender differences in

stage scores on Kohlberg-based measures, the research on gender

differences in moral orientation is less conclusive. Most research-

ers have now acknowledged that more than one mode of moral

reasoning exists (Kohlberg et al., 1983), but there is considerable

controversy as to whether those moral orientations can be reliably

associated with gender. Whereas some researchers have found

evidence for Gilligan's claim that care reasoning is used predom-

inantly by females and justice reasoning predominantly by males

(Gilligan & Attanucci, 1988; Johnston, 1988; Yacker & Weinberg,

1990), other researchers have found gender differences in care

reasoning only (Galotti, Kozberg, & Farmer, 1991; Garmon, Bas-

inger, Gregg, & Gibbs, 1996; Gibbs, Arnold, & Burkhart, 1984;

Liddell, Halpin, & Halpin, 1993; Wark & Krebs, 1996), and still

others have failed to find gender differences in the use of either

care or justice reasoning (Beal, Garrod, Ruben, & Stewart, 1997;

Friedman, Robinson, & Friedman, 1987; Walker et al., 1987).

Moreover, many researchers have found evidence that gender

differences in moral orientation are moderated by other variables,

such as dilemma content (Walker et al., 1987; Wark & Krebs,

1996, 1997) and social class (Beal et al., 1997; Puka, 1989; Tronto,

1987).

Given the widely disparate findings regarding gender differ-

ences in the use of justice and care reasoning, there is a need for

a review of this body of research. Meta-analysis provides an

appropriate tool for synthesizing the research on gender differ-

ences in moral orientation and for determining the direction and

the actual magnitude of the effect. Importantly, meta-analysis also

allows the researcher to examine how the size of the effect might

be moderated by other variables. We hypothesized that the effect

size for gender differences in moral orientation might be moder-

ated by the following variables, each of which is addressed in turn:

(a) age, (b) SES, (c) moral orientation construct, (d) type of

dilemma, (e) coding scheme, (f) scale, (g) gender of the protago-

nist, and (h) publication status. Although Walker (1995) provided

an excellent narrative review of some of these moderators, he did

not subject the studies to quantitative meta-analysis.

Moderator Variables

Age

Gilligan and colleagues (Gilligan, 1982; Gilligan & Wiggins,

1987) proposed that gender differences in moral orientation

emerge in early childhood and persist across the life course.

However, some cross-sectional research has suggested that gender

differences in moral orientation are moderated by age. Walker and

colleagues (Walker, 1989; Walker et al., 1987) reported gender differences in care reasoning favoring women among adults who

discussed self-generated dilemmas but failed to find such gender

differences among 1st-, 4th-, 7th-, and 10th-grade children. Similarly,

in a study of young, middle, and older adults, Pratt and

colleagues (1988) reported that gender differences in moral orien-

tation were significant only for the middle-age group. Pratt and

colleagues proposed that family organization during the period of

active parenting (i.e., middle age) leads to increased gender-role

polarization (Guttman, 1985) and thus heightens gender differ-

ences in moral orientation. Consistent with this hypothesis, they

found that mothers were significantly less justice-oriented than

fathers, but this gender difference failed to emerge among nonpar-

ents. However, other cross-sectional studies have failed to find

evidence that gender differences in moral orientation are moder-

ated by age (Craft, 1992; Galotti et al., 1991; Garrod, Beal, & Shin,

1990; Langdale, 1986; Pratt et al., 1991).

In a longitudinal study of prosocial moral reasoning that fol-

lowed a cohort of children from age 4.5 to age 20, Eisenberg and

colleagues (1987) reported that gender differences in sympathetic

and role-taking reasoning, both of which have been likened to

Gilligan's care orientation, emerged at approximately age 11 or 12.

However, Walker (1989) failed to find evidence among children of

gender differences in moral orientation at either of two assess-

ments in a short-term longitudinal study. Gender differences in

moral orientation emerged at both time points for adults who

described dilemmas from their own lives. Meta-analysis is needed

to sort out the inconsistencies in these cross-sectional and longi-

tudinal findings.

Socioeconomic Status

Some researchers have proposed that gender and subordinate or

minority status are confounded in studies of moral orientation

(Puka, 1989; Tronto, 1987). They have argued that subordinate

status and the condition of powerlessness promote an inherent

concern with others because those others, in part, determine one's

outcomes (Tronto, 1987). However, other researchers have argued

that low social status promotes a concern with fairness and rights

because these rectify the social inequalities experienced by minority

or subordinate groups (Beal et al., 1997). Garrod and Beal

(1993) found that children from a rural, working-class community

were somewhat more likely to emphasize rights considerations

than were children from a more affluent university suburb. Simi-

larly, Gilligan and Attanucci (1988) reported that minority students

were more likely than nonminority students to adopt a rights

perspective.

These hypotheses might be interpreted to mean that SES exerts

a main effect on moral reasoning, in which case gender differences

in both care and justice reasoning should be smallest for lower

class groups. Alternatively, they might be interpreted to mean that

gender and social class interact, such that gender differences in

care reasoning (but not justice reasoning) should be greatest

among lower class groups because low-SES females will be more

likely to use the care orientation as a function of both their gender

and their social class.

Moral Orientation Construct

Most measures of moral orientation base their definition of the care and justice orientations on Gilligan's description of these

constructs. However, some researchers have taken a different

approach to measuring moral orientation. For instance, Eisenberg

and her colleagues (Eisenberg et al., 1983; Eisenberg-Berg, 1979) have focused on prosocial moral reasoning, which is defined as

"reasoning about conflicts in which the individual must choose

between satisfying his or her own wants or needs and those of

others in contexts in which laws, punishments, authorities, formal

obligations, and other external criteria are irrelevant or deempha-

sized" (Eisenberg-Berg, 1979, p. 128). Eisenberg and her col-

leagues have characterized types of prosocial moral reasoning,

several of which (e.g., concern with others' physical, material, or

psychological needs; role-taking; sympathetic orientation) load on

a single factor that represents other-oriented reasoning and are

conceptually similar to Gilligan's description of the care orienta-

tion (Eisenberg, Fabes, & Shea, 1989).

Other researchers claim that moral orientation can be measured

at the aspect level or the orientation level in Kohlberg's scoring

scheme (de Vries & Walker, 1986; Garmon et al., 1996). Unlike

Kohlberg's stages, these moral orientations are more closely re-

lated to the content rather than the structure of moral reasoning. It

has been argued that the perfectionism orientation (emphasizing

dignity and autonomy, good conscience and motives, and harmony

with self and others) and the utilitarian orientation (emphasizing

welfare or happiness consequences for oneself and others) are

consistent with the care orientation, whereas the normative orien-

tation (emphasizing duties and rights) and the fairness orientation

(emphasizing justice) are consistent with the justice orientation

(Walker, 1986a, 1995). Other researchers have focused on the

aspects that characterize each stage of moral reasoning (Garmon et

al., 1996). Specifically, they have pointed to similarities between

empathic role-taking, which is a Stage 3 aspect that is character-

ized by empathic references to another's psychological or emo-

tional welfare, and the care orientation.

There is some disagreement as to whether Kohlberg's orienta-

tions are adequate measures of the care orientation. Smetana

(1984) pointed out that stage and aspect scores are confounded in

Kohlberg's scoring system. For instance, the perfectionist orien-

tation can be scored only at the higher levels of moral reasoning.

Nevertheless, because a number of studies have assessed gender

differences in care and justice reasoning using Kohlberg's moral

orientations, these studies were included in the meta-analysis.

Type of Dilemma

Some researchers have suggested that gender differences in the

use of care and justice reasoning might be accounted for by the

kinds of dilemmas used to elicit moral reasoning (Clopton &

Sorell, 1993; Ford & Lowery, 1986; Gilligan, 1982; Langdale,

1986; Mednick, 1989; Walker, 1989; Walker et al., 1987; Wark &

Krebs, 1996, 1997). One possibility is that when dilemmas are

self-generated, females discuss dilemmas that deal with personal

concerns (e.g., dilemmas dealing with relationships between the

participant and close others) and males discuss impersonal dilem-

mas (e.g., conflicts between others or involving generalized others,

such as clients or students). Because personal dilemmas are inher-

ently concerned with relationships, they may elicit care reasoning,

whereas impersonal dilemmas may elicit considerations of fairness

and reciprocity (Walker, 1989; Walker et al., 1987). Indeed, sev-

eral researchers have found that women generate more personal

real-life dilemmas than men do and men generate more impersonal

real-life dilemmas (Pratt et al., 1988, 1991; Skoe & Diessner,

1994; Skoe et al., 1996; Walker et al., 1987; Wark & Krebs, 1996,

1997). When type of dilemma is controlled, gender differences in

moral orientation are eliminated, suggesting the important role of

situational factors in moral orientations (Walker, 1989; Walker et

al., 1987; Wark & Krebs, 1996). Consistent with this possibility,

Wark and Krebs (1997) reported that prosocial dilemmas evoked

more care-based moral judgments than Kohlberg dilemmas or

dilemmas dealing with transgressions or rule violations and these,

in turn, evoked more justice-based moral judgments than prosocial

dilemmas did. Other researchers, however, have found no evidence

of gender differences in dilemma content (Ford & Lowery, 1986;

Peter & Gallop, 1994). Given these conflicting findings, meta-

analysis is needed to determine whether there are systematic trends

in research outcomes.

Coding Scheme

As discussed in the section on measures of moral orientation,

several coding schemes have been developed to code for justice

and care reasoning in individuals' responses to moral dilemmas.

These schemes differ on the basis of the unit of analysis (e.g.,

considerations vs. entire solution) and whether they are objective

or interpretive schemes. Given this heterogeneity, coding scheme

was included as a moderator variable.1

Scale

Measures of moral reasoning can also be distinguished by their

scale of measurement. In the coding schemes mentioned above,

moral orientation can be analyzed as a categorical variable (i.e.,

respondents can be classified as having a care or justice orienta-

tion) or as a continuous variable (e.g., a "percent care" score is

computed for each participant on the basis of the percentage of

care-based considerations in each response). Some continuous

measures rate respondents along a care continuum and a justice

continuum, yielding independent measures of each orientation.

Because the effect sizes that form the basis of a meta-analysis

are computed with means and standard deviations, moral orienta-

tion measures that yield continuously scaled scores may provide

more reliable effect size estimates than measures that describe the

proportion of respondents who fall into one or the other moral

orientation category. We wanted to be able to estimate the mag-

nitude of the effect size for gender differences in moral orientation

when only the most reliable estimates of the effect size were

included in the analysis, as well as when all effect sizes were

included (this issue is elaborated in the Results section). Conse-

quently, scale was included as a moderator variable.

Gender of the Dilemma's Protagonist

Most of the research on gender of the protagonist has dealt with

its role as a potential moderator of gender differences in moral

stage. These studies have addressed the possibility that women

may score lower on Kohlberg's MJI (Colby et al., 1987) because

they are unable to identify completely with the male protagonists

of the dilemmas (Holstein, 1976; however, it is important to note

that not all the MJI dilemmas feature male protagonists). Such

studies have provided inconsistent evidence that gender of the

protagonist has an effect on moral judgment level (Bussey &

Maughan, 1982; Freeman & Giebink, 1979; Garwood, Levine, &

Ewing, 1980; Krebs et al., 1994; Lonky, Roodin, & Rybash, 1988;

Orchowsky & Jenkins, 1979; Turiel, 1976).

1 It is important to note the distinction between coding scheme and moral

orientation measure. The same measure (e.g., the MJI) might be scored

according to any of several different coding schemes. The moderator

analyses do not explore how the magnitude of the gender difference in

moral orientation differs as a function of the measures themselves (e.g., the

Fable Interview or the MMO) because of heterogeneity in the coding

schemes used to score the same measure across studies and because there

were too few of certain measures (e.g., the CROS) to allow for meaningful

interpretation of the moderator analyses.

Relatively few studies have examined the extent to which the

gender of the protagonist moderates gender differences in moral

orientation. Albrecht (1989) found that dilemmas with male pro-

tagonists elicited significantly more justice reasoning than dilem-

mas with female protagonists. However, Beal et al. (1997) reported

a preference for care reasoning regardless of the protagonist's

gender. Similarly, Krebs et al. (1994) failed to find differences in

moral orientation depending on whether participants responded to

Kohlberg's dilemmas from their own perspective or from a third-

person perspective. Because of these inconsistencies, gender of the

protagonist was included as a moderator variable.

Publication Status

Because of a bias toward publishing significant findings, it is

possible that an overall effect size based only on published data

would overestimate the population effect size for gender differ-

ences in moral orientation (Rosenthal, 1979). To address this

possibility, unpublished studies were included in the meta-

analysis, and publication status was included as a moderator

variable.

Goals

The overall goal of this meta-analysis was to determine whether

the much-heralded claims of Gilligan are supported by empirical

evidence. To this end, we evaluated Gilligan's (1982) assertion

that there are gender differences in the use of care and justice

reasoning. Furthermore, we investigated whether potential gender

differences in moral orientation might be moderated by other

variables.

Method

Sample of Studies

The sample of studies came from two sources: (a) The PsycLIT, ERIC,

and Dissertation Abstracts computerized databases were searched simul-

taneously for the years 1966-1998, using the key words (care reason*) or

(care orient*) or (ethic* of care) or (prosocial moral*) or (moral orient*),

which yielded 741 citations, and (b) the reference sections of those studies

that were drawn from the databases and included in the meta-analysis were

searched for citations that did not appear in the database search. This

strategy yielded an additional 16 citations. It is important to note that the

computerized database search did not use the search term (gender and care

reason*) as this might have led to a selective sampling of studies that found

significant gender differences.

In the case of computerized literature searches, abstracts were inspected

and included if they met the following criteria: (a) The study was empirical,

(b) there were at least five males and five females in each sample, (c) the

study did not report data that had already been reported elsewhere, (d) the

measure assessed moral orientation, and (e) age and gender were not

confounded (e.g., gender differences in moral reasoning between mothers

and sons).

Of the 757 citations elicited by the database and reference section

searches, 180 met inclusion criteria. Although this may seem like a small

proportion of the total number of citations, it is important to underscore that

our search term resulted in a very large number of citations that were

nonempirical commentaries on Carol Gilligan's work or citations regarding

the ethic of care in nursing. Of the PsycLIT and ERIC studies, 25 were not

empirical, 2 included data that were reported elsewhere, 371 did not assess

moral orientation, 10 reported on same-sex samples, and 37 were duplicates

(i.e., the same study appeared in both the PsycLIT and ERIC

databases). Of the dissertations, 132 were excluded because they did not

assess moral orientation or because they were nonempirical.

In the second phase of the search, copies of the 114 papers and 66

dissertations were obtained to ensure that they met inclusion criteria and

included enough information to compute an effect size. In those cases in

which an author reported that moral orientation had been measured but

failed to report enough statistical information to compute an effect size, a

letter or e-mail was sent to the author at the address specified for reprints

or at a more recent address found in the American Psychological Associ-

ation 1997 Membership Directory or in the Society for Research in Child

Development 1996 Directory of Members.2 There were 29 cases in which

this was necessary, and we received 23 responses to these requests for

additional data, 11 of which included the requested data. In the 18 cases in

which no data were available (12 cases in which the author responded that

data were not available plus 6 cases in which the author did not respond at

all), effect sizes were estimated as zero for 10 of the studies on the basis

of the fact that the author reported a nonsignificant gender difference in

moral orientation.

Of the 180 studies and dissertations that were deemed eligible on the

basis of the abstracts, 113 yielded enough information to compute an effect

size or met inclusion criteria once a copy of the study itself had been

obtained and examined. This sample comprised 70 published and 43

unpublished studies. Of the 67 studies that were excluded, 7 could not be

located, 2 used same-sex samples, 9 did not assess moral orientation, 1 was

not empirical, 17 were unpublished studies that did not include enough

information to compute an effect size, 8 were published studies for which

an effect size could not be computed or estimated even after contacting the

author, 15 were studies in which the data were reported elsewhere, and in

one case, there was not enough information to compute an effect size and

the author's address could not be located. In addition, 7 dissertations were

unavailable from UMI.
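
The counts reported in this section can be checked with simple arithmetic. The short Python sketch below restates them; the dictionary keys are illustrative labels rather than terms from the article, and it assumes that the 7 dissertations unavailable from UMI are counted among the 67 exclusions, which is the reading under which the totals reconcile.

```python
# Citation and study counts as reported in the Sample of Studies section.
database_citations = 741
reference_section_citations = 16
total_citations = database_citations + reference_section_citations

# Sources judged eligible from their abstracts.
eligible = {"papers": 114, "dissertations": 66}

# Reasons for exclusion after full copies were examined.
# Assumption: the 7 dissertations unavailable from UMI belong to the 67.
excluded = {
    "could_not_be_located": 7,
    "same_sex_samples": 2,
    "did_not_assess_moral_orientation": 9,
    "not_empirical": 1,
    "unpublished_insufficient_information": 17,
    "published_no_effect_size_after_contact": 8,
    "data_reported_elsewhere": 15,
    "author_address_not_located": 1,
    "dissertation_unavailable_from_umi": 7,
}

included = sum(eligible.values()) - sum(excluded.values())
included_by_status = {"published": 70, "unpublished": 43}

print(total_citations, sum(eligible.values()), sum(excluded.values()), included)
```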

It is possible to obtain several independent effect sizes from a single

article if, for example, data from several age groups are reported (e.g., in

a cross-sectional design). These groups can be regarded as separate sam-

ples (L. V. Hedges, personal communication, 1987). The result was 113

usable sources, yielding 160 independent effect sizes for gender differences

in care orientation and 95 independent effect sizes for gender differences in

justice orientation. In the case of the care orientation analysis, this repre-

sented the testing of 5,783 males and 6,654 females. In the case of the

justice orientation analysis, this represented the testing of 3,831 males

and 4,307 females. This compares favorably with other reviews of gender

differences in moral reasoning, namely, Walker's (1984) critical review of

gender differences in moral stage in which he analyzed gender differences

from 66 studies involving 6,780 participants.

Five studies that used the ECI and measured levels of care reasoning

were analyzed separately because the level at which one reasons about a

given dilemma is conceptually distinct from the amount of care or justice

reasoning used in solving the dilemma.

2 Letters were mailed only to the authors of published studies as it was

too difficult to locate the authors of dissertations.

Coding the Studies

For each study, the following information was recorded: (a) all statistics

on gender differences in moral orientation, including means and standard

deviations or t, F, or r and (b) the number of male and female participants.

The following potential moderator variables were also coded. It is impor-

tant to note that some of the categories within each moderator variable

were collapsed in the analyses because of insufficient numbers of effect

sizes per category.

Age. The age(s) of the participants were recorded. If the article re-

ported no age but reported that participants were undergraduates or stu-

dents in an introductory college course, age was set equal to 19. If a grade

level was reported, 5 years were added to that level to yield the age (e.g.,

third graders were recorded as 8-year-olds). If the sample comprised

nonadults and the age range exceeded 10 years, the sample age was

recorded as mixed. Otherwise, the midpoint of the age ranges was

recorded.
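
The age-coding rules above can be sketched as a small Python function. This is only an illustration: the function name, its parameters, the order in which the rules are tried, and the cutoff of 20 years for "nonadults" are assumptions, since the text does not specify them.

```python
# Assumption: participants under 20 count as nonadults. The coding rules
# themselves do not state a cutoff.
ADULT_AGE = 20


def code_age(min_age=None, max_age=None, grade=None, is_undergraduate=False):
    """Return a coded age in years, the string "mixed", or None if unknown."""
    if min_age is not None or max_age is not None:
        # A single reported age is treated as a range of width zero.
        low = min_age if min_age is not None else max_age
        high = max_age if max_age is not None else min_age
        # Nonadult samples spanning more than 10 years are coded as mixed.
        if low < ADULT_AGE and (high - low) > 10:
            return "mixed"
        # Otherwise the midpoint of the age range is recorded.
        return (low + high) / 2
    if grade is not None:
        # Grade level plus 5 years, e.g. third graders become 8-year-olds.
        return float(grade + 5)
    if is_undergraduate:
        # Undergraduates with no reported age are set to 19.
        return 19.0
    return None


print(code_age(grade=3))
print(code_age(min_age=6, max_age=18))
```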

Socioeconomic status. The SES of the participants was recorded as

lower class, lower middle class, middle class, upper middle class, upper

class, or unreported or mixed, on the basis of how participants were

classified in the studies themselves. SES for university students and faculty

was estimated as middle class.

Moral orientation construct. We recorded whether researchers mea-

sured moral orientation as conceptualized by Gilligan (1982), by Kohlberg

(Colby et al., 1987), or by Eisenberg (Eisenberg, et al., 1983; Eisenberg-

Berg, 1979).3

Type of dilemma. Dilemmas were coded as standard hypothetical

dilemmas if they were hypothetical dilemmas from traditional measures of

moral reasoning (e.g., the Heinz dilemma) or if they were dilemmas that

were likely to be unfamiliar to the participants or not relevant to their

everyday lives. Hypothetical dilemmas that involved issues of direct rele-

vance to participants (e.g., dilemmas about cheating that were administered

to university students) were coded as real-life hypothetical dilemmas.

These were dilemmas that participants might expect to face at some point

in their lives. Dilemmas that the participants generated from their own lives

were coded as self-generated dilemmas. Finally, measures in which dilem-

mas were not used or participants were asked to rate how much they used

care or justice reasoning to solve moral problems in the absence of any

specific dilemma were coded as no dilemma. Hypothetical dilemmas

(standard or real-life) that were designed to elicit care reasoning were

coded as hypothetical care dilemmas, and hypothetical dilemmas (standard

or real-life) that were designed to elicit justice reasoning were coded as

hypothetical justice dilemmas. For instance, Eisenberg's prosocial moral

reasoning dilemmas (Eisenberg et al., 1983; Eisenberg-Berg, 1979) were

coded as hypothetical care dilemmas because the issue of caring is salient

and is in conflict with self-interest or responsibility to the self. Finally,

self-generated dilemmas were coded as self-generated care or self-

generated justice if they were designed to elicit care or justice reasoning,

respectively.

Coding scheme. We recorded whether moral orientation was scored

according to (a) the coding scheme developed by Brown and colleagues

(Brown et al., 1988), (b) Lyons's coding scheme (Lyons, 1982), (c) a

modified version of Lyons's coding, (d) Kohlberg's coding scheme (Colby

et al., 1987), (e) Eisenberg's coding scheme (Eisenberg et al., 1983;

Eisenberg-Berg, 1979), (f) close-ended objective responses, or (g) any

other coding scheme.

Scale. It was also noted whether these coding schemes yielded con-

tinuous care and/or justice scores or whether participants were categorized

as using a care or justice orientation on the basis of their responses to the

dilemmas.

Gender of the protagonist. The protagonists of the various dilemmas

were male, female, or the same sex as the participant, or the dilemmas

included both male and female protagonists. It was also noted if the

participant was the protagonist of the dilemma (e.g., in self-generated

dilemmas), if there was no protagonist (e.g., cases where participants were

asked to rate the extent to which they used care and justice reasoning to

solve moral dilemmas), or if the gender of the protagonist was not indicated

in the dilemma.

Publication status. It was noted whether studies were published or

unpublished.

Interrater Agreement

All studies were coded by Sara Jaffee. A random sample of 30% of the

studies was rated by Janet Hyde to obtain estimates of interrater agreement.

Interrater agreement was computed for ratings of the eight moderator

variables (age, SES, moral orientation construct, type of dilemma, coding

scheme, scale, gender of protagonist, and publication status) and ranged

from 85% to 100%. Discrepancies were resolved by discussion and

consensus.
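
As an illustration of how an agreement figure of this kind can be obtained, the Python sketch below computes simple percent agreement between two raters on one moderator variable. The article does not name the agreement index that was used, so percent agreement is an assumption here, and the example codes are hypothetical.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of studies on which two raters assigned the same code."""
    if len(codes_a) != len(codes_b) or len(codes_a) == 0:
        raise ValueError("both raters must code the same non-empty set of studies")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Hypothetical publication-status codes from two raters for four studies.
rater_1 = ["published", "unpublished", "published", "published"]
rater_2 = ["published", "unpublished", "published", "unpublished"]
print(percent_agreement(rater_1, rater_2))
```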

Computation of Effect Size

The effect size computed was d, defined as the mean for males minus the

mean for females, divided by the mean within-sex standard deviation

(Hedges & Becker, 1986). Thus, positive values of d indicate that males

used more justice or care reasoning than females, and negative values of d

indicate that females used more justice or care reasoning than males.
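
The definition of d given above can be sketched in Python as follows. The sketch assumes that the "mean within-sex standard deviation" is the usual pooled standard deviation weighted by degrees of freedom; the function names and the example values are hypothetical.

```python
import math


def pooled_sd(sd_m, n_m, sd_f, n_f):
    """Pooled within-sex standard deviation (assumed form)."""
    numerator = (n_m - 1) * sd_m ** 2 + (n_f - 1) * sd_f ** 2
    return math.sqrt(numerator / (n_m + n_f - 2))


def effect_size_d(mean_m, mean_f, sd_m, sd_f, n_m, n_f):
    """Male mean minus female mean, in pooled standard deviation units."""
    return (mean_m - mean_f) / pooled_sd(sd_m, n_m, sd_f, n_f)


# Hypothetical care scores: females score higher, so d is negative.
d = effect_size_d(mean_m=40.0, mean_f=45.0, sd_m=10.0, sd_f=10.0, n_m=50, n_f=50)
print(d)
```

With these hypothetical inputs the sign convention matches the text: a negative d means that females used more of the reasoning in question than males.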

Depending on the statistics available for a given study, formulas pro-

vided by Hedges and Becker (1986) and Morris and DeShon (1997) were

used for the computation of d and the homogeneity statistics. The exception

was studies that reported the proportion of males and females who were

classified as care reasoners, justice reasoners, or some combination of the

two (e.g., studies that used Lyons coding or followed the coding scheme of

Brown et al., 1988). In these cases, chi-square statistics were used to test

for the association of gender and moral orientation. However, chi-square

statistics are not easily translated into effect sizes. So that we could include

these studies in the meta-analysis, the moral orientation classifications

were converted into percent care scores. Care reasoners were assigned a

percent care score of 75%, justice reasoners were assigned a percent care

score of 25%, and mixed justice and care reasoners were assigned a percent

care score of 50%. This strategy allowed us to derive an effect size from

the mean percent care scores. These percent care cutoffs roughly match

those used by researchers who have modified Lyons's (1982) coding

scheme (Gilligan & Attanucci, 1988; Krebs et al., 1994). This strategy has

also been used before by Krebs and colleagues (1994) to create a contin-

uous measure of care reasoning. Because the studies were coded according

to whether they yielded a continuous score or a categorical score that

needed to be converted by means of the strategy described above, we were

able to test explicitly whether the magnitude of the effect size for gender

differences in care orientation differed as a function of our own coding

strategy.
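
The conversion of categorical classifications into percent care scores can be illustrated with a short Python sketch. The score assignments (75%, 50%, 25%) come from the text; the function names, the use of a pooled standard deviation, and the example counts are assumptions made for illustration.

```python
import math

# Percent care score assigned to each classification, as described in the text.
PERCENT_CARE = {"care": 75.0, "mixed": 50.0, "justice": 25.0}


def score_stats(counts):
    """Mean, sample SD, and n of percent care scores for one gender."""
    n = sum(counts.values())
    mean = sum(PERCENT_CARE[k] * c for k, c in counts.items()) / n
    variance = sum(c * (PERCENT_CARE[k] - mean) ** 2 for k, c in counts.items()) / (n - 1)
    return mean, math.sqrt(variance), n


def d_from_classifications(male_counts, female_counts):
    """Effect size d (male minus female) from classification counts."""
    mean_m, sd_m, n_m = score_stats(male_counts)
    mean_f, sd_f, n_f = score_stats(female_counts)
    pooled = math.sqrt(
        ((n_m - 1) * sd_m ** 2 + (n_f - 1) * sd_f ** 2) / (n_m + n_f - 2)
    )
    return (mean_m - mean_f) / pooled


# Hypothetical counts of participants in each category.
males = {"care": 10, "mixed": 10, "justice": 20}
females = {"care": 20, "mixed": 10, "justice": 10}
print(d_from_classifications(males, females))
```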

In 39 of the independent samples that measured care reasoning and in 15

of the independent samples that measured justice reasoning, the author did

not include enough information to compute an effect size and did not or

could not respond to written solicitations for additional data. In all these

cases, the authors reported that the gender difference in moral orientation

was nonsignificant. In these cases, the effect size d was estimated as zero.

Consequently, analyses were conducted twice, first with the entire sample

of effect sizes and then with the subsample of effect sizes estimated as zero

excluded.

3 Prosocial moral reasoning is scored at a number of levels. Because

young children generally do not use higher levels of moral reasoning,

gender differences in needs-oriented reasoning were computed for the

prosocial moral reasoning studies of children (11 years or younger). Gen-

der differences in Level 4 reasoning (sympathetic, role-taking, positive or

negative affect regarding the consequences to others) were computed for

the prosocial moral reasoning studies of adolescents and young adults.

A random sample of 30% of the effect sizes was computed indepen-

dently by Sara Jaffee and Janet Hyde. There were discrepancies in 5% of

the d values; these were resolved. All values of d were corrected for bias

in estimation of the population effect size, using the formula provided by

Hedges (1981). The complete listing of all studies, with effect sizes and

moderator variable codes, is provided in Table 1.
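
A minimal Python sketch of the small-sample bias correction follows. It uses the widely cited approximation associated with Hedges (1981), in which d is multiplied by 1 - 3/(4m - 1) with m = n_m + n_f - 2. The text cites only the source of the formula, so the use of this approximation rather than the exact correction is an assumption, and the function name is hypothetical.

```python
def bias_corrected_d(d, n_m, n_f):
    """Apply the approximate small-sample correction to an effect size d."""
    df = n_m + n_f - 2  # degrees of freedom of the pooled standard deviation
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return d * correction


# Hypothetical small sample: the correction shrinks d slightly toward zero.
print(bias_corrected_d(-0.5, n_m=10, n_f=10))
```

The correction matters mostly for small samples; as the sample size grows, the multiplier approaches 1.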

Results

The results are divided into three sections. The first reports the

results for a meta-analysis of gender differences in the care orien-

tation. The second reports the results for a meta-analysis of gender

differences in the justice orientation. The third reports the results

for a meta-analysis of gender differences in level of care reasoning

as measured by the ECI (Skoe & Marcia, 1991). Whereas the first

two meta-analyses assess how much males and females differ in

the extent to which they used or endorsed the care and justice

orientations respectively, the third analysis examines whether gen-

der differences exist in the level of care reasoning attained by male

and female participants.

Magnitude of Gender Differences in Care Orientation

For 73% of the 160 independent samples, the gender difference

in care reasoning as reported in the study was not statistically

significant. A power analysis revealed that a sample size of 31

would be required to detect a moderately sized effect of .50 for

gender differences in care reasoning with α = .05. Seventy-three

percent of the independent samples included sample sizes of that

magnitude. Thus, it is unlikely that the nonsignificant gender

differences in care reasoning were the result of low power to detect

differences.

The overall effect size d for gender differences in care orienta-

tion was -.28, indicating a small gender difference favoring fe-

males. A homogeneity analysis using procedures specified by

Hedges and Becker (1986) indicated that the effect sizes were

nonhomogeneous, HT = 438.43, p < .001. Therefore, analyses

were conducted to determine the extent to which the magnitude of

the effect size was moderated by other variables. The results of the

moderator analyses are presented in Table 2.
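The overall homogeneity test can be sketched as below, using the standard large-sample variance of d and inverse-variance weights. The input values and the function names are hypothetical. Under the null hypothesis of homogeneity, the statistic is referred to a chi-square distribution with k - 1 degrees of freedom, where k is the number of independent samples.

```python
def d_variance(d, n1, n2):
    """Large-sample variance of a standardized mean difference."""
    return (n1 + n2) / (n1 * n2) + d * d / (2.0 * (n1 + n2))


def total_homogeneity(samples):
    """Return (weighted mean d, H_T, degrees of freedom).

    samples is a list of (d, n1, n2) tuples, one per independent sample.
    """
    weights = [1.0 / d_variance(d, n1, n2) for d, n1, n2 in samples]
    ds = [d for d, _, _ in samples]
    mean_d = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    # H_T is the weighted sum of squared deviations from the weighted mean.
    h_t = sum(w * (d - mean_d) ** 2 for w, d in zip(weights, ds))
    return mean_d, h_t, len(samples) - 1


# Hypothetical effect sizes from three independent samples.
mean_d, h_t, df = total_homogeneity([(-0.60, 40, 40), (-0.10, 60, 55), (-0.25, 30, 35)])
print(round(mean_d, 3), round(h_t, 3), df)
```

A value of H_T that is large relative to its degrees of freedom indicates that the effect sizes vary more than sampling error alone would predict, which is what motivates the moderator analyses.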

Age. For purposes of the moderator analysis, participants were

categorized into five age groups. Participants who were 11 years

old or less were classified as children, and participants who were

older than 11 years and less than or equal to 19 years (but were not

university students) were classified as adolescents. University stu-

dents were coded as such. Participants who were aged 20 to 49 (but

were not university students) were classified as younger adults,

and participants who were 50 years or older were classified as

older adults. In cases where the sample included nonadults and the

reported age range exceeded 10 years, participants were classified

as mixed age.
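The age coding rules above can be written as a small decision function. This is a sketch under several assumptions that the text does not settle: samples are described by their mean, minimum, and maximum age; "nonadults" means participants aged 19 or younger; university status takes precedence over the other rules; and a mean age between 19 and 20 is treated as younger adult. The function name is hypothetical.

```python
def classify_age_group(mean_age, min_age, max_age, university=False):
    """Assign a sample to one of the age categories used as a moderator."""
    if university:
        return "university students"
    # Assumption: a sample "includes nonadults" if its youngest member is 19 or under.
    if min_age <= 19 and (max_age - min_age) > 10:
        return "mixed age"
    if mean_age <= 11:
        return "children"
    if mean_age <= 19:
        return "adolescents"
    if mean_age < 50:
        return "younger adults"
    return "older adults"


print(classify_age_group(9, 8, 10))           # children
print(classify_age_group(20, 10, 30))         # mixed age
print(classify_age_group(20, 18, 22, True))   # university students
```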

For all age groups, the effect size indicated that females used

more care reasoning than males. However, the magnitude of the

effect size differed among age groups, HB = 70.91, p < .001. For

children and university students, the overall effect sizes indicated

a small gender difference (d = -.08 and d = -.18, respectively), but

for adolescents, the effect was moderate in size (d = -.53). For

younger adults, the overall effect size d was -.33. The small

number of samples within the older adult and mixed-age groups

suggests that these effect sizes should be interpreted with caution.
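The between-groups homogeneity statistic used in the moderator analyses can be sketched as the weighted sum of squared deviations of each group's mean effect size from the grand mean. The group labels, effect sizes, and weights below are hypothetical, and the function name is an assumption. The statistic is referred to a chi-square distribution with (number of groups - 1) degrees of freedom.

```python
def between_groups_homogeneity(groups):
    """Return (group means, H_B, degrees of freedom).

    groups maps a category label to a list of (d, weight) pairs, where
    each weight is the inverse of the sampling variance of d.
    """
    group_weight = {g: sum(w for _, w in eff) for g, eff in groups.items()}
    group_mean = {
        g: sum(d * w for d, w in eff) / group_weight[g]
        for g, eff in groups.items()
    }
    grand_mean = sum(group_weight[g] * group_mean[g] for g in groups) / sum(
        group_weight.values()
    )
    h_b = sum(group_weight[g] * (group_mean[g] - grand_mean) ** 2 for g in groups)
    return group_mean, h_b, len(groups) - 1


# Hypothetical data for two age categories.
means, h_b, df = between_groups_homogeneity(
    {
        "children": [(-0.10, 10.0), (-0.10, 10.0)],
        "adolescents": [(-0.50, 20.0)],
    }
)
print(means, round(h_b, 3), df)
```

A significant H_B indicates that the mean effect size differs across the categories of the moderator.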

Socioeconomic status. As there were relatively few effect

sizes at certain levels of the SES variable, SES was collapsed into

four categories: lower class (comprising lower class and lower

middle class), middle class (comprising middle class and upper

middle class), upper class, and mixed/unreported. The homogene-

ity analysis revealed that the magnitude of the effect sizes differed

significantly as a function of SES category, HB = 9.05, p < .05.

Although females consistently scored higher than males on care

reasoning across the SES groups, the magnitude of the effect size

increased from -.08 to -.42 as SES increased from the lower-class

group to the upper-class group.

Moral orientation construct. The homogeneity analysis re-

vealed that the magnitude of the effect sizes differed significantly

as a function of the moral orientation construct, HB = 17.29, p <

.001. The magnitude of the effect size d was small to moderate in

size for studies that defined the care orientation according to

Gilligan's description or defined it as the orientation or aspect

level in Kohlberg's scheme (-.32 and -.25, respectively). However,

for studies that conceptualized care reasoning as prosocial moral

reasoning, the magnitude of the effect size was close to zero.

Type of dilemma. As there were relatively few of certain types

of dilemmas, these types were collapsed into six categories: (a)

standard hypothetical dilemmas, (b) real-life hypothetical dilem-

mas, (c) self-generated dilemmas, (d) moral reasoning measures

that did not contain a dilemma, (e) dilemmas meant to elicit care

reasoning (both hypothetical and self-generated), and (f) dilemmas

meant to elicit justice reasoning (both hypothetical and self-

generated). The homogeneity analysis revealed that the magnitude

of the effect sizes differed significantly as a function of type of

dilemma, HB = 63.07, p < .001.

For both standard and real-life hypothetical dilemmas, the mag-

nitude of the effect sizes was small and indicated that females used

more care reasoning than males (-.19 and -.20, respectively).

Similarly, the effect sizes for dilemmas designed to elicit care

reasoning and dilemmas designed to elicit justice reasoning were

-.17 and -.18, respectively. The magnitude of the effect size for

self-generated dilemmas was moderate in size (d = -.37), and

measures that did not include a dilemma yielded the largest dif-

ference favoring females (d = -.57).

Coding scheme. The homogeneity analysis revealed that the

magnitude of the effect sizes differed significantly as a function of

coding scheme, HB = 61.49, p < .01. The effect size was close to

zero for studies that used Eisenberg's (Eisenberg et al., 1983;

Eisenberg-Berg, 1979) coding scheme, and the largest effect size

(d = -.61) was noted for studies that used a modified version of the

Lyons coding scheme (Lyons, 1982). Other measures produced

effect sizes that were small to moderate in magnitude and favored

females.

Scale. The homogeneity analysis revealed that the magnitude

of the effect sizes differed significantly as a function of scale,

HB = 7.46, p < .01. For the studies that used continuously scaled

measures, the magnitude of the effect size d was -.26. For studies

that used categorical measures, the magnitude of the effect d was

-.38, indicating a somewhat larger difference favoring females.

Gender of the protagonist. Because there were only two stud-

ies in which the protagonists were all female and one study in

which the gender of the protagonist was not specified, these were

not included in the moderator analysis. Homogeneity analyses

revealed that the magnitude of the effect sizes differed signifi-

cantly as a function of the protagonist's gender, HB = 84.88, p <

.001. When the protagonist was male or when the participant was

the protagonist, the effect size was small to moderate in magnitude

and favored females (d = -.36 and d = -.27, respectively). When

there were both male and female protagonists, however, the gender

difference in care reasoning was virtually nonexistent (d = -.03).

Similarly, when the protagonist was the same gender as the par-

ticipant, the magnitude of the effect size indicated a small differ-

ence (d = -.11). However, in cases where there was no protagonist,

the magnitude of the effect size d was -.57.

Publication status. The magnitude of the effect size for gender

differences in care orientation did not differ significantly as a

function of whether or not the study was published, HB = .13, ns.

Regression Analysis for Care Orientation

Because the overall homogeneity of variance analysis indicated

that the effect sizes were nonhomogeneous, a multiple regression

analysis was conducted to determine how much of the variance in

effect sizes was accounted for by the moderator variables. More-

over, because many of the moderator variables are highly corre-

lated, it is important to control for the effect of other moderators

when determining the extent to which any given one accounts for

variance in the magnitude of the effect sizes. The corrected effect

size for gender differences in care reasoning was the criterion

variable (Hedges, 1981). The eight moderator variables (age, SES,

moral orientation construct, type of dilemma, coding scheme,

scale, gender of protagonist, and publication status) were entered

in stepwise fashion. Because of their categorical nature, all vari-

ables except age and SES were contrast coded, and the contrasts

are specified in Table 3.

Overall, the moderator variables accounted for a significant 16%

of the variance in the effect sizes for gender differences in care

reasoning, F = 7.25, p < .001. The effect of neither age nor SES

was significant. Two contrasts were specified to determine

whether moral orientation construct was a significant predictor of

the magnitude of the effect sizes for gender differences in care

reasoning. The first contrast compared studies that used Eisen-

berg's construct of prosocial moral reasoning with those that

characterized care reasoning according to Gilligan's or Kohlberg's

conceptualization of the care construct. Because prosocial dilem-

mas are designed to elicit care reasoning, it was expected that the

magnitude of the effect size for gender differences in care reason-

ing would be smaller for this group compared with the other two.

However, the regression analysis revealed that this contrast was

not a significant predictor of the magnitude of the effect size for

gender differences in care reasoning. A second contrast was con-

structed to compare studies that defined care reasoning according

to Gilligan with those that defined care reasoning at the element or

aspect level in Kohlberg's scheme. It was predicted that the mag-

nitude of the effect size for gender differences in care reasoning

would be greater for the former group because, according to

Gilligan (1982), Kohlberg's scoring scheme does not adequately

capture the care orientation and encourages use of the justice

orientation instead. The regression analysis revealed that this contrast

was not a significant predictor of the magnitude of the effect size

for gender differences in care reasoning.

Two contrasts were specified to determine whether type of

dilemma was a significant predictor of the magnitude of the effect

sizes for gender differences in care reasoning. The first contrast

compared self-generated dilemmas with those that effectively stan-

dardized the dilemma content for participants (i.e., standard hypo-

thetical dilemmas, real-life hypothetical dilemmas, and dilemmas

designed specifically to elicit care or justice reasoning). This

contrast tested the hypothesis that the magnitude of the effect size

for gender differences in care reasoning would be greater when

participants were allowed to generate their own dilemmas. This

contrast was a significant predictor of the magnitude of the effect

sizes for gender differences in care reasoning, β = -.21, p < .05.

A second contrast compared studies that did not include a dilemma

with those that did to explore whether the presence of a dilemma

accounted for variation in the magnitude of the effect size. This

contrast was also significant, β = -.29, p < .001.

Two contrasts were specified to determine whether coding

scheme was a significant predictor of the magnitude of the effect

sizes for gender differences in care reasoning. The first contrast

compared measures that are scored objectively with those that are

not and hypothesized that the magnitude of the effect size would

be smaller for objectively scored studies. A second contrast com-

pared coding schemes in which responses are matched to criterion

responses (see, e.g., Colby et al., 1987; Eisenberg et al., 1983;

Lyons, 1983) with those in which interpretive techniques are used

(see, e.g., Brown et al., 1988; modified Lyons, 1983). Neither

contrast was significant.

The contrast comparing studies that yielded a continuous moral

orientation score with studies that yielded a categorical moral

orientation score was significant, β = .18, p < .05.

Two contrasts were specified to test the effect of gender of the

protagonist on the magnitude of the effect sizes for gender differ-

ences in care reasoning. The first contrast compared studies in

which the participant served as the protagonist in the dilemma with

those in which there was a fictional protagonist. This contrast

tested whether a difference in the perspective from which the

participant responded to the dilemma influenced the magnitude of

the effect size. A second contrast was specified to compare studies

that used dilemmas with male protagonists with studies that used

dilemmas in which the protagonists were not exclusively male or

were the same gender as the respondent. Neither contrast was a

significant predictor of the magnitude of the effect size for gender

differences in care reasoning.

Finally, the regression analysis revealed that publication status

did not have a significant effect on the magnitude of the effect size

for gender differences in care reasoning.

Because the moderator variables were highly correlated, squared

semipartial regression coefficients were computed to determine

how much of the variance in the magnitude of the effect sizes was

uniquely accounted for by each of the moderator variables when

the other moderators were controlled. Table 3 indicates that the

contrast comparing self-generated dilemmas with standard hypo-

thetical dilemmas, real-life hypothetical dilemmas, or dilemmas

designed to elicit care or justice reasoning accounted for 4% of the

variance in the effect sizes for care reasoning. The contrast com-

paring studies in which there was no dilemma with those in which there was a dilemma uniquely accounted for 8% of the variance in

the effect sizes for care reasoning. Finally, the contrast comparing

studies that yielded categorical versus continuous outcomes

uniquely accounted for 3% of the variance in the effect sizes for care reasoning.
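A squared semipartial coefficient can be illustrated as the drop in R² when one predictor is removed from the full regression model. The sketch below uses unweighted ordinary least squares on hypothetical contrast codes; whether the reported regression weighted the effect sizes is not stated in this section, and the function names are assumptions.

```python
def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Build the augmented normal equations [X'X | X'y].
    M = [
        [sum(rows[i][a] * rows[i][b] for i in range(n)) for b in range(p)]
        + [sum(rows[i][a] * y[i] for i in range(n))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * v for x, v in zip(M[r], M[col])]
    beta = [M[a][p] / M[a][a] for a in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def squared_semipartial(X, y, j):
    """Variance in y uniquely explained by predictor column j."""
    reduced = [[v for k, v in enumerate(row) if k != j] for row in X]
    return r_squared(X, y) - r_squared(reduced, y)


# Hypothetical data: two contrast-coded moderators and four effect sizes.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [-3.0, -1.0, 1.0, 3.0]
print(round(squared_semipartial(X, y, 0), 3), round(squared_semipartial(X, y, 1), 3))
```

Because the two hypothetical predictors are uncorrelated, their unique contributions sum to the full-model R². With correlated moderators, as in the analysis reported here, the unique contributions sum to less than the total variance explained.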

Table 1

Effect Size Estimates and Moderator Variable Codes

Study

Abaris (1990)

Abide (1994)

Akman (1991)

Albrecht (1989)

Arvizu (1995)

Atunzu (1986)

Barnett, Quackenbush, & Sinisi (1995)

Barnett, Quackenbush, & Sinisi (1995)

Beal, Garrod, Ruben, & Stewart (1997)

Beal, Garrod, Ruben, & Stewart (1997)

Bollerud (1987)

Brown, Tappan, Gilligan, Miller, & Argyris (1989)

Carlo, Eisenberg, & Knight (1992)

Carlo, Koller, & Eisenberg (1998)

Carlo, Koller, Eisenberg, DaSilva, & Frohlich (1996)

Carney (1992)

Cassidy, Chu, & Dahlsgaard (1997)

Castor-Scheufler (1994)

Clopton & Sorell (1993)

Cole (1987)

Conley, Jadack, & Hyde (1997)

Craft (1992)

Crown & Heatherington (1989)

Curror (1994)"

Dekovic & Gerris (1994)

Derry (1989)

de Vries & Walker (1986)

Dezolt (1992)

Dezolt (1992)

Dezolt (1992)

Diamonti (1993)

Dickey, Kroll, & Jenkins (1987)

Diederichs (1993)

Dohrenwend (1995)

Donenberg & Hoffman (1988)

Dossett (1989)

Dossett (1989)

Dossett (1989)