
Vision Research 63 (2012) 69–80


Vision Research

Journal homepage: www.elsevier.com/locate/visres

Male and female faces are only perceived categorically when linked to familiar identities – And when in doubt, he is a male

Regine Armann ⇑, Isabelle Bülthoff

Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Article info

Article history: Received 21 July 2011; received in revised form 21 February 2012; available online 14 May 2012

Keywords: Face perception; Categorical perception; Sex (gender); Identity

0042-6989/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.visres.2012.05.005

⇑ Corresponding author. Address: Max Planck Institute for Biological Cybernetics, Spemannstrasse 38, 72076 Tübingen, Germany. Fax: +49 7071 601 616.

E-mail address: [email protected] (R. Armann).

Abstract

Categorical perception (CP) is a fundamental cognitive process that enables us to sort similar objects in the world into meaningful categories with clear boundaries between them. CP has been found for high-level stimuli like human faces, more precisely, for the perception of face identity, expression and ethnicity. For sex, however, which represents another important and biologically relevant dimension of human faces, results have been equivocal so far. Here, we reinvestigate CP for sex using newly created face stimuli to control two factors that in our opinion might have influenced the results in earlier studies. Our new stimuli are (a) derived from single face identities, so that changes of sex are not confounded with changes of identity information, and (b) “normalized” in their degree of maleness and femaleness, to counteract natural variations of perceived masculinity and femininity of faces that might obstruct evidence of categorical perception. Despite careful normalization, we did not find evidence of CP for sex using classical test procedures, unless participants were specifically familiarized with the face identities before testing. These results support the single-route hypothesis, stating that sex and identity information in faces are not processed in parallel, in contrast to what was suggested in the classical Bruce and Young model of face perception.

In addition, our participants show a consistent bias, both before and after perceptual normalization of the male–female range of the test morph continua, to judge faces as male rather than female.

1. Introduction

When we look at the world around us, we do not see gradual transitions between elements, be they different wavelengths of light, or different face expressions. Instead, the visual system carves our environment into separate, meaningful categories, like red or yellow colors and sad or smiling faces, via the cognitive process called categorical perception (CP). This process is fundamental to complex behavior, since it spares us from having to learn anew each time we encounter unknown objects or individuals and thus helps to reduce the overwhelming number of entities in the world to more manageable proportions (e.g., Harnad, 1987, chap. 1; Rosch et al., 1976).

For the specific case of face perception, CP has been found using continua of images (morphs) created by morphing between realistic human faces of different (familiar) identities (Beale & Keil, 1995), expressions (Calder et al., 1996), and races (Levin & Angelone, 2002). However, on the question of whether the facial dimension “sex” is also perceived naturally as one of two different categories, i.e. male and female faces, conflicting psychophysical results have been reported so far (Bülthoff & Newell, 2004; Campanella, Chrysochoos, & Bruyer, 2001).

Campanella and colleagues showed CP for sex (Campanella, Chrysochoos, & Bruyer, 2001) using an image-morphing procedure to generate continua of face stimuli in which sex information was varied linearly between male and female faces. However, their face stimuli were morphs between different (opposite-sex) identities. Furthermore, only a few face pairs were used, and the same stimuli appeared many times per task, so that participants became familiar with the faces in the course of the experiment. The CP effect could thus result from categorical perception of the familiar test face identities (as in, e.g., Beale & Keil, 1995) rather than from CP for male and female faces.

Bülthoff and Newell likewise investigated whether male and female faces are discrete categories at the perceptual level, and whether familiarization plays a role in the categorical perception of sex (Bülthoff & Newell, 2004). They used a morphing algorithm to create artificial sex continua not only between male and female faces, but also based on single face identities, created by changing only the sex of a face while keeping its identity constant. When using these sex continua and while increasing the number of original face identities (from 6 to 12) to reduce a potential familiarization effect, the authors could not find CP for sex. The effect only appeared when participants were either familiarized with the endpoint (i.e., most male and most female) faces of the morph continua or trained to classify all faces of the continua as male or female using a feedback procedure.

So the question of whether or not there is CP for sex as a dimension of human faces remains open. As suggested by Leopold and colleagues (Leopold, Bondar, & Giese, 2006), both the time needed to learn faces and the capacity needed to store them in the brain can be saved by applying common transformations (changes in e.g. scale, viewing angle, expression) not to each face identity, but instead to the “template”, or reference, to which incoming face stimuli are compared. In the same vein, one can assume that the brain compares newly encountered faces to a male and a female face reference, if “male” and “female” are discrete categories at the perceptual level. Classifying the faces of unknown individuals by their sex seems to be a prerequisite for social behavior and communication. Moreover, since it has been shown that different facial expressions and races are perceived as discrete categories (Calder et al., 1996; Levin & Angelone, 2002), it seems surprising that there should be no CP for sex.

Is it possible, however, that these other CP effects are also a result of confounding race/expression manipulations with identity changes and of familiarizing participants with the test face identities, as Bülthoff and Newell (2004) suspected when revisiting CP for sex while controlling for these potential confounds? Calder and colleagues (Calder et al., 1996) used continua between different expressions performed by the same person, thus manipulating only expression-relevant information in the faces. The authors did not use enough face identities to rule out the possibility that being familiar with the test identities might lead to or enhance a categorical effect. They took care, however, to rule out that the effect depends on knowing the endpoints of the test continua, by also testing along continua including three different expressions performed by the same person. Furthermore, the authors discuss some differences between their discrimination data and an earlier study on CP for expressions by Etcoff and Magee (1992), where line drawings instead of photo-realistic faces were used. Calder and colleagues argue that these line drawings contain just sufficient information to identify expressions, but lack most of the additional more idiosyncratic cues that the photo-realistic faces provide. The categorical effect was nevertheless found in the original study, making it unlikely that being familiar with the test face identities was what triggered it in the study by Calder and colleagues (Calder et al., 1996). As to CP for faces of different races, Levin and Angelone (2002) morphed between individual faces, manipulating race information together with identity information, which makes it difficult to attribute a CP effect to a change in one or the other. However, the authors prevented participants from memorizing individual (unfamiliar) face identities in the course of the experiment, by mixing continua within testing blocks. Note that they tested discrimination and classification of individual faces, not race categories directly. However, having precluded participants from showing CP for familiar face identities, they nevertheless found categorical perception, and primarily on cross-race continua, suggesting that it results from previously defined race categories.

However, unlike these categorical effects for expressions and race, CP for sex, as reported by Campanella, Chrysochoos, and Bruyer (2001), completely disappeared when identity and sex information were manipulated independently and when familiarization with the test face identities was not provided (Bülthoff & Newell, 2004). The Bülthoff and Newell study rather suggests that processing of the sex of a face is directly linked to processing of the face’s identity (as proposed before by Ganel and Goshen-Gottstein (2002) and Rossion (2002)). Yet, it seems counterintuitive not to have discrete perceptual categories for male and female faces, given the biological and social relevance of this face characteristic.

Therefore, here, we revisit CP for sex using new face stimuli to deal with another potential confound that, in our opinion, might have influenced the results of earlier studies, and one that has never been raised or controlled for in earlier studies on categorical perception of faces. The face stimuli in former studies were generated from 2D images (Campanella, Chrysochoos, & Bruyer, 2001) or 3D head scans of original face identities (Bülthoff & Newell, 2004). Face continua were either generated by morphing one face identity with another identity of the other sex (Bülthoff & Newell, 2004; Campanella, Chrysochoos, & Bruyer, 2001), or by manipulating the sex of a male or female face, while keeping its identity constant (Bülthoff & Newell, 2004). This was done using 3D laser scans of real heads and the “Morphable Model” of Blanz and Vetter (Blanz, 2000; Blanz & Vetter, 1999). Since each face in this database is represented as a high-dimensional vector in correspondence to a reference (the average) head, we can first calculate an average male and an average female face of the whole face population, then calculate the difference between these two, the so-called “sex vector”, and apply this vector to each individual face. With both procedures, depending on how strongly male or female the original faces look, the continua derived from them vary in the range of “maleness” and “femaleness” they cover. Hence morph levels, as they are calculated relative to the original face of each continuum, are not comparable across continua. Even if there is a category boundary between male and female at the perceptual level, its position between the extremes would vary for each individual face morph continuum. Averaging performance in CP tasks over continua based on faces with different levels of perceived masculinity and femininity might thus cancel out any evidence for CP.

To avoid the problem of having potentially different locations of the sex boundary for each continuum, we equated the level of maleness and femaleness of all face identities by modifying the original faces before creating test continua. By using “normalized” endpoint faces, all continua should cover a similar range of maleness and femaleness and the category boundary between male and female should then be located at the same place along all face continua, with similar steps in between. We performed extensive rating experiments (as specified in the methods section) to carefully create and choose these “controlled” male and female endpoint faces. By doing this, variations of femininity and masculinity of the endpoints of each continuum and, as a consequence, variation of the location of the category boundary were kept to a minimum. An alternative to equating femininity and masculinity of the endpoint faces before creating continua would be to adjust the continua after the experiment, according to the category boundary that participants’ performance reveals. By pre-equating, however, we make sure that (1) the morphing steps along the continua are of equivalent size, and that (2) for each morph level, the same number of data points is collected and entered into the analysis.

Once the “blurriness” of the location of the category boundary was reduced to a minimum, our goal was to test if CP for sex does occur naturally, without reference to identity-related facial information. To this end, we followed the classical procedure to define categorical perception, as described for example in Beale and Keil (1995), Etcoff and Magee (1992), and Bülthoff and Newell (2004). In brief, a classification task was used to locate the potential category boundary between male and female faces. A discrimination task using pairs of stimuli from different positions along the stimulus continua was used to test if faces from one side of the boundary were indeed perceived as more similar to each other than to faces on the other side of the boundary, as expected for CP.
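The logic of the two tasks can be illustrated with a small sketch. This is only an illustration under stated assumptions: the text does not specify how the boundary is estimated, so the linear interpolation to the 50% point, the function names (`boundary_by_interpolation`, `straddles`) and the response proportions are all hypothetical and are not data or methods from the study.

```python
def boundary_by_interpolation(levels, prop_female):
    """Morph level at which the proportion of "female" responses reaches 0.5.

    Assumption: linear interpolation between the two neighbouring morph
    levels that bracket 0.5. Returns None if the curve never rises
    through 0.5.
    """
    points = list(zip(levels, prop_female))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None


def straddles(pair, boundary):
    """True if the two faces of a pair lie on opposite sides of the boundary."""
    low, high = pair
    return low < boundary < high


# Made-up classification data for one extended continuum (0% to 140% female).
levels = list(range(0, 141, 10))
prop_female = [0.0, 0.0, 0.05, 0.1, 0.1, 0.2, 0.3, 0.4,
               0.6, 0.8, 0.9, 0.95, 1.0, 1.0, 1.0]

boundary = boundary_by_interpolation(levels, prop_female)

# Discrimination pairs 40% apart; CP predicts better discrimination
# for the pairs that straddle the boundary than for the others.
pairs = [(a, a + 40) for a in range(0, 101, 10)]
cross_boundary_pairs = [p for p in pairs if straddles(p, boundary)]
```

With these invented proportions the estimated boundary lies at roughly 75% femaleness, and four of the eleven pairs straddle it.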

Classification and discrimination tasks were performed in four sub-experiments, only differing in the familiarization procedure that participants went through before the actual testing phase. In a “naïve” experiment, participants were tested on CP for sex without previous exposure to any faces. By this, we tested whether evidence for naturally occurring CP for sex was obscured by the methods used in earlier studies as explained above. Three “non-naïve” CP experiments involved three different types of familiarization prior to testing. The aim of these manipulations of classical CP experiments, explained in the following, was to shed more light on the question of a potential link between the processing of a face’s sex and its identity.

According to the classical functional architecture of face processing proposed by Bruce and Young (1986), sex processing in faces is a parallel function to individual face recognition, and as a consequence, sex categorization of faces is not influenced by face familiarity. If this is true, the presence or absence of a categorical effect for the sex of faces should not depend on familiarization with the respective face identities. Levin and Beale showed in their study about CP for newly learned face identities (Levin & Beale, 2000) that short familiarization with previously unknown faces is sufficient to result in CP for continua of those identities while no CP is found for strictly unfamiliar faces. By the same logic, CP for sex should occur after familiarization with only sex-specific facial information, if the prototypical appearance of male and female faces is considered to be a priori “unknown” and the two categories have to be learned. If sex categories are, however, “linked” to individual face identities, CP for sex should only occur for familiar face identities. In the three non-naïve experiments, we thus tested what specific sex and identity information from faces is necessary for CP for sex to occur.

In the first non-naïve CP experiment, participants were familiarized with the average male and female faces. By comparing the performance of naïve participants with the results of this experiment, we wanted to find out whether information about the typical appearance of male and female faces transfers in some way to unfamiliar faces. This information is distilled by morphing a high number of faces of each sex together, which removes idiosyncratic information specific to one or another face identity. As mentioned before, when tested on morph continua between individual familiar face identities, participants show categorical perception (e.g., Levin & Beale, 2000). Analogous to these findings, familiarization with the male and female endpoints of a sex morph continuum should lead to CP, if sex and identity are two independent dimensions of human faces.

The male and female average faces are created by morphing together all individual face identities in our database (i.e., around 200). Since male and female faces naturally vary in how strongly male and female they look, the averages consist of information derived from faces of different levels of perceived masculinity and femininity. They might thus not represent the same “symmetric endpoints” for sex continua as the ones we obtained by the rating experiments (described in the methods section) – although this would be surprising, considering that each average was created by morphing nearly 100 faces. If we nevertheless consider the possibility that, for example, the average male face looks less strongly male than the endpoints of our continua, we might expect a potential category boundary between the two averages to be shifted away from the symmetrical boundary between normalized male and female identities (note however that this shift would be a result of the familiarization procedure with the averages and should thus be the same on all symmetrical test morph continua).

In the second non-naïve CP experiment, another familiarization procedure was introduced to overcome this discrepancy in the male–female range between familiarization and test stimuli, and to further examine the interplay of sex- and identity-related information in sex perception. Here, familiarization with sex information was done using male and female face identities which had the same perceived degree of maleness and femaleness as the test faces, but were not used in the following CP tests. These faces provided participants with information about the appearance of male and female faces that also specified an individual identity. With this type of familiarization we could test whether sex-related information in faces can induce CP for sex if this information is linked to idiosyncratic facial information but has to transfer to unknown face identities.

Finally, in the third non-naïve CP experiment, we tested whether prior knowledge about someone’s identity (and sex) has an effect on the perception of the sex of face images of that person. Here, participants learned the actual endpoint faces of the sex continua that were used to test CP subsequently. This experiment served as a control, since it has been shown before that familiarization with the endpoint identities of a sex continuum leads to CP for sex.

To summarize, the aim of this study is to clarify two old but still open questions in face perception, i.e., (i) whether male and female faces are perceived categorically, and (ii) how sex and identity information interact in face recognition. Classical classification and discrimination tasks are used in four sub-experiments, where participants are either naïve, or familiarized with male and female average faces, additional male and female face identities, or the test identities themselves before the actual CP testing phase.

2. Methods

Three rating experiments were performed, successively, to select appropriate face stimuli for the main experiment. Design and procedure of these ratings are presented first in the order they were performed, as each was based on the results of the previous one. Methods for the main CP experiments are described subsequently.

2.1. Creating normalized face stimuli

We used 3-dimensional laser scans of real heads from the database of the Max Planck Institute for Biological Cybernetics (http://faces.kyb.tuebingen.mpg.de) and the Morphable Model of Blanz and Vetter to create all face stimuli (for more details see Blanz, 2000; Blanz & Vetter, 1999; Vetter & Poggio, 1997). The general method to create sex continua based on one single identity is to calculate first a “sex vector” from the whole face database, i.e., the difference between male average and female average is calculated. Using this sex vector, an opposite-sex version of each original face can be generated. Between these endpoints (original face and opposite-sex version), morphs are created at regular intervals. Computationally, original female faces are 100% female and original male faces are 0% female. We will use this notation in percentage of femaleness (i.e., percentage of contribution of the female face identity to the morph) to describe all morphs used in this study, with 100% denoting a face derived from a scan of a female head (or a male identity feminized by 100% of the sex vector, see below). Fig. 1 (upper row) shows an example of such a one-identity morph continuum from female to male.
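The sex-vector construction described above can be sketched numerically. The following is a minimal illustration only, assuming that each face is a plain list of numbers in correspondence with a reference head. The function names (`mean_face`, `sex_vector`, `morph`) and the toy three-dimensional “faces” are hypothetical and are not part of the Morphable Model software.

```python
def mean_face(faces):
    """Component-wise average of a list of equally long face vectors."""
    n = len(faces)
    return [sum(values) / n for values in zip(*faces)]


def sex_vector(male_faces, female_faces):
    """Difference between the female average and the male average."""
    male_avg = mean_face(male_faces)
    female_avg = mean_face(female_faces)
    return [f - m for f, m in zip(female_avg, male_avg)]


def morph(original_female, vector, percent_female):
    """Face at a given percentage of femaleness on a one-identity continuum.

    100 returns the original female face, 0 its male version, and values
    above 100 (e.g. 140) move the face further along the sex vector.
    """
    shift = (percent_female - 100) / 100.0
    return [x + shift * v for x, v in zip(original_female, vector)]


# Toy data: three-dimensional "faces" (real face vectors are high-dimensional).
males = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
females = [[2.0, 4.0, 3.0], [4.0, 4.0, 1.0]]

vec = sex_vector(males, females)

# One-identity continuum from 0% to 100% female in steps of 10%.
continuum = [morph(females[0], vec, p) for p in range(0, 101, 10)]
```

Because every morph is the original face plus a scaled copy of the same vector, only sex-related information changes along the continuum while the identity-specific offset from the average stays constant.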

To choose endpoint faces equated in their perceived level of male- and femaleness for generating the sex morph continua, we conducted three consecutive rating experiments described in the following.

2.1.1. Rating 1: rating original male and female faces

Images of 95 male and 95 female original faces (i.e., derived from laser scans without sex manipulations) collected in the MPI face database were rated for masculinity or femininity. The hair of these faces is cropped (at the hairline) and the faces are devoid of make-up, glasses or facial hair. The faces were presented turned to the right by 20°, in a 24-bit color format and on a grey background. Images subtended approximately 8 by 6° of visual angle, and the average viewing distance was 57 cm. The experiments were presented on a Windows PC using the Matlab PsychToolbox (© 1984–2007 The MathWorks, Inc., Version 7.4.0) on a color monitor.

Fig. 1. Sex Morph Continua. Upper row: Morph continuum created by applying the sex vector to one single (here: female) face identity. Numbers indicate the percentage of contribution of the female face to create the morph. Lower row: Extended morph continuum, created by morphing towards female by another 40% of the sex vector. For more details, see text.

In each trial, one face was shown for as long as the participants needed to answer; there was no time pressure. Participants used the arrow keys on a keyboard to move a slider on the screen, to rate the faces on a 7-step scale, ranging from “very female” (1) to “very male” (7). They were told to answer with “ambiguous” (slider step 4) in case they could not decide if a face was that of a woman or a man. The next trial was initiated when the participant entered a response.

Eighteen paid volunteers recruited via the MPI Subject Database performed this rating experiment. Ratings (see Fig. 2a) for female faces were distributed over the whole scaling range, while male faces were very rarely rated as female; their ratings were mainly restricted to the male end of the rating scale. This male bias does not seem to be a phenomenon specific to our database, as other collections of face stimuli have been shown to evoke the same perception bias, even when the database consisted of silhouettes of faces in profile (e.g., Davidenko, 2007). The bias has been suggested to arise because this kind of stimulus lacks additional information that one is used to seeing in everyday life, like hair or makeup. While most male faces were rated as “normally male” or even “very male” (6 or 7 on the rating scale), only a few female faces fell within the equivalent range (“normally female”, “very female”) at the female end of the scale.

Fig. 2. Rating experiments. Participants rated the maleness and femaleness of 190 faces from our face database on a 7-step scale. (a) Rating of original male and female faces. (b) Rating of original male and feminized female faces (140% super females; for details see text). (c) Rating of the sex opposite faces of (b). Numbers 1–7 on y-axis correspond to the rating scale from “very female” (1) to “very male” (7). “Unambiguous” levels 2 and 6 are highlighted by dashed lines. Rating level 4 (dotted line) corresponds to “ambiguous”. For more details see text.

2.1.2. Rating 2: rating feminized female faces and original male faces

To obtain a sufficient number of female faces with femaleness ratings comparable to the ratings for the male faces, we feminized all 95 original female faces from our database used in Rating 1. Male faces were not modified. Each female face identity was morphed along the sex vector, away from the male end of the continuum. After some informal pilot testing, we decided to morph the female faces 40% away from their original endpoint, as this seemed to yield ratings comparable to the ratings for male faces in Rating 1. By doing that, we generated computationally “super female” faces that are 140% female, compared to the 100% original female faces.

A group of 18 new participants were asked to rate those super female faces mixed with the unmodified male faces in the same way as described for Rating 1. As is visible in Fig. 2b, now the ratings for the female faces are more densely grouped and shifted towards the female end of the scale, while the ratings for male faces show the same pattern as in Rating 1.

2.1.3. Rating 3: rating opposite-sex versions

For the CP experiments, we wanted symmetrical face continua. This means that every endpoint identity used to create a single-identity sex morph continuum has to be perceived as “normally male” or “very male” on one end of the continuum and “normally female” or “very female” on the other end. Therefore, we also acquired ratings for the maleness or femaleness of the other-sex version of every face identity (so far, ratings were only obtained for faces with their original sex, see Rating 2). Here, the male versions (0% female) of the original female faces (140% female) were rated, as well as the female versions (140% female) of the original male faces.

Twenty participants rated these male (i.e., modified females) and female (i.e., modified males) faces as described in Rating 1. Results are shown in Fig. 2c. The pattern is the same as in Rating 1: Ratings for male faces are mainly restricted to the male end of the scale, while female faces are often classified as “ambiguous” or even male.

2.1.4. Choosing “symmetrical” stimuli

The rating scale consists of 3 “zones”: female (1–3), ambiguous (4) and male (5–7) ratings. The middle ratings 2 for female and 6 for male faces were defined as “unambiguous” values of maleness and femaleness. The mean ratings for each face (for the original male or the feminized female face and their other-sex versions, respectively) were thus compared to the values 2 or 6 using a 2-sided t-test. All faces with mean ratings significantly different from these values were excluded. Among the remaining faces, there were identities that had both their female and male version within range of the unambiguous values for both sexes. We chose 20 of those faces (10 male and 10 female original identities) to create symmetrical sex continua with endpoint faces within the desired range of maleness and femaleness for the main experiment.
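The selection rule described above can be sketched as follows. This is a minimal illustration under explicit assumptions: a one-sample, two-sided t-test per face version, with the critical value passed in by the caller. The significance level is not stated in the text, so the value 2.110 (two-sided alpha of 0.05 with 17 degrees of freedom, i.e. 18 raters) is an assumption, the function names are hypothetical, and the ratings are invented.

```python
import math
import statistics


def t_statistic(ratings, target):
    """One-sample t statistic of the ratings against a target value."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1)
    if sd == 0:
        # All raters agreed: the mean either equals the target or it does not.
        return 0.0 if mean == target else math.copysign(math.inf, mean - target)
    return (mean - target) / (sd / math.sqrt(n))


def within_unambiguous_range(ratings, target, t_critical):
    """True if the mean rating does not differ significantly from the target."""
    return abs(t_statistic(ratings, target)) <= t_critical


def is_symmetrical(female_ratings, male_ratings, t_critical):
    """Keep an identity only if both versions pass: female vs. 2, male vs. 6."""
    return (within_unambiguous_range(female_ratings, 2, t_critical)
            and within_unambiguous_range(male_ratings, 6, t_critical))


# Made-up ratings from 18 hypothetical raters (not data from the study).
T_CRIT = 2.110  # assumed: two-sided alpha = 0.05 with 17 degrees of freedom
female_version = [2, 2, 1, 3, 2, 2, 3, 1, 2, 2, 2, 3, 1, 2, 2, 2, 3, 1]
male_version = [6, 6, 7, 5, 6, 6, 5, 7, 6, 6, 6, 5, 7, 6, 6, 6, 5, 7]
ambiguous_version = [4, 4, 3, 5, 4, 4, 5, 3, 4, 4, 4, 5, 3, 4, 4, 4, 5, 3]

keep = is_symmetrical(female_version, male_version, T_CRIT)
reject = is_symmetrical(ambiguous_version, male_version, T_CRIT)
```

In this toy example the first identity is kept because both versions are centred on the unambiguous values, while the second is rejected because its female version is rated as ambiguous.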

2.1.5. Sex-specific sex perception

Since differences between male and female observers have been reported in the face perception literature, we compared the rating results of male and female participants, but found no significant effect of observers’ sex in any of the ratings.

2.2. Main experiment: CP for the sex of faces

In the following, we first describe methods and procedures common to all four CP experiments; particular details of each procedure are then stated in separate sections.

2.2.1. Stimuli

Male and female versions of ten male and ten female face identities selected through the rating procedures described above were used as endpoint faces to generate one-identity sex morph continua. All endpoint faces had been rated as “unambiguously” male and female. Between these endpoints, thirteen equally spaced morph faces were created (i.e., including the original female face at 100%). Of each face identity, there were thus fifteen stimuli, ranging from male (0%) to female (140%), and our continua covered a range of 140% morph distance altogether (instead of 100% as in classical sex continua). See Fig. 1 (lower row) for an example of such an “extended” sex morph continuum.

2.2.2. Design (a) discrimination task

Previous studies in our lab with one-identity sex continua have revealed that participants perform at around chance level in a classical XAB match-to-sample paradigm when the two face images are 30% apart in morph distance (Bülthoff & Newell, 2004). Increasing the morph distance to make the task easier decreases the number of possible face pair combinations within one continuum, which makes it harder to measure a peak in performance and thus show evidence for CP. However, extending the continua by four morph images here enabled us to increase the morph distance in a pair to 40% without reducing the number of combinations within one continuum.

Moreover, two other modifications of the classical CP paradigm were made: (1) Instead of the usual ABX or XAB task we used a simultaneous same-different task, to avoid the memory load that the match-to-sample paradigm requires (see Calder et al., 1996). By doing this, we made sure that our discrimination data do not reflect short-term memory capacity, but a perceptual phenomenon. (2) Earlier studies in our lab have shown that a same-different task with faces manipulated along the sex dimension only is somewhat unintuitive to participants and might induce image-matching rather than a sex judgment (Armann & Bülthoff, 2009). Therefore, here, we used a more accessible and, as we found earlier, easier task: Our participants had to answer, for every stimulus pair, whether one of the two faces was more feminine or masculine than the other, or whether they were exactly identical.

Newly learned faces or familiarization with face stimuli in the course of an experiment lead to categorical perception (e.g., Viviani, Binda, & Borsato, 2007). We thus tried to keep exposure to each of the identities used to create our sex continua to a minimum. Of each one-identity continuum, every participant saw only some of all possible pair combinations. Given that there are fifteen face stimuli in each continuum, and pairs were shown at a morph distance of 40%, there were eleven possible “different pairs” for each identity (0–40, 10–50, 20–60, 30–70, 40–80, 50–90, 60–100, 70–110, 80–120, 90–130, 100–140). Five of these pairs were pseudo-randomly chosen for each identity as “different pairs”, and five morph images of the remaining pairs were chosen for “same pairs”, counterbalanced across participants. Every participant was shown ten “different pairs” at every possible morph level combination along the continuum (while identities varied across morph levels and participants), and only these were entered into the analysis. Together with the “same pairs”, there were 210 trials in total (110 “different trials”).
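The pair and trial counts given above follow directly from the morph spacing, as the short sketch below shows. The constant and function names are hypothetical; the numbers (10% steps, 0% to 140%, 40% pair distance, ten “different pairs” per morph-level combination, 210 trials) are taken from the text.

```python
def different_pairs(max_level, distance, step=10):
    """All (low, high) morph-level pairs that are `distance` percent apart."""
    levels = range(0, max_level + 1, step)
    return [(a, a + distance) for a in levels if a + distance <= max_level]


# Extended continuum used here: 0% to 140% femaleness, pairs 40% apart.
levels = list(range(0, 141, 10))
pairs_extended = different_pairs(140, 40)

# Classical 100% continuum, for comparison with the earlier paradigm.
pairs_classical_30 = different_pairs(100, 30)
pairs_classical_40 = different_pairs(100, 40)

PAIRS_PER_COMBINATION = 10  # "different pairs" shown per morph-level combination
TOTAL_TRIALS = 210
n_different = len(pairs_extended) * PAIRS_PER_COMBINATION
n_same = TOTAL_TRIALS - n_different

print(len(levels), len(pairs_extended), n_different, n_same)
print(len(pairs_classical_30), len(pairs_classical_40))
```

This yields 15 stimuli and 11 pairs per extended continuum, 110 “different” and 100 “same” trials. On a classical 100% continuum there would be 8 pairs at a distance of 30% and only 7 at 40%, which illustrates why the continua were extended before the pair distance was increased.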

A trial consisted of two face images from the same continuum shown next to each other on the screen, separated by approximately 8 cm. Participants were asked to decide for each face pair whether one of the faces was more male or female than the other or if they were exactly identical, as fast and as accurately as possible. Participants pressed one button on the keyboard to answer that there was a difference between the two faces (no matter which one of the faces was more male/female), and another one to answer that they were exactly the same. Response buttons were counterbalanced across participants.

Each trial started with a 500 ms fixation cross. The face images then appeared and remained on the screen for 2500 ms. Participants could only respond during that time; otherwise no response was recorded and the experiment continued. They were informed about this time constraint and could experience it in a training phase preceding the experimental block. An inter-trial interval of 1000 ms followed a participant’s response before the start of the next trial. Trial order was randomized. The training phase consisted of 10 trials with feedback. The training face pairs (shown in random order) covered the range of the sex continua and the training identities were not used again in the experiment.


2.2.3. Design (b) classification task

A classification task consisting of a two-alternative forced-choice (2AFC) paradigm immediately followed the discrimination task. All images from all continua were used in the classification task, yielding a total of 300 trials (20 identities × 15 face images). Participants were shown one face per trial and asked to decide as fast as possible if it was male or female, by pressing the keys “w” and “z” on the keyboard (response buttons for “male” and “female” responses counterbalanced across participants). Each face was preceded by a 500 ms fixation cross. The face image remained on the screen for 1500 ms. Participants could only respond during that time; otherwise no response was recorded and the experiment continued. An inter-trial interval of 1000 ms followed a participant’s response (or the end of the presentation time) before the start of the next trial. Trial order was randomized.

2.2.4. Procedure

The experiments were presented on a Windows PC using the Matlab PsychToolbox (© 1984–2007 The MathWorks, Inc., Version 7.4.0) on a color monitor. Distance to the monitor was about 57 cm, and each stimulus face measured approximately 8 by 6° of visual angle. All participants, apart from those in the first (naïve) experiment, started with a familiarization procedure, which is described in the respective experimental sections below. All participants performed the same-different task first, followed by the classification task. Participants received self-timed breaks after every 100 trials in both tasks. Instructions were displayed on the screen at the beginning of each task, and the experimenter was present at that time to answer potential questions.
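The relation between viewing distance and visual angle can be made concrete with a short calculation. The sketch below uses the standard geometric formula; the function name is illustrative, and the resulting sizes in centimeters are derived here rather than reported in the original text.

```python
import math

def size_on_screen_cm(angle_deg: float, distance_cm: float) -> float:
    """Physical extent (cm) that subtends `angle_deg` at `distance_cm`."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

VIEWING_DISTANCE_CM = 57.0  # approximate viewing distance given in the text

long_side_cm = size_on_screen_cm(8.0, VIEWING_DISTANCE_CM)
short_side_cm = size_on_screen_cm(6.0, VIEWING_DISTANCE_CM)
print(round(long_side_cm, 2), round(short_side_cm, 2))
```

At a distance of about 57 cm, one degree of visual angle corresponds to roughly one centimeter on the screen, so the stimuli measured just under 8 by 6 cm.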

2.2.5. Data analysis

Since all continua were controlled to range between endpoints of similar perceived masculinity and femininity, we analyzed and present data averaged across all participants and face identities. Percent of correct “different” judgments for “different trials” are presented for the same-different task, and percent of “male” judgments for all trials for the classification task.

Following classical studies on CP (e.g., Beale & Keil, 1995; Levin & Beale, 2000), our approach was twofold: (i) From participants’ responses in the classification task, the category boundary was detected from the expected shift in judgment from one category to the other. We determined the face pair straddling the male–female boundary, i.e., pairs including one individual face classified as male on more than 66% of trials and another classified as female on more than 66% of trials. (ii) We then tested for significant categorical perception effects by entering the accuracy data from the discrimination task into a Repeated Measures ANOVA, using a deviation contrast to compare accuracy on cross-boundary pairs to the mean accuracy on all pairs combined. Increased discrimination accuracy on the cross-boundary pair was taken as an indication of CP.
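Step (i) can be illustrated with a minimal sketch. The classification proportions below are invented for illustration only and are not data from this study; the function and variable names are likewise hypothetical.

```python
# Hypothetical proportions of "male" responses per morph level (0% = male end,
# 140% = female end). These numbers are made up for illustration only.
p_male = {
    0: 0.97, 10: 0.96, 20: 0.94, 30: 0.91, 40: 0.86, 50: 0.80, 60: 0.72,
    70: 0.61, 80: 0.49, 90: 0.38, 100: 0.28, 110: 0.20, 120: 0.15,
    130: 0.12, 140: 0.10,
}

CRITERION = 0.66      # a face counts as "male"/"female" above this proportion
MORPH_DISTANCE = 40   # distance between the two faces of a "different pair"

def straddles_boundary(left: int, right: int) -> bool:
    """True if one face is classified male and the other female on > 66% of trials."""
    left_male, right_male = p_male[left], p_male[right]
    left_female, right_female = 1.0 - left_male, 1.0 - right_male
    return (left_male > CRITERION and right_female > CRITERION) or (
        left_female > CRITERION and right_male > CRITERION
    )

pairs = [(lvl, lvl + MORPH_DISTANCE) for lvl in range(0, 101, 10)]
cross_boundary = [pair for pair in pairs if straddles_boundary(*pair)]
print(cross_boundary)
```

With shallow classification curves, few or no pairs satisfy the criterion exactly, which is why the pair that comes closest has to be chosen in practice.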

2.2.6. Experiment 1: naïve participants (“naïve” condition)

Seventeen participants (8 male) completed the two tasks assessing CP as described above. All participants were naïve as to the purpose of the experiments and unfamiliar with the faces of the MPI face database. The total duration of the experiment was about 1 h.

2.2.7. Experiment 2: familiarization with the average faces (“averages” condition)

Participants were familiarized with the male and the female average faces before the main experiment. They were shown images of both faces from different viewpoints and in different sizes, and with a written name to emphasize the faces’ sex (see Fig. 3 for exemplar familiarization stimuli). The male average face was called “John”, the female average “Lisa”. Along with each face image, participants were asked to answer a question regarding a character trait (e.g., “how intelligent is Lisa?”, “how happy do you think John is?”), using a 7-step scale on the screen and the arrow keys on the keyboard. The questions were randomly chosen from a list of 46 different character traits. Time to look at the faces and to respond was not restricted, and participants were not aware that further experiments would follow. The task consisted of a total of 50 trials (25 trials per average face). After familiarization, participants performed an old/new recognition task with the two average faces presented in random order, intermixed with 32 distracter faces from our database that were not used in subsequent experiments.

Twenty-two participants (11 male) went through the familiarization procedure and subsequently performed both CP tasks. All of them could correctly identify the average faces after familiarization and were naïve as to the purpose of the familiarization and the following tasks. The total duration of the experiment (including the same-different and the classification task) was about 1 h 10 min.

2.2.8. Experiment 3: familiarization with other faces of the same sex range (“otherIDs”)

As in the “averages” training procedure (see Experiment 2), each of the four “other identity” faces was shown in different sizes and from different viewpoints and each was associated with a sex-specific name tag (“John”, “Thomas”, “Lisa”, “Mary”). Along with the face images, in each trial, participants were asked questions regarding character traits, as in Experiment 2, and had to answer using a slider and a 7-step scale on the screen. The training procedure consisted of 80 trials (20 trials per face identity). Images of all four different identities were intermixed and shown in randomized order. After familiarization, participants performed an old/new recognition task with the four learned faces presented in random order, intermixed with 32 distracter faces from our database that were not used in subsequent experiments.

Nineteen participants (8 male) completed this experiment. All of them could correctly identify the four learned faces in the old/new task and were naïve as to the purpose of the familiarization and the following tasks. Total duration of the experiment (including the same-different and the classification task) was about 1 h 15 min.

2.2.9. Experiment 4: familiarization with the endpoint faces (“testfaces” condition)

Here, participants were familiarized with all endpoint faces (i.e., the originals and opposite-sex versions) that were used in the subsequent CP task. To cope with the much higher number of faces to remember, the face images were shown without names, but the pronouns “he” and “she” in the questions on the screen clearly pointed to the sex of each face. Each face identity was shown four times in its female and four times in its male version, yielding 160 trials altogether. Images of all identities were intermixed and shown in randomized order. Familiarization was done as in Experiment 3: For each face, participants had to answer a question regarding a character trait, using a slider and a 7-step scale on the screen. Subsequently, participants were asked to identify the faces they had seen before, intermixed with the 32 distracter faces used in Experiment 3.

Eighteen participants (9 male) performed this experiment, all of them naïve as to the purpose of the familiarization and subsequent experiments. Four participants (2 male) were removed from the analysis, because their identification performance of the learned faces was below 75% correct or because they did not complete both tasks. The familiarization procedure was followed by the same-different and the classification task. Total duration of the experiments was about 1 h 30 min.

Fig. 3. Familiarization stimuli. In each familiarization procedure, participants were shown images of faces (averages or male and female identities) from different viewpoints and in different sizes. Each face was shown with a written name or the pronouns “he” and “she” to emphasize the faces’ sex.


3. Results

The four graphs in Fig. 4A show the results of the classification task – averaged across all participants – for each experimental condition to allow better comparison. The ordinate represents the percentage of “male” responses for all trials, the abscissa the morph levels along the sex morph continuum. Classification data was fitted using the psignifit toolbox version 2.5.6 for Matlab (see http://bootstrap-software.org/psignifit/), which implements the maximum-likelihood method described by Wichmann and Hill (2001). To compare classification data across conditions, the inflection point (the point of subjective equality, PSE, at 50% performance) and the smallest morph difference participants were able to discriminate (the just noticeable difference, JND) were calculated for each curve.
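To illustrate what a maximum-likelihood fit of a psychometric function yields, the following sketch fits a logistic curve to invented response counts. It is not psignifit and does not reproduce its algorithm: the logistic shape, the coarse grid search, and the JND definition (half the distance between the 25% and 75% points) are all assumptions made here for illustration, since the text does not specify them.

```python
import math

def p_male(level, pse, slope):
    """Logistic psychometric function, falling from 1 (male end) to 0 (female end)."""
    return 1.0 / (1.0 + math.exp((level - pse) / slope))

def neg_log_likelihood(params, levels, n_male, n_total):
    """Binomial negative log-likelihood of the counts under the logistic model."""
    pse, slope = params
    nll = 0.0
    for x, k, n in zip(levels, n_male, n_total):
        p = min(max(p_male(x, pse, slope), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

# Made-up counts of "male" responses out of 20 trials per morph level.
levels = list(range(0, 150, 10))
n_total = [20] * len(levels)
n_male = [20, 20, 19, 19, 18, 17, 15, 13, 10, 7, 5, 3, 2, 1, 1]

# A coarse grid search stands in for a proper optimizer.
grid = [(pse, slope) for pse in range(40, 111) for slope in range(5, 41)]
pse_hat, slope_hat = min(
    grid, key=lambda g: neg_log_likelihood(g, levels, n_male, n_total)
)

# Assumed JND definition: half the distance between the 25% and 75% points,
# which for this logistic equals slope * ln(3).
jnd_hat = slope_hat * math.log(3.0)
print(pse_hat, slope_hat, round(jnd_hat, 1))
```

Under these assumptions, a steeper (more step-like) curve has a smaller slope parameter and therefore a smaller JND.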

Fig. 4B shows, below each classification curve, the results of the same-different task from the same condition. The ordinate shows percent of correct “different” judgments (only different trials were analyzed). Along the abscissa, pairs are defined as follows: [(morph level left face + morph level right face)/20]. Pair 2, for example, defines a face at morph level 0%, shown alongside a face at morph level 40%; pair 5 defines a pair including a face at level 30% and a face at level 70%.
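The pair labels on the abscissa can be reproduced with a few lines of code. This is a minimal sketch with a hypothetical function name; it assumes, as stated in the Methods, that the two faces of every “different pair” lie 40% apart.

```python
def pair_index(left_level: int, right_level: int) -> int:
    """Pair label used on the abscissa: (left + right) / 20."""
    return (left_level + right_level) // 20

MORPH_DISTANCE = 40
for left in range(0, 101, 10):
    right = left + MORPH_DISTANCE
    print(pair_index(left, right), (left, right))
```

The labels therefore run from 2 (levels 0 and 40) to 12 (levels 100 and 140), and each label divided by two and multiplied by 20 gives the morph level at the midpoint of the pair.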

3.1. Category boundary

Classically, the location of the category boundary is predicted from participants’ judgments in the classification task, i.e., it would be expected at the point along the morph continuum where perception of the sex of the face stimuli changes abruptly from male to female.

Cursory visual inspection of the classification curves does not reveal a sharp step between male and female categories. We applied the classical method used, for example, by Beale and Keil (1995) to predict performance in the same-different task. To that end, we defined 33% and 66% cut-offs for the category boundary, where one face morph was classified as female in 33% of the cases and as male in 66% of the cases, and another one as female in 66% and as male in 33% of the cases (see Fig. 4A, solid lines). If stimuli along a continuum are perceived categorically, a peak in accuracy would be expected in the same-different task for the pair that straddles the boundary. The pair that (approximately) corresponds to the location of these cut-offs is pair 9 (including morph levels 70 and 110) in the averages and the otherIDs conditions, and pair 8 (including morph levels 60 and 100) in the naïve and the testfaces conditions. These pairs are highlighted by dotted vertical lines in Fig. 4B. As crossing the category boundary should cause an increase in accuracy, we would also expect pairs close to the boundary, which might only partially straddle it (the boundary is not sharply defined here, see above), to be more easily discriminable than fully within-category pairs.

3.2. Peak in performance

To test our predictions, we entered the data from each same-different task into a Repeated Measures ANOVA, using a deviation contrast to compare the performance on every pair against the grand mean of all pairs. In the “naïve” condition, performance is slightly higher around the very center of the continuum (however, not exactly at the predicted boundary), but the increase is not significant. In the averages condition, although overall performance is better than in the naïve condition, and although there seems to be a peak at pair 9 (the predicted cross-boundary pair), the ANOVA does not reveal any significant differences between pairs. Performance data in the otherIDs condition shows slightly higher accuracy around the predicted location of the category boundary. The effect is, however, only significant for pair 7 [F(1, 18) = 10.496, p < 0.005], i.e., a pair consisting of a face at 50% and one at 90% morph level. The peak in performance would be expected around the 90% morph level. The discrimination data from the testfaces condition shows a clear peak in performance, with pairs 7 [F(1, 17) = 10.961, p = 0.004], 6 [F(1, 17) = 15.814, p = 0.001], and 5 [F(1, 17) = 7.521, p = 0.014] each being discriminated more easily than the mean of all pairs along the continuum. While this peak is also slightly shifted away from the predicted location towards the male end of the continuum, it is clear evidence for better discrimination between pairs that (at least partly) straddle the boundary between male and female, compared to pairs closer to both ends of the sex continuum.

3.3. Comparison across conditions

Fig. 5 shows several measures from both tasks to compare performance directly across conditions. One-sample t-tests reveal that the PSEs of all curves differ significantly from 70%, i.e., the actual center of the morph continua [naïve: t(16) = −3.348, p < 0.006; averages: t(21) = −6.969, p < 0.001; otherIDs: t(18) = −5.386, p < 0.001; testfaces: t(17) = −4.636, p < 0.001]. Note that, since our morph continua have been chosen for their symmetrical perceptual range from male to female, the PSE would be expected to coincide with the center at 70% (as each continuum ranges from 0% to 140%).
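The logic of a one-sample t-test against the center of the continuum can be sketched as follows. The PSE values are invented for illustration and the helper name is hypothetical; the sketch computes only the t statistic and its degrees of freedom, not the p-value.

```python
import math
import statistics

def one_sample_t(values, reference):
    """Return (t, df) for a one-sample t-test of mean(values) against `reference`."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (mean - reference) / sem, n - 1

# Made-up PSE estimates for six hypothetical participants.
pse_values = [58.0, 62.0, 55.0, 66.0, 60.0, 59.0]
CENTER = 70.0  # physical center of the 0-140% continuum

t_value, df = one_sample_t(pse_values, CENTER)
print(round(t_value, 3), df)
```

A negative t value indicates that the mean lies below the reference value, as is the case for all four conditions reported above.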

The testfaces curve has the PSE value closest to 70% (59.30, SE = 2.31), followed by the naïve curve (58.98, SE = 3.28). The PSE values of the other two curves (averages: 52.37, SE = 2.53; otherIDs: 48.87, SE = 3.92) deviate more strongly from the middle of the continua, towards the female end (i.e., above 70% morph level, see Fig. 4). This shift on all curves indicates that morphs on the female side of the continuum are judged as male, even though the endpoint faces of the continua have been chosen as being symmetrically male and female based on the rating experiments.

A one-way ANOVA with condition (naïve, averages, otherIDs, testfaces) as between-subjects factor [F(3, 72) = 3.414, p = 0.022] and Bonferroni-corrected pairwise post hoc tests show that the “just noticeable differences” (JNDs) of the naïve, averages and otherIDs curves do not differ from each other [all p > 0.05], while the testfaces curve has a significantly lower JND [naïve vs. testfaces: p = 0.003; averages vs. testfaces: p = 0.039; otherIDs vs. testfaces: p = 0.035]. This is in line with the overall shape of the curves: As mentioned before, only the testfaces curve resembles a step-like function, indicating two categories with a “switch” in between, while the three other curves indicate a rather linear change in perception from female to male.
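For readers less familiar with these tests, the computation behind a one-way between-subjects ANOVA and a Bonferroni correction can be sketched in a few lines. The values below are invented, the function names are hypothetical, and the sketch returns only the F statistic with its degrees of freedom; it is not the analysis code used in this study.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up JND values for three small hypothetical groups.
groups = [[30.0, 32.0, 34.0], [31.0, 33.0, 35.0], [20.0, 22.0, 24.0]]
f_value, df_b, df_w = one_way_anova(groups)
print(round(f_value, 2), df_b, df_w)
print(bonferroni([0.01, 0.02, 0.4]))
```

With four conditions and 76 participants in total, the same formulas give the degrees of freedom (3, 72) reported above, and there are six pairwise comparisons to correct for.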


Fig. 4. Results from the CP Experiments. (A) Classification data, fitted using the Matlab toolbox psignifit. The y-axis indicates percentage of classification as male, the x-axis indicates morph level. Dotted lines indicate the PSE, dashed lines the center of the morph continuum at 70% morph level. Category cut-offs are indicated at 33% and 66% male classification (see text for details). (B) Percentage of correct responses for all “different trials”, from the same-different tasks. Horizontal dashed lines indicate chance level, dotted vertical lines the approximate location of the cross-category pair, determined from the classification data. Face pairs are defined as follows: [(morph level left face + morph level right face)/20]. Pair 2, for example, defines a face at morph level 0%, shown alongside a face at morph level 40%; pair 5 defines a pair including a face at level 30% and a face at level 70%.


It is visible from Fig. 4B that performance is better in the three conditions where participants had to go through a familiarization procedure before testing. When we compare performance on the pair with the highest accuracy in each condition (“maximum performance” in Fig. 5), a one-way ANOVA with condition (naïve, averages, otherIDs, testfaces) as between-subjects factor [F(3, 72) = 10.007, p < 0.001] and Bonferroni-corrected pairwise post hoc tests reveal that the difference between maximum performance in the naïve condition and maximum performance in each of the other conditions is significant [all p < 0.03]. Maximum performance in the averages, otherIDs and testfaces conditions does not differ significantly.

3.4. Sex differences

Since differences between male and female observers have been reported in the face perception literature, analyses were also done separately for male and female participants. No significant effect of observers’ sex in any of the analyses was found, and this factor was thus removed from the analyses altogether.

3.5. Summary of results

Classically, the indication for categorical perception of male and female faces would be a switch in perception between faces of both categories in a classification task, and higher discrimination performance for pairs straddling the location of that switch (i.e., the category boundary). Only the classification curve of the testfaces condition resembles the expected step-like function, while the curves of the other three conditions show a linear change from male to female with no visible category boundary whatsoever. Only in the testfaces condition do we also clearly find higher discrimination performance around the predicted morph level (although shifted a bit towards the male end of the continuum). Although the otherIDs condition also shows higher performance for one face pair partly straddling the potential category boundary, this pair is shifted even further away from the predicted location; this peak is also not in line with the results from the classification experiment, which show a linear rather than step-like change between the male and female categories. In all conditions, observers show a bias to judge a face as male rather than female, even though the endpoint faces have been chosen on the basis of ratings to make sure they are perceived as unambiguously male or female.

4. Discussion

The main purpose of this study was to create optimal experimental conditions and stimuli for finding categorical perception of the sex of human faces. Unfamiliar faces were manipulated in their sexual appearance, while individual identity information was kept constant. Additionally, the degree of perceived maleness and femaleness of the endpoint faces used to create continua was strictly controlled. Nevertheless, we found no evidence for naturally occurring CP for the sex of faces. This is in accordance with the study by Bülthoff and Newell (2004), where the authors used the same face database (although different degrees of maleness and femaleness were not taken into account), and found no CP for sex when the identity of the faces was kept constant while only sex information was morphed between male and female. These findings raise the question of what, if not sex information alone, triggers the emergence of CP for male and female faces, as it has been reported in earlier studies (e.g., Campanella et al., 2001). To test whether the effect was due to familiarization with the test face identities in the course of the experiment rather than to categorical perception of the sex of unfamiliar faces, we carried out three modified CP experiments that included a familiarization phase before the discrimination and classification tests: Familiarization was done with the average faces, with additional male and female faces, or with the test faces themselves.

Fig. 5. Comparison of results from all conditions. PSE and JND values are from the curves that were fitted to the classification data. The PSE would be expected to be at around the 70% morph level, i.e., at the very center of the normalized morph continua. The deviation from 70, towards the female end of the range, thus indicates the “male bias”. Maximum performance is the performance from the pair that was discriminated best in the same-different task in each condition.

When familiarized with sex-related information about the “typical” appearance of male and female faces provided by the average faces or by other individual face identities, participants performed better overall in the same-different task, compared to naïve participants (see Fig. 4B). This is not surprising, as during the familiarization process they get used to the characteristics of faces from our database as well as to the range of stimuli. However, despite higher performance than in the naïve condition, participants still did not show a “step” in classification responses that would indicate the existence of a category boundary between male and female (see Fig. 4A). The fact that the JNDs (see Fig. 5) of the naïve, averages and otherIDs curves do not differ from each other also indicates that familiarization with average faces does not lead to categorical perception of male and female faces, and neither does familiarization with other individual identities. There was no significant peak at the corresponding pairs in discrimination performance along the morph continuum in the same-different tasks of the naïve, averages and otherIDs conditions (Fig. 4B).

Familiarization with a male and a female average face provides participants with information about how faces of both sexes look on average, independently of traits that are characteristic of the face of a specific person. One might think, since these faces do not represent real people, that sex-related information from the average faces is not as “accessible” to the observer as information from individual face identities. If, however, sex and identity are independent dimensions in face space, then training with both category prototypes on the dimension sex (male and female) should lead to CP, analogous to CP for a continuum of two familiar face identities (e.g., Beale & Keil, 1995). The present study shows that familiarization with the averages for male and female faces does not result in CP along the sex continuum. Our results rather suggest that the perception of somebody’s sex is not independent of the perception of the same person’s identity. More evidence for this interpretation comes from the performance data for the otherIDs condition in this study: Here, familiarization is done with individual faces; hence the appearance of male and female faces (and the difference between the two) is linked to idiosyncratic character traits. When tested on new unfamiliar faces, however, participants do not show clear evidence for CP, indicating that the sex information, although learned, cannot be extracted or separated from the corresponding identities and transferred to new faces.

One could argue that, unlike the otherIDs faces or the testfaces themselves, the male and female averages do not represent the actual ends of the sex morph continua from female to male. They were created by averaging all 95 male and 95 female faces and thus indicate the central tendency of male and female characteristics of human faces. Since the endpoint faces for the sex continua were chosen based on ratings that judge them as “normally male” and “normally female”, we can suppose that the averages would fall into a similar range on the male–female rating scale. Even if we assume that this is not the case, we would still expect CP for male and female faces after familiarization with the averages. The category boundary might then simply not lie at the center, but at a slightly different location along the symmetrical sex continua used for testing. However, this is only possible if sex-specifying information can be abstracted from specific faces and transferred to unfamiliar identities. Yet, as discussed above, there is no sign of CP emerging after familiarization with the male and female averages, be it at the expected or at any other location.

Only participants who were familiarized with the identities of the test faces show a clear discrimination performance pattern that would be expected for categorical perception, as well as a steeper classification curve (Fig. 4, testfaces). The latter is in line with a significantly smaller JND of the testfaces classification curve compared to the other three conditions (see Fig. 5). The accuracy data from the same-different task of the testfaces condition reveals a significant peak around the predicted level, although shifted a little towards the male end of the continuum. This slight discrepancy has been reported before in CP studies and has been suggested to result from the fact that the discrimination task evaluates an ability that is more related to a perceptual state, while the classification task taps more into cognitive processes (see e.g., Sigala, Logothetis, & Rainer, 2011). Note that participants in the testfaces condition had to memorize a much higher number of face identities (20 faces in 30 min max) than participants in the averages (2 faces in 10 min max) and the otherIDs (4 faces in 15 min max) conditions, while they had less time for every identity. Thus even the shorter exposure and probably poorer encoding of each single identity leads to a considerable difference in discrimination and classification, compared to the other conditions, as expected for CP.

To summarize, our results indicate that sex and identity information in faces are not processed in parallel, as was suggested in the classical Bruce and Young model of face perception (Bruce & Young, 1986). Rather, the perception of the sex of a face seems closely linked to the perception of its identity, as stated in the single-route hypothesis. This hypothesis is based on earlier findings that show, for example, that participants in a classification task could not selectively attend to either sex or identity without being influenced by the other, irrelevant, dimension (Ganel & Goshen-Gottstein, 2002). In another study, participants were quicker at judging the sex of familiar faces compared to unfamiliar ones, indicating that identity perception influences the perception of the sex of faces (Rossion, 2002).

The influence of face familiarity on sex categorization might seem surprising, as one of the first things we can say about a person we do not know is whether they are a man or a woman. This “categorization” of virtually everybody in our environment into male or female is also of biological and social significance. Moreover, in high-level adaptation studies, it has been shown that face identities are encoded relative to sex-specific rather than relative to a generic norm (Jaquet & Rhodes, 2008), suggesting that the brain indeed stores a general representation of what is male and female in faces. However, it seems that this representation is not independent of a representation of every idiosyncratic face identity and that information regarding the two traits is processed, at least partly, in an interconnected way.

Another explanation could be that it is simply the appearance of our stimuli (and other similar stimuli generally used in face perception studies) that is responsible for the lack of CP: They are rather natural looking, but are devoid of any external feature that could serve as a cue, like hair, glasses or make-up. It has been suggested, however, that unfamiliar faces are often processed in terms of these external cues, and that when we get familiarized with a face (or a person), our focus moves more to inner facial features which might then play a more important role in recognition (e.g., Ellis, Shepherd, & Davies, 1979; O’Donnell & Bruce, 2001; Stacey, Walker, & Underwood, 2005). Maybe as long as a person is not familiar to us, accurate sex judgments are mostly made on the basis of external rather than inner-face information. Hair-style, make-up, or even cultural “accessories” like a head scarf would thus serve as the essential (and sufficient) cue to sex, without taking more subtle physiognomic cues into account at all. If we personally know somebody, on the other hand, we already have a clear representation of their sex. It would therefore make sense at this level, in line with the results of the current study, to not process sex information independently of someone’s identity (and to not consider ambiguous cases of unclear sex) but to have a common representation of all relevant information concerning that person.

Brain imaging data shows that it is surprisingly difficult to find sex-specific neural responses to faces: The responses are weak and widely distributed across the whole face network (Kaul, Rees, & Ishai, 2010). A possible interpretation of these results, in accordance with the present study, is that not only some specific neurons, but most face-selective neurons in the brain do have information about the sex of the face stimulus they are responding to, and that this information is not specifically extracted and gathered separately from the main character trait of each face, i.e., its individual identity.

Sex is not the only socially and biologically relevant characteristic of faces, and as pointed out in the introduction section, expression (Calder et al., 1996) and race (Levin & Angelone, 2002) have been shown to be perceived categorically. The results of the current study then raise the question of why these characteristics would be perceived categorically, independently of certain face identities, when sex is not. As to expressions, it is worth noting that unlike sex and race, they are not a “stable” characteristic, in the sense that an individual belongs to either one or another category. All expressions are in principle present in every individual, at different times, and it thus seems reasonable to categorize each expression as belonging to one or another category, independent of the individual showing it at a certain moment. In addition, expressions naturally occur in all possible intensities and combinations; they can be very clear or rather subtle, or a mixture of two or even more at the same time. Our perceptual system is thus dealing with “continua” in between clear prototypes of expressions, and above all with classifying ambiguous variations of expressions, independently of individual faces, all the time.

The characteristic race, on the other hand, seems to be on a level with sex, with people usually being classified into one or another category, and not changing their race over a lifetime. It would thus seem unnecessary to process and store information about a person’s race independently of their identity, just as it seems unnecessary to do that for someone’s sex. However, in contrast to sex, human races include many more than two categories, and there exist a lot of natural “hybrid forms” in between with in fact rather unclear boundaries. In reality, classification by race is not as straightforward as testing CP on a continuum between a typical “Asian” and a typical “Caucasian” face, for example. Furthermore, race is not even a clearly biologically defined characteristic, given that by the criteria that biologists normally use to apply the concept of “subspecies” or “races”, humans do not qualify (see for example Cosmides, Tooby, & Kurzban, 2003). However, race classification is of high social relevance in everyday life, and belonging to one or the other can have important implications. That might explain why, although race judgments seem to be less clear than sex judgments in theory, the brain nevertheless makes the effort to classify unfamiliar people into seemingly well-defined race categories. What is more, another possible reason for CP for race (but not sex) of faces is that, in contrast to the external features that might guide sex judgments in everyday life, there are not many external features or attributes apart from the physiognomy of the face itself that can be used to classify someone as “East Asian” or “African”.

Our morph continua were normalized prior to the CP experiment and thus all cover the same symmetrical range from male to female. Variations in the location of the category boundary or, if there is none, the point of subjective equality (PSE) on every continuum and for every participant should thus be minimized. Surprisingly, we still observed a very consistent male bias, i.e., a deviation of the PSE of the classification curves from the actual center of the morph continuum (at 70%; see Fig. 5) towards the female end of the continuum, in every condition. This bias reveals that participants are more likely to say that a face is male than female when in doubt, and even when the stimuli are taken from within the female side of the continuum (i.e., where participants should not have any doubt at all). This phenomenon could of course be a sign that our morph algorithm is manipulating the face stimuli in a non-linear way. In this case, a bias observed in the PSE might just reflect the non-linearity of the continua that were created between the endpoint faces. However, we chose unambiguously male and female faces for the endpoints of the continua, based on the results from the ratings. It is important to note that these endpoint faces were not manipulated any further after the rating experiments. They at least should thus be classified as clearly male and female in the main experiment, since this was our selection criterion. A look at Fig. 4A, however, reveals that even the female endpoint faces, i.e., the faces at 140% morph level in every graph, are not classified as female all the time: in the naïve, the averages and the otherIDs conditions, even the most female faces are rated as female in less than 88% of the cases (naïve: 87.65, SE = 4.28; averages: 87.86, SE = 2.28), and in only 83% of the cases in the otherIDs condition (SE = 2.57). Only in the testfaces condition do female judgments for the most female faces reach 95% (SE = 1.69). Independent samples t-tests show that the difference in these judgments for female endpoint faces between the testfaces condition and all other conditions is significant [all p < 0.04]. Male judgments at the male end of the continuum (at 0% morph level) are always above 96% and do not differ between conditions.
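To make the notion of a PSE shift concrete, the following minimal sketch estimates a PSE by linear interpolation and compares it with the continuum midpoint. It is an illustration only, not the analysis used in this study: the function name `estimate_pse`, the interpolation approach (a simple stand-in for fitting a full psychometric function), and all data values are assumptions made for the example.

```python
def estimate_pse(morph_levels, prop_female, criterion=0.5):
    """Linearly interpolate the morph level at which the proportion of
    'female' responses first crosses `criterion` (hypothetical helper)."""
    points = list(zip(morph_levels, prop_female))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= criterion <= p1 and p1 != p0:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    raise ValueError("response curve never crosses the criterion")


# Hypothetical data, NOT taken from the study: morph level in percent
# (0 = male endpoint, 140 = female endpoint) and the proportion of
# "female" responses at each level.
levels = [0, 20, 40, 60, 80, 100, 120, 140]
p_female = [0.02, 0.05, 0.10, 0.25, 0.45, 0.70, 0.85, 0.90]

pse = estimate_pse(levels, p_female)
midpoint = (levels[0] + levels[-1]) / 2  # physical center of the continuum (70)
bias = pse - midpoint  # positive: PSE lies towards the female end (male bias)

print(f"PSE = {pse:.1f}%, midpoint = {midpoint:.1f}%, shift = {bias:+.1f}")
```

With these made-up values the 50% point falls on the female side of the midpoint, which is the pattern described above as a male bias: a face must be more than halfway towards the female endpoint before it is called female as often as male.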

Interestingly, Troje and colleagues (Troje, Sadr, & Geyer, 2006), when investigating adaptation aftereffects in the perception of the sex of male and female point-light walkers, made the same rather unexpected discovery: a bias toward seeing more male than female walkers in a set of stimuli before adaptation remained almost constant, even though they then chose their stimuli to control for this bias and thus keep the “sex range” of the walkers symmetrical.

These findings suggest that there is a perceptual or cognitive bias to answer “male” when in doubt about a person’s sex. In the case of human faces, this phenomenon has been suggested to result from an anatomical lack of distinctly female features in faces in general (e.g., Enlow, 1990). Answering “female” while classifying a face’s sex would thus be a “no male traits” response. In line with what we discussed above, apart from the fact that body shape definitely plays a role in defining somebody’s sex, this bias could imply that external features, i.e., hairstyle, makeup, clothing, maybe even behavior, might be used as cues to a person’s sex, more than just the physical appearance of the face itself. Interestingly, all these external features are defined and shaped by culture and thus are not biologically “hardwired” and universal. It could also be that, over the course of human history, misclassifying a man as a woman has generally proved to be potentially more dangerous than misclassifying a woman as a man. We do not want to overspeculate here; however, our results, together with results from other current studies (e.g., Troje, Sadr, & Geyer, 2006), suggest that there might be more to this male bias, in addition to the stimulus-driven bias that is usually found in face stimuli when they are deprived of external information.

Acknowledgments

We want to thank Mario Kleiner for assistance with the Face Database and Johannes Schultz for help with data analysis. This research was supported financially by the Max Planck Society.

References

Armann, R., & Bülthoff, I. (2009). Gaze behavior in face comparison: The roles of sex, task, and symmetry. Attention, Perception, & Psychophysics, 71(5), 1107–1126.

Beale, J. M., & Keil, F. C. (1995). Categorical effects in the perception of faces. Cognition, 57(3), 217–239.

Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. In Proceedings of SIGGRAPH 99 (pp. 187–194).

Blanz, V. (2000). Automatische Rekonstruktion der dreidimensionalen Form von Gesichtern aus einem Einzelbild. Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften, Eberhard-Karls-Universitaet Tuebingen.

Bruce, V., & Young, A. (1986). Understanding face recognition. The British Journal of Psychology, 77(3), 305–327.

Bülthoff, I., & Newell, F. N. (2004). Categorical perception of sex occurs in familiar but not unfamiliar faces. Visual Cognition, 11(7), 823–855.

Calder, A. J., Young, A. W., Perrett, D. I., Etcoff, N. L., & Rowland, D. (1996). Categorical perception of morphed facial expressions. Visual Cognition, 3(2), 81–117.

Campanella, S., Chrysochoos, A., & Bruyer, R. (2001). Categorical perception of facial gender information: Behavioural evidence and the face-space metaphor. Visual Cognition, 8(2), 237–262.

Campanella, S., Hanoteau, C., Seron, X., Joassin, F., & Bruyer, R. (2001). Categorical perception of unfamiliar facial identities, the face-space metaphor, and the morphing technique. Visual Cognition, 10(2), 129–156.

Cosmides, L., Tooby, J., & Kurzban, R. (2003). Perceptions of race. Trends in Cognitive Sciences, 7(4), 173–179.

Davidenko, N. (2007). Silhouetted face profiles: A new methodology for face perception research. Journal of Vision, 7(4).

Ellis, H. D., Shepherd, J. W., & Davies, G. M. (1979). Identification of familiar and unfamiliar faces from internal and external features – Some implications for theories of face recognition. Perception, 8(4), 431–439.

Enlow, D. H. (1990). Facial growth. Philadelphia, PA: WB Saunders.

Etcoff, N. L., & Magee, J. J. (1992). Categorical perception of facial expressions. Cognition, 44(3), 227–240.

Ganel, T., & Goshen-Gottstein, Y. (2002). Perceptual integrality of sex and identity of faces: Further evidence for the single-route hypothesis. Journal of Experimental Psychology: Human Perception and Performance, 28, 854–856.

Harnad, S. (1987). Psychophysical and cognitive aspects of categorical perception: A critical overview. In S. Harnad (Ed.), Categorical perception: The groundwork of cognition. New York: Cambridge University Press.

Jaquet, E., & Rhodes, G. (2008). Face aftereffects indicate dissociable, but not distinct, coding of male and female faces. Journal of Experimental Psychology: Human Perception and Performance, 34, 101–112.

Kaul, C., Rees, G., & Ishai, A. (2010). Perception of gender is a distributed attribute in the human face processing network. Journal of Vision, 10(7), 707.

Leopold, D. A., Bondar, I., & Giese, M. A. (2006). Norm-based face encoding by single neurons in the monkey inferotemporal cortex. Nature, 442, 572–575.

Levin, D. T., & Angelone, B. L. (2002). Categorical perception of race. Perception, 31(5), 567–578.

Levin, D. T., & Beale, J. M. (2000). Categorical perception occurs in newly learned faces, other-race faces, and inverted faces. Perception & Psychophysics, 62(2), 386–401.

O’Donnell, C., & Bruce, V. (2001). Familiarisation with faces selectively enhances sensitivity to changes made to the eyes. Perception, 30(6), 755–764.

Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3), 382–439.

Rossion, B. (2002). Is sex categorization from faces really parallel to face recognition? Visual Cognition, 9(8), 1003–1020.

Sigala, R., Logothetis, N. K., & Rainer, G. (2011). Own-species bias in the representations of monkey and human face categories in the primate temporal lobe. Journal of Neurophysiology, 105(6), 2740–2752.

Stacey, P. C., Walker, S., & Underwood, J. D. M. (2005). Face processing and familiarity: Evidence from eye-movement data. British Journal of Psychology, 96, 1–17.

Troje, N. F., Sadr, J., & Geyer, H. (2006). Adaptation aftereffects in the perception of gender from biological motion. Journal of Vision, 6(8), 850–857.

Vetter, T., & Poggio, T. (1997). Linear object classes and image synthesis from a single example image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 733–742.

Viviani, P., Binda, P., & Borsato, T. (2007). Categorical perception of newly learned faces. Visual Cognition, 15(4), 420–467.

Wichmann, F. A., & Hill, N. J. (2001). The psychometric function: I. Fitting, sampling, and goodness of fit. Perception & Psychophysics, 63(8), 1293–1313.
