Establishing Cross-Cultural Measurement Equivalence of Scales Associated with Face-Negotiation Theory: A Critical Issue in Cross-Cultural Comparisons

Courtney Vail Fletcher, Masato Nakazawa, Yea-Wen Chen, John G. Oetzel, Stella Ting-Toomey, Shau-Ju Chang & Qin Zhang

To extend face-negotiation theory (FNT) to romantic relationship contexts, we examined the patterns of responses to six survey instruments among five countries. We intend this initial paper to serve as a tutorial on testing measurement invariance, specifically by elaborating on: (a) how to establish measurement equivalence and (b) how to test measurement invariance using multigroup Confirmatory Factor Analysis. The results highlight the importance of establishing equivalency and routinely testing invariance in cross- and intercultural research. This is especially important given the often-extreme equivalency challenges faced when collecting data in non-Western cultures using scales developed in Western countries.

Keywords: Face-Negotiation Theory; Conflict Styles; Measurement Equivalency; Cross-Cultural Communication; Measurement Invariance

Courtney Vail Fletcher is at the University of Portland. Masato Nakazawa and Yea-Wen Chen are at Ohio University. John G. Oetzel is at University of Waikato. Stella Ting-Toomey is at California State University, Fullerton. Shau-Ju Chang is at National Taiwan Normal University. Qin Zhang is at Fairfield University. A sincere thank you to the participants in Taiwan, China, Ethiopia, Uganda, and the United States who participated in this study. We could not have collected our data without your help and support. To Masato and Yea-Wen, what a fabulously smart team you make. Thank you to John and Stella for your guidance throughout the project; your expertise is irreplaceable. Thank you to Shau-Ju and Qin for helping us collect these data across continents; true intercultural research does take a village. A huge thank you to the editors and reviewers at JIIC, especially Rona Halualani, for your thoughtful feedback. An earlier version of this paper was presented at the National Communication Association conference in New Orleans, November 2011. Correspondence to: Courtney Vail Fletcher, University of Portland, 5000 N. Willamette Blvd, Portland, OR 97214, USA. Email: fletcher@up.edu

Journal of International and Intercultural Communication Vol. 7, No. 2, May 2014, pp. 148–169

ISSN 1751-3057 (print)/ISSN 1751-3065 (online) © 2014 National Communication Association http://dx.doi.org/10.1080/17513057.2014.898364

Introduction

There is growing awareness of some critical methodological issues involved in conducting cross-cultural comparisons such as establishing equivalence to ensure construct comparability (e.g., Budruk, 2010; Byrne et al., 2009). However, communication scholars interested in explaining variability in communication across cultures, in general, have been relatively silent in such conversations. Among the scholars who do attend to methodological issues in cross-cultural communication research, Levine, Park, and Kim (2007) stated “straightforward cross-cultural comparison probably is meaningless” without establishing equivalence (p. 212). This paper bridges this gap by exemplifying and modeling the necessary practice of establishing measurement equivalence and testing measurement invariance when conducting theory-based cross-cultural studies of communication.

This essay is the first in an overarching study that tests and extends face-negotiation theory (FNT; Ting-Toomey, 2005) and its associated measures cross-culturally by examining how the theory guides a large-scale investigation in China, Taiwan, Uganda, Ethiopia, and the US. It also serves the larger purpose of providing a tutorial of equivalency considerations for communication scholars wishing to collect cross- and intercultural data. Our results suggest that caution and attention to detail should be exercised when establishing measurement equivalency between cultures and/or when drawing conclusions about data, especially since our measurement model found only weak (definitive or tentative) support in four of the six scales tested.

Our selection of cultures directly responds to Gudykunst’s (2002) call for examining non-Asian cultures and studying at least four cultures in cross-cultural research. Specifically, the two Asian and two African cultures examined in this study serve as collectivistic and communalistic counterparts to the comparison basis of the US individualistic culture. Distinct from collectivism, Awa (1988) found that communalism, “a vital part of the African culture ethos” (p. 136), focuses social life beyond concerns for families to solidarity with others, especially the less fortunate, in a community. Whereas the overarching project aims to extend FNT to romantic relationship contexts, the purpose of this specific paper is to establish measurement equivalence of the scales used in the primary project through a variety of rigorous pre- and postdata collection efforts.

Literature Review

To highlight the importance of this inquiry, it is necessary to frame and support the current study in the context of the existing literature regarding what is known about the relationships between face and face-negotiation practices and communication outcomes in different cultures, and then to consider the role of measurement equivalency and measurement invariance in conducting cross- and intercultural research. Therefore, the review of the literature addresses (a) FNT, (b) the additional new variables under investigation (i.e., power distance, relationship satisfaction, and forgiveness), (c) establishing equivalence, and (d) measurement invariance.

FNT

FNT (Ting-Toomey, 1988; Ting-Toomey & Kurogi, 1998) provides the necessary communication, culture, and conflict framework for the current study. FNT is used in a variety of cross-cultural studies, and the results of these studies have been highly relevant to the field of communication (Oetzel & Ting-Toomey, 2003).

FNT argues that face is the explanatory mechanism for conflict management styles across cultures. Face is the claimed sense of positive social worth (Ting-Toomey & Kurogi, 1998). Facework refers to the communicative strategies individuals use to enact self-face or to uphold, support, or challenge another person’s face (Ting-Toomey, 2005). In summary, FNT assumes that:

(a) people in all cultures try to maintain and negotiate face in all communication situations; (b) the concept of “face” is especially problematic in uncertain situations (e.g., conflict situations) when the situated identities of the communicators are called into question; (c) cultural variability, individual-level variables, and situational variables influence cultural members’ selection of face concerns over others, and (d) subsequently, cultural variability, individual-level variables, and situational variables influence the use of various facework and conflict strategies in intergroup and interpersonal encounters. (Oetzel et al., 2001, p. 238)

The current version of FNT (Ting-Toomey, 2005) has 24 propositions. Those propositions focus on comparisons of face concerns and conflict communication styles at the cultural level (1–12), individual level (13–22), and relational and situational level (23–24). The cultural-level propositions center on comparisons between members of individualistic cultures and members of collectivistic cultures regarding their selections or preferences of face concerns (e.g., self-face, other-face, and mutual-face) and conflict communication styles (e.g., dominating, avoiding, obliging, compromising, integrating, emotional expression, third-party help, and neglect). The individual-level propositions concentrate on comparisons between self-construals (e.g., independent self-construal, interdependent self-construal, biconstrual orientation, and ambivalent orientation) and conflict styles as well as face-concern types and conflict styles. The relational and situational-level propositions focus on comparisons of individualists (i.e., independent) and collectivists (i.e., interdependent) in terms of their face concerns and facework behaviors with in-group and out-group members in conflict situations.

Proposed New Theoretical and Outcomes Perspectives

In an effort to extend FNT and test a measurement model of its variables cross-culturally, it is necessary to conceptualize new ways in which conflict may occur in romantic relationships. More specifically, thus far, FNT primarily considers the role of cultures in explaining face concern, conflict style choices, self-construal, and outcomes. The overarching study suggests how power distance, relationship satisfaction, and forgiveness should be considered when theorizing how individuals may be affected by conflict. Below, an overview of power distance, relationship satisfaction, and forgiveness is offered.

Power Distance. Power distance is one of the cultural dimensions commonly examined by conflict researchers (Ohbuchi, Sato, & Tedeschi, 1999). Hofstede (1980) first introduced the concept of power distance to explain cultural differences in organizational conflict management. Power distance generally refers to the extent to which the less powerful members of institutions perceive and/or accept that power is distributed unequally (Hofstede, 1980) or the extent to which social behaviors are determined by status or rank (Triandis, 1994). While the relevance of power distance for interpersonal conflict management is undeniable, little is known about how power distance plays out in romantic conflicts cross-culturally.

Relationship satisfaction. Relationship satisfaction is “the degree to which an individual is content and satisfied with his or her relationship” (Anderson & Emmers-Sommer, 2006, p. 5). Substantial evidence suggests that relationship satisfaction is linked to the way individuals behave in romantic relationships, how people think about their romantic relationships, and the attributions that people make about a partner’s behavior (Ptacek & Dodge, 1995). There is significant support for the belief that individuals who resolve conflict are more satisfied with their romantic relationship. For example, Smith, Heaven, and Ciarrochi (2008) examined trait emotional intelligence (EI), conflict communication patterns, and relationship satisfaction in cohabiting heterosexual couples. Results indicated that the most satisfied couples were those who did not avoid discussion of relationship problems and who rated their partners high in EI.

Forgiveness. Fundamental to forgiveness is “an attitude of real goodwill toward the offender as a person” (Holmgren, 1993, p. 34). When considering how satisfied a couple may or may not be, it is necessary to consider the role of forgiveness following a conflict.

Forgiveness entails a positive or benevolent motivational state toward the transgressor that is not achieved simply by overcoming the avoidance goal set in motion by an unacceptable self-image or the negative motivational state that is caused by the transgression. (Fincham, Beach, & Davila, 2004, p. 73)

Forgiveness may therefore have substantial implications for long-term romantic relationship outcomes as well as short-term patterns of interaction.

Establishing Equivalency

Because this study examined perspectives of individuals in China, Taiwan, Uganda, Ethiopia, and the US in an effort to locate conflict and communication patterns, equivalency must be established both prior to and following the data collection (van de Vijver & Leung, 1997). Gudykunst (2005) suggested that at least five different equivalencies must be accounted for when conducting cross-cultural research: functional, conceptual, linguistic, metric, and sample. We briefly describe these equivalencies since they offer more of a design and conceptual understanding of social science research. That is, without adequately establishing these equivalencies during the initial conceptualizing and designing stages of a study, we could not statistically test another form of equivalence, namely measurement equivalence.

Functional equivalency is associated with the macro perspective of whether or not the concept or construct under investigation is similarly understood or received in each culture. Simply put, the same words and behaviors may serve different functions in different cultures (Cheung, van de Vijver, & Leong, 2011). For example, communication apprehension is often depicted as having a negative connotation in US culture, whereas in Japan reticence is seen as socially desirable (Gudykunst, 2005). Thus, the construct of reticence is not functionally equivalent between these two groups because US American and Japanese individuals may not receive it in the same way. Therefore, when establishing functional equivalence, researchers need to be careful to ascertain whether or not the construct functions similarly in the cultures being compared. In the current study, this equivalency concern was addressed by conducting intensive background research on each country and via definitional interviews with native representatives of each culture now living in the US, in a manner similar to that encouraged by Cheung and colleagues (2011).

Conceptual equivalency is associated with whether or not the construct has a similar cognitive meaning in the minds of the participants in each different culture under investigation. For example, the universality of the construct of face is debated, with some researchers, such as Brown and Levinson (1987), arguing that it is a universal concept (and therefore has the same inherent cognitive meaning in the minds of all people, despite perhaps different words being used to describe it), while Hofstede (1983) suggested that face is culturally specific. In this example, a derived etic measure must be developed. Additionally, it is important to establish a similar referent for a construct such as face. In the current study, this equivalency concern was addressed by having each participant recall a romantic partner conflict scenario. Specifically, each participant was asked to recall and briefly share the details of a specific situation during the last 6 months when he or she and their romantic partner “fought or had a disagreement.” The authors could then identify the emergent cognitive consistencies or equivalencies (or lack thereof) in how participants understood and described the conflicts, paying particular attention to the main constructs under investigation.

Linguistic equivalence is associated with whether the participant is completing the survey in his or her native language. If a respondent is not completing a survey or questionnaire in his/her first language, linguistic equivalence must be established. Two techniques ensure that linguistic equivalence can be established when data are being collected in two or more distinct cultures where different languages are spoken. The first technique is back-translation: This involves a bilingual person translating the questions from the survey’s original language into the second language and then a second bilingual translates that initially translated survey back into the original language of the survey. Back-translation was used to provide two versions of Chinese questionnaires for participants in China (simplified Chinese) and Taiwan (traditional Chinese). The second technique is decentering, or the removal of wordy language in a questionnaire. This was used to ensure quality translation and accessibility for multilingual participants in Uganda and Ethiopia.

Metric equivalence, also known as measurement equivalence, is associated with accounting for the differences in how people answer questions from different cultures. For example, research shows that Japanese respondents often do not use extreme score values (e.g., strongly agree, strongly disagree), while Hispanic respondents have been found to heavily favor extreme score values (i.e., extremity; Hui & Triandis, 1989). Van de Vijver and Leung (1997) offered three alternatives for why metric differences may exist: (a) the differences are in fact real and exist, (b) there is a qualitative (systematic) measurement error (related to linguistic and conceptual equivalency), or (c) there is a quantitative (random) measurement error. The current study addressed this equivalency issue by carefully cleaning (e.g., removing and imputing missing values; see the data cleaning and preliminary analysis section) and analyzing the quantitative data for distinctive patterns and/or uncharacteristic anomalies.

Sample equivalence is associated with ensuring that two sample populations under investigation are, in fact, similar. To ensure that similar samples are used, the demographics of participants should be closely examined, and more contextual data should be assessed. Additionally, it is necessary that the sample chosen be representative of the larger population so that constructs can be validly measured. The current study addressed this potential equivalency issue by collecting questionnaire data and interview data from similar demographics (i.e., university students in major metropolitan cities in each country were used) and by formulating several clarifying questions that allow the emergence of severe demographic differences that may skew data drastically or render them useless. Although a student population introduced the potential for homogeneity in the results, it was nonetheless considered to be an acceptable procedure given the breadth of theory supporting how each sample was still largely representative of the larger population’s cultural values.

Measurement Invariance

Measurement equivalency can be explored in several different ways, and generally it is not used rigorously in cross- and intercultural communication research, though tests of equivalency and measurement invariance are widely used in psychology (for further discussion, see Bracken & Barona, 1991, and Poortinga, 1995). Though the larger intent of this project is to test and extend FNT, the data collected lead us first to address the largely neglected role of measurement invariance in communication research. More specifically, in addition to establishing functional, conceptual, linguistic, metric, and sample equivalencies, comparisons between and among cultures, contexts, or time also require measurement invariance.

Measurement invariance exists when the parameters (e.g., item intercepts, factor loadings) of the model are the same across the sample population (Borsboom, 2006). Measurement invariance is the similarity of measures of constructs in different contexts, and instruments/scales can report different levels of similarity or difference between and among multiple populations. Only when measurement invariance is established can researchers reliably draw valid conclusions about substantial differences or relationships between cultures or constructs. Without evidence of measurement invariance, differences may also stem from measurement nuances or other unknown factors that may cause the differences observed in the data. Comparisons may be based upon manifest (i.e., directly observed) or latent (i.e., not directly observed) variables. Most communication constructs (e.g., attitudes, beliefs) represent latent constructs that cannot be directly measured and therefore are generally assessed using a measurement model.

The most common types of measurement invariance are: (a) configural (i.e., do the groups have the same factor structure?), (b) metric, or weak factorial invariance (i.e., do the groups have the same factor loadings?), (c) scalar, or strong factorial invariance (i.e., do the groups have the same item intercepts?), and (d) residual, or strict factorial invariance (i.e., do the groups have the same item residual variances?) (Vandenberg & Lance, 2000).
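For readers who prefer a formal statement, the four levels can be written as nested equality constraints on a standard multigroup factor model. The notation below is a conventional sketch and is an assumption of this illustration rather than the authors' own: $\mathbf{x}^{(g)}$ denotes the item responses in group $g$, $\boldsymbol{\tau}^{(g)}$ the item intercepts, $\boldsymbol{\Lambda}^{(g)}$ the factor loadings, $\boldsymbol{\xi}^{(g)}$ the latent factors, and $\boldsymbol{\Theta}^{(g)}$ the residual covariance matrix.

```latex
\begin{align*}
\mathbf{x}^{(g)} &= \boldsymbol{\tau}^{(g)} + \boldsymbol{\Lambda}^{(g)}\boldsymbol{\xi}^{(g)} + \boldsymbol{\delta}^{(g)},
  \qquad \operatorname{Cov}\bigl(\boldsymbol{\delta}^{(g)}\bigr) = \boldsymbol{\Theta}^{(g)},
  \qquad g = 1, \dots, G \\
\text{Configural:}\quad & \boldsymbol{\Lambda}^{(g)} \text{ has the same pattern of free and fixed loadings in every group} \\
\text{Metric (weak):}\quad & \boldsymbol{\Lambda}^{(1)} = \cdots = \boldsymbol{\Lambda}^{(G)} \\
\text{Scalar (strong):}\quad & \text{metric constraints, and } \boldsymbol{\tau}^{(1)} = \cdots = \boldsymbol{\tau}^{(G)} \\
\text{Strict (residual):}\quad & \text{scalar constraints, and } \boldsymbol{\Theta}^{(1)} = \cdots = \boldsymbol{\Theta}^{(G)}
\end{align*}
```

Written this way, each level adds constraints to the one before it, which is why the levels are typically tested in sequence from configural to strict.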

The authors hope that this article serves as a tutorial of equivalency considerations for communication scholars wishing to collect cross- and intercultural data. At the same time, we hope to answer the following research questions: (1) Among the four types of measurement invariance, which is the most common? and (2) How often do the more stringent types of invariance (i.e., strict and strong) hold?

Methods

Research Sites

The research sites for this study were China, Taiwan, Uganda, Ethiopia, and the US. The US (individualistic) served as the basis for cross-cultural comparisons, since this was the cultural context where most, if not all, of the scales used were initially developed. Ethiopia and Uganda were chosen to redress the currently limited understanding of communication processes situated in African (and communalistic) cultures (Gudykunst, 2005), whereas China and Taiwan were selected as the Asian (collectivistic) counterparts. These choices allow us to establish equivalency in multiple types of collectivistic and communalistic cultures, while comparing the results to the individualistic culture where the scales originated and have been more routinely tested, thus providing a potential baseline for comparison.

Participants

To maintain confidentiality, participation in the questionnaire portion of the study was anonymous. Participation was voluntary, and no extra credit was awarded to participants. Participants in China (N = 183) were recruited in a variety of English classes at a large university in central Mainland China. Participants in Taiwan (N = 222) were recruited in a variety of English classes at a large university in northern Taiwan. Participants in Uganda (N = 231) and Ethiopia (N = 156) were recruited in a variety of social science courses at large universities in the capital cities of both countries. Participants in the US (N = 229) were recruited in a variety of communication classes at a large university in the Southwest. Table 1 summarizes the demographic characteristics. Despite our efforts in recruiting equivalent samples from the five countries, they differed significantly in percentage male (p < 0.01), education levels (p < 0.01), and age (p < 0.01). Therefore, these demographic variables were included in the measurement models as covariates to control their potential influence on the outcome variables.
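As an illustration of how such a demographic difference can be checked, the sketch below applies a Pearson chi-square test of independence to the gender counts reported in Table 1. The source does not state which test produced the reported p-values, so the choice of test here is an assumption, and the function names are hypothetical. The tail probability uses a closed form that holds only for even degrees of freedom, which is sufficient for this 2 × 5 table.

```python
import math


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom.

    table: list of rows of observed counts (here, gender by country).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1, built up term by term.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Gender counts from Table 1 (columns: China, Taiwan, Uganda, Ethiopia, US).
gender_counts = [
    [37, 76, 144, 113, 77],    # male
    [145, 144, 87, 40, 150],   # female
]

stat, df = chi_square_independence(gender_counts)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p < 0.01: {p_value < 0.01}")
```

Education level could be examined with the same function; age, being continuous, would call for a different test such as a one-way analysis of variance.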

Survey Instrument and Procedures

The questionnaire consisted of a cover page that explained the students’ rights and carried the Institutional Review Board’s stamp of approval for the study. Following the cover page were seven pages consisting of six measures, demographic information (e.g., sex, age …) and several author-developed questions. The survey was available in English, traditional Chinese, and simplified Chinese. Prior to administering the survey questionnaire in China and Taiwan, the third author translated the survey from English to Chinese, and the sixth author back-translated the survey from Chinese to English to ensure semantic equivalence. Later, the seventh author adapted the translated survey into simplified Chinese characters with minor semantic adaptations as needed. The questionnaire data in Uganda and Ethiopia were collected at universities and colleges where English was the primary language of instruction to ensure that participants had a working knowledge of the language. University students in Uganda were reported to have been using English as their primary instructional (i.e., school) language since elementary school, while Ethiopian participants were reported to have been using English as their primary instructional language since middle school. In both Ugandan and Ethiopian cultures, it can be challenging to collect data in a specific local language, since each has many national languages, which often leads individuals to communicate in English with one another.

Table 1 Descriptive Statistics of Demographic Variables

                      China          Taiwan         Uganda         Ethiopia       US
Characteristics       n      %       n      %       n      %       n      %       n      %
Gender
  Male                37     20.3    76     34.5    144    62.3    113    73.9    77     33.9
  Female              145    79.7    144    65.5    87     37.7    40     26.1    150    66.1
Education
  High School         0      0.0     0      0.0     20     8.8     1      0.7     1      0.4
  Some College        27     15.5    181    82.3    64     28.2    105    68.6    221    98.2
  College Graduate    82     47.1    2      0.9     71     31.3    31     20.3    2      0.9
  Graduate School     64     36.8    37     16.8    49     21.6    8      5.2     1      0.4
  Other               1      0.6     0      0.0     23     10.1    8      5.2     0      0.0
Age (mean ± SD)       26.37 ± 5.10   21.63 ± 2.82   24.10 ± 3.97   21.25 ± 2.34   22.60 ± 5.69

In sections 1, 2, and 3 of the questionnaire, six scales were used to assess communication patterns related to face concerns, conflict styles, self-construal, power distance, relationship satisfaction, and forgiveness: (a) Face Concern Scale (Ting-Toomey, Oetzel, & Yee-Jung, 2001), (b) Conflict Styles Scale (Ting-Toomey et al., 1991), (c) the Self-Construal Scale (Gudykunst et al., 1996), (d) the Power Distance Scale (Earley & Erez, 1997), (e) the Relationship Satisfaction Scale (Hendrick, 1988), and (f) the Marital Forgiveness Scale (Fincham, Beach, & Davila, 2004). The final section of the questionnaire asked 12 clarifying and demographic questions designed by the authors.

Face concern was measured using a revised version of the 34-item scale. The scale used for this data collection included 15 items designed to assess the respondents’ face concern in conflict. More specifically, the items assessed whether the participant was primarily concerned with saving his/her own face, the other’s face, and/or equally concerned with saving both of their faces (i.e., mutual) in a recalled conflict situation during the past 6 months when the participant and their partner fought or had a disagreement. Answers were scored on a 5-point Likert-type scale, and sample questions include: “I was concerned with respectful treatment for both of us [mutual]” and “I was concerned with not bringing shame to myself [self].” Prior studies reported an internal reliability of .90 for other-face, .80 for mutual-face, and .85 for self-face (Oetzel et al., 2001). Because of the way the three factors loaded during a preliminary Confirmatory Factor Analysis (CFA), mutual-face concern was excluded from the primary analysis of face concerns in the current study. The mean Cronbach α was 0.77 (range = 0.57–0.87) for the self-concern subscale and 0.68 (0.40–0.81) for the other-concern subscale.

Conflict styles were measured using a 32-item scale. The scale is designed to assess individuals’ conflict style preference when engaged in the recalled conflict with their romantic partners. The specific styles assessed included: avoiding, integrating, dominating, third-party help, emotional expression, passive aggression, obliging, and compromising. Respondents answered on a 5-point Likert-type scale with responses ranging from strongly agree (5) to strongly disagree (1). Thus, higher scores are indicative of preference for certain conflict styles. Sample items include: “I used my authority to make a decision in my favor [dominate]” and “I said nothing and waited for things to get better [avoid].” Prior research has examined the relationship between conflict styles and specific outcomes, such as face concerns, in a variety of contexts (Oetzel & Ting-Toomey, 2003). The reliability of the conflict styles in other studies has ranged from .73 to .88 (Oetzel & Ting-Toomey, 2003). A simulation study indicated that an eight-factor CFA model may be too complex to test measurement invariance (Cheung & Rensvold, 2002); therefore, similar to past studies (e.g., Oetzel et al., 2000) the eight conflict styles were collapsed into three main overarching conflict styles (i.e., avoiding, dominating, and third-party help) that were then used in the primary analysis of the current study. The mean Cronbach α was 0.62 (0.52–0.78) for the avoiding subscale, 0.47 (0.33–0.55) for the dominating subscale, and 0.74 (0.63–0.82) for the third-party help subscale.

Self-construal was measured with 10 items from a previously validated 29-item instrument of self-construal (Gudykunst et al., 1996). Five items measured independent self-construal, and five items measured interdependent self-construal. The 10 items were adapted to reflect the recalled conflict situation scenario with a romantic partner. For example, the word “partner” was added where specific context was needed (“I respect the majority of my partner’s wishes”). The validity of the self-construal scales is based on findings that the independent items correlate with individualistic cultural values, whereas the interdependent items correlate with collectivistic cultural values. The items are statements measured using a 5-point Likert-type scale with responses ranging from strongly agree (5) to strongly disagree (1). The mean Cronbach α was 0.60 (0.48–0.72) for the interdependent subscale and 0.57 (0.38–0.76) for the independent subscale.

Power distance was measured using a nine-item self-report instrument in which respondents answer on a 5-point Likert-type scale with responses ranging from strongly agree (5) to strongly disagree (1). Sample items included: “In most situations, people in authority should make decisions without consulting their subordinates” and “Once a person in authority makes a decision, people under him/her should not question it.” Prior studies have reported high reliability for the scale (e.g., Cronbach alpha = .88; Lam, Schaubroeck, & Aryee, 2002). The mean Cronbach α was 0.68 (0.59–0.77).

The Relationship Satisfaction Scale was used to measure the extent to which one is satisfied with one’s relationship with one’s romantic partner. It is a seven-item self-report instrument in which respondents answer on a 5-point Likert-type scale with responses ranging from strongly agree (5) to strongly disagree (1). Five elements of relationship satisfaction are assessed: acceptance, understanding, appreciation, other’s friends, and social life. Higher scores indicate being more satisfied with the relationship. Sample items included: “In general, how satisfied are you with your relationship?” and “To what extent has your relationship met your original expectations?” Prior research has reported high reliability for this scale (Pike & Sillars, 1985). The mean Cronbach α was 0.85 (0.69–0.92).

The Marital Forgiveness Scale is a nine-item scale measuring forgiveness. Forgiveness is viewed as an essential factor in healing and restoring relationships between people (Hargrave, 1994). Based on the recalled conflict, respondents answered on a 5-point Likert-type scale with responses ranging from strongly agree (5) to strongly disagree (1). Thus, higher scores are indicative of preference for forgiving others following a conflict situation. The scale measures three separate approaches towards forgiveness: benevolence, avoidance, and retaliation. Sample items include: “I soon forgave my partner [benevolence]” and “I found a way to make him/her regret it [retaliation].” Past research has found that forgiveness is important for marital conflict and spousal goals (Fincham, Beach, & Davila, 2004). Prior research has indicated the following internal reliabilities for each dimension: Benevolence = .86 and .85, Avoidance = .76 and .80, and Retaliation = .79 and .77 (Fincham, Beach, & Davila, 2004). In our study, the mean Cronbach α was 0.66 (0.37–0.77) for the benevolence subscale, 0.73 (0.59–0.85) for the avoidance subscale, and 0.63 (0.55–0.86) for the retaliation subscale.

Data collected with a few of the subscales showed unacceptably low Cronbach α values (e.g., avoiding in self-construal). Low α values typically indicate low item intercorrelations, and hence a lack of internal consistency, which may in turn contribute to a lack of invariance. However, it needs to be pointed out that low α values do not necessarily indicate either a poor fit of CFA models (Miles & Shevlin, 2007) or a lack of invariance. Furthermore, Cronbach’s α values are lower bounds to reliability and can substantially underestimate true values of reliability (Sijtsma, 2009). These points highlight the importance of testing measurement invariance regardless of Cronbach’s α values.
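For reference, Cronbach’s α for a subscale of k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The following is a minimal sketch of that formula in Python, not the authors’ code (the analyses reported here were run in R and Mplus); the function name and the toy responses are hypothetical, and complete cases are assumed.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete-case data.

    rows: list of respondents, each a list of k item scores.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy 5-point responses for a hypothetical two-item subscale.
toy_responses = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(toy_responses)
print(round(alpha, 2))
```

Because the formula depends on the ratio of item variances to total-score variance, α falls when items correlate weakly with one another, which is the pattern the low subscale values above would suggest.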

Model Specifications and Statistical Analyses

Analyses were performed with R (version 2.12) and Mplus (version 6; Muthen & Muthen, 1998–2010).

Data cleaning and preliminary analysis. Before performing CFA, we followed the procedures below. First, invalid responses within each item (i.e., missing, 0, values greater than 5, or multiple responses) were identified. Then, participants with invalid responses in more than 10% of the items (13) were identified and excluded (n = 18). Using the remaining 1003 valid cases, descriptive statistics and Cronbach’s α for each scale and subscale were computed.
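The screening rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: the function names, the participant IDs, and the ten-item responses are hypothetical, and recoding the remaining invalid responses as missing is our assumption about how such values were passed on to imputation.

```python
MAX_INVALID_SHARE = 0.10  # exclusion threshold described in the text

def is_valid(response):
    """A response is valid only if it is a single integer from 1 to 5."""
    return type(response) is int and 1 <= response <= 5

def screen_participants(data):
    """Split participants into kept (with invalid responses set to None) and excluded."""
    kept, excluded = [], []
    for pid, responses in data.items():
        n_invalid = sum(not is_valid(r) for r in responses)
        if n_invalid / len(responses) > MAX_INVALID_SHARE:
            excluded.append(pid)
        else:
            # Assumption: remaining invalid responses are treated as missing.
            kept.append((pid, [r if is_valid(r) else None for r in responses]))
    return kept, excluded

# Hypothetical ten-item responses (invented for illustration).
data = {
    "p1": [5, 4, 3, 4, 5, 2, 1, 3, 4, 5],          # all valid
    "p2": [5, 0, 3, 4, 5, 2, 1, 3, 4, 5],          # one invalid (10%): kept
    "p3": [0, 7, 3, 4, 5, 2, 1, 3, 4, 5],          # two invalid (20%): excluded
    "p4": [(2, 3), None, 3, 4, 5, 2, 1, 3, 4, 5],  # multiple response + missing: excluded
}

kept, excluded = screen_participants(data)
```

Here `p2` is retained because exactly 10% invalid responses does not exceed the threshold, and its invalid `0` is recoded as missing.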

Even after removing these invalid cases, many participants (485/1003) still had at least one missing observation. To retain as many participants as possible in the analysis, we imputed missing observations using the MICE R package for multiple imputation. This algorithm takes advantage of all available information in a dataset and imputes multiple possible values; it is known to be the least biased estimator of missing observations under many circumstances (Schafer & Graham, 2002). Fifty values were imputed, and because many models did not successfully converge with multiple-imputed observations, presumably because of model complexity, the mean of the 50 observations was used as the final imputation to simplify the model.
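The final averaging step can be illustrated with a short sketch. It does not reproduce the imputation algorithm itself; it only shows how several imputed draws for one missing cell are collapsed into a single value. The function name, the data layout, and the numbers are hypothetical.

```python
from statistics import mean

def pool_imputations(observed, imputed_draws):
    """Fill each missing cell (None) with the mean of its imputed draws.

    observed:      list of item scores, with None marking a missing value
    imputed_draws: dict mapping the index of each missing cell to its draws
    """
    return [
        mean(imputed_draws[i]) if value is None else value
        for i, value in enumerate(observed)
    ]

# Hypothetical responses with two missing cells and four draws each
# (the study used fifty draws per missing observation).
observed = [4, None, 5, None]
draws = {1: [3, 4, 4, 5], 3: [2, 2, 3, 3]}

completed = pool_imputations(observed, draws)
```

Averaging produces one completed data set, so averaged values may be non-integers and the variability between imputations is no longer carried into later analyses.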

Model fit and testing measurement invariance. To test the empirical validity and invariance across groups, the scales were submitted to CFA using Mplus. Because we were interested in measuring latent factors to ultimately compare their means and relationships across preexisting cultural groups, we needed to test whether these groups similarly responded to items of the measuring instruments (i.e., measurement invariance; Byrne, 2008; van de Vijver & Leung, 1997; van de Vijver & Matsumoto, 2011). While there are different approaches to test measurement invariance (Brown, 2006), we chose the multiple-groups CFA approach because it allows us to examine various aspects of invariance across groups. In addition, even though five-group CFA models can be very complex, the total valid sample size of around 1,000 would allow us to estimate parameters with sufficient precision.

To test measurement invariance, we followed the procedure recommended by Marsh and colleagues (2010). Their procedure takes several steps: configural invariance, weak measurement invariance, strong measurement invariance, and strict measurement invariance. In testing configural invariance, all parameters were freely estimated, and the model was evaluated within each group. Only overall patterns of the parameters were compared across groups. Model fit was evaluated using the following fit indices: (a) the ratio of χ2 to degrees of freedom (χ2/df); (b) the comparative fit index (CFI); (c) the Tucker–Lewis index (TLI); (d) the root mean square error of approximation (RMSEA); and (e) the standardized root mean square residual (SRMR). An acceptable model fit was defined as meeting at least four of the following criteria (Kline, 2005; Ledbetter, 2010): χ2/df < 3.0, CFI > .90, TLI > .90, RMSEA < .08, and SRMR < .10. The χ2 test itself was not used to judge model fit because it is sensitive to sample size and often mistakenly rejects a model (Kline, 2005).
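The "at least four of five" rule can be written as a small check. This is a sketch, not the authors' code: the function names are hypothetical, and treating values that sit exactly on a cutoff (e.g., RMSEA = .08) as failing is an assumption about how boundary cases are handled. The two example calls use the Relational Satisfaction values reported in the Results for Model 1 and Model 4.

```python
def criteria_met(chi2, df, cfi, tli, rmsea, srmr):
    """Return the names of the five benchmarks that a model satisfies."""
    checks = {
        "chi2/df < 3.0": chi2 / df < 3.0,
        "CFI > .90": cfi > 0.90,
        "TLI > .90": tli > 0.90,
        "RMSEA < .08": rmsea < 0.08,
        "SRMR < .10": srmr < 0.10,
    }
    return [name for name, ok in checks.items() if ok]

def acceptable_fit(**indices):
    """Acceptable fit is defined as meeting at least four of the five benchmarks."""
    return len(criteria_met(**indices)) >= 4

# Relational Satisfaction, Model 1 (configural), values as reported in the Results.
model_1 = dict(chi2=110.3, df=85, cfi=0.99, tli=0.98, rmsea=0.04, srmr=0.03)
# Relational Satisfaction, Model 4 (strict), values as reported in the Results.
model_4 = dict(chi2=470.5, df=141, cfi=0.83, tli=0.85, rmsea=0.11, srmr=0.14)

fit_1 = acceptable_fit(**model_1)
fit_4 = acceptable_fit(**model_4)
```

Under these assumptions Model 1 meets all five benchmarks and Model 4 meets none.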

To control for potential confounding variables, age, sex, and education were considered as covariates. Using a series of likelihood-ratio (LR) tests, we identified and included the covariate(s) that gave the best model fit while testing configural invariance (West, Welch, & Galecki, 2007). Afterwards, the same covariate(s) were included in the models used to test the more stringent levels of invariance.

Following the test of configural invariance, we tested the successive levels of measurement invariance in the following order. First, we tested weak measurement invariance by holding factor loadings invariant over the five groups. Some claim that this level of invariance is sufficient to compare group latent means, but not means of summed scores (Steinmetz, 2013). Second, we tested strong measurement invariance by additionally holding item intercepts invariant over groups. The test of strong measurement invariance is particularly important. According to Marsh and colleagues (2010), because researchers are typically interested in examining group differences in the latent construct, they compare latent factor means, typically assuming that the factor means are a valid measurement of the construct. Nevertheless, contrary to this assumption, group differences in the latent factor means are not a valid measure of group differences in the latent construct unless factor loadings and item intercepts are invariant (i.e., strong measurement invariance holds; Marsh et al., 2010). Third, strict measurement invariance was tested by additionally holding item residual variances constant across groups. This level of invariance is necessary when researchers wish to compare item scores (or factor scores) across groups as an index of the latent construct (Marsh et al., 2010), even though Steinmetz’s (2013) study found that strong invariance was sufficient for summed scores to accurately reflect the group differences in the latent means.
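The nested constraints described above can be summarized in conventional multigroup CFA notation. The symbols below are assumptions introduced for illustration and do not appear in the original text: x is the vector of item responses in group g, τ the item intercepts, Λ the factor loadings, ξ the latent factors with mean κ, and Θ the covariance matrix of the residuals δ.

```latex
\begin{align*}
% Measurement model in group g (notation assumed for illustration)
\mathbf{x}^{(g)} &= \boldsymbol{\tau}^{(g)} + \boldsymbol{\Lambda}^{(g)}\boldsymbol{\xi}^{(g)} + \boldsymbol{\delta}^{(g)},
  \qquad \operatorname{Cov}\bigl(\boldsymbol{\delta}^{(g)}\bigr) = \boldsymbol{\Theta}^{(g)} \\[4pt]
% Nested equality constraints across groups
\text{Configural:}\quad & \text{same pattern of free and fixed loadings in every group} \\
\text{Weak:}\quad & \boldsymbol{\Lambda}^{(g)} = \boldsymbol{\Lambda} \\
\text{Strong:}\quad & \boldsymbol{\Lambda}^{(g)} = \boldsymbol{\Lambda},\;
  \boldsymbol{\tau}^{(g)} = \boldsymbol{\tau} \\
\text{Strict:}\quad & \boldsymbol{\Lambda}^{(g)} = \boldsymbol{\Lambda},\;
  \boldsymbol{\tau}^{(g)} = \boldsymbol{\tau},\;
  \boldsymbol{\Theta}^{(g)} = \boldsymbol{\Theta} \\[4pt]
% Implied item means under strong invariance
E\bigl[\mathbf{x}^{(g)}\bigr] &= \boldsymbol{\tau} + \boldsymbol{\Lambda}\boldsymbol{\kappa}^{(g)}
\end{align*}
```

Under strong invariance the last line shows why latent means become comparable: with τ and Λ equal across groups, group differences in the expected item responses can arise only from differences in the latent means κ.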

Statistical procedures for invariance tests. When testing the invariance of an instrument, researchers typically perform a χ2-difference test between a stricter and a less strict model (e.g., the weak-invariance model and the configural model); invariance across groups is supported when the χ2-difference test is not significant. That is, holding parameters constant across groups did not result in a significant reduction in model fit; therefore, parameter values must be similar across groups (Brown, 2006; van de Vijver & Matsumoto, 2011). Nevertheless, even a slight reduction in model fit can result in a significant χ2-difference test, especially with a large sample size. Therefore, methodologists have devised an alternative approach to the statistical test of invariance, which compares CFI and RMSEA between the models (Chen, 2007). According to this approach, invariance is supported when (1) the decrease in CFI is less than 0.01 and (2) the increase in RMSEA is less than 0.015. Thus, invariance was considered definite when both conditions were met, and tentative when only one of them was met. When configural or measurement invariance was not supported, we were faced with two options: stop there and declare the noninvariance of the measurement scale, or modify parameters post hoc so that measurement invariance could still hold. Such modifications of the model are generally discouraged, especially at the hypothesis-testing phase (Marsh & Hau, 2007). Nevertheless, because the purpose of this study included exploration and hypothesis generation as well as hypothesis testing, we modified our model when Modification Indices (MI) suggested that invariance could be supported by modifying one or two parameters. When MI suggested a particular modification in only one or two groups, we tested less strict partial invariance (Byrne, Shavelson, & Muthen, 1989; Marsh et al., 2010) by allowing some parameters to vary freely in those particular groups. When MI suggested the same modification across the majority of the groups (i.e., three or more), we applied the same modification to all five groups for the sake of congruence.
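The decision rule based on changes in CFI and RMSEA can be sketched as a short function. The function and argument names are hypothetical; the cutoffs are the ones stated above, and the example inputs are the ΔCFI and ΔRMSEA values reported for Relational Satisfaction in the Results.

```python
def invariance_support(cfi_decrease, rmsea_increase,
                       cfi_cutoff=0.01, rmsea_cutoff=0.015):
    """Classify support for invariance from the change in fit between nested models.

    cfi_decrease:   CFI of the less strict model minus CFI of the stricter model
    rmsea_increase: RMSEA of the stricter model minus RMSEA of the less strict model
    """
    conditions_met = (cfi_decrease < cfi_cutoff) + (rmsea_increase < rmsea_cutoff)
    return {2: "definite", 1: "tentative", 0: "not supported"}[conditions_met]

# Relational Satisfaction, using the changes reported in the Results.
weak = invariance_support(0.007, 0.005)    # Model 2 vs. Model 1
strong = invariance_support(0.017, 0.011)  # Model 3 vs. Model 2
strict = invariance_support(0.13, 0.05)    # Model 4 vs. Model 3
```

The function takes the changes directly rather than the two models' rounded fit indices, because differences computed from rounded values can land on the wrong side of a cutoff.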

Results

Because some items showed evidence of extreme skewness (∣skewness∣ > 2.0) and kurtosis (∣kurtosis∣ > 3.0), we used Mardia’s test of multivariate normality, which indicated a significant violation of multivariate normality (p < .001). Therefore, to obtain accurate parameter estimates under multivariate non-normality, we used the robust maximum-likelihood estimator based on the Satorra–Bentler scaled χ2 (SBχ2; Satorra & Bentler, 1994).

Face Concerns

We first evaluated the configural-invariance model (Model 1) with a two-factor solution without any covariates. An exploratory factor analysis (EFA) showed that mutual-face and other-face concern items collapsed, so we tested two factors: self-face and other-face (see Oetzel & Ting-Toomey, 2003). Only three of the five fit indices were above the recommended benchmarks: SBχ2 (255, N = 1003) = 503.8, p < .01, SBχ2/df = 2.0, CFI = .88, TLI = .85, RMSEA = .07, SRMR = .08. Nevertheless, MI suggested that the residual variances of FC5 and FC7 be correlated in all groups but Ethiopia. FC5 was a question about the pride of one’s partner, and FC7 about the dignity of one’s partner; we reasoned that these items may have had too great a conceptual overlap (i.e., if they care about the pride of their partners, they would necessarily care about the dignity of their partners). Therefore, we allowed the residual variances of FC5 and FC7 to be correlated. The resulting Model 1b had a markedly improved model fit: SBχ2 (210, N = 1003) = 417.9, p < .01, SBχ2/df = 2.0, CFI = .91, TLI = .88, RMSEA = .07, SRMR = .06.

Next, we tested the weak measurement invariance by holding factor loadings constant across groups (Model 2b). This model had a similar fit to that of Model 1b, SBχ2 (246, N = 1003) = 469.5, p < .01, SBχ2/df = 1.9, CFI = .91, TLI = .89, RMSEA = .07, SRMR = .08; CFI improved by 0.007 and RMSEA improved by 0.003. Thus, weak measurement invariance was definitely supported. In testing strong measurement invariance, item intercepts were held constant across groups (Model 3b). Model 3b had a considerably worse fit than Model 2b, SBχ2 (290, N = 1003) = 868.7, p < .01, SBχ2/df = 3.0, CFI = .75, TLI = .77, RMSEA = .10, SRMR = .13, ΔCFI = 0.150, ΔRMSEA = 0.030. No modification indicated parsimonious improvement. Thus, strong measurement invariance was not supported.

Conflict Style

We first evaluated the configural invariance by fitting a model (Model 1) with a three-factor solution without any covariates. Results of an EFA showed eight conflict styles loading onto three overarching factors: dominating (emotional expressiveness, passive aggressiveness), integrating (third-party help), and avoiding (compromising and obliging). Oetzel et al. (2000) also used the collapsed three-factor conflict style model. Only three of the five fit indices were above the recommended benchmarks: SBχ2 (255, N = 1003) = 503.8, p < .01, SBχ2/df = 2.0, CFI = .88, TLI = .86, RMSEA = .07, SRMR = .08. MI suggested that the residual variances of CS6 and CS14 be correlated in all groups but Ethiopia. Because CS6 was a question about using power in decision making, and CS14 about using authority for one’s competition, we reasoned that these items may have been too highly correlated (i.e., one who uses power for their advantage would surely take advantage of their authority). Therefore, we allowed the residual variances of CS6 and CS14 to be correlated. The resulting Model 1b had an acceptable model fit: SBχ2 (250, N = 1003) = 455.6, p < .01, SBχ2/df = 1.8, CFI = .90, TLI = .87, RMSEA = .06, SRMR = .08.

We subsequently tested the weak measurement invariance (Model 2b). This model had a worse fit than Model 1b, SBχ2 (286, N = 1003) = 544.3, p < .01, SBχ2/df = 1.9, CFI = .88, TLI = .86, RMSEA = .07, SRMR = .08. ΔCFI was 0.025, greater than the criterion value of 0.01, but ΔRMSEA was 0.003, less than the criterion value of 0.015. Thus, weak measurement invariance was tentatively supported. In testing strong measurement invariance, Model 3b had a considerably worse fit than Model 2b, SBχ2 (334, N = 1003) = 792.5, p < .01, SBχ2/df = 2.4, CFI = .78, TLI = .78, RMSEA = .08, SRMR = .12, ΔCFI = 0.096, ΔRMSEA = 0.040. No modification indicated parsimonious improvement. Thus, strong measurement invariance was not supported.

Self-Construal

We first evaluated the configural invariance by fitting a model (Model 1) with a two-factor solution with no covariates. There were multivariate outliers among Taiwanese that prevented the model from converging; therefore, we identified the outliers using the Mahalanobis-Distance test (n = 17) and excluded them from further analysis. Only three of the five fit indices were above the recommended benchmarks: SBχ2 (170, N = 986) = 325.6, p < .01, SBχ2/df = 1.9, CFI = .83, TLI = .78, RMSEA = .08, SRMR = .07. In addition, MI suggested that, to achieve configural invariance, many of the model’s parameters should be freed. Thus, we concluded that configural invariance was not supported.

Power Distance

We first evaluated the configural invariance by fitting a model (Model 1) that included education as a covariate with a one-factor solution. Four of the five fit indices were above the recommended benchmarks: SBχ2 (70, N = 1003) = 131.7, p < .01, SBχ2/df = 1.9, CFI = .93, TLI = .90, RMSEA = .07, SRMR = .05. Nevertheless, MI suggested that the residual variances of PD1 and PD2 be correlated in the majority of the groups (Ethiopia, China, and Taiwan). Because PD1 was a question about how ineffective or effective participants’ conflict style was, and PD2 about how unsuccessful or successful participants’ conflict style was, we reasoned that these items may have had too great a conceptual overlap (i.e., successful conflict style tends to be effective, and vice versa). Therefore, we allowed the residual variances of PD1 and PD2 to be correlated. The resulting Model 1b had a markedly improved model fit: SBχ2 (65, N = 1003) = 92.2, p = .02, SBχ2/df = 1.4, CFI = .97, TLI = .95, RMSEA = .05, SRMR = .04.

Next, we tested the weak measurement invariance by holding factor loadings constant across groups, while allowing the correlated residuals between PD1 and PD2 (Model 2b). This model had a considerably worse fit than Model 1b, SBχ2 (80, N = 1003) = 183.1, p < .01, SBχ2/df = 2.3, CFI = .89, TLI = .87, RMSEA = .08, SRMR = .09; ΔCFI was 0.08, greater than the criterion value of 0.01, and ΔRMSEA was 0.030, greater than the criterion value of 0.015. Thus, weak measurement invariance was not supported. MI did not suggest parsimonious ways to improve the model fit.

Relational Satisfaction

We first evaluated the configural invariance by fitting a model that included age, sex, and education (Model 1) with a one-factor solution. All five fit indices were above the recommended benchmarks: SBχ2 (85, N = 1003) = 110.3, p = .03, SBχ2/df = 1.3, CFI = .99, TLI = .98, RMSEA = .04, SRMR = .03. Next, we tested the weak measurement invariance by holding factor loadings constant across groups (Model 2). While this model had a slightly worse fit than Model 1, SBχ2 (101, N = 1003) = 139.3, p < .01, SBχ2/df = 1.4, CFI = .98, TLI = .98, RMSEA = .04, SRMR = .06, the difference in CFI (ΔCFI) was 0.007, less than the criterion value of 0.01, and the difference in RMSEA (ΔRMSEA) was 0.005, again less than the criterion value of 0.015. Thus, weak measurement invariance was definitely supported. In testing strong measurement invariance, item intercepts were held constant across groups (Model 3). Again, Model 3 had a slightly worse fit than Model 2, SBχ2 (121, N = 1003) = 192.5, p < .01, SBχ2/df = 1.6, CFI = .96, TLI = .96, RMSEA = .05, SRMR = .07. ΔCFI was 0.017, slightly greater than the criterion value of 0.01, but ΔRMSEA was 0.011, less than the criterion value of 0.015. Thus, strong measurement invariance was tentatively supported. In testing strict measurement invariance, item residual variances were held constant across groups (Model 4). Model 4 had a markedly worse fit than Model 3, SBχ2 (141, N = 1003) = 470.5, p < .01, SBχ2/df = 3.3, CFI = .83, TLI = .85, RMSEA = .11, SRMR = .14. ΔCFI was 0.13, and ΔRMSEA was 0.05, both far greater than the accepted values. In addition, MI suggested diffused sources of model-fit reductions; that is, modifying one or two parameters would not lead to a marked improvement in model fit. Thus, strict measurement invariance was not supported.

Forgiveness Scale

We first evaluated the configural invariance by fitting a model (Model 1) with a three-factor solution plus sex, age, and education as covariates. There were multivariate outliers among Ethiopians that prevented the model from converging; therefore, we identified the outliers using the Mahalanobis-Distance test (n = 9) and excluded them from further analysis. Five of the five fit indices were above the recommended benchmarks: SBχ2 (210, N = 994) = 382.0, p < .01, SBχ2/df = 1.8, CFI = .92, TLI = .89, RMSEA = .06, SRMR = .05. Next, we tested the weak measurement invariance by holding factor loadings constant across groups (Model 2). This model had an almost identical fit to that of Model 1, SBχ2 (234, N = 994) = 390.3, p < .01, SBχ2/df = 1.7, CFI = .93, TLI = .91, RMSEA = .06, SRMR = .06; CFI improved by 0.007, and RMSEA improved by 0.006. Thus, weak measurement invariance was definitely supported. In testing strong measurement invariance, item intercepts were held constant across groups (Model 3). Model 3 had a considerably worse fit than Model 2, SBχ2 (270, N = 994) = 656.6, p < .01, SBχ2/df = 2.4, CFI = .83, TLI = .80, RMSEA = .09, SRMR = .08, ΔCFI = 0.10, ΔRMSEA = 0.03. While we could have stopped here, we examined MI to see whether freeing a few item intercepts would improve the model. MI suggested that freeing the intercepts of FS6 and FS9 in Uganda, China, and Taiwan would considerably improve the model fit. Following MI, we freed the intercepts for FS6 and FS9 and refit the model (Model 3b), which was considerably better than Model 3: SBχ2 (202, N = 994) = 408.0, p < .01, SBχ2/df = 2.0, CFI = .90, TLI = .89, RMSEA = .07, SRMR = .08, ΔCFI = 0.03, ΔRMSEA = 0.014. This was evidence of the Differential Item Functioning (DIF) of these items. That is, participants from different groups responded more or less strongly to specific items, in this case FS6 and FS9, even though they presumably had the same levels of latent-factor scores (Zumbo, 2007). Thus, after controlling for DIF, strong measurement invariance was tentatively supported. In testing strict measurement invariance, item residual variances were held constant across groups, except for those for FS6 and FS9 (Model 4b). Model 4b had a considerably worse fit than Model 3b: SBχ2 (230, N = 994) = 679.2, p < .01, SBχ2/df = 3.0, CFI = .78, TLI = .79, RMSEA = .10, SRMR = .12, ΔCFI = 0.12, ΔRMSEA = 0.03. MI suggested diffused sources of model-fit reductions; that is, modifying one or two parameters would not lead to a marked improvement in model fit. Thus, strict measurement invariance was not supported.

Summary

Table 2 summarizes the results of invariance testing. The most basic configural invariance was held in five of the six survey instruments tested. That is, the hypothesized factor structure supposedly measured by each instrument is congruent across cultural groups as diverse as China, Taiwan, Uganda, Ethiopia, and the US. In addition, the weak invariance was held in four of the instruments (three definitely and one tentatively). Taken together, our study suggests that what communication (or psychological) construct each instrument measures, and how the items in it measure the construct, appear to be similar across the five cultural groups examined.

On the other hand, more stringent types of invariance were held in data collected using fewer instruments. Strong invariance was held only in two instruments, and neither was definite, while no strict invariance was held. The implications of these findings and recommendations are discussed below. At any rate, measurement invariance must be routinely tested in communication studies.

Some readers may be concerned that the total sample size (N = ∼1,000) may be too small given the number of countries and the number of items per instrument. It has been shown that sample size per culture affects the indices of invariance little, much less than the number of indicators per factor and the number of factors (Cheung & Rensvold, 2002). That is, even when the sample size per culture is as small as 150, the invariance indexes are relatively unbiased (i.e., deviate less than 5% from the population value) when there are fewer than three factors and the number of indicators per factor is five or fewer. Thus, we are confident that, given our average sample size per group of ∼200 and numbers of factors and indicators, our results are accurate.

Discussion

This paper, responding to the growing awareness of establishing measurement equivalency and invariance to ensure construct comparability across cultures (e.g., Budruk, 2010; Byrne et al., 2009; Gudykunst, 2005), models and promotes this important but often neglected practice in communication research. This paper is part of a larger study that tests and extends FNT to cross-cultural romantic relationship contexts by adding three scales (i.e., relationship satisfaction, power distance, and forgiveness) to the existing scales associated with FNT (i.e., face concerns, conflict styles, and self-construal). Considering the importance of establishing measurement equivalence for meaningful cross-cultural comparisons, this paper tested the measurement invariance of the scales used in the overarching study as a first step before actually extending FNT to include these relational constructs. The results highlight the importance of routinely testing invariance in cross- and intercultural research. The authors would like to offer specific suggestions when a particular invariance is supported or not.

Table 2 Summary of Invariance-Test Results

                                      Types of invariance
Scales                      Configural  Weak       Strong              Strict
Relationship Satisfaction   Yes         Definite   Tentative           No
Power Distance              Yes         No
Forgiveness Scale           Yes         Definite   Partial, tentative  No
Self-Construal              No
Face Concerns               Yes         Definite   No
Conflict Style              Yes         Tentative  No

First, when strict invariance is held, researchers can compute summed scores within a scale because these scores will be accurate indices of the latent construct (Marsh et al., 2010). However, strict invariance may rarely be supported, as we have seen that it was not supported in the data collected with any of the six scales examined. It appears that one source of strict noninvariance may be differences in response patterns across groups (van de Vijver & Matsumoto, 2011). For instance, in Relational Satisfaction, Ugandans had greater item standard deviations (mean SD = 1.31 vs. 0.94–1.14 for the other cultures) than the other groups, which may have contributed to their greater residual variances. Such a response pattern may be characterized as part of a larger cultural value to choose extreme values. In contrast, Taiwanese had the smallest item standard deviations in all scales, which may have contributed to their smaller residual variances. Such a response pattern may be characterized as choosing middle values, similar to the data discussed earlier regarding Japanese data collection suggesting a cultural preference for less expressive answers (e.g., agree, disagree).

Second, when strict invariance is not supported, but strong invariance is (at least partially) supported, the literature seems to offer different pieces of advice. Marsh and colleagues (2010) suggested the use of latent factor means for group comparison, as strict invariance is not necessary for this approach. On the other hand, Steinmetz (2013) appeared to suggest that it would be safe to compute summed scores as indices of the latent means, as long as strong invariance was fully supported (i.e., all item intercepts across groups are equal).

Third, if the strong invariance is not supported or only partially supported (i.e., some items are allowed to vary across groups owing to DIF), and only weak invariance is supported, we suggest that researchers avoid computing summed scores. A simulation study (Steinmetz, 2013) has shown that when groups differ in even one of four or six item intercepts, using summed scores to compare group means inflated Type I error rates from the typical 5% to up to 21%. This is because summed scores include random as well as systematic measurement errors (e.g., errors owing to an artifact such as DIF and/or cultural response patterns) and can create spurious group differences when there is no true group difference in the constructs of interest. Conversely, such errors may mask true group differences by lowering the power of a study (Brown, 2006). In the current study, strong invariance was fully supported in only one of the six scales, which suggests that strong invariance may be fully supported infrequently in the literature. Therefore, we suggest that researchers use latent means as the method of group comparison, not summed scores, to maintain Type I error rates at the nominal level. In the case where weak invariance is not fully supported, Steinmetz (2013) recommended that researchers use structural modeling/CFA to compare latent means, instead of summing observed scores, because latent means are more accurate indices of latent constructs, relatively free of the random and/or systematic errors described above.

To illustrate this point, we use relationship satisfaction data as an example. The means of summed scores/latent scores (after establishing tentative strong equivalence) for the US, China, Ethiopia, Taiwan, and Uganda were 19.8/0.0, 19.2/−0.1, 19.5/−0.1, 20.6/0.1, and 18.4/−0.3 (latent means were computed using the US as the reference group). Even though the rank orders among the five countries were similar between the two scores, there was a difference in the statistical test results: Whereas both the summed and latent means of Ugandans were significantly lower than those of US Americans, only the Taiwanese’s summed mean, but not their latent mean, was significantly higher than the US Americans’ mean. This result indicates that the random and/or systematic errors associated with summed scores may have inflated the group difference between Taiwanese and US Americans in relationship satisfaction, which resulted in a significant difference in the summed means but not in the latent means.

Fourth, what should one do if configural invariance is not supported? In fact, in the current study configural invariance was not supported in one scale (Self-Construal). We can offer two suggestions for this noninvariance at the basic level. First, one can examine possible problems inherent to a particular instrument. Because most survey instruments have been developed using Western populations, they may suffer from various issues owing to linguistic, sample, and/or functional nonequivalence when they are used in non-Western countries. If this were the case, one would observe that the model-fit indices for the US were all acceptable, while those for other countries showed a worse fit, bringing overall configural invariance down to unacceptable levels.

Our second suggestion is that one consider alternative approaches to multigroup CFA. Some methodologists have argued that the assumptions for the multigroup CFA approach may be unreasonable (Marsh et al., 2010): This approach, which assumes that each item loads onto only one factor, may be too restrictive, especially when scales consist of multiple factors and many items. For instance, even the most basic configural invariance has rarely been supported in the NEO-Five-Factor Inventory of personality, a very well-tested and established scale based on EFA (Marsh et al., 2010). With such a scale, Marsh and colleagues argue that a more appropriate approach may be the less restrictive exploratory structural equation modeling, with which they established even the most stringent invariance in the Inventory. Similarly, partial least-square path modeling can be another alternative approach, especially when the numbers of constructs and items are large relative to the sample size (Reinartz, Haenlein, & Henseler, 2009). Multiple-indicators, multiple-causes (MIMIC) modeling would be another alternative if a study contains many groups relative to the sample size (Brown, 2006). Our future study will examine how effective these approaches are in establishing equivalences across cultures.

Ultimately, this paper, as a first step in a larger examination, extension, and test of FNT in five countries representing collectivistic, communalistic, and individualistic cultures, demonstrates and models the importance of establishing measurement equivalence in (theory-based) cross-cultural communication research. We would like to emphasize that measurement noninvariance is not necessarily a nuisance. Instead, if real, noninvariance may offer a key insight into how different cultures operate communicatively, psychologically, and cognitively. Therefore, we suggest that cross- and intercultural researchers incorporate testing of invariance to locate and distill potential components of true cultural differences (van de Vijver & Matsumoto, 2011).

References

Anderson, T., & Emmers-Sommer, T. (2006). Predictors of relationship satisfaction in online romantic relationships. Communication Studies, 57, 153–172. doi:10.1080/10510970600666834

Awa, N.E. (1988). Communication in Africa: Implications for development planning. The Howard Journal of Communications, 1, 131–144. doi:10.1080/10646178809359686

Borsboom, D. (2006). When does measurement invariance matter? Medical Care, 44(11), 176–181. doi:10.1097/01.mlr.0000245143.08679.cc

Bracken, B.A., & Barona, A. (1991). State of the art procedures for translating, validating and using psychoeducational tests in cross-cultural assessment. School Psychology International, 12, 119–132. doi:10.1177/0143034391121010

Brown, P., & Levinson, S.C. (1987). Politeness: Some universals in language usage. New York, NY: Cambridge University Press.

Brown, T.A. (2006). Confirmatory factor analysis for applied research. New York, NY: Guilford Press.

Budruk, M. (2010). Cross-language measurement equivalence of the place attachment scale: A multigroup confirmatory factor analysis approach. Journal of Leisure Research, 42, 25–42.

Byrne, B.M. (2008). Testing for multigroup equivalence of a measuring instrument: A walk through the process. Psicothema, 20, 872–882.

Byrne, B.M., Oakland, T., Leong, F.T.L., van de Vijver, F.J.R., Hambleton, R.K., & Cheung, F.M. (2009). A critical analysis of cross-cultural research and testing practices: Implications for improved education and training in psychology. Training and Education in Professional Psychology, 3, 94–105. doi:10.1037/a0014516

Byrne, B.M., Shavelson, R.J., & Muthen, B.O. (1989). Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychological Bulletin, 105, 456–466. doi:10.1037/0033-2909.105.3.456

Chen, F.F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling, 14, 464–504. doi:10.1080/10705510701301834

Cheung, F.M., van de Vijver, F.J., & Leong, F.T. (2011). Toward a new approach to the study of personality in culture. American Psychologist, 66, 593–603. doi:10.1037/a0022389

Cheung, G.W., & Rensvold, R.B. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255. doi:10.1207/S15328007SEM0902_5

Earley, P.C., & Erez, M. (1997). The transplanted executive. New York, NY: Oxford University Press.

Fincham, F.D., Beach, S.R., & Davila, J. (2004). Forgiveness and conflict resolution in marriage. Journal of Family Psychology, 18, 72–81. doi:10.1037/0893-3200.18.1.72

Gudykunst, W.B. (2002). Issues in cross-cultural communication research. In W.B. Gudykunst & B. Mody (Eds.), Handbook of international and intercultural communication (pp. 165–177). Thousand Oaks, CA: Sage.

Gudykunst, W.B. (2005). Theorizing about intercultural communication. Thousand Oaks, CA: Sage.

Gudykunst, W.B., Matsumoto, Y., Ting-Toomey, S., Nishida, T., Kim, K., & Heyman, S. (1996). The influence of cultural individualism–collectivism, self construals, and individual values on communication styles across cultures. Human Communication Research, 22, 510–543.

Hargrave, T. (1994). Healing wounds in the intergenerational family. Levittown, PA: Brunner/Mazel.

Hendrick, S.S. (1988). A generic measure of relationship satisfaction. Journal of Marriage and the Family, 50, 93–98.

Hofstede, G. (1980). Culture’s consequences: International differences in work related attitudes. Beverly Hills, CA: Sage.

Hofstede, G. (1983). Dimensions of national cultures in fifty countries and three regions. In J. Deregowski, S. Dzuirawiec, & R. Annis (Eds.), Explications in cross-cultural psychology (pp. 335–355). Lisse, Netherlands: Swets and Zeitlinger.

Holmgren, M. (1993). Forgiveness and the intrinsic value of persons. American Philosophical Quarterly, 30, 341–352.

Hui, C.H., & Triandis, H.C. (1989). Effects of culture and response format on extreme response style. Journal of Cross-Cultural Psychology, 20, 296–309. doi:10.1177/0022022189203004

Kline, R.B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York, NY: Guilford Press.

Lam, S.S.K., Schaubroeck, J., & Aryee, S. (2002). Relationship between organizational justice and employee work outcomes: A cross national study. Journal of Organizational Behavior, 23, 1–18.

Ledbetter, A.M. (2010). Assessing the measurement invariance of relational maintenance behavior when face-to-face and online. Communication Research Reports, 27, 30–37.

Levine, T.R., Park, H.S., & Kim, R.K. (2007). Some conceptual and theoretical challenges for cross-cultural communication research in the 21st century. Journal of Intercultural Communication Research, 36, 205–221. doi:10.1080/17475750701737140

Marsh, H.W., & Hau, K.T. (2007). Applications of latent-variable models in educational psychology: The need for methodological-substantive synergies. Contemporary Educational Psychology, 33, 151–170. doi:10.1016/j.cedpsych.2006.10.008

Marsh, H.W., Ludtke, O., Muthen, B., Asparouhov, T., Morin, A.J., & Trautwein, U. (2010). A new look at the big five factor structure through exploratory structural equation modeling. Psychological Assessment, 22, 471–491. doi:10.1037/a0019227

Miles, J., & Shevlin, M. (2007). A time and a place for incremental fit indices. Personality and Individual Differences, 42, 869–874. doi:10.1016/j.paid.2006.09.022

Muthen, L.K., & Muthen, B.O. (1998–2010). Mplus user’s guide (6th ed.). Los Angeles, CA: Muthen & Muthen.

Oetzel, J.G., & Ting-Toomey, S. (2003). Face concerns in interpersonal conflict: A cross-cultural empirical test of the face negotiation theory. Communication Research, 30, 599–624. doi:10.1177/0093650203257841

Oetzel, J.G., Ting-Toomey, S., Masumoto, T., Yokochi, Y., Pan, X., Takai, J., & Wilcox, R. (2001). Face and facework in conflict: A cross-cultural comparison of China, Germany, Japan, and the United States. Communication Monographs, 68, 235–258. doi:10.1080/03637750128061

Oetzel, J.G., Ting-Toomey, S., Yokochi, Y., Masumoto, T., & Takai, J. (2000). A typology of facework behaviors in conflicts with best friends and relative strangers. Communication Quarterly, 48, 397–419. doi:10.1080/01463370009385606

Ohbuchi, K.I., Sato, S., & Tedeschi, J.T. (1999). Nationality, individualism–collectivism, and power distance in conflict management. Tohoku Psychologica Folia, 58, 36–49.

Pike, G.R., & Sillars, A.L. (1985). Reciprocity of marital communication. Journal of Social and Personal Relationships, 3, 303–324. doi:10.1177/0265407585023005

Poortinga, Y.H. (1995). Cultural bias in assessment: Historical and thematic issues. European Journal of Psychological Assessment, 11, 140–146. doi:10.1027/1015-5759.11.3.140

Ptacek, J.T., & Dodge, K.L. (1995). Coping strategies and relationship satisfaction in couples. Personality and Social Psychology Bulletin, 21, 76–84. doi:10.1177/0146167295211008

Reinartz, W.J., Haenlein, M., & Henseler, J. (2009). An empirical comparison of the efficacy of covariance-based and variance-based SEM. International Journal of Research in Marketing, 26, 332–344. doi:10.1016/j.ijresmar.2009.08.001

Satorra, A., & Bentler, P.M. (1994). Corrections to test statistics and standard errors in covariance structure analysis. In A. von Eye & C.C. Clogg (Eds.), Latent variables analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA: Sage.

Schafer, J.L., & Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.

Sijtsma, K. (2009). Reliability beyond theory and into practice. Psychometrika, 74, 169–173. doi:10.1007/s11336-008-9103-y

Smith, L.M., Heaven, P.C.L., & Ciarrochi, J. (2008). Trait emotional intelligence, conflict communication patterns, and relationship satisfaction. Personality and Individual Differences, 44, 1314–1325.

Steinmetz, H. (2013). Analyzing observed composite differences across groups. Is partial measurement invariance enough? Methodology, 9, 1–12. doi:10.1027/1614-2241/a000049

Ting-Toomey, S. (1988). Intercultural conflict styles: A face-negotiation theory. In Y.Y. Kim & W.B. Gudykunst (Eds.), Theories in intercultural communication (pp. 213–235). Newbury Park, CA: Sage.

Ting-Toomey, S. (2005). The matrix of face: An updated face-negotiation theory. In W.B. Gudykunst (Ed.), Theorizing about intercultural communication (pp. 71–117). Thousand Oaks, CA: Sage.

Ting-Toomey, S., Gao, G., Trubisky, P., Yang, Z., Kim, H.S., Lin, S.L., & Nishida, T. (1991). Culture, face maintenance, and styles of handling interpersonal conflict: A study of five cultures. The International Journal of Conflict Management, 2, 275–296. doi:10.1108/eb022702

Ting-Toomey, S., & Kurogi, A. (1998). Facework competence in intercultural conflict: An updated face-negotiation theory. International Journal of Intercultural Relations, 22, 187–225. doi:10.1016/S0147-1767(98)00004-2

Ting-Toomey, S., Oetzel, J.G., & Yee-Jung, K. (2001). Self-construal types and conflict management styles. Communication Reports, 14, 87–103. doi:10.1080/08934210109367741

Triandis, H. (1994). Culture and social behavior. New York, NY: McGraw-Hill.

van de Vijver, F.J.R., & Leung, K. (1997). Methods and data analysis for cross-cultural research. Thousand Oaks, CA: Sage.

van de Vijver, F.J.R., & Matsumoto, D. (2011). Introduction to the methodological issues associated with cross-cultural research. In D. Matsumoto & F.J.R. van de Vijver (Eds.), Cross-cultural research methods in psychology (pp. 1–14). New York, NY: Cambridge University Press.

Vandenberg, R.J., & Lance, C.E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–70. doi:10.1177/109442810031002

West, B.T., Welch, K.B., & Galecki, A.T. (2007). Linear mixed models: A practical guide using statistical software. Boca Raton, FL: Chapman & Hall/CRC.

Zumbo, B.D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4, 223–233. doi:10.1080/15434300701375832

Copyright of Journal of International & Intercultural Communication is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.

  • Abstract
  • Introduction
  • Literature Review
    • FNT
    • Proposed New Theoretical and Outcomes Perspectives
      • Power Distance
      • Relationship satisfaction
      • Forgiveness
    • Establishing Equivalency
    • Measurement Invariance
  • Methods
    • Research Sites
    • Participants
    • Survey Instrument and Procedures
    • Model Specifications and Statistical Analyses
      • Data cleaning and preliminary analysis
      • Model fit and testing measurement invariance
      • Statistical procedures for invariance tests
  • Results
    • Face Concerns
    • Conflict Style
    • Self-Construal
    • Power Distance
    • Relational Satisfaction
    • Forgiveness Scale
    • Summary
  • Discussion
  • References