Volume 14, Issue 1 August 2020
Designing and Piloting a Repeated-Measures ANOVA Study on L2 Academic Writing:
Methodology and Challenges
Larisa Nikitina
University of Malaya, Malaysia
larisa@um.edu.my
Abstract
This article highlights methodological challenges inherent in implementing a repeated-
measures ANOVA study on L2 academic writing and suggests possible solutions to these
challenges. The RM-ANOVA design is employed when the same participants are measured more
than once. Therefore, a repeated-measures ANOVA study, with its several sessions of data
collection and multiple variables, requires meticulous planning and careful
implementation. This article highlights important considerations that researchers might want to
pay attention to while designing and implementing a repeated-measures ANOVA study. These
considerations pertain to selecting and operationalizing the study’s variables, recruiting the
participants, selecting an appropriate research instrument and ensuring that the data are reliable
and valid.
Keywords: L2 academic writing; research methods; RM-ANOVA design; university students
Introduction
Task-based language teaching (TBLT) provides a framework for second language (L2)
researchers and practitioners to design class tasks that are meaningful for the language learners,
that promote L2 communication in the classroom and enhance the learners’ use of authentic
language while performing the tasks (Willis & Willis, 2008). A cognitive model of task-based
instruction— the Cognition Hypothesis—asserts that pedagogical tasks with increased cognitive
demands will help to achieve two pedagogical aims. Firstly, these tasks will nudge the language
learners to produce more accurate and complex language. Secondly, they will stimulate the
learners to engage in a lengthier interaction and heighten their attention to, and memory of, the
linguistic input (Robinson, 2003a, 2003b). The literature on the Cognition Hypothesis portrays
task complexity as an important cognitive factor that needs to be properly addressed while
designing a task. This is because increased task complexity creates more learning opportunities and strengthens
the attentional mechanisms for L2 production, development and acquisition (Robinson, 2003b, 2005,
2007).
The Cognition Hypothesis and its associated Triadic Componential Framework (TCF)
provide a taxonomic system for pedagogical task design and implementation. Three broad
classificatory categories, namely, task complexity, task condition and task difficulty are each
guided by two continuums (Robinson, 2001a, 2003b, 2005, 2007). A key component—task
complexity and its resource-directing and resource-dispersing continuums—can be operated
according to the -/+ continua to determine the cognitive complexity of a task and its emphasis
on L2 production, development and acquisition (see footnote 31). Causal reasoning demands are one of the
resource-directing factors that promote L2 production. However, as a review of the scholarly
literature shows, this factor is greatly underexplored. Not only does the cognitive factor of causal
reasoning demands require empirical validation, but the interactive factor that refers to the
number of participants in the task (i.e., the task condition) is also of the utmost importance.
It is possible that the scarcity of studies that address this important topic in the literature
on L2 academic writing is due to the methodological considerations and challenges that
researchers have to face both at the research design and research implementation stages. Such a
study would require a complex research design that involves meticulous planning and thorough
execution to allow for a smooth implementation and eventual success of the research project.
Therefore, the current article focuses on the methodological issues that need to be addressed
while designing a study that aims to examine the effects of -/+ causal reasoning demands across
different -/+ number of participant groupings (i.e., individual, dyadic and triadic) on the L2
written modality. The dependent variables for the writing modality in a study highlighted here
are Complexity, Accuracy and Fluency (CAF) in the L2 written production.
Footnote 31: The continuum +/-, which is attached to the assessed components in a study on L2 academic writing, indicates that “there is relatively more versus relatively less” of the component (Robinson, 2001b, p. 30). The components may be the task complexity, the number of participants, the allocated time, etc.
To be more specific, among the variables, Complexity refers to the number of clauses the
learner connects or includes within a sentence (Foster & Skehan, 1996). This construct in L2
production shows the development of the restructuring process within the L2 learners’
interlanguage systems (Skehan, 1996). Accuracy refers to the learner’s ability to exercise the
maximum level of control to prevent errors during a language performance (Ellis, 2003). Fluency
refers to the learner’s ability to use the language with a high number of words (Larsen-Freeman,
2006).
This article is structured as follows. Firstly, it reviews the methodology and statistical
procedures employed in the earlier studies on the effects of task complexity on the L2 individual
written production in terms of Complexity, Accuracy and Fluency. Then, it highlights important
considerations that researchers might want to attend to while designing, piloting and
implementing a repeated-measures ANOVA (RM-ANOVA) study. The article gives examples of
the challenges that could be faced during data collection and data analysis procedures. While the
actual findings are not reported here, the article gives several examples from a study that adopted
the RM-ANOVA research design. The article suggests possible actions that could
help to overcome these challenges. In short, this article might offer useful
guidance to novice researchers who would like to implement their own RM-ANOVA study.
Review of Literature
Earlier research studies have employed various methodologies, instruments and statistical
procedures to investigate the effects of task features on L2 academic writing. Ishikawa (2007)
employed a one-way ANOVA to examine the effects of Here and Now on the Complexity,
Accuracy and Fluency (CAF) measures in the L2 narrative written output produced by 54
Japanese third-year high school students. The Here and Now variable was manipulated with the
availability of the cartoon strip. The findings revealed that more complex tasks resulted in higher
CAF indices. This finding was compatible with the Cognition Hypothesis that postulates that
cognitively more complex tasks would have positive effects on the quality of the L2 production.
However, it is not clear if the resource-dispersing variable—planning time—was included in the
study by Ishikawa. Therefore, it could be suggested that the results could be due to the effects of
the pre-task planning time, which might have lessened the complexity level of the resource-
directing here and now. This issue was also noted by Skehan (2009).
Kuiken and Vedder (2007) employed a repeated measures ANOVA (RM-ANOVA)
design to investigate the effects of task complexity with -/+ number of elements, and -/+ number
of reasoning demands on the language performance. The participants were Dutch learners of
French and Italian who were instructed to write a letter about the choice of the holiday
destination. Task complexity was manipulated by asking the participants to take into account 3 elements when reasoning about their choice of the holiday
destination in the non-complex writing task and 6 elements in the complex writing task. The
assessment was based on the general versus specific measures of writing proficiency. In the
study, accuracy was examined by counting the type of errors made in the L2 texts, whereas
lexical complexity was inspected by distinguishing the frequent words from infrequent ones. The
results revealed that the complex writing task led to a significant decrease of errors and yielded a
lexically more complex text. The effects of task complexity on higher accuracy could be mainly
attributed to lower ratios of lexical errors in the more complex task.
A study by Ruiz-Funes (2015) examined the effects of task complexity and several
learner-related variables in essay writing. The researcher focused on the CAF measures and the
participants were L2/foreign language (FL) groups with advanced and intermediate language
proficiency. Similar to Ishikawa’s (2007) finding, Ruiz-Funes detected a positive impact of the
increased task complexity on syntactic complexity, accuracy and fluency with the advanced level
learners. The results also revealed that the complex tasks yielded a higher syntactic complexity
but had a lower accuracy and fluency. However, it was also found that there were positive
changes in syntactic complexity, accuracy and fluency in the complex task writings of the
advanced learners.
A series of later studies by Kuiken and Vedder (2008, 2009, 2011, 2012) employed an
RM-MANOVA analysis to investigate the effects of task complexity, manipulated with ±
number of elements and ± reasoning demands, and of learner proficiency (high versus low) on L2
written and spoken production. The findings from these studies showed positive impacts of
increasing task complexity mostly on accuracy. However, they did not indicate any effect on
syntactic complexity and lexical variation. It was found that increasing task complexity along
resource-directing variables led to higher accuracy in the L2 written output. Also, the findings
indicated that the learners performed with a higher accuracy in complex tasks and there were
decreases in the lexical errors. This result contradicted the findings regarding the lexical
variation. One of the studies indicated that the effects of task complexity on L2 production
did not depend on the production mode (oral or written). As the earlier studies
indicated, positive impacts on accuracy were repeatedly identified in the written L2 productions;
however, there was no statistically significant effect on the lexical variety. Also, in the written
L2 production mode there was no effect on the syntactic complexity. These results might be due
to different task types employed in the studies, which might have affected the learners’ attention
and dispersed it to different dimensions of the L2 production (Skehan, 2009).
In a more recent study, Frear and Bitchener (2015) partially replicated Kuiken and
Vedder’s (2012) operationalized reasoning demands variable with three letter-writing tasks, each
at a different level of task complexity. They examined the effects of increasing task complexity
on the lexical and syntactic complexity in the writing by 34 non-native speakers of English. The
researchers found that the L2 production in the writing task with lower complexity had a larger
number of adverbial clauses while the medium and high complexity tasks yielded fewer adverbial
clauses. Overall, the study detected increases in the lexical complexity between low complexity
and high complexity writing tasks. However, the increase in the lexical complexity did not lead
to the increase in the syntactic complexity. As Frear and Bitchener noted, these results did not
support the Cognition Hypothesis. They suggested that these findings could be due to the nature
of the tasks, which required a different communication function. Also, there was no statistically
significant difference in the ratio of dependent clauses to t-units across all types of the dependent
clauses. When the ratio of the dependent clauses to t-units for each type of the dependent clause
was analyzed separately, there occurred a decrease in adverbial dependent clauses in the tasks
with higher complexity.
Rahimi (2018) employed paired-samples t-tests and Wilcoxon
Signed Ranks tests to investigate the effects of increasing reasoning demands and the number of
elements on CAF indices. In the study, two argumentative tasks were adapted from Révész
(2011); the participants were 60 upper-intermediate FL learners of English in Iran. The findings
showed that increasing task complexity produced a larger number of subordinate clauses with a
greater lexical and syntactic complexity but also with a reduced writing accuracy.
To sum up, the earlier studies that employed statistical analyses to examine the effects of
task complexity and task condition on individual learners’ L2 academic writing were conducted
with different kinds of participants. This might have affected the findings due to the variability of
the overall mean scores that could stem from the participants’ individual differences. Therefore,
to increase the accuracy of the statistical analysis it would be advisable to conduct a study among
the same group of participants. This would require implementing an RM-ANOVA design. In
other words, an RM-ANOVA study could be a better analytical tool to examine the effects of the
task design variable (i.e., task complexity: ± causal reasoning demands) and the task
implementation variable (i.e., task condition: ± number of participants) on L2 individual
argumentative written production and measures of the CAF indices.
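The argument for measuring the same participants repeatedly can be illustrated with a minimal sketch of how a one-way repeated-measures ANOVA partitions the variance. The scores below are hypothetical (they are not data from any study cited here), the function name is an illustrative choice, and the sketch covers only a single within-subjects factor with three levels rather than the full design described in this article. It shows that removing the variance due to stable differences between participants leaves a smaller error term than an analysis that treats the same scores as coming from independent groups.

```python
def one_way_rm_anova(scores):
    """Partition variance for a one-way repeated-measures design.

    scores: one list per participant, holding one score per condition.
    All values used below are hypothetical and for illustration only.
    """
    n = len(scores)        # number of participants
    k = len(scores[0])     # number of conditions (repeated measurements)
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]
    participant_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_condition = n * sum((m - grand_mean) ** 2 for m in condition_means)
    # Variance due to stable individual differences between participants.
    ss_participant = k * sum((m - grand_mean) ** 2 for m in participant_means)
    ss_error = ss_total - ss_condition - ss_participant

    ms_condition = ss_condition / (k - 1)
    # Repeated-measures F: individual differences are removed from the error term.
    f_repeated = ms_condition / (ss_error / ((k - 1) * (n - 1)))
    # For comparison: F if the same scores were treated as independent groups.
    f_independent = ms_condition / ((ss_total - ss_condition) / (n * k - k))
    return {
        "ss_condition": ss_condition,
        "ss_participant": ss_participant,
        "ss_error": ss_error,
        "F_repeated": f_repeated,
        "F_independent": f_independent,
    }


# Hypothetical scores of 4 participants under 3 conditions.
hypothetical_scores = [
    [2, 3, 4],
    [4, 5, 7],
    [6, 6, 8],
    [8, 10, 9],
]
result = one_way_rm_anova(hypothetical_scores)
print(result)
```

With these hypothetical scores, most of the total variability comes from differences between participants, so the repeated-measures F ratio (approximately 7.2) is far larger than the ratio obtained when the scores are treated as independent groups (approximately 0.6).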
This article demonstrates how an RM-ANOVA study could be designed, piloted and
implemented. It also highlights the methodological challenges and possible solutions when
implementing such a study. The investigation of the effects of task complexity level (i.e., simple
versus complex) and task condition (i.e., individual, dyadic and triadic) on the L2 individual
academic writing (an argumentative essay in this particular case) was guided by the following
research question: Is there a statistically significant effect of task complexity (simple vs complex
task) and task condition (individual vs dyadic vs triadic groupings) on lexical and syntactic
Complexities, grammatical Accuracy and Fluency in L2 individual academic writing? The
following sections highlight important considerations that researchers have to address while
designing an RM-ANOVA study.
Designing an RM-ANOVA Study: Methodological Considerations
Operationalizing the Variable and Proposing the Relationship among the Variables
Studies employing an RM-ANOVA design assess relationships among several variables.
Moreover, the RM-ANOVA design can be implemented either with only the within-group
variables or in a combination of the within- and between-groups variables. As advised by
Larson-Hall (2015), in order to make the research design and the study’s variables clear to the
reader, researchers might want to provide a design box that visually presents their RM-ANOVA
analysis and variables.
In the current article, the RM-ANOVA analysis investigated whether there was a
statistically significant difference in the L2 individual writing (i.e., the argumentative essays) in
three different task conditions (i.e., individual, dyadic and triadic) which were performed by the
same group of participants. Figure 1 depicts the research design and variables in the current
study.
Dependent Variables (Continuous Variable)
L2 Individual Writing
• Lexical Complexity and Syntactic Complexity
• Grammatical Accuracy
• Fluency
Independent Variable (Categorical Variable; Within-groups variable)
Task Complexity
• Simple Task with 2 causes and 2 effects
• Complex Task with 6 causes and 6 effects
Moderator Variable (Categorical Variable; Between-groups variable)
Task Condition
• Individual - No peer discussion, individual writing
• Dyadic - 15 minutes discussion, individual writing
• Triadic - 15 minutes discussion, individual writing
Figure 1: Design box of the current 2 x 3 RM-ANOVA study
To be more specific, the independent variable—task complexity—had two levels (simple
and complex) and it was the within-groups variable. The moderator variable—task condition—
indicated that the same participants performed the task in the individual, dyadic and triadic
groupings. This variable was the between-groups variable. The independent and moderator
variables were categorical variables because the former represented the levels of task complexity
while the latter represented the conditions of the task implementation. As for the dependent
variables, the L2 individual writing was operationalized using the global measures of
Complexity, Accuracy and Fluency (CAF). Each of the dependent variables was measured on a
continuous scale.
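The 2 x 3 layout described above can be sketched as a data structure. The following minimal example enumerates the six design cells and prepares one empty record per participant and cell, which is the "long" layout that many statistical packages expect for repeated-measures data. The participant identifiers and field names are hypothetical and chosen only for illustration.

```python
from itertools import product

# Factor levels as described in the design box.
TASK_COMPLEXITY = ("simple", "complex")
TASK_CONDITION = ("individual", "dyadic", "triadic")
# Dependent variables (field names are illustrative assumptions).
MEASURES = ("lexical_complexity", "syntactic_complexity", "accuracy", "fluency")


def design_cells():
    """Return every combination of task complexity and task condition."""
    return list(product(TASK_COMPLEXITY, TASK_CONDITION))


def empty_long_format(participant_ids):
    """Create one blank record per participant and design cell."""
    rows = []
    for pid in participant_ids:
        for complexity, condition in design_cells():
            row = {
                "participant": pid,
                "task_complexity": complexity,
                "task_condition": condition,
            }
            # Scores are filled in after the essays have been coded.
            row.update({measure: None for measure in MEASURES})
            rows.append(row)
    return rows


cells = design_cells()
rows = empty_long_format(["P01", "P02"])  # hypothetical participant IDs
print(len(cells), "cells;", len(rows), "records for 2 participants")
```

Each participant contributes one essay per cell, so every participant appears in six records, each carrying the four CAF scores.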
Complexity, Accuracy and Fluency Measures and their Analysis
The writing quality of the argumentative academic writing tasks is usually assessed by
the global measures of Complexity, Accuracy, and Fluency (CAF). The researchers might want
to explain how the CAF were measured in their study. A detailed explanation of the CAF
measures in this particular study is given in Table 1.
Table 1: Global CAF Measures of the academic writing quality
Global Measure: Complexity (Lexical complexity and Syntactic complexity) (Foster & Skehan, 1996)
Lexical complexity: Measured by a mean segmental type-token ratio.
Example:
A man walked through the underbridge. He was robbed at the underbridge.
• Different words/Total words (10/12)
Syntactic complexity: Measured by the number of S-nodes per T-unit in the written text.
Example:
The picture shows that a man walked through the underbridge. He was robbed at the underbridge by a group of unknown people.
• S-nodes/T-units (3/2)
Global Measure: Accuracy (Ellis, 2003)
Accuracy: Measured by the number of error-free T-units per T-unit in the text.
Example:
A man walked through the underbridge. He was robbed at the underbridge by a group of unknown people. Then, he went to the police station to lodge a report. He tell the police that he recognized a man’s face.
• Error-free T-units/Total T-units (3/4)
Global Measure: Fluency (Larsen-Freeman, 2006)
Fluency: Measured by the number of words per composition or per T-unit.
In the present study, two types of complexity were measured—Lexical complexity and
Syntactic complexity. As stated earlier, complexity refers to the number of clauses the learner
connects or includes within a sentence (Foster & Skehan, 1996). Accuracy is the learner’s ability
to exercise the maximum level of control to prevent errors during a language performance (Ellis,
2003) while Fluency is the learner’s ability to use the language with a high number of words
(Larsen-Freeman, 2006).
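The ratios in Table 1 can be reproduced with a short sketch. The example below computes the type-token ratio for the lexical complexity example and the syntactic complexity and accuracy ratios from the counts given in the table. It assumes that S-nodes, T-units and errors have already been coded by hand, since identifying them requires human judgment. The tokenizer, the function names and the segment size are illustrative assumptions, not the procedure used in the study.

```python
import re


def tokenize(text):
    """Split a text into lower-case word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Number of different words divided by the total number of words."""
    return len(set(tokens)) / len(tokens)


def mean_segmental_ttr(tokens, segment_size):
    """Average the type-token ratio over consecutive, equally sized segments.

    The segment size is an assumption of this sketch; leftover tokens that
    do not fill a whole segment are ignored.
    """
    starts = range(0, len(tokens) - segment_size + 1, segment_size)
    segments = [tokens[i:i + segment_size] for i in starts] or [tokens]
    return sum(type_token_ratio(seg) for seg in segments) / len(segments)


def per_t_unit(count, t_units):
    """Generic ratio of a hand-coded count to the number of T-units."""
    return count / t_units


lexical_example = ("A man walked through the underbridge. "
                   "He was robbed at the underbridge.")
tokens = tokenize(lexical_example)

lexical_complexity = mean_segmental_ttr(tokens, segment_size=12)  # 10 types / 12 tokens
syntactic_complexity = per_t_unit(3, 2)   # 3 S-nodes in 2 T-units (Table 1)
accuracy = per_t_unit(3, 4)               # 3 error-free T-units out of 4 (Table 1)
words_per_composition = len(tokens)       # fluency as words per composition

print(lexical_complexity, syntactic_complexity, accuracy, words_per_composition)
```

Because the example text contains exactly 12 words, a segment size of 12 yields a single segment, so the mean segmental ratio equals the plain type-token ratio of 10/12. For longer essays the ratio would be averaged over several segments.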
Steps in the Data Collection, Management and Analysis
Next, due to the complex nature of an RM-ANOVA design, which involves not only
multiple variables but also several data collection sessions, it would be helpful if researchers
could provide a graphical representation of the steps in the data collection, management and
analysis. Figure 2 offers a visual representation of the steps in the data collection, management
and analysis adopted in the current study. Moreover, as can be seen from Figure 2, the graphical
representation of RM-ANOVA design could be integrated with the visualization of the overall
research design of the study (e.g., mono- or mixed-methods).
Figure 2: Steps in the data collection, data management and data analysis
As can be seen from Figure 2, the data were collected in three sessions and in three task
condition settings—individual, dyadic and triadic. The individual session was set as a baseline
for the further analysis; each participant wrote an essay involving one simple and one complex
task, without any peer interaction. The findings were then compared with the findings from the
L2 individual written production in the dyadic and triadic settings, which involved the
intervention, that is, a peer discussion session prior to doing the task. In other words, in the dyadic
and triadic sessions, the participants were required to have a group discussion before writing
their individual essays. The written production (i.e., the written L2 texts) was coded to enable
the measurement of the CAF indices. The frequencies of CAF measures were tabulated and
analyzed using the Statistical Package for the Social Sciences (SPSS) Version 21.
To sum up, at the initial stage of developing an RM-ANOVA study, the main
considerations would be: 1) defining and operationalizing the variables in the study, 2) choosing
appropriate research instruments and analytical tools and 3) planning the data collection sessions.
Other considerations include addressing research ethics and planning the logistics (i.e., the scale
and the timing of the data collection sessions). The following section focuses on the second step,
which is choosing an appropriate research instrument.
Developing a Research Instrument
When choosing the research instrument (i.e., the type of essay), researchers may want to
consider the educational context where their study is conducted. In the Malaysian education context,
where the current study took place, it could be advisable to select an argumentative
essay for the L2 writing task. There are two main reasons for this choice. Firstly, the
argumentative writing task requires the learners to use their logic and reasoning to generate an
argument; therefore, in the Triadic Componential Framework (Robinson, 2001a, 2001b, 2007;
Robinson & Gilabert, 2007), a task that increases the resource-directing variable of reasoning demands is
considered cognitively more complex.
Secondly, the argumentative writing genre is often employed in the academic writing
courses at the tertiary level in Malaysia (e.g., Veerappan, Yusof and Aris, 2013) and other
educational contexts (e.g., Khodabandeh et al., 2013). Therefore, in tertiary-level settings,
participants in a study are likely to be more familiar and comfortable with an argumentative
writing prompt than with a series of pictures as the stimulus for their writing task.
For this reason, while designing the current RM-ANOVA study, argumentative topics were
considered the most suitable prompts for the L2 individual academic writing in
all three types of settings (i.e., individual, dyadic and triadic) and also for the peer interaction
sessions (i.e., dyadic and triadic).
Research literature offers ample support for using argumentative writing tasks. For
example, Long (2015) proposed that tasks should be analytical in nature in order to stimulate
learners’ attentional mechanisms and memory resources. Argumentative writing tasks
bring out the learners’ ability to understand, analyze, evaluate, explain and justify an issue
when they take a different position on the topic (Duff, 1985; Long, 1990). Besides, as noted in
several studies (Duff, 1985; Long, 2015), argumentative writing tasks allow learners to maintain
different positions during the interaction to reach a consensus and eventually succeed in their
writing task. Importantly, Foster and Skehan (1996) pointed out that an argumentative-based task
that incorporates ‘critical decision-making’ elements would yield the most constant
patterns of the linguistic features and CAF measures. In a similar vein, other researchers (e.g.,
Ellis, 2003; Robinson, 2001a, 2005) argued that tasks that prompt reasoning are considered
cognitively more complex than tasks with decreased reasoning demands (Halford, Cowan, &
Andrews, 2007). The chosen research instrument needs to be tested in a pilot study. The
following section addresses issues pertaining to the pilot study phase of the RM-ANOVA research
project.
Pilot Study
It is necessary to conduct a pilot study in order to identify and prevent potential problems
that might arise in the actual study (Loewen and Plonsky, 2015). A pilot study helps researchers avoid costly
mistakes (in terms of time and resources) that might arise due to deficiencies in the research
design and data elicitation devices, such as research instruments. During a pilot study various
aspects of the future study are assessed and tested, including the research settings, the potential
participants, the research instruments and the analytical tools. An RM-ANOVA study might need
more than one pilot study due to its complex research design, which includes multiple variables,
multiple data collection settings and different timings. In the current study, three rounds of pilot
studies were conducted before carrying out the actual study.
To be more specific, Pilot Study 1 tested the suitability of the intended group of
participants in terms of their English language proficiency level which is required for completing
the tasks. It also evaluated the appropriateness of the complexity level of the argumentative
writing tasks. This pilot study revealed that the participants with a low English proficiency level,
that is, those who had obtained Bands 1 and 2 in the Malaysian University English Test (MUET),
were not able to complete the simple written task within the stipulated time (1 hour). They also
struggled to understand the demands and instructions for the tasks. Therefore, it was decided to
limit the participation in the actual study to only the learners at an intermediate level (i.e., MUET
Bands 3 and 4).
During Pilot Study 2, the main focus was on the concept and design of the task
complexity as well as the implementation of the argumentative essay writing tasks. A group of
15 ESL students at their intermediate levels of proficiency (MUET bands 3 to 4) completed two
argumentative writing tasks: one simple task and one complex task in three different task
conditions—individual, dyadic and triadic. The pilot study results showed that the learners were
able to complete both simple and complex tasks in all three types of settings or task conditions.
After each writing session the researcher had a casual conversation with the participants to seek
their perceptions of the complexity level of the tasks as well as their preferences for the topics of
the argumentative essays. The participants deemed the task complexity levels as appropriate,
with 2 causes and 2 effects for the simple task and 4 causes and 4 effects for the complex tasks.
To verify the appropriateness of the task complexity parameters, the researcher identified
several possible topics for the argumentative essays based on her discussion with the participants
and emailed these topics to Peter Robinson. This action aimed to check the feasibility of the task
complexity for the simple (2 causes and 2 effects) and complex (4 causes and 4 effects) L2
writing tasks. Robinson suggested increasing the complexity level for the complex tasks to 6
causes and 6 effects. As for the essay topics, the participants suggested several themes that they
considered engaging and relevant to real life. These themes included parenting, relationships,
academic achievement, freedom, technology intervention and mobile pedagogy.
Based on the findings, some amendments were made for the next round of the pilot
study. The complexity levels for the argumentative tasks were modified—2 causes and 2 effects
for the simple task and 6 causes and 6 effects for the complex task. Time for the individual
writing task was limited to 40 minutes whereas the peer discussion was limited to 15 minutes.
Both simple and complex tasks were discussed in the dyadic and triadic groupings.
Finally, Pilot Study 3 was conducted to verify the appropriateness and feasibility of the
amendments made on the basis of the two earlier pilot studies. It tested the feasibility and
suitability of the tasks, selection of the argumentative essay topics, task complexity levels, time
given to complete the tasks, settings and peer groupings arrangements. The participants were a
different group of 15 ESL learners (MUET bands 3 and 4). The findings of Pilot Study 3
showed that the participants were able to produce a complete argumentative essay (both for the
simple and complex tasks) and that the research instrument and the data collection procedures
were appropriate.
The Actual Study
Data Collection and Participants
The actual study was conducted in a private university in Malaysia. The recruitment of
participants commenced after getting the official permission from the Dean of the Faculty.
Purposive sampling was adopted to recruit the participants. The criteria for participation in the
study were as follows: participants must be L2 learners of English, must have obtained the
minimum MUET band 3, and must be students at a local university.
To recruit the participants, the researcher distributed photocopied forms seeking personal
particulars from potential participants and requested the interested students to return the form to
the researcher. In the form, the potential participants were asked to give their name, mother tongue,
age, gender, degree major, English proficiency level
(as assessed by MUET or IELTS results), hometown and contact number. Also, it
was stated in the form that, as a small token of appreciation, the participants in the study would
receive a certificate of participation upon the completion of all three L2 writing sessions.
Prior to each data collection session, the researcher consulted the participants about
possible dates and times via WhatsApp. The sessions were set based on the students’
availability. Figure 3 offers a detailed visual depiction of the steps involved in the data collection
procedure. Initially, 126 students expressed their interest and willingness to participate in the
study. However, out of the 126 students only 43 attended the first round of data collection.
Furthermore, 7 of these 43 participants did not attend the second and third sessions.
[Figure 3 is a flowchart. Its recoverable content is summarized below.]
Sequence: Pre-selection of Participants and Recruitment; Distributing Consent Forms & Survey Forms; two weeks later, Session 1 (Individual); two-week interval; Session 2 (Dyad); two-week interval; Session 3 (Triad).
Session 1 (Individual)
• Simple Task 1: 10 minutes Preparation, 40 minutes Writing
• Break 5-10 minutes
• Complex Task 1: 10 minutes Preparation, 40 minutes Writing
Session 2 (Dyad)
• Simple Task 2: 10 minutes Preparation, 15 minutes Dyadic Discussion, 40 minutes Individual Writing
• Break 5-10 minutes
• Complex Task 2: 10 minutes Preparation, 15 minutes Dyadic Discussion, 40 minutes Individual Writing
Session 3 (Triad)
• Simple Task 3: 10 minutes Preparation, 15 minutes Triadic Discussion, 40 minutes Individual Writing
• Break 5-10 minutes
• Complex Task 3: 10 minutes Preparation, 15 minutes Triadic Discussion, 40 minutes Individual Writing
Figure 3: Data Collection Procedures
The remaining 36 participants (N = 36) who took part in the study were all from the same
university but from various academic programs, such as Civil Engineering, Materials and
Manufacturing Engineering, Mechatronics Engineering, Chemical Engineering, Broadcasting,
Graphic Design and Multimedia, Accounting, International Business and Actuarial Science.
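The attrition reported above can be expressed as simple arithmetic. The sketch below uses only the figures given in this section (126 students expressing interest, 43 attending the first session and 7 dropping out afterwards). The planning function at the end is an illustrative assumption: it supposes that a future study would lose participants at the same overall rate, which may not hold in other settings.

```python
EXPRESSED_INTEREST = 126   # students who returned the form
ATTENDED_SESSION_1 = 43    # students who attended the first session
DROPPED_LATER = 7          # missed the second and third sessions

COMPLETED = ATTENDED_SESSION_1 - DROPPED_LATER  # participants in the final sample

attendance_rate = ATTENDED_SESSION_1 / EXPRESSED_INTEREST  # share who showed up
retention_rate = COMPLETED / ATTENDED_SESSION_1            # share who stayed
overall_rate = COMPLETED / EXPRESSED_INTEREST              # share of all volunteers


def volunteers_needed(target_sample):
    """Rough planning estimate, assuming the same overall attrition.

    Uses integer ceiling division to avoid floating-point rounding.
    """
    return -(-target_sample * EXPRESSED_INTEREST // COMPLETED)


print(COMPLETED)
print(f"{attendance_rate:.1%} {retention_rate:.1%} {overall_rate:.1%}")
print(volunteers_needed(36))
```

In this study only about 29% of the students who initially volunteered completed all three sessions, which suggests that researchers planning a multi-session design may need to approach several times as many students as the final sample requires.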
Developing Good Rapport between the Researcher and Participants
Building trust and good rapport with the participants prior to and during the
implementation of an RM-ANOVA study is important in order to maintain the participants’
interest and good will throughout the research project and beyond (e.g., for later sharing of the
findings with the participants). For this purpose, the researcher explained the importance of the
study and highlighted how this research and its findings could contribute to developing a better
curriculum and how they could inform pedagogical decisions regarding the choice of the
teaching materials and classroom activities. In the current study, the 36 participants who stayed
throughout the several data collection sessions over a two-month period remained committed due to
their sense of responsibility and their understanding of the value of the study for the betterment of the
English language course. Also, the researcher thanked the students in person for participating in
each writing session and reminded them about the upcoming data collection session via
WhatsApp messages; she reiterated the importance and value of their presence and feedback in
the writing session.
It should be noted that, in order to retain the participants, the data
collection sessions need to be properly organized and planned in advance by the researcher. This
includes scheduling short breaks during the sessions, finding comfortable settings and
choosing appropriate timing. The next section addresses these issues.
Keeping Participants Alert through the Data Collection Session
Another challenge is keeping the participants alert throughout the writing sessions. This
is especially important given that the students might come to the data collection sessions after
their lectures and tutorials and might be tired from their day-time activities. In the current
study, the participants were required to complete two writing tasks in one session, which took
approximately two hours. To maintain the energy level of the students, short breaks and
refreshments were provided. It cannot be stressed strongly enough that the researcher’s sincere
concern for the participants’ well-being, shown for example by providing some light food and
having a short chat with them during the break, will go a long way in facilitating and even
enabling the implementation of an RM-ANOVA study.
Performing the RM-ANOVA Test and Reporting the Results
The current article does not aim to report and discuss the statistical results from the actual
RM-ANOVA study that was carried out by the first author of this article. The main aim of this
paper is to highlight methodological challenges and issues that might arise while designing and
implementing an RM-ANOVA study on academic L2 writing. However, it is important to
remember that researchers must be aware of the hidden assumptions underlying statistical tests.
These assumptions must be checked and met before the actual statistical analysis is carried out;
otherwise, the findings from the RM-ANOVA statistical procedure lack legitimacy.
As Larson-Hall (2015) reminds us, besides the standard statistical assumptions, namely
a normal distribution and equal variances for all groups, there is one important additional
assumption for the RM-ANOVA test, known as sphericity. This concept is complex but,
basically, sphericity “measures whether differences between the variances of a single
participant’s data are equal” (Larson-Hall, 2015, p. 326). Researchers usually employ
Mauchly’s test to assess sphericity. If this assumption is violated, then either the
Greenhouse–Geisser or the Huynh–Feldt correction, both of which reduce the degrees of freedom
of the F-test, can be used to remedy the analysis.
However, statisticians and methodologists warn that Mauchly’s test is not very
robust or powerful (Howell, 2002, as cited in Larson-Hall, 2015). Therefore, even if the
sphericity assumption is met according to the results of Mauchly’s test, researchers
might still want to use either the Greenhouse–Geisser correction or the Huynh–Feldt
correction, preferably the former as it is the more conservative. The Greenhouse–Geisser
and Huynh–Feldt correction values are available in the SPSS output for the RM-ANOVA.
Researchers might want to include these values when reporting their statistical
findings (see Larson-Hall, 2015, for more details).
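To make the role of the correction more concrete, the following Python sketch computes the F-ratio of a one-way RM-ANOVA and the Greenhouse–Geisser epsilon from a participants-by-conditions table. It is an illustration only and not the procedure used in the study, which relied on SPSS. The scores, the function names and the three-condition layout are invented for the example. The p-value is deliberately left out because it requires an F distribution, which the Python standard library does not provide (SciPy's `scipy.stats.f.sf` is one option).

```python
from statistics import mean

def rm_anova(scores):
    """One-way repeated-measures ANOVA.

    `scores` holds one row per participant and one column per condition
    (complete cases only). Returns (F, df_conditions, df_error).
    """
    n, k = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    cond_means = [mean(row[j] for row in scores) for j in range(k)]
    subj_means = [mean(row) for row in scores]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    # Between-participant variability is removed from the error term.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error

def greenhouse_geisser_epsilon(scores):
    """Greenhouse-Geisser estimate of epsilon (1/(k-1) <= epsilon <= 1)."""
    n, k = len(scores), len(scores[0])
    cond_means = [mean(row[j] for row in scores) for j in range(k)]
    # Sample covariance matrix of the k conditions.
    cov = [[sum((row[i] - cond_means[i]) * (row[j] - cond_means[j])
                for row in scores) / (n - 1)
            for j in range(k)] for i in range(k)]
    # Double-centre the matrix (it is symmetric, so row means = column means).
    row_means = [mean(r) for r in cov]
    total_mean = mean(row_means)
    centred = [[cov[i][j] - row_means[i] - row_means[j] + total_mean
                for j in range(k)] for i in range(k)]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(v * v for r in centred for v in r)
    return trace ** 2 / ((k - 1) * sum_sq)

# Invented scores: 5 participants (rows) x 3 writing tasks (columns).
scores = [
    [10, 12, 15],
    [8, 11, 12],
    [12, 13, 17],
    [9, 10, 14],
    [11, 14, 17],
]

f_value, df1, df2 = rm_anova(scores)
eps = greenhouse_geisser_epsilon(scores)
print(f"F({df1}, {df2}) = {f_value:.2f}")
print(f"Greenhouse-Geisser epsilon = {eps:.3f}")
print(f"Corrected df = ({eps * df1:.2f}, {eps * df2:.2f})")
```

The F-ratio itself is unchanged by the correction; only the degrees of freedom against which it is evaluated are multiplied by epsilon. An epsilon of 1 corresponds to perfect sphericity, and smaller values make the test more conservative.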
Finally, when reporting the statistical results, it would be good to provide graphic depictions
of the findings (see Larson-Hall, 2015). This would not only give the readers a better general
impression of the findings but also offer an effective way to summarize and present the
study’s results clearly at both the group and individual levels.
Challenges while Implementing an RM-ANOVA Study
Like any research project, an RM-ANOVA study is bound to pose challenges to
researchers. These challenges include not only devising the research instrument and recruiting
the participants, as highlighted earlier in this article, but also minimizing the participants’
attrition throughout the research project. The following section is devoted to these issues.
Recruiting Participants
Recruiting the participants for this study was the main challenge. The participants were
selected on a voluntary basis, according to their willingness and interest to be a part of the
study. The participants were from the same university but from different academic programs and
courses. As a result, they had different timetables for the lectures and tutorials. This made it
quite challenging to arrange the dyadic and triadic L2 writing sessions. Therefore, deciding
whether the participants would come from the same or different university programs and courses
could be important for a smoother implementation of the research project.
Participants’ Attrition and No-show
Attrition of the participants was another major challenge. From the initial 126 students
who showed their interest in the study, only 36 were able to provide the full set of the data.
Ethical considerations require that participation in a research project be entirely voluntary
and that the participants be free to withdraw from the study at any time they wish. To deal with
the challenge of participants’ attrition, which might jeopardize the success of an RM-ANOVA
study, the researcher must foresee and minimize its negative consequences. This is especially
important for an RM-ANOVA study as the analysis of the within-group data demands that the same
people participate in each and every session of the data collection.
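The requirement that every participant contributes data at every session can be sketched as a simple filtering step. The Python example below is a minimal illustration, not the author's procedure: the participant IDs, session numbers and scores are hypothetical. Only the figures of 126 recruited and 36 completing students come from the study.

```python
def complete_cases(records, sessions):
    """Keep only participants who have a score for every session."""
    return {
        pid: by_session
        for pid, by_session in records.items()
        if all(s in by_session for s in sessions)
    }

def attrition_rate(n_recruited, n_completed):
    """Proportion of recruited participants who did not complete the study."""
    return (n_recruited - n_completed) / n_recruited

# Hypothetical records: participant ID -> {session number: score}.
records = {
    "P01": {1: 14.2, 2: 15.0, 3: 16.1},
    "P02": {1: 12.5, 3: 13.0},          # missed session 2
    "P03": {1: 11.8, 2: 12.9, 3: 13.4},
    "P04": {1: 13.3},                   # withdrew after session 1
}

usable = complete_cases(records, sessions=[1, 2, 3])
print(sorted(usable))  # participants usable for the within-group analysis

# Figures reported in the study: 126 students recruited, 36 completed.
print(f"Attrition: {attrition_rate(126, 36):.1%}")
```

With the study's figures, 90 of the 126 students (about 71%) did not provide a full data set, which shows why the initial pool has to be considerably larger than the target sample.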
In addition, the participants’ no-show is a serious challenge. When some participants are
absent on specific data collection days, the dyadic and triadic groups cannot be formed as
planned, because the groupings are arranged in advance. Therefore, to avoid losing the much
needed data, the session needs to be rescheduled. To minimize this challenge it is desirable to
establish a good rapport between the researcher and the participants.
Preventing the Carryover Effects
Participating in a study with multiple data collection sessions, as is required in an
RM-ANOVA design, can be quite demanding for the participants. To prevent research fatigue and
loss of motivation and to minimize the carryover effect, the data collection schedule in the
current study was planned with two-week intervals between the data collection points. However,
the caveat is that longer intervals might increase the attrition rate. To reduce this possibility,
the researcher explained to the participants the negative consequences for the research
project if they withdrew half-way through the project or did not attend the writing sessions.
This highlights the challenge of creating ‘study participants’ in the true meaning of the term
and of making the students realize that they are important and valuable stakeholders in the
research project.
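A schedule with fixed two-week intervals can be generated in advance and shared with the participants. The short Python sketch below shows one way to do this. The start date and the number of sessions are hypothetical, as the article reports only the two-week spacing and the overall period of about two months.

```python
from datetime import date, timedelta

def session_dates(first_session, n_sessions, interval_days=14):
    """Return the dates of all sessions, spaced `interval_days` apart."""
    return [first_session + timedelta(days=interval_days * i)
            for i in range(n_sessions)]

# Hypothetical start date and session count, for illustration only.
schedule = session_dates(date(2019, 3, 4), n_sessions=4)
for number, day in enumerate(schedule, start=1):
    print(f"Session {number}: {day.isoformat()}")
```

Fixing the dates early also makes it easier to send reminders before each session and to spot timetable clashes among participants from different programs.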
Conclusions
An RM-ANOVA study requires a meticulous design and well-planned execution. The
steps taken by the researcher, from the development of the research design through to
the data analysis and the reporting of the findings, can ‘make or break’ the success of an
RM-ANOVA research project.
As this article has highlighted, the challenges while implementing the RM-ANOVA
study ranged from the participant recruitment phase to the research instrument development
stage and to the data collection phase. Arranging and grouping the participants for the dyadic
and triadic L2 writing sessions was one of the main challenges at the data collection phase.
Maintaining the participants’ interest throughout the study was another issue that needed to be
properly addressed by the researcher. Moreover, keeping the participants alert and keen during
the group discussions and the writing sessions that immediately followed was of paramount
importance for obtaining valid and reliable data and implementing the research project.
As Larson-Hall (2015) notes, research designs that incorporate repeated measures, such
as the RM-ANOVA design, are “quite desirable, as they increase the statistical power of a test”
(p. 323). The current article described issues and challenges in a study that adopted the principle
of natural progression of task complexity from simple to complex. Future studies that adopt an
RM-ANOVA design might want to investigate the consequences of the reverse change in task
complexity, from complex to simple. It is hoped that the methodological issues and challenges
highlighted in the current article, as well as the suggestions provided, will help researchers in
their efforts to design and implement future RM-ANOVA studies.
References
Duff, P. A. (1985). Another look at interlanguage talk: Taking task to task. University of Hawai'i
Working Papers in English as a Second Language, 4(2).
Ellis, R. (2003). Task-based language teaching and learning. Oxford: Oxford University Press.
Foster, P., & Skehan, P. (1996). The influence of planning and task type on second language
performance. Studies in Second Language Acquisition, 18(03), 299-323.
Frear, M. W., & Bitchener, J. (2015). The effects of cognitive task complexity on writing
complexity. Journal of Second Language Writing, 30, 45-57.
doi:10.1016/j.jslw.2015.08.009
Halford, G. S., Cowan, N., & Andrews, G. (2007). Separating cognitive capacity from
knowledge: A new hypothesis. Trends in cognitive sciences, 11(6), 236-242.
Ishikawa, T. (2007). The effect of manipulating task complexity along the [+/-Here-and-Now]
dimension on L2 written narrative discourse. Investigating tasks in formal language
learning, 136-156.
Khodabandeh, F., Jafarigohar, M., Soleimani, H., & Hemmati, F. (2013). The impact of explicit,
implicit, and no-formal genre-based instruction on argumentative essay writing.
Linguistics Journal, 7(1).
Kuiken, F., & Vedder, I. (2007). Task complexity and measures of linguistic performance in L2
writing. International Review of Applied Linguistics in Language Teaching (IRAL), 261-
284. doi:10.1515
Kuiken, F., & Vedder, I. (2008). Cognitive task complexity and written output in Italian and
French as a foreign language. Journal of Second Language Writing, 17(1), 48-60.
doi:10.1016/j.jslw.2007.08.003
Kuiken, F., & Vedder, I. (2009). Tasks across modalities: The influence of task complexity on
linguistic performance in L2 writing and speaking. Paper presented at the colloquium
‘Tasks across modalities’, Task based Language Teaching Conference, Lancaster, UK.
Kuiken, F., & Vedder, I. (2011). Task performance in L2 writing and speaking: The effect of
mode. Second language task complexity: Researching the Cognition Hypothesis of
language learning and performance, 91-104.
Kuiken, F., & Vedder, I. (2012). Syntactic complexity, lexical variation and accuracy as a
function of task complexity and proficiency level in L2 writing and speaking. Dimensions
of L2 performance and proficiency: Complexity, accuracy and fluency in SLA, 143-170.
Larsen-Freeman, D. (2006). The emergence of complexity, fluency, and accuracy in the oral and
written production of five Chinese learners of English. Applied Linguistics, 27(4), 590-
619.
Larson-Hall, J. (2015). A guide to doing statistics in second language research using SPSS and
R. Routledge.
Long, M. H. (1990). Task, group, and task-group interactions.
Long, M. H. (2015). Second language acquisition and task-based language teaching. Hoboken:
Wiley Blackwell.
Loewen, S., & Plonsky, L. (2015). An A–Z of applied linguistics research methods. Macmillan
International Higher Education.
Rahimi, M. (2018). Effects of increasing the degree of reasoning and the number of elements on
L2 argumentative writing. Language Teaching Research, 1362168818761465.
Révész, A. (2011). Task Complexity, focus on L2 constructions, and individual differences: A
classroom-based study. Modern Language Journal, 95, 162-181. doi:10.1111/j.1540-
4781.2011.01241.x
Robinson, P. (2001a). Task complexity, cognitive resources, and syllabus design: A triadic
framework for examining task influences on SLA. Cognition and second language
instruction, 287-318.
Robinson, P. (2001b). Task complexity, task difficulty, and task production: Exploring
interactions in a componential framework. Applied Linguistics, 22(1), 27-57.
Robinson, P. (2003a). Attention and memory during SLA. The Handbook of Second Language
Acquisition, 631-678.
Robinson, P. (2003b). The cognition hypothesis, task design, and adult task-based language
learning. Second Language Studies, 21(2), 45-105.
Robinson, P. (2005). Cognitive complexity and task sequencing: Studies in a componential
framework for second language task design. IRAL-International Review of Applied
Linguistics in Language Teaching, 43(1), 1-32.
Robinson, P. (2007). Task complexity, theory of mind, and intentional reasoning: Effects on L2
speech production, interaction, uptake and perceptions of task difficulty. IRAL-
International Review of Applied Linguistics in Language Teaching, 45(3), 193-213.
Robinson, P., & Gilabert, R. (2007). Task complexity, the Cognition Hypothesis and second
language learning and performance. IRAL, 45, 161-176.
Ruiz-Funes, M. (2015). Exploring the potential of second/foreign language writing for language
learning: The effects of task factors and learner variables. Journal of Second Language
Writing, 28, 1-19. doi:10.1016/j.jslw.2015.02.001
Skehan, P. (1996). A framework for the implementation of task-based instruction. Applied
Linguistics, 17(1), 38-62.
Skehan, P. (2009). Modelling second language performance: Integrating Complexity, Accuracy,
Fluency, and Lexis. Applied Linguistics, 30(4), 510-532. doi:10.1093/applin/amp047
Veerappan, V., Yusof, D. S. M., & Aris, A. M. (2013). Language-switching in L2 composition
among ESL and EFL undergraduate writers. Linguistics Journal, 7(1), 209-228.
Willis, D., & Willis, J. (2008). Doing task-based teaching. Oxford University Press.
Copyright of Linguistics Journal is the property of E.L.E. Publishing and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.