
Computers & Education 175 (2021) 104325

Available online 11 September 2021 0360-1315/© 2021 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

The longitudinal trajectories of online engagement over a full program

Mohammed Saqr a,b,*, Sonsoles López-Pernas c

a KTH Royal Institute of Technology, EECS - School of Electrical Engineering and Computer Science, Lindstedtsvägen 3, SE-100 44 Stockholm, Sweden
b University of Eastern Finland, School of Computing, Joensuu, Yliopistokatu 2, Joensuu, fi-80100, Finland
c Universidad Politécnica de Madrid, ETSI Telecomunicación, Departamento de Ingeniería de Sistemas Telemáticos, Avda. Complutense 30, 28040, Madrid, Spain

ARTICLE INFO

Keywords: Longitudinal engagement; Trajectories of engagement; Learning analytics; Sequence mining; Survival analysis

ABSTRACT

Student engagement has a trajectory (a timeline) that unfolds over time and can be shaped by different factors including learners’ motivation, school conditions, and the nature of learning tasks. Such factors may result in either a stable, declining or fluctuating engagement trajectory. While research on online engagement is abundant, most authors have examined student engagement in a single course or two. Little research has been devoted to studying online longitudinal engagement, i.e., the evolution of student engagement over a full educational program. This learning analytics study examines the engagement states (sequences, successions, stability, and transitions) of 106 students in 1396 course enrollments over a full program. All data of students enrolled in the academic year 2014–2015, and their subsequent data in 2015–2016, 2016–2017, and 2017–2018 (15 courses) were collected. The engagement states were clustered using Hidden Markov Models (HMM) to uncover the hidden engagement trajectories which resulted in a mostly-engaged (33% of students), an intermediate (39.6%), and a troubled (27.4%) trajectory. The mostly-engaged trajectory was stable with infrequent changes, scored the highest, and was less likely to drop out. The troubled trajectory showed early disengagement, frequent dropouts and scored the lowest grades. The results of our study show how to identify early program disengagement (activities within the third decile) and when students may drop out (first year and early second year).

1. Introduction

As online learning environments became increasingly common, along with all types of technology-enhanced learning (Palvia et al., 2018), a wealth of online data has driven interest in using modern analytical methods for understanding, profiling, and supporting student engagement (Barthakur et al., 2021; Gašević, Jovanović, Pardo, & Dawson, 2017; Henrie, Halverson, & Graham, 2015). Research in learning analytics has produced valuable insights regarding, e.g., engagement patterns, students’ profiles, and the relationship of engagement with achievement (Barthakur et al., 2021; Henrie et al., 2015; Pardo et al., 2015). Such insights have relied on a diverse range of methods such as clustering, sequence mining, process mining, and predictive modelling (Bos, 2016; Cornwell, 2018; Gašević et al., 2017; López-Pernas, Saqr, & Viberg, 2021; Matcha, Gašević, Uzir, Jovanović, & Pardo, 2019). A pervasive pattern in such research is that studies have examined individual courses, or a few courses with different students (Barthakur et al., 2021). While such studies have shed light on students’ learning strategies, approaches to learning, or engagement, little is known about the longitudinal online engagement in a full year of studies, a curriculum, or a program (Barthakur et al., 2021). The abundance of online data offers researchers a scalable method for collecting and analyzing data about learners. Additionally, the study of longitudinal engagement (or disengagement) enables researchers to understand learning strategies at the program scale and offer solutions to problems like disengagement, attrition, or poor academic performance in a program (Archambault & Dupéré, 2017; Barthakur et al., 2021; Gašević et al., 2017; Zhen et al., 2020).

* Corresponding author. University of Eastern Finland, School of Computing, Joensuu Campus, Yliopistokatu 2, fi-80100 Joensuu, Finland P.O. Box 111, fi-80101, Joensuu, Finland.

E-mail address: [email protected] (M. Saqr).

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/compedu

https://doi.org/10.1016/j.compedu.2021.104325 Received 7 April 2021; Received in revised form 28 August 2021; Accepted 4 September 2021

Building on the existing body of literature, this article aims to study the longitudinal online engagement of students in a full academic program. Our method follows the latest research in the field of learning analytics and extends the existing methods with modern sequence typology methods. In particular, we use Latent Class Analysis (LCA) (Goodman, 1974; McCutcheon, 1987) to reveal the clusters of students’ learning strategies and engagement patterns in each course (Barthakur et al., 2021; Carroll & White, 2017; Park, Yu, & Jo, 2016). We further study the succession of engagement states using sequence pattern mining methods. We take advantage of Hidden Markov Models (HMM) to uncover the hidden engagement trajectories (sequence of similar engagement patterns across the program or typologies) (Helske et al., 2015, 2018). HMM has the advantage of being able to reveal hidden longitudinal patterns, as well as summarize complex information. For each trajectory, we study the patterns of engagement, stability, and deviations (transitions from one trajectory to another). We conclude by studying the fate of engagement trajectories and their relation to performance and dropout. The contribution of this study is twofold: 1) it contributes to the body of the literature on online engagement trajectories, the criteria of the trajectories, stability thereof, and fate, and 2) it demonstrates the sequence typology methods and the wealth of insights they can bring. The research questions of our study are:

RQ1. What are the engagement states at the course level and what are their defining features?

RQ2. How stable are the learners’ engagement states at the course level (within and between students) and, if unstable, how do these engagement states change?

RQ3. What are the trajectories of students’ engagement across the full program? What are their defining features? How stable are these trajectories and, if unstable, how do they change?

RQ4. What is the temporal relationship between engagement trajectories and dropping out of the program or academic achievement (GPA)?

2. Background

2.1. Engagement as a construct

Many scholars posit that engagement is a “meta” concept that integrates –or overlaps with– other theoretical constructs such as motivation and self-regulation (Finn & Zimmer, 2012; Fredricks, Blumenfeld, & Paris, 2004). According to Fredricks et al. (2004), engagement is a multidimensional construct with integrated sub-components of behavior, emotion, and cognition. These components are dynamically intertwined and linked within the individual (Fredricks et al., 2004; Fredricks & McColskey, 2012). The behavioral component reflects following school norms, efforts to study, or participation in academic activities. Such behavior can be observed in regular attendance, participation, and commitment to homework and assignments. In online settings, behavioral engagement manifests in, e.g., students’ use of online resources, participation, regularity, and the patterns thereof (Azcona, Hsiao, & Smeaton, 2019; Rienties, Tempelaar, Nguyen, & Littlejohn, 2019). Emotional engagement reflects learners’ feelings about school, peers, and their learning experience, which includes emotional reactions such as interest, disinterest, happiness, sadness, and anxiety. Emotional engagement helps students engage behaviorally in school activities, feel belonging, and persist in education. Cognitive engagement reflects students’ efforts which can range from memorizing and trying to understand learning tasks to investing thoughtful energy to grasp advanced concepts or acquire complex skills. Cognitive engagement also includes the use of self-regulated learning strategies, preference for challenge, and positive coping in tough situations. As understanding of the concepts evolved, newer definitions emphasized the multidimensional nature of engagement (Finn & Zimmer, 2012; Fredricks et al., 2004; Fredricks & McColskey, 2012; Furrer & Skinner, 2003). Such dimensions are dynamically and closely interrelated, e.g., positive feelings about school motivate behavioral engagement in school activities, and cognitive engagement (Azevedo, 2015; Finn & Zimmer, 2012; Fredricks et al., 2004). Researchers have demonstrated that all dimensions of engagement (behavioral, emotional, and cognitive) are significant catalysts of academic achievement. Such a relationship has been repeatedly confirmed in all levels of education (King, 2015; Lei, Cui, & Zhou, 2018; Wang & Degol, 2014).

Engagement has obvious qualities that can be observed, tracked, and easily understood by teachers (Fredricks & McColskey, 2012; Henrie et al., 2015). Since engagement is malleable, disengaged students are amenable to intervention and support (Finn & Zimmer, 2012; Fredricks et al., 2004). Furthermore, engagement may be seen as mediated by motivation. That is, students who have a positive belief about their capacity to succeed (academic self-concept) or see their learning tasks as valuable or enjoyable (subjective task value) are more likely to be engaged and succeed (Gottfried, Fleming, & Gottfried, 2001; Wang & Eccles, 2013). Therefore, an intervention that promotes students’ motivation, or improves school environment or learning tasks has the potential to enhance engagement and academic achievement (Gottfried et al., 2001).


2.2. Engagement in online learning

Online learning has become an essential part of today’s education; courses and programs supported by technology or exclusively delivered online continue to grow in size, scale, and adoption (Escueta, Quan, Nickow, & Oreopoulos, 2017; Palvia et al., 2018). A remarkable upsurge in online education has been seen since the outbreak of the COVID-19 pandemic (Saqr & Wasson, 2020). The adoption of online learning has brought several opportunities, e.g., enabling the provision of learning resources that transcend the barriers of place and time, offering rich interactive multimedia, and facilitating collaborative learning (Escueta et al., 2017; López-Pernas, Gordillo, Barra, & Quemada, 2021; Schöbel et al., 2020). However, online learning requires certain qualities to succeed, e.g., self-regulation, agency, and engagement. Therefore, it has proven challenging for some students (Bol & Garner, 2011; Dabbagh & Kitsantas, 2004). As such, researchers and educators have searched for ways to better support students’ online learning, and therefore engagement has gained significant momentum (Bol & Garner, 2011; Dabbagh & Kitsantas, 2004).

Learning analytics affords researchers a seamless and unobtrusive “behind the scenes” method of collecting fine-grained data about learners’ behavioral engagement and holds the promise for optimizing learning and learning environments (Henrie et al., 2015). Such fine-grained data about students’ engagement can be collected in real time enabling the provision of timely feedback (Rienties et al., 2019). What is more, such feedback can be personalized, automated, and delivered at scale. Therefore, a significant corpus of research has examined students’ engagement in online learning. Such research has shed light on the measurement of engagement, early signs of disengagement, and the prediction of disengagement, to mention a few (Henrie et al., 2015; Lei et al., 2018; Saqr, Fors, & Tedre, 2017).

2.3. Person-centered methods

Variable-oriented methods capture the relationships and interactions among variables. Thus, variable-oriented methods are immensely helpful since they can produce valuable information about what is expected from a group of learners on average (Hickendorff, Edelsbrunner, McMullen, Schneider, & Trezise, 2018; Rosato & Baer, 2012). However, “oftentimes, an ‘average’ learning pattern is not an adequate description for many learners because this ignores the unobserved heterogeneity between learners” (Hickendorff et al., 2018, p. 5). Therefore, variable-oriented methods have been criticized for being a misleading over-generalization: “that study findings represent the overall sample” (Rosato & Baer, 2012, p. 61). Person-centered methods, commonly implemented as Latent Class Analysis (LCA) or Latent Profile Analysis, aim at capturing distinct patterns of heterogeneity and variations within human behavior and experiences (Hagenaars and Jacques A, 2002; Hickendorff et al., 2018; Rienties et al., 2019; Rosato & Baer, 2012). In doing so, person-centered methods help researchers discover latent classes or “hidden subgroups” based on combinations of shared observable characteristics (Goodman, 1974; Hickendorff et al., 2018; McCutcheon, 1987; Rosato & Baer, 2012).

2.4. Engagement as a heterogeneous process

Using person-centered methods (e.g., LCA) to study engagement has the potential to advance our understanding of the heterogeneity and diversity within students’ subpopulations and consequently offer the appropriate support (Wang & Degol, 2014). Evidence of sub-populations of students’ engagement has been reported by, e.g., Wang and Eccles (2013), who used latent profile analysis to find students’ engagement profiles. The authors reported five clusters of engagement profiles with varying patterns that influenced their educational and psychological functioning (Wang & Eccles, 2013). Rienties et al. (2019, p. 245) used person-centered methods to identify “distinct clusters of behavioral engagement in an online e-tutorial”. Further evidence of the heterogeneity of engagement among students’ sub-population was also reported in longitudinal studies (Archambault & Dupéré, 2017; Zhen et al., 2020). Zhen et al. (2020) reported four profiles of students’ engagement: persistent, climbing, descending, and struggling. Archambault and Dupéré (2017) reported five groups: stable high, stable moderate, transitory inclining, transitory declining, and constantly declining.

The heterogeneity within engagement profiles of online learners has also been a common finding among learning analytics researchers who reported distinct profiles of learners using other person-centered techniques such as clustering and sequence mining based on learners’ online activities, use of resources, and navigational behavior. Three levels of engagement (or subgroups) are commonly identified (Barthakur et al., 2021; Jovanović, Dawson, Joksimović, & Siemens, 2020, 2017; Kovanović et al., 2019; López-Pernas, Saqr, & Viberg, 2021; Matcha, Gašević, Uzir, Jovanović, & Pardo, 2019): 1) active, intense, or highly-engaged learners, who are intensive users of available resources and invest a considerable amount of energy in their approach to learning, 2) disengaged learners, who show low levels of participation and invest less time and effort doing learning activities, and 3) a third group of selective learners, who invest average time and efforts in their learning, directing most of their efforts to accomplish their tasks with reasonable efforts.

2.5. The trajectories of engagement

Engagement has a trajectory (a pathway or a timeline) that unfolds over time and can be shaped by different factors including learners’ motivation, school conditions, teachers, peers, social context, and the nature of learning tasks (Fredricks et al., 2004; Saqr, Nouri, & Jormanainen, 2019; Wang & Degol, 2014; You & Sharkey, 2009). Such factors may result in either a stable engaged trajectory, a declining trajectory, or a fluctuating trajectory between states over time (Finn & Zimmer, 2012; Skinner & Pitzer, 2012; Wang & Degol, 2014). So far, research in face-to-face settings is inconclusive regarding the longitudinal trajectory of engagement. Some scholars suggest that engagement shows marked inter-individual stability with low variation between the years, and high cross-time correlation (Gottfried et al., 2001; Janosz, Archambault, Morizot, & Pagani, 2008; Skinner & Belmont, 1993). That is, for an individual student, engagement during an academic year predicts engagement in the next year (Gottfried, Marcoulides, Gottfried, Oliver, & Guerin, 2007; Skinner & Pitzer, 2012). Others posit that engagement declines through the years (Wigfield, Eccles, Schiefele, Roeser, & Davis-Kean, 2007), while some suggest that engagement can grow under optimal conditions (You & Sharkey, 2009). Recently, some researchers have pointed to a heterogeneous pattern in which different groups of students exhibit different engagement profiles (Archambault & Dupéré, 2017; Li & Lerner, 2011; Pagani, Fitzpatrick, & Parent, 2012; Wigfield et al., 2007; Zhen et al., 2020), where some students maintain a stable state of engagement (usually the majority) and another group shows variations: ascending to higher engagement levels or declining to disengagement (Archambault & Dupéré, 2017; Li & Lerner, 2011; Zhen et al., 2020). The aforementioned studies have been conducted in face-to-face settings, while online longitudinal engagement over a full program remains an area in need of further research.

2.6. Modelling trajectories of engagement

Accounting for the sequential relationship between states/events is particularly important when modelling the timeline of certain observations or behavior. Therefore, sequence mining, a data analysis method that emphasizes the sequential order, was conceptualized (Agrawal & Srikant, 1995). Typology is a method aimed at identifying the patterns of the sequence of events or the trajectories. Such methods have been well-established in the study of longitudinal life events such as career pathways or sequences of marital life events (Helske, Helske, & Eerola, 2018; Malin & Wise, 2016; Ritschard & Studer, 2018). Since we are interested in revealing the trajectory of longitudinal engagement over a program, we used HMM, a method that has been used in modelling life events, as well as in several other fields, to model, e.g., temporal processes. HMMs have been frequently used to reveal students’ profiles (Jeong et al., 2008), and have proven particularly useful in mining longitudinal sequential patterns or trajectories (Helske et al., 2018; Jeong et al., 2008). In HMM, sequence data are observed states, which are considered probabilistic functions of hidden (latent) states (trajectories). Such “hidden states cannot be observed directly, but only through the sequence(s) of observations”. A trajectory is a typical succession of similar behaviors that are as homogeneous as possible as well as distinct from other trajectories: for instance, a trajectory of a career pathway. The hidden states are hypothesized to be generated by a similar mechanism, thus resulting in a certain trajectory. Furthermore, HMM can reveal the unobservable state in data and has a summarizing power that allows compressing information from different types of observations into meaningful insights (Helske & Helske, 2019).
In this study, we use a combination of methods to identify students’ engagement states, describe the engagement states’ characteristics, group such states into homogeneous trajectories, and study such trajectories. The methods are described in detail in the next section.
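To make the idea of observed states as probabilistic functions of hidden states concrete, the following minimal sketch computes the likelihood of a short sequence of observed engagement states with the forward algorithm. It is an illustration only and not the model fitted in this study: the two hidden states, the observation coding (0, 1, 2), and every probability value are assumptions invented for the example.

```python
from itertools import product

# All parameters below are hypothetical values chosen for illustration.
INITIAL = [0.6, 0.4]             # P(hidden state at the first course)
TRANSITION = [[0.9, 0.1],        # P(next hidden state | current hidden state)
              [0.2, 0.8]]
EMISSION = [[0.7, 0.2, 0.1],     # P(observed state | hidden state)
            [0.1, 0.3, 0.6]]     # hypothetical coding: 0=active, 1=average, 2=disengaged

def forward_likelihood(obs, initial, transition, emission):
    """Probability of an observation sequence under an HMM (forward algorithm)."""
    n_states = len(initial)
    # alpha[s] = P(observations so far, hidden state now is s)
    alpha = [initial[s] * emission[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            emission[s][o] * sum(alpha[r] * transition[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

# Likelihood of one hypothetical student sequence over three courses.
likelihood = forward_likelihood([0, 0, 2], INITIAL, TRANSITION, EMISSION)

# Sanity check: the likelihoods of all length-3 sequences should sum to 1.
total = sum(
    forward_likelihood(list(seq), INITIAL, TRANSITION, EMISSION)
    for seq in product(range(3), repeat=3)
)
```

In a fitted model, the initial, transition, and emission probabilities would be estimated from the students’ sequences rather than set by hand.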

2.7. Motivation for this research

The majority of existing works have focused on individual courses, with an obvious scarcity of research on the longitudinal engagement across a program. However, a few examples exist. Lust, Elen, and Clarebout (2013) examined learners’ patterns of use of an online learning tool across two subsequent phases of a single blended learning course. They identified three distinct clusters of learners during the first phase whereas, in the second phase, they found an additional cluster, indicating changes in students’ patterns of engagement with the online tool across the two phases of the course. To address this dearth in existing literature, Barthakur et al. (2021) sought to cluster learners based on their program-level engagement profiles using their weekly activities across a full-year online Massive Open Online Course (MOOC). They found three distinct clusters of learners according to their program-level strategies: 1) consistently engaged: students who engaged with the various learning activities throughout each course and across the program; 2) disorganized: students who were mostly disengaged all the way through and sometimes slightly more active towards the end of each course; and 3) get-it-done: students who were mostly active at the start of each course across the program, focusing on the assessment components, and disengaged for the rest of the course.

3. Methods

3.1. Context

The context of our study was Qassim University. While it was a blended context, the online component was an integral part of the teaching strategy. Since the program was based on Problem-Based Learning (PBL), students were required to read and contribute to the weekly online PBL discussions several times a week. The content and objectives of the online PBL were the subjects of the lectures, seminars, and practical sessions. While some of these were delivered face-to-face, students had to extract the learning objectives, lectures, and handouts from the online Learning Management System (LMS), as well as get course updates, follow teachers’ announcements, and interact with colleagues. Therefore, engagement with the online component was an integral element of the program. Contributing to online PBL was “required” and students got a grade of 5% for contributing and replying to colleagues. The online lectures were important (but not mandatory) as they were the main source of learning resources. The LMS also contained the learning objectives that students had to achieve as well as the schedules and deadlines of the various assessments of the courses. The courses were homogeneous and shared the same teaching strategy (PBL), organization of PBL sessions, and teaching and assessment methods including the breakdown of assessment methods. There were around four courses (always referred to as blocks) each year with durations that averaged six weeks. The courses were sequential, and students had to complete them in order. Some practical courses were longitudinal (full year) and were taught in parallel to the ongoing course (e.g., clinical skills). Such courses were excluded from our study as they did not share the same teaching strategy, had different online content, and had an evaluation method based on practical performance.

Students’ performance was measured by the course grade, consisting of separate grades for 1) the final exam, 2) the level of engagement in the online forums, and 3) students’ continuous assessment with the learning tasks and interaction during class. The final exam accounted for 80% of the overall grade, whereas the latter two components accounted for the remaining 20%, distributed as follows: 10% for continuous assessment (e.g., practical assignments, seminar preparations, and engagement in lectures and course duties), 5% for participation in online forums (based on posting and responding to colleagues), and 5% for face-to-face PBL group sessions and engagement in lectures and seminars. The GPA is the total grade for all courses over the full program. In the program under study, students who do not attend a course or fail get only 60% of the grade if they retake it. “Withdrawn” means that a student was excused from attending the course while keeping all grades, for example because of sickness, travel, or a personal issue. Such students are allowed to take or “re-take” a special exam. These exams and studies are done privately, so their data are not represented online. Including these students (although very few) would erroneously show them as disengaged.
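The grade breakdown described above can be written as a simple weighted sum. The sketch below only illustrates that arithmetic: the component names and the example scores are hypothetical, and it assumes each component is scored on a 0 to 100 scale, which the text does not state.

```python
# Assessment weights as stated in the text (80% + 10% + 5% + 5%).
WEIGHTS = {
    "final_exam": 0.80,
    "continuous_assessment": 0.10,
    "online_forums": 0.05,
    "face_to_face_pbl": 0.05,
}

def course_grade(scores):
    """Combine component scores (assumed 0-100 scale) into one course grade."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical student scores, invented for illustration only.
example = {
    "final_exam": 70,
    "continuous_assessment": 90,
    "online_forums": 100,
    "face_to_face_pbl": 80,
}
grade = course_grade(example)  # 56 + 9 + 5 + 4, i.e. about 74
```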

3.2. Data collection and operationalization

The data were collected from the institution’s LMS (Moodle) for all courses of all students enrolled in the academic year 2014–2015, and their subsequent data in 2015–2016, 2016–2017, and 2017–2018 (15 courses). Only learning-related events in Moodle logs were considered; in other words, teacher ‘events’ such as clicks on profile pages, chats, and checking grades were not included in the analysis. In addition, very infrequent events (e.g., clicks on inactive wiki modules, workshops, and labels) were removed as they could not be used in the analysis. We operationalized online engagement –which is essentially behavioral engagement– following the dominant view of the literature on online learning (Azevedo, 2015; Henrie et al., 2015), as Henrie et al. (2015) summarized it in their literature synthesis on measuring engagement in online learning, “by computer-recorded indicators such as assignments completed; frequency of logins to the website; number and frequency of postings, responses, and views; number of podcasts, screencasts, or other website resources accessed; time spent creating a post; and time spent online.” It is worth mentioning that this type of data can only capture behavioral engagement and less so cognitive engagement. To ensure we captured the full gamut of students’ activities in our PBL program, we collected three categories of indicators with eight variables for each student in each course:

Frequency of activities

These indicators reflect students’ investment in coursework, participation with colleagues, and contribution to the collaborative process (Redmond, Heffernan, Abawi, Brown, & Henderson, 2018; Saqr, Viberg, & Vartiainen, 2020).

• Frequency of Course Browsing: The number of times a student viewed the course main page, which acts as the gateway for all other resources and displays course announcements and updates (e.g., new announcements by the teacher, new lectures, posts from peers, assignments, etc.).

• Frequency of Forum Consumption: The number of times a student read other students’ forum posts. “Reading forums” is operationalized as social engagement.

• Frequency of Forum Contributing: The number of times a student actively contributed to the forum content, i.e., created, updated, edited, or deleted posts. It is operationalized as student participation in the course collaborative activities and efforts to contribute with colleagues.

• Frequency of Lecture View: The number of times a student clicked on a learning resource, such as a recorded lecture or a “folder” with learning resources (group of learning materials).

Online time

These indicators reflect students’ on-task time and effort in online activities (Azevedo, 2015; Henrie et al., 2015).

• Session Count: The number of learning sessions of a student. A session was defined as an uninterrupted sequence of learning events in which the time gap between any two consecutive events is below a chosen threshold. The threshold was set to 20 min of inactivity, which corresponds to the 85th percentile of the time gaps between consecutive learning events in the dataset; a longer gap starts a new session. While there is no agreed-upon standard for session duration, we chose a relatively long session time gap since our students spent significant time reading or writing forum posts and sometimes looking up the literature to compose their posts. A shorter time gap would erroneously assign such actions to multiple sessions.

• Total session duration: The sum of the duration of all the learning sessions of a student. The session duration was calculated as the total time between the first and last event in a session.
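The two online-time indicators above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ code: the function names and the example timestamps are hypothetical, and only the 20 min cut-off and the first-to-last-event definition of session duration come from the text.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=20)  # inactivity cut-off reported in the text

def split_sessions(timestamps, gap=SESSION_GAP):
    """Group event timestamps into sessions; a gap of `gap` or more starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # first event, or inactivity reached the cut-off
    return sessions

def session_indicators(timestamps, gap=SESSION_GAP):
    """Return (session count, total session duration) for one student in one course."""
    sessions = split_sessions(timestamps, gap)
    # Duration of a session = time between its first and last event.
    total = sum((s[-1] - s[0] for s in sessions), timedelta())
    return len(sessions), total

# Hypothetical log timestamps for one student on one day.
day = datetime(2015, 1, 5)
events = [
    day.replace(hour=9, minute=0),
    day.replace(hour=9, minute=5),
    day.replace(hour=9, minute=24),   # 19 min after the previous event: same session
    day.replace(hour=10, minute=0),   # 36 min later: new session
    day.replace(hour=10, minute=10),
]
count, total = session_indicators(events)
```

Note that under this definition a session with a single event contributes zero duration.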

Activity and regularity

These indicators reflect students’ activity levels and regularity of activities (Jovanović, Mirriahi, Gašević, Dawson, & Pardo, 2019; Anonymized).


• Active Days: The number of days a student was active in the LMS, i.e., the number of days for which there were logs recorded for the student. Only clicks on learning materials were considered, e.g., opening a forum or lecture.

• Regularity: The regularity of student online behavior. It was calculated as the entropy of the course browsing events for each student during each course over the duration of the course, using the method described by Jovanović et al. (2019). Entropy as a regularity measure was shown to reflect students’ engagement as well as predict performance in multiple studies (Jovanović et al., 2019). Further details on entropy are in section 3.3.2.
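As a rough illustration of an entropy-based regularity indicator, the sketch below computes the Shannon entropy of how a student’s course browsing events are distributed over the days of a course. It is an assumption-laden simplification: the exact formulation of Jovanović et al. (2019) is not reproduced here, and the function name and the day-level granularity are choices made for this example.

```python
import math
from collections import Counter

def activity_entropy(event_days):
    """Shannon entropy (in bits) of a student's events across course days.

    `event_days` holds one day index per browsing event. Higher values mean
    the events are spread more evenly over the days on which they occur.
    """
    counts = Counter(event_days)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values()]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical students: one spreads events over four days, one crams into one day.
spread = activity_entropy([1, 2, 3, 4])
crammed = activity_entropy([1, 1, 1, 1])
```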

Modelling a long-term process over four years brings challenging variability over that long period. As we are interested in significant shifts from “engaged states” to “disengaged states”, small and fine-grained changes in the activities required coarsening. As Winne (2020) puts it, “reliability can sometimes be improved by tuning grain size of data so it is neither too coarse, masking variance within bins, nor too fine-grained, inviting distinctions that cannot be made reliably”. Therefore, converting students’ data into discrete values (most commonly deciles) has been a frequent variable processing step in learning analytics (Alves, Morais, & Miranda, 2017; Dewar, Hope, Jaap, & Cameron, 2021; Miyamoto et al., 2015; Sander & Services, 2016), especially when modelling long-term temporal processes (Alves et al., 2017; Dewar et al., 2021). Alves et al. (2017) used log-stream data from virtual learning environments (VLEs) and divided access logs into deciles of students’ activities to overcome “great variability of values” of access to the VLE. The authors found that being in the lower percentile is strongly negatively correlated with passing rates: 89.4% of students in the lowest decile did not pass any course. Using students’ data over a five-year program, Dewar et al. (2021) divided students’ engagement data into equal deciles “as a means to overcome variation in year average engagement scores”. Their findings indicated that over 50% of students who failed the summative exams were in the bottom two deciles. In another study from Harvard University, the authors used deciles of session counts and total time to allow comparisons in twenty MOOCs to predict certification (Miyamoto et al., 2015).
In summary, using a discretization method (deciles, or binning in general) in modelling a longitudinal process may help overcome variations (Alves et al., 2017; Dewar et al., 2021; Miyamoto et al., 2015), as well as facilitate the communication of results to stakeholders. To that end, all indicators were discretized within each course into 10 decile bins, so that every bin contained an equal number of students, similar to Alves et al. (2017) and Miyamoto et al. (2015). The least active student would be in the lowest bin (1), and the most active would be in the highest bin (10).
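The within-course discretization can be sketched as a rank-based decile assignment. This is a minimal illustration, not the authors’ implementation: the function name and the activity counts are hypothetical, and ties are broken by input order here, whereas the tie-handling rule actually used is not stated in the text.

```python
def decile_bins(values):
    """Assign each value a bin from 1 (least active) to 10 (most active) by rank.

    Intended to be applied to one indicator within one course, so that bins
    hold (as near as possible) equal numbers of students.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices from lowest to highest
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * 10 // n + 1
    return bins

# Hypothetical activity counts for 20 students in one course.
activity = [3, 41, 7, 19, 0, 88, 12, 5, 27, 64, 9, 33, 51, 2, 16, 70, 23, 45, 1, 95]
bins = decile_bins(activity)
```

With 20 students, each of the ten bins receives two students; the student with 0 events falls in bin 1 and the student with 95 events in bin 10.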

Fig. 1. Summary of the methods used in the present work.


3.3. Data analysis

The present section details the methods used to analyze the data and answer the research questions of this study. Fig. 1 summarizes the complete data analysis process, which is described step by step in the following subsections.

3.3.1. Clustering of engagement states

To answer RQ1, LCA was selected to cluster students according to their engagement at the course level (Goodman, 1974; McCutcheon, 1987). LCA is a flexible method commonly used for understanding patterns of human behavior and identifying subgroups based on combinations of shared observable characteristics. LCA assigns to each student a probability of belonging to each subgroup based on maximum likelihood estimation (Goodman, 1974; McCutcheon, 1987). The student is then assigned to the group to which he/she belongs with the highest probability (Carroll & White, 2017; Park et al., 2016). To identify the number of clusters of engagement, and since there is no single criterion to select the number of subgroups (Weller, Bowen, & Faubert, 2020), we started by estimating a one-class model and then adding more classes up to ten (Weller et al., 2020). Our class enumeration was informed by the previous literature that identified a limited number of levels (mostly three) (Barthakur et al., 2021). Therefore, we limited our search to ten classes. The best model was selected based on a combination of model fit and the lowest Akaike information criterion (AIC) and Bayesian information criterion (BIC) (Goodman, 1974; McCutcheon, 1987; Rosato & Baer, 2012; Weller et al., 2020). The input of the LCA was the students’ eight activity indicators (divided into deciles as described in section 3.2). Students were classified according to their engagement level within each course. We refer to this as “engagement state” throughout the manuscript. To estimate the differences among the states identified, we performed a one-way analysis of variance (ANOVA) (Vieira, 2011). To rigorously evaluate the magnitude of the obtained results, we calculated the eta-squared (η²), partial eta-squared (ηp²), and omega-squared (ω²) effect sizes. Post-hoc pairwise comparisons were performed through Dunn’s test to verify the magnitude and significance of the differences of states (clusters), with Holm’s correction for multiple testing (Holm, 1979).
The homogeneity assumption among clusters was checked (and satisfied) for all variables using Levene’s test (Vieira, 2011). The results of the LCA algorithms were plotted and described in detail. Some students (in 13 instances) applied for a formal withdrawal request (unenroll) from a course offering, and so their data from that course was marked as withdrawal. Withdrawal requests are granted to students on request with no excuse required. Lastly, it is worth mentioning that LCA assumes conditional independence, in which “each item within a class is independent of other items within that class” (Rosato & Baer, 2012, p. 6), also often referred to as local independence (Rosenberg, Beymer, Anderson, van Lissa, & Schmidt, 2018). Therefore, we calculated all possible pairwise correlations (n = 84) between all examined variables within each cluster. The results showed that the majority of pairwise correlations within each cluster were weak and all pairwise correlations were well below the threshold of 0.8 for multicollinearity.
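The class enumeration step can be sketched as follows, assuming the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of free parameters and n the number of observations. The helper names and the candidate log-likelihoods used in any call are hypothetical; they are not the values estimated in this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def select_model(candidates, n_obs):
    """candidates: {n_classes: (log_likelihood, n_params)} (hypothetical inputs).

    Returns (best_by_aic, best_by_bic, table), where table maps
    n_classes -> (AIC, BIC). Lower values indicate a better trade-off
    between fit and complexity.
    """
    table = {
        k: (aic(ll, p), bic(ll, p, n_obs))
        for k, (ll, p) in candidates.items()
    }
    best_by_aic = min(table, key=lambda k: table[k][0])
    best_by_bic = min(table, key=lambda k: table[k][1])
    return best_by_aic, best_by_bic, table
```

Information criteria are only one input to the decision. As noted above, the final choice also drew on model fit and on the number of engagement levels reported in prior literature.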

3.3.2. Sequence mining

To answer RQ2, we first implemented sequence mining to construct and study course engagement sequences. For each student, engagement states were ordered chronologically according to the course start date. Since our dataset has 15 courses, each student is represented by a sequence consisting of, at most, 15 engagement states (one per course). An example of a sequence for a student looks as follows:

Our analysis aimed to reveal the distribution of each engagement state at each time point in the whole university program as well as the stability or lack thereof (e.g., how frequently a student goes from being engaged to disengaged). The TraMineR R package (Gabadinho, Ritschard, Müller, & Studer, 2011) was used to construct a state sequence object from the chronologically-ordered course engagement states. The distribution of the sequence of states was plotted to demonstrate the ratio of each course engagement state at each time point. An index plot was used to visualize the longitudinal timeline of engagement states for every single student. The index plot represents each student as a sequence of horizontally stacked colored bars of course engagement states in sequential order (1–15), thus demonstrating the temporal succession of different states.

To measure how stable course engagement states were within each student (intra-individual stability, i.e., how frequently a student would change from one engagement state to another) we calculated the within-student entropy based on Shannon’s formula (Gabadinho, Ritschard, Studer, & Nicolas, 2009). Entropy is lowest for students with homogeneous course engagement states (i.e., entropy is zero for students with a single engagement state) and highest for students with maximum changes in engagement states at different time points. To measure the heterogeneity of course engagement states among subgroups of students, the between-student entropy was calculated at each time point. Values of between-student entropy are lowest when all students have the same course engagement state at the same time point and increase with heterogeneity. To understand the patterns of transitions of engagement states, we computed the frequent subsequences of engagement. We report the 10 most frequent subsequences with the highest support. Support is defined as the proportion of students whose sequence contains the subsequence at least once.
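The two entropy measures can be illustrated with a short sketch based on Shannon's formula, H = −Σ p_i ln(p_i). Two assumptions are made here for illustration only: entropy is normalized by its maximum, ln(number of states), so that it ranges from 0 to 1, and the alphabet comprises the four labels used in this article. The function names are hypothetical and do not reproduce any package's interface.

```python
import math
from collections import Counter

# Assumed alphabet of course engagement states (labels from this article).
STATES = ("Active", "Average", "Disengaged", "Withdraw")


def shannon_entropy(states, alphabet_size=len(STATES), normalize=True):
    """Shannon entropy of the state distribution in `states`."""
    counts = Counter(states)
    total = len(states)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    if normalize and alphabet_size > 1:
        h /= math.log(alphabet_size)  # assumption: scale to the 0..1 range
    return h


def within_student_entropy(sequence):
    """Entropy of one student's states across courses."""
    return shannon_entropy(sequence)


def between_student_entropy(sequences, position):
    """Entropy across students at one time point (0-based course position).

    Students whose sequence ended before `position` (e.g. dropouts)
    are skipped.
    """
    column = [seq[position] for seq in sequences if len(seq) > position]
    return shannon_entropy(column)
```

A student who stays in one state throughout has a within-student entropy of zero, whereas a student who spends equal time in every state reaches the maximum of one under this normalization.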

3.3.3. Modelling and studying the trajectories of engagement

To answer RQ3, we used sequence mining and HMM, which are standard techniques for modelling longitudinal life events (Helske et al., 2015, 2018). To identify the program level trajectories of learners’ engagement, we used the R library seqHMM to group sequences of course engagement states into similar trajectories, i.e., typical successions of the course engagement states that are as unique and distinct as possible (Helske et al., 2018; Helske & Helske, 2019). The output of seqHMM is a group of trajectories, in other words, clusters of sequences of engagement states. We computed the probabilities of belonging to each trajectory of each student and assigned the student to the highest probable trajectory. The HMM model was estimated using the Expectation-Maximization (EM) algorithm following the method of Helske et al. (2018). The chosen model was based on the best Bayesian information criterion (BIC). We estimated the emission probabilities which represent how likely an engagement state is to belong to each trajectory at the onset. To describe and visualize each of the discovered engagement trajectories, each trajectory was visualized using the same methods previously used in the sequence mining step, i.e., index plot for longitudinal sequences, entropy plot for stability, as well as mean time plot for the mean time spent at each state. We used an implicative statistics plot to visualize and identify the differences between trajectories. The implicative plot presents, at each course sequence, the relevance of the rule: being in this trajectory implies being in this engagement state at this time point (course sequence) with the given confidence interval; the x-axis represents the strength of the rule. It does so by comparing the expected engagement state versus the counter states (unexpected). Such a method is a more rigorous way of identifying trajectories compared to the commonly implemented methods of visual inspection (Ritschard et al., 2013). The method and equation can be found in the work by Ritschard et al. (2013).
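To make the idea of hidden trajectories behind observed engagement states more concrete, the following sketch decodes the most probable sequence of hidden states for one student with the Viterbi algorithm. This is a generic textbook illustration, not the seqHMM interface and not necessarily the assignment procedure used in the study. All parameters passed to it (initial, transition, and emission probabilities) would be illustrative assumptions supplied by the reader, and every probability used is assumed to be greater than zero.

```python
import math


def viterbi(observations, hidden_states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for `observations`.

    start_p[s]     : probability of starting in hidden state s
    trans_p[a][b]  : probability of moving from hidden state a to b
    emit_p[s][o]   : probability of observing o while in hidden state s
    Log-probabilities are used to avoid numerical underflow.
    """
    scores = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in hidden_states
    }
    backpointers = []

    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in hidden_states:
            # Best predecessor for hidden state s at this time point.
            prev = max(
                hidden_states,
                key=lambda p: scores[p] + math.log(trans_p[p][s]),
            )
            new_scores[s] = (
                scores[prev]
                + math.log(trans_p[prev][s])
                + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the most probable final state.
    path = [max(hidden_states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Because trajectories in such a model are "sticky" (high self-transition probabilities), a single Average course inside a long run of Active courses would typically not move the decoded path out of an engaged hidden state.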

3.3.4. Survival analysis

To answer RQ4, a Kaplan-Meier (KM) survival curve was estimated. KM estimates the fraction of students who are still in the program at each time point with the corresponding confidence interval. For each group, we report the number of events (of dropouts), the survival probability, the confidence interval, and the standard error. To compare the difference between trajectories, we performed the recommended tests (Blossfeld, Golsch, & Rohwer, 2007): Log-rank, Gehan, Tarone-Ware, and Peto-Peto. To estimate how each trajectory relates to performance (2nd part of RQ4) as measured by Grade Point Average (GPA), we performed a Kruskal–Wallis ANOVA test, comparing the mean values of the GPA among the three trajectories (Ostertagova, Ostertag, & Kováč, 2014). To rigorously evaluate the magnitude of the obtained results, we calculated the epsilon-squared effect size (Tomczak & Tomczak, 2014). Post-hoc pairwise comparisons were performed through Dunn’s test using Holm’s correction for multiple testing (Holm, 1979).
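The KM estimator multiplies, at each time point where a dropout occurs, the previous survival probability by (1 − d/n), where d is the number of dropouts at that time and n the number of students still at risk. A minimal sketch follows, without confidence intervals. The function name and the input layout are assumptions for illustration, and students who complete the program are treated as censored at their last observed course.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations : time (e.g. course index) of dropout or last observation
    events    : True if the student dropped out at that time,
                False if the observation is censored
    Returns a list of (time, survival_probability) at each dropout time.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []

    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        dropouts = 0
        leaving = 0
        # Group all students whose dropout or censoring happens at time t.
        while i < len(pairs) and pairs[i][0] == t:
            dropouts += int(pairs[i][1])
            leaving += 1
            i += 1
        if dropouts:
            survival *= 1 - dropouts / at_risk
            curve.append((t, survival))
        # Both dropouts and censored students leave the risk set after t.
        at_risk -= leaving
    return curve
```

Computing one curve per trajectory and comparing them (for example with a log-rank test) corresponds to the between-trajectory comparison described above.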

4. Results

The study included 106 students and 1396 course enrollments. Among all the students, 85 (80.2%) were able to reach the last year within the expected time. The initial log record, extracted from the LMS, included 1,052,807 events, which amounted to 790,956 after cleaning the non-learning-related events (e.g., clicks on profile and chats). The median number of events per course offering (in the final data set) was 19,591. The median number of forum consumption events was 48.5 per student per course offering; the median forum contribution per student per course offering was 50; the median number of sessions per student per course offering was 48, and the median duration of online time was 4.99 h per student per course offering. Table S2 contains detailed statistics of all indicators included in the study in each course.

RQ1: The grouping using LCA of students’ activity indicators in each course resulted in three distinct clusters, each representing a different engagement state. Fig. 2 shows the mean values for each indicator among the clusters.

● The first cluster we refer to as Active, where students were highly engaged with online course materials. Their course activity was around the 7th decile in the number of active days indicator, the frequency of course browsing, the frequency of forum consumption, regularity, and session count. They were between the 6th and 7th decile in frequency of forum contribution and frequency of lecture viewing.

● The second cluster was named Average, representing students who were around the 5th decile for frequency of forum consumption, frequency of forum contribution, and frequency of lecture viewing. They were approximately halfway between the 4th and 5th decile for all other indicators. This group is averagely engaged with the collaborative tasks which are the most important tasks of the course.

● The third cluster, referred to as Disengaged, includes students who had lower activity levels than both other clusters, with all indicators lying between the 3rd and 4th deciles. The frequency of forum consumption and frequency of forum contribution indicators are the lowest, which illustrates their disengagement with the collaborative task.

● A Withdraw group (not included in the clustering) comprises instances when students applied formally to unenroll from the course; these instances were excluded because they did not have a representative digital footprint of their online activities.

The clusters differed the most in the variables that reflect students’ efforts and investment in learning and doing what was “beyond what is required” (regularity, active days, and course browsing and session count) with a strong effect size. The clusters differed less in the required components (forum consume, contribute, and lecture view), which showed overlap in most courses, yet with a moderate to strong effect size. Fig. 3 shows a boxplot color-coded by cluster for each variable in each course that shows where the variables were separated and where there was overlap. The descriptive statistics per course are shown in Table S2 (Appendix). To calculate the difference among clusters, we performed an ANOVA test (Table S1). The results showed that activity indicators significantly differed among the three clusters with a relatively strong effect size for all indicators except for frequency of forum contribute and frequency of lecture view, which were moderate. All post-hoc pairwise comparisons were statistically significant.

The residual degrees of freedom of the LCA for three classes were 1230; AIC was 28,720.35, and BIC was 29,435.54, which were the lowest among the ten tested models. The two-class model had higher values (AIC = 31578.15, BIC = 32110.23), as did the four-class model (AIC = 31652.59, BIC = 31862.49); BIC ranged from 31290.11 for five classes to 30379.67 for ten classes.

RQ2: Students’ course engagement states were plotted using sequence index plots (Fig. 4A), where each single student is represented as a sequence of horizontal stacked bars denoting the student’s course engagement states in sequential order (1–15) with different colors (green for Active, blue for Average, yellow for Disengaged, and red for Withdraw). To enhance readability, the states were ordered according to their similarity. The upper side of the graph shows a predominance of the green color, indicating the existence of a group of students who were, with few exceptions, engaged throughout the whole program. The middle part is dominated by the yellow color, blank spaces, and red color, indicating disengaged, dropouts, and withdrawn students. Lastly, the color blue is prevailing at the bottom of the graph, illustrating mostly Average students who show active and disengaged behavior occasionally. The sequence distribution plot in Fig. 4B shows that the distribution of course engagement states along the program has a decreasing pattern of lower engagement states as students advance through the program; the decreasing pattern of disengaged and withdrawn students is probably due to dropouts who are no longer represented in the distribution, and therefore, the remaining students are relatively more active.

The within-student entropy (within-student stability of engagement states across the 15 courses) was low to moderate in most students with a mean of 0.43 (see Fig. 4C for the distribution of within-student entropy), indicating the relative stability of engagement states. To study to which extent students maintain a state of engagement between successive courses, ascend to a higher engagement state, or descend to a lower engagement state, we calculated the transition probabilities (Fig. 5). Students in the Active state had a 71% probability to remain Active in their next course; students in the Average state had a 62% probability of being Average in the next course, while students in the Disengaged state in a course had a 49% probability of remaining Disengaged. These findings indicate that Active students were the most stable and were the most likely to retain their state between courses, whilst Disengaged students were the most likely to change states with 39% probability of transitioning to Average, and 12% probability of being Active in the next course.1

Students with Average engagement states were more likely to move between course engagement states with 18% to Disengaged or 20% to Active. The most common transitions with the highest support (occurrence of the sequence at least once in each student) were transitions between Active and Average as well as Average and Disengaged in both directions. The most noteworthy ones that may need the attention of educators are those where students descended to the Disengaged state, which occurred as frequently as at least once in 45% of the students.
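The transition probabilities and the support of a transition reported above can be computed along the following lines. This is a simplified sketch: transitions are counted only between consecutive courses, and support is restricted to adjacent pairs of states, whereas frequent-subsequence mining in general may also allow gaps. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def transition_rates(sequences):
    """Estimate P(next state | current state) from consecutive courses.

    sequences: list of per-student lists of engagement states.
    Returns {current: {next: probability}}.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        current: {
            following: n / sum(nexts.values())
            for following, n in nexts.items()
        }
        for current, nexts in counts.items()
    }


def transition_support(sequences, source, target):
    """Share of students whose sequence contains source -> target
    (in consecutive courses) at least once."""
    if not sequences:
        return 0.0
    hits = sum(
        any(a == source and b == target for a, b in zip(seq, seq[1:]))
        for seq in sequences
    )
    return hits / len(sequences)
```

Note the difference between the two quantities: a transition probability is conditional on the current state and is computed over all course-to-course steps, whereas support counts each student at most once regardless of how often the transition recurs.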

RQ3: As the visualization of the sequence index plot suggested the presence of three groups, we used HMM to reveal the possible hidden states. The HMM with the lowest BIC revealed a three-cluster model as the best descriptor of students’ engagement trajectories at the program level (Fig. 6). We refer to these trajectories as (1) Mostly-engaged, (2) Intermediate, and (3) Troubled based on the included states.

Mostly-engaged trajectory

The Mostly-engaged trajectory included students who maintained an Active engagement state in most of the courses in the program and infrequently were averagely engaged. Active course engagement state was the dominating state (emission prob. = 81.57%), followed by Average course engagement (emission prob. = 16.8%). Students in this trajectory were more likely to remain therein for the whole duration of the program (prob. = 98%), with rare transitions to the Intermediate trajectory (transition prob. = 2%) and never to the Troubled one (transition prob. ≅ 0%).

The sequence index plot of the Mostly-engaged trajectory shows the detailed sequences of students’ engagement states where each line represents a single student path (Fig. 7A). Notably, the sequence plot is dominated by the Active course engagement state. The implication plot (Fig. 7B) confirms such findings with all the sequences above the 99% confidence interval as expected. The mean time plot (Fig. 7C) shows that students in this trajectory spent most of the program (a mean of 11.4 courses) with an Active course engagement state and much fewer courses with an Average engagement state (a mean of 3 courses). The within-student entropy was low (mean = 0.37), indicating a stable trajectory within each student of this group throughout the program. The between-students entropy plot (Fig. 7D) shows a relatively low level of entropy (median = 0.48) between students at different time points. Only a single student dropped, at the 13th course, for two courses.

Fig. 2. Mean value for each indicator in each of the identified engagement states.

1 Students who dropped out were not included in the calculation of transitions since dropping out is extensively covered in RQ4.

Fig. 3. Boxplot of the level of each indicator of engagement for each course.

Intermediate trajectory

The Intermediate trajectory included students dominated by the Average state, with infrequent fluctuations between higher engagement states and disengagement states. More specifically, students in this trajectory had an Average course-level engagement state throughout most of the program (emission prob. = 73.5%), followed by Active state (emission prob. = 13.28%) and Disengaged state (emission prob. = 13.26%) with close to equal probabilities. The Intermediate trajectory students were also most likely to remain in this trajectory for the whole duration of the program (transition prob. = 91%). They were more likely to transition to the Mostly-engaged trajectory (transition prob. = 1.96%) than to the Troubled trajectory (transition prob. = 1.29%).

Fig. 4. A) Index sequence plot for the course engagement states. B) Overall sequence of course engagement states. C) Within-student entropy plot.

Fig. 5. A) Top 10 most common subsequences of engagement states. B) Transition plot for the engagement states.

The sequence index plot of the Intermediate trajectory was more heterogeneous than that of the Mostly-engaged trajectory (Fig. 8A). This trajectory had the Average course engagement state as its main pattern, with infrequent Active and Disengaged states. Notably, many students started in the Disengaged state in this trajectory, as confirmed by the implication plot (Fig. 8B), as it shows a statistically insignificant pattern for the first course, meaning that this group began with a diverse start, followed by a dominant and statistically significant Average state thereafter. The mean time plot (Fig. 8C) shows that students in the Intermediate trajectory spent most of the program (a mean of 9.7 courses) with an Average course engagement state, and fewer courses with an Active engagement state (a mean of 2.5 courses), and even fewer with a Disengaged state (2.1 courses on average). The within-student entropy was relatively higher than the Mostly-engaged trajectory, indicating that students in this trajectory were relatively unstable. The between-students entropy plot (Fig. 8D) shows a high level of entropy (median = 0.6), denoting high diversity among the trajectory members. Five students dropped out in this trajectory, mostly after the second year.

Fig. 6. Hidden Markov Model showing three hidden states for program-level engagement, the colors represent the composition of each trajectory.

Fig. 7. A) Index plot, B) Implication plot, C) Mean time, and D) Entropy index plot, for the Mostly-engaged trajectory.

Troubled trajectory

Lastly, the Troubled trajectory included students who mostly showed a Disengaged state (emission prob. = 68.63%), followed by Average engagement state (emission prob. = 21.9%), and less so by Active engagement state (emission prob. = 4.89%). Students in the Troubled trajectory were most likely to remain so throughout the program (transition prob. = 90.85%), less likely to ascend to the Intermediate trajectory (transition prob. = 7.67%), and rarely ascended to the Mostly-engaged one (transition prob. = 2.48%). A total of 15 out of the 21 students who withdrew from the courses belonged to this Troubled trajectory.

The sequence index plot of the Troubled trajectory (Fig. 9A) shows a diverse heterogeneous engagement pattern, where 10% of the students started in an Active engagement state, 37% in an Average engagement state, 44% in a Disengaged state, and two students withdrew. The next courses were mostly dominated by the Disengaged state as well as frequent absent sequences, indicating dropping out, which is evident as early as the second course where five students withdrew. Another two dropped out in the third course, and two more in the fourth. The process of dropping out slowed down later as evinced by the fact that only two more students dropped out in the ninth and the last course. The mean time plot (Fig. 9C) shows that students spent most courses with a Disengaged state (a mean of 5.3 courses), a few courses with an Average engagement state (mean of 2.8), and barely any with an Active engagement state (mean of 0.7).

The mean within-student entropy was 0.33 —the lowest of all trajectories— indicating a fairly stable trajectory. It might be partially attributable to the shorter segments of students’ sequences due to dropping out. The between-student entropy plot (Fig. 9D) shows a high level of entropy (median = 0.6). This level of entropy is similar to that of the Intermediate trajectory.

RQ4: How far a student would persist (“survive”, not drop out) in the program can be estimated through survival analysis (Fig. 10). Students in the Mostly-engaged trajectory were the most likely to persist, with a survival probability of 0.97 (CI = [0.92:1.00]), followed by those in the Intermediate trajectory, with a survival probability of 0.88 (CI = [0.79:0.99]), and lastly by those in the Troubled one, who had a survival probability of 0.64 (CI = [0.51:0.81]). The dropouts in the Troubled trajectory occurred mostly in the first five courses and rarely after the 8th course, emphasizing the need for early intervention in the program. The Log-rank, Gehan, Tarone-Ware, and Peto-Peto tests for between-trajectory differences were all statistically significant with p < 0.001, indicating that the association between disengagement and dropping out was statistically significant. Table S3 (Appendix) contains the full detailed KM survival curve.

Fig. 8. A) Index plot, B) Implication plot, C) Mean time, and D) Entropy index plot, for the Intermediate trajectory.

As shown in Fig. 11, students in the Mostly-engaged trajectory were the highest achievers (mean GPA = 87.44%), followed by students in the Intermediate trajectory (mean GPA = 81.75%), and lastly by those in the Troubled trajectory (mean GPA = 78.59%). The GPAs of students in the Mostly-engaged trajectory were significantly higher than those of the two other trajectories. However, no statistically significant difference was found between the Intermediate and Troubled trajectories in terms of GPA. Since the GPA is estimated at the end of the program, the dropped-out students were not included, which may explain why intermediate and troubled students had comparable grades. The results including all the students can be seen in Figure S1 of the Appendix.

5. Discussion

Our study sought to investigate students’ engagement in the context of blended learning across a four-year program. We have looked at the frequency of learners’ engagement states, stability and transitions at the course level, and the trajectories of engagement, their stability, and relationship with drop-out.

Regarding engagement states at the course level and their defining features (RQ1), the clustering with the LCA algorithm identified three distinct clusters: active or intensively engaged (with activity level around the 7th decile), average with activity level just below the 5th decile, and disengaged with activity level around the 3rd decile. The three identified subgroups of engagement are in line with previous studies (Jovanović et al., 2020, 2017; Kovanović et al., 2019; López-Pernas, Saqr, & Viberg, 2021; Matcha, Gašević, Uzir, Jovanović, & Pardo, 2019). The engaged group was described as highly engaged by Kovanović, Gašević, Joksimović, Hatala, and Adesope (2015), as Intensive by Barthakur et al. (2021), as highly active by Mirriahi, Jovanovic, Dawson, Gaševic, and Pardo (2018), and as intense by Jovanović, Gašević, Dawson, Pardo, and Mirriahi (2017). Expectedly, the disengaged group has been described by all the reviewed literature. Such disengaged students had levels of activities that were comparatively lower than their peers. The average group was in an intermediate state of engagement. Such a state has been referred to as get-it-done by Barthakur et al. (2021) or selective (Jovanović et al., 2017; Kovanović et al., 2019). The average group is typically moderately active, directs its energy to what is required to pass, and usually completes the course, as was the case in our results.

The subgroups identified in our study (or engagement states) differed significantly especially in the regularity and activity variables (variables that reflect investment in learning). In other words, highly engaged students have been the most active group in all the regularity and activity indicators, and average engagement has been intermediate, etc. The essential activities (forum contribute and lecture view) showed a narrow difference among clusters, which is in agreement with the literature that students with lower levels of engagement are active in the assessed components (e.g., Barthakur et al., 2021; Gašević et al., 2017; Matcha, Gašević, Ahmad Uzir et al., 2019a,b; Mirriahi et al., 2018). In turn, highly engaged students do “beyond the required” and invest quality time and efforts in learning (Fredricks et al., 2004; Skinner & Pitzer, 2012; Trowler, 2010) and, therefore, they engage intensively with learning resources (Barthakur et al., 2021; Gašević et al., 2017; Matcha, Gašević, Uzir, Jovanović, & Pardo, 2019). In summary, our results are comparable to previous research in showing three levels of engagement on the course level. The characteristics of the engagement states reflect that highly engaged students are more invested in learning, doing beyond what is required, while disengaged students have low levels of activities and focus on the assessed resources.

Fig. 9. A) Index plot, B) Implication plot, C) Mean time, and D) Entropy index plot, for the Troubled trajectory.

Fig. 10. Kaplan-Meier curve for retention.

Fig. 11. Violin plot comparing the GPA of students who completed the program among the three engagement trajectories through Kruskal-Wallis test and Dunn pairwise test.

In RQ2, we aimed to explore how stable the learners’ course engagement states are. Our results revealed interesting findings: within-student engagement changes were uncommon as evidenced by the low entropy values (an indication of regularity) indicating a relatively stable trajectory of engagement, especially in engaged students. A student in an Active engagement state (highly engaged) has a 71% probability to continue to be Active in the next course. A student in an Average engagement state has a probability to continue to be Average of 62%, and the Disengaged students have a probability to remain Disengaged of 49% (fluctuations were more common in early years; see RQ4 for further elaboration). These results emphasized the previous findings in face-to-face settings that engagement shows higher cross-time correlations between courses, i.e., students are likely to retain their engagement state between courses. The cross-course correlation was more evident in the Active subgroup of students (Gottfried et al., 2001, 2007; Janosz et al., 2008; Skinner & Belmont, 1993; Skinner & Pitzer, 2012).

Results also highlighted more transitions in the less engaged states, where Disengaged students were more likely to ascend to an Average engagement state (39%), and Average students were likely to ascend to an Active engagement state (20%). In fact, 47% of the students had, at least once during the program, ascended from Disengaged to Average, and 58% had ascended from Average to Active (Fig. 4). These findings contradict the view that engagement is strictly stable and highlight the vast potential for positive intervention and the students’ efforts to do so. Our results also raise an alarm that some students may descend to a Disengaged state: 18% of Average students transitioned to Disengaged, and at least 45% of the students had transitioned to the Disengaged state at least once during the program. Compared to Barthakur et al. (2021), transitions in our study were less frequent, which could be explained by the difference in settings since, in MOOCs, students have higher possibilities of choice and fewer obligations to commit to the program. It is important to emphasize that the transitions between engagement states depend on the accuracy of the data, processing, and clustering methods. Therefore, some of the transitions may be the result of, e.g., inaccurate assignments by LCA. This is of special relevance for the interpretation of the results of the students in the Intermediate trajectory, which showed more frequent transitions.

The previous two steps have been informative; however, they were intermediate steps for the primary goal of our study which is to study the longitudinal program-level trajectories of learners’ engagement (RQ3). To do so, we applied HMM to investigate the possibility to cluster students’ engagement trajectories into homogeneous clusters and study these clusters. HMM is an established method in life-event studies similar to our context (Helske et al., 2015, 2018; Jeong et al., 2008). HMM identified three distinct trajectories: a trajectory that is Mostly-engaged, an Intermediate trajectory, and a Troubled one, confirming the heterogeneous nature of engagement and the presence of well-defined subgroups (Archambault & Dupéré, 2017; Zhen et al., 2020). Students in the Mostly-engaged trajectory were very stable with 98% probability to stay in the same trajectory throughout the whole program, as were the Intermediate group (97%) and the Troubled group (91%). The Mostly-engaged group showed the highest between-student stability (indicating a homogeneous group), and high within-student stability pointing to very homogeneous and stable students who are less likely to be disengaged or require help. The Intermediate trajectory showed lower stability within and between students, emphasizing the fluidity of the group. Such a group may require support to guard against a possible disengagement. The Troubled group showed early and frequent dropouts, emphasizing the need for early support. Therefore, educators should be aware that average levels of engagement are not far from disengagement or dropping out.

While the previous research on the long-term evolution of students’ engagement has been performed in face-to-face settings (Archambault & Dupéré, 2017; Zhen et al., 2020), our results in a blended learning context were comparable. Archambault and Dupéré (2017) identified an engaged trajectory, which they referred to as the “Stable High Trajectory”, in 71% of the students, and Zhen et al. (2020) identified a “Persistent engagement group” in 71% of the students; these ratios are similar to our study if we combine the Mostly-engaged and Intermediate trajectories (72%). Combining both trajectories as “Engaged” can be justified by the fact that these students maintained a reasonable level of engagement, remained enrolled, and mostly completed the program on time. Furthermore, research has shown that a significant number of students use online learning to “get-it-done”, i.e., students remain engaged but use the online learning environment just as needed. In a similar way, the combined engaged group in our study also resembled in size the “consistent” and “get-it-done” groups reported by Barthakur et al. (2021), which accounted for around 67% of students if combined. In summary, one can conclude that roughly two thirds of students maintain at least a reasonably stable state of engagement, both in face-to-face and online settings, that enables them to complete the program. Such findings highlight the value of using learning analytics methods as an unobtrusive way of inferring the state of engagement of students.

Our study has shown that engagement trajectories are relatively stable in students in the engaged subgroups, corroborating previous works (Archambault & Dupéré, 2017; Zhen et al., 2020) that have reported similar stable homogeneous subgroups. However, Archambault and Dupéré (2017) reported more frequent transitions between engagement trajectories. A declining pattern has been observed in the disengaged students in our study since 20% of the students dropped out, a number similar to the “Descending engagement group” of 17% in the work by Zhen et al. (2020) and higher than the 8% of the declining groups found by Archambault and Dupéré (2017). The differences can be explained by the different contexts and measurement methods. A noteworthy observation about the Troubled trajectory is that the first course was dominated by Average and Active engagement states, which were soon followed by Disengaged states. In a similar way, dropping out in this group occurred mostly in the first five courses. The survival analysis (RQ4) confirmed these findings; that is, the Troubled trajectory was the least likely to complete the program, with a probability of 0.64, which is markedly below that of the Mostly-engaged (0.97) and Intermediate (0.88) groups. Not surprisingly, the Troubled trajectory scored the lowest grades, a finding that has been repeatedly confirmed in face-to-face and online learning settings (Barthakur et al., 2021; Gašević et al., 2017; Lei et al., 2018). However, the difference was not statistically significant for the Intermediate trajectory, since students who dropped out were not considered and it is expected that students who graduate are better achievers. Another analysis (Figure S1), which considered the average of all grades available for each student, showed this difference to be statistically significant and around 11 points.
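To make the notion of a completion probability from survival analysis concrete, the following is a minimal sketch of the Kaplan-Meier estimator, one common way of estimating such probabilities. It is an assumption that this estimator matches the analysis behind the figures above, since this section does not name the estimator used; the function name and the toy durations are hypothetical.

```python
def kaplan_meier(durations, dropped):
    """Return [(time, survival)] at each time where at least one dropout occurs.

    durations: number of courses each student was observed for.
    dropped:   True if the student dropped out at that point,
               False if the observation simply ended (censored).
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, e in zip(durations, dropped) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if events:
            # Multiply by the share of at-risk students who did not drop out at t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical toy group: two early dropouts, three students who completed 15 courses.
durations = [2, 3, 15, 15, 15]
dropped = [True, True, False, False, False]
curve = kaplan_meier(durations, dropped)
```

In this toy group the estimated probability of remaining in the program falls after each of the two early dropouts and then stays flat, which mirrors the pattern described for the Troubled trajectory, where dropping out was concentrated in the first courses.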

Regarding the methods used in the present work, heterogeneity has been a common phenomenon of learners’ behavior, leading researchers to use different clustering methods for the discovery of subsets of populations (Barthakur et al., 2021; Carroll & White, 2017; Jang et al., 2017; Park et al., 2016; Parpala, Lindblom-Ylänne, Komulainen, Litmanen, & Hirsto, 2010; Quirk, Nylund-Gibson, & Furlong, 2013; Rienties et al., 2019; Tempelaar, Nguyen, & Rienties, 2020). LCA is one such method that has been recommended to account for the heterogeneous and multidimensional nature of learning and learners’ behavior (Hickendorff et al., 2018). For instance, Parpala et al. (2010) used LCA to capture students’ approaches to learning and reported the presence of four clusters of distinct profiles. Quirk et al. (2013) used LCA to find subgroups or distinct profiles of children’s school readiness using their social, emotional, and cognitive qualities. The authors were able to find five subgroups of children which were related to their school performance. While the previous examples have used categorical survey data, LCA was recently extended to students’ online data. For instance, LCA was used to extract clusters of features of online courses (Park et al., 2016), to discover subgroups within students’ online behavior (Carroll & White, 2017), to identify trajectories of learning strategies (Mirriahi et al., 2018), and to model the longitudinal engagement of students in MOOCs (Barthakur et al., 2021). LCA has the advantage of requiring no assumptions of normal distribution, linearity of data, or homogeneity. Therefore, LCA can be applied to a wide range of data. While initial implementations of LCA required categorical data, LCA has been extended to continuous variables (Hickendorff et al., 2018).

The data processing in our study, i.e., using a discretization method (deciles, or binning in general) in modelling a longitudinal process over a long time, might be justified in terms of overcoming variations (Alves et al., 2017; Dewar et al., 2021; Miyamoto et al., 2015) and facilitating communication of results to stakeholders. The disadvantage of discretizing variables of educational data is yet to be settled, as most research has not offered a head-to-head comparison. In fact, while discretizing the data may lead to information loss, evidence exists that, in some cases, it improves prediction by, e.g., reducing the noise. Further evidence comes from the work of Saqr et al. (2017), who compared “raw frequency” indicators to a discretized form. The authors reported that the discretized indicators performed slightly and consistently better than “raw frequency”. In a large recent study, Jovanović, Saqr, Joksimović, and Gašević (2021) found that the activity indicators (based on discretized indicators of daily or weekly access to neutralize the “non-uniform, bursty temporal patterns in interaction with online learning resources”) have comparable predictive power to the raw frequencies.
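The trade-off described above can be seen in a minimal sketch of decile binning. This is an illustrative rank-based implementation under stated assumptions, not the study's preprocessing code; the function name and the toy activity counts are hypothetical, and ties are split by input order, which is a simplification.

```python
def decile_bins(values):
    """Assign each value a decile from 1 (lowest) to 10 (highest) by rank.

    Ties are broken by input order (a simplification made for this sketch).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        # Map ranks 0..n-1 onto ten equally sized groups.
        deciles[idx] = rank * 10 // n + 1
    return deciles

# Hypothetical activity counts for ten students in one course.
clicks = [5, 120, 33, 48, 7, 260, 91, 15, 64, 2000]
bins = decile_bins(clicks)
# Hypothetical flag for students in the lower three deciles of their peers.
low_activity = [d <= 3 for d in bins]
```

In this toy example the extreme count of 2000 simply lands in the top decile instead of dominating the scale, which is the noise-reducing effect of binning. At the same time, 48 and 64 fall into different deciles while 260 and 2000 are adjacent ones, which illustrates the information loss at bin boundaries.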

The context and the nature of the program, healthcare education with online PBL, had some implications for the collection of the data and the analysis. Healthcare education tends to be demanding and therefore a large proportion of students were engaged with the “essential components”, which were the forums and lectures, as they are related to assessment. Therefore, we had to include other metrics such as regularity and active days. Such metrics, which can be considered “course-agnostic”, have proven useful in differentiating between students’ engagement states.

6. Conclusions and implications

Our study has several implications for educators and stakeholders. First, students who show a level of activity in the lower three deciles of online activities of their peers early in the program may be at risk of disengagement and dropping out. Longitudinal engagement is a heterogeneous process with subgroups of engagement patterns that manifest as “trajectories”. While such trajectories are relatively stable in the highly engaged students, who usually proceed as they started, the intermediately engaged and disengaged students had more fluctuations. That is, students with early average engagement continued or declined. Similarly, but less likely, some students who were disengaged early improved, but more frequently they dropped out. It should be stressed here that disengagement in a course has implications for the next course and may culminate in dropping out of the program. Another implication of our study is that the longitudinal trajectories of learning behaviors and dispositions (e.g., motivation, self-regulation) need to be investigated and should receive due attention from researchers and educators, especially in online environments. Third, we believe that the methods presented in this study are relevant to a wide range of applications and represent an addition to the repertoire of learning analytics methods.

7. Limitations

Our study is not without limitations. First, the findings of the study may be limited to the context; therefore, generalization may require further investigation. Second, our study is limited to online engagement, which allows easier access to recorded behavioral clickstream data. As such, other aspects of engagement have not been reflected in our data (e.g., affective engagement). Third, the students’ clustering accuracy is limited by the quality of the data, and more specifically by how much the LMS data reveal about the students. Using binning for discretizing the data may have an impact on the accuracy of clustering. In particular, LCA may consider variables with small differences on either side of a bin boundary as different, while treating variables with a larger difference within the same bin as similar. Additionally, the clustering algorithm is far from perfect. LCA assigns a student to a single category in each course and therefore, students on either side of a cluster’s boundary or with close probabilities (e.g., 0.49 and 0.51) would be classified in distinct clusters. Such decisions may result in Type I errors (i.e., the error of rejecting a true hypothesis) when our method does not identify a student as disengaged while he or she is. Missing a disengaged student would result in a lack of support for that student, who may therefore drop out. A Type II error (identifying a student as disengaged while he or she is not) results in offering needless support, which stretches the resources of the institution. Therefore, the results of our study should not be considered “diagnostic” of an engagement state, but rather an attempt to understand the process and raise awareness of its longitudinal aspects. Our study has been conducted with 106 students. Of them, only 85 students completed the program. Such a small number of students is an obvious limitation that needs to be considered when reading our results. Lastly, the confidence interval for emission probabilities was not calculated due to estimation difficulties concerning the reliability of these confidence estimates.
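The boundary problem with hard assignment described above (e.g., posterior probabilities of 0.49 and 0.51) can be sketched as follows. This is an illustration only, not part of the study's method: the function name, the `margin` threshold of 0.10, and the toy posterior probabilities are all hypothetical assumptions.

```python
def assign_state(posteriors, margin=0.10):
    """Return (modal_state, is_ambiguous) for one student's class posteriors.

    posteriors: mapping of state name -> posterior probability (at least two states).
    margin:     hypothetical threshold; a smaller gap between the two most
                likely states is flagged as an ambiguous assignment.
    """
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    (best, p1), (_, p2) = ranked[0], ranked[1]
    return best, (p1 - p2) < margin

# Hypothetical posteriors for two students in one course.
borderline = assign_state({"Disengaged": 0.49, "Average": 0.51, "Active": 0.0})
clear_cut = assign_state({"Disengaged": 0.05, "Average": 0.15, "Active": 0.80})
```

Flagging near-ties in this way does not remove the misclassification risk, but it would let an analyst treat borderline assignments with more caution than clear-cut ones.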

Ethical approval

The study received ethical approval ID:6065 from the University Ethical Board.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.compedu.2021.104325.

Authors’ contributions

MS has contributed to the idea conceptualization, research design, and planning. MS has performed data collection. MS and SLP have contributed to the methods, data analysis and reporting of results and visualization. All graphics and illustrations have been created by SLP. MS and SLP have contributed to manuscript writing and revision. The authors read and approved the final manuscript.

References

Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. Proceedings of the Eleventh International Conference on Data Engineering, 3–14.

Alves, P., Morais, C., & Miranda, L. (2017). Learning analytics in higher education: Assessing learning outcomes. In M. A. Peres P, & A. Mesquita (Eds.), Proceedings of the European Conference on e-Learning, ECEL (Vols. 2010-Octob (pp. 25–32). Academic Conferences Limited.

Archambault, I., & Dupéré, V. (2017). Joint trajectories of behavioral, affective, and cognitive engagement in elementary school. The Journal of Educational Research, 110(2), 188–198. https://doi.org/10.1080/00220671.2015.1060931

Azcona, D., Hsiao, I.-H., & Smeaton, A. F. (2019). Detecting students-at-risk in computer programming classes with learning analytics from students’ digital footprints. User Modeling and User-Adapted Interaction, 29(4), 759–788.

Azevedo, R. (2015). Defining and measuring engagement and learning in science: Conceptual, theoretical, methodological, and analytical issues. Educational Psychologist, 50(1), 84–94. https://doi.org/10.1080/00461520.2015.1004069

Barthakur, A., Kovanovic, V., Joksimovic, S., Siemens, G., Richey, M., & Dawson, S. (2021). Assessing program-level learning strategies in MOOCs. Computers in Human Behavior, 117(January 2020), 106674. https://doi.org/10.1016/j.chb.2020.106674

Blossfeld, H. P., Golsch, K., & Rohwer, G. (2007). Event history analysis with stata. In Event history analysis with stata. Routledge. https://doi.org/10.4324/9780203936559.

Bol, L., & Garner, J. K. (2011). Challenges in supporting self-regulation in distance education environments. Journal of Computing in Higher Education, 23(2), 104–123.

Bos, N. (2016). Student differences in regulation strategies and their use of learning resources: Implications for educational design. Learning Analytics and Knowledge (LAK’16), 27–29. April 2016, 344–353.

Carroll, P., & White, A. (2017). Identifying patterns of learner behaviour: What business statistics students do with learning resources. INFORMS Transactions on Education, 18(1), 1–13. https://doi.org/10.1287/ited.2016.0169

Cornwell, B. (2018). Network analysis of sequence structures. https://doi.org/10.1007/978-3-319-95420-2_7

Dabbagh, N., & Kitsantas, A. (2004). Supporting self-regulation in student-centered web-based learning environments. International Journal on E-Learning, 3(1), 40–47.

Dewar, A., Hope, D., Jaap, A., & Cameron, H. (2021). Predicting failure before it happens: A 5-year, 1042 participant prospective study. Medical Teacher, 1–9. https://doi.org/10.1080/0142159X.2021.1908526

Escueta, M., Quan, V., Nickow, A. J., & Oreopoulos, P. (2017). Education technology: An evidence-based review.

Finn, J. D., & Zimmer, K. S. (2012). Student engagement: What is it? Why does it matter? Handbook of research on student engagement. Springer US. https://doi.org/10.1007/978-1-4614-2018-7_5

Fredricks, J. A., Blumenfeld, P. C., & Paris, A. H. (2004). School engagement: Potential of the concept, state of the evidence. Review of Educational Research, 74(1), 59–109. https://doi.org/10.3102/00346543074001059

Fredricks, J. A., & McColskey, W. (2012). The measurement of student engagement: A comparative analysis of various methods and student self-report instruments. In S. L. Christenson, A. L. Reschly, & C. Wylie (Eds.), Handbook of research on student engagement (pp. 763–782). Springer US. https://doi.org/10.1007/978-1-4614-2018-7_37.

Furrer, C., & Skinner, E. (2003). Sense of relatedness as a factor in children’s academic engagement and performance. Journal of Educational Psychology, 95(1), 148–162. https://doi.org/10.1037/0022-0663.95.1.148

Gabadinho, A., Ritschard, G., Müller, N. S., & Studer, M. (2011). Analyzing and visualizing state sequences in R with TraMineR. Journal of Statistical Software, 40(4). https://doi.org/10.18637/jss.v040.i04

Gabadinho, A., Ritschard, G., Studer, M., & Nicolas, S. M. (2009). Mining sequence data in R with the TraMineR package: A users guide for version 1.2 (Vol. 1). Geneva: University of Geneva.

Gašević, D., Jovanović, J., Pardo, A., & Dawson, S. (2017). Detecting learning strategies with analytics: Links with self-reported measures and academic performance. Journal of Learning Analytics, 4(2), 113–128. https://doi.org/10.18608/jla.2017.42.10

Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61(2), 215–231.

Gottfried, A. E., Fleming, J. S., & Gottfried, A. W. (2001). Continuity of academic intrinsic motivation from childhood through late adolescence: A longitudinal study. Journal of Educational Psychology, 93(1), 3.

Gottfried, A. E., Marcoulides, G. A., Gottfried, A. W., Oliver, P. H., & Guerin, D. W. (2007). Multivariate latent change modeling of developmental decline in academic intrinsic math motivation and achievement: Childhood through adolescence. International Journal of Behavioral Development, 31(4), 317–327.

Hagenaars, J. A., & McCutcheon, A. L. (Eds.). (2002). Applied latent class analysis. Cambridge University Press. https://doi.org/10.1017/cbo9780511499531.

Helske, S., & Helske, J. (2019). Mixture hidden Markov models for sequence data: The seqhmm package in R. Journal of Statistical Software, 88(1). https://doi.org/10.18637/jss.v088.i03

Helske, S., Helske, J., & Eerola, M. (2018). Combining sequence analysis and hidden Markov models in the analysis of complex life sequence data. In G. Ritschard, & M. Studer (Eds.) (pp. 185–200). Springer International Publishing. https://doi.org/10.1007/978-3-319-95420-2_11.

Helske, S., Steele, F., Kokko, K., Räikkönen, E., & Eerola, M. (2015). Partnership formation and dissolution over the life course: Applying sequence analysis and event history analysis in the study of recurrent events. Longitudinal and Life Course Studies, 6(1), 1–25. https://doi.org/10.14301/llcs.v6i1.290

Henrie, C. R., Halverson, L. R., & Graham, C. R. (2015). Measuring student engagement in technology-mediated learning: A review. Computers & Education, 90, 36–53. https://doi.org/10.1016/j.compedu.2015.09.005

Hickendorff, M., Edelsbrunner, P. A., McMullen, J., Schneider, M., & Trezise, K. (2018). Informative tools for characterizing individual differences in learning: Latent class, latent profile, and latent transition analysis. Learning and Individual Differences, 66(November 2017), 4–15. https://doi.org/10.1016/j.lindif.2017.11.001

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.

Jang, E. E., Lajoie, S. P., Wagner, M., Xu, Z., Poitras, E., & Naismith, L. (2017). Person-oriented approaches to profiling learners in technology-rich learning environments for ecological learner modeling. Journal of Educational Computing Research, 55(4), 552–597. https://doi.org/10.1177/0735633116678995

Janosz, M., Archambault, I., Morizot, J., & Pagani, L. S. (2008). School engagement trajectories and their differential predictive relations to dropout. Journal of Social Issues, 64(1), 21–40.

Jeong, H., Gupta, A., Roscoe, R., Wagster, J., Biswas, G., & Schwartz, D. (2008). Using hidden Markov models to characterize student behaviors in learning-by-teaching environments. International Conference on Intelligent Tutoring Systems, 614–625.

Jovanović, J., Dawson, S., Joksimović, S., & Siemens, G. (2020). Supporting actionable intelligence: Reframing the analysis of observed study strategies. Proceedings of the Tenth International Conference on Learning Analytics & Knowledge, 161–170. https://doi.org/10.1145/3375462.3375474

Jovanović, J., Gašević, D., Dawson, S., Pardo, A., & Mirriahi, N. (2017). Learning analytics to unveil learning strategies in a flipped classroom. The Internet and Higher Education, 33, 74–85. https://doi.org/10.1016/j.iheduc.2017.02.001

Jovanović, J., Mirriahi, N., Gašević, D., Dawson, S., & Pardo, A. (2019). Predictive power of regularity of pre-class activities in a flipped classroom. Computers and Education. https://doi.org/10.1016/j.compedu.2019.02.011

Jovanović, J., Saqr, M., Joksimović, S., & Gašević, D. (2021). Students matter the most in learning analytics: The effects of internal and instructional conditions in predicting academic success. Computers & Education, 172(April), 104251. https://doi.org/10.1016/j.compedu.2021.104251

King, R. B. (2015). Sense of relatedness boosts engagement, achievement, and well-being: A latent growth model study. Contemporary Educational Psychology, 42, 26–38.

Kovanović, V., Gašević, D., Joksimović, S., Hatala, M., & Adesope, O. (2015). Analytics of communities of inquiry: Effects of learning technology use on cognitive presence in asynchronous online discussions. The Internet and Higher Education, 27, 74–89. https://doi.org/10.1016/j.iheduc.2015.06.002

Kovanović, V., Joksimović, S., Poquet, O., Hennis, T., de Vries, P., Hatala, M., et al. (2019). Examining communities of inquiry in Massive Open Online Courses: The role of study strategies. The Internet and Higher Education, 40, 20–43. https://doi.org/10.1016/j.iheduc.2018.09.001

Lei, H., Cui, Y., & Zhou, W. (2018). Relationships between student engagement and academic achievement: A meta-analysis. Social Behavior and Personality, 46(3), 517–528. https://doi.org/10.2224/sbp.7054

Li, Y., & Lerner, R. M. (2011). Trajectories of school engagement during adolescence: Implications for grades, depression, delinquency, and substance use. Developmental Psychology, 47(1), 233–247. https://doi.org/10.1037/a0021307

López-Pernas, S., Gordillo, A., Barra, E., & Quemada, J. (2021). Escapp: A web platform for conducting educational escape rooms. IEEE Access, 9, 38062–38077. https://doi.org/10.1109/ACCESS.2021.3063711

López-Pernas, S., Saqr, M., & Viberg, O. (2021). Putting it all together: Combining learning analytics methods and data sources to understand students’ approaches to learning programming. Sustainability, 13(9). https://doi.org/10.3390/su13094825

Lust, G., Elen, J., & Clarebout, G. (2013). Regulation of tool-use within a blended course: Student differences and performance effects. Computers & Education, 60(1), 385–395. https://doi.org/10.1016/j.compedu.2012.09.001

Malin, L., & Wise, R. (2016). Glass ceilings, escalators and revolving doors: Comparing gendered occupational trajectories and the upward mobility of men and women in west Germany. June, 0–74.

Matcha, W., Gašević, D., Uzir, N. A., Jovanović, J., & Pardo, A. (2019a). Analytics of learning strategies: Associations with academic performance and feedback. Proceedings of the 9th International Conference on Learning Analytics & Knowledge, 461–470. https://doi.org/10.1145/3303772.3303787

Matcha, W., Gašević, D., Uzir, N. A., Jovanović, J., Pardo, A., Maldonado-Mahauad, J., et al. (2019b). Detection of learning strategies: A comparison of process, sequence and network analytic approaches. In M. Scheffel, J. Broisin, V. Pammer-Schindler, A. Ioannou, & J. Schneider (Eds.) (pp. 525–540). Springer International Publishing. https://doi.org/10.1007/978-3-030-29736-7_39.

McCutcheon, A. L. (1987). Latent class analysis (Issue 64). Sage.

Mirriahi, N., Jovanovic, J., Dawson, S., Gaševic, D., & Pardo, A. (2018). Identifying engagement patterns with video annotation activities: A case study in professional development. Australasian Journal of Educational Technology, 34(1), 57–72.

Miyamoto, Y. R., Coleman, C. A., Williams, J. J., Whitehill, J., Nesterko, S., & Reich, J. (2015). Beyond time-on-task: The relationship between spaced study and certification in MOOCs. Journal of Learning Analytics, 2(2), 47–69.

Ostertagova, E., Ostertag, O., & Kováč, J. (2014). Methodology and application of the Kruskal-Wallis test. Applied Mechanics and Materials, 611, 115–120.

Pagani, L. S., Fitzpatrick, C., & Parent, S. (2012). Relating kindergarten attention to subsequent developmental pathways of classroom engagement in elementary school. Journal of Abnormal Child Psychology, 40(5), 715–725. https://doi.org/10.1007/s10802-011-9605-4

Palvia, S., Aeron, P., Gupta, P., Mahapatra, D., Parida, R., Rosner, R., et al. (2018). Online education: Worldwide status, challenges, trends, and implications. Journal of Global Information Technology Management, 21(4), 233–241. https://doi.org/10.1080/1097198X.2018.1542262

Pardo, A., Mirriahi, N., Dawson, S., Zhao, Y., Zhao, A., & Gašević, D. (2015). Identifying learning strategies associated with active use of video annotation software (pp. 255–259). https://doi.org/10.1145/2723576.2723611

Park, Y., Yu, J. H., & Jo, I.-H. (2016). Clustering blended learning courses by online behavior data: A case study in a Korean higher education institute. The Internet and Higher Education, 29, 1–11. https://doi.org/10.1016/j.iheduc.2015.11.001

Parpala, A., Lindblom-Ylänne, S., Komulainen, E., Litmanen, T., & Hirsto, L. (2010). Students’ approaches to learning and their experiences of the teaching-learning environment in different disciplines. British Journal of Educational Psychology, 80(2), 269–282. https://doi.org/10.1348/000709909X476946

Quirk, M., Nylund-Gibson, K., & Furlong, M. (2013). Exploring patterns of Latino/a children’s school readiness at kindergarten entry and their relations with Grade 2 achievement. Early Childhood Research Quarterly, 28(2), 437–449. https://doi.org/10.1016/j.ecresq.2012.11.002

Redmond, P., Heffernan, A., Abawi, L., Brown, A., & Henderson, R. (2018). An online engagement framework for higher education. Online Learning, 22(1), 183–204. https://doi.org/10.24059/olj.v22i1.1175

Rienties, B., Tempelaar, D., Nguyen, Q., & Littlejohn, A. (2019). Unpacking the intertemporal impact of self-regulation in a blended mathematics environment. Computers in Human Behavior, 100(June), 345–357. https://doi.org/10.1016/j.chb.2019.07.007

Ritschard, G., & Studer, M. (2018). Sequence analysis: Where are we, where are we going? (pp. 1–11). https://doi.org/10.1007/978-3-319-95420-2_1

Ritschard, G., Studer, M., Buergin, R., Gabadinho, A., Muller, N., & Rousset, P. (2013). TraMineRextras: Extras for use with the TraMineR packages. Geneva: CRAN. https://cran.r-project.org/, 26.08.2020.

Rosato, N. S., & Baer, J. C. (2012). Latent class analysis: A method for capturing heterogeneity. Social Work Research, 36(1), 61–69. https://doi.org/10.1093/swr/svs006

Rosenberg, J., Beymer, P., Anderson, D., van Lissa, C.j., & Schmidt, J. (2018). tidyLPA: An R package to easily carry out latent profile Analysis (LPA) using open-source or commercial software. Journal of Open Source Software, 3(30), 978. https://doi.org/10.21105/joss.00978

Sander, P., & Services, I. (2016). Using learning analytics to predict academic outcomes of first-year students in higher education. In CAPSTONE REPORT pete Sander manager (Vol. 1277, pp. 2–41). Information Services Oregon State University University of Oregon Applied Information Management Program Spring, 800.

Saqr, M., Fors, U., & Tedre, M. (2017). How learning analytics can early predict under-achieving students in a blended medical education course. Medical Teacher, 39(7), 757–767. https://doi.org/10.1080/0142159X.2017.1309376

Saqr, M., Nouri, J., & Jormanainen, I. (2019). A learning analytics study of the effect of group size on social dynamics and performance in online collaborative learning. In M. Scheffel, J. Broisin, V. Pammer-Schindler, A. Ioannou, & J. Schneider (Eds.), Lecture notes in computer science (Vol. 11722, pp. 466–479). Cham: Springer. https://doi.org/10.1007/978-3-030-29736-7_35.

Saqr, M., Viberg, O., & Vartiainen, H. (2020). Capturing the participation and social dimensions of computer-supported collaborative learning through social network analysis: Which method and measures matter? International Journal of Computer-Supported Collaborative Learning, 15(2), 227–248. https://doi.org/10.1007/s11412-020-09322-6

Saqr, M., & Wasson, B. (2020). COVID-19: Lost opportunities and lessons for the future. International Journal of Health Sciences, 14(3), 4–6. https://pubmed.ncbi.nlm.nih.gov/32536841.

Schöbel, S., Janson, A., Jahn, K., Kordyaka, B., Turetken, O., Djafarova, N., et al. (2020). A research agenda for the why, what, and how of gamification designs – results on an ECIS 2019 panel. Communications of the association for information systems. https://doi.org/10.17705/1CAIS.04630

Skinner, E. A., & Belmont, M. J. (1993). Motivation in the classroom: Reciprocal effects of teacher behavior and student engagement across the school year. Journal of Educational Psychology, 85(4), 571–581. https://doi.org/10.1037/0022-0663.85.4.571

Skinner, E. A., & Pitzer, J. R. (2012). Developmental dynamics of student engagement, coping, and everyday resilience. In S. L. Christenson, A. L. Reschly, & C. Wylie (Eds.), Handbook of research on student engagement (pp. 21–44). Springer US. https://doi.org/10.1007/978-1-4614-2018-7_2.

Tempelaar, D., Nguyen, Q., & Rienties, B. (2020). Learning analytics and the measurement of learning engagement (pp. 159–176). https://doi.org/10.1007/978-3-030-47392-1_9

Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in Sport Sciences, 1(21), 19–25.

Trowler, V. (2010). Student engagement literature review. Higher Education, 1–15, November. https://doi.org/10.1037/0022-0663.85.4.571.

Vieira, V. A. (2011). Experimental designs using ANOVA. In Revista de Administração contemporânea (Vol. 15, Issue 2). Belmont, CA: Thomson/Brooks/Cole. https://doi.org/10.1590/s1415-65552011000200016.

Wang, M.-T., & Degol, J. (2014). Staying engaged: Knowledge and research needs in student engagement. Child Development Perspectives, 8(3), 137–143. https://doi.org/10.1111/cdep.12073

Wang, M.-T., & Eccles, J. S. (2013). School context, achievement motivation, and academic engagement: A longitudinal study of school engagement using a multidimensional perspective. Learning and Instruction, 28, 12–23. https://doi.org/10.1016/j.learninstruc.2013.04.002

Weller, B. E., Bowen, N. K., & Faubert, S. J. (2020). Latent class Analysis: A guide to best practice. Journal of Black Psychology, 46(4), 287–311. https://doi.org/10.1177/0095798420930932

Wigfield, A., Eccles, J. S., Schiefele, U., Roeser, R. W., & Davis-Kean, P. (2007). Development of achievement motivation. Handbook of Child Psychology, 3.

Winne, P. H. (2020). Construct and consequential validity for learning analytics based on trace data. Computers in Human Behavior, 112, 106457. https://doi.org/10.1016/j.chb.2020.106457

You, S., & Sharkey, J. (2009). Testing a developmental-ecological model of student engagement: A multilevel latent growth curve analysis. Educational Psychology, 29(6), 659–684. https://doi.org/10.1080/01443410903206815

Zhen, R., Liu, R. De, Wang, M. T., Ding, Y., Jiang, R., Fu, X., et al. (2020). Trajectory patterns of academic engagement among elementary school students: The implicit theory of intelligence and academic self-efficacy matters. British Journal of Educational Psychology, 90(3), 618–634. https://doi.org/10.1111/bjep.12320

  • The longitudinal trajectories of online engagement over a full program
    • 1 Introduction
    • 2 Background
      • 2.1 Engagement as a construct
      • 2.2 Engagement in online learning
      • 2.3 Person-centered methods
      • 2.4 Engagement as a heterogeneous process
      • 2.5 The trajectories of engagement
      • 2.6 Modelling trajectories of engagement
      • 2.7 Motivation for this research
    • 3 Methods
      • 3.1 Context
      • 3.2 Data collection and operationalization
      • Frequency of activities
      • 2.4 Online time
      • Activity and regularity
      • 3.3 Data analysis
        • 3.3.1 Clustering of engagement states
        • 3.3.2 Sequence mining
        • 3.3.3 Modelling and studying the trajectories of engagement
        • 3.3.4 Survival analysis
    • 4 Results
      • Mostly-engaged trajectory
      • Intermediate trajectory
      • Troubled trajectory
    • 5 Discussion
    • 6 Conclusions and implications
    • 7 Limitations
    • Ethical approval
    • Appendix A Supplementary data
    • Authors’ contributions
    • References