
An Agile Methodology for the Disaster Recovery of Information Systems Under Catastrophic Scenarios

COREY BAHAM, RUDY HIRSCHHEIM, ANDRES A. CALDERON, AND VICTORIA KISEKKA

COREY BAHAM (corey.baham@okstate.edu; corresponding author) is an assistant professor of management science and information systems (IS) at Oklahoma State University. His research focuses on agility in IS development, systems recovery, and firm dexterity. His work has been published in Communications of the AIS and major IS conference proceedings.

RUDY HIRSCHHEIM (rudy@lsu.edu) is the Ourso Family Distinguished Professor of Information Systems at Louisiana State University. He was previously on the faculties of the University of Houston, Templeton College–Oxford, and the London School of Economics. He was given the LEO Award for Lifetime Achievement by the Association for Information Systems. He is senior editor for the journal Information and Organization and on the editorial boards of Information Systems Journal, Journal of Management Information Systems, Journal of Strategic Information Systems, and others.

ANDRES A. CALDERON (calandres@gmail.com) has 25 years of diversified technology experience at both large and small enterprises; technical personnel management capability at different levels of the organizational structure; project management, and enterprise system administration experience. He also has vast experience in aligning technology to corporate vision and extensive background in business development.

VICTORIA KISEKKA (vkisekka@albany.edu) is an assistant professor at the State University of New York at Albany. Her research interests include information assurance, organizational resilience, health-care information technologies, and disaster recovery and response. Her work has been published in Computers in Human Behavior and various information systems conference proceedings.

Journal of Management Information Systems / 2017, Vol. 34, No. 3, pp. 633–663.

Copyright © Taylor & Francis Group, LLC

ISSN 0742–1222 (print) / ISSN 1557–928X (online)

DOI: https://doi.org/10.1080/07421222.2017.1372996

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/mmis

ABSTRACT: We explore the use of an agile methodology for improving the recovery of complex systems under catastrophic scenarios. Our adaptation of Kanban presents a novel, agile approach to overcoming the unique challenges that organizations face during disaster recovery. An action research study approach is employed to test the implementation of Kanban during a complex scenario at a large enterprise. The findings suggest that an adaptive and flexible methodology is required for an efficient disaster recovery in confronting unintended and cascading consequences. This research offers several contributions. First, to our knowledge, this is the first study to detail an approach for disaster recovery using an agile methodology. Second, this study uses a new combination of classic, canonical, and dialogical action research approaches to conduct the first empirical test of the effectiveness of an agile approach during an actual disaster recovery event. Third, in response to this Special Issue, the aforementioned research approach discusses the relationships between information systems researchers and research clients, demonstrating how action research can lead to improved organizational situations.

KEY WORDS AND PHRASES: action research, agile project management, catastrophic scenario planning, IS disaster recovery, Kanban.

Disaster recovery (DR) and business continuity planning is one of the top concerns for information technology (IT) executives [24] because of the increasingly detrimental effects of IT downtime on a firm’s reputation, its ability to conduct business, and ultimately its survivability. According to a survey by Computer Associates Technologies, IT downtime cost over $26 billion in lost revenue in 2010 [20]. One compelling reason for this loss is the increased dependency on technology found in today’s firms. The increase in productivity and efficiency afforded by technological tools has also increased a firm’s dependency on its technical infrastructure, which has led to an increase in a firm’s sensitivity to IT interruptions. Despite advances in technology such as high-availability storage area networks, self-healing virtual machine environments, and cloud computing, which have drastically reduced system recovery times [28], the IT DR practice still lacks a methodology to recover complex information systems (IS) in the wake of a catastrophic event.1 The DR practice, which dates back to military command and control (C2) doctrines (see Online Appendix A), largely assumes that decision making and authority need to be centralized, includes the tendency to over-detail, and treats adaptation as dysfunctional or harmful. Moreover, the continued use of traditional DR approaches has ignored the necessity of a distributed and adaptive response and recovery during a catastrophe [15], despite the need for adaptive and agile processes for the changing faces of disaster response. This raises the question: Have IT executives overemphasized the resiliency of the technologies while neglecting the need for a systematic DR approach that can increase team readiness and expedite the recovery of complex IS? In this study, we explore the use of agile methodologies in providing a systematic and holistic approach to DR orchestration.

Within the organizational context, agile capabilities have been highlighted in software development methodologies, which also have parallels to IS recovery methodologies [3, 37]. Since the increase in the complexity of IS after the growth of the Internet, software development methodologies have focused on becoming more agile in order to meet the rapid changes in user requirements. Similarly, we posit that dynamic organizational DR scenarios, which are characterized by cascading consequences in a complex environment, should consider approaches that are more agile. Thus, the main research question that guides this study is: How can the use of agile project management methodologies improve disaster response and recovery (orchestration) efforts? To address the research question, we draw upon the extant literature to identify useful agile methodologies for the DR practice, which have the potential to improve the delivery of project requirements. In this study, a new agile approach to DR was adapted and tested using an action research approach to study the IT DR practice of a large enterprise.

Theoretical Foundation

Disaster Recovery Literature

We define disaster recovery as a subset of business continuity planning, which focuses on the process of “creating and executing a plan for how an organization will resume partially or completely interrupted IT, organizational, or business critical functions within a predetermined time after a disaster or disruption has occurred” [27, p. 1]. In this study, we focus on the DR of complex IS in organizations. Online Appendix B contains key terms related to IS DR, including complex DR (C-DR), which describes the environment in which the recovery process takes place. For simplicity, we will refer to the recovery efforts that take place within the C-DR context as DR.

Methods of IS recovery largely focus on the use of IS recovery tools and the acquisition of sophisticated hardware, while neglecting the wider context concerning how these tools integrate with existing processes. Within the DR context, the complexities presented by the interdependencies among interrelated IS pose serious challenges to DR orchestration. In particular, we identify the critical factors in executing DR orchestration with efficiency as the need for teams to be flexible and responsive to changing circumstances, maintain a shared common operating picture, and maintain a strong focus.

In the extant literature, a few DR researchers attempt to study DR at the organizational level by investigating the antecedents of effective DR.2 The demonstrated correlates of effective DR include planning, organizational size, management support, internal and external collaborations, an organization’s financial condition, severity of the disaster, and economic climate [23, 25, 44]. Other research primarily focuses on developing prototypes and modeling techniques for managing disasters [34, 43, 50]. Despite these developments, there is still a dearth of seminal work in the area of DR. To our knowledge, there are no comprehensive methodologies for effectively managing complex DR efforts at the organizational level. This lack of holistic approaches for managing complex disaster environments has been observed previously [3, 11, 31]. An observable limitation in the discussed works is that they mainly focus on people or technology without consideration of a systematic approach for understanding relationships or the processes critical for successful DR. In particular, the linkage that exists between IT, people, policies, and processes is not addressed. Prior observations also point out that there is a general lack of interoperability of existing DR solutions [10, 11]. In fact, the popularly referenced four-stage model of disaster management, which is often the basis of existing solutions, has been shown not to be a good representation of reality [8]. With this in mind, we refer to elements of the four-stage model only as a common language for describing DR orchestration activities (response and recovery), which occur simultaneously, without suggesting that its isolated stages are an adequate representation of the DR process. Lastly, while several disaster management IS have been proposed, to our knowledge, there are no known deployments and usage of such IS in the industry.

Agile Methodology Literature

To address the aforementioned needs of a DR environment for DR orchestration in organizations, we began by identifying the need for agility in the DR practice.3 Extant research indicates that the concept of agility first appeared in the mainstream business literature in the early 1990s [18]. Prior studies explore the concept of agility concerning manufacturing, management, product development, and other business research development [45, 48]. Despite the contributions by these fields, the term “agile” became widely popular after the advent of the Agile Manifesto [7] in 2001, which described a new approach to building software. Although we recognize that the roots of agile project management methodologies stem from fields both inside and outside of the business literature [47], our motivation to study the recovery of complex IS leads us to examine the concept of agility primarily within the software development context.

Theoretical Lens: Parallel Between DR Needs and Agile Practices

We examine agile methodologies for capabilities that might overcome the complexities identified in a DR environment. There are several benefits of agile methodologies in completing organizational projects, including adaptability, flexibility, and project visibility [26]. For the DR practice, the need for a highly adaptive methodology with the ability to cope with sudden or frequent changes is critical to minimizing the downtime of complex IS [19]. Unfortunately, traditional DR approaches have ignored the necessity of a distributed and adaptive response and recovery during a catastrophe [15]. The extant literature both inside and outside of the military domain (e.g., flooding, natural disasters) emphasizes situational awareness and collaboration to facilitate a response to dynamic recovery situations [22, 51]. Agile methodologies use short feedback cycles to provide timely and frequent updates, which are critical during the DR response and recovery. The focus of effort through a common operating picture is vital in facilitating situational awareness [13]. In addition, the ability to maintain focus on critical activities, the freedom for teams to self-manage toward resolving bottlenecks, and the shared operating picture that results are critical in DR and are promoted by methodologies such as Kanban [1].


Many critical components, such as the ability to respond to ongoing emergency conditions, the coordination of recovery teams, and access to adequate communication mediums, play a part in successful recovery efforts [34]. To rebuild entire information systems quickly, coordination among multiple teams must align with recovery opportunities and business priorities. Therefore, communication and collaboration between and within DR stakeholders at all levels, horizontally and vertically, are vital to achieving high levels of DR orchestration efficiency. In summary, agile principles complement DR needs of adaptability, situational awareness, focus of effort, and orchestration efficiency. In this study, we use the twelve principles behind the agile manifesto [7] along with Conboy’s [12] definition of information systems development (ISD) agility to ground our notion of agile project management. Table 1 compares these twelve principles (represented by the numbers enclosed in parentheses) with the four key challenges of DR orchestration defined above.

Thus, drawing on the extant literature, we seek to answer our research question as posed in the introduction: How can the use of agile project management methodologies improve disaster response and recovery efforts? To address this question, we studied the DR program of a large enterprise using an action research approach.

Action Research Approach

Action research (AR) has been defined as “research that involves practical problem solving which has theoretical relevance” [33, p. 12]. AR is particularly helpful in addressing both rigor and relevance by applying scientific research in the setting of a real-world problem. In contrast to traditional research approaches, the action researcher is actively engaged in the creation of organizational change. AR has its roots in the philosophy of pragmatism where the focus is on practice, and in particular, the outcomes or consequences of practice [39]. In the IS domain, Baskerville and Myers [5] cogently articulate the connection between pragmatism and AR. AR differs from case study research in that the former is directly involved with helping the organization to learn by conducting one or more experimental solutions [6]. In AR, findings are reflected upon and used in subsequent iterations. The IS literature contains guidelines for conducting AR [2, 4, 6], including those specific to several types of AR such as canonical [14, 29] and dialogical [30], which we combined in this study.

A key feature of AR is its reflective and iterative cycle. In addition, AR approaches commonly consist of two main parts, the action and the reaction. The action is the organizational intervention, which attempts to remedy the real-world problem, and the reaction is the response to the experimental stimulus. In this study, we incorporate dialogical AR [30] into Baskerville and Wood-Harper’s [6] AR cycle, which has been further formalized as canonical AR [14]. First, Baskerville’s five-phase AR cycle provides a framework by which practitioners and researchers work together to remedy a problem. Using this five-phase cyclical process, the research team (1) diagnoses the underlying causes of the organization’s desire for change, (2) specifies organizational actions, (3) takes the action, (4) evaluates the action’s outcomes, and (5) identifies knowledge gained during the process [4]. Second, dialogical AR recognizes the respective historical and social contexts of the researcher and the practitioner [30]. We selected dialogical AR, among several AR approaches in the extant literature [2, 6], because it is iterative, collaborative, and separates expertise into two distinct entities: the researcher’s expertise and the practitioner’s expertise. Based on Schultz’s [42] concept of the scientific attitude and the natural attitudes of everyday life, dialogical AR treats these concepts as “qualitatively different categories of knowledge and reasoning, each category being distinguished by and dependent on its own social context” [30, p. 513]. Moreover, the one-on-one dialogue between the researcher and practitioner fit well with the research agreement and time schedules of the research team. Furthermore, dialogical AR fits well with the elements found in the AR cycle [30]. We drew the conceptual basis of our research approach from dialogical AR, while following the cyclical and iterative process of canonical AR. There were additional and practical reasons why an AR approach was appropriate for this study, which we discuss in Online Appendix C. Following Mårtensson and Lee [30], we refer to theory (theoria) as the world of scientific knowledge that is the basis of the researcher’s expertise. Here, the scientific knowledge on agile methodologies is used in developing an action plan in combination with the practitioner’s expertise (praxis). Such knowledge was used as the basis of the DR intervention. The results of each DR intervention were reflected upon and used in preparation for subsequent interventions as needed.

Table 1. A Comparison of Disaster Recovery Needs and Agile Principles

Columns: DR needs; Agility definition and principles

DR need: Adaptation and agility
  Definition:
    Facilitate the creation of change (1).
  Agile principles:
    Welcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage (2).

DR need: Situational awareness
  Agile principles:
    Business people and developers must work together daily throughout the project (4).
    At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly (12).

DR need: Efficient orchestration
  Definition:
    Contains a methodology component that also contributes to perceived economy, quality, or simplicity but should not perform poorly in any of the three (2).
    Continual readiness of the methodology component (3).
  Agile principles:
    Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale (3).
    Business people and developers must work together daily throughout the project (4).
    Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely (8).
    Simplicity--the art of maximizing the amount of work not done--is essential (10).
    The best architectures, requirements, and designs emerge from self-organizing teams (11).
    At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly (12).
    Our highest priority is to satisfy the customer through early and continuous delivery of valuable software (1).

DR need: Focus of effort
  Agile principles:
    Continuous attention to technical excellence and good design enhances agility (7).

Notes: There are twelve principles behind the agile manifesto [7]. The numbers inside the parentheses indicate which of the twelve principles behind the agile manifesto is being referenced.

Empirical Setting of the Study

Research Setting

The research project emerged as part of the initiative of the host company, hereafter referred to as Alpha, to assess the appropriateness of agile approaches to DR orchestration. The wider IT unit of Alpha served as context to test the logic of a generic, agile methodology for the DR of complex systems. Alpha offers health-care services in the United States. Alpha has three data centers with an array of technologies such as mainframes, midranges, and WINTEL platforms to support its offerings. Critical infrastructures at the data centers have N + 1 resilience that ensures system availability in the event of component failure. Alpha’s IT team has over 300 staff members that manage over 2,000 servers and support over 300 applications. Alpha has developed a mature DR practice with comprehensive DR plans.

The research team consists of a lead researcher, who was previously employed by Alpha and was the primary driver of the project, a lead practitioner, and two additional researchers, who provided insights from extant literature. The lead researcher is an academic action researcher, who spent approximately 200 hours working with Alpha’s DR department. The lead practitioner is Alpha’s DR manager, who works full-time with the company and is responsible for developing DR plans and leading all IT DR efforts. During the project, the DR manager hired contractors to help with the DR planning.

Alpha’s IT division had largely neglected DR planning until a catastrophic event decimated the region near Alpha’s headquarters. Executives soon raised concerns over the company’s ability to survive a natural disaster. Despite hiring a DR manager, the sheer size of Alpha’s IT infrastructure, the constant IS changes to both hardware and software across multiple departments, and changing personnel rendered the DR manager unable to keep documentation updated at the rate of Alpha’s organizational change. The company still needed an effective way to coordinate the efforts of separate departments during a DR scenario. In search of methodologies that respected the changing nature of DR efforts, the DR manager began to hear about and eventually research agile methodologies. Eventually, Alpha contacted the lead researcher to learn about how agile methods might be leveraged and implemented to improve the recovery time and efficiency of DR. A client–researcher agreement was made between the lead researcher and Alpha, which committed to developing a framework for the use of agile methodologies to improve DR orchestration. The lead researcher was expected to work with Alpha’s agile work group to understand what works and what does not work at the company. This agreement initiated a four-month effort, May 2014 through August 2014, to improve DR orchestration using agile methodologies. This effort was later extended one month.

Data Collection Details

To produce a rich understanding of the project context, multiple data sources were used to triangulate findings including direct observation, DR documentation analysis, group meetings, semistructured interviews, and postmortem reports. The role of the lead researcher was that of an observer. He observed the interactions between Alpha’s DR department and other IT departments to gain insights into the behavior, interactions, and information that may not have been reported during the interviews. The lead researcher took notes of incidents and observed practices as well as direct and indirect influencers of Alpha’s DR program. The lead researcher gained knowledge of Alpha’s DR program by reviewing the DR department’s documents such as reports, manuals, requirements documents, and lessons learned. The lead researcher participated in group meetings both with the DR department and with IT executives, who provided feedback on the framework as it was being developed. In addition, the lead researcher conducted two formal interviews, which were recorded. The interview script is shown in Online Appendix D. A postmortem was conducted after each implementation of our DR framework.

The feedback gathered after each intervention was reflective in nature. Feedback was gathered from IT staff that participated in the DR events, which the research team used to refine the DR methodology. Both the lead researcher and the lead practitioner reflected on and analyzed the data. A summary of the data sources is presented in Table 2. The majority of the coding procedure was conducted at the completion of each cycle. The coding scheme focused on a number of core themes that related to DR needs and later evolved into additional themes [41]. The lead researcher then wrote a case narrative based on the data sources and reflective dialogue with the practitioners. Following Mårtensson and Lee’s [30] dialogical AR, the case descriptions present different perspectives whereby readers are able to make their own interpretation of the DR events described.

Table 2. Data Collection Information

Method | Source | Dates
Direct observation | DR manager, DR contractors | 4/15–9/30
DR documentation analysis | DR paperwork | 4/1–9/30
Group meeting: DR department | DR manager, DR contractors (1 hour each) | 6/6, 6/13, 7/24, 8/19, 9/30
Group meeting: DR department | Agile coach, DR manager (30 min.) | 7/24
Interview | Agile coach (1 hour) | 7/11
Postmortem: First event | DR manager; Director, IT Network Operations Center (NOC); Systems Engineering Architect; SUPV, IT-Computer and NOC Operations; NOC engineer; Sr. NOC engineer; Manager, IT Production Supply and NOC; Manager, IT-Systems Support | 8/9
Postmortem A: Second event | Manager, IT Server Engineering; IT manager; VP, Application Services | 9/21, 9/22
Group meeting: Executives (20 min.) | CIO; Director, Technology Office; VP, Enterprise Infrastructure | 10/10
Postmortem B: Second event | Manager, IT-Service Management; Director, IT Network Operations Center; DR manager | 10/29
Interview | DR manager (1 hour) | 11/6

A 5-Month, Two-Cycle Dialogical AR Project

The five-phase AR cycle, taken from Susman and Evered [46] and later Baskerville [4], consists of diagnosing, action planning, action taking, evaluating, and specifying learning.

First Cycle, 4 Months

Diagnosing: The entry point of this diagnosis came during the initial meeting between the lead researcher and two of Alpha’s managers. The DR manager provided an overview of the problem and the initial research that he conducted regarding the use of agile methodologies for disaster management, emergency management, business continuity, and DR. Through his initial research from Gartner [26] on agile adoption, the DR manager believed that there was an alignment between the needs of DR orchestration and agile methodologies; however, no references were found in the related literature concerning this relationship.

Action Planning: Following the diagnostic phase, the lead researcher worked closely with the IT leaders to conduct a systematic exploration of the opportunities and benefits of applying agile methodologies to Alpha’s DR program, specifically, the area of DR orchestration. The lead researcher synthesized findings from the literature and consulted with certified agile coaches to identify best practices for scaling agile methodologies. As a result, matching concepts between agile methodologies and DR needs were identified and a high-level framework was drafted in July 2014, which considered the challenges of large enterprises and catastrophic scenarios. A number of scaled frameworks were identified and carefully examined (e.g., Leffingwell’s Scaled Agile Framework). Ultimately, the lead researcher determined that adapting Kanban, an agile methodology, aligned best with the goals of Alpha’s DR program. Although Kanban implementations may vary in complexity, the Kanban methodology is based on three basic principles: visualizing workflow, limiting the amount of work in progress, and managing workflow [1]. Additional details concerning what the methodology entails can be found in the references provided [1, 38, 45] as well as in the results section. The Kanban methodology fit well with Alpha’s methodological preferences of simplicity, project visibility, adaptability, and flexibility. Online Appendix E provides a detailed justification for choosing a Kanban-based methodology for DR orchestration. In response to the research presented, the DR manager found an open-source Kanban board online, which he familiarized himself with in preparation for use during a disaster. Table 3 illustrates an example of how the perspectives of the researcher and practitioner worked toward a mutual understanding. We discuss how the development of an agile framework for DR was perceived and solved from a practical and research perspective using the framework from Mårtensson and Lee [30]. Online Appendixes F and G include additional illustrations.
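The three Kanban principles named above (visualizing workflow, limiting work in progress, and managing workflow) can be illustrated with a minimal sketch. The class names, column names, task labels, and the limit of two items are hypothetical and chosen only for illustration; they are not taken from Alpha’s implementation or from the tool the DR manager used.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Column:
    """One lane on the board, with an optional work-in-progress (WIP) limit."""
    name: str
    wip_limit: Optional[int] = None  # None means the lane is unlimited
    cards: List[str] = field(default_factory=list)

    def has_room(self) -> bool:
        # Principle 2: limit the amount of work in progress.
        return self.wip_limit is None or len(self.cards) < self.wip_limit


class KanbanBoard:
    """Hypothetical board used only to illustrate the three principles."""

    def __init__(self, columns: List[Column]) -> None:
        self.columns: Dict[str, Column] = {c.name: c for c in columns}

    def add(self, card: str, column: str) -> bool:
        target = self.columns[column]
        if not target.has_room():
            return False
        target.cards.append(card)
        return True

    def move(self, card: str, src: str, dst: str) -> bool:
        # Principle 3: manage flow by pulling a card only when dst has room.
        if card not in self.columns[src].cards or not self.columns[dst].has_room():
            return False
        self.columns[src].cards.remove(card)
        self.columns[dst].cards.append(card)
        return True

    def render(self) -> str:
        # Principle 1: visualize the workflow, one line per column.
        lines = []
        for col in self.columns.values():
            limit = "-" if col.wip_limit is None else str(col.wip_limit)
            lines.append(f"{col.name} [{len(col.cards)}/{limit}]: {', '.join(col.cards)}")
        return "\n".join(lines)


board = KanbanBoard([Column("To Do"), Column("Doing", wip_limit=2), Column("Done")])
for task in ("task A", "task B", "task C"):
    board.add(task, "To Do")

moved_a = board.move("task A", "To Do", "Doing")
moved_b = board.move("task B", "To Do", "Doing")
moved_c = board.move("task C", "To Do", "Doing")  # refused: WIP limit of 2 reached
print(board.render())
```

In this sketch the third move is refused until an item leaves the "Doing" lane, which is how a WIP limit keeps a team focused on finishing work rather than starting more of it.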


Table 3. Developing an Agile Framework for Disaster Recovery

Practitioner’s perspective

The practitioner saw the DR practice as an agile practice in the sense that DR requires a unified effort and the adaptation to change.

From the DR manager’s perspective, the need for agility was self-evident to those who were heavily involved in DR efforts. Thus, he saw agile principles being overlaid onto an existing DR framework. Therefore, he challenged the researcher to gain an understanding of COBIT, ITIL, and other IT frameworks, which he thought would help ground the agile DR framework. These IT frameworks were considered best practices for IT processes by Alpha’s IT managers.

The DR manager showed the lead researcher a framework for DR orchestration that he had developed. Although he struggled to explain the inherent agility of the DR practice to the other IT managers, he felt that the agile project management literature would contain existing frameworks and the vocabulary to articulate what he had trouble communicating.

Researcher’s perspective

The researcher saw the DR practice as one that could adapt waterfall, agile, or other project management methodologies [40].

From the researcher’s perspective, agile methodologies could be applied to DR to improve DR orchestration at Alpha. However, he saw a potential problem with introducing agile methodologies in Alpha’s DR practice. Attempting to fit agile approaches into traditional IT governance and service frameworks is problematic, because agile principles focus on producing working process over comprehensive documentation [7]. According to the extant literature, the incompatibility between an organization’s cultural assumptions and the assumptions built into the methodology has negative impacts on implementing process improvements [9, 35].

Dialogue and Action

In the reflective dialogue between the researcher and the practitioner, they discussed their different conceptualizations of applying agile methodologies to DR. From the practitioner’s perspective, the agile framework needed to be comprehensive and provide clarification concerning the stakeholder roles of a DR orchestration in a large enterprise, based on industry best practices as defined by COBIT and ITIL. Such a framework would be respected by Alpha’s managers who scrutinized the idea of applying agile approaches to DR. From the lead researcher’s perspective, the agile practice aims to develop simple, lean frameworks that guide organizational processes with flexibility. Because of the practitioner’s desire to capture a high level of detail, the researcher suggested taking an incremental approach to implementing agile instead of attempting to adopt a feature-rich agile methodology wholesale. The practitioner found the Kanban methodology to be simple as the researcher suggested, yet scalable enough to include important DR roles and responsibilities. Thus, these two perspectives were merged into a mutual understanding through reflective dialogue [30].

Notes: Although reflective dialogue between the practitioner and the researcher occurred throughout the research cycle, we limited the number of reflective dialogue illustrations to three because of space limitations.


Action Taking (t = 1): On August 9, 2014, the secondary data center of Alpha experienced an unexpected power loss during the testing of generators conducted by the hosting provider. Immediately, the DR plan was activated and teams were notified of the event, but the DR manager confronted some serious challenges. As he was assessing the damages caused by the power loss, he was bombarded with requests for updates. The chief information officer (CIO) wanted to know how long it would take to get the systems up and running. The chief operations officer (COO) wanted to know when operations would resume. The human resources (HR) director wanted to know whether to tell employees to report to work the next day. The technical teams wanted to know what the next steps were and where they could find team members to replace those who were affected by the disaster. As calls, e-mails, and text messages poured in, the DR manager realized that following the DR Plan in its original form was in jeopardy because of the increasing uncertainty and reports concerning the extent of the damages. Not only was the internal phone system down, but also cell phone reception was poor in the data center. Moreover, e-mail was rendered unusable because alerts generated by Alpha’s system monitoring service clogged the team’s inboxes.

In an attempt to streamline communication, the DR manager implemented Kanban boards via an open-source Kanban application in parallel with the use of set recovery sequences established for the recovery of the secondary site. This application allowed the DR manager to upload the recovery sequences from the DR cadence into a project backlog from which work items were reprioritized as new information became available. Although the open-source application was restricted to a few users, soon separate calls, texts, and e-mails were replaced by a single Kanban board, which helped those involved to visualize the workflow while maintaining situational awareness across the IT department. The use of Kanban was limited to four participants and the lanes of work (Kanban columns) were broken down into eight categories as shown in Figure 1. The columns listed the actual tasks in the recovery sequences for the secondary data center. Members of the DR team completed work items in accordance with their roles and responsibilities. During the recovery, the DR manager added the “Critical Activities” column for work items that needed special attention. As hardware failures were discovered during recovery, recovery sequences were reevaluated and the DR orchestration adjusted within the Kanban board. After all pertinent work items were completed, power was restored to all systems.
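The board mechanics described above (loading recovery sequences into a backlog, reprioritizing work items as new information arrives, and flagging items in a “Critical Activities” column) can be illustrated with a minimal sketch. This is not the open-source application used at Alpha, which the study does not name. The column names other than “Critical Activities”, the class names, and the sample recovery steps are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical column set; the actual eight columns appear only in Figure 1.
COLUMNS = ["Backlog", "To Do", "Critical Activities", "In Progress", "Done"]


@dataclass
class WorkItem:
    title: str
    team: str
    priority: int  # lower number means the item is recovered sooner


@dataclass
class Board:
    columns: dict = field(
        default_factory=lambda: {name: [] for name in COLUMNS}
    )

    def load_backlog(self, recovery_sequence):
        """Load (title, team) steps into the backlog in their planned order."""
        for position, (title, team) in enumerate(recovery_sequence):
            self.columns["Backlog"].append(WorkItem(title, team, position))

    def reprioritize(self, title, new_priority):
        """Change one item's priority, then re-sort the backlog."""
        for item in self.columns["Backlog"]:
            if item.title == title:
                item.priority = new_priority
        self.columns["Backlog"].sort(key=lambda item: item.priority)

    def move(self, title, source, target):
        """Move a work item between columns, e.g. to flag it as critical."""
        for item in self.columns[source]:
            if item.title == title:
                self.columns[source].remove(item)
                self.columns[target].append(item)
                return item
        raise KeyError(f"{title!r} not found in column {source!r}")


# Illustrative recovery steps (assumed, not taken from Alpha's DR plan).
board = Board()
board.load_backlog([
    ("Restore storage array", "Storage"),
    ("Start database servers", "Server"),
    ("Validate applications", "Applications"),
])
board.reprioritize("Validate applications", -1)
board.move("Start database servers", "Backlog", "Critical Activities")
```

The point of the sketch is that the printed plan only seeds the backlog; every later change is a cheap edit to a shared structure rather than a revision of the plan document.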

Evaluating (Record Findings of t = 1): Following prior research, this phase necessitated the comparison of pre- and post-intervention states [29]. The people, processes, and technology involved in the DR effort were evaluated. In addition, qualitative feedback was solicited from key informants in the form of postmortem questionnaires and interviews. For instance, a one-hour formal interview was conducted with the DR manager shortly after the power loss in order to understand how useful the Kanban methodology was in orchestrating the recovery of the system. Findings were discussed and recorded. Overall, the outcome of the intervention seemed positive (as illustrated in Online Appendix F). Following the DR event, the DR manager described the benefits of using Kanban to Alpha’s management: “The Kanban Boards give the opportunity for resource optimization and alignment of effort, allowing leadership to engage in a meaningful way, for the agile adaptation of priorities, and to clearly communicate at a glance to all involved.”

Specifying Learning: While learning was ongoing across all phases, we specify knowledge gained during this final phase. First, from the researcher’s perspective, not only did the use of agile methodologies seem to match DR needs, as the researchers conceptualized from the literature, but the severity of the DR scenario motivated the DR team’s sense of urgency. We learned that while the ability to adapt to change and collaborate effectively were important in overcoming the classic challenges of traditional approaches, a noteworthy advantage of applying agile methods (Kanban) to DR was the facilitation of continuous delivery [1], which was key for an efficient DR orchestration. Specifically, team members found that the use of a web-based Kanban board improved project visibility, monitoring, and the overall focus of effort in a DR scenario. Second, the use of a web-based Kanban board overcame the spatial boundaries of using a physical board. As such, a web-based Kanban board was found to be appropriate for the distributed nature of the DR activities in a large enterprise, as not all pertinent stakeholders were able to assemble around a single physical board in the aftermath of a catastrophic event. Third, each DR scenario presents unique challenges. Therefore, the methodology would have to strike a balance between providing the guidance needed for a disciplined delivery and also being able to adapt to different scenarios.

From the practitioner’s perspective, the DR manager found that Alpha’s printed DR Plans were not as flexible as the Kanban boards in adjusting to the realities of the disaster. Thus, the value of the plans was limited to developing the initial project backlog, while the Kanban boards were useful in orchestrating the recovery. Additionally, the DR manager identified incidents during the recovery that had not been clearly identified, for which exceptions or workarounds had to be developed. Incidents in the critical path of the recovery had to be resolved through a workaround. Resolution for incidents that were not in the critical path could be postponed, generating an exception in the recovery sequence. An example of an incident is hardware failure such as failed drives; failed drives that were not in the critical path of recovery needed to be handled through normal operations. As a result, the DR manager decided to make the following changes to the application of the methodology. First, the Issues column was added to capture all the tasks that would need to be executed for each incident. The DR manager opted to use the Issues column to prevent confusion between DR activities (Issues) and the Information Technology Infrastructure Library (ITIL) operational activities (Incident). Once all DR tasks were completed, the identified issues were entered into the IT Service Management System for resolution through normal operations. Second, the Critical Activities column was moved next to the To Do column, so participating teams could maintain focus on the mission.

***Sensitive information has been blurred for confidentiality

Figure 1. Kanban Board—Unexpected Power Loss.

Overall, the result of implementing a visual artifact (e.g., Kanban board) to guide workflow led Alpha’s IT departments to feel more comfortable with using an agile approach for future DR events. After the systems were restored, the company began to critically discuss the comparative advantages of using a Kanban system over traditional DR approaches.
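The incident-handling rule that emerged from the first cycle (a workaround for incidents on the critical path, a postponed exception for everything else, with both recorded in the Issues column) can be expressed as a short sketch. The `Incident` class, the `record_issues` function, and the sample incidents are hypothetical names introduced here for illustration; the later hand-off of issues to the IT Service Management System is mentioned in a comment but not modeled.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    description: str
    on_critical_path: bool


def triage(incident):
    """Apply the rule from the first cycle to a single incident."""
    # Critical-path incidents block the recovery, so they need a workaround now.
    # Other incidents are postponed, which creates an exception in the sequence.
    return "workaround" if incident.on_critical_path else "exception"


def record_issues(incidents):
    """Build the Issues column as (description, action) pairs.

    After all DR tasks are done, these entries would be entered into the
    IT service management system for resolution through normal operations
    (that hand-off is not modeled here).
    """
    return [(incident.description, triage(incident)) for incident in incidents]


# Illustrative incidents (assumed, not taken from the study's data).
issues_column = record_issues([
    Incident("Failed drive on core storage", on_critical_path=True),
    Incident("Failed drive on test server", on_critical_path=False),
])
```

Keeping these entries in a separate Issues column mirrors the DR manager's intent to avoid mixing DR activities with ITIL operational incidents.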

Second Cycle, 1 Month

The perceived success of the first intervention led to Alpha’s adoption of Kanban for its next DR effort. The initial four-month client-research agreement was extended for another month. The next event involved using Kanban to orchestrate the recovery of Alpha’s IS after a planned shutdown of its entire system.

Diagnosing: On September 20, 2014, the replacement of one of two faulty automatic transfer switches (ATS) required a full data center and corporate phone systems shutdown for approximately five hours. Alpha had little to no lag time to replace the ATS given the current threat profile of forecasted thunderstorms that could cause an unexpected power loss. While Alpha maintains N + 1 redundancy through two power switches, at this time both switches were faulty and required immediate repair.

Action Planning: After soliciting feedback from the IT managers involved in the systems recovery effort using Kanban, the DR manager sought a Kanban solution that was embedded in one of the company’s existing systems. Meanwhile, he learned about Trello [49], a software platform with Kanban functionality that was being piloted in another department. Trello offered a few enhancements over the open-source application used during the first event. These enhancements included: (1) the ability to add vivid colors to differentiate tasks from one another, (2) a chat feature that allows team members to communicate with one another, (3) the ability to install on a mobile device, and (4) the ability to add more users. The DR manager decided to use Trello [49] as a secondary backup to the notification system. The DR manager engaged IT managers before the DR event began, and asked them to download the mobile version of the application. The night before the event, the Kanban boards were loaded into Trello using a modified version of the recovery steps listed in the DR plan. No Trello training was provided because the application was found to be intuitive to the users and familiar to those with prior experience with Kanban.

Action Taking (t = 2; Intervene Again Adjusting for Findings Revealed in t = 1): As Alpha’s systems were shut down, the primary notification system unexpectedly failed, which made Trello (secondary) the primary notification system. Fortunately, Trello allowed the teams to share notifications across the leadership and staff, maintain situational awareness, and sustain their response while the primary notification system was down. Users of the Kanban boards under Trello started to update their own tasks and to document the progress of activities. In an attempt to further maintain the team’s focus, the DR manager introduced new columns when needed and removed them when no longer used. Colors were also used to designate areas of responsibility to allow the teams to easily identify their tasks. The use grew organically over the seven hours of the power-up sequence with over 10 active users and 30 subscribers who remained informed from various locations. All systems were restored, tested, and validated. The operation environment was fully restored with no incidents affecting Alpha’s operation on the next business day. The columns created for the recovery tasks are shown in Figure 2. All transactions from this event were automatically logged by Trello and were later reviewed by the teams as part of the DR lessons-learned process.

Evaluating (Record Findings of t = 2): The pre- and post-intervention states of the second intervention (t = 2) were evaluated, and were adjusted for findings discovered in the first intervention (t = 1). Again, qualitative feedback was solicited from key informants including interviews with the DR manager, IT-service management director, and IT Network Operations Center. Findings were discussed and recorded. Overall, the adjustments from what was learned during the first intervention, which led to the use of a more robust software platform, seemed to positively influence outcomes and improve the DR orchestration. For example, the Network Operations Center supervisor compared the use of the Kanban methodology to traditional DR approaches:

Overall, the time the PM [project manager] inputs into actually creating each task and board are minimal because of the time savings from traditional methods of manually keeping everyone updated and follow-up if issues arise. The tool allows the PM to focus on keeping the upcoming tasks on track, which helps the project progress smoothly.

We elaborate further on the improvements made over time in the next section.


Specifying Learning: While learning was ongoing across all phases, we specify knowledge gained during the second cycle. First, from the researcher’s perspective, engaging IT managers before the DR event began allowed them to familiarize themselves with the Kanban system. This allowed each department to load its own work items, which encouraged team autonomy and responsibility. The combination of more IT manager involvement, the use of the visual Kanban board, and the chat features in Trello encouraged project visibility, collaboration, coordination, and thus, a more efficient DR orchestration. From the practitioner’s perspective, the DR manager’s reuse of the Issues column was helpful for tracking incidents. Fortunately, Trello’s chat feature provided central communication for the team, insofar as many members had poor phone reception inside the data center. Thus, the DR manager was able to orchestrate the recovery more efficiently through the Kanban system within the software. After reflecting on the implementation, the practitioner determined that the addition of a chat feature in one of Alpha’s newest software tools was the only appropriate change needed for the software to function like Trello. Since this feature was due in an upcoming software release, the DR manager concluded that no further methodological changes were needed. After the completion of the second cycle, several senior managers, including former skeptics, lauded the use of our Kanban methodology and encouraged the team to present the findings of the project to all the IT managers. During the presentation, the vice president of Enterprise Infrastructure stated:

“I have been with this company for 25 years, and I can say that this was the smoothest process we have ever had during an outage.”

Online Appendix G illustrates an example of how the problem of presenting the findings was solved.

***Sensitive information has been blurred for confidentiality

Figure 2. Kanban Board—Replacement of the Automatic Transfer Switches


Results—Kanban-Based Framework

Agile Methodologies and Disaster Recovery Orchestration

Our research question is: How can the use of agile project management methodologies improve disaster response and recovery (orchestration) efforts? Following the use, reflection, adaptation, and reapplication of our Kanban methodology during two DR scenarios, we found that the use of agile methodologies can improve disaster response and recovery (DR orchestration) efforts. Our results show that agile methodologies improve disaster response and recovery efforts by improving adaptability, situational awareness, orchestration efficiency, focus of effort, and overall communication. In addition, the results suggest that the agile principles of project visibility and continuous delivery also play an important role in DR orchestration. Thus, our findings suggest a strong match between the application of the Kanban methodology and the DR contexts. Part of the AR process is to finish with lessons-learned principles [17]. These principles would provide the initial starting point for further AR projects.

Operationalizing Kanban Methodology Principles

As previously noted, the Kanban methodology is based on three basic principles of visualizing workflow, limiting the amount of work in progress, and managing workflow [1]. We discuss these principles in relation to our DR implementation.

Kanban Principle #1—Visualize the Workflow: The implementation of a visual artifact in the form of a Kanban board helped fulfill the DR team’s need for improved situational awareness as well as project transparency and visibility. First, the ability to provide situational awareness through frequent status updates as the work progresses was an important factor in the success of using an agile approach to DR. The Kanban board helped the DR team maintain a shared common operating picture and a reasonable level of situational awareness. Regarding the use of agile methodologies to improve situational awareness, the Network Operations Center supervisor explained:

The [Kanban] board helps with a visual of the multiple tasks assigned and interdependencies. It provides updates immediately if there are any issues that arise. Because everyone assigned to the project/board can see these updates, you instantly have all the support team available to help, which reduces time to resolve the issues, which help[s] streamline our processes. Before the [Kanban] board, this was a function of the project manager to contact and gather the appropriate team members together should an issue come up which is a longer process and takes longer to resolve issues with a traditional cadence project.

Second, project transparency and visibility [32] were emergent themes during our implementation. Our adaptation of Kanban provided a platform for improving situational awareness and focus of effort. The team was able to reap the benefits of greater project visibility as noted by the Network Operations Center supervisor: “[This process] was overall a great experience. The advantages of this process were the ability to ‘login from anywhere,’ [which] allowed me to view the near real-time status of the project from any device via Internet and the ability to provide feedback and ask questions in near real-time.” One of the IT managers added:

Just want to drop a note. Saturday ended up to be a very smooth power outage, in my opinion. I was able to keep it up by logging into Trello several times Saturday. Even without previously going over the cadence details, I was able to guess from the task updates in Trello and let my team know that the validation is coming (I guessed accurately two hours prior and sent the notification to my team to stand-by).

Kanban Principle #2—Manage/Enhance Flow: Efficient Orchestration. Once the steps for recovering the complex IS of Alpha were loaded into the software tool, the DR manager was able to orchestrate the recovery and adjust the sequence as necessary. Team members were able to see their assigned tasks and execute them using Kanban’s “pull” system. Team members also had the ability to move their tasks through the Kanban system and update their progress accordingly. Collaboration and coordination were simplified by the Kanban board, which displayed the work backlog, the work in progress, and completed work. By using the communication feature within the tool, the usually cluttered phone line was clear, enabling the DR manager to provide situational awareness to specific stakeholders in a timely fashion. The Kanban tool improved upon previous efforts by enhancing the communication between the business and project teams. The DR manager commented on the use of a Kanban board in improving the flow and team efficiency with the following statements:

When I define efficiency, I look at the critical path to full recovery. I look at the recovery time objective of our production environment. In this case, it would be the recovery time capability you have based on the people, processes, and the hardware that you have in place. The critical path for us kept on growing because we could not coordinate the efforts. Having the board allowed us to maintain the focus of what was the next critical step and this focus was provided from the person doing the work, the management level, and upper management—maintaining everybody aligned and informed with what we are trying to accomplish. That allowed us to reduce the timeline further even.

Quality Adaptation. Adaptation and agility were important factors of the DR team’s success. Two aspects of agility were observed during both DR events. First, the ability of the team to respond to environmental changes and quickly alter its course of direction played a pivotal role in the orchestration process. During the postmortem after the first DR event, the DR manager pointed to the team’s ability to “change the cadence on the fly and to adapt to some of the realities that they were seeing in the data center that we did not expect,” which led to a successful DR effort.

The ability to edit the Kanban boards played a key role in the team’s ability to respond to environmental changes and continue to maintain the flow of the recovery. The DR manager elaborated:

The visual boards allowed us to quickly change activities around and being able to highlight the critical ones to all available. We had a column labeled as critical that we used to do this. The people that were using the tool were able to hone in on this. We were able to clear roadblocks that way, providing visibility to upper management of our efforts. So that went very well.

Second, the ability to quickly implement the methodology itself proved to be an agile process as well. Before the second event, the DR manager was able to set up a Kanban board and had all pertinent stakeholders access the tool successfully. Using the tool, changes were managed systematically, users provided status reports, and efforts were communicated in a manner that was visible to the entire team. When asked about the team’s ability to adapt to change, the DR manager explained:

When we lost power unexpectedly on our secondary data center, we did not have anything in place, the systems crashed. We had our detailed recovery sequences ready to recover, but the testing of the generators was going to take 3 more hours, so we had a few hours to plan. That is when I decided to go ahead and implement the visual board, mostly for my own sake—to be able to manage the cadence in a more malleable way.

We were able to create task on the fly. We were able to remove and modify task and break dependencies, and all of that we could do in a very malleable way without having to make changes on a plan that we could not have shared because our systems were down.

Singular Focus. The use of a Kanban board improved the team’s ability to maintain a singular focus. During the second event, the DR manager used the chat feature on Trello to communicate easily with the team members by alerting them whenever it was time for them to act in accordance with the DR cadence. In similar previous events, the DR manager was being bombarded with different types of communication from different types of stakeholders simultaneously. Calls (mostly voice mail since phones were down), e-mails, and text messages would pour in. The result was that the DR Plan in its original form became obsolete as the DR manager started the task of orchestrating the recovery, identifying which personnel or vendors were available to assist in restoring the IS, and gaining a better understanding of the impact to the technology and operation. The single point of focus minimized the normally chaotic event by providing a centralized hub for all the important information concerning DR orchestration. As a result, the teams were able to focus on the critical items as they saw them enter the queue. In addition, information concerning any changes that were made was posted and shared across the board, so the teams had awareness of those changes. The software provided the additional capability of tracking the history of activities as the work proceeded. The DR manager explained:

We continued the flow of the cadence. Then the server team got to their part of the cadence. They were very active in self-managing the steps. I had created the up next lane to give 15 min. warning to the teams. The app gave text notifications on their phone. Corporate phones systems were still not available . . . system by system—until we started following the normal flow that you would follow on a recovery event, but they continued to use the boards.

Overall, our results suggest that the use of Kanban may improve adaptability, situational awareness, orchestration efficiency, communications, focus of effort, project visibility, and continuous delivery in disaster response and recovery efforts.

Modifications

Although Kanban principles 1 and 2 were found to fit our DR context relatively well, we found principle 3—limit work in progress—less applicable to DR.

Kanban Principle #3—Limit Work in Progress: The application of Kanban in manufacturing sometimes uses Kanban cards, which signal to other team members that a given team has additional capacity and is ready to pull in more items [38]. However, in our DR context, teams were signaled by notifications from the DR manager according to the predefined cadence. In addition, most tasks were one-time actions rather than stocks of duplicate items. Therefore, we did not attempt to signal in the traditional sense of using a card system. Next, in ISD, work in progress (WIP) limits are put in place to prevent bottlenecks. We removed WIP limits for two main reasons. First, WIP is based on cycle times, which match the amount of WIP with a team’s capacity. Cycle times are determined during early iterations of the ISD implementation with a set team. In contrast, our DR efforts were one-time, continuous efforts, in which team members were constantly changing. Thus, it was more advantageous to focus on enhancing flow through managing the cadence than measuring unstable cycle times. Second, because of the uncertainties previously discussed, the team was unable to determine the optimal WIP limit. Should it be three, five, seven, or more? Instead, the DR manager allowed the team to pull in as many work items as they felt comfortable with during a particular stage of the recovery. In the end, we were able to divide the Kanban board into separate lanes for each team using Trello, while allowing the DR manager to orchestrate the recovery according to the preestablished cadence.
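To make the difference concrete, the following minimal sketch contrasts a lane with a conventional WIP limit against a lane with the limit removed, as in the DR adaptation described above. The `Lane` class, its `wip_limit` parameter, and the team and task names are hypothetical and serve only as an illustration; they do not represent Trello's data model or Alpha's configuration.

```python
from typing import List, Optional


class Lane:
    """One team's lane on the board, with an optional WIP limit."""

    def __init__(self, team: str, wip_limit: Optional[int] = None):
        self.team = team
        self.wip_limit = wip_limit  # None means the limit has been removed
        self.in_progress: List[str] = []

    def can_pull(self) -> bool:
        """A lane without a limit can always pull; otherwise check capacity."""
        return self.wip_limit is None or len(self.in_progress) < self.wip_limit

    def pull(self, backlog: List[str]) -> Optional[str]:
        """Pull the next work item from the backlog if the lane has capacity."""
        if not backlog or not self.can_pull():
            return None
        item = backlog.pop(0)
        self.in_progress.append(item)
        return item


# Conventional ISD-style lane: pulling stops once two items are in progress.
capped_lane = Lane("Server", wip_limit=2)
capped_backlog = ["Boot hosts", "Start databases", "Start web tier"]
pulled_capped = [capped_lane.pull(capped_backlog) for _ in range(3)]

# DR-style lane: the team pulls as many items as it is comfortable with.
open_lane = Lane("Network")
open_backlog = ["Power core switches", "Restore routing", "Verify VPN"]
pulled_open = [open_lane.pull(open_backlog) for _ in range(3)]
```

In the capped lane the third pull is refused until an item completes, whereas the uncapped lane leaves pacing to the team and to the DR manager's cadence notifications.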


Discussion

Reflections on the Use of AR in DR

Upon reflecting on the details of our study, we believe that it is important to highlight some of the key lessons we learned by using AR, particularly as they relate our contribution to the following elements of this Special Issue: Discussions of relationships between information systems researchers and research clients (e.g., DR practitioners and executives) demonstrating how action research can lead to improved organizational situations.

First, we learned that AR enabled the researchers to gain trust with the stakeholders through our visibility and knowledge contribution. By employing AR, we were able to better understand and appreciate the company environment, which led to a much richer and more robust dialogue with those involved in the DR project and in turn helped us in the fulfillment of our research objectives. In particular, we observed that the direct knowledge contribution of our framework for DR increased the relevance of IS research in the eyes of Alpha’s practitioners. In addition, finding support in the extant literature, our framework provided a more scientific approach to problem solving than traditional organizational consulting, which is motivated more by commercial benefit than science [4]. In our study, AR allowed us to examine Alpha’s DR practice within the constraints proposed by the company. The constraints of the project, namely, its limited contract duration (initially four months, later extended to five months) and its limited access to stakeholders, were challenges likely to have been insurmountable using quantitative methods. The lead researcher’s presence was felt during day-to-day operations and department meetings, and led to his contributions being valued. Trust was fostered, which led to an extension of the original contract, from four months to five months. AR may be a viable starting place for academics seeking to establish the relevance of their work with organizations in which they have not established legitimacy. In our case, the commitment to deliver business value was more effective in establishing a relationship with Alpha’s stakeholders who were not fond of completing surveys and had had no specific dealings with academics in the past.

Second, we learned that AR could be useful for developing new frameworks in conjunction with practitioners. This is especially useful in areas where IS knowledge is limited. In our study, we combined a more classic, canonical AR cyclic framework with dialogical AR. This relatively new combination was useful, as we sought to adapt and test the effectiveness of a framework commonly found in one IS domain in a different IS domain, and investigate an area in which we had little specific domain knowledge. The dialogical AR approach allowed us to leverage the DR expertise of the practitioners and the research expertise of the academicians, who had no a priori knowledge or hypotheses that could be used to solve the practical problem presented in this study. During this study, we found that not only were agile methodologies compatible with DR, but in many ways the DR teams showed little resistance to agile approaches, which appears to run counter to the ways many software development teams respond to new methodologies. AR allowed us to collaborate with the practitioners using one-on-one dialogue, which enabled us to evolve a framework that solved a practical problem and was theoretically relevant. This collaborative process not only aided our understanding of Alpha’s corporate context and the work conditions of its stakeholders, but also gave us the opportunity to revise our framework as we received new information. Overall, AR provides a mechanism for more collaborative work with practitioners and the vetting of an emergent theoretical model in a real-world setting. We believe that this kind of research approach will be valuable in many settings and wish to promote its acceptance in the IS research domain.

Third, the use of AR was not without its challenges. While rewarding, we found conducting AR to be more time consuming than traditional quantitative methods. The lead researcher not only had to interact with practitioners multiple times a week to develop a framework but also needed to invest significant amounts of time understanding the organizational environment and its challenges. These interactions did not lend themselves to the type of unbiased, objective, and distant examination of phenomena that is so prominent within IS research. Instead, the researcher became recognized as a “part of the team,” which was necessary to understand the organization and gain further access. Furthermore, using AR is not the straightforward, template-driven, rule-following research approach associated with the orthodox functionalist methods. It is complicated, involves the need to be creative and adaptable in its application, and requires both the researchers and practitioners to trust one another. As we noted above, AR can facilitate the building of such trust; but concomitantly, it can blur the distinction between “fact” and “opinion.” Furthermore, using AR within the confines of this single organization leads to wondering whether the “results of the research” are generalizable to other organizations, or are simply the product of this specific DR experience. While we firmly believe that AR generated useful insights about DR, how well these insights will transfer to other DR environments is, of course, an open question.

Theoretical Contributions

The effectiveness of dialogical AR can be evaluated based on whether (1) the practitioner considers the real-world problem facing him or her to be solved or satisfactorily remedied, (2) there is an improvement in the practitioner’s expertise, and (3) there is an improvement in the scientific researcher’s expertise [30]. We have demonstrated the effectiveness of our AR approach, which we will draw upon to discuss our theoretical contributions. Beginning with our theoretical lens (see Table 1), we summarize how our approach of AR in the context of DR impacts the initial AR cycle. This helps us to illustrate the theoretical contributions of our work more clearly.

First, we have shown how our AR approach helped us deal with the unique needs of Alpha’s DR program and consequently how agile methodologies can improve DR orchestration. We identified four fundamental DR needs from the literature and related them to agile principles. Agile principles provided a theoretical lens to examine Alpha’s DR program in the sense that they not only helped to describe DR needs but also questioned and identified contradictions in prior approaches that led to subpar results [29]. As prior research points out, the fit among the elements in a DR methodology may affect recovery efforts [8]. The Kanban methodology provided visibility to work tasks while not requiring prescriptive roles and responsibilities that may not be available during a disaster. Our results suggest that agile principles are more compatible with the DR needs of adaptability, situational awareness, orchestration efficiency, and focus of effort than the principles found in traditional C2 methodologies (as shown in Table 4). Both of our cases show that the fit among DR people, agile processes, and agile technology (i.e., electronic Kanban board) positively affected recovery efforts.

Second, we observed improvements in the DR manager’s expertise (praxis) of existing agile methodologies, how they could be leveraged in DR, and how their benefits could be presented to senior managers. For instance, the DR manager, who had little knowledge of agile methodologies at the start of the project, provided a helpful suggestion about how to present agile outcomes to senior managers (see Online Appendix G). In addition, the DR manager identified the potential improvements that Trello provided and engaged staff members to familiarize themselves with the software before the planned outage. These examples demonstrate an improved understanding of agile methodologies and their application in DR scenarios. Similarly, we observed improvements in the DR manager’s understanding of AR throughout the study.

Third, we used pattern matching to test whether there was an improvement in the scientific researcher’s expertise [53]. The match between the pattern anticipated by the theory refined after the first intervention and the pattern observed in response to the action of the second intervention constituted an improvement in the researcher’s expertise [53]. The key refinements applied after the first intervention were relaxing the Kanban WIP limits and using a mobile application for greater team flexibility. These results, which were accepted by Alpha’s executives, support the validity of the researcher’s theory that agile methodologies can improve DR orchestration.

Other contributions relate to the existing literature on DR and agile project management. Our DR approach adapts and extends Kanban as an agile methodology for the DR context. Our DR approach addresses complexities introduced by company size and dependencies related to complex IS. The results of each scenario indicated that the tailoring of formalized agile methodologies is appropriate for the DR practice, as it addresses many of the shortcomings that still exist with traditional methodologies. The implementation of Kanban served as a proof of concept [36] for Alpha, thus providing empirical evidence to support the parallel between DR needs and agile principles. To our knowledge, this implementation was the first of its kind. Thus, our work provides the DR practice with its first methodology that is tailored to improve the efficiency and effectiveness of DR orchestration using agile project management. Given the success of our agile implementation in a DR program, our findings are valuable to both IS and DR practices as they demonstrate a new way to think about DR orchestration. Using a visual inspired by Malaurent and Avison [29], Table 5 summarizes our theoretical and practical contributions, which are discussed in the next section.
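The effect of relaxing Kanban WIP limits, one of the refinements noted above, can be illustrated with a minimal sketch. The column names, task names, and limit values below are hypothetical and are not taken from Alpha's boards; a limit of None is assumed here to mean that the cap has been lifted entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Column:
    """One Kanban column; wip_limit=None means no cap (a relaxed limit)."""
    name: str
    wip_limit: Optional[int] = None
    cards: List[str] = field(default_factory=list)

    def has_capacity(self) -> bool:
        return self.wip_limit is None or len(self.cards) < self.wip_limit


@dataclass
class Board:
    columns: Dict[str, Column]

    def pull(self, card: str, src: str, dst: str) -> bool:
        """Move a card from src to dst if dst has capacity; report success."""
        source, target = self.columns[src], self.columns[dst]
        if card not in source.cards or not target.has_capacity():
            return False
        source.cards.remove(card)
        target.cards.append(card)
        return True


def make_board(in_progress_limit: Optional[int]) -> Board:
    """Build a three-column board with hypothetical recovery tasks."""
    return Board({
        "backlog": Column("backlog", cards=["restore-db", "restore-web", "restore-mail"]),
        "in_progress": Column("in_progress", wip_limit=in_progress_limit),
        "done": Column("done"),
    })


def started(board: Board) -> List[str]:
    """Try to start every backlog task; return the ones that were pulled."""
    tasks = list(board.columns["backlog"].cards)
    return [t for t in tasks if board.pull(t, "backlog", "in_progress")]


strict_started = started(make_board(in_progress_limit=1))
relaxed_started = started(make_board(in_progress_limit=None))
```

Under the strict limit only one task can be in progress at a time, whereas the relaxed board lets every available responder pull work immediately. This is only one way of modeling the trade-off between flow and focus that the refinement addresses.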


Table 4. Methodology Comparison

Disaster recovery | Agile methodologies | Waterfall methodologies
Prioritized delivery | Prioritized delivery (adaptive) | Prioritized delivery (plan driven)
Procedures written in a stepwise, sequential format | Light documentation | Heavy documentation
Responding to ongoing emergency conditions | Adaptive to change; responding to changing user requirements | Fixed scope
Coordinating teams | Self-organizing teams | Traditional project management
Role of team leader | Facilitator | Manager
Frequent feedback loops | Frequent feedback loops | Single pass
Cadence | Iterative work cycles; working rhythm | Sequential; linear process
Team communication | Daily standups; collaborative meetings; retrospectives | As needed
Team work environment | Collaborative | Siloed
Project visibility | High | Low
Continuous delivery | Continuous delivery | Single delivery


Table 5. Practical and Theoretical Contributions Gained from AR Project

Stages | Contributions

Diagnosing | The formalization of this AR project was a great way to secure the mutual commitment of researchers and practitioners to develop a framework to improve DR orchestration. It was also necessary to understand the causes of the organization’s desire for change and to establish the theoretical connection between DR needs and agile principles.

Theory | Agile principles provided an appropriate theoretical lens to examine Alpha’s DR program in the sense that they not only helped to describe DR needs but also questioned and identified contradictions in prior approaches that led to subpar results. They also helped to question the consequences of prior DR approaches for Alpha. The Kanban methodology helped to coordinate DR people, processes, and technology throughout a dynamic DR effort.

Action planning | Given the lack of prior working history between the researchers and practitioners, AR was a great way to foster trust between the researchers and practitioners. AR also provided the structure needed for meaningful dialogue and reflection.

Dialogue | In addition, a dialogical AR approach was useful for investigating the DR domain, with which the researchers were unfamiliar. Through the iterations of dialogue and reflection, a Kanban-based agile framework for DR was developed that specified organizational actions after a disaster (1, 2).

Action taking | DR needs were analyzed using agile principles as a theoretical lens, and a new Kanban approach was implemented, first using a web-based application (1) and later using a mobile application (2).

Evaluating | The benefits (1, 2) of our approach to DR and modifications (1) needed were identified. This led to modifications in Kanban workflow requirements (2).

Specifying learning | On reflection, we learned the following:

● That the DR context enhanced the team’s urgency and focus, which helped with throughput in a Kanban system (1);

● How relaxing Kanban WIP limits increased DR efficiency and flow;

● How web (1) and mobile (2) applications can overcome a team’s spatial boundaries;

● That familiarizing teams with the application beforehand improves their efficiency when using the application during a live disaster (2);

● How communication mechanisms built into the application overcome issues with traditional means of communication, which may be compromised during a disaster (2); and

● How a more feature-rich and well-designed user interface improved interteam communication (2).

Notes: Modified from Malaurent and Avison [29]. (1) = Action cycle 1; (2) = Action cycle 2.


Implications for Practice

This research also has several practical implications. First, the findings inform disaster planning managers of the need to incorporate agile principles in DR plans. Despite the unpredictable and catastrophic nature of disasters, DR practitioners have focused on creating and applying static plans to meet the needs of DR. The use of agile methodologies helps planners to conceptualize revisions to the set procedures commonly found in static plans. For instance, a manager may ensure that multiple team members are able to complete a single work item in the Kanban project backlog.

Second, during disaster response, it is not uncommon for DR plans to become outdated, making them ineffective when responding to disasters. The use of agile methodologies facilitates revisions to DR plans that enable the response team to adapt to the unstable, tumultuous disaster conditions. Using Kanban allowed Alpha’s DR teams to adapt to having both limited personnel and evolving priorities. We recognize that organizations will not have all their resources available. Therefore, it is essential that they implement a solution that allows team members to join the DR as the situation allows.

Historically, DR practitioners have relied on risk management, which by nature is limited to known risks that the practitioners can anticipate, to address a wide range of possible scenarios. However, this is not realistic because organizations’ environments are characterized by constant change due to changing business requirements, customer needs, and technology advancements. Therefore, we developed our framework to assist DR teams in determining the impacts of a disaster once the risks have materialized rather than solely analyzing known risks.

Moreover, efforts to anticipate exact DR scenarios have been shown to be nearly impossible, as recent disasters demonstrate. For instance, recent power outages due to an unexpected “500-year flood” [52] in Baton Rouge, Louisiana, caused a widespread AT&T cell outage affecting over 50,000 homes [21]. In light of such disasters, we recognize that change and adaptability are requirements for every DR effort and provide the DR practice with an agile approach.

Third, the use of agile methodologies in DR provides the ability to make updates in real time, which increases the likelihood of a successful recovery. Agile DR methodologies provide the flexibility needed for effectively coping with the aftermath of an unplanned disaster. These methodologies may be applied to any disaster scenario, including cybersecurity attacks. For instance, Britain’s National Health Service was recently the victim of a cybersecurity attack that paralyzed operations across several hospitals [16]. An agile DR approach could be applied to this type of disaster as follows: The web-based Kanban boards would need to be created; the boards would detail specific information pertinent to the attack, such as a list of affected systems, severity of impact to each system and to hospital operations (e.g., processes affected, number of users), DR team and their responsibilities, and contact information of stakeholders. The Kanban boards would need to be updated throughout the DR with minute-by-minute details of the attack, resource needs and allocations, as well as new incidents and their severity. This would facilitate situation awareness, visibility of the overall DR process, and timely communication between responders.

Other implications relate to the disciplines of DR and project management. Given that DR approaches are heavily based on those of project management, the results of this study hold significant implications for both disciplines. The use of agile principles in DR broadens the application of agile methodologies from a project management perspective. The DR practice presents unique challenges for project management and highlights the need for project management frameworks that address continual readiness in a variety of ways. Our results suggest that the emergent factors of project visibility and continuous delivery play an important role in DR orchestration where teams with interchanging members work together in a one-time effort. Similar to the shift from waterfall to agile ISD, companies that develop project management systems should expand the features of such systems to accommodate the unique challenges of DR scenarios.
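The Kanban board described above for a cyberattack scenario (affected systems, severity of impact, number of users, and responsible DR team) can be sketched as a small data model. The field names, the severity scale (1 = most severe), the codified system names, and the team names are all illustrative assumptions, not details taken from the study or from any particular Kanban tool.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IncidentCard:
    """One card on a hypothetical DR Kanban board."""
    system: str            # codified system name (hypothetical)
    severity: int          # assumed scale: 1 = most severe
    users_affected: int
    owner: str             # DR team responsible for the recovery task
    status: str = "backlog"  # one of: backlog, in_progress, done


def prioritized_backlog(cards: List[IncidentCard]) -> List[IncidentCard]:
    """Backlog cards ordered by severity, then by users affected (descending)."""
    backlog = [c for c in cards if c.status == "backlog"]
    return sorted(backlog, key=lambda c: (c.severity, -c.users_affected))


def situation_summary(cards: List[IncidentCard]) -> Dict[str, int]:
    """Count cards per status for a quick situational-awareness view."""
    summary = {"backlog": 0, "in_progress": 0, "done": 0}
    for card in cards:
        summary[card.status] += 1
    return summary


cards = [
    IncidentCard("SYS-A", severity=2, users_affected=300, owner="network"),
    IncidentCard("SYS-B", severity=1, users_affected=1200, owner="clinical-apps"),
    IncidentCard("SYS-C", severity=1, users_affected=80, owner="storage", status="in_progress"),
    IncidentCard("SYS-D", severity=3, users_affected=40, owner="desktop", status="done"),
]

next_up = [c.system for c in prioritized_backlog(cards)]
overview = situation_summary(cards)
```

As new incidents are reported, appending cards and recomputing the two views keeps priorities and overall progress visible to every responder, which is the kind of situation awareness the scenario calls for.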

Conclusions

Limitations and Recommendations for Future Research

Like any study, ours has several limitations. However, these limitations offer future research opportunities. First, we only tested our DR agility framework at one U.S. site within a single industry. Our adaptation of Kanban involved the tailoring of an agile approach to fit the needs of a specific DR program. Future research can test our methodology in multiple sites, contexts, or industries, which will allow for between-case analyses. Future investigation is needed in order to understand the differences in DR agility between firms of different attributes. Second, our study also contained a limited number of participants. Although we were able to garner feedback from a variety of IT managers, future studies could examine the phenomenon by including a more comprehensive set of stakeholders. Third, our project teams were limited by the security requirements of Alpha, a highly privatized company. Teams were not able to load information (e.g., actual server names and locations, employee information). Instead, they worked using a codified language. Given the sensitive nature of the information contained in DR activities, Alpha codified all server names, information on people, and proprietary information to avoid potential exposure. Bringing the Kanban and agile tools in-house, into a controlled environment, would allow teams to fully integrate the solution into the DR orchestration activities and further define best practices around other operational frameworks such as ITIL or COBIT. Fourth, we answered the research question we proposed here by adapting a simple, agile methodology for DR orchestration, primarily by visualizing the work through a Kanban board. However, the software development practice has expanded the Kanban methodology by introducing metrics to chart team progress and integrating features from other agile methodologies like Scrum. Future research could examine ways of enhancing agile DR orchestration by introducing agile techniques such as scrum meetings and sprint retrospectives. In addition, the use of more sophisticated tools might enable future teams to respond more efficiently. We originally conceptualized using multiple Kanban boards in a client/server type of relationship; however, we were not able to identify a software tool that would allow us to test such a model. Future research could evaluate the effectiveness of scaling the implementation described here using multiple Kanban boards.

Despite these limitations, we were able to explore the fit between DR needs and agile project management methodologies, and to determine how agile methodologies could improve disaster response and recovery efforts. The results suggest that agile methodologies improve disaster response and recovery efforts by improving adaptability, situational awareness, orchestration efficiency, focus of effort, and overall communication. The results also suggest that the agile principles of project visibility and continuous delivery play an important role in DR orchestration. In conclusion, this study extends the extant literature and assists future researchers in understanding DR orchestration.

Supplemental File

Supplemental data for this article can be accessed on the publisher’s website at 10.1080/07421222.2017.1372996

NOTES

1. We acknowledge that the term “information systems” broadly pertains to people, processes, and technology that are used to handle and interpret information. However, the term is also used in a restrictive sense to refer only to the computer networks, systems, and software used in an organization. We adopt the latter approach in this study.

2. We are aware that there are many types of disasters such as natural disasters, man-made disasters, and onset disasters. In this study, we focus on disasters at the organizational level.

3. The lead researcher was contacted by a company wanting to know more about agile methodologies and how agile might be leveraged and implemented to improve the recovery time and efficiency of DR. Although other capabilities were considered, the potential relationship between DR needs and agile methodologies was of interest to the company and needed to be resolved.

REFERENCES

1. Anderson, D., and Carmichael, A. Essential Kanban Condensed. Seattle, WA: Blue Hole Press, 2016.

2. Avison, D.; Lau, F.; Myers, M.; and Nielsen, P. Action research. Communications of the ACM, 42, 1 (1999), 94–97.

3. Baham, C.; Calderon, A.; and Hirschheim, R. Applying a layered framework to disaster recovery. Communications of the Association for Information Systems, 40, 1 (2017), 277–293.

4. Baskerville, R. Investigating information systems with action research. Communications of the Association for Information Systems, 2, 19 (1999), 1–32.


5. Baskerville, R., and Myers, M. Special issue on action research in information systems: Making IS research relevant to practice: Foreword. MIS Quarterly, 28, 3 (2004), 329–335.

6. Baskerville, R., and Wood-Harper, A. A critical perspective on action research as a method for information systems research. Journal of Information Technology, 11, 3 (1996), 235–246.

7. Beck, K.; Beedle, M.; Van Bennekum, A.; Cockburn, A.; Cunningham, W.; Fowler, M.; Grenning, J.; Highsmith, J.; Hunt, A.; Jeffries, R.; and Kern, J. Manifesto for agile software development, 2001.

8. Berke, P.; Kartez, J.; and Wenger, D. Recovery after disaster: Achieving sustainable development, mitigation and equity. Disasters, 17, 2 (1993), 93–109.

9. Boehm, B., and Turner, R. Management challenges to implementing agile processes in traditional development organizations. IEEE Software, 22, 5 (2005), 30–39.

10. Chen, R.; Sharman, R.; Rao, H.; and Upadhyaya, S. Coordination in emergency response management. Communications of the ACM, 51, 5 (2008), 66–73.

11. Chen, R.; Sharman, R.; Rao, H.; and Upadhyaya, S. Data model development for fire related extreme events: An activity theory approach. MIS Quarterly, 37, 1 (2013), 125–147.

12. Conboy, K. Agility from first principles: Reconstructing the concept of agility in information systems development. Information Systems Research, 20, 3 (2009), 329–354.

13. Copeland, J. Emergency Response: Unity of Effort through a Common Operational Picture. Carlisle Barracks, PA: Army War College, 2008.

14. Davison, R.; Martinsons, M.; and Kock, N. Principles of canonical action research. Information Systems Journal, 14, 1 (2004), 65–86.

15. Dynes, R. Community emergency planning: False assumptions and inappropriate analogies. International Journal of Mass Emergencies and Disasters, 12, 2 (1994), 141.

16. Erlanger, S.; Bilefsky, D.; and Chan, S. U.K. health service ignored warnings for months. New York Times, May 12, 2017.

17. Fruhling, A., and Vreede, G. Field experiences with eXtreme programming: Developing an emergency response system. Journal of Management Information Systems, 22, 4 (2006), 39–68.

18. Goldman, S.; Nagel, R.; Preiss, K.; and Dove, R. Iacocca Institute: 21st Century Manufacturing Enterprise Strategy: An Industry Led View. Bethlehem, PA: Iacocca Institute, 1991.

19. Harrald, J. Agility and discipline: Critical success factors for disaster response. Annals of the American Academy of Political and Social Science, 604, 1 (2006), 256–272.

20. Harris, C. IT downtime costs $26.5 billion in lost revenue. InformationWeek. 2010. Available at: http://www.informationweek.com/it-downtime-costs-$265-billion-in-lost-revenue/d/d-id/1097919?. (accessed on June 25, 2014)

21. Hasselle, D. Hours-long AT&T outage in Baton Rouge, Livingston areas undermines rescue efforts, but some say service returning. New Orleans Advocate. 2016. Available at: www.theadvocate.com/new_orleans/news/article_4fc33f0e-6255-11e6-9dd0-d3e354adba6e.html. (accessed on August 14, 2016)

22. Ireson, N. Local community situational awareness during an emergency. In Proceedings of the Third IEEE International Conference on Digital Ecosystems and Technologies. June 1–3, 2009, pp. 49–54.

23. Ivancevich, D.; Hermanson, D.; and Smith, L. The association of perceived disaster recovery plan strength with organizational characteristics. Journal of Information Systems, 12, 1 (1998), 31–43.

24. Kappelman, L.; Mclean, E.; Luftman, J.; and Johnson, V. Key issues of IT organizations and their leadership: The 2013 SIM IT trends study. MIS Quarterly Executive, 12, 4 (2013), 227–240.

25. Kendall, K.; Kendall, J.; and Lee, K. Understanding disaster recovery planning through a theatre metaphor: Rehearsing for a show that might never open. Communications of the Association for Information Systems, 16, 1 (2005), 1001–1012.

26. Kenefick, S. Agile development methodologies (ID: G00211991). Available at: https://www.gartner.com/doc/1694216/agile-development-methodologies. (accessed on June 1, 2014)


27. Lawler, C.; Szygenda, S.; and Thornton, M. Techniques for disaster tolerant information technology systems. In Systems Conference, 2007 1st Annual IEEE (2007), 1–6.

28. Lenk, A., and Tai, S. Cloud standby: Disaster recovery of distributed systems in the cloud, in service-oriented and cloud computing. In M. Villari, W. Zimmermann, and K. Lau (eds.), European Conference on Service-Oriented and Cloud Computing. Lecture Notes in Computer Science, vol. 8745. Berlin, Germany: Springer, 2014, pp. 32–46.

29. Malaurent, J., and Avison, D. Reconciling global and local needs: A canonical action research project to deal with workarounds. Information Systems Journal, 26, 3 (2015), 227–257.

30. Mårtensson, P., and Lee, A. Dialogical action research at omega corporation. MIS Quarterly, 28, 3 (2004), 507–536.

31. Mcentire, D., and Fuller, C. The need for a holistic theoretical approach: An examination from the El Niño disasters in Peru. Disaster Prevention and Management, 11, 2 (2002), 128–140.

32. Mchugh, O.; Conboy, K.; and Lang, M. Agile practices: The impact on trust in software project teams. IEEE Software, 29, 3 (2012), 71–76.

33. Mumford, E. Advice for an action researcher. Information Technology and People, 14, 1 (2001), 12–27.

34. National Incident Management System (NIMS) Integration Center. National Incident Management System. Washington DC: U.S. Department of Homeland Security, 2004.

35. Ngwenyama, O., and Nielsen, P.A. Competing values in software process improvement: An assumption analysis of CMM from an organizational culture perspective. IEEE Transactions on Engineering Management, 50, 1 (2003), 100–112.

36. Nunamaker, J. Jr.; Briggs, R.; Derrick, D.; and Schwabe, G. The last research mile: Achieving both rigor and relevance in information systems research. Journal of Management Information Systems, 32, 3 (2015), 10–47.

37. OpsCentre. Does the concept of agile recovery make sense? Disaster Recovery Journal, 2015. Available at: www.drj.com/industry/industry-hot-news/does-the-concept-of-agile-recovery-make-sense.html. (accessed on June 2, 2015)

38. Radigan, D. A brief introduction to Kanban. Atlassian, 2015. Available at: www.atlassian.com/agile/kanban. (accessed on June 3, 2015)

39. Reason, P. Pragmatist philosophy and action research. Action Research, 1, 1 (2003), 103–123.

40. Rose, K. A Guide to the Project Management Body of Knowledge (PMBOK® Guide), 5th ed. Project Management Journal, 44, 3 (2013), e1.

41. Rubin, H., and Rubin, I. Qualitative Interviewing: The Art of Hearing Data. Thousand Oaks: Sage, 1995.

42. Schultz, T. Reflections on investment in man. Journal of Political Economy, 58, 1 (1962), 1–8.

43. Shao, B. Optimal redundancy allocation for disaster recovery planning in the network economy. In H. Chen, R. Moore, D. Zeng, and J. Leavitt (eds.) Intelligence and Security Informatics. Lecture Notes in Computer Science, vol. 3073. Berlin, Germany: Springer, 2004, pp. 484–491.

44. Skipper, J.; Hall, D.; and Hanna, J. Top management support, external and internal organizational collaboration, and organizational flexibility in preparation for extreme events. Journal of Information System Security, 5, 1 (2009), 32–60.

45. Sugimori, Y.; Kusunoki, K.; Cho, F.; and Uchikawa, S. Toyota production system and Kanban system materialization of just-in-time and respect-for-human system. International Journal of Production Research, 15, 6 (1977), 553–564.

46. Susman, G., and Evered, R. An assessment of the scientific merits of action research. Administrative Science Quarterly, 23, 4 (1978), 582–603.

47. Sutherland, J., and Schwaber, K. Business object design and implementation workshop. In Proceedings of the OOPSLA ‘95. Austin, Texas, October 15–19, 1995, pp. 170–175.

48. Takeuchi, H., and Nonaka, I. The new product development game. Harvard Business Review, 64, 1 (1986), 137–146.

49. Trello. Trello Inc. Delaware, 2014. Available at: http://trello.com. (accessed on August 15, 2014)


50. Wang, K.; Li, L.; Yuan, F.; and Zhou, L. Disaster recovery system model with e-government case study. Natural Hazards Review, 7, 4 (2006), 145–149.

51. Webb, G.; Tierney, K.; and Dahlhamer, J. Predicting long-term business recovery from disaster: A comparison of the Loma Prieta earthquake and Hurricane Andrew. Global Environmental Change Part B: Environmental Hazards, 4, 2 (2002), 45–58.

52. Yan, H. Louisiana’s mammoth flooding: By the numbers. CNN, 2016. Available at: www.cnn.com/2016/08/16/us/louisiana-flooding-by-the-numbers/. (accessed on August 15, 2014)

53. Yin, R. Case Study Research: Design and Methods. Thousand Oaks, CA: Sage, 2008.


Copyright of Journal of Management Information Systems is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.
