Disaster Recovery Planning
MSIS 4253/5253
Disasters Can Happen
Unprecedented data breaches at Equifax and Yahoo expose millions to identity theft
Oklahoma has become the most active seismic state in the US
Micro-Burst Storm Knocks Out Power and Closes Business
Ice and Freezing Rain Knock Out Power to 500,000 in Tulsa, Oklahoma
Pandemic Flu – H1N1
Terrorism – OKC Bombing, Mass Shootings & 9/11
Hurricanes Harvey & Irma devastate Texas and Puerto Rico
Cooling issue causes Microsoft’s Azure service to shut down in some areas
9/11, the tsunami, four hurricanes in Florida, Hurricanes Katrina and Rita, tornadoes in the Midwest, and wildfires in Southern California have all raised consciousness about the devastating impact of natural and human-caused events.
There are many potential disasters: floods, hurricanes, earthquakes, tornadoes, and wildfires, to name a few. But it doesn't take a widespread disaster to stall a small business. Something as small as an isolated power failure, a broken water pipe, or a computer or hard drive failure can close you down.
Small businesses are particularly vulnerable in disasters. Statistics show that, of the small businesses that close because of a disaster, at least one in four never reopens. The real figure is probably higher, because most statistics cover only the first two years, and some businesses hang on for two to five years before they give up.
Definition
A Disaster Recovery Plan (DRP) is a documented process or set of procedures to recover and protect a business's IT infrastructure in the event of a disaster. It is a comprehensive statement of consistent actions to be taken before, during, and after a disaster.
A subset of a Business Continuity Plan (BCP)
Primary focus is on IT
Getting started
According to Geoffrey H. Wold of the Disaster Recovery Journal, developing a Disaster Recovery Plan involves 10 steps:
Obtaining top management commitment
Establishing a disaster recovery committee
Performing a risk assessment and a business impact analysis
Establishing priorities for processing and business operations
Determining recovery strategies
Collecting data
Organizing and documenting a written plan
Developing testing criteria and plan procedures
Testing the plan
Obtaining approval
Wold, Geoffrey H. (1997). "Disaster Recovery Planning Process". Disaster Recovery Journal. Adapted from Volume 5 #1. Disaster Recovery World. Archived from the original on 15 August 2012. Retrieved 8 August 2012.
Obtaining Top Management Support
Discussion of why a DRP (or an update) is needed
Prepare to talk time and money
Annualized loss expectancies (ALEs) will go a long way toward convincing senior leadership
Indirect losses (lack of productivity)
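To make the ALE argument concrete, here is a minimal sketch of the standard calculation: ALE = SLE × ARO, where SLE (single loss expectancy) = asset value × exposure factor, and ARO is the annualized rate of occurrence. The function names and the dollar figures are hypothetical illustrations, not data from any real organization.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE: expected loss from one occurrence of the event."""
    if not 0.0 <= exposure_factor <= 1.0:
        raise ValueError("exposure_factor must be between 0 and 1")
    return asset_value * exposure_factor


def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE: SLE times the annualized rate of occurrence (events per year)."""
    return sle * aro


# Hypothetical numbers for illustration only.
sle = single_loss_expectancy(asset_value=200_000, exposure_factor=0.25)
ale = annualized_loss_expectancy(sle, aro=0.5)  # one event every two years
print(f"SLE = ${sle:,.0f}, ALE = ${ale:,.0f}")
```

Comparing an ALE like this against the annual cost of a recovery capability gives senior leadership a direct time-and-money figure to weigh. Indirect losses such as lost productivity can be folded into the asset value or estimated separately.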
Establishing a planning committee
Should include representatives from all key business areas (operations, HR, etc.)
Must have authority to speak for their department
Defines the scope of the plan
Performing a risk assessment and a business impact analysis
Risk analysis
You know how to do these (NIST SP 800-30)
Consider a broad range of possible disasters
Include the worst-case scenario
Business Impact Analysis
Differentiates critical (urgent) and non-critical (non-urgent) organization functions/activities
Some may be dictated by law
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
Maximum Tolerable Period of Disruption (MTPoD)
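As a rough illustration of how the three objectives above are used, the sketch below checks one outage against them. It assumes the usual reading: RPO bounds how much recent data (measured in time since the last good backup) may be lost, RTO bounds how long restoration may take, and MTPoD is the outer limit on total disruption. The function name, timestamps, and targets are hypothetical.

```python
from datetime import datetime, timedelta


def check_objectives(last_backup, outage_start, service_restored, rpo, rto, mtpod):
    """Compare one outage against RPO, RTO, and MTPoD targets."""
    data_loss_window = outage_start - last_backup   # work done since last backup
    downtime = service_restored - outage_start      # time the service was down
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
        "within_mtpod": downtime <= mtpod,
    }


# Hypothetical outage: nightly backup at 02:00, failure at 09:30, restored at 12:30.
result = check_objectives(
    last_backup=datetime(2024, 1, 10, 2, 0),
    outage_start=datetime(2024, 1, 10, 9, 30),
    service_restored=datetime(2024, 1, 10, 12, 30),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=4),
    mtpod=timedelta(hours=24),
)
for key, value in result.items():
    print(f"{key}: {value}")
```

In this made-up case the service comes back within the RTO, but 7.5 hours of data predate the failure, so a 4-hour RPO is missed. That would point toward more frequent backups or replication rather than faster restoration.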
Establishing priorities for processing and business operations
Limited resources for recovery
Essential, Important, Non-essential
Areas reviewed
functional operations
key personnel and their functions
information flow
processing systems used
services provided
existing documentation
historical records
department's policies and procedures.
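One way to turn the Essential / Important / Non-essential classification into a recovery order is sketched below. The tier names come from the slide; the `BusinessFunction` class, the example functions, and the choice to break ties by RTO are assumptions made for illustration.

```python
from dataclasses import dataclass

# Lower number = recovered first.
TIER_ORDER = {"Essential": 0, "Important": 1, "Non-essential": 2}


@dataclass
class BusinessFunction:
    name: str
    tier: str          # one of the keys in TIER_ORDER
    rto_hours: float   # recovery time objective from the BIA


def recovery_order(functions):
    """Sort by tier first, then by the tightest RTO within each tier."""
    return sorted(functions, key=lambda f: (TIER_ORDER[f.tier], f.rto_hours))


# Hypothetical functions and targets.
functions = [
    BusinessFunction("Marketing site", "Non-essential", 72),
    BusinessFunction("Payroll", "Important", 48),
    BusinessFunction("Order processing", "Essential", 4),
    BusinessFunction("Email", "Essential", 8),
]
ordered = [f.name for f in recovery_order(functions)]
print(ordered)
```

With limited recovery resources, a list like this makes explicit which functions are restored first and which can wait.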
Determining Recovery Strategies
During this phase, the most practical alternatives for processing in case of a disaster are researched and evaluated
Physical facilities, hardware, software, communications, data, databases, customer service
Hot sites, cold sites, warm sites, reciprocal agreements, consortium agreements, lease of equipment
Written agreements for specific recovery alternatives
Contract duration, termination conditions, system testing, cost, any special security procedures, procedure for the notification of system changes, hours of operation, the specific hardware and other equipment required for processing, personnel requirements, definition of the circumstances constituting an emergency, process to negotiate service extensions, guarantee of compatibility, availability and other contractual issues
Solution Design
The solution phase determines:
crisis management command structure
secondary work sites
telecommunication architecture between primary and secondary work sites
data replication methodology between primary and secondary work sites
applications and data required at the secondary work site
physical data requirements at the secondary work site.
Collecting data
Usually in the form of lists:
employee backup position listing
critical telephone numbers list
master call list
master vendor list
notification checklist
inventories (communications equipment, documentation, office equipment, forms, insurance policies, workgroup and data center computer hardware, microcomputer hardware and software, office supplies, off-site storage location equipment, telephones)
Organizing and documenting a written plan
Codified in writing; lays out specific responsibilities
Teams assigned responsibilities for specific areas
Assigned managers and backups
Facilities, computers, communications, logistics, user support, backup and recovery
Management team
Especially important for command and control
Assesses the disaster, activates the recovery plan, and contacts team managers
Developing testing criteria and plan procedures
Determining the feasibility and compatibility of backup facilities and procedures.
Identifying areas in the plan that need modification.
Providing training to the team managers and team members.
Demonstrating the ability of the organization to recover.
Providing motivation for maintaining and updating the disaster recovery plan
Testing the plan
“Dry run”
Tests
Checklist tests
Simulation tests
Parallel tests
Full interruption tests
Obtaining approval
Top management needs to approve and sign off
Plan becomes part of the Business Continuity Plan (BCP)
Should review the plan annually
Should provide guidance and authority for plan testing
Issues and Pitfalls
Lack of buy-in
Incomplete RTOs and RPOs
System myopia (VPN example, cell phone example)
Lack of security
Outdated plans
Changes in organization structure
Changes in technology
Changes in mission
Failure to test
Summary
DRP is a subset of the BCP
Ten steps that have to be accomplished
Prioritize disasters and objectives
Time to look at BCPs