Incident Response Plan

Recovering from Incidents

Once the incident has been contained and system control has been regained, incident recovery can begin. As in the incident reaction phase, the first task is to notify the appropriate personnel. Almost simultaneously, the CSIRT must assess the full extent of the damage to determine what must be done to restore the systems. Each individual involved should begin recovery operations based on the appropriate incident recovery section of the IR plan.

The immediate determination of the scope of the breach of confidentiality, integrity, and availability of information and information assets is called incident damage assessment. Incident damage assessment can take days or weeks, depending on the extent of the damage. The damage can range from minor (a curious hacker snooping around) to severe (hundreds of computer systems infected by malware). System logs, intrusion detection logs, configuration logs, and other documents, as well as the documentation from the incident response, provide information on the type, scope, and extent of damage. Using this information, the CSIRT assesses the current state of the information and systems and compares it to a known state. Individuals who document the damage from actual incidents must be trained to collect and preserve evidence, in case the incident is part of a crime or results in a civil action.
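
Comparing the current state of a system to a known state can be sketched in a few lines of Python. The example below is a minimal illustration, not a forensic tool: it assumes the known state is a set of SHA-256 file digests recorded before the incident, and the function names snapshot and compare_to_baseline are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 digest of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_to_baseline(baseline, current):
    """Report files added, removed, or modified relative to the known state."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    modified = sorted(
        path for path in set(baseline) & set(current)
        if baseline[path] != current[path]
    )
    return {"added": added, "removed": removed, "modified": modified}
```

In practice the comparison would be run against a read-only copy or image of the affected system rather than the live system, so that the assessment itself does not alter potential evidence.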

Once the extent of the damage has been determined, the recovery process begins. According to noted security consultant and author Donald Pipkin, this process involves the following steps:*

Pipkin, D. Information Security: Protecting the Global Enterprise. Upper Saddle River, NJ: Prentice Hall PTR, 2000: 285.

  • Identify the vulnerabilities that allowed the incident to occur and spread. Resolve them.

  • Address the safeguards that failed to stop or limit the incident or were missing from the system in the first place. Install, replace, or upgrade them.

  • Evaluate monitoring capabilities (if present). Improve detection and reporting methods or install new monitoring capabilities.

  • Restore the data from backups. The IR team must understand the backup strategy used by the organization, restore the data contained in backups, and then use the appropriate recovery processes from incremental backups or database journals to recreate any data that was created or modified since the last backup.

  • Restore the services and processes in use. Compromised services and processes must be examined, cleaned, and then restored. If services or processes were interrupted in the course of regaining control of the systems, they need to be brought back online.

  • Continuously monitor the system. If an incident happened once, it could easily happen again. Hackers frequently boast of their exploits in chat rooms and dare their peers to match their efforts. If word gets out, others may be tempted to try the same or different attacks on your systems. It is therefore important to maintain vigilance during the entire IR process.

  • Restore the confidence of the members of the organization’s communities of interest. Management, following the recommendation from the CSIRT, may want to issue a short memorandum outlining the incident and assuring all that the incident was handled and the damage was controlled. If the incident was minor, say so. If the incident was major or severely damaged systems or data, reassure users that they can expect operations to return to normal as soon as possible. The objective of this communication is to prevent panic or confusion from causing additional disruption to the operations of the organization.
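
The data restoration step above (full backup, then incremental backups, then database journals) can be illustrated with a small Python sketch. It assumes a simplified, hypothetical data model: backups and journal entries are plain dictionaries, timestamps are integers, and the field names (taken_at, records, changed, deleted, at, op) are invented for illustration rather than taken from any real backup product.

```python
def restore(full_backup, incrementals, journal):
    """Rebuild data from a full backup, later incrementals, then journal entries."""
    data = dict(full_backup["records"])
    restored_through = full_backup["taken_at"]

    # Apply incremental backups in chronological order.
    for backup in sorted(incrementals, key=lambda b: b["taken_at"]):
        if backup["taken_at"] <= restored_through:
            continue  # already covered by an earlier restore step
        data.update(backup["changed"])
        for key in backup["deleted"]:
            data.pop(key, None)
        restored_through = backup["taken_at"]

    # Replay only the journal entries written after the last backup applied.
    for entry in sorted(journal, key=lambda e: e["at"]):
        if entry["at"] <= restored_through:
            continue
        if entry["op"] == "put":
            data[entry["key"]] = entry["value"]
        elif entry["op"] == "delete":
            data.pop(entry["key"], None)
    return data
```

The ordering matters: replaying journal entries that predate the last incremental backup could overwrite newer data with older values, which is why the sketch tracks the point through which data has already been restored.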

Before returning to its routine duties, the CSIRT must conduct an after-action review (AAR), a detailed examination and discussion of the events that occurred during an incident or disaster, from first detection to final recovery. The AAR is an opportunity for everyone who was involved in an incident or disaster to sit down and discuss what happened. In an AAR, a designated person acts as a moderator and allows everyone to share what happened from their own perspective while ensuring there is no blame or finger-pointing. All team members review their actions during the incident and identify areas where the IR plan worked, did not work, or should improve. Once completed, the AAR is written up and shared. All key players review their notes and the AAR and verify that the IR documentation is accurate and precise. The AAR allows the team to update the plan and brings the reaction team’s actions to a close. The AAR can serve as a training case for future staff.

According to McAfee, there are 10 common mistakes that an organization’s CSIRTs make in IR:

  1. Failure to appoint a clear chain of command with a specified individual in charge

  2. Failure to establish a central operations center

  3. Failure to “know your enemy,” as described in Chapters 1 and 6

  4. Failure to develop a comprehensive IR plan with containment strategies

  5. Failure to record IR activities at all phases, especially help desk tickets to detect incidents

  6. Failure to document the events as they occur in a timeline

  7. Failure to distinguish incident containment from incident remediation (as part of reaction)

  8. Failure to secure and monitor networks and network devices

  9. Failure to establish and manage system and network logging

  10. Failure to establish and support effective antivirus and antimalware solutions*

    McAfee. “Emergency Incident Response: 10 Common Mistakes of Incident Responders.” Accessed 7/12/15 from www.mcafee.com/us/resources/white-papers/foundstone/wp-10-common-mistakes-incident-responders.pdf.

NIST SP 800-61, Rev. 2 makes the following recommendations for handling incidents:

  • Acquire Tools and Resources That May Be of Value During Incident Handling—The team will be more efficient at handling incidents if various tools and resources are already available to them. Examples include contact lists, encryption software, network diagrams, backup devices, digital forensic software, and port lists.

  • Prevent Incidents from Occurring by Ensuring That Networks, Systems, and Applications Are Sufficiently Secure—Preventing incidents is beneficial to the organization and reduces the workload of the incident response team. Performing periodic risk assessments and reducing the identified risks to an acceptable level are effective in reducing the number of incidents. Awareness of security policies and procedures by users, IT staff, and management is also very important.

  • Identify Precursors and Indicators Through Alerts Generated by Several Types of Security Software—Intrusion detection and prevention systems, antivirus software, and file integrity checking software are valuable for detecting signs of incidents. Each type of software may detect incidents that the other types cannot, so the use of several types of computer security software is highly recommended. Third-party monitoring services can also be helpful.

  • Establish Mechanisms for Outside Parties to Report Incidents—Outside parties may want to report incidents to the organization—for example, they may believe that one of the organization’s users is attacking them. Organizations should publish a phone number and e-mail address that outside parties can use to report such incidents.

  • Require a Baseline Level of Logging and Auditing on All Systems, and a Higher Baseline Level on All Critical Systems—Logs from operating systems, services, and applications frequently provide value during incident analysis, particularly if auditing was enabled. The logs can provide information such as which accounts were accessed and what actions were performed.

  • Profile Networks and Systems—Profiling measures the characteristics of expected activity levels so that changes in patterns can be more easily identified. If the profiling process is automated, deviations from expected activity levels can be detected and reported to administrators quickly, leading to faster detection of incidents and operational issues.

  • Understand the Normal Behaviors of Networks, Systems, and Applications—Team members who understand normal behavior should be able to recognize abnormal behavior more easily. This knowledge can best be gained by reviewing log entries and security alerts; the handlers should become familiar with typical data and can investigate unusual entries to gain more knowledge.

  • Create a Log Retention Policy—Information about an incident may be recorded in several places. Creating and implementing a log retention policy that specifies how long log data should be maintained may be extremely helpful in analysis because older log entries may show reconnaissance activity or previous instances of similar attacks.

  • Perform Event Correlation—Evidence of an incident may be captured in several logs. Correlating events among multiple sources can be invaluable in collecting all the available information for an incident and validating whether the incident occurred.

  • Keep All Host Clocks Synchronized—If the devices that report events have inconsistent clock settings, event correlation will be more complicated. Clock discrepancies may also cause problems from an evidentiary standpoint.

  • Maintain and Use a Knowledge Base of Information—Handlers need to reference information quickly during incident analysis; a centralized knowledge base provides a consistent, maintainable source of information. The knowledge base should include general information such as data on precursors and indicators of previous incidents.

  • Start Recording All Information as Soon as the Team Suspects That an Incident Has Occurred—Every step taken, from the time the incident was detected to its final resolution, should be documented and time-stamped. Information of this nature can serve as evidence in a court of law if legal prosecution is pursued. Recording the steps performed can also lead to a more efficient, systematic, and less error-prone handling of the problem.

  • Safeguard Incident Data—This data often contains sensitive information about vulnerabilities, security breaches, and users who may have performed inappropriate actions. The team should ensure that access to incident data is properly restricted, both logically and physically.

  • Prioritize Handling of the Incidents Based on the Relevant Factors—Because of resource limitations, incidents should not be handled on a first-come, first-served basis. Instead, organizations should establish written guidelines that outline how quickly the team must respond to the incident and what actions should be performed, based on relevant factors such as the functional and information impact of the incident, and the likely recoverability from the incident. This saves time for the incident handlers and provides a justification to management and system owners for their actions. Organizations should also establish an escalation process for instances when the team does not respond to an incident within the designated time.

  • Include Provisions for Incident Reporting in the Organization’s Incident Response Policy—Organizations should specify which incidents must be reported, when they must be reported, and to whom. The parties most commonly notified are the CIO, head of information security, local information security officer, other incident response teams within the organization, and system owners.

  • Establish Strategies and Procedures for Containing Incidents—It is important to contain incidents quickly and effectively limit their business impact. Organizations should define acceptable risks in containing incidents and develop strategies and procedures accordingly. Containment strategies should vary based on the type of incident.

  • Follow Established Procedures for Evidence Gathering and Handling—The team should clearly document how all evidence has been preserved. Evidence should be accounted for at all times. The team should meet with legal staff and law enforcement agencies to discuss evidence handling, then develop procedures based on those discussions.

  • Capture Volatile Data from Systems as Evidence—This data includes lists of network connections, processes, login sessions, open files, network interface configurations, and the contents of memory. Running carefully chosen commands from trusted media can collect the necessary information without damaging the system’s evidence.

  • Obtain System Snapshots Through Full Forensic Disk Images, Not File System Backups—Disk images should be made to sanitized write-protectable or write-once media. This process is superior to a file system backup for investigatory and evidentiary purposes. Imaging is also valuable in that it is much safer to analyze an image than it is to perform analysis on the original system because the analysis may inadvertently alter the original.

  • Hold Lessons-Learned Meetings After Major Incidents—Lessons-learned meetings are extremely helpful in improving security measures and the incident handling process itself.*

    Cichonski, P., Millar, T., Grance, T. and Scarfone, K. “Special Publication 800-61, Rev 2: Computer Security Incident Handling Guide.” Accessed 7/12/15 from http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf.
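
The recommendations on event correlation and clock synchronization work together, as the following Python sketch suggests. It is an illustration under stated assumptions, not a method prescribed by NIST: events are hypothetical dictionaries with source, host, and time fields, clock_offsets holds the assumed number of seconds each source's clock runs ahead of a reference clock, and the 60-second window is arbitrary.

```python
from datetime import timedelta


def normalize(events, clock_offsets):
    """Shift every event onto a common clock and return the events in time order."""
    adjusted = []
    for event in events:
        offset = timedelta(seconds=clock_offsets.get(event["source"], 0))
        adjusted.append({**event, "time": event["time"] - offset})
    return sorted(adjusted, key=lambda e: e["time"])


def correlate(events, window_seconds=60):
    """Cluster time-ordered events per host; keep clusters seen by several sources."""
    window = timedelta(seconds=window_seconds)
    clusters = []
    open_cluster = {}  # host -> cluster currently being extended
    for event in events:
        cluster = open_cluster.get(event["host"])
        if cluster is not None and event["time"] - cluster[-1]["time"] <= window:
            cluster.append(event)
        else:
            cluster = [event]
            clusters.append(cluster)
            open_cluster[event["host"]] = cluster
    # Events confirmed by more than one log source are the interesting ones.
    return [c for c in clusters if len({e["source"] for e in c}) > 1]
```

If the offsets are unknown or wrong, related events can fall outside the window and be missed, which is one reason the guide recommends keeping host clocks synchronized in the first place.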

Note that most of these recommendations were covered earlier in this section. CSIRT members should be very familiar with these tools and techniques prior to an incident. Trying to use unfamiliar procedures in the middle of an incident could prove very costly to the organization and cause more harm than good.
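
The recommendation to prioritize incidents by relevant factors, and to escalate when the team misses its response window, can be sketched as follows. The 0 to 3 rating scales, the additive score, and the response targets in minutes are all hypothetical values chosen for illustration; an organization would substitute the factors and thresholds from its own written guidelines.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    functional_impact: int   # 0 = none ... 3 = high (hypothetical scale)
    information_impact: int  # 0 = none ... 3 = severe (hypothetical scale)
    recovery_effort: int     # 0 = routine ... 3 = not recoverable (hypothetical scale)
    minutes_unanswered: int = 0


def score(incident):
    """Combine the three factors into one number; higher means more urgent."""
    return (incident.functional_impact
            + incident.information_impact
            + incident.recovery_effort)


def response_target_minutes(incident):
    """Look up how quickly the team must respond (illustrative thresholds)."""
    total = score(incident)
    if total >= 6:
        return 30
    if total >= 3:
        return 240
    return 1440


def triage(incidents):
    """Order incidents by urgency instead of first-come, first-served."""
    return sorted(incidents, key=score, reverse=True)


def needs_escalation(incident):
    """True when the incident has waited longer than its response target."""
    return incident.minutes_unanswered > response_target_minutes(incident)
```

Writing the rules down as code or as a table has the same benefit the guide describes: handlers do not have to improvise priorities during an incident, and management can see why one incident was handled before another.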

For more information on incident handling, read the Incident Handlers Handbook, which is available from the SANS reading room at www.sans.org/reading-room/whitepapers/incident/incident-handlers-handbook-33901, or search for other incident handling papers at www.sans.org/reading-room/whitepapers/incident/.

Organizational Philosophy on Incident and Disaster Handling

Eventually the organization will encounter incidents and disasters that stem from an intentional attack on its information assets by an individual or group, as opposed to one from an unintentional source, such as a service outage, employee mistake, or natural disaster.

At that point, the organization must choose one of two philosophies that will affect its approach to IR and DR as well as subsequent involvement of digital forensics and law enforcement, as you will learn later in this chapter:

  • Protect and forget: This approach, also known as “patch and proceed,” focuses on the defense of data and the systems that house, use, and transmit it, and on preventing reoccurrence, rather than on the attacker’s identification and prosecution. An investigation that takes this approach focuses on the detection and analysis of events to determine how they happened and to prevent reoccurrence. Once the current event is over, the questions of who caused it and why are almost immaterial.

  • Apprehend and prosecute: This approach, also known as “pursue and prosecute,” focuses on the identification and apprehension of responsible individuals, in addition to the defense of information assets and the prevention of reoccurrence. It pays additional attention to the collection and preservation of potential evidentiary material that might support administrative or criminal prosecution, and it requires much more attention to detail to prevent contamination of evidence that might hinder prosecution.

An organization might find it impossible to retain enough data to successfully handle even administrative penalties, but it should certainly adopt the apprehend and prosecute approach if it wants to pursue formal administrative penalties, especially if the employee is likely to challenge these penalties. The use of digital forensics to aid in IR and DR when dealing with intentional attacks is discussed later in this chapter, along with guidance on when or whether to involve law enforcement agencies.

View Point: The Causes of Incidents and Disasters

Karen Scarfone, Principal Consultant, Scarfone Cybersecurity

The term incident has somewhat different meanings in the contexts of incident response and disaster recovery. People in the incident response community generally think of an “incident” as being caused by a malicious attack and a “disaster” as being caused by natural causes (fire, flood, earthquake, etc.). Meanwhile, people in the disaster recovery community tend to use the term incident in a cause-free manner, with the cause of the incident or disaster generally being irrelevant and the difference between the two being based solely on the scope of the event’s impact. An incident is a milder event and a disaster is a more serious event.

The result of this is that people who are deeply embedded in the incident response community often think of incident response as being largely unrelated to disaster recovery, because they think of a disaster as being caused by a natural disaster, not an attack. Incident responders also often think of operational problems, such as major service failures, as being neither incidents nor disasters. Meanwhile, people who are deeply embedded in the disaster recovery community see incident response and disaster recovery as being much more similar and covering a much more comprehensive range of problems.

So where does the truth lie? Well, it depends on the organization. Some organizations take a more integrated approach to business continuity and have their incident response, disaster recovery, and other business continuity components closely integrated with one another so that they work together fairly seamlessly. Other organizations treat these business continuity components as more discrete elements and focus on making each element strong rather than establishing strong commonalities and linkages among the components. There are pluses and minuses to each of these approaches.

Personally, I find that the most important thing is to avoid turf wars between the business continuity component teams. There is nothing more frustrating than delaying the response to an incident or disaster because people disagree on its cause. The security folks say it’s an operational problem, the operational folks say it’s a disaster, and the disaster folks say it’s a security incident. So, like a hot potato, the event gets passed from team to team while people argue about its cause. In reality, for some problems, the cause is not immediately apparent.

What’s really important to any organization is that each adverse event, regardless of the cause, be assessed and prioritized as quickly as possible. That means that teams need to be willing to step up and address adverse events whether or not the event is clearly their responsibility. The impact of the incident is generally the same, no matter what the cause is. If later information shows that there’s a particular cause that better fits a different team, the handling of the event can be transferred to the other team. Teams should be prepared to transfer events to other teams and to receive transferred events from other teams at any time.

The “CSI Computer Crime and Security Survey, 2010/2011” describes how organizations have responded to intrusions. Although the survey is a bit dated (and is no longer conducted), it still provides a valuable look into how the average organization prepares for and recovers from attack-based incidents (intrusions):

  • 62.3 percent—Patched any vulnerable software

  • 49.3 percent—Patched or remediated other vulnerable hardware or infrastructure

  • 48.6 percent—Installed additional computer security software

  • 44.2 percent—Conducted an internal forensic investigation

  • 42.0 percent—Provided additional security awareness training to their end users

  • 40.6 percent—Changed their organization’s security policies

  • 32.6 percent—Changed or replaced software or systems

  • 27.5 percent—Reported the intrusion(s) to a law enforcement agency

  • 26.8 percent—Installed additional computer security hardware

  • 26.1 percent—Reported intrusion(s) to their legal counsel

  • 25.4 percent—Did not report the intrusion(s) to anyone outside the organization

  • 23.9 percent—Attempted to identify perpetrator using their own resources

  • 18.1 percent—Reported the intrusion(s) to individuals whose personal data was breached

  • 15.9 percent—Provided new security services to users/customers

  • 14.5 percent—Reported the intrusion(s) to business partners or contractors

  • 13.8 percent—Contracted a third-party forensic investigator

  • 3.6 percent—Reported the intrusion(s) to public media*

    “CSI Computer Crime and Security Survey, 2010/2011.” Computer Security Institute. Accessed 7/12/15 from http://reports.informationweek.com/abstract/21/7377/Security/research-2010-2011-csi-survey.html.

What is shocking is how few organizations notify individuals that their personal data has been breached. Should an unreported breach ever be exposed to the public, those organizations could find themselves confronted with criminal charges or corporate negligence suits. Laws like the Sarbanes-Oxley Act of 2002 specifically implement personal ethical liability requirements for organizational management. Failure to report loss of personal data can run directly afoul of these laws.
