Dissecting a Breach: The Process of Incident Response

On December 5, 2017, at 1 pm Eastern, Cisco Security Incident Response Service experts will present a webinar titled “Dissecting a Breach: An Incident Responder’s Perspective.” This webinar will describe how our team performs incident handling within the service and what you should expect during any incident engagement, regardless of who is doing the work. We will be mapping the incident response process described in this blog to some case studies. To attend our webinar, please register here. To learn more, read on.

Incident response is a complex process that involves the systematic analysis and containment of, and recovery from, a security breach. A breach, or incident, is a compromise of the confidentiality, integrity, or availability of an information system or the data it contains. The following two definitions help clarify what we mean when we talk about security breaches and incident response, or incident management:

“A “computer security incident” is a violation … of computer security policies, acceptable use policies, or standard security practices.” – NIST SP 800-61r2
“Incident management includes detecting and responding to computer security incidents as well as protecting critical data, assets, and systems to prevent incidents from happening.” – US-CERT

What is clear from these two widely accepted definitions is that a breach does not necessarily require malicious intent. A breach occurs any time a security policy is violated. Every breach requires some level of investigation before an organization can draw rational conclusions about the impact of an incident. Many organizations take a “set it and forget it” approach to their security tools. While security appliances are always increasing in effectiveness, they still require the watchful eye of an analyst and the keen skills of an incident responder when an adversary successfully breaches security using a novel technique or when an insider discloses confidential information.

Every mature field requires repeatable processes. On the Cisco Security Incident Response Service team, we use an incident response process based on NIST SP 800-61r2, as seen in the figure below. By mapping our process to an industry standard, we can adapt when that standard changes, ensuring that our activities continue to meet compliance requirements. The process also allows our team to clearly communicate where we are in an investigation, frame communications around the process, and identify the types of tools needed to accomplish the goals of each step.

Figure 1: Incident Response Process

Preparation summarizes all activities before an incident actually occurs. The first and most important aspect of the Preparation phase is writing an incident response plan. When you start to focus your mind on the implications of a cyber security incident through formally documenting a plan, other security practices may fall in line as well. An incident response plan should be more than a checkbox in the compliance audit. It needs to be a living document built on input from all stakeholders.

While writing an incident response plan, you may realize that the current security controls in place are not sufficient to execute the plan. Basing the plan on industry best practices can help to justify initiatives to cover the security gaps. For example, many organizations lack the capability to perform digital forensics, so they either build that capability in-house or use incident response retainers to ensure that the capability is available when it is needed. Logging may also be insufficient to investigate an incident, or endpoints may not be properly monitored. These gaps should become evident once the appropriate policies and plans are in place.

For the Cisco Security Incident Response Service, the Preparation phase involves filling in every available space of time between customer engagements with self-development and process improvement. To stay on top of the needs of our customers, we have to keep moving forward. Internal incident response teams should exercise the same process of continuous improvement.

Preparation is also fed directly by lessons learned during Post-Incident Activity. When a compromise is successful, some failure in security control must have occurred. There will also likely be failures in process which can be corrected for the next incident. No incident response activity goes perfectly smoothly. There are a lot of moving parts. Capturing those missteps, then practicing how incident response should be performed with table-top or red-blue team exercises is a necessary part of the Post-Incident Activity and Preparation phases.

Detection and Analysis are two distinct, but directly supporting, functions that are used to determine whether there is an incident and the scope of impact for the incident. Detection is typically performed by a Security Operations Center or by the user base. When a suspicious event has occurred, it is the responsibility of the Incident Response team to determine whether the event is worthy of formally declaring an incident.

Incident responders will then work with the information technology department and other stakeholders to determine a containment plan.

Containment is a key response action in any serious incident. Ignoring containment is a recipe for reinfection. Containment will nearly always impact the business. The risk of not containing compromised hosts is that they will continue to be a source of compromise while attempts at eradication and recovery fail. Few things are more frustrating during an incident response than spending hours remediating hosts, only to have them infected again with the same family of malware.

To properly contain a host, it must be completely cut off from communicating with enterprise network resources. For customers who do not have other means of containment, we often recommend configuring routers and switches with a “containment” virtual local area network (VLAN) and preventing routing between the “containment” VLAN and the normal production network. Making the appropriate containment configurations ahead of time allows for quick response times.
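The containment VLAN idea can be modeled in a few lines of Python. This is only a sketch of the policy, not device configuration: the VLAN ID `999`, the host names, and the helper names `contain` and `routing_allowed` are all hypothetical, and it covers routed (inter-VLAN) traffic only.

```python
from typing import Dict

CONTAINMENT_VLAN = 999  # hypothetical VLAN ID, reserved during Preparation


def contain(assignments: Dict[str, int], host: str) -> Dict[str, int]:
    """Return a copy of the host-to-VLAN map with `host` moved to containment."""
    updated = dict(assignments)
    updated[host] = CONTAINMENT_VLAN
    return updated


def routing_allowed(src_vlan: int, dst_vlan: int) -> bool:
    """Policy for routed traffic: nothing routes to or from containment."""
    return CONTAINMENT_VLAN not in (src_vlan, dst_vlan)


# Hypothetical hosts and production VLANs, for illustration only.
hosts = {"hr-laptop-07": 10, "file-server-01": 20}
hosts = contain(hosts, "hr-laptop-07")
print(routing_allowed(hosts["hr-laptop-07"], hosts["file-server-01"]))  # False
```

Because the VLAN and the routing policy exist before the incident, containing a host becomes a single reassignment instead of an emergency network change.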

To minimize the impact of containment on the business, be sure to establish proper backups of critical hosts during the Preparation phase. If a critical host needs to be contained, there may be an opportunity to restore a new host from a backup taken prior to the compromise.

Eradication and Recovery are two processes which can sometimes happen simultaneously. The goal of this step is to ensure that there is no infection left on the information systems.

For incidents that involve compromise through malicious logic, it is often recommended that the affected hosts be entirely reimaged. When an adversary gains system-level access to a host, which is often easy to do, they have the capability to modify the operating system itself. With few exceptions, hosts whose integrity has been violated in this way should be treated as untrusted. The quickest way to restore trust in such a system is to reimage it with a known clean image. Exceptions do exist, though. Some critical services simply cannot be handled through reimaging. In those cases, the incident responders can work with stakeholders to devise alternative eradication procedures.

Recovery is only complete when the enterprise is returned to a fully operational state. This certainly does not mean the same state as before the compromise though. As previously mentioned, a compromise occurs because of some exposed vulnerability. Throughout the incident response process, steps to prevent reinfection should be identified and vulnerabilities mitigated or monitored. This may result in changes to the configuration of the enterprise. The goal is to make those changes while minimizing the impact to business operations.

At any time from the Detection and Analysis phase through the Eradication and Recovery phase, newly discovered indicators of compromise must be handled through the proper processes. This can feel like taking a step backward. It can be difficult to explain to leadership that the incident response process is not as far along as had been previously reported. We should approach incident response with dispassionate, rational decision making. Sometimes that will mean giving bad news. If we are honest about the situation and present the news with solutions, then we can minimize the shock.
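One way to picture this "step backward" is as a small state machine over the phases. The sketch below is an interpretation, not a definitive model of the process: it treats Containment as its own phase, as this post does, and it assumes that a new indicator of compromise sends work back to Detection and Analysis. The names `Phase`, `ACTIVE_PHASES`, and `next_phase` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    PREPARATION = 1
    DETECTION_AND_ANALYSIS = 2
    CONTAINMENT = 3
    ERADICATION_AND_RECOVERY = 4
    POST_INCIDENT_ACTIVITY = 5


# Phases in which a newly discovered indicator sends work back to analysis
# (an assumption made for this sketch).
ACTIVE_PHASES = {
    Phase.DETECTION_AND_ANALYSIS,
    Phase.CONTAINMENT,
    Phase.ERADICATION_AND_RECOVERY,
}


def next_phase(current: Phase, new_indicator_found: bool) -> Phase:
    """Return the phase to work in next."""
    if new_indicator_found and current in ACTIVE_PHASES:
        return Phase.DETECTION_AND_ANALYSIS
    if current is Phase.POST_INCIDENT_ACTIVITY:
        return Phase.PREPARATION  # lessons learned feed back into Preparation
    return Phase(current.value + 1)


# A new indicator found during recovery moves the team back to analysis.
print(next_phase(Phase.ERADICATION_AND_RECOVERY, new_indicator_found=True).name)
```

Reporting status to leadership in terms of these phases makes a return to analysis an expected transition, not a surprise.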

Post-Incident Activity is simply all activity after the Eradication and Recovery phase is completed. Again, this phase can only be undertaken after the compromise is completely removed and all business services are restored. The most important piece of Post-Incident Activity is the lessons learned meeting. The intention of that meeting should be to identify what went right and what went wrong, and to do so quantitatively using metrics such as dwell time, time to detect, and time to recover.
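As a rough illustration, these metrics can be computed from a handful of timestamps recorded during the incident. Definitions vary between organizations, so the ones used here are assumptions: time to detect runs from first compromise to detection, dwell time runs from first compromise to eradication, and time to recover runs from detection to full restoration. The class name, field names, and timestamps are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IncidentTimeline:
    """Key timestamps for one incident (field names are illustrative)."""
    compromised_at: datetime  # earliest evidence of adversary activity
    detected_at: datetime     # first alert or user report
    eradicated_at: datetime   # adversary access confirmed removed
    recovered_at: datetime    # business services fully restored

    def time_to_detect(self) -> timedelta:
        return self.detected_at - self.compromised_at

    def dwell_time(self) -> timedelta:
        return self.eradicated_at - self.compromised_at

    def time_to_recover(self) -> timedelta:
        return self.recovered_at - self.detected_at


# Made-up timestamps, not taken from any real engagement.
example = IncidentTimeline(
    compromised_at=datetime(2017, 11, 1, 9, 0),
    detected_at=datetime(2017, 11, 8, 9, 0),
    eradicated_at=datetime(2017, 11, 10, 9, 0),
    recovered_at=datetime(2017, 11, 11, 21, 0),
)
print(example.time_to_detect())   # 7 days, 0:00:00
print(example.dwell_time())       # 9 days, 0:00:00
print(example.time_to_recover())  # 3 days, 12:00:00
```

Tracking the same numbers across incidents shows whether changes made during Preparation are actually shortening detection and recovery.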

The results of Post-Incident Activity will impact how Preparation is performed for the next compromise. This will likely result in changes to configurations and processes. It may also result in the recommendation to procure new security products to cover gaps identified. Remember, this entire process is a continuous cycle of improvement.

If you have any questions about incident response, feel free to reach out to me on Twitter @aubsec or send me an email at maaubert@cisco.com. Do not forget about our webinar on December 5. We will demonstrate this process in use through a couple of case studies.

Cisco Blogs

Authors

Matt Aubert

Senior Incident Response Analyst
