A More Resilient Future with Automated Ransomware Recovery

The constant evolution of the digital world has presented an abundance of opportunities, but it has also raised an equal number of security challenges, with ransomware among the most sinister. In response to this growing threat, our team of Principal Engineers at Cisco (including myself, under the guidance of our project sponsors from Cisco’s Security Business Group and Cisco IT) embarked on a journey towards automating ransomware recovery, not just for our own enterprise but for everyone.

The underlying problem we sought to address was how to automatically recover hosts from a ransomware attack. This required a careful analysis of assumptions and facts, as our initial assumptions had to be validated against reality. We started from the premise that all incidents require an eradication and recovery process, and that this response process could leverage automation or orchestration. Furthermore, we believed that ransomware could be mitigated by a response initiated from events or alerts. This meant that activities that would normally be considered administrative in nature, or “living off the land,” had to be considered when detecting adversarial activity.

We began by reviewing the prevalent sources of threat intelligence and analysis on ransomware activity, including our own Talos Intelligence, the CISA ransomware[1] guide, Splunk SURGe, our internal Cisco IT, and others. As our journey progressed, we identified new facts that shaped our approach to automated ransomware recovery. We found that effective responses needed to be close to the source, and that alerts often lacked a clear progression to the ransomware target(s).

A significant revelation was the limited window for response, typically less than 45 minutes[2], which drove us to think critically about the time-sensitive nature of ransomware recovery. Microsoft Windows is the predominant operating system targeted in ransomware operations. However, there have been Linux variants of ransomware too, so we needed a solution that could help in the most severe situations.

As we began exploring various conceptual solutions, we considered three main options:

API Responsive Recovery: Automating endpoint recovery through third-party integrations seemed promising, especially given how readily cloud capabilities can be applied. However, this solution might lead to the loss of locally stored data on user systems.

Selective Response: Selective response on critical systems stood out as a solution that allows for fast recovery and rollback to the last known good state for systems. However, database and transactional systems could pose challenges for recovery.

Operating System Centric: Administering the Windows Volume Shadow Copy Service (VSS) with protection drivers, a Windows-only feature, was an intriguing solution. Despite its limitations, such as local storage limits, it offered multiple benefits, including the ability to restore the system locally and effectively neutralize the attacker’s capabilities, which is why almost all ransomware attacks target this native Windows capability.

Our long-term recommendation centered on preventive measures, which include the development of a Secure Endpoint Transformation Roadmap. Incorporating endpoint integrations with memory or device protection drivers is vital for advanced protection. New recovery options for Windows systems, protection for native capabilities, and endpoint policy advancement with permit and deny lists mean that adversaries would have a harder time disabling a service that the system has access to.

Linux doesn’t have a “volume shadow service,” but by creating our own protection driver(s), we will be able to add a service like the Logical Volume Manager (LVM) to “snap” the image to a protected location in the future.

We also evaluated third-party solutions such as virtual systems protection from Cohesity, endpoint protection with Code42, and thin-client architectures like Citrix. Other innovative solutions, like those from Bitdefender and Trellix, keep a small copy of recovery data either in memory or on disk, providing additional layers of security.

Moving forward, we intend to thoroughly analyze the assumptions underlying our project. For instance, we need to decide on the systems we can protect effectively, including the most at risk (servers), the most volatile (customer devices), and the least impacted (cloud devices).

A critical part of our project was learning from real-world ransomware attack cases. We understand that while a recovery model focused on the endpoint provides significant value against commodity malware, targeted attacks require more prescriptive and preventative capabilities.

We are considering two main models for remediation:

Shutdown Everything: This model involves predicting suspicious behavior and preemptively backing up data, then restoring to that last known configuration. Predicting suspicious behavior is difficult, because you can’t rely on a single event or on fragments of multiple events. You really need to correlate an attack pattern and then preemptively back up and recover.

Just in Time: Here, we notice suspicious behavior and back up changes as they occur, as Bitdefender’s module does, giving the analyst a way to surgically restore objects within the operating system on the fly.
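To illustrate why a single event is not enough, here is a minimal Python sketch of correlating several events per host before triggering a preemptive backup. The stage names, the 45-minute window, and the three-stage threshold are illustrative assumptions, not values taken from any Cisco product, and a real attack chain typically spans multiple hosts rather than one.

```python
from datetime import datetime, timedelta


def hosts_to_snapshot(events, window=timedelta(minutes=45), min_stages=3):
    """Return hosts that show several distinct attack stages close together.

    `events` is an iterable of (timestamp, host, stage) tuples. The stage
    labels are hypothetical; any consistent labelling scheme would work.
    """
    by_host = {}
    for ts, host, stage in sorted(events):
        by_host.setdefault(host, []).append((ts, stage))

    flagged = set()
    for host, items in by_host.items():
        # Slide over each event and collect the distinct stages that follow
        # it within the time window.
        for i, (start, _) in enumerate(items):
            stages = {stage for ts, stage in items[i:] if ts - start <= window}
            if len(stages) >= min_stages:
                flagged.add(host)
                break
    return flagged


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    sample = [
        (t0, "srv1", "phishing"),
        (t0 + timedelta(minutes=10), "srv1", "credential_access"),
        (t0 + timedelta(minutes=25), "srv1", "shadow_copy_deletion"),
        (t0, "pc2", "phishing"),  # a single event: not enough to act on
    ]
    print(hosts_to_snapshot(sample))  # {'srv1'}
```

Only hosts with a correlated pattern are returned, so one noisy alert does not trigger a backup on its own.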

We had two final recommendations that have driven our innovation and efforts into this blog and future capabilities. We knew we needed something now that would help customers of all sizes. Our smaller customers are underserved because they lack the resources to create synchronized, effective recovery options for their environments.

We determined that the API Responsive Recovery option was less than adequate. It is readily available now and does provide a measure of protection, but it comes at a cost and with the potential to storm a backup solution with “snaps” or backup requests, along with the load of recovering all systems.

A traditional API implementation with a SIEM/SOAR solution would be chaotic to manage effectively and would lack the ability to provide enough context about the impacted systems. This approach is the most customizable, but it is mostly customer created. That leaves teams with lean IT resources on their own to ensure that the SOC and IT have adequate controls in place before recovery begins. While this capability was well within our grasp, it left us wanting more.

Next, we turned to Selective Response, which focuses on recovering only critical systems. During our interviews with our team of experts at Cisco, we found a common theme: recovery processes need to address the most important systems first (think Business Continuity Plan). Individual computers in a disaster recovery scenario weren’t always the first systems to be recovered. We needed to restore and recover the most critical systems that served the business. We also identified this as a critical task for all teams, including the smallest. Small teams are often forced to pay the ransom because they can’t trust the restoration processes based on individual recovery software, or because the data loss is too great.

This is where our partner Cohesity comes into the picture. Cohesity provides a comprehensive protection plan for virtual systems[3]. One of the best defensive capabilities against ransomware is a solid recovery process for those systems. Virtualizing systems has become the standard for most hybrid data centers because it allows efficient resource allocation and high availability, but it has lacked features for restoring combined application service systems. Cohesity, which works with the Cisco UCS chassis[4] for virtualization, provides a configurable recovery point objective (RPO) for systems assigned to a protection plan. Cohesity Helios coalesces the data recovery needs of separate application services by synchronizing the restoration of disparate system snapshots into a single recovery process. For example, you can protect a database with a one-hour RPO, an application server with a four-hour RPO, and a web server with a twelve-hour RPO in a single protection plan. This recovery capability allows you to restore the protected application service with minimal effort and maximal service restoration by restoring the images at the same recovery point while protecting them from adversarial tampering.
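The idea of restoring systems with different RPOs at the same recovery point can be sketched in a few lines of Python. This is only an illustration of the concept, assuming each system takes snapshots on a fixed schedule from a shared start time. The function names and data layout are hypothetical and do not represent how Cohesity Helios is implemented.

```python
from datetime import datetime, timedelta


def snapshot_times(start, end, rpo_hours):
    """List snapshot timestamps taken every `rpo_hours` from start to end."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(hours=rpo_hours)
    return times


def common_recovery_point(snapshots_by_system, not_after):
    """Return the newest timestamp shared by every system's snapshots.

    Only snapshots taken at or before `not_after` (for example, the
    suspected start of the attack) are considered. Returns None when the
    systems have no snapshot time in common.
    """
    shared = None
    for times in snapshots_by_system.values():
        eligible = {t for t in times if t <= not_after}
        shared = eligible if shared is None else shared & eligible
    return max(shared) if shared else None


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    end = datetime(2024, 1, 2, 0, 0)
    plan = {
        "database": snapshot_times(start, end, 1),     # one-hour RPO
        "app_server": snapshot_times(start, end, 4),   # four-hour RPO
        "web_server": snapshot_times(start, end, 12),  # twelve-hour RPO
    }
    attack_started = datetime(2024, 1, 1, 17, 30)
    print(common_recovery_point(plan, attack_started))  # 2024-01-01 12:00:00
```

In this example the database has newer snapshots than 12:00, but 12:00 is the latest point at which all three tiers can be restored together.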

We started our ransomware recovery partnership with Cohesity and SecureX, which provided us with the capability to recover after the backup solution found a ransomware event. Now, Cisco XDR steps this up a level, leveraging true detection, correlation, and integrated response capabilities. Cisco XDR and Cohesity can help you protect and recover from ransomware events rapidly, matching the speed of an attack.

The proven recovery capabilities of Cohesity are enhanced by allowing XDR to send a just-in-time request to snapshot a server. For example, in a Ryuk ransomware campaign, the adversary will infect the first target, then use lateral movement to infect another system with malware to establish both persistence and a command-and-control point. From the last infected system, the adversary then “kerberoasts” the domain controller or infects other sensitive systems. These events, from email, endpoint, network and identity protection products, are correlated into an attack chain within an XDR incident, which then signals XDR to automatically execute a built-in Automate workflow that requests a snapshot from Cohesity Helios for any asset in the incident. If a plan exists for an asset, Helios sends back the last known good snapshot of the protection plan and any data sensitivity information it knows about the protection plan, and immediately starts a new snapshot process. Using Cohesity’s DataHawk, customers are provided with a data classification, which is valuable for incident responders: knowing that an asset holds HIPAA, PCI, PII or any other defined sensitive information can change the scope of the investigation and provides a better contextual understanding of the asset.
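The request-and-reply flow described above can be sketched as follows. This is a hedged illustration only: `StubBackupService`, `ProtectionPlan`, and `respond_to_incident` are hypothetical names, and the stub stands in for the backup platform in memory. It does not call or model the real Cisco XDR or Cohesity Helios APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionPlan:
    last_good_snapshot: str  # identifier of the last known good snapshot
    sensitivity: list        # e.g. ["PCI", "PII"]; labels are illustrative


@dataclass
class StubBackupService:
    """In-memory stand-in for a backup platform (hypothetical, not Helios)."""
    plans: dict
    started: list = field(default_factory=list)

    def request_snapshot(self, asset):
        plan = self.plans.get(asset)
        if plan is None:
            return None  # no protection plan covers this asset
        self.started.append(asset)  # a new snapshot starts immediately
        return {
            "asset": asset,
            "last_good_snapshot": plan.last_good_snapshot,
            "sensitivity": plan.sensitivity,
        }


def respond_to_incident(assets, backup):
    """Request a snapshot for every asset in a correlated incident."""
    protected, unprotected = [], []
    for asset in assets:
        reply = backup.request_snapshot(asset)
        if reply is None:
            unprotected.append(asset)
        else:
            protected.append(reply)
    return protected, unprotected


if __name__ == "__main__":
    backup = StubBackupService(
        plans={"db01": ProtectionPlan("snap-0412", ["PCI", "PII"])}
    )
    protected, unprotected = respond_to_incident(["db01", "laptop7"], backup)
    print(protected)    # last good snapshot and sensitivity for db01
    print(unprotected)  # ['laptop7']
```

Returning the unprotected assets separately gives the responder an immediate view of which systems in the incident have no recovery path.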

The Cisco XDR response plan has an existing integration for creating a ServiceNow request for system recovery that includes the known backup information, the snapshot request and the sensitivity classification of the system. This will allow backup administrators to act quickly to restore the system back to full functioning capability. To avoid snapshot or recovery storms, Cohesity has built in a back-off capability that alerts everyone that an existing snapshot request has already been executed, with a back-off based on the last known runtime. This means that if the snapshot took two hours last time, the next snapshot request would have to wait two hours, or until the last request has finished, whichever occurs first.
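The back-off rule can be expressed as a small calculation. The sketch below is an interpretation of the rule as described in this post, with hypothetical function names. It is not Cohesity’s implementation.

```python
from datetime import datetime, timedelta


def next_allowed_request(last_request_at, last_runtime, finished_at=None):
    """Earliest time a new snapshot request would be accepted.

    The back-off lasts as long as the previous snapshot's runtime, unless
    the in-flight snapshot finishes sooner, whichever occurs first.
    """
    backoff_expires = last_request_at + last_runtime
    if finished_at is None:
        return backoff_expires
    return min(backoff_expires, finished_at)


def should_accept(now, last_request_at, last_runtime, finished_at=None):
    """True when a new snapshot request falls outside the back-off period."""
    return now >= next_allowed_request(last_request_at, last_runtime, finished_at)


if __name__ == "__main__":
    requested = datetime(2024, 1, 1, 10, 0)
    runtime = timedelta(hours=2)  # the previous snapshot took two hours

    # Still running one hour later: the request is held back.
    print(should_accept(datetime(2024, 1, 1, 11, 0), requested, runtime))  # False

    # The snapshot finished early at 10:45, so an 11:00 request is accepted.
    finished = datetime(2024, 1, 1, 10, 45)
    print(should_accept(datetime(2024, 1, 1, 11, 0), requested, runtime, finished))  # True
```

A correlated incident that touches the same asset several times therefore produces one snapshot rather than a storm of duplicate requests.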

We did not forget about our other option, Operating System Centric. This capability exists, but few systems can use it effectively, because attackers know about it and actively disable it. So, we need drivers to isolate the service and protect it from tampering and misuse. This transformational capability is on the roadmap for our Secure Endpoint module of Secure Client.

Ultimately, the development and implementation of automated ransomware recovery is a complex yet essential task. We have some additional work to complete before this integration can be completed and released as a feature of Cisco XDR. Existing customers of Cisco XDR (which is now generally available) will need a valid Cohesity license and API credentials. If you have Cisco XDR and you want to purchase Cohesity, please reach out to your Cisco or Cohesity sales representative.

As we progress on our journey, we remain committed to developing an effective solution to strengthen cybersecurity and resilience against ransomware threats, providing our customers with a secure and reliable digital environment.

Stay tuned for more updates as we continue to build our solution for the future!

Authors

Rob Gresham

Principal Engineer

Threat Detection and Response
