NCSAM Tip #13: Understanding Operational Security Metrics
Many people think that information and network security is just about technology and how reliable or sophisticated that technology is. Many also ask why, after spending tons of money on network and security gear, their network still gets hacked, information is lost, and business continuity is disrupted. Questions like these often run through their minds: “Am I not buying the right security products? Am I not configuring or deploying them correctly? Do I have the right staff to run my network?”
The lack of credible and relevant network security operational metrics can contribute to this situation. Understanding security operational metrics doesn’t require classes on Nobel Prize-winning theories or math so complicated that the process becomes impossible to execute. You have to understand what you are trying to protect and first establish a high-level process map through your own research. Use common knowledge and a broad survey to identify and validate metrics in each procedure or operational area. For instance, build a set of metrics for things like, but not limited to, the following:
- Incident Management
- Patch Management
- Device Compliance
- Security Device Monitoring
- Network and Internet Access
- Device Identity Management
- User Identity Management
- User Access
- Application Robustness
- User Security Awareness
These are just some examples; the list can be much longer. The goal is to define a set of sub-processes for each high-level process (or operational area), then build metrics for each sub-process. More importantly, assemble these metrics into a model that can be used to track operational improvement.
I will give some examples of metrics you can collect and examine for each of the processes or operational areas I mentioned.
Operational Metrics for Incident Management
An incident is a chain of events that may signal an attack in your network. It is, of course, very important to have a good methodology to simplify and expedite the detection, mitigation, reporting, and analysis of an incident. All this information can be captured in a case report with a case management tool and escalated to the relevant personnel. So, my question is, how effective are you or your organization in the detection, mitigation, reporting, and analysis of an incident in your network? You should at the very minimum ask the following questions and collect the corresponding metrics.
- How long does it take to identify an event?
- How long does it take to identify an incident?
- How long does it take to contain or mitigate an incident?
Let’s look at the following figure:
- To – is the time when an event occurs on the network
- Te – is the time when the event is detected on the network
- Ti – is the time when the event is classified as an incident
- Tc – is the time when the incident is contained on the network
Measure how long your organization takes for each step, then work out how to reduce that time and be more effective.
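As a minimal sketch of how these intervals could be computed, the Python snippet below takes the four timestamps defined above and returns the time spent in each step. The function name, the dictionary keys, and the sample timestamps are all hypothetical and only serve as an illustration; in practice the timestamps would come from your case management tool.

```python
from datetime import datetime, timedelta

def incident_intervals(t_o, t_e, t_i, t_c):
    """Return the durations between the four timestamps To, Te, Ti and Tc."""
    return {
        "time_to_detect": t_e - t_o,    # To -> Te: event occurs until it is detected
        "time_to_classify": t_i - t_e,  # Te -> Ti: detection until classified as an incident
        "time_to_contain": t_c - t_i,   # Ti -> Tc: classification until containment
        "total": t_c - t_o,             # To -> Tc: end to end
    }

# Hypothetical timestamps for a single incident (illustrative values only)
example = incident_intervals(
    t_o=datetime(2024, 10, 1, 9, 0),
    t_e=datetime(2024, 10, 1, 9, 45),
    t_i=datetime(2024, 10, 1, 11, 0),
    t_c=datetime(2024, 10, 1, 15, 30),
)

for name, duration in example.items():
    print(f"{name}: {duration}")
```

Tracking these values per incident, and then averaging them per month or quarter, is one way to see whether each step is actually getting faster.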
Operational Metrics for Patch Management
Everyone understands that patch management is a critical issue. We also understand that every organization must maintain a consistent environment that is patched or configured against known vulnerabilities. Unfortunately, good and practical solutions aren’t applied as often as everyone thinks. How do you know how effective your patch management is if you do not collect operational metrics that help you measure success or identify gaps in the process?
The following figure summarizes, at a high level, a typical patch management process.
First a vendor identifies and announces a vulnerability for a product or software. In the case of Cisco, we announce all of our vulnerabilities on the Cisco Security Intelligence Operations portal.
Note: The Cisco Product Security Incident Response Team (PSIRT) — which is my team — creates and maintains publications for security issues that affect Cisco products. For more information, review a description of the types of documents and the issues that they address.
If you are subscribed to receive notifications, you can quickly identify all the devices in your network that are affected by that vulnerability or advisory. It is possible that the vendor has identified workarounds that you can implement quickly. You need to identify those workarounds and understand how to apply them in your environment. While a workaround is being put in place, you obtain the patch or fix for that vulnerability. In most cases you do not have a lot of time to test the new software before you deploy that patch or fix in your network. However, once the patch or image is certified, you schedule a maintenance window to roll it out into the production environment.
You will always need to keep up with vulnerability announcements from your vendors. You can do this by subscribing to RSS feeds and CVE announcements, by monitoring mailing lists such as Bugtraq, or through any other mechanism that vendors use to notify their customers that a vulnerability exists. Additionally, you must understand which devices in your network are affected so you can easily implement any workaround.
The following are some of the questions you can ask yourself to start building the operational metrics that will help you in patch management:
- How long does it take you to become aware of the new vulnerability announcements from vendors?
- How long does it take to identify affected devices?
- How long does it take to implement workarounds (when available)?
- How long does it take for you to test and implement the fix/patch?
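The four questions above map naturally to timestamps recorded per advisory. The following Python sketch shows one possible way to turn them into metrics. The `PatchTimeline` class, its field names, and the sample dates are hypothetical, and I am assuming that each interval is measured from the end of the previous step, except the overall time to patch, which is measured from the vendor announcement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class PatchTimeline:
    """Hypothetical record of the key timestamps for one advisory."""
    announced: datetime            # vendor publishes the advisory
    aware: datetime                # your team learns about it
    devices_identified: datetime   # affected devices are known
    patch_deployed: datetime       # fix is tested and rolled out
    workaround_applied: Optional[datetime] = None  # not every advisory has one

    def metrics(self) -> dict:
        result = {
            "time_to_awareness": self.aware - self.announced,
            "time_to_identify_devices": self.devices_identified - self.aware,
            "time_to_patch": self.patch_deployed - self.announced,
        }
        # Only report the workaround metric when a workaround was applied
        if self.workaround_applied is not None:
            result["time_to_workaround"] = (
                self.workaround_applied - self.devices_identified
            )
        return result

# Illustrative values only
timeline = PatchTimeline(
    announced=datetime(2024, 10, 2, 16, 0),
    aware=datetime(2024, 10, 3, 8, 0),
    devices_identified=datetime(2024, 10, 3, 14, 0),
    workaround_applied=datetime(2024, 10, 4, 10, 0),
    patch_deployed=datetime(2024, 10, 16, 2, 0),
)
patch_metrics = timeline.metrics()
for name, duration in patch_metrics.items():
    print(f"{name}: {duration}")
```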
Operational Metrics for Device Compliance
Since we talked about patch management, let me share a few metrics you can collect for device compliance. The first question is:
- Do you have devices that are not using a “certified image/version”?
The biggest risk in running a “non-certified” software version is the exposure to software vulnerabilities. If a new security advisory is released with a highly critical vulnerability that may impact hundreds of different products, it will be difficult to identify the impacted devices in a timely fashion. Furthermore, software version control, meaning the deployment of consistent software versions on similar network devices, is a best practice. It improves the chance for validation and testing of the chosen software versions and greatly limits the number of software defects and interoperability issues found in the network. Limiting the number of software versions also reduces the risk of unexpected behavior with user interfaces, command or management output, upgrade behavior, and feature behavior. This makes the environment less complex and easier to support. Overall, software version control improves network availability and helps lower reactive support costs.
The following are a few more questions:
- What percent of devices are in compliance with certified software images?
- What percent of devices are in compliance with standard configuration templates?
I always recommend creating standard configurations for each device classification, such as routers, switches, firewalls, and any other security or network device. Each standard configuration should contain the global, media, and protocol configuration commands necessary to maintain network consistency, resiliency, and overall security. You can use several global configuration commands or templates in all devices that are alike and include things such as service commands, IP commands, TACACS commands, vty configuration, banners, SNMP configuration, and Network Time Protocol (NTP) configuration. Additionally, make sure to document device and interface “descriptors”. These “descriptors” include the purpose and location of the interface, other devices or locations connected to the interface, and circuit identifiers. This helps your support and security groups better understand the scope of problems related to an interface and allows faster resolution of problems, such as security incidents.
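To make the two percentage questions concrete, here is a small Python sketch that computes them from a device inventory. The inventory layout, the device names, the version strings, and the `matches_template` flag are all made up for illustration; I am assuming you already have some way to export the running version of each device and the result of a configuration template comparison.

```python
def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 for an empty inventory."""
    return 100.0 * part / whole if whole else 0.0

def compliance_metrics(devices, certified_versions):
    """Compute the two device compliance percentages for an inventory."""
    image_ok = sum(
        1 for d in devices
        if d["version"] in certified_versions.get(d["class"], set())
    )
    template_ok = sum(1 for d in devices if d["matches_template"])
    return {
        "certified_image_pct": percent(image_ok, len(devices)),
        "standard_template_pct": percent(template_ok, len(devices)),
    }

# Hypothetical certified versions per device classification
certified_versions = {
    "router": {"1.4.2"},
    "switch": {"2.0.1", "2.0.3"},
    "firewall": {"9.1.0"},
}

# Hypothetical inventory export
inventory = [
    {"name": "rtr-1", "class": "router", "version": "1.4.2", "matches_template": True},
    {"name": "rtr-2", "class": "router", "version": "1.3.9", "matches_template": False},
    {"name": "sw-1", "class": "switch", "version": "2.0.3", "matches_template": True},
    {"name": "fw-1", "class": "firewall", "version": "9.1.0", "matches_template": False},
]

report = compliance_metrics(inventory, certified_versions)
print(report)
```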
Operational Metrics for Monitoring
One of the first steps in the process of preparing your network and staff to successfully identify security threats is achieving complete network visibility. You cannot protect against or mitigate what you cannot see or detect. You can achieve this level of network visibility through existing features on network devices you already have and on devices whose potential you do not even realize. In addition, you should create strategic network diagrams to clearly illustrate your packet flows and where, within the network, you may enable security mechanisms to identify, classify, and mitigate the threat. Remember that network security is a constant war. When defending against the enemy, you must know your own territory and put defense mechanisms in place.
Security monitoring is similar to network monitoring, except it focuses on detecting changes in the network that may indicate a security incident. You must also understand and identify what level of monitoring is required based on the threat to a system or network. For instance, a firewall is considered a high-risk network device, which indicates that you should monitor it with high priority. This means that you should always check for things such as failed login attempts, unusual traffic, changes to the firewall, access granted to the firewall, and connections set up through the firewall.
Following this example, create a monitoring policy for each area identified in your risk analysis. A common recommendation is to monitor low-risk equipment weekly, medium-risk equipment daily, and high-risk equipment hourly. If you require more rapid detection, monitor on a shorter time frame. However, this all depends on your environment and your staff.
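The weekly, daily, and hourly guideline can be expressed as a simple policy table. The Python sketch below flags devices whose last check is older than the interval for their risk level. The device names, the risk assignments, and the timestamps are hypothetical, and the intervals are just the rule of thumb above; adjust them to your own environment.

```python
from datetime import datetime, timedelta

# Maximum time between checks per risk level (rule of thumb, not a standard)
MONITORING_INTERVAL = {
    "low": timedelta(weeks=1),
    "medium": timedelta(days=1),
    "high": timedelta(hours=1),
}

def overdue_devices(last_checked, risk_levels, now):
    """Return the sorted names of devices not checked within their interval."""
    return sorted(
        name
        for name, checked in last_checked.items()
        if now - checked > MONITORING_INTERVAL[risk_levels[name]]
    )

# Hypothetical devices and check times
risk_levels = {"fw-1": "high", "core-sw": "medium", "printer": "low"}
last_checked = {
    "fw-1": datetime(2024, 1, 8, 10, 30),
    "core-sw": datetime(2024, 1, 8, 0, 0),
    "printer": datetime(2023, 12, 30, 12, 0),
}
now = datetime(2024, 1, 8, 12, 0)

overdue = overdue_devices(last_checked, risk_levels, now)
print(overdue)
```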
These are some high-level questions that you should always ask when building operational metrics for monitoring and visibility:
- What percent of network and security devices are being successfully remotely monitored?
- What percent of the network is adequately documented?
- What percent of the data flows found on firewalls and other networking devices are unauthorized?
- How often do you audit and analyze your network traffic baselines?
Operational Metrics for Network and Internet Access
The following are a few questions that you can use to develop metrics for general Internet and network access.
- What percent of edge interfaces are protected by anti-spoofing mechanisms?
- How often do you audit your firewall rules?
- How often do you audit the configuration of network and security devices that are considered critical?
- What percent of the data flows found on the firewalls are unauthorized?
- What percent of devices are logging administrative logins and configuration changes?
- How often do you audit AAA systems for unauthorized users?
- What percent of unauthorized users have attempted access to network infrastructure devices?
Some of these questions tie back to the items I described earlier.
Operational Metrics for Device Identity Management
Device identity is the understanding of what a specific device on the network is and what its function and purpose are. The following are a few questions that you can use to develop metrics for device identity management.
- What percent of the devices on the network are unauthorized?
- How long does it take to locate a device from its IP address in real-time?
- How long does it take to locate a device from its IP address using historical logs?
I invite you to do a quick test within your organization. Ask any network engineer or security engineer those questions, especially how long it takes them to locate a device from its IP address in real time and by using historical logs. They will give you some answer (e.g., 5 minutes, 10 minutes, an hour). Ask them to prove it. You will be surprised by the actual results!
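To give an idea of what a lookup against historical logs can look like, here is a minimal Python sketch. I am assuming, purely for illustration, that you keep address assignment records (for example, exported from DHCP lease logs) with an IP address, a MAC address, a location, and a validity window. The `Lease` record, the `find_device` function, and all sample values are hypothetical; the addresses come from ranges reserved for documentation.

```python
from datetime import datetime
from typing import NamedTuple, Optional

class Lease(NamedTuple):
    """Hypothetical address assignment record from historical logs."""
    ip: str
    mac: str
    location: str
    start: datetime
    end: datetime

def find_device(leases, ip, when) -> Optional[Lease]:
    """Return the record that held the given IP address at the given time."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease
    return None

# Illustrative records: the same IP address assigned to two devices over time
leases = [
    Lease("192.0.2.10", "00:00:5e:00:53:01", "building-A/sw-3/port-12",
          datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 2, 8, 0)),
    Lease("192.0.2.10", "00:00:5e:00:53:02", "building-B/sw-1/port-4",
          datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 3, 8, 0)),
]

match = find_device(leases, "192.0.2.10", datetime(2024, 1, 2, 9, 30))
print(match.location if match else "not found")
```

The point is less the code than the data behind it: if records like these are not collected and retained, no tool will answer the question quickly.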
Operational Metrics for User Identity Management
You can ask similar questions to develop metrics for user identity management.
- What percent of the users on the network are unauthorized?
- How long does it take to identify a user from their IP address in real time?
- How long does it take to identify a user from their IP address using historical logs?
There’s no right or wrong answer for some of these metrics and questions. What is important is to understand these metrics and use them to develop better processes and procedures that reduce the time it takes someone to identify a device or a user.
Do you often ask yourself these questions and track similar metrics? Please share what you are currently doing to improve and completely understand operational security metrics.