It’s Not The Network
All too often we networkers spend our time defending the network not only from security threats but also from blame as the root cause (actual or perceived) of performance problems. The network is guilty until proven innocent. So how do we counter these arguments, put the issue to rest, and uphold the integrity of the network? Logs, logs, logs.
Logs are the evidence that supports your hypothesis. There are a few different types I’d like to talk through, along with the role each one plays in a tiered approach to troubleshooting.
SNMP – This is one of the first places I go when an issue is reported. Polled data and traps give a look at the current state of the network, and the stored history is a place to explore data patterns and trends. Most enterprises will have an NMS solution in place, and in my experience it is also a great place to learn the topology of the network(s) when joining a new company. There are many commercial and open source products available. They all organize and present the data slightly differently, so I suggest trying a few to find out which works best for you and your team.
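To make the "patterns and trends" idea concrete, here is a minimal sketch of the arithmetic an NMS typically does with two polls of an interface octet counter such as ifInOctets or ifHCInOctets. The function name and parameters are my own for illustration, and it assumes the counter wrapped at most once between polls.

```python
def utilization_percent(octets_prev, octets_now, interval_s, if_speed_bps,
                        counter_bits=64):
    """Average utilization (percent) between two polls of an octet counter.

    octets_prev / octets_now: counter values from two consecutive polls.
    interval_s: seconds between the polls.
    if_speed_bps: interface speed in bits per second.
    counter_bits: 32 for ifInOctets, 64 for ifHCInOctets.
    """
    if interval_s <= 0 or if_speed_bps <= 0:
        raise ValueError("interval and speed must be positive")
    modulus = 2 ** counter_bits
    # The modulo absorbs a single counter wrap between the two polls.
    delta_octets = (octets_now - octets_prev) % modulus
    # Octets to bits, then express as a percentage of what the link could carry.
    return delta_octets * 8 * 100 / (interval_s * if_speed_bps)
```

For example, 3,750,000,000 octets received over a 300 second polling interval on a 1 Gbps link works out to 10 percent average utilization. Keep in mind this is an average over the whole interval, so a short burst can hide inside a quiet-looking poll.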
Syslogs – Let’s take it a level deeper. Whether you’re using the CLI or an NMS to review them, syslogs can provide a lot of insight into what is happening on a network device, depending on the level of logging you have configured. The trick is being able to interpret the data and debug messages spewed out so you can verify your assumptions and follow the path to a logical explanation.
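As a small illustration of interpreting syslog data, the sketch below decodes the priority value that leads a raw syslog message on the wire (priority = facility * 8 + severity) and keeps only the more urgent lines. The function names and the sample lines are made up for the example, and it assumes you have the raw messages with the leading <PRI> still attached, which is not always the case once a collector has processed them.

```python
import re

# Severity names indexed by the numeric severity (0 is the most urgent).
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]

PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(line):
    """Return (facility, severity, severity_name), or None if no valid PRI."""
    match = PRI_RE.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        return None
    facility, severity = divmod(pri, 8)
    return facility, severity, SEVERITIES[severity]


def urgent_only(lines, max_severity=4):
    """Keep lines at max_severity or more urgent (numerically lower)."""
    kept = []
    for line in lines:
        decoded = decode_pri(line)
        if decoded is not None and decoded[1] <= max_severity:
            kept.append(line)
    return kept


# Hypothetical sample lines, not taken from a real device.
sample = ["<187>link went down", "<190>config saved", "not a syslog line"]
print(urgent_only(sample))
```

Here <187> decodes to facility 23 (local7) with severity 3 (error), so it is kept, while <190> is informational and is filtered out at the default threshold.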
Packet Captures – The truth is on the wire. A packet capture can be the defining piece of evidence, the smoking gun as they say. Span or mirror a port to a packet capture device, reproduce the problem, and you have a complete history of what traversed the wire at that point of interception. This information can then be filtered and analyzed with software such as Wireshark. Proficiency with packet captures and Wireshark is a great skill to have.
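Wireshark is the right tool for real analysis, but a small sketch shows how little magic is in a capture file. The code below reads a classic pcap file (24-byte file header, then a 16-byte header in front of each packet) and finds the longest silence between consecutive packets, which is often where the waiting happened. The function names are my own, and it assumes the classic microsecond-resolution pcap format rather than pcapng, so a pcapng capture would need to be converted first.

```python
import struct


def read_pcap_records(stream):
    """Return a list of (timestamp_seconds, packet_bytes) from a classic pcap."""
    file_header = stream.read(24)
    if len(file_header) < 24:
        raise ValueError("file too short to be a pcap")
    magic = file_header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":    # 0xa1b2c3d4 written little-endian
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":  # 0xa1b2c3d4 written big-endian
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    records = []
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            break  # end of file, or a truncated trailing record
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", record_header)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break  # packet data cut short
        records.append((ts_sec + ts_usec / 1_000_000, data))
    return records


def largest_gap(records):
    """Return (gap_seconds, index of the packet that ended the gap), or None."""
    best = None
    for i in range(1, len(records)):
        gap = records[i][0] - records[i - 1][0]
        if best is None or gap > best[0]:
            best = (gap, i)
    return best
```

Called as largest_gap(read_pcap_records(open(path, "rb"))), where path is whatever capture file you saved, this tells you which packet followed the longest pause. Whether that pause belongs to the client, the server, or the network is still for you to work out from the surrounding packets.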
Sometimes, though, the issue is with the network, and when it is, it’s great to be able to quickly point out the gremlin and present a resolution. Even when it’s not quick, having a trail of breadcrumbs to follow is invaluable in figuring out the root cause of an issue.