For many years I worked in various roles within Cisco’s Enterprise Solutions Engineering (ESE) and Service Provider Solutions Engineering (SPSE) groups. Our job was to take all the different pieces and parts of Cisco equipment (and partner offerings) and mix them together into best practices. Like a chef making a multi-course meal or a great stew, it required all sorts of ingredients to get it just right. Some ingredients were highly visible (Routing, QoS, Security), while others were fringe elements but extremely valuable (GRE Tunnels, ERSPAN, etc.). At the end of the day, we were extremely lucky to have such a broad mix of features and tools to choose from to create our best practices.
Oftentimes we’d present these solutions to customers or partners, and they would question why so many features or tools were included. They’d question the cost of this or the need to learn that. These were all fair questions, as the job of managing Data Center (or network-wide) resources is challenging enough without adding more layers or more stuff to learn. We’d do the best we could to explain scenarios where this feature or that tool would come in handy. Sometimes we were successful; other times we’d get the “whatever, vendor guy, I’ve heard that story before.”
I had sort of forgotten about some of those challenges as I began working on new projects. But over the last several months, the discussion seems to have bubbled to the top of the pot again. Customers are starting to deploy these converged solutions and have run into scenarios where the need for a feature or a tool seems important again. Sometimes the realization is centered on the idea that you can deploy something faster because you have an integrated tool (note: this type of free “tool” comes from several vendors). Sometimes the realization happens when a problem occurs and two groups have to come together to resolve the issue, except now that issue “virtually” crosses over the boundaries that previously existed (e.g., switching within an ESX host). A good discussion of this took place recently on the Virtumania podcast (see Episode 14).
I’m often amazed at how difficult it is to get people to take the free tools, but I’d highly recommend grabbing them and at least putting them in your toolbox, as they might come in handy someday and you’ll already have them nearby. The non-free tools/features are a different story. Those typically require a justification. So let’s take a simple example, the Nexus 1000v. List price is $695 per CPU. Let’s say you have a 4-CPU server, so the cost is $2780 (or let’s call it $1700 with conservative discounts). How do we justify $1700 vs. the vSwitch that comes bundled ($0) from VMware in ESX? The first thing that happens is the ESX admin goes out and buys an ESX book to learn about managing the virtual networks ($40). They deploy the vSwitch and things are fine for a while. Until that first issue where communications between VMs on the same ESX host are having problems. Things are slow, or data isn’t updating properly. The ESX admin starts troubleshooting. After 1-2 hours, they call the people on the network team for some assistance. Assuming the network team isn’t upset that they don’t have visibility between the VMs, they attempt to help. The network people ask for details on past trends, as they normally would get from their network management tools. Don’t have that? Uh oh!! After reconfiguring the vSwitch to allow a network sniffer to see traffic, the issue eventually gets resolved. Let’s say that took another 1-2 hours, including the time to put the vSwitch config back. So now we’re at 4 hours and have consumed the resources of 2-4 people. Let’s say those people cost $50/hr (probably low), so now we’re talking about 8-16 man-hours, or $400-800 worth of troubleshooting time. At this point we’re at half the cost of the Nexus 1000v for one incident (ignoring the value of all lost user productivity), and the network team is still slightly upset that they don’t have “network-level visibility” in the ESX hosts.
Assuming this happens once a month, you have an ROI of 2-3 months for the 1000v, not to mention peaceful co-existence between the Server/Virtualization team and the Network team.
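If you want to run that back-of-the-envelope math yourself, here’s a quick sketch. All the dollar figures are the illustrative assumptions from the scenario above (not quoted pricing), and the break-even assumes one such incident per month:

```python
# Rough ROI math for the Nexus 1000v scenario.
# Figures are the post's illustrative assumptions, not actual quotes.

list_cost = 695 * 4        # $2,780 list price for a 4-CPU server
net_cost = 1700            # assumed cost after conservative discounts

rate = 50                  # $/person-hour (probably low)
hours = 4                  # wall-clock troubleshooting time per incident

# Cost of one incident, for the low (2 people) and high (4 people) estimates
incident_cost = {people: hours * people * rate for people in (2, 4)}
print(incident_cost)       # {2: 400, 4: 800}

# Months to break even, assuming one incident per month
breakeven = {people: net_cost / cost for people, cost in incident_cost.items()}
for people, months in breakeven.items():
    print(f"{people} people involved: break-even in about {months:.1f} months")
```

With four people pulled into each incident, the discounted cost pays for itself in roughly two months; even the conservative two-person case breaks even within half a year.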
That was a simple (but common) scenario, but let’s take something more complicated. I’ve recently been working with a customer that wants to deploy multiple sets of Active/Active data center pairs around the world. As I started exploring the Enhanced Business Continuity technologies required to do this (OTV, long-distance vMotion, load-balancers, storage caching or Storage vMotion, etc.), it once again became an exercise in building a solution that spanned many technologies. But as I started to dig into just one part of it, Cisco OTV, I began to realize that it leveraged several existing technologies that we’ve used at Cisco for many years (Multicast, GRE, etc.). Once again, features that don’t garner a lot of headlines but are very important in making these next-generation Data Center use cases possible. It’s good to have them in the toolbox (or in this case, embedded in NX-OS).
It’s not easy being a Data Center administrator (or IT professional) these days, because the pace of change is ever increasing and the lines between functional areas are blurring very quickly. It’s a lot to keep up with, let alone master. And it doesn’t matter if you’re trying to do this internally (e.g., Private Cloud) or externally (e.g., Public Cloud), somebody still needs to make it work.
We all strive to use the KISS principle, but sometimes it helps to have some additional tools in the toolbox to help offer creative solutions or solve challenging problems. The tools are quickly adapting to the ever-changing environment, and sometimes those “old” features continue to be valuable. My suggestion is to use them as needed to help you deliver creative solutions to your end-users. And I won’t take offense if you tell this old vendor guy that you’ve heard the story before.