TechWiseTV traveled from San Francisco to St. Louis to cover the reality of multi-site data center design.
I know this is certainly not the only thing…but it is one of the most challenging aspects we like to address with networking. Two hurdles that make this interesting are the desire to balance geographic redundancy and workload mobility.
We had a lot of fun (as we do with most shows), mostly because of the experts we got to work with. Our primary Cisco expert was Cloud Architect Wayne Ogozaly, who had written a blog series on a new Cisco Validated Design, Business Continuity and Workload Mobility for Private Cloud. We chatted at length with him at Cisco Live San Francisco and then decided that more of this story should be hands-on…and the best place to do that would be St. Louis. Home of many great things…but for us, home of the incredible Cisco partner World Wide Technology (WWT).
WWT turned out to have a great group of really smart people who were ready to bend over backwards to help us.
Joe Weber was our primary guy; you can see him on the show. Perhaps he had a lot of good dirt on his co-workers…or perhaps they were just legitimately good people. We were only there for a few days, crawling around their labs and learning about their ATC (Advanced Technology Center).
But even in that short time, we kept running into big name customers who were either touring the facilities, teaming with the different WWT engineers or just taking classes of some sort…(way beyond just Cisco of course).
Geographic redundancy in a truly stateful fashion has been next to impossible. Workload mobility, and our growing dependence on these resources, make answers in this area more urgent.
Why is this difficult? Maintaining or managing state means keeping track of a process over time. The web is intrinsically stateless, since each request for a new page can be processed without any knowledge of previous page requests. This greatly simplifies much of the design, while at the same time being one of the chief drawbacks of HTTP.
Maintaining state is not only extremely useful, it is a fundamental requirement for building a modern data center. The designs that Wayne, Joe, and the numerous people they represent walk through in this episode show that it is now possible: truly stateful business continuity and workload mobility across multi-site designs, with minimal disruption to applications and users.
This show won’t cover everything in the design, of course…you need to go get that yourself if it sounds interesting. Look for the VMDC Business Continuity and Workload Mobility solution.
The unique distinction tackled in this design guide centers on the availability of applications, with transparency through all layers. Past designs have offered Layer 2 and Layer 3 workload mobility that falls apart when confronted with the reality of moving live workloads.
Think about it: no sane operator would attempt a hot move in the middle of a business day. You were guaranteed some data loss thanks to stateful firewalls and their timers, load balancers, analog conversion, and the use of multiple vendor hypervisors found in any normal design. But this is what we need.
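To make the firewall part of that problem concrete, here is a minimal Python sketch (not taken from the design guide) of why a hot move breaks established sessions. The StatefulFirewall class, its idle_timeout value, and the addresses are all hypothetical, and real firewalls track far more than this. The only point is that connection state lives in one site's table, so a workload that moves to a second site lands behind a firewall that has never seen its flows.

```python
from dataclasses import dataclass, field

# Hypothetical flow key: (src_ip, src_port, dst_ip, dst_port)
Flow = tuple


@dataclass
class StatefulFirewall:
    """Toy connection tracker: only flows it saw start may continue."""
    name: str
    idle_timeout: float = 30.0  # seconds; illustrative, not a vendor default
    table: dict = field(default_factory=dict)  # flow -> last-seen time

    def handle(self, flow: Flow, now: float, syn: bool = False) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        last_seen = self.table.get(flow)
        if last_seen is not None and now - last_seen > self.idle_timeout:
            # Idle timer expired: forget the flow.
            del self.table[flow]
            last_seen = None
        if last_seen is None and not syn:
            # Mid-session packet with no matching state: drop it.
            return False
        self.table[flow] = now
        return True


flow = ("10.0.0.5", 40000, "10.1.0.9", 443)
site_a = StatefulFirewall("site-a")
site_b = StatefulFirewall("site-b")

opened = site_a.handle(flow, now=0.0, syn=True)  # session starts at site A
continued = site_a.handle(flow, now=5.0)         # site A still knows the flow
after_move = site_b.handle(flow, now=6.0)        # workload moved; site B has no entry
after_idle = site_a.handle(flow, now=100.0)      # idle timer expired at site A
```

In this toy model the session survives only while it stays behind the firewall that saw it open and stays inside the idle timer. Both the move to site B and the long pause at site A end with a dropped packet.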
WWT built the reference design in their real-world testing environments and, as we learned, it was even a little bit different from how Cisco built it. Common to both, however, was the inclusion of the complete application environment: firewalls, load balancers, tenancy, network QoS, and WAN connections to users, all while spanning multi-site topologies.
Part 1 addressed the challenge of multi-site DC design based on distance and transferring critical apps back and forth. Many vendors claim it; however, they gloss over the L4-L7 services and network security that are critical to proper data flow and especially to an SDN-ready network.
Part 2 took us out to the WWT ATC to put the CVD to the test. This demo featured an Active-Active design with two data centers at a metro or regional distance of less than 200 km and 10 ms Round Trip Time. It’s a common design practice for Metro data centers to operate as a single virtual data center spanning a metro distance to support active-active scenarios…but how well does this work in actual practice?
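As a rough sanity check on those numbers, the sketch below estimates the propagation delay for a given fiber distance. It assumes light travels through fiber at roughly 200,000 km/s (about 5 microseconds per kilometer) and that the fiber path is exactly as long as the stated distance. Real paths are longer, and switching, queuing, and serialization add more. The function name and constant are illustrative and do not come from the design guide.

```python
FIBER_KM_PER_MS = 200.0  # assumed: ~200,000 km/s in fiber, i.e. 200 km per millisecond


def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds over distance_km of fiber."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    return 2 * distance_km / FIBER_KM_PER_MS


rtt = propagation_rtt_ms(200)  # pure propagation delay at the 200 km limit
headroom = 10 - rtt            # what is left of a 10 ms round-trip budget
```

Under these assumptions, propagation accounts for only about 2 ms of the 10 ms budget. The rest has to absorb equipment latency and longer real-world fiber routes.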
Part 3 was our postmortem. What did we really learn from all of this? How have customers reacted to this new way to meet RTO/RPO (Recovery Time Objective and Recovery Point Objective) targets?
Be sure to watch the full show. It is well worth your time.
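For anyone less familiar with the two terms, here is a small sketch of how they are commonly measured after an outage. The achieved recovery point is the gap between the last replicated data and the failure, and the achieved recovery time is the gap between the failure and the service coming back. The function name, arguments, timestamps, and targets are hypothetical and are not drawn from the CVD.

```python
from datetime import datetime, timedelta
from typing import Tuple


def measure_recovery(last_replicated: datetime, failed_at: datetime,
                     restored_at: datetime) -> Tuple[timedelta, timedelta]:
    """Return (achieved_rpo, achieved_rto) for one outage."""
    if not (last_replicated <= failed_at <= restored_at):
        raise ValueError("expected last_replicated <= failed_at <= restored_at")
    achieved_rpo = failed_at - last_replicated  # data written in this window is lost
    achieved_rto = restored_at - failed_at      # how long the service was down
    return achieved_rpo, achieved_rto


# Illustrative timestamps only.
rpo, rto = measure_recovery(
    last_replicated=datetime(2024, 1, 1, 12, 0, 0),
    failed_at=datetime(2024, 1, 1, 12, 0, 30),
    restored_at=datetime(2024, 1, 1, 12, 5, 30),
)
meets_targets = rpo <= timedelta(minutes=1) and rto <= timedelta(minutes=15)
```

An active-active design like the one in Part 2 aims to push both numbers toward zero, which is why it is a useful lens for the customer reactions discussed here.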