When playing the high-speed switching game, timing is everything. Timing ‘sets the pace’: precision time stamps provide the visibility to establish the ‘where and when,’ enable correlation across a broad computing environment, and support compliance and digital forensics. Every element of the data center requires accurate timing at a level that leaves no room for error.
Speed is the other, more celebrated (if not obvious) requirement of the high-speed switching game, and it is measured in increments that required some new additions to my vocabulary.
When looking at the ways in which we measure speed and regulate time throughout the network, I was of course familiar with NTP, the Network Time Protocol. NTP provides millisecond timing…which, crazy enough…is WAY TOO SLOW for this high-speed market. Now, being from the South, I may blink a little slower than other people, but I read that the average blink of an eye takes 300 to 400 milliseconds! A millisecond is a thousandth of a second. And that is considered slow?
Turns out ‘microsecond’-level detail is our next consideration. A microsecond is one millionth (10⁻⁶ or 1/1,000,000) of a second. One microsecond is to one second as one second is to 11.57 days. To keep our blinking example alive: 350,000 microseconds. Still too slow.
Next unit of measure? The nanosecond. A nanosecond is one billionth (10⁻⁹) of a second. One nanosecond is to one second as one second is to 31.7 years. Measuring a blink (350,000,000 nanoseconds) is just silly at this point.
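The unit arithmetic above is easy to check. This short Python sketch (the helper names are our own, for illustration only) converts a 350 ms blink into smaller units and reproduces the two “one second is to …” comparisons, assuming a 365.25-day year.

```python
# Unit arithmetic for the "blink of an eye" example.
# Assumption: a blink takes ~350 ms (midpoint of the 300-400 ms range).

NS_PER_US = 1_000
NS_PER_MS = 1_000_000
NS_PER_S = 1_000_000_000
US_PER_S = 1_000_000

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: Julian year

BLINK_NS = 350 * NS_PER_MS


def blink_in_units(duration_ns: int) -> dict[str, int]:
    """Express a duration (whole nanoseconds) in ms, us and ns (truncating)."""
    return {
        "ms": duration_ns // NS_PER_MS,
        "us": duration_ns // NS_PER_US,
        "ns": duration_ns,
    }


def scale_analogies() -> tuple[float, float]:
    """1 us : 1 s == 1 s : 10**6 s (in days); 1 ns : 1 s == 1 s : 10**9 s (in years)."""
    days = US_PER_S / SECONDS_PER_DAY
    years = NS_PER_S / SECONDS_PER_YEAR
    return days, years
```

Calling `blink_in_units(BLINK_NS)` gives 350 ms, 350,000 µs and 350,000,000 ns, and `scale_analogies()` returns roughly 11.57 days and 31.69 years.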
I used to think higher speeds were attainable with more bandwidth. That may be why the idea of ‘low latency’ seemed so counter-intuitive to me. As you hopefully understand by now, there are limits to how fast data can move, and real gains in this area can only be achieved through efficiency; in other words, by eliminating latency as much as possible.
For Ethernet, speed really is about latency. Ethernet switch latency is defined as the time it takes for a switch to forward a packet from its ingress port to its egress port. The lower the latency, the faster the device can move packets toward their final destination. Also important within this ‘need for speed’ is avoiding packet loss. The magic is in the balancing act: speed and accuracy that challenge our understanding of traditional physics.
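Because switch latency is simply egress time minus ingress time, measuring it comes down to having a timestamp for the same packet at both ports, taken on a common clock. Here is a minimal Python sketch, assuming a capture tool has already recorded those timestamps (in nanoseconds) per packet ID; the function and field names are hypothetical, and the values in the usage note are made up for illustration. Packets seen at ingress that never appear at egress are counted as lost.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LatencyReport:
    min_ns: int
    avg_ns: float
    max_ns: int
    lost: int


def switch_latency(ingress: dict[int, int], egress: dict[int, int]) -> LatencyReport:
    """Per-packet latency = egress timestamp - ingress timestamp.

    Both dicts map a packet ID to a timestamp in nanoseconds, taken on a
    common clock. Packets with no egress timestamp are counted as lost.
    """
    latencies = [egress[pid] - t_in for pid, t_in in ingress.items() if pid in egress]
    lost = sum(1 for pid in ingress if pid not in egress)
    if not latencies:
        raise ValueError("no packet reached the egress port")
    return LatencyReport(min(latencies), mean(latencies), max(latencies), lost)
```

With made-up inputs `switch_latency({1: 1000, 2: 2000, 3: 3000}, {1: 1190, 2: 2210})`, the report shows a 190 ns minimum, 200 ns average, 210 ns maximum and one lost packet.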
Cisco’s latest entrant into the world of high-speed trading is the Nexus 3548: a slim, 48-port, line-rate switch with latency as low as 190 nanoseconds. It includes a Warp switch port analyzer (SPAN) feature that delivers stock market data to financial trading servers in as little as 50 nanoseconds, plus multiple other tweaks we uncover in this one-hour deep dive into the fastest switch on the market. It is the first member of the second-generation Nexus 3000 family. (We featured the first-generation Nexus 3000 series in April 2011.)
This is a great show -- it moves fast!
- Robb & Jimmy Ray with Keys to the Show
- Berna Devrim introduces us to Cisco Algo Boost and the Nexus 3548
- Will Ochandarena gives us a hardware show and tell
- Jacob Rapp walks us through a few live simulations
- Chih-Tsung Huang, ASIC designer, walks us through the custom silicon
Since we started shipping the Nexus 3548 with Algo Boost to our customers at the beginning of November, there has been growing interest in testing and verifying the switch’s latency in different traffic scenarios. What we have found so far is that while network engineers may be well experienced in testing a switch’s throughput, verifying latency can be challenging, especially when it is measured in the tens and low hundreds of nanoseconds!
I discussed this topic briefly when doing a hands-on demo for TechWise TV a short time ago.
The goal of this post is to give an overview of the most common latency tests, show how the Nexus 3548 performs in those tests, and detail some subtleties of low latency testing for multicast traffic. This post will also address some of the confusion that some vendors have tried to play up around two-source multicast tests.
The most common test case is to verify throughput and latency when sending unicast traffic. RFC 2544 provides a standard for this test case. The most stressful version of the RFC 2544 test uses 64-byte packets in a full mesh at 100 percent line rate. Full mesh means that every port sends traffic at the configured rate to every other port.
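To put the RFC 2544 worst case in numbers: every Ethernet frame also carries an 8-byte preamble and a 12-byte inter-frame gap on the wire, so a 64-byte frame occupies 84 bytes of link time. Here is a rough Python sketch of that arithmetic and of the flow list a full mesh implies, assuming 10 Gbps ports (the port speed is our assumption; the test description does not state it). Function names are illustrative only.

```python
from itertools import permutations

# Layer 1 overhead per Ethernet frame: 8-byte preamble/SFD + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 8 + 12


def frame_time_ns(frame_bytes: int, link_bps: int) -> float:
    """Time one frame occupies the link, including preamble and inter-frame gap."""
    return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8 * 1e9 / link_bps


def line_rate_fps(frame_bytes: int, link_bps: int) -> float:
    """Maximum frames per second on one port at 100 percent line rate."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def full_mesh_flows(num_ports: int) -> list[tuple[int, int]]:
    """Every port sends to every other port: N * (N - 1) directed (src, dst) flows."""
    return list(permutations(range(1, num_ports + 1), 2))
```

For 64-byte frames at 10 Gbps this works out to one frame every 67.2 ns, or about 14.88 million frames per second per port, and a 48-port full mesh has 48 × 47 = 2,256 directed flows.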
Figure 1 – Full Mesh traffic pattern
The following graph shows the Nexus 3548 latency results for the Layer 3 RFC 2544 full-mesh unicast test, with the Nexus 3548 operating in warp mode.
Figure 2 -- Layer 3 RFC 2544 full mesh unicast test
We can see that the Nexus 3548 consistently forwards packets of all sizes under 200 nanoseconds at 50% load, and less than 240 nanoseconds at 100% load.
On November 5th I posted part 2 of the Algo Boost series with a fantastic discussion around customer proof points on the Nexus 3548. In the third and final segment of the series, I interviewed Chih-Tsung Huang, Director of Engineering in the Server, Switching, & Virtualization Product Group, to shed some light on Cisco’s continued commitment to innovate with Algo Boost technology.
GD: What is the primary difference between existing Nexus 3000 switches and the new Nexus 3548? And how do we differentiate from the competition?
CH: As we all know, the current-generation Nexus 3000 uses merchant silicon, while the new Nexus 3548 uses a Cisco ASIC with full layer 2 bridging and layer 3 routing, designed and built from the ground up to optimize switch latency. Prior to the Nexus 3548 announcement, the industry’s best latency was greater than 500 nanoseconds.
One of the stated elements of our corporate culture is “No Technology Religion”. The underlying concept is that we have the freedom to choose the solution that best meets our customers’ needs and not get locked into ideological silos.
Cisco continues to invest in and drive innovation and standardization efforts by developing our own ASICs, because this allows us to deliver a complete, value-added solution to our customers. However, we do take advantage of merchant silicon in specific use cases where features and innovation are not needed.
GD: Does the introduction of Algo Boost indicate a complete shift away from merchant silicon?
CH: Absolutely not. Cisco has adopted, and will continue to adopt, a flexible silicon strategy, meaning we will buy off-the-shelf ASICs when they can immediately fill a market need, and we will continue to add value through silicon innovation by designing our own ASICs. The Nexus 3548 is an example of a highly integrated software, hardware, and ASIC solution that cannot be achieved with off-the-shelf components.
GD: It sounds like we are very much committed to developing our own ASICs. How many ASICs are used in Cisco solutions today, and how much do we invest in R&D?
CH: Cisco has developed hundreds of ASICs to perform various forwarding functions in switches and routers, including over 20 ASICs to power the Nexus portfolio alone. We have an annual R&D budget of $5.8 billion, which is greater than Juniper’s entire revenue and roughly equal to the R&D budgets of HP and Huawei combined.
GD: Algo Boost clearly addresses needs in the financial sector. Are there any other segments that will benefit from these groundbreaking features?
CH: Since mid-2011, the Nexus 3000 family has had a significant presence in massively scalable data centers. We believe these environments will see further benefits with the performance visibility tools we’re building into our portfolio, as well as the programmability and automation features in the Cisco ONE offering.
We also believe that there is an important role for custom silicon in the software-defined networking world. We feel that customers will continue to be willing to pay for advanced hardware innovation because of the value they derive from tightly integrating advanced software and hardware engineering. Customers derive the greatest value from emerging software approaches, such as SDN, when they effectively leverage the underlying infrastructure, and Cisco silicon innovation enables them to do exactly that.
Additionally, the 190-nanosecond ultra-low latency of the Nexus 3548 switch enables innovation not only in High Performance Trading fabrics but also in Massively Scalable Data Centers, Software Defined Networking, and beyond.
I’d like to thank Chih-Tsung for this valuable information. To see an actual Algo Boost-powered ASIC, view the TechWiseTV segment below.
At last… the much-anticipated FCS (First Customer Shipment) of the earth-shattering Cisco Nexus 3548 switch has arrived. When your business depends on nanoseconds, this switch delivers an unprecedented advantage: the lowest latency with a full feature set.
Customers should also bear in mind that, in conjunction with the shipping announcement, our ecosystem partnership is more robust than ever. Our Nexus Engineering Team released a whitepaper detailing how the partnership validated end-to-end latency using real-time NASDAQ market data from Universal E-Business Solutions, precise measurement tools from TS-Associates, feed handler processing from Enyx FPGA, and Network Test Access Points from Datacom Systems.
Read the whitepaper below in its entirety to see how Cisco continues to work with best of breed infrastructure vendors to deliver a complete solution in High Performance Trading.
On October 5th I posted part 1 of the Algo Boost series with a fantastic discussion around the latency innovations on the Nexus 3548. Today, we announced that these units are now shipping to customers, and the much-anticipated wait for this game-changing technology is over! This is perfect timing as I introduce part 2 of the series with Errol Roberts, Distinguished System Engineer for the top Data Center accounts, to bring a customer perspective to the ultra-low latency Nexus 3548 in a High Performance Trading fabric.
[GD] I know that you spend a lot of your time talking with customers. What are our Financial Services customers telling you about their environments and requirements?
[ER] When meeting with these customers, I like to ask a single question: “What value can an infrastructure company provide to high-performance trading workloads?” The key points relating to switching are captured by the following:
First, customers ask for a network solution and architecture that provides the fastest end-to-end functionality. Providing the “lowest latency possible” is one vector; another is a rich “feature set” that answers the different architecture and network requirements end to end. Naturally there is a need for speed, but the features must also be provided within the same device. For example, in colocated High Frequency Trading environments the lowest latency is key, but it’s not the only factor: customers also need routing protocols such as BGP, multicast with PIM Sparse Mode, and ultra-low-latency SPAN at line rate across multiple ports, which is achieved with a technology called Warp SPAN.
Next, “Handle microbursts”. Volatility is correlated. When you are running cross-asset-class, cross-liquidity-venue strategies, there is often short-lived congestion that increases latency. These volatile periods are often the most opportunity-rich.
Also, “Unique features”. They want features like Network Address Translation to meet their business needs, and they don’t want these features to add latency. In fact, you don’t want any of the L4 features or services applied in the network to add latency.
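A microburst is only visible if buffer occupancy is sampled finely enough. As an illustration, this Python sketch groups consecutive above-threshold queue-depth samples into bursts. The sample format, threshold, and names are assumptions made for the example; this is not a Nexus 3548 API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Burst:
    start_ns: int
    end_ns: int
    peak_bytes: int

    @property
    def duration_ns(self) -> int:
        return self.end_ns - self.start_ns


def find_microbursts(samples: list[tuple[int, int]], threshold_bytes: int) -> list[Burst]:
    """Group consecutive queue-depth samples above `threshold_bytes` into bursts.

    `samples` is a time-ordered list of (timestamp_ns, queue_depth_bytes).
    A burst starts at the first sample above the threshold and ends at the
    first sample back at or below it (or at the last sample if it never drops).
    """
    bursts: list[Burst] = []
    current: Optional[Burst] = None
    for ts, depth in samples:
        if depth > threshold_bytes:
            if current is None:
                current = Burst(ts, ts, depth)  # a new burst begins
            else:
                current.peak_bytes = max(current.peak_bytes, depth)
                current.end_ns = ts
        elif current is not None:
            current.end_ns = ts  # queue drained back below the threshold
            bursts.append(current)
            current = None
    if current is not None:
        bursts.append(current)  # burst still in progress at end of capture
    return bursts
```

Keeping the peak depth alongside the duration lets you distinguish a brief but deep burst from a long, shallow one.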
Next, “Flexibility and Programmability”. They want to control their traffic flow, mirror relevant traffic, have fine-grained flexibility, and react to events. Python scripting is a good example of this kind of automation. With a Python script, you can have the switch react to environmental changes, for example by running a sanity check when the device comes online, or by triggering an email when a burst at the buffer level exceeds, say, 10 nanoseconds.
In addition, customers want the network to facilitate “Precision Time”. You cannot control what you cannot measure. Without precision time, you invest in an infrastructure and just hope you get optimal performance. With the Precision Time Protocol (PTP), you can keep all of your servers and network elements tightly synchronized at the nanosecond level. You can even measure the accuracy of the tool through a one pulse per second (1PPS) output port. Also, the Nexus 3548 can timestamp traffic using IEEE 1588, which allows analyzers to replay events.
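As a rough illustration of the email-on-burst idea, here is an off-box Python sketch that builds an alert message when any recorded burst exceeds a limit (10 ns by default, matching the example above). It uses only the standard `email` package; the switch name and addresses are hypothetical, and it does not use any on-box NX-OS Python API.

```python
from email.message import EmailMessage
from typing import Optional

BURST_LIMIT_NS = 10  # example threshold from the text above


def build_burst_alert(switch: str, burst_durations_ns: list[int],
                      sender: str, recipient: str,
                      limit_ns: int = BURST_LIMIT_NS) -> Optional[EmailMessage]:
    """Return an alert email if any buffer burst exceeded `limit_ns`, else None."""
    offenders = [d for d in burst_durations_ns if d > limit_ns]
    if not offenders:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"[{switch}] {len(offenders)} buffer burst(s) over {limit_ns} ns"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"Longest burst: {max(offenders)} ns\n"
        f"All offending bursts (ns): {', '.join(map(str, offenders))}\n"
    )
    return msg
```

Returning the message instead of sending it keeps the check easy to test; to actually deliver it, `smtplib.SMTP(host).send_message(msg)` can be used against a reachable mail relay.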
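PTP keeps clocks aligned by exchanging four timestamps between a master and a slave clock. This sketch shows the basic IEEE 1588 delay request-response arithmetic, assuming the network path delay is the same in both directions and ignoring correction fields; the class and field names are our own.

```python
from dataclasses import dataclass


@dataclass
class PtpExchange:
    t1: int  # master sends Sync (master clock, ns)
    t2: int  # slave receives Sync (slave clock, ns)
    t3: int  # slave sends Delay_Req (slave clock, ns)
    t4: int  # master receives Delay_Req (master clock, ns)


def offset_and_delay(x: PtpExchange) -> tuple[float, float]:
    """Return (offset, mean path delay) in ns; offset = slave clock - master clock.

    Assumes a symmetric path: the master-to-slave and slave-to-master
    delays are equal.
    """
    master_to_slave = x.t2 - x.t1
    slave_to_master = x.t4 - x.t3
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay
```

For example, if the slave clock runs 50 ns ahead and the one-way path delay is 100 ns, the timestamps (1000, 1150, 2000, 2050) yield an offset of 50 ns and a delay of 100 ns. Any asymmetry between the two directions shows up directly as offset error, which is one reason hardware timestamping close to the wire matters.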