NetApp’s newest storage operating system, clustered Data ONTAP (cDOT), leverages a backend of Cisco Nexus switches for its cluster interconnect network.
When configuring the switch/cluster ports for use with cDOT, the best practice is to turn flow control off as per TR-4182. In fact, that happens to be the recommendation for normal data ports as well. Why is that? Before we get into that, let’s cover the basics…
What is flow control?
Flow control is a mechanism used to help manage the rate of data transfer between two devices. It helps prevent a source device from overwhelming a destination device by sending more packets than the destination can handle. These scenarios can occur when the source device is faster than the destination device (CPU, RAM, NIC, etc.). They can also happen when a source intentionally tries to flood the destination in a malicious Denial of Service (DoS) attack.
Flow control can be enacted for sent packets, received packets, or both. It can be hardware- or software-based, and it can occur at multiple layers of the OSI model.
For a real world analogy to flow control, think of how dams work. A dam will be installed to control the flow of water on a river, usually to create lakes or reservoirs. Dams can be used to adjust the water flow to prevent flooding, depending on rainfall. Network flow control does pretty much the same thing – it prevents data floods.
Data link flow control
Data link flow control is one common type of flow control. The two main data link layer (layer 2) types are:
Stop and Wait
In this type of data link flow control, when flow control kicks in, the destination device will not ACK a packet until it’s ready to do so. It’s simple, but that simplicity also makes it a poor performer, as the source must receive an ACK before it can send the next packet.
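As a rough illustration of why this hurts throughput, here is a minimal back-of-the-envelope model (the function name and numbers are mine, purely for illustration):

```python
# Minimal stop-and-wait model: the sender may not transmit frame N+1 until
# frame N is ACKed, so the link idles for a full round trip between frames.
# All names and numbers here are illustrative, not from any real stack.

def stop_and_wait_time_ms(frames: int, rtt_ms: int, tx_ms: int = 0) -> int:
    """Total delivery time: every frame pays its transmit time plus one RTT."""
    return frames * (tx_ms + rtt_ms)

# With a 10 ms RTT, 100 frames need a full second even if transmission
# itself were free -- which is why stop-and-wait performs poorly.
print(stop_and_wait_time_ms(frames=100, rtt_ms=10))  # 1000 (ms)
```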
Sliding Window
In this type of data link flow control, the destination device sends adjusted window sizes* (and always ACKs) to the source device to advertise how much data the source may send. The advertised window depends on how much receive buffer space remains. If the buffer fills to capacity, the destination sends a zero-size window to inform the source that it cannot receive any more data. If the source continues to send packets after a zero window has been advertised, how the destination handles them depends on how the firmware running on the device was designed.
This performs much better than “stop and wait,” as the destination always ACKs and traffic flows continuously based on the window size.
*Send and receive window sizes can be controlled and modified via client configuration. Consult your vendor’s documentation for the window sizes best suited to your specific application.
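To make the zero-window behavior concrete, here is a toy simulation (all names and numbers are illustrative; real implementations such as TCP track windows in bytes and are far more involved):

```python
# Toy sliding-window exchange: the receiver always ACKs and advertises its
# remaining buffer space as the window; a zero window halts the sender.
# Illustrative sketch only, not any real protocol implementation.

def simulate(pending: int, capacity: int, drain_amount: int = 2):
    buffered = delivered = zero_windows = tick = 0
    while pending or buffered:
        tick += 1
        window = capacity - buffered       # advertised with every ACK
        if window == 0:
            zero_windows += 1              # sender must stop and wait
        sent = min(pending, window)        # sender honors the window
        pending -= sent
        buffered += sent
        if tick % 2 == 0:                  # slow consumer drains every other tick
            drained = min(buffered, drain_amount)
            buffered -= drained
            delivered += drained
    return delivered, zero_windows

delivered, pauses = simulate(pending=20, capacity=8)
print(delivered, pauses)  # every frame arrives; the sender paused on zero windows
```

Note that nothing is lost: the slow consumer just forces the sender to pause whenever a zero window is advertised, which is exactly the trade flow control makes.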
Ethernet flow control
There is also the notion of Ethernet flow control. This operates at the data link layer (layer 2), between directly connected link partners. There are a few main types of Ethernet flow control, including:
802.3x
This type leverages pause frame flow control, in which an overwhelmed destination device sends a PAUSE frame telling the source to wait a specified amount of time before sending more traffic. An indicator of this is incrementing XON/XOFF counters on NICs.
This is the type of flow control NetApp refers to in its best practices. It is not the same as congestion control.
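On a Linux host, the `ethtool` utility can show whether pause is negotiated and whether those XON/XOFF counters are incrementing. These commands require a physical NIC; the interface name `eth0` and the exact counter names are examples and vary by driver:

```
ethtool -a eth0                               # show RX/TX pause settings
ethtool -S eth0 | grep -iE 'xon|xoff|pause'   # driver-specific pause counters
ethtool -A eth0 rx off tx off                 # disable pause on the host NIC
```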
Priority flow control (802.1Qbb) is a follow-on to 802.3x and is commonly seen in FCoE environments. It still performs flow control, but unlike 802.3x, which pauses all traffic on the link, 802.1Qbb pauses traffic per priority class, so a single class can be throttled while others keep flowing, even when 802.3x link-level pause is disabled.
Cisco publishes flow control information for its various switch platforms, such as the Catalyst 6500.
Why should flow control be disabled in clustered Data ONTAP?
Three reasons for this, in my opinion…
First: Buffer limitations on some switches.
A buffer is a physical allocation of memory on a device that stores data until it can be moved elsewhere. As modern computing has advanced, data has gotten BIGGER. More data means more traffic, and more traffic means more buffer space is needed to handle it. When flow control is in use, the buffer holds incoming data while data that arrived earlier is still being processed.
However, switch hardware can’t always keep up with the demand for data buffers without affecting cost.
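As a toy model of that buffer limitation, consider a fixed pool of buffer slots with no flow control: any burst larger than the free space is simply tail-dropped. The numbers below are arbitrary examples, not real switch specifications:

```python
# Toy fixed-size switch buffer: once the buffer is full, excess frames are
# simply dropped (tail drop). Illustrative only -- real switch buffering is
# per-port/per-queue and far more sophisticated.

def enqueue_burst(burst: int, capacity: int, occupied: int = 0):
    """Return (accepted, dropped) for a burst arriving at a bounded buffer."""
    accepted = min(burst, capacity - occupied)
    dropped = burst - accepted
    return accepted, dropped

# A 1500-frame burst into 1024 free buffer slots loses the tail of the burst.
print(enqueue_burst(burst=1500, capacity=1024))  # (1024, 476)
```

Flow control trades those drops for pauses; with small buffers and heavy traffic, either way something has to give.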
Second: More data, better hardware.
While data is getting bigger, so are the source and destination devices, as well as the network pipes between them. Modern devices are now capable of handling all that data and processing it fast enough that flow control is not only unnecessary, but actually a hindrance to better performance. If you don’t believe me, this guy has a real-world example.
Third: Congestion control.
The general idea is to let flow management happen higher up the stack in the form of congestion control. This can be done by applications, and honestly, it should be, as hardware flow control is not application-aware.
What about other vendors? Do they recommend turning flow control off, too?
The short answer? “It depends.”
The long answer? It’s different for everyone. For example, the latest documentation for ESXi (5.5) says to leave flow control enabled. I have not seen the best practice recommendation for vSphere 6, yet. The best bet is to contact your specific vendor for their recommendation.
Keep in mind that these recommendations can change based on new information, issues seen, etc. So always be sure to stay tuned for updates to best practices (including NetApp’s).
How do I disable flow control?
If you do choose to disable flow control, it makes the most sense to disable it on both endpoints. Mismatched configuration could potentially cause performance issues or other problems. Ideally, depending on vendor recommendation, the flow control setting would be the same across the board.
To disable flow control in Cisco IOS, see the configuration guide for your specific switch platform and software version.
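For example, on many Nexus and Catalyst platforms the interface-level commands look something like the following. The interface name is a placeholder, and command availability varies by platform and release, so verify against your configuration guide:

```
! Illustrative only -- syntax varies by platform and software release
interface Ethernet1/1
  flowcontrol receive off
  flowcontrol send off
```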
To disable flow control in clustered Data ONTAP:
cluster::> net port modify -node <node that owns port> -port <port> -flowcontrol-admin none
Keep in mind that changing flow control on a port will result in a brief blip in connectivity, as the port will reset to read the new configuration. This blip can vary in time depending on firmware, load, etc. Setting flow control should only be done in a maintenance window.
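To confirm the change took effect, cDOT can also display the administrative and operational flow control settings per port. The field names below are from my recollection of the cDOT CLI, so verify them in your release:

cluster::> net port show -node <node that owns port> -port <port> -fields flowcontrol-admin,flowcontrol-oper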
Hopefully this helps clear up some confusion on flow control and best practice recommendations!