This guest post was authored by Cisco Designated VIP David Peñaloza Seijas.
One of the main features used in Cisco SD-WAN is Application Aware Routing (AAR). It is often advertised as an intelligent mechanism that automatically changes the routing path of applications, thanks to its active monitoring of WAN circuits to detect anomalies and brownout conditions.
Customers and engineers alike love to wield the power to steer the application traffic away from unhealthy circuits and broken paths. However, many may overlook the complex processes that work in the background to provide such a flexible instrument.
In this blog, we will discuss the nuts and bolts that make the promises of AAR a reality and the conditions that must be met for it to work effectively.
Setting the stage
To understand what AAR can and cannot do, it’s important to understand how it works and the underlying mechanisms running in unison to deliver its promises.
To begin, let’s first define what AAR entails and its accomplices:
Application Aware Routing (AAR) allows the solution to recognize applications and/or traffic flows and set preferred paths throughout the network to serve them appropriately according to their application requirements. AAR relies on Bidirectional Forwarding Detection (BFD) probes to track data path characteristics and liveliness so that data plane tunnels between Cisco SD-WAN edge devices can be established, monitored, and their statistics logged. It uses the collected information to determine the optimal paths through which data plane traffic is sent inside IPsec tunnels. These characteristics encompass packet loss, latency, and jitter.
The definition above describes the relationship between AAR and BFD, but it’s crucial to note that they are separate mechanisms. AAR does not probe anything itself: it polls the BFD daemon for the results of the probes sent through each data plane tunnel and uses them to decide which of the configured preferred paths can be used.
It is a logical next step to explain how BFD works in SD-WAN as described in the Cisco SD-WAN Design Guide:
On Cisco WAN Edge routers, BFD is automatically started between peers and cannot be disabled. It runs between all WAN Edge routers in the topology encapsulated in the IPsec tunnels and across all transports. BFD operates in echo mode, which means when BFD packets are sent by a WAN Edge router, the receiving WAN Edge router returns them without processing them. Its purpose is to detect path liveliness and it can also perform quality measurements for application aware routing, like loss, latency, and jitter. BFD is used to detect both black-out and brown-out scenarios.
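To make the three quality measurements concrete, here is a minimal sketch of how loss, latency, and jitter could be derived from a series of echo probes. It is not Cisco's implementation: the function name `probe_stats`, the use of round-trip times, and the jitter definition (mean absolute difference between consecutive round-trip times) are all assumptions made for illustration.

```python
from statistics import mean


def probe_stats(rtts_ms):
    """Summarize echo probes (hypothetical helper).

    rtts_ms holds one round-trip time in milliseconds per probe sent,
    or None when the probe never came back.
    """
    if not rtts_ms:
        raise ValueError("at least one probe is required")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency_ms = mean(received) if received else None
    # Assumption: jitter is the mean absolute change between consecutive RTTs.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


# Five probes, one of which was lost (made-up values).
stats = probe_stats([20.0, 22.0, None, 21.0, 25.0])
print(stats)
```

Because the far end simply returns the probe, the sender has everything it needs to compute these numbers on its own.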
Searching for ‘the why’
Understanding the mechanism behind AAR is essential to comprehend its creation and purpose. Why are these measurements taken, and what do we hope to achieve from them? As Uncle Ben once said to Spider-Man, “With great power comes great responsibility.”
Abstraction power and transport independence require significant control and management. Every tunnel built requires a reliable underlay, making your overlay only as good as the underlay it uses.
Service Level Agreements (SLAs) are crucial for ensuring your underlay stays healthy and peachy, and your contracted services (circuits) are performing as expected. While SLAs are a legal agreement, they may not always be effective in ensuring providers fulfill their part of the bargain. In the end, it boils down to what you can demonstrate to ensure that providers keep their i’s dotted and their t’s crossed.
In SD-WAN, you can configure SLAs within the AAR policies to match your application’s requirements or your providers’ agreements.
The BFD measurements are averaged over time (more on that in the next section), and those averages are compared against the thresholds (SLAs) configured in the AAR policy. Any tunnel that does not satisfy those SLAs is flagged, logged, and excluded from AAR path selection.
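The comparison itself can be pictured with a short sketch. The `SlaClass` type, the tunnel names, and every number below are hypothetical, and treating a value equal to the threshold as compliant is an assumption rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaClass:
    """Thresholds an application can tolerate (illustrative type)."""
    name: str
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


def meets_sla(avg, sla):
    """True when every averaged metric is within the SLA thresholds."""
    return (
        avg["loss_pct"] <= sla.max_loss_pct
        and avg["latency_ms"] <= sla.max_latency_ms
        and avg["jitter_ms"] <= sla.max_jitter_ms
    )


def eligible_tunnels(averages, sla):
    """Names of the tunnels whose averages satisfy the SLA."""
    return [name for name, avg in averages.items() if meets_sla(avg, sla)]


voice = SlaClass("voice", max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=30.0)
averages = {
    "mpls": {"loss_pct": 0.2, "latency_ms": 40.0, "jitter_ms": 5.0},
    "biz-internet": {"loss_pct": 3.0, "latency_ms": 60.0, "jitter_ms": 12.0},
}
usable = eligible_tunnels(averages, voice)
print(usable)
```

In this made-up case only the `mpls` tunnel stays eligible, because the other one exceeds the loss threshold even though its latency and jitter are fine.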
Measure, measure, measure!
Having covered the what, who, and the often-overlooked why, it’s time to turn our attention to the how!
As noted previously, BFD measures link liveliness and quality; in other words, it collects, registers, and logs the resulting data. Once logged, the measurements are normalized by averaging them so that they can be compared.
Now, how does SD-WAN calculate these average values? By default, quality measurements are collected and represented in buckets. Those buckets are then averaged over time. The default values consist of 6 buckets, also called poll intervals, with each bucket being 10 minutes long, and each hello sent at 1000 msec intervals.
Putting it all together (by default):
- 6 buckets
- Each bucket is 10 minutes long
- One hello per second, or 1000 msec intervals
- 600 hellos are sent per bucket
- The average calculation is based on all buckets
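The arithmetic behind those defaults, plus a rolling window of buckets, can be sketched as follows. The `BucketWindow` class is a hypothetical helper, and a plain mean across the stored buckets is an assumption about how the average is formed.

```python
from collections import deque

# Defaults quoted above
HELLO_INTERVAL_MS = 1_000      # one hello per second
POLL_INTERVAL_MS = 600_000     # each bucket is 10 minutes long
NUM_BUCKETS = 6                # buckets kept for the average

hellos_per_bucket = POLL_INTERVAL_MS // HELLO_INTERVAL_MS      # 600
window_minutes = NUM_BUCKETS * POLL_INTERVAL_MS // 60_000      # 60


class BucketWindow:
    """Keeps the most recent per-bucket values and averages them."""

    def __init__(self, size=NUM_BUCKETS):
        # The oldest bucket falls off automatically once the window is full.
        self.buckets = deque(maxlen=size)

    def close_bucket(self, value):
        self.buckets.append(value)

    def average(self):
        if not self.buckets:
            raise ValueError("no bucket has been closed yet")
        return sum(self.buckets) / len(self.buckets)


# Six clean buckets followed by one with 12% loss (made-up values).
loss = BucketWindow()
for bucket_loss_pct in [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.0]:
    loss.close_bucket(bucket_loss_pct)

print(hellos_per_bucket, window_minutes, loss.average())
```

With the defaults, the average always looks back over a full hour of measurements, which is why a single bad bucket moves it only a little.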
Finding the sweet spot
It’s important to remember that these calculations are meant to be compared against the configured SLAs. As the result is a moving average, brownouts or outages may not be considered by AAR immediately (but they might already be flagged by BFD). With default values, it takes around 3 poll intervals to trigger the removal of a given transport locator (TLOC) from the AAR calculation.
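The lag of a moving average is easy to reproduce with a small simulation. The sketch below assumes a window that starts out healthy and then receives one degraded bucket per poll interval. The function name and the loss figures are invented for illustration, and the values were picked so that the threshold is crossed on the third interval. With other thresholds or a more severe degradation, the count changes.

```python
def intervals_until_violation(num_buckets, healthy, degraded, threshold):
    """Poll intervals needed before the window average exceeds the threshold.

    Returns None if the average never exceeds the threshold, even once the
    whole window is filled with degraded buckets.
    """
    window = [healthy] * num_buckets
    for elapsed in range(1, num_buckets + 1):
        window = window[1:] + [degraded]   # the oldest bucket is replaced
        if sum(window) / num_buckets > threshold:
            return elapsed
    return None


# Hypothetical circuit: 0% loss when healthy, 5% loss once degraded,
# compared against a 2% loss SLA over 6 buckets.
needed = intervals_until_violation(num_buckets=6, healthy=0.0, degraded=5.0, threshold=2.0)
print(needed)
```

The averages after each interval are roughly 0.83%, 1.67%, and 2.5%, so the SLA is only violated once the third degraded bucket has been counted.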
Can these values be tweaked for faster AAR decision making? Yes, but it will be a trade-off between stability and responsiveness. Modifying the buckets, multipliers (numbers of BFD hello packets), and frequency may be too aggressive for some circuits to meet their SLAs.
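Remember that the averages are compared against the configured SLAs: the smaller the window, the more weight a single bad poll interval carries, and the more likely a short-lived dip is to push a circuit out of compliance.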
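A quick calculation shows what is at stake when these timers are shortened. The `window_summary` helper and the "aggressive" values below are hypothetical and are not a recommendation. The point is only to show how the weight of a single lost hello grows as buckets shrink.

```python
def window_summary(num_buckets, poll_interval_ms, hello_interval_ms):
    """Size of the averaging window and the weight of one lost hello."""
    hellos_per_bucket = poll_interval_ms // hello_interval_ms
    return {
        "hellos_per_bucket": hellos_per_bucket,
        "window_s": num_buckets * poll_interval_ms // 1000,
        # Loss percentage one dropped hello adds to a single bucket
        "one_lost_hello_pct": 100.0 / hellos_per_bucket,
    }


default = window_summary(num_buckets=6, poll_interval_ms=600_000, hello_interval_ms=1_000)
aggressive = window_summary(num_buckets=3, poll_interval_ms=60_000, hello_interval_ms=1_000)

print(default)
print(aggressive)
```

With the defaults, one lost hello is about 0.17% of a bucket and the window spans an hour. In the aggressive example the window shrinks to three minutes, but a single lost hello already counts as roughly 1.67% loss in its bucket, which is the stability side of the trade-off.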
Phew, who would have thought that magic can be so mathematically pleasing?
Closing comments
AAR is a complex yet marvelous tool to have when well understood. By knowing and understanding your tools’ capabilities, you can define your own potential. Make sure you wield the power of SD-WAN in a way that makes Uncle Ben proud!
This blog has focused only on the inner workings of AAR’s features, leaving out interactions with other mechanisms and design considerations. Be sure to stay tuned for the next post. Thank you for reading!
David Peñaloza Seijas is a Principal Engineer at Verizon. He holds multiple Cisco certifications and is en route to earning his CCDE certification. David is an avid participant in the Cisco Learning Network community, a Cisco Designated VIP and Cisco Champion, and is often spotted sporting a cape at Cisco Live.
Follow David on Twitter @davidsamuelps.