Power claims for networking equipment are often difficult to understand. It’s not always clear whether chip, card, or system power is being stated and what conditions were used for the evaluation. If we’re going to pursue more energy-efficient routers and switches, we need to understand what that really means.
Let’s take a step back and consider a better way of keeping score. In my opinion – and hopefully soon yours too – these are the day-to-day system power consumption and the provisioned power (number and capacity of PSUs to support redundancy and worst-case conditions). This post will focus on exploring the components of system-level power consumption.
Let’s start by defining a unit for the metric. I propose “watts per 100G without optics” based on system power consumption. It represents a common unit of bandwidth and (for high-end routers) currently yields numbers in the 1-50W range that most of us are comfortable with. Optics should be subtracted because they’re highly variable based on module type (e.g., 400G ZR+ is twice the power of FR4). As optics may impact cooling requirements, a more specific definition of “Watts per 100G at typical temperature with 2km optics minus optics module power” is a good benchmark. Variations on this, such as ZR+ optics or fewer cards or ports, may also be defined for operator-specific comparisons.
System power consumption is the key metric to focus on. Chip power is an interesting way to compare devices, but it’s not relevant when it comes to paying the power bill or assessing environmental impact. Let’s look at what makes up system power by reviewing the following key components:
- Electrical data path
- Mechanical design
- Control plane
- Losses in voltage conversion and signaling
Electrical data path
The key components of the electrical data path power comprise PHYs, retimers, and forwarding ASICs. PHYs are small devices used for functions such as gearboxes which combine Serializer/Deserializers (SerDes, used for fast I/O through PCBs) or split them into different speeds. A common gearbox operation is to convert 2x 56G SerDes from the ASIC to 4x 28G SerDes connected to a QSFP28 100 GbE port. Retimers are used to extend electrical signals. Retimers are commonly used for long traces such as connecting optics to ASICs in the “back row” of a line card.
Power-wise, ASICs are the stars of the show, so it’s understandable that they get most of the attention. They’re the most complex components and are the key driver of system power for large, modular systems like the Cisco 8800. ASIC power can be broken into three main categories: SerDes, idle core power, and dynamic core power. In high-bandwidth chips like the Cisco Silicon One Q200, these are roughly equal. I’ll be focusing on this class of chips in this blog. It includes devices such as Broadcom’s Jericho 2 (in NCS 5500) and Trident/Tomahawk (NCS 5000). Edge-class chips such as the Lighspeed Plus used in Cisco’s ASR 9000 Series have similar characteristics but consume more power in complex per-packet operations (e.g., more counters) and less in SerDes due to lower bandwidth.
Each of these categories has different characteristics of power consumption. SerDes require constant power independent of traffic load, but they can be powered down to save power. On the 8000 Series, this occurs when ports are shut down or when a fabric or line card slot is empty. In the latter case, the corresponding SerDes on the connecting card are shut down (e.g., some Fabric Card (FC) SerDes are disabled when Line Card (LC) slots are empty).
ASIC idle power is exactly what it sounds like. It’s the power required to keep the chip’s internal blocks up and running with no traffic load. Dynamic power occurs when packets are forwarded. Some dynamic power is consumed for per-packet operations (pps load) and some is needed for moving the bandwidth along the forwarding path (bps load).
Core (non-SerDes) ASIC power will vary greatly among devices as networking ASICs serve many roles with a variety of requirements. Core ASIC power may vary based on several factors:
- Function – DC vs. SP Edge vs. SP Core perform different levels of per-packet operations
- Network vs. fabric capacity – ASIC interfaces that only can connect to a switch fabric are much simpler and lower power
- Process node (7nm is the current generation, 16nm and 28nm were the previous TSMC nodes)
- Design – Newer microarchitectures tend to have better efficiency than chips carrying legacy constraints. The skill of the design engineers is also significant and shows up in the power bottom line.
- Device temperature – A hot ASIC has marginally higher power.
- Load – The number of transistors flipping at a given time (mainly a function of pps and bandwidth, but also of the design’s ability to gate unused logic)
Most legacy SP ASIC designs were optimized for large modular systems with switch fabrics. This approach resulted in some of the ASIC I/O (fabric requires speedup so usually a bit more than 50% to account for overhead) being hard coded for fabric operations. This greatly simplifies the design relative to chips that can dedicate their full bandwidth to the network. The logic for fabric lookups is dramatically simpler than network operations and, thus, requires less silicon real estate and power.
The downside of fabric-bound I/O shows up when used in single-chip systems. The fabric bandwidth is lost and cannot be redirected to the network. This can be offset by using two chips back with a point-to-point “fabric”, but that’s still less efficient than a single chip that can connect all its bandwidth to the network. When two ASICs are required, the chip power is doubled. If more chips and a switch fabric are required for small (2-3 RU) systems, the power goes even higher.
Optics are the divas of router hardware. They consume significant power, provide a small footprint for heat sinks, and their max temperature is 50C cooler than ASICs. The thermal management of optical modules was one of the most difficult challenges for the initial mechanical design of 400G-class systems – and then ZR+ modules doubled the requirement from 11W (DR4/FR4) to 22W (ZR+).
Optical module power is relatively consistent for each reach (e.g., LR8 or FR4) among vendors. The system power to cool the modules varies. For example, it may be more difficult to cool a QSFP-DD module in a traditional chassis with vertical airflow (i.e., each module preheats air for the one above it) than a newer orthogonal chassis (direct cold air to all modules). The QSFP-DD and OSFP designs also impact power required for cooling. Both can support high power optics, but the QSFP-DD design allows for ongoing heat sink innovation customized to the chassis. This results in less airflow needed to cool the modules and thus lower system power than OSFP which is locked in a standardized heat sink design that limited innovation.
Routers are air-cooled; fans push or pull air through the chassis. The airflow path is important as each component heats the air for downstream devices within the chassis. Fan power is highly variable among systems and different operating conditions. For many systems, the highest fan power may be 10 times what is considered typical. This great range is due to the nature of fan power consumption. It is roughly cubic with the air moved. When fan speeds get into the 80-100% range, their power goes up dramatically so care should be taken to minimize the need for high-speed fan operation by only setting the fans to the maximum speed when absolutely required.
Cooling isn’t the only aspect of mechanical design. Other decisions include the card height, number and size of fans, board layouts, heat sinks, and more. For example, small differences in line card height can impact the size of heat sinks and how constricted the airflow is. These both have an impact on chip temperatures and fan power.
The control plane includes CPUs, DRAMs, SSDs, internal ethernet switches, and small components such as consoles and management ethernet. CPU power is relatively low compared to ASICs. The other control plane components don’t consume significant power.
Losses in voltage conversion and signaling
The visible components aren’t the only consumers of power. Five to 10% of total power is lost when the power supplies convert the facility voltage (e.g., 48V DC or 220V AC) to the system voltage (e.g., 12V or 54V DC). Further power is lost as the system voltage is converted to the levels used by the electronics (e.g., 3.3V for optics). Power is also lost to the resistance of the PCBs as signals move from device to device.
In the past, these factors could be ignored by users, but as power consumption becomes more critical, they shouldn’t be any longer. The 8000 Series includes many “behind the scenes” optimizations in these areas such as new power supplies and more efficient internal power distribution.
Metrics for the Cisco 8000 Series
At the beginning of this paper, the metric of watts per 100G without optics was proposed. Let’s conclude by looking at those values for some specific systems.
Much is made of chip-level power efficiency, but that isn’t the ultimate goal – merely a means to an end. Operators shouldn’t care if power is used to flip transistors or spin fans, they just want the system to be efficient. Evaluating systems should include benchmarks that can be standardized as well as network-specific configurations which may include multiple generations of hardware, a partially filled chassis, or different traffic loads.
In summary, the key metric for power consumption is system power in your typical conditions. When router shopping, ask vendors for system-level data that meets your real requirements and for details on how they were tested or calculated. Make sure to also look closely at timelines for qualification and ramp to deployment so that you evaluate systems available in the same generation and don’t compare deployable hardware based on press releases alone.
At Cisco, we believe an inclusive future for all must take everyone into account, including future generations. That’s why we’re setting long-term, actionable goals to address the environmental impacts of our products and operations. As part of this effort, we’re focused on increasing energy efficiency and reducing emissions. Our sustainability initiatives include more than 440 energy efficiency projects in the past five years, and there are plenty more to come. We encourage you to spend a few minutes reviewing our journey toward a sustainable, inclusive future to understand what this means for you and your organization.