Most days of the week, you can expect to see AI- and/or sustainability-related headlines in every major technology outlet. But solutions that are future ready, offering the capacity, scale, and flexibility that generative AI demands while keeping sustainability in mind, are scarce.
Cisco is evaluating exactly that intersection of sustainability and technology to create a more sustainable AI infrastructure, one that accounts for the compute demands generative AI will place on our future world. Advancements in AI/ML data center infrastructure, however, can be at odds with goals related to energy consumption and greenhouse gas (GHG) emissions.
Addressing this challenge entails an examination of multiple factors, including performance, power, cooling, space, and the impact on network infrastructure. There’s a lot to consider. The following list lays out some important issues and opportunities related to AI data center environments designed with sustainability in mind:
- Performance Challenges: The use of Graphics Processing Units (GPUs) is essential for AI/ML training and inference, but it can pose challenges for data center IT infrastructure from power and cooling perspectives. As AI workloads require increasingly powerful GPUs, data centers often struggle to keep up with the demand for high-performance computing resources. Data center managers and developers, therefore, benefit from strategic deployment of GPUs to optimize their use and energy efficiency.
- Power Constraints: AI/ML infrastructure is constrained primarily by compute and memory limits. The network plays a crucial role in connecting multiple processing elements, often sharding compute functions across various nodes. This places significant demands on power capacity and efficiency. Meeting stringent latency and throughput requirements while minimizing energy consumption is a complex task requiring innovative solutions.
- Cooling Dilemma: Cooling is another critical aspect of managing energy consumption in AI/ML implementations. Traditional air-cooling methods can be inadequate for AI/ML data center deployments, and they can also be environmentally burdensome. Liquid cooling solutions offer a more efficient alternative, consuming less energy than forced-air cooling, but they require careful integration into data center infrastructure.
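As a rough, non-Cisco-specific illustration of why cooling efficiency matters, the sketch below uses Power Usage Effectiveness (PUE), the ratio of total facility energy to IT equipment energy. All numbers are hypothetical assumptions chosen only to show the arithmetic, not measured values for any particular cooling technology or facility:

```python
# Hypothetical illustration of PUE (Power Usage Effectiveness):
# PUE = total facility energy / IT equipment energy.
it_load_kw = 1000.0  # assumed IT equipment load for this example

# Assumed overheads (illustrative only, not vendor data):
pue_air = 1.6     # example value for a forced-air-cooled facility
pue_liquid = 1.2  # example value for a liquid-cooled facility

total_air = it_load_kw * pue_air        # total facility draw, air cooling
total_liquid = it_load_kw * pue_liquid  # total facility draw, liquid cooling
savings_kw = total_air - total_liquid   # facility-level reduction

print(f"Air-cooled total load:    {total_air:.0f} kW")
print(f"Liquid-cooled total load: {total_liquid:.0f} kW")
print(f"Facility-level savings:   {savings_kw:.0f} kW")
```

Under these assumed figures, the same 1,000 kW of IT load draws 400 kW less at the facility level with the lower-overhead cooling approach, which is the kind of gap that motivates liquid cooling for dense AI deployments.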
- Space Efficiency: As the demand for AI/ML compute resources continues to grow, there is a need for data center infrastructure that is both high-density and compact in its form factor. Designing with these considerations in mind can improve space utilization while sustaining high throughput. Deploying infrastructure that maximizes cross-sectional link utilization across both compute and networking components is a particularly important consideration.
- Investment Trends: Looking at broader industry trends, research from IDC predicts substantial growth in spending on AI software, hardware, and services. The projection indicates that this spending will reach $300 billion in 2026, a considerable increase from a projected $154 billion for the current year. This surge in AI investments has direct implications for data center operations, particularly in terms of accommodating the increased computational demands and aligning with ESG goals.
- Network Implications: Ethernet currently underpins the majority of AI use cases, favored for its cost economics, scale, and ease of support. According to the Dell’Oro Group, by 2027, as much as 20% of all data center switch ports will be allocated to AI servers. This highlights the growing significance of AI workloads in data center networking. Furthermore, integrating small form factor GPUs into data center infrastructure is a noteworthy concern from both a power and cooling perspective. It may require substantial modifications, such as the adoption of liquid cooling solutions and adjustments to power capacity.
- Adopter Strategies: Early adopters of next-gen AI technologies have recognized that accommodating high-density AI workloads often necessitates the use of multisite or micro data centers. These smaller-scale data centers are designed to handle the intensive computational demands of AI applications. However, this approach places additional pressure on the network infrastructure, which must be high-performing and resilient to support the distributed nature of these data center deployments.
As a leader in designing and supplying the networking infrastructure that carries the world’s internet traffic, Cisco is focused on accelerating the growth of AI and ML in data centers with efficient energy consumption, cooling, performance, and space efficiency in mind.
These challenges are intertwined with the growing investments in AI technologies and the implications for data center operations. Addressing sustainability goals while delivering the necessary computational capabilities for AI workloads requires innovative solutions, such as liquid cooling, and a strategic approach to network infrastructure.
The new Cisco AI Readiness Index shows that 97% of companies say the urgency to deploy AI-powered technologies has increased. To address the near-term demands, innovative solutions must address key themes — density, power, cooling, networking, compute, and acceleration/offload challenges. Please visit our website to learn more about Cisco Data Center Networking Solutions.
We want to start a conversation with you about the development of resilient and more sustainable AI-centric data center environments – wherever you are on your sustainability journey. What are your biggest concerns and challenges for readiness to improve sustainability for AI data center solutions?