Cisco UCS C480 ML M5 Server - Performance and Capacity for AI

KD’s blog is a great overview of how AI has developed since the mid-1950s and how it’s defining a new era of IT. I’m here to dig deeper into the hardware our customers are going to use to mine all that value out of their data.

When Cisco announced our M5 servers, my favorite was the C480 M5 because of its modular design. It supports six GPUs, but the support is based on the PCIe bus. When you are trying to optimize a system design for AI/ML/DL (artificial intelligence, machine learning, and deep learning), the PCIe bus can actually be an inhibitor. So our incredible engineers took a great modular design and transformed it specifically for deep learning.

No-compromise Balance of Performance and Capacity

The C480 ML M5 rack server, developed in partnership with NVIDIA, a leader in AI computing, supports eight NVIDIA Tesla V100 Tensor Core GPUs with NVIDIA NVLink interconnect. The V100 is the world’s first GPU to break the 100-teraflops barrier for deep learning performance, thanks to a whopping 640 Tensor Cores. NVLink provides up to 10x the bandwidth of PCIe and connects all of the GPUs in a point-to-point network (hybrid cube mesh) that lets these super-fast GPUs exchange data without going through the PCIe bus.
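As a rough sanity check on the "10x" figure, the sketch below compares peak per-direction bandwidth. The constants are assumptions taken from publicly documented PCIe 3.0 and NVLink 2.0 figures (8 GT/s per lane with 128b/130b encoding for a x16 slot, and six NVLink links per V100 at 25 GB/s per direction each); none of them are stated in this post, and they are theoretical peaks, not measured throughput.

```python
# Back-of-the-envelope comparison of peak link bandwidth, one direction.
# All constants are assumptions from public PCIe 3.0 / NVLink 2.0 figures,
# not numbers given in this post.

PCIE3_GT_PER_LANE = 8.0        # giga-transfers per second per lane
PCIE3_ENCODING = 128 / 130     # 128b/130b line-encoding efficiency
PCIE3_LANES = 16               # a x16 slot

NVLINK_GBS_PER_LINK = 25.0     # GB/s per direction per link
NVLINK_LINKS_PER_V100 = 6      # links available on one V100


def pcie3_x16_gbs() -> float:
    """Peak PCIe 3.0 x16 bandwidth in GB/s, one direction."""
    gbits = PCIE3_GT_PER_LANE * PCIE3_ENCODING * PCIE3_LANES
    return gbits / 8  # bits -> bytes


def nvlink_v100_gbs() -> float:
    """Peak aggregate NVLink bandwidth of one V100 in GB/s, one direction."""
    return NVLINK_GBS_PER_LINK * NVLINK_LINKS_PER_V100


if __name__ == "__main__":
    pcie = pcie3_x16_gbs()
    nvlink = nvlink_v100_gbs()
    print(f"PCIe 3.0 x16:     {pcie:.1f} GB/s")
    print(f"NVLink (6 links): {nvlink:.1f} GB/s")
    print(f"Ratio:            {nvlink / pcie:.1f}x")
```

Under these assumptions the ratio lands a little under 10, which is consistent with reading the claim as "up to 10x".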

To power and cool this monster, our engineers took advantage of the modular chassis airflow and height-differential heatsinks. In the photo, you can see the 24 SFF drive bays at the bottom, then the CPU tray in the middle, and at the top, a blank tray.

Four 92 mm fans pull in cool air past the hard drive bays and CPU/blank trays and cool the first four GPUs directly. How then to cool the second row of GPUs, placed directly behind the first row? If you look at the render, you’ll see the four rear GPU heatsinks are taller than the first four. This allows the cool air from the blank tray area to flow unencumbered and unheated directly onto the rearmost heatsinks, providing plenty of cooling.

Another key feature of the C480 ML M5 is the amount of storage supported. Those 24 SFF drive bays support over 182 TB of SSD storage. This means you have room for many different data sets without having to rely on external storage, which reduces the overall cost of the solution. If you need even faster storage than an SSD can provide, six of the bays support NVMe.
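To put the storage figure in per-drive terms, here is a minimal sketch that divides the quoted capacity across the bays. It uses only the numbers in the paragraph above and treats "over 182 TB" as a lower bound; the actual drive models and sizes are not given in this post, so the result is an implied minimum, not a spec.

```python
# Implied minimum SSD size per bay, from the figures quoted above.
TOTAL_SSD_TB = 182   # "over 182 TB", treated here as a lower bound
SFF_BAYS = 24        # small-form-factor drive bays
NVME_BAYS = 6        # bays that also support NVMe


def min_tb_per_bay(total_tb: float, bays: int) -> float:
    """Smallest average drive size (TB) that reaches total_tb across all bays."""
    return total_tb / bays


if __name__ == "__main__":
    per_bay = min_tb_per_bay(TOTAL_SSD_TB, SFF_BAYS)
    print(f"At least {per_bay:.2f} TB per bay across {SFF_BAYS} bays")
    print(f"{SFF_BAYS - NVME_BAYS} bays without NVMe support, {NVME_BAYS} with it")
```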

What else does the C480 ML M5 offer?

Dual Intel Xeon Scalable processors up to 28 cores
Up to 3 TB of memory using 128 GB 2666 MHz DDR4 DIMMs
4 PCI Express (PCIe) 3.0 slots for 100G Cisco UCS VIC 1495 or other adapters
Modular internal Flex storage option: M.2 SATA
Cisco Integrated Management Controller (IMC)
Two 10GBASE-T (10 Gbps) LOM Ethernet ports

The datacenter is where the data is

Any vendor can offer you a server with GPUs. What makes Cisco different is that only Cisco offers you a system. What do I mean by a system? At its core, UCS is a fabric-centric architecture with a centralized management model. All of Cisco’s servers (B-Series blades, C-Series rack servers, S-Series dense storage servers, and HyperFlex) are manageable with a single tool, Cisco Intersight.

Intersight is a cloud-based systems management platform augmented by analytics and machine learning. It enables organizations to achieve a higher level of automation, simplicity, and operational efficiency, and it provides a holistic, unified approach to managing distributed computing environments regardless of server form factor, workload, or location. Only Cisco can deliver AI/ML/DL solutions as part of an integrated system that supports processing data at the edge or in the datacenter, whichever server in the portfolio is the best match for your problem.

Intersight extends the functionality of UCS Manager. All UCS servers can use a VIC with a Fabric Interconnect and UCS Manager. UCS Manager provides unified, embedded management of all servers via service profiles, which let you configure and manage your hardware.

Do you have to use Cisco Intersight or UCS Manager? Of course not. Like all C-Series rack servers, the C480 ML M5 has a Cisco Integrated Management Controller (CIMC). You can manage the server directly through the CIMC, through its API, or through industry-standard management protocols such as Redfish and SNMP.
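To give a feel for the Redfish option, here is a minimal sketch of working with a Redfish-style system resource. The `/redfish/v1/` service root and the `Model`, `PowerState`, `ProcessorSummary`, and `MemorySummary` properties come from the DMTF Redfish standard. The host name, the sample payload values, and the helper names `systems_url` and `summarize_system` are hypothetical and for illustration only; what a given CIMC firmware actually returns may differ.

```python
import json
from urllib.parse import urljoin

# Service root defined by the DMTF Redfish standard.
REDFISH_ROOT = "/redfish/v1/"

# Hypothetical sample of a ComputerSystem resource; the values are made up
# for illustration and are not captured from a real CIMC.
SAMPLE_RESPONSE = """
{
  "Id": "example-system",
  "Model": "C480 ML M5 (example)",
  "PowerState": "On",
  "ProcessorSummary": {"Count": 2},
  "MemorySummary": {"TotalSystemMemoryGiB": 3072}
}
"""


def systems_url(base_url: str) -> str:
    """Build the URL of the Systems collection for a management controller."""
    return urljoin(base_url, REDFISH_ROOT + "Systems")


def summarize_system(payload: dict) -> dict:
    """Pull a few standard Redfish properties out of a system resource."""
    return {
        "model": payload.get("Model"),
        "power": payload.get("PowerState"),
        "cpus": payload.get("ProcessorSummary", {}).get("Count"),
        "memory_gib": payload.get("MemorySummary", {}).get("TotalSystemMemoryGiB"),
    }


if __name__ == "__main__":
    # cimc.example.com is a placeholder host, not a real endpoint.
    print(systems_url("https://cimc.example.com"))
    print(summarize_system(json.loads(SAMPLE_RESPONSE)))
```

In practice you would fetch the resource over HTTPS with authentication and pass the decoded JSON to `summarize_system`; the parsing is kept separate from the transport so it can be exercised without a live server.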

Wrapping up

There are many use cases that can benefit from having GPU-enabled servers: fraud detection, medical research, experience personalization, and targeted marketing, just to name a few. Regardless of industry, you won’t do better than the C480 ML M5 with the NVIDIA Tesla V100 Tensor Core GPUs. The system provides maximum performance that’s easy to consume as part of the UCS platform with the industry’s only uniform, cloud-powered, automated operations model.

AI models that would consume weeks of computing resources can now be trained in a few hours. With this dramatic reduction in training time, a whole new world of problems will now be solvable with AI and we’re arming IT with the scalable solution for deep learning at enterprise scale.

If you want to learn more about our new platform for artificial intelligence and machine learning, contact your Cisco sales team or Cisco partner to discuss your AI/ML/DL workloads and how the C480 ML M5 server can meet your needs.

You can also find more information on the C480 ML M5 at www.cisco.com/go/ai-compute.

For more information on the entire Cisco UCS portfolio, go to http://www.cisco.com/go/ucs.

Robb Boyd says:

September 10, 2018 at 8:05 am

Great write up Bill. Nice partnership with the NVIDIA folks.
Derek M. Wilhite says:

September 10, 2018 at 6:10 pm

Great product with tremendous capabilities to improve efficiency and generate quantifiable results. Great work!

Authors

Bill Shields

Senior Marketing Manager

Product and Solutions Marketing Team
