Enterprises are making bold moves into AI, and Cisco AI PODs provide a powerful, pre-validated foundation for deploying AI infrastructure at scale. They bring together compute, storage, and networking in a modular design that simplifies procurement and deployment. However, deploying hardware is only the beginning. The next critical step is making this powerful infrastructure consumable as a service.
This is where Rafay complements Cisco AI PODs. Rafay’s GPU Platform as a Service (PaaS) adds the critical consumption layer, turning the hardware into a governed, self-service GPU cloud. Together, Cisco and Rafay enable organizations to operationalize AI faster by offering secure, multi-tenant access, standardized workload SKUs, and policy-driven governance.
This post explores how this joint solution transforms raw GPU power into a production-ready AI platform, enabling developer self-service while maintaining enterprise-grade control.
From Infrastructure to Consumption: The Platform Challenge
Organizations have accelerated investments in AI infrastructure, deploying platforms like Cisco AI PODs with the latest NVIDIA hardware to enable generative AI, Retrieval-Augmented Generation (RAG), and large-scale inference. As adoption grows, a new challenge emerges: how to enable multiple teams to safely and efficiently consume this shared infrastructure.
Platform teams must balance access across different groups, each with unique needs and security requirements. Without a standardized consumption layer, this leads to several problems:
- Underutilized GPUs: Industry benchmarks report that average GPU utilization often falls below 30%. This is partly because AI workloads are bursty and most environments lack the mechanisms to slice and share GPU resources efficiently. When expensive GPUs sit idle, it represents a significant opportunity cost.
- Manual Provisioning: Platform teams often rely on manual configurations, ad-hoc scripts, and service tickets to manage access. These workflows slow down delivery, introduce inconsistencies, and make it difficult to enforce governance.
- Siloed Resources: Without a unified platform, GPU infrastructure often becomes siloed by team, limiting sharing and preventing a holistic view of utilization and costs. Developers and researchers must navigate complex internal processes just to run a job.
To solve this, enterprises need to operate their GPU infrastructure as a service—one that supports shared resources, multitenant isolation, and automated policy enforcement.
The Joint Solution: Cisco AI PODs + Rafay GPU PaaS
Cisco and Rafay have collaborated to deliver a modular, fully validated GPU cloud architecture. This solution combines Cisco’s best-in-class AI POD infrastructure with Rafay’s GPU Platform as a Service, transforming GPU hardware into a secure, self-service, multitenant cloud.
- Cisco AI PODs provide the compute, fabric, storage, and pre-validated design. Based on Cisco Validated Designs (CVDs), they integrate next-generation Cisco UCS platforms (like the C885A M8 Server) and the latest NVIDIA GPUs to power the entire AI lifecycle.
- Rafay GPU PaaS delivers the orchestration, policy enforcement, and developer abstraction layer. It transforms the foundational hardware into a production-grade GPU cloud that is simple to consume.
This combined architecture enables organizations to rapidly launch and operate GPU clouds with full-stack orchestration, declarative SKU provisioning, and built-in cost attribution.
Developer Self-Service Through a Curated Catalog
At the core of Rafay’s platform is the SKU Studio, a purpose-built catalog system that empowers platform teams to deliver AI-ready infrastructure and applications as reusable SKUs.
Each SKU is a modular abstraction that bundles:
- Compute Configuration: GPU/MIG profiles, CPU, memory, and storage.
- Application Stack: Pre-integrated tools like vLLM, Triton, or Jupyter Notebooks.
- Policy Controls: Time-to-live (TTL) limits, RBAC, multitenancy, and quotas.
- Billing Metadata: Usage units and cost attribution.
Developers can access GPU environments instantly through a self-service portal (GUI, API, or CLI) without needing to file support tickets. For example, a data scientist can select an “H100-Inference-vLLM” SKU, which automatically provisions a specific GPU slice, deploys a secure container, and applies a 48-hour TTL. This streamlines workflows and ensures security best practices are applied consistently.
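To make the SKU concept concrete, a definition like the one described above could be expressed declaratively. The schema below is purely illustrative, not Rafay's actual API; it simply shows how compute, application, policy, and billing settings bundle into one reusable unit:

```yaml
# Illustrative SKU definition (hypothetical schema, not Rafay's actual format)
apiVersion: example.rafay.dev/v1
kind: SKU
metadata:
  name: h100-inference-vllm
spec:
  compute:
    gpu: nvidia-h100          # could also name a MIG profile, e.g. 1g.10gb
    cpu: "16"
    memory: 128Gi
    storage: 500Gi
  stack:
    image: vllm/vllm-openai:latest   # pre-integrated inference server
  policy:
    ttl: 48h                  # environment is reclaimed automatically
    maxInstancesPerUser: 2
  billing:
    unit: gpu-hour
```

Because the SKU is a single declarative object, platform teams can version it, review it, and roll it out consistently instead of hand-configuring each environment.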
Secure Multi-Tenancy and Governance
Sharing expensive GPU resources requires strict isolation and governance. Rafay provides native, secure multi-tenancy that allows teams to safely share infrastructure without interference.
Key security controls are automatically enforced:
- Hierarchical RBAC: Defines permissions and access scope for tenants, projects, and workspaces.
- Namespace Isolation: Ensures workloads are separated at the cluster and network level.
- Resource Quotas: Prevents any single team or job from monopolizing resources.
- Centralized Audit Logs: Provides a complete audit trail of user actions for compliance.
These built-in protections allow platform teams to maintain complete oversight and control while empowering developers with the freedom they need to innovate.
Comprehensive GPU Management and Visibility
To maximize ROI, you need to know how your GPUs are being used. Rafay provides end-to-end visibility, metering, and cost attribution tailored for multitenant environments.
Platform teams can use declarative blueprints to standardize GPU operator configurations and slicing strategies, such as NVIDIA Multi-Instance GPU (MIG), across all clusters. Multi-tenant dashboards offer detailed insights into:
- GPU inventory and allocation
- SKU usage patterns
- Instance-level activity and user attribution
- Health status and uptime trends
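As one concrete example of a slicing strategy, the NVIDIA GPU Operator configures MIG through its `ClusterPolicy` resource and per-node labels. The fragment below follows the upstream operator's documented fields; how a Rafay blueprint packages and distributes such settings is an assumption for illustration:

```yaml
# NVIDIA GPU Operator MIG setting (upstream ClusterPolicy field);
# a blueprint would standardize this across every managed cluster.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  mig:
    strategy: single   # expose each MIG slice as an nvidia.com/gpu resource
```

Individual nodes then opt into a MIG geometry via a label, for example `nvidia.com/mig.config=all-1g.10gb`, which partitions each physical GPU into uniform slices that the quota and scheduling machinery can allocate independently.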
A billing metrics API aggregates usage data, calculates billable compute, and generates auditable reports, enabling chargebacks and financial accountability.
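The arithmetic behind such chargeback reports is straightforward. The sketch below is a minimal illustration, not Rafay's actual API: record names, fields, and rates are all assumptions, chosen only to show how metered usage rolls up into per-tenant billable cost:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class UsageRecord:
    """One metered interval of GPU consumption (illustrative shape)."""
    tenant: str
    sku: str          # e.g. "h100-inference-vllm"
    gpu_count: float  # fractional for MIG slices
    hours: float

# Assumed price list; in practice this would come from the SKU's billing metadata.
RATE_PER_GPU_HOUR = {"h100-inference-vllm": 4.50}

def chargeback(records):
    """Aggregate billable cost per tenant from metered usage records."""
    totals = defaultdict(float)
    for r in records:
        totals[r.tenant] += r.gpu_count * r.hours * RATE_PER_GPU_HOUR[r.sku]
    return dict(totals)

usage = [
    UsageRecord("team-a", "h100-inference-vllm", 0.5, 48.0),  # a MIG half-slice
    UsageRecord("team-b", "h100-inference-vllm", 2.0, 10.0),
]
print(chargeback(usage))  # {'team-a': 108.0, 'team-b': 90.0}
```

The real value of the platform's metering is not the multiplication itself but the attribution: every instance is already tagged with its tenant, SKU, and lifetime, so reports like this are auditable rather than reconstructed after the fact.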
Who Benefits from a Unified GPU Cloud?
This jointly validated solution is designed for a diverse range of customers who need to operationalize GPU infrastructure with security, speed, and scale.
- Enterprise IT Teams: Gain federated self-service, quota enforcement, and centralized visibility. This reduces infrastructure duplication and embeds governance into daily operations.
- Sovereign & Public Sector Organizations: Meet compliance needs in air-gapped environments with secure multitenancy, policy enforcement, and centralized audit logging.
- Cloud & Managed Service Providers: Monetize GPU infrastructure with a white-labeled, multitenant platform that includes automated tenant onboarding and built-in chargeback metering.
- Existing Cisco Customers: Extend the ROI of current UCS deployments by adding GPU orchestration as a seamless overlay with no re-architecture required.
- Greenfield AI Builders: Start fresh with a pre-validated, fully integrated solution that reduces the time from procurement to operational AI services from months to weeks.
Operationalize Your AI Infrastructure Today
Pairing Cisco’s validated AI infrastructure with Rafay’s GPU PaaS control plane allows organizations to transform GPU systems into fully governed internal platforms. The result is a consumption-driven architecture where developers gain self-service access, operators enforce quotas and track consumption, and the business maximizes the value of its AI investments.
This architecture offers a clear path forward: deliver GPU infrastructure as a service, enable secure and compliant multitenancy, and make consumption predictable and cost-aligned from day one.
To see this powerful solution in action, join our upcoming webinar. Experts from Cisco and Rafay will demonstrate how to transform your GPU infrastructure into a production-ready AI service.
Live Webinar: From AI PODs to GPU Cloud
October 21, 2025 at 8:00 a.m. PDT / 3:00 p.m. GMT
Register for the Webinar Today!
We’d love to hear what you think. Ask a Question, Comment Below, and Stay Connected with #CiscoPartners on social!
Cisco Partners Facebook | @CiscoPartners X/Twitter | Cisco Partners LinkedIn