Articles
Optimizing AI Workloads with NVIDIA GPUs, Time Slicing, and Karpenter (Part 2)
Introduction: Overcoming GPU Management Challenges. In Part 1 of this blog series, we explored the challenges of hosting large language models (LLMs) on CPU-based workloads within an EKS cluster. We discussed the inefficiencies of using CPUs for such tasks, primarily due to large model sizes and slow inference speeds. The introduction of GPU […]
Optimizing AI Workloads with NVIDIA GPUs, Time Slicing, and Karpenter
Explore how to deploy GPU-based workloads in an EKS cluster using the NVIDIA Device Plugin, and how to maximize GPU efficiency and scalability in your Kubernetes environment.