Loading…
In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Friday November 15, 2024 4:00pm - 4:35pm MST
Kubernetes is the ideal platform for running AI and ML workloads, such as LLMs. GPU nodes are often used for their parallel processing capabilities and higher performance benefits; however, they are known to be costly. Many factors impact the cost of running AI/ML workloads such as GPU utilization, GPU VM size, idle time, etc. These costs are often ignored and considered inherent in running GPU workloads. But if running workloads at scale and left unoptimized, costs will quickly spin out of control. In this talk, we leverage NVIDIA DCGM exporter with Prometheus for GPU metrics monitoring alongside OpenCost to measure the Kubernetes spend of our GPU workloads. We will provide an overview of OpenCost, highlighting its role in bridging the gap between the developer and platform teams through visibility and accountability of spend. We will demonstrate how to use the NVIDIA GPU Operator and how techniques such as partitioning can lead to significant cost savings.
Speakers
avatar for Ally Ford

Ally Ford

Product Manager, Microsoft
Ally is a Product Manager on the Azure Kubernetes Service (AKS) team at Microsoft Azure. She spends her days collaborating with customers to design features that improve the end to end operator experience for both Linux and Windows users. Formerly she was a UX designer and project... Read More →
avatar for Kaysie

Kaysie

Product Manager, Microsoft
Kaysie Yu is a Product Manager on the Azure Kubernetes Service team at Microsoft. She works on cost management and optimization and is passionate about the convergence of FinOps and GreenOps, advocating for best practices that help organizations achieve cost efficiency while contributing... Read More →
Friday November 15, 2024 4:00pm - 4:35pm MST
Salt Palace | Level 2 | 255 E
  AI + ML
Log in to leave feedback.

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link