Loading…
In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Friday November 15, 2024 11:00am - 11:35am MST
AI/ML workloads on Kubernetes demand ultra-high performance. If your training or multi-GPU inference job spans nodes, your GPUs will use the network, talking through a NIC over local PCIe. But not all NICs are equal! To get the best performance, you need a NIC which is as "close" to the GPU as possible. Unfortunately, the Kubernetes extended resources API does not have enough information and does not give you control over which specific devices are assigned. Dynamic Resource Allocation, the successor API, gives you this power. Come to this session to learn about DRA, how it is improving overall device support in K8s, and how to use it to allocate multiple GPUs, NICs, and TPUs to get the maximum performance out of your infrastructure.
Speakers
avatar for Patrick Ohly

Patrick Ohly

Principal Engineer, Intel
Patrick Ohly is a software engineer at Intel GmbH, Germany. In the past he has worked on performance analysis software for HPC clusters ("Intel Trace Analyzer and Collector") and cluster technology in general (PTP and hardware time stamping). Since January 2009 he has worked for Intel... Read More →
avatar for John Belamaric

John Belamaric

Senior Staff Software Engineer, Google
John is a Sr Staff SWE, co-chair of K8s SIG Architecture and of K8s WG Device Management, helping lead efforts to improve how GPUs, TPUs, NICs and other devices are selected, shared, and configured in Kubernetes. He is also co-founder of Nephio, an LF project for K8s-based automation... Read More →
Friday November 15, 2024 11:00am - 11:35am MST
Salt Palace | Level 2 | 250
  AI + ML

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!

Share Modal

Share this link via

Or copy link