KubeCon + CloudNativeCon North America 2024
In-person
November 12-15

Wednesday November 13, 2024 9:30am - 9:45am MST
Training large-scale foundation models on Kubernetes brings a new set of challenges compared to traditional workloads. With tens of thousands of interconnected GPUs, even small hardware failures can lead to significant performance bottlenecks. This talk will dive into real-world lessons learned while building Kubernetes clusters at scale, including tackling hardware failures, optimizing GPU scheduling, and improving observability. We'll also explore how CNCF projects and Kubernetes provide the best platform for managing the complex infrastructure required for generative AI, making it easier to monitor and maintain AI workloads with the right observability tools. Attendees will walk away with actionable insights into how to navigate these challenges and build robust, scalable systems for training foundation models.

Speakers

Peter Salanki

Chief Technology Officer, CoreWeave
Peter Salanki is the Chief Technology Officer of CoreWeave, where he spearheads the development of innovative cloud solutions tailored for high-performance workloads. Originally from Sweden, Peter was recruited at just 18 to become Director of Engineering at Bahnhof AB before moving...

Chen Goldberg

Senior Vice President of Engineering, CoreWeave
Chen Goldberg has more than 25 years of expertise leading global engineering teams, product R&D initiatives, and high-profile customer engagements with Fortune 500 enterprises. She is Senior Vice President of Engineering at CoreWeave, joining the executive team to lead and grow engineering...
Salt Palace | Level 1 | Hall DE
