In-person
November 12-15

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: Session times are shown in Mountain Standard Time (UTC -7). The schedule is subject to change and session seating is available on a first-come, first-served basis.
Wednesday November 13, 2024 11:15am - 11:50am MST
With the proliferation of large language models (LLMs), Ray, an open-source distributed framework for scaling AI/ML workloads, has developed many advanced techniques for serving LLMs in a distributed environment. In this session, Andrew Sy Kim and Kai-Hsun Chen will explore advanced model serving techniques using Ray in depth, covering model composition, model multiplexing, and fractional GPU scheduling. They will also discuss ongoing initiatives in Ray focused on GPU-native communication, which, when combined with Kubernetes Dynamic Resource Allocation (DRA), offers a scalable approach to tensor parallelism, a technique used to fit large models across multiple GPUs. Finally, they will present a live demo showing how KubeRay enables the practical application of these techniques to real-world LLM deployments on Kubernetes. The demo will showcase Ray’s capabilities to scale, compose, and orchestrate popular open-source models across a diverse set of hardware accelerators and failure domains.
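For readers unfamiliar with two of the techniques named in the abstract, the following is a minimal conceptual sketch in plain Python. It is not Ray's or KubeRay's API or implementation: the class `MultiplexedReplica`, the function `place_fractional`, the model names, and the first-fit placement policy are all hypothetical, chosen only to illustrate the ideas. Model multiplexing is shown as one replica serving several models while keeping a bounded number loaded, and fractional GPU scheduling is shown as packing requests smaller than one GPU onto shared devices.

```python
from collections import OrderedDict


class MultiplexedReplica:
    """Toy replica that keeps at most `max_models` models loaded (LRU eviction)."""

    def __init__(self, max_models):
        self.max_models = max_models
        self.loaded = OrderedDict()  # model_id -> model, least recently used first
        self.load_count = 0

    def _load(self, model_id):
        # Stand-in for an expensive weight load; returns a trivial callable.
        self.load_count += 1
        return lambda prompt: f"[{model_id}] {prompt}"

    def handle(self, model_id, prompt):
        if model_id in self.loaded:
            self.loaded.move_to_end(model_id)  # mark as most recently used
        else:
            if len(self.loaded) >= self.max_models:
                self.loaded.popitem(last=False)  # evict least recently used
            self.loaded[model_id] = self._load(model_id)
        return self.loaded[model_id](prompt)


def place_fractional(requests, num_gpus):
    """First-fit placement of fractional GPU requests; returns {name: gpu_index}."""
    free = [1.0] * num_gpus  # remaining capacity per GPU
    placement = {}
    for name, fraction in requests:
        for gpu, capacity in enumerate(free):
            if fraction <= capacity + 1e-9:  # small tolerance for float error
                free[gpu] = capacity - fraction
                placement[name] = gpu
                break
        else:
            raise RuntimeError(f"no GPU has {fraction} free for {name}")
    return placement


# Assumed example workload: three models share one replica with room for two.
replica = MultiplexedReplica(max_models=2)
replica.handle("model-a", "hi")
replica.handle("model-b", "hi")
replica.handle("model-a", "again")  # already loaded, no reload
replica.handle("model-c", "hi")     # evicts model-b, the least recently used

# Two small models share GPU 0; the large model gets GPU 1 to itself.
placement = place_fractional(
    [("embed", 0.25), ("rerank", 0.25), ("chat", 1.0)], num_gpus=2
)
```

In this sketch the replica pays the load cost only when a requested model is not resident, and the placement function lets small models share a device instead of each reserving a whole GPU. A production scheduler would also need to account for memory, isolation, and failure domains, which this toy ignores.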
Speakers
Andrew Sy Kim

Software Engineer, Google
Andrew Sy Kim is a software engineer at Google working on Kubernetes and GKE.
Kai-Hsun Chen

Software Engineer, Anyscale
Kai-Hsun Chen is a software engineer on the Ray Core team at Anyscale and the primary maintainer of KubeRay. He is also an open-source enthusiast, as well as a committer and PMC member of Apache Submarine.
Salt Palace | Level 2 | 255 B
  AI + ML