
The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Wednesday November 13, 2024 11:15am - 11:50am MST
With the proliferation of large language models (LLMs), Ray, an open-source distributed framework for scaling AI/ML workloads, has developed many advanced techniques for serving LLMs in a distributed environment. In this session, Andrew Sy Kim and Kai-Hsun Chen will provide an in-depth exploration of advanced model serving techniques using Ray, covering model composition, model multiplexing and fractional GPU scheduling. Additionally, they will discuss ongoing initiatives in Ray focused on GPU-native communication, which, when combined with Kubernetes Dynamic Resource Allocation (DRA), offers a scalable approach to tensor parallelism, a technique used to fit large models across multiple GPUs. Finally, they will present a live demo showing how KubeRay enables the practical application of these techniques to real-world LLM deployments on Kubernetes. The demo will showcase Ray’s ability to scale, compose and orchestrate popular open-source models across a diverse set of hardware accelerators and failure domains.
Speakers

Andrew Sy Kim

Software Engineer, Google
Andrew Sy Kim is a software engineer at Google working on Kubernetes and GKE.

Kai-Hsun Chen

Software Engineer, Anyscale
Kai-Hsun Chen is a software engineer on the Ray Core team at Anyscale and the primary maintainer of KubeRay. He is also an open-source enthusiast, as well as a committer and PMC member of Apache Submarine.
Salt Palace | Level 2 | 255 EF
  AI + ML
