In-person
November 12-15

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Salt Palace | Level 2 | 255 EF
Wednesday, November 13
 

11:15am MST

Advanced Model Serving Techniques with Ray on Kubernetes - Andrew Sy Kim, Google & Kai-Hsun Chen, Anyscale
Wednesday November 13, 2024 11:15am - 11:50am MST
With the proliferation of Large Language Models, Ray, an open-source framework for scaling distributed AI/ML workloads, has developed many advanced techniques for serving LLMs in a distributed environment. In this session, Andrew Sy Kim and Kai-Hsun Chen will provide an in-depth exploration of advanced model serving techniques using Ray, covering model composition, model multiplexing, and fractional GPU scheduling. Additionally, they will discuss ongoing initiatives in Ray focused on GPU-native communication, which, when combined with Kubernetes Dynamic Resource Allocation (DRA), offers a scalable approach to tensor parallelism, a technique used to fit large models across multiple GPUs. Finally, they will present a live demo showing how KubeRay enables the practical application of these techniques to real-world LLM deployments on Kubernetes. The demo will showcase Ray’s capabilities to scale, compose, and orchestrate popular open-source models across a diverse set of hardware accelerators and failure domains.
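Of the techniques named above, model multiplexing means many fine-tuned models share one pool of serving replicas, with each model loaded on demand and cold models evicted. Ray Serve exposes this through its multiplexing API; the framework-free Python sketch below illustrates only the core idea (an LRU-bounded cache of loaded models), and the class and loader names are hypothetical stand-ins, not Ray's API.

```python
from collections import OrderedDict

class MultiplexedServer:
    """Minimal sketch of model multiplexing: many models share one replica,
    with an LRU cache bounding how many stay loaded at once."""

    def __init__(self, loader, max_models=3):
        self.loader = loader            # callable: model_id -> loaded model
        self.max_models = max_models
        self.cache = OrderedDict()      # model_id -> model, in LRU order

    def get_model(self, model_id):
        if model_id in self.cache:
            self.cache.move_to_end(model_id)    # mark as most recently used
        else:
            if len(self.cache) >= self.max_models:
                self.cache.popitem(last=False)  # evict least recently used
            self.cache[model_id] = self.loader(model_id)
        return self.cache[model_id]

# Toy usage: each "model" is just a tagging function for its model id.
server = MultiplexedServer(lambda mid: (lambda text: f"[{mid}] {text.upper()}"))
print(server.get_model("llama-a")("hello"))   # prints "[llama-a] HELLO"
```

A real multiplexed deployment would key requests on a model-id header and load weights from object storage; the eviction bookkeeping is the part sketched here.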
Speakers
Andrew Sy Kim

Software Engineer, Google
Andrew Sy Kim is a software engineer at Google working on Kubernetes and GKE.
Kai-Hsun Chen

Software Engineer, Anyscale
Kai-Hsun Chen is a software engineer on the Ray Core team at Anyscale and the primary maintainer of KubeRay. He is also an open-source enthusiast, as well as a committer and PMC member of Apache Submarine.
  AI + ML

12:10pm MST

AI and ML: Let’s Talk About the Boring (yet Critical!) Operational Side - Rob Koch, Slalom Build & Milad Vafaeifard, Epam
Wednesday November 13, 2024 12:10pm - 12:45pm MST
As AI and ML become increasingly prevalent, it’s worth looking harder at the operational side of running these applications. They need large amounts of compute and access to GPUs. They must be reliable, provide rock-solid separation between datasets and training processes, offer great observability in case things go wrong, and remain simple to operate. Let's build our ML applications on top of a service mesh instead of spending resources reimplementing the wheel – or, worse, the flat tire. Join us for a lively, informative, and entertaining look at how a service mesh can solve real-world issues with ML applications while making it simpler and faster to actually get things done in the world of ML. Rob Koch, Principal at Slalom Build, will demonstrate how you can use Linkerd together with multiple clusters to develop, debug, and deploy an ML application in Kubernetes (including IPv6 and GPUs), with special attention to multitenancy and scaling.
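Much of the operational separation described above comes simply from putting workloads on the mesh. With Linkerd, for example, proxy injection is opt-in via its documented `linkerd.io/inject` annotation, after which workloads get mutual TLS and per-route metrics without code changes. The sketch below renders a namespace manifest as a Python dict purely for illustration; the annotation is Linkerd's real one, while the namespace name is hypothetical.

```python
# Hypothetical tenant-namespace manifest, shown as a Python dict.
# The linkerd.io/inject annotation is Linkerd's documented opt-in switch:
# pods created in this namespace get the sidecar proxy, and with it
# mutual TLS between workloads and golden metrics for observability.
ml_namespace = {
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "name": "ml-team-a",  # hypothetical per-tenant namespace
        "annotations": {"linkerd.io/inject": "enabled"},
    },
}
```

One namespace per tenant plus mesh-level authorization policy is a common way to get the dataset/training separation the abstract calls for.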
Speakers
Rob Koch

Principal, Slalom Build
A tech enthusiast who thrives on steering projects from their initial spark to successful fruition, Rob Koch is Principal at Slalom Build, AWS Hero, and Co-chair of the CNCF Deaf and Hard of Hearing Working Group. His expertise in architecting event-driven systems is firmly rooted...
Milad Vafaeifard

Lead Software Engineer, Epam
Milad Vafaeifard, a Lead Software Engineer at EPAM Systems, has 9+ years of web design and development expertise. Deaf but undeterred, he is the creative force behind Sign Language Tech and an active contributor to a YouTube channel focused on tech content for the signing tech community...
  AI + ML
  • Content Experience Level Any

2:30pm MST

Architecting the Future of AI: From Cloud-Native Orchestration to Advanced LLMOps - Ion Stoica, Anyscale
Wednesday November 13, 2024 2:30pm - 3:05pm MST
With the groundbreaking release of ChatGPT, large language models (LLMs) have taken the world by storm: they have enabled new applications, exacerbated the GPU shortage, and raised new questions about the veracity of their answers. This talk delves into an AI stack encompassing cloud-native orchestration, distributed computing, and advanced LLMOps. Key topics include:
  • Kubernetes: the foundational technology that seamlessly manages AI workloads across diverse cloud environments.
  • Ray: the versatile, open-source framework that streamlines the development and scaling of distributed applications.
  • vLLM: the cutting-edge, high-performance, memory-efficient inference and serving engine designed specifically for large language models.
Attendees will gain insights into the architecture and integration of these powerful tools, driving innovation and efficiency in the deployment of AI solutions.
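vLLM's memory efficiency is commonly attributed to PagedAttention, which allocates each request's KV cache in small fixed-size blocks drawn from a shared pool rather than as one large contiguous reservation per sequence. The stdlib sketch below shows only that bookkeeping idea; it is not vLLM code, and all names are illustrative.

```python
class BlockAllocator:
    """Sketch of the block-allocation idea behind PagedAttention: a
    sequence's KV cache grows one fixed-size block at a time, so memory
    is committed as tokens are generated, not reserved up front."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # indices of free blocks
        self.tables = {}                      # seq_id -> list of block indices
        self.lengths = {}                     # seq_id -> tokens generated so far

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:          # current block full (or first token)
            if not self.free:
                raise MemoryError("KV cache exhausted")
            self.tables.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        # Finished sequences return their blocks to the shared pool.
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

# Toy usage: 20 tokens with 16-token blocks need exactly 2 blocks.
alloc = BlockAllocator(num_blocks=8, block_size=16)
for _ in range(20):
    alloc.append_token("req-1")
print(len(alloc.tables["req-1"]))   # prints 2
```

Because blocks are recycled across requests, many more concurrent sequences fit in the same GPU memory, which is the batching headroom vLLM exploits.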
Speakers
Ion Stoica

Co-founder, executive chairman & president, Anyscale
Ion Stoica is a Professor in the EECS Department at the University of California at Berkeley, and the Director of SkyLab. He is currently doing research on cloud computing and AI systems. Past work includes Ray, Apache Spark, Apache Mesos, Tachyon, Chord DHT, and Dynamic Packet State...
  AI + ML
  • Content Experience Level Any

3:25pm MST

A Tale of 2 Drivers: GPU Configuration on the Fly Using DRA - Alay Patel & Varun Ramachandra Sekar, Nvidia
Wednesday November 13, 2024 3:25pm - 4:00pm MST
NVIDIA’s GeForce NOW is a cloud gaming service that allows users to stream video games from NVIDIA's servers to a wide range of devices, including PCs, Macs, Android devices, iOS devices, and smart TVs. Under the hood, it is powered by Kubernetes running KubeVirt VMs. For a seamless user experience, GeForce NOW dynamically switches GPU drivers to accommodate either passing through an entire GPU or slicing it into multiple virtual GPUs, all while keeping utilization close to 100% across the datacenter. This poses significant challenges when using the traditional device plugin API provided by Kubernetes. In this talk, we explore GeForce NOW’s journey to transition away from the traditional device plugin API in favor of Dynamic Resource Allocation (DRA). We'll share valuable insights for anyone looking to perform a similar migration of their own. Join us to learn about the challenges, solutions, and best practices to help optimize your GPU-accelerated workloads in the cloud.
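For context on the migration described above: the device plugin API advertises devices statically per node, while DRA lets a workload describe what it needs in a ResourceClaim that a driver satisfies at scheduling time. The sketch below writes a claim and a consuming pod as Python dicts purely for illustration; DRA was still an alpha/beta API around Kubernetes 1.31, so the exact API group and field names vary by version and driver, and every name here is an assumption.

```python
# Illustrative DRA-style manifests as Python dicts (field names approximate
# the resource.k8s.io alpha API and may differ in your Kubernetes version).
resource_claim = {
    "apiVersion": "resource.k8s.io/v1alpha3",
    "kind": "ResourceClaim",
    "metadata": {"name": "single-gpu"},
    "spec": {
        # Ask the DRA driver for one device of a given class; the class
        # name below is a hypothetical NVIDIA driver device class.
        "devices": {
            "requests": [{"name": "gpu", "deviceClassName": "gpu.nvidia.com"}]
        },
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-workload"},
    "spec": {
        # The pod references the claim and the container consumes it,
        # instead of requesting a static extended resource like nvidia.com/gpu.
        "resourceClaims": [{"name": "gpu", "resourceClaimName": "single-gpu"}],
        "containers": [
            {
                "name": "main",
                "image": "example.com/workload:latest",  # placeholder image
                "resources": {"claims": [{"name": "gpu"}]},
            }
        ],
    },
}
```

Because the claim is resolved per workload at schedule time, a driver can reconfigure a device (whole passthrough vs. virtual slices) on the fly, which is exactly the flexibility the device plugin API lacks.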
Speakers
Alay Patel

Senior Software Engineer, Nvidia
Alay is a Senior Software Engineer at Nvidia where he works on cloud gaming service, exposing infrastructure for GPU workloads. He is passionate about open source with a focus on Kubernetes and platform engineering.
Varun Ramachandra Sekar

Senior Software Engineer, Nvidia
Developer by day, Dog whisperer by night.
  AI + ML

4:30pm MST

Making Kubernetes Simpler for Accelerated Workloads - Susan Wu, Google; Lucy Sweet, Uber; Mitch McKenzie, Weave; Aditya Shanker, Crusoe
Wednesday November 13, 2024 4:30pm - 5:05pm MST
Kubernetes and the open-source ecosystem for AI frameworks have been great for LLM innovation, empowering developers to build applications that use natural language as the interface to data. Yet many developers and cluster operators struggle to put these frameworks into production use. In this session, hear from several platform engineers responsible for designing core infrastructure supporting accelerated workloads, services, and large language model training and inference pipelines. You can expect to come away with guidance, hear about pitfalls to watch out for, and learn how they successfully abstracted away infrastructure complexity to improve their research users' experience and velocity. Panelists include: Lucy Sweet, Senior Software Engineer (Infrastructure), Uber; Mitch McKenzie, Site Reliability Engineer - Machine Learning Operations, Weave; and Susan Wu, Outbound Product Manager, Google.
Speakers
Susan Wu

Outbound Product Manager, Google
Susan is an Outbound Product Manager for Google Cloud, focusing on GKE Networking and Network Security. She previously led product and technical marketing roles at VMware, Sun/Oracle, Canonical, Docker, Citrix and Midokura (part of Sony Group). She is a frequent speaker at conferences...
Lucy Sweet

Senior Software Engineer, Uber
Lucy is a Senior Software Engineer at Uber Denmark who works on software infrastructure.
  AI + ML
 
