Name: Unlocking Potential of Large Models in Production - Yuan Tang, Red Hat & Adam Tetelman, NVIDIA
Start: 2024-11-14T14:30:00-0700
End: 2024-11-14T15:05:00-0700

In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 2 | 255 E

The recent paradigm shift from traditional ML to GenAI and LLMs has brought with it a new set of non-trivial LLMOps challenges around deployment, scaling, and operations that make building an inference platform to meet all business requirements an unsolved problem. This talk highlights these new challenges along with best-practices and solutions for building out large, scalable, and reliable inference platforms on top of cloud native technologies such as Kubernetes, Kubeflow, Kserve, and Knative. Which tools help effectively benchmark and assess the quality of an LLM? What type of storage and caching solutions enable quick auto-scaling and model downloads? How can you ensure your model is optimized for the specialized accelerators running in your cluster? How can A/B testing or rolling upgrades be accomplished with limited compute? What exactly do you monitor in an LLM? In this session we will use KServe as a case study to answer these questions and more.

Speakers

Yuan Tang

Principal Software Engineer, Red Hat

Yuan is a principal software engineer at Red Hat, working on OpenShift AI. Previously, he has led AI infrastructure and platform teams at various companies. He holds leadership positions in open source projects, including Argo, Kubeflow, and Kubernetes. He's also a maintainer and... Read More →

Adam Tetelman

Principal Product Architect, NVIDIA

Adam Tetelman is a principal architect at NVIDIA leading cloud native initiatives and CNCF engagements across the company; building inference platforms for NVIDIA AI Enterprise and DGX Cloud. He has degrees in computational robotics, computer & systems engineering, and cognitive science... Read More →

KubeCon NA 2024 Yuan and Adam Unlocking Potential of Large Models in Production.pptx 1 pdf

Thursday November 14, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 2 | 255 E

AI + ML

Content Experience Level Intermediate

KubeCon + CloudNativeCon North America 2024

Yuan Tang

Adam Tetelman

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!