Name: Optimizing LLM Performance in Kubernetes with OpenTelemetry - Ashok Chandrasekar, Google & Liudmila Molkova, Microsoft
Start: 2024-11-13T14:30:00-0700
End: 2024-11-13T15:05:00-0700

In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.

Wednesday November 13, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 2 | 255 E

Large Language Models are increasing in popularity and their deployments on Kubernetes have steadily increased. LLM applications bring new usage patterns that the industry does not have the expertise in. At the same time, there is a lack of observability in these deployments which makes it difficult to debug performance issues. We will present an end to end walkthrough of how you can leverage client and server LLM observability using Open Telemetry based on the recent efforts in the Kubernetes and Open Telemetry communities to standardize these across LLM clients and model servers. We will also demonstrate how to troubleshoot a real-world performance issue in your LLM deployment and how to optimize your LLM server setup for better performance on Kubernetes. We'll show how to use Kubernetes autoscaling based on custom model server metrics and demonstrate how they offer a superior alternative to using GPU utilization metrics for such deployments.

Speakers

Liudmila Molkova

Principal Software Engineer, Microsoft

Liudmila Molkova is a Principal Software Engineer at Microsoft working on observability and Azure client libraries. She is a co-author of distributed tracing implementations across the .NET ecosystem including HTTP client instrumentation and Azure Functions. Liudmila is an active... Read More →

Ashok Chandrasekar

Senior Software Engineer, Google

Ashok Chandrasekar is a Senior Software Engineer at Google working on AI/ML experience for Google Kubernetes Engine. Previously he was a Staff Engineer at VMware where he led the cluster lifecycle management area for Tanzu Mission Control. He has 7 years of Kubernetes experience working... Read More →

optimizing llm performance pdf

Wednesday November 13, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 2 | 255 E

AI + ML

Content Experience Level Intermediate

KubeCon + CloudNativeCon North America 2024

Liudmila Molkova

Ashok Chandrasekar

Sign up or log in to save this to your schedule, view media, leave feedback and see who's attending!