KubeCon + CloudNativeCon North America 2024
In-person: November 12-15

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in Mountain Standard Time (UTC-7). The schedule is subject to change, and session seating is available on a first-come, first-served basis.
Salt Palace | Level 1 | Hall DE
Wednesday, November 13
 

9:00am MST

Keynotes To Be Announced
Wednesday November 13, 2024 9:00am - 10:45am MST
Salt Palace | Level 1 | Hall DE

12:10pm MST

Operationalizing High-Performance GPU Clusters in Kubernetes: A Case Study of Databricks' DBRX - Will Gleich & Wai Wu, Databricks
Wednesday November 13, 2024 12:10pm - 12:45pm MST
Training large language models (LLMs) on GPUs within Kubernetes environments involves significant configuration and complexity, often leading to unique failure scenarios. This presentation will cover the lessons learned from training DBRX, a state-of-the-art LLM we developed on a 400-node cluster whose primary workload utilized 3072 GPUs, along with the tooling needed to measure and maintain a healthy fleet of nodes and the underlying interconnect fabric. This will include:
  • How we implemented GPU health detection leveraging Prometheus and the DCGM Exporter (see the sketch below)
  • How we monitor GPUDirect RDMA (GDRDMA) and the challenges of monitoring components that bypass the CPU
  • Failure scenarios encountered during training, and how they were addressed
Databricks Mosaic AI Training leverages GPU clusters across many cloud providers to maximize availability; we will also discuss the variations we see between providers and how we engineered around them.
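The abstract doesn't publish the health-check implementation, but the building blocks it names (Prometheus plus NVIDIA's DCGM Exporter) suggest the general shape. Below is a minimal, illustrative sketch that queries a Prometheus server for GPUs reporting XID errors, a common hardware-fault signal; the Prometheus address, the 15-minute window, and the label names are assumptions, not Databricks' tooling.

```python
# Minimal sketch: flag GPUs reporting XID errors via Prometheus + DCGM Exporter.
# DCGM_FI_DEV_XID_ERRORS is a standard DCGM field (last observed XID error);
# the Prometheus URL and label names (Hostname, gpu) are assumptions that
# depend on how dcgm-exporter is deployed.
import requests

PROM_URL = "http://prometheus.monitoring:9090"  # assumed in-cluster address

def unhealthy_gpus(window: str = "15m") -> list[dict]:
    """Return node/GPU pairs that reported any XID error inside the window."""
    query = f"max_over_time(DCGM_FI_DEV_XID_ERRORS[{window}]) > 0"
    resp = requests.get(f"{PROM_URL}/api/v1/query",
                        params={"query": query}, timeout=10)
    resp.raise_for_status()
    return [
        {"node": r["metric"].get("Hostname", "unknown"),
         "gpu": r["metric"].get("gpu", "unknown"),
         "xid": r["value"][1]}
        for r in resp.json()["data"]["result"]
    ]

if __name__ == "__main__":
    for g in unhealthy_gpus():
        print(f"cordon candidate: node={g['node']} gpu={g['gpu']} xid={g['xid']}")
```

A health controller built on a check like this would typically cordon or taint the affected node so the scheduler stops placing training pods on it.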
Speakers
Wai Wu
Databricks

Will Gleich
Sr. DevOps Engineer, Databricks
Will Gleich is a Sr. DevOps engineer at Databricks specializing in MLOps and Site Reliability Engineering.
Salt Palace | Level 1 | Hall DE
  AI + ML

2:30pm MST

Optimizing LLM Performance in Kubernetes with OpenTelemetry - Ashok Chandrasekar, Google & Liudmila Molkova, Microsoft
Wednesday November 13, 2024 2:30pm - 3:05pm MST
Large language models are growing in popularity, and their deployments on Kubernetes have steadily increased. LLM applications bring new usage patterns that the industry has little operational experience with. At the same time, a lack of observability in these deployments makes it difficult to debug performance issues. We will present an end-to-end walkthrough of how you can leverage client- and server-side LLM observability using OpenTelemetry, based on recent efforts in the Kubernetes and OpenTelemetry communities to standardize these signals across LLM clients and model servers. We will also demonstrate how to troubleshoot a real-world performance issue in an LLM deployment and how to optimize your LLM server setup for better performance on Kubernetes. We'll show how to use Kubernetes autoscaling based on custom model-server metrics and demonstrate how they offer a superior alternative to GPU utilization metrics for such deployments.
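The abstract doesn't include the instrumentation itself; the sketch below shows roughly what client-side LLM tracing looks like with the OpenTelemetry Python SDK, using attribute names from the draft GenAI semantic conventions (still evolving, so treat them as illustrative). The model name and the call_model_server helper are hypothetical placeholders.

```python
# Sketch: tracing an LLM client call with the OpenTelemetry Python SDK.
# The gen_ai.* attribute names follow the draft GenAI semantic conventions
# and may change; call_model_server is a hypothetical stand-in.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("llm-client")

def call_model_server(prompt: str) -> dict:
    # Stand-in for a real model-server request; returns fake token counts.
    return {"text": f"echo: {prompt}",
            "input_tokens": len(prompt.split()), "output_tokens": 3}

def generate(prompt: str) -> str:
    with tracer.start_as_current_span("chat my-llm") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.model", "my-llm")  # placeholder
        completion = call_model_server(prompt)
        span.set_attribute("gen_ai.usage.input_tokens", completion["input_tokens"])
        span.set_attribute("gen_ai.usage.output_tokens", completion["output_tokens"])
        return completion["text"]

if __name__ == "__main__":
    print(generate("hello"))
```

Token-usage attributes recorded this way are the kind of custom model-server signal the talk proposes feeding into Kubernetes autoscaling instead of raw GPU utilization.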
Speakers
Liudmila Molkova
Principal Software Engineer, Microsoft
Liudmila Molkova is a Principal Software Engineer at Microsoft working on observability and Azure client libraries. She is a co-author of distributed tracing implementations across the .NET ecosystem including HTTP client instrumentation and Azure Functions. Liudmila is an active...

Ashok Chandrasekar
Senior Software Engineer, Google
Ashok Chandrasekar is a Senior Software Engineer at Google working on AI/ML experience for Google Kubernetes Engine. Previously he was a Staff Engineer at VMware where he led the cluster lifecycle management area for Tanzu Mission Control. He has 7 years of Kubernetes experience working...
Salt Palace | Level 1 | Hall DE
  AI + ML

3:25pm MST

Optimizing Load Balancing and Autoscaling for Large Language Model (LLM) Inference on Kubernetes - David Gray, Red Hat
Wednesday November 13, 2024 3:25pm - 4:00pm MST
As generative AI language models improve, they are increasingly being integrated into business-critical applications. However, large language model (LLM) inference is a compute-intensive workload that often requires expensive GPU hardware. Making efficient use of these hardware resources in the public or private cloud is critical for managing costs and power usage. This talk introduces the KServe platform for deploying LLMs on Kubernetes and provides an overview of LLM inference performance concepts. Attendees will learn techniques to improve load balancing and autoscaling for LLM inference, such as leveraging KServe, Knative, and GPU operator features. Sharing test results, we will analyze the impact of these optimizations on key performance metrics, such as latency per token and tokens per second. This talk equips participants with strategies to maximize the efficiency of LLM inference deployments on Kubernetes, ultimately reducing costs and improving resource utilization.
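To make the talk's key metrics concrete, here is a small, illustrative benchmark that derives time-to-first-token, latency per token, and tokens per second from a streaming completion endpoint. The endpoint URL and its line-delimited token framing are assumptions; real model servers differ (e.g., SSE framing), so adapt the parsing accordingly.

```python
# Sketch: measuring LLM inference performance metrics against a streaming
# endpoint. The URL and one-token-per-line framing are illustrative
# assumptions, not a specific model server's API.
import time
import requests

ENDPOINT = "http://llm.example.internal/v1/generate"  # hypothetical

def benchmark(prompt: str) -> dict:
    start = time.perf_counter()
    first_token_at = None
    tokens = 0
    with requests.post(ENDPOINT, json={"prompt": prompt},
                       stream=True, timeout=120) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            tokens += 1
            if first_token_at is None:
                first_token_at = time.perf_counter()
    total = time.perf_counter() - start
    return {
        "time_to_first_token_s": (first_token_at or start) - start,
        "latency_per_token_s": total / max(tokens, 1),
        "tokens_per_second": tokens / total if total > 0 else 0.0,
    }
```

Exported from the model server, metrics like these are what KServe- and Knative-based autoscaling can key on instead of GPU utilization.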
Speakers
David Gray
Senior Software Engineer, Red Hat
David Gray is a Senior Software Engineer on the Performance and Scale team at Red Hat. His role involves analyzing and improving AI inference workloads on Kubernetes platforms. David is actively engaged in performance experimentation and analysis of running large language models in...
Salt Palace | Level 1 | Hall DE
  AI + ML
  • Content Experience Level: Any

4:30pm MST

Platform Performance Optimization for AI - a Resource Management Perspective - Antti Kervinen, Intel & Dixita Narang, Google
Wednesday November 13, 2024 4:30pm - 5:05pm MST
How much can node resource management affect AI workload performance? What options are there? What is the trade-off between total throughput and low latency? In this talk we take a systematic approach to platform performance optimization, walking the whole path from goal setting through data gathering, analysis, and visualization to conclusions. At each stop along the path we share practical experiences from an LLM inference optimization case study. You will leave with many considerations, findings, and practical tricks: for instance, how to instrument PyTorch without touching the source or the container image (sketched below), how to change what is being measured without expensive benchmark reruns, and how much more visualizations can teach us compared to numeric averages and percentiles. Finally, we share real results from our case study: resource management increased total token throughput per worker node by more than 3.5x over the baseline.
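The abstract teases instrumenting PyTorch "without touching the source or a container image" but doesn't say how; one common way to get that effect is to inject a wrapper at interpreter startup, e.g., a sitecustomize.py placed on PYTHONPATH via a mounted volume. The sketch below monkey-patches torch.nn.Module.__call__ to time top-level forward passes; the injection mechanism and logging are assumptions, not the speakers' method.

```python
# sitecustomize.py - illustrative sketch: time PyTorch forward passes without
# editing model code. Python imports sitecustomize automatically when it is
# found on PYTHONPATH (e.g., mounted into the pod from a ConfigMap).
# Note: this records wall-clock time; CUDA kernels run asynchronously, so a
# serious profiler would synchronize or use CUDA events instead.
import time

try:
    import torch

    _orig_call = torch.nn.Module.__call__
    _depth = 0

    def _timed_call(self, *args, **kwargs):
        global _depth
        _depth += 1
        start = time.perf_counter()
        try:
            return _orig_call(self, *args, **kwargs)
        finally:
            _depth -= 1
            if _depth == 0:  # report only the outermost module, not every layer
                elapsed_ms = (time.perf_counter() - start) * 1000.0
                print(f"[perf] {type(self).__name__}.forward: {elapsed_ms:.2f} ms")

    torch.nn.Module.__call__ = _timed_call
except ImportError:
    pass  # torch absent in this process; stay silent
```

Because the hook lives outside both the image and the model source, what gets measured can be changed by swapping the mounted file, without rebuilding images or rerunning a full benchmark matrix.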
Speakers
Antti Kervinen
Cloud Orchestration Software Engineer, Intel
Antti Kervinen is a Cloud Orchestration Software Engineer working at Intel, whose interest in Linux and distributed systems has led him from academic research of concurrency to the world of Kubernetes. When unplugged, Antti spends his time outdoors discovering wonders of nature.

Dixita Narang
Software Engineer, Google
Dixita Narang is a Software Engineer at Google on the Kubernetes Node team. With a primary focus on resource management within Kubernetes, Dixita is deeply involved in the development and advancement of the Memory QoS feature, which is currently in the alpha stage. She is a new contributor...
Salt Palace | Level 1 | Hall DE
  AI + ML

5:25pm MST

Production AI at Scale: Cloudera’s Journey in Building a Robust Inference Platform - Zoram Thanga & Peter Ableda, Cloudera
Wednesday November 13, 2024 5:25pm - 6:00pm MST
In this session, we present the Cloudera AI Inference Service, a secure, large-scale platform for generative AI and predictive inference workloads, built using state-of-the-art Kubernetes, CNCF, and Apache open source projects. We take the audience through our journey in building this platform and share the experiences we gained along the way. The platform was built with openness, security, scalability, performance, and standards compliance as guiding principles. We demonstrate that it is possible to be open and secure at the same time, and that organizations can incorporate production-grade AI inferencing into their Big Data environments. This session will cover the architecture of the platform and explain how we handle performance, scaling, authentication, fine-grained authorization, and audit logging, all of which are critical considerations for production inferencing.
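Cloudera's implementation isn't shown in the abstract; as a generic, hypothetical illustration of the pattern it names (authentication, fine-grained authorization, and audit logging wrapped around inference), consider this toy gateway. Every name here is made up for illustration.

```python
# Toy sketch of an inference gateway: check a per-model policy for the
# authenticated principal and write an audit record for every decision.
# This is a generic pattern illustration, not Cloudera's API.
import json
import time

# Toy fine-grained policy store: (principal, model) -> allowed.
POLICY = {("alice", "llm-prod"): True}

def authorize(principal: str, model_id: str) -> bool:
    return POLICY.get((principal, model_id), False)

def audit(principal: str, model_id: str, allowed: bool) -> None:
    # Append-only audit record; a real platform ships this to durable storage.
    print(json.dumps({"ts": time.time(), "principal": principal,
                      "model": model_id, "action": "infer", "allowed": allowed}))

def handle_inference(principal: str, model_id: str, prompt: str) -> str:
    allowed = authorize(principal, model_id)
    audit(principal, model_id, allowed)
    if not allowed:
        raise PermissionError(f"{principal} may not run inference on {model_id}")
    return f"<{model_id} output for {prompt!r}>"  # stand-in for a real model call

if __name__ == "__main__":
    print(handle_inference("alice", "llm-prod", "hello"))
```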
Speakers
Peter Ableda
Director, Product Management, Cloudera
Peter Ableda is the Director of Product Management for Cloudera’s AI product suite, bringing over a decade of experience in data management and advanced analytics. Holding a Master of Science degree in Computer Science from the Budapest University of Technology, Peter has dedicated...

Zoram Thanga
Principal Engineer, Cloudera
Zoram is a Principal Engineer, Enterprise AI Platform in Cloudera. He has been working in the software industry for over 23 years, and has been involved in building clustering software, containers, file systems, analytical query engines, and ML/AI platforms. He is a committer in the...
Salt Palace | Level 1 | Hall DE
  AI + ML
 

Filter sessions by track:
  • 🚨 Contribfest
  • 🪧 Poster Sessions
  • AI + ML
  • Breaks
  • ⚡ Lightning Talks
  • Cloud Native Experience
  • Cloud Native Novice
  • CNCF-hosted Co-located Events
  • Connectivity
  • Data Processing + Storage
  • Emerging + Advanced
  • Experiences
  • Keynote Sessions
  • Maintainer Track
  • Observability
  • Operations + Performance
  • Platform Engineering
  • Project Opportunities
  • Registration
  • SDLC
  • Security
  • Solutions Showcase
  • Sponsor-hosted Co-located Event
  • Tutorials