Loading…
In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
or to bookmark your favorites and sync them to your phone or calendar.
strong>Salt Palace | Level 2 | 250 AD [clear filter]
Wednesday, November 13
 

11:15am MST

GitOops... I Did It Again! Protecting Your GitOps System from Being Used for Privilege Escalation - Oreen Livni & Elad Pticha, Cycode
Wednesday November 13, 2024 11:15am - 11:50am MST
From data theft to privilege escalation in the Kubernetes cluster, you don't want to be the one telling your boss that your GitOps system has been compromised. This talk covers the security of GitOps tools, highlighting common misconfiguration pitfalls and how to avoid them. We will share the story of CVE-2024-31989, a critical vulnerability we discovered in the popular tool Argo. When installed with the default configuration, this vulnerability allowed privilege escalation from any access point to the cluster (such as a webshell) to complete cluster takeover. We will discuss common insecure configurations like this and provide examples from popular open-source projects to explain how your organization can protect itself from these risks. Attendees will receive a guide and practical tools to protect their GitOps systems against such threats.
Speakers
avatar for Elad Pticha

Elad Pticha

Security Researcher, Cycode
Elad is a passionate security researcher with a focus on software supply chain and web application security. He dedicates his time to writing security research tools and finding vulnerabilities across a broad spectrum, from open-source projects and web applications to IoT devices... Read More →
avatar for Oreen Livni

Oreen Livni

Security Researcher, Cycode
Oreen Livni is a passionate security researcher specializing in application and supply chain security, Domain, and networking. With a focus on software supply chain vulnerabilities. Alongside his professional commitments, he immerses himself in art, gardening, and the world of surfing... Read More →
Wednesday November 13, 2024 11:15am - 11:50am MST
Salt Palace | Level 2 | 250 AD
  Security
  • Content Experience Level Any

12:10pm MST

The Hard Truth About GitOps and Database Rollbacks - Rotem Tamir, Ariga
Wednesday November 13, 2024 12:10pm - 12:45pm MST
For two decades now, the common practice for handling rollbacks of database schema migrations has been pre-planned "down migration scripts". A closer examination of this widely accepted truth reveals critical gaps that result in teams relying on risky, manual operations to roll back schema migrations in times of crisis. In this talk, we show why our existing tools and practices cannot deliver on the GitOps promise of "declarative" and "continuously reconciled" workflows and how we can use the Operator Pattern to build a new solution for robust and safe schema rollbacks.
Speakers
avatar for Rotem Tamir

Rotem Tamir

CTO, Ariga
Rotem Tamir (39), father of two. Co-founder and CTO of Ariga, co-maintainer of Atlas and Ent. Ex-data platform architect at Nexar, infrastructure team lead at ironSource.
Wednesday November 13, 2024 12:10pm - 12:45pm MST
Salt Palace | Level 2 | 250 AD
  SDLC

2:30pm MST

Secure by Design CI/CD: Practical Insights from Adobe and Autodesk - Vikram Sethi, Adobe Inc. & Jesse Sanford, Autodesk
Wednesday November 13, 2024 2:30pm - 3:05pm MST
Worried that your CI/CD pipelines and developer workflows are insecure? Lost in security buzzwords like SBOMs, provenance, attestation, SLSA, OpenSSF, and more? Seeking a clear, actionable reference architecture to secure your pipeline? Whether you are just getting started on your Software Supply Chain Security journey, or are ready to take it to the next level navigating this diverse ecosystem is challenging. Join Vikram and Jesse as they present a reference architecture for secure-by-default CI/CD pipelines and show you effective security controls at every step. See firsthand how these industry giants safeguarded their pipelines while maintaining agility and innovation. This talk will showcase their work, and the work of the CNOE (Cloud Native Operational Excellence) group, which aims to build a paved path through this problem space by producing opinionated software collections or “CNOE stacks” that can be adapted to meet you where your technology is.
Speakers
avatar for Jesse Sanford

Jesse Sanford

Software Architect, Autodesk
Jesse is a lifelong software engineer focused on site reliability and Infosec. Currently architecting the juncture of platform engineering and security/compliance for Autodesk's Developer Enablement team. He regularly contributes to open source and frequently speaks about his work... Read More →
avatar for Vikram Sethi

Vikram Sethi

Principal Scientist, Adobe Inc.
Vikram is a Principal Scientist in the Developer Platforms organization at Adobe. Vikram has been architecting and building the Developer Experience for Adobe's Internal Developer Platform for the last few years. In the last year or so, Vikram has been working on rearchitecting Adobe's... Read More →
Wednesday November 13, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 2 | 250 AD
  SDLC
  • Content Experience Level Any

3:25pm MST

Scale Job Triggering with a Distributed Scheduler - Cassie Coyle & Artur Souza, Diagrid
Wednesday November 13, 2024 3:25pm - 4:00pm MST
Imagine scheduling thousands or millions of jobs that are persisted and triggered timely and resilient to downtime. Some jobs might be triggered every second while others need to reliably be triggered on the first day of the month. Achieving high throughput and reliability is critical for the performance and operational efficiency of modern distributed systems. How can traditional cron job scheduling be extended? How can distributed systems handle job scheduling with minimal downtime? What challenges arise when scaling job scheduling to thousands or millions of jobs? In this session, Artur and Cassie will delve into the design of Dapr’s distributed Scheduler and how users can start using it today. You will gain a comprehensive understanding of how Dapr’s Scheduler unblocks scalability of actors and workflows while also enabling new capabilities, like delayed pubsub and schedule job API.
Speakers
avatar for Artur Souza

Artur Souza

Head of Engineering, Diagrid
I am a maintainer of Dapr since 2019, helped the project reach the 1.0 stable version and keeping frequent releases since then. Currently Head of Engineering at Diagrid, leading the engineering teams building Conductor and the next generation of managed cloud native APIs via Dapr... Read More →
avatar for Cassie Coyle

Cassie Coyle

Software Engineer, Diagrid
Cassie, a devoted software engineer at Diagrid actively contributes to Dapr, focusing on Go backend development to simplify the creation of resilient, event-driven, and microservices-based apps. She is a member of the Dapr Day and AppDeveloperCon 2024 program committees. Her work... Read More →
Wednesday November 13, 2024 3:25pm - 4:00pm MST
Salt Palace | Level 2 | 250 AD
  SDLC

4:30pm MST

Perform Laser Focused Deployments by Deciding in Advance the Blast Radius - Kostis Kapelonis, Octopus deploy
Wednesday November 13, 2024 4:30pm - 5:05pm MST
Progressive Delivery is an advanced deployment method that allows for zero-downtime application releases. Argo Rollouts is a Kubernetes controller that allows you to adopt progressive delivery in the form of blue/green and canary deployments. We see a lot of teams that choose an arbitrary number of clients that access the new version of a canary. Yes, it is very easy to send only 10% of the traffic to the new version of a Kubernetes deployment. But sometimes you want to choose WHICH 10% sees the new traffic. In this talk we will see several approaches on pinning down specific clients to the old or new version and advanced scenarios for sending canary traffic only to a specific subset of users such as internal employees or customers who have expressed their interest on seeing brand new releases as soon as possible.
Speakers
avatar for Kostis Kapelonis

Kostis Kapelonis

Developer Advocate, Codefresh by Octopus Deploy
Kostis is a software engineer/technical-writer dual class character. He lives and breathes automation, good testing practices and stress-free deployments with GitOps.
Wednesday November 13, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 2 | 250 AD
  SDLC

5:25pm MST

Taming Your Application’s Environments - Marcos Lilljedahl, Dagger & Mauricio "Salaboy" Salatino, Diagrid
Wednesday November 13, 2024 5:25pm - 6:00pm MST
How coupled are your applications code and pipelines to its target cloud or on-prem environment? Kubernetes helps us to abstract how we run our workloads. However, there are other aspects, like infrastructure dependencies, service configuration, build process, deployment descriptors, etc., which need to be considered to make an application portable across multiple environments. Focusing on these aspects make a big difference when migrating apps to reduce costs, meeting compliance requirements or leveraging a specific tech only available somewhere else. Join us to cover three techniques you can implement to level up your SDLC: - Modularizing and enhancing our delivery pipelines to simplify complex environments (Crossplane and Dagger) - Building consistent experiences around well-known interfaces (CloudEvents, Dapr, and OpenFeature) to minimize runtime drift. - Design with separation of concerns to enable fast feedback loops between development and operation teams (Argo CD, Knative)
Speakers
avatar for Marcos Lilljedahl

Marcos Lilljedahl

Software Engineer, Dagger
Dad, Docker Captain, OSS lover, helmsman and wine drinker. Father of a joyful kid and wannabe surfer. I like listening to jazz music and tinker with some fun projects when possible. Avid open source contributor.
avatar for Mauricio Salatino

Mauricio Salatino

OSS Software Engineer, Diagrid
Mauricio works as an Open Source Software Engineer at @Diagrid, contributing to and driving initiatives for the Dapr OSS project. Mauricio also serves as a Steering Committee member for the Knative Project and Co-Leading the Knative Functions initiative. He published a book titled... Read More →
Wednesday November 13, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 2 | 250 AD
  SDLC
 
Thursday, November 14
 

11:00am MST

How We Made OpenTelemetry Be Our Fitness Tracker for Your CI/CD Pipelines! - Nicolas Woerner, Clario & Andreas Grabner, Dynatrace
Thursday November 14, 2024 11:00am - 11:35am MST
CI/CD pipelines are the heartbeat of modern cloud-native software delivery. Healthy pipelines ensure rapid and continuous deployments every time code gets committed to the Git repositories! Every new repository and commit puts more load on the CI/CD tool making it more challenging to keep this crucial heartbeat healthy! In this session, engineers from Clario will demonstrate how they leverage OpenTelemetry to observe, validate, report and optimize their CI/CD pipelines, keeping their deployments healthy despite increased scale and unlocking the full potential of modern software delivery on Kubernetes with GitLab.
Speakers
avatar for Andi Grabner

Andi Grabner

CNCF Ambassador and DevRel, Dynatrace
Andreas Grabner (@grabnerandi) has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud scale applications. He is a CNCF ambassador, contributor to the CNCF project keptn and a DevRel for Dynatrace. Andreas is also a regular... Read More →
avatar for Nicolas Woerner

Nicolas Woerner

Associate DevOps Engineer, Clario
Nicolas Wörner works in the Platform Engineering Team at Clario. With a background in software and DevOps engineering he focuses on continuously enhancing the software delivery workflow at Clario. Nicolas is passionate about leveraging CNCF software to drive efficiency and reliability... Read More →
Thursday November 14, 2024 11:00am - 11:35am MST
Salt Palace | Level 2 | 250 AD
  SDLC

11:55am MST

From Chaos to Calm: Building a Unified and Scalable CI/CD Pipeline at Akamai - Tomer Patel, Akamai Technologies Inc.
Thursday November 14, 2024 11:55am - 12:30pm MST
Are you struggling with a chaotic development process? Join Akamai's talk and discover how we built a unified and scalable CI/CD pipeline, saving 40% of our QA, Performance, Dev, and Ops daily work, and how you can do that in your organization! This session dives into the architecture, key features, and its impact on development efficiency. You will learn how to: - Conquer cloud-native deployments by adding the right tools - such as Argo Rollouts, and Backstage - Integrate CI/CD tools (ArgoCD, Jenkins, DevSpace, Grafana, Prometheus, Thanos) for a smoother workflow. - Leverage best-in-breed, cost-efficient open-source solutions
Speakers
avatar for Tomer Patel

Tomer Patel

Senior Engineering Manager, Akamai Technologies Inc.
Tomer currently works as Senior Engineering Manager at Akamai Technologies, where he leads a group of Data engineers, Software developers and DevOps at scale. Previously Tomer worked as Team Lead at Clarizen (Now Planview).
Thursday November 14, 2024 11:55am - 12:30pm MST
Salt Palace | Level 2 | 250 AD
  SDLC

2:30pm MST

Mastering Cell-Based Architecture: Practical Solutions and Best Practices - Shweta Vohra, Booking.com & Asanka Abeysinghe, WSO2
Thursday November 14, 2024 2:30pm - 3:05pm MST
Are you struggling to validate your cell boundaries or facing challenges with greenfield versus brownfield cell-based architectures (CBA)? Do you find it difficult to define enterprise-wide cell boundaries or wish there were best practices to guide you? If these pain points sound familiar, this session is tailored for you. In this talk, we will first guide you through the process of defining an enterprise-wide cell-based architecture for your organization or context. Then we will explore best practices for greenfield, brownfield, and hybrid cell implementations using CBA. By translating common user challenges into actionable implementation references, we aim to elevate your understanding of CBA with real-world use cases and best practices. This session will also cover best practices for the data, security, application, and infrastructure layers, ensuring a comprehensive approach to CBA implementation. Join us to take your knowledge of CBA to the next level!
Speakers
avatar for Shweta Vohra

Shweta Vohra

Lead Architect, Booking.com
Shweta Vohra is an Architect, Author, and Inventor with over 20 years of experience in the software industry. Her expertise spans from complex embedded systems design to hybrid cloud-native solutions, and most recently, the creation of data and machine learning platforms. She is the... Read More →
avatar for Asanka Abeysinghe

Asanka Abeysinghe

CTO, WSO2
Asanka, WSO2's CTO, is a technology visionary with over 20 years of experience designing and implementing scalable distributed systems, microservices, and business integration solutions. He advances WSO2's corporate reference architecture, collaborates with customers and industry... Read More →
Thursday November 14, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 2 | 250 AD
  SDLC

3:25pm MST

You're Overpaying for CI - Kyle Penfound, Dagger
Thursday November 14, 2024 3:25pm - 4:00pm MST
In recent years, the computational power of developer workstations has surged dramatically. With so much compute available at every developer's fingertips, why do we continue to waste time and money with lengthy build times on sluggish CI compute? Some forward-thinking organizations are re-evaluating this approach, questioning the necessity of paying for CI compute when the developers' workstations, which are already more powerful and paid for, remain underutilized. In this technical session we will transition a fully functioning production CI system from cloud-based compute to local workstation compute. We will explore the intricacies of replicating the functionality of a modern CI system, leveraging the power of developer workstations, all using open source software.
Speakers
avatar for Kyle Penfound

Kyle Penfound

Solutions Engineer, Dagger
Kyle is part of the ecosystem team at dagger.io working on the future of CICD. He has a background in DevOps and just loves giving demos!
Thursday November 14, 2024 3:25pm - 4:00pm MST
Salt Palace | Level 2 | 250 AD
  SDLC
  • Content Experience Level Any

4:30pm MST

Bring the Joy Back to Deployments! - Murriel McCabe, Google Cloud & Elizabeth Ponce, Airbnb
Thursday November 14, 2024 4:30pm - 5:05pm MST
Destination: deployment! Your feature is complete. Your application is ready. You want to share your hard work with the world. How do you pick the optimal deployment process? Where do you even start? In this talk, Murriel and Elizabeth will be your guides on a brief tour of several open source tools for deploying a workload into Kubernetes. Our journey will begin with manual hello world deployments and from there we will explore some of the most common modern tools for CI/CD, including a demo speedrun! Major destinations on this tour will include helm, kustomize, skaffold, ArgoCD, Tekton, Jenkins and JenkinsX. We will walk through the fundamentals of CI/CD, explore tradeoffs and discuss the process for implementing these tools in your software development lifecycle. By the end of this talk, you'll be equipped to begin navigating the CI/CD landscape and will leave with resources that will enable you to get started quickly and begin testing in your own environment.
Speakers
avatar for Murriel McCabe

Murriel McCabe

Customer Engineer, Google Cloud
Murriel is a Customer Engineer with Google Cloud, and works with enterprise customers to solve technical and business challenges and build applications on the cloud. She is currently enthusiastic about DevOps and Platform Engineering, Kubernetes, and the Developer Experience. She... Read More →
avatar for Elizabeth Ponce

Elizabeth Ponce

Software Engineer, Airbnb
Elizabeth is a Software Engineer in Search Infrastructure at Airbnb and has a non traditional pathway from Customer Support Specialist to Software Engineering at Airbnb. As a Global Co-Chair for GemTech, Airbnb's Genders Marginalized in Tech employee resource group, Elizabeth actively... Read More →
Thursday November 14, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 2 | 250 AD
  SDLC

5:25pm MST

Navigating Failures in Pods with Devices: Challenges and Solutions - Sergey Kanzhelev, Google & Mrunal Patel, Red Hat
Thursday November 14, 2024 5:25pm - 6:00pm MST
Pods are no longer running with just CPU and Memory. We provision GPUs, network cards, request special placement of those devices and allocated memory. And the more efficient or effective you want your set up to be, the more complicated those device requirements are, the more chances you will hit an edge case Kubernetes has not accounted for yet. Come to the talk to learn from Node Maintainers about some of those shortcomings in Kubernetes. If you are only starting with AI/ML and devices, you will be interested to learn what to expect. If you have lots of experience, you may still learn new things. With the increased focus on AI/ML workloads, highlighting those scenarios is important. As Kubernetes plans to fix those problems, you can give feedback on what would work best for you.
Speakers
avatar for Sergey Kanzhelev

Sergey Kanzhelev

Staff Software Engineer, Google
Sergey Kanzhelev is a seasoned open source and cloud native maintainer working actively on Kubernetes. Sergey is serving as co-chair of SIG node. He is also one of the founders of OpenTelemetry. He is working on engineering aspect of software and its practical application. He is contributing... Read More →
avatar for Mrunal Patel

Mrunal Patel

Distinguished Engineer, Red Hat
Mrunal Patel is a Senior Principal Software Engineer at Red Hat working on containers for Openshift. He is a maintainer of runc/libcontainer and the OCI runtime specification. He started the CRI-O runtime. He is a SIG-Node chair and tech lead.
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 2 | 250 AD
  AI + ML
  • Content Experience Level Any
 
Friday, November 15
 

11:00am MST

Better Together! GPU, TPU and NIC Topological Alignment with DRA - John Belamaric, Google & Patrick Ohly, Intel
Friday November 15, 2024 11:00am - 11:35am MST
AI/ML workloads on Kubernetes demand ultra-high performance. If your training or multi-GPU inference job spans nodes, your GPUs will use the network, talking through a NIC over local PCIe. But not all NICs are equal! To get the best performance, you need a NIC which is as "close" to the GPU as possible. Unfortunately, the Kubernetes extended resources API does not have enough information and does not give you control over which specific devices are assigned. Dynamic Resource Allocation, the successor API, gives you this power. Come to this session to learn about DRA, how it is improving overall device support in K8s, and how to use it to allocate multiple GPUs, NICs, and TPUs to get the maximum performance out of your infrastructure.
Speakers
avatar for Patrick Ohly

Patrick Ohly

Principal Engineer, Intel
Patrick Ohly is a software engineer at Intel GmbH, Germany. In the past he has worked on performance analysis software for HPC clusters ("Intel Trace Analyzer and Collector") and cluster technology in general (PTP and hardware time stamping). Since January 2009 he has worked for Intel... Read More →
avatar for John Belamaric

John Belamaric

Senior Staff Software Engineer, Google
John is a Sr Staff SWE, co-chair of K8s SIG Architecture and of K8s WG Device Management, helping lead efforts to improve how GPUs, TPUs, NICs and other devices are selected, shared, and configured in Kubernetes. He is also co-founder of Nephio, an LF project for K8s-based automation... Read More →
Friday November 15, 2024 11:00am - 11:35am MST
Salt Palace | Level 2 | 250 AD
  AI + ML

11:55am MST

Building Massive-Scale Generative AI Services with Kubernetes and Open Source - John McBride, OpenSauced
Friday November 15, 2024 11:55am - 12:30pm MST
At OpenSauced, we power over 40,000 generative AI inferences every day, all through our in-house platform ontop of Kubernetes. The cost of doing this kind of at-scale AI inference with a third party provider API would be astronomic. Thankfully, using Kubernetes, the public cloud, and open-source technologies, we've been able to scale with relatively low costs and a lean stack. In this talk, John will walk through the journey of building a production grade generative AI system using open source technologies, open large language models, and Kubernetes. We'll also explore why we chose to build ontop of Kubernetes for our AI workloads over using a third party provider, and how we're running and managing our AI/ML clusters today. Additionally, we'll dive into the techniques we used to groom our Retrieval-Augmented-Generation pipelines for efficiency ontop of Kubernetes and other practical tips for deploying your own AI services at-scale.
Speakers
avatar for John McBride

John McBride

Sr. Software Engineer, OpenSauced
John is a Sr. Software Engineer at OpenSauced where he also serves as Head of Infrastructure and AI engineer. He is the maintainer of spf13/cobra, the Go CLI bootstrapping library used throughout the CNCF landscape. In the past, he has worked on open source Kuberenetes platforms... Read More →
Friday November 15, 2024 11:55am - 12:30pm MST
Salt Palace | Level 2 | 250 AD
  AI + ML
  • Content Experience Level Any

2:00pm MST

Bloomberg’s Journey to Improve Resource Utilization in a Multi-Cluster Platform - Yao Weng & Leon Zhou, Bloomberg
Friday November 15, 2024 2:00pm - 2:35pm MST
Bloomberg provides an on-premises Data Science Platform (DSP) using cloud-native software to support internal AI model training. It runs on Kubernetes clusters spanning multiple data centers and featuring a diverse range of GPU types. However, managing such a large-scale and heterogeneous GPU environment poses many challenges, such as improving resource utilization, reducing operational costs, and scheduling workloads across different GPU types. In collaboration with the Karmada community, Bloomberg's DSP team has aimed to tackle these challenges by addressing multi-cluster batch job management problems. This talk will delve into the approaches the team has adopted, including: - Intelligently scheduling GPU workloads across multiple clusters - Using Karmada's resource interpreter to support Kubernetes Custom Resource Definitions (CRDs) on top of a multi-cluster architecture - Building a highly available Karmada control plane - Establishing a consistent training job submission interface
Speakers
avatar for Leon Zhou

Leon Zhou

Software Engineer, Bloomberg
Leon Zhou is a software engineer on the Data Science Platform engineering team at Bloomberg. With prior NLP experience, he is now building ML platforms to facilitate machine learning development. He is interested in ML infrastructure to enable large-scale training and complex pipelines... Read More →
avatar for Yao Weng

Yao Weng

Senior Software Engineer, Bloomberg
Yao Weng is a Senior Software Engineer on Bloomberg’s Data Science Platform engineering team. She has contributed extensively to optimizing the company’s Kubernetes environment for high performance compute, model inference, and workflow orchestration. Yao Weng obtained her Ph.D... Read More →
Friday November 15, 2024 2:00pm - 2:35pm MST
Salt Palace | Level 2 | 250 AD
  AI + ML

2:55pm MST

Cloud-Native AI: Wasm in Portable, Secure AI/ML Workloads - Miley Fu, Second State
Friday November 15, 2024 2:55pm - 3:30pm MST
In this talk, we present Wasm as a pioneering solution for running AI/ML workloads in cloud-native environments. Our focus is on demonstrating how Wasm (on the server) facilitates the execution of AI models, such as Llama3, Grok by X, Mixtral etc, across diverse cloud and edge platforms without sacrificing performance. We will discuss the advantages of using Rust and WebAssembly in AI/ML workloads, highlighting aspects like portability, speed, and security. Real-world examples will illustrate the deployment of AI inference models using Wasm runtime in Kubernetes environments, showcasing seamless orchestration and execution across varied devices. This session is aimed at cloud-native practitioners and AI/ML enthusiasts eager to explore innovative approaches in AI deployment.
Speakers
avatar for Miley Fu

Miley Fu

DevRel, WasmEdge
Miley is a Developer Advocate with a passion for empowering developers to build and contribute to open source. With over 5 years of experience working on WasmEdge runtime in CNCF sandbox as the founding member, she talked at KubeCon, KCD Shenzhen, CloudDay Italy, DevRelCon, Open Source... Read More →
Friday November 15, 2024 2:55pm - 3:30pm MST
Salt Palace | Level 2 | 250 AD
  AI + ML

4:00pm MST

Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines on K8s - Meenakshi Kaushik & Shiva Krishna Merla, NVIDIA
Friday November 15, 2024 4:00pm - 4:35pm MST
In this session, we'll cover best practices for deploying, scaling, and managing LLM inference pipelines on Kubernetes (K8s). We'll explore common patterns like inference, retrieval-augmented generation (RAG), and fine-tuning. Key challenges addressed include: [1]. Minimizing initial inference latency with model caching [2] Optimizing GPU usage with efficient scheduling, multi-GPU/node handling, and auto-quantization [3] Enhancing security and management with RBAC, monitoring, auto-scaling, and support for air-gapped clusters We'll also demonstrate building customizable pipelines for inference, RAG, and fine-tuning, and managing them post-deployment. Solutions include [1] a lightweight standalone tool built using operator pattern and [2] KServe, a robust open-source AI inference platform. This session will equip you to effectively manage LLM inference pipelines on K8s, improving performance, efficiency, and security
Speakers
avatar for Meenakshi Kaushik

Meenakshi Kaushik

Product Management, Nvidia
Meenakshi Kaushik leads product management for NIM Operator and KServe.. Meenakshi is interested in the AI and ML space and is excited to see how the technology can enhance human well-being and productivity.
avatar for Shiva Krishna Merla

Shiva Krishna Merla

Senior Software Engineer, NVIDIA
Shiva Krishna Merla is a senior software engineer on the NVIDIA Cloud Native team where he works on GPU cloud infrastructure, orchestration and monitoring. He is focused on enabling GPU-accelerated DL and AI workloads in container orchestration systems such as Kubernetes and OpenShift... Read More →
Friday November 15, 2024 4:00pm - 4:35pm MST
Salt Palace | Level 2 | 250 AD
  AI + ML

4:55pm MST

Best of Both Worlds: Integrating Slurm with Kubernetes in a Kubernetes Native Way - Eduardo Arango Gutierrez, NVIDIA & Angel Beltre, Sandia National Laboratories
Friday November 15, 2024 4:55pm - 5:30pm MST
It's not always clear which container orchestration system is best suited for a given use case. Slurm, for example, is often preferred over Kubernetes when running large-scale distributed workloads. As a result, organizations areoften faced a hard choice: do they deploy Slurm or Kubernetes to service the rising demands of their AI/ML workloads. In this talk, we introduce K-Foundry, an open-source custom controller for KCP that translates Kubernetes jobs to Slurm jobs and exposes Slurm nodes and cluster info as Kubernetes Custom Resource Definitions (CRDs). This integration combines Slurm’s robust job scheduling with Kubernetes' dynamic orchestration and API-driven ecosystem, easing the administration of both clusters through a common API. This session will end with a live demo, where attendees will see how this integration bridges the gap between cloud and HPC, facilitating resource management and optimizing performance for large-scale AI and LLM tasks.
Speakers
avatar for Eduardo Arango Gutierez DE

Eduardo Arango Gutierez DE

Senior systems software engineer, NVIDIA
Eduardo is a Senior Systems Software Engineer at NVIDIA, working on the Cloud Native Technologies team. Eduardo has focused on enabling users to build and deploy containers on distributed environments.
avatar for Angel Beltre

Angel Beltre

Senior Member of Technical Staff, Sandia National Laboratories
Angel Beltre serves as a senior member of the technical staff within the Scalable System Software department at Sandia National Laboratories. He is a contributor to the CSSE Computing-as-a-Service (CaaS) initiative, aimed at streamlining the deployment of modeling and simulation tools... Read More →
Friday November 15, 2024 4:55pm - 5:30pm MST
Salt Palace | Level 2 | 250 AD
  AI + ML
 

Share Modal

Share this link via

Or copy link

Filter sessions
Apply filters to sessions.
  • 🚨 Contribfest
  • 🪧 Poster Sessions
  • AI + ML
  • Breaks
  • ⚡ Lightning Talks
  • Cloud Native Experience
  • Cloud Native Novice
  • CNCF-hosted Co-located Events
  • Connectivity
  • Data Processing + Storage
  • Diversity + Equity + Inclusion
  • Emerging + Advanced
  • Experiences
  • Keynote Sessions
  • Maintainer Track
  • Observability
  • Operations + Performance
  • Platform Engineering
  • Project Opportunities
  • Registration
  • SDLC
  • Security
  • Solutions Showcase
  • Sponsor-hosted Co-located Event
  • Tutorials