KubeCon + CloudNativeCon North America 2024: Full Schedule

In-person
November 12-15

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in Mountain Standard Time (UTC -7). The schedule is subject to change and session seating is available on a first-come, first-served basis.

11:15am MST

GitOops... I Did It Again! Protecting Your GitOps System from Being Used for Privilege Escalation - Oreen Livni & Elad Pticha, Cycode

Wednesday November 13, 2024 11:15am - 11:50am MST

Salt Palace | Level 2 | 250 AD

From data theft to privilege escalation in the Kubernetes cluster, you don't want to be the one telling your boss that your GitOps system has been compromised. This talk covers the security of GitOps tools, highlighting common misconfiguration pitfalls and how to avoid them. We will share the story of CVE-2024-31989, a critical vulnerability we discovered in the popular tool Argo CD. When installed with the default configuration, this vulnerability allowed privilege escalation from any access point to the cluster (such as a webshell) to complete cluster takeover. We will discuss common insecure configurations like this and provide examples from popular open-source projects to explain how your organization can protect itself from these risks. Attendees will receive a guide and practical tools to protect their GitOps systems against such threats.

Speakers

Elad Pticha

Security Researcher, Cycode

Elad is a passionate security researcher with a focus on software supply chain and web application security. He dedicates his time to writing security research tools and finding vulnerabilities across a broad spectrum, from open-source projects and web applications to IoT devices...

Oreen Livni

Security Researcher, Cycode

Oreen Livni is a passionate security researcher specializing in application and supply chain security, Domain, and networking, with a focus on software supply chain vulnerabilities. Alongside his professional commitments, he immerses himself in art, gardening, and the world of surfing...

Security

Content Experience Level Any

12:10pm MST

The Hard Truth About GitOps and Database Rollbacks - Rotem Tamir, Ariga

Wednesday November 13, 2024 12:10pm - 12:45pm MST

Salt Palace | Level 2 | 250 AD

For two decades now, the common practice for handling rollbacks of database schema migrations has been pre-planned "down migration scripts". A closer examination of this widely accepted truth reveals critical gaps that result in teams relying on risky, manual operations to roll back schema migrations in times of crisis. In this talk, we show why our existing tools and practices cannot deliver on the GitOps promise of "declarative" and "continuously reconciled" workflows and how we can use the Operator Pattern to build a new solution for robust and safe schema rollbacks.

Speakers

Rotem Tamir

CTO, Ariga

Rotem Tamir (39), father of two. Co-founder and CTO of Ariga, co-maintainer of Atlas and Ent. Ex-data platform architect at Nexar, infrastructure team lead at ironSource.

SDLC

Content Experience Level Intermediate

2:30pm MST

Secure by Design CI/CD: Practical Insights from Adobe and Autodesk - Vikram Sethi, Adobe Inc. & Jesse Sanford, Autodesk

Wednesday November 13, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 2 | 250 AD

Worried that your CI/CD pipelines and developer workflows are insecure? Lost in security buzzwords like SBOMs, provenance, attestation, SLSA, OpenSSF, and more? Seeking a clear, actionable reference architecture to secure your pipeline? Whether you are just getting started on your Software Supply Chain Security journey or are ready to take it to the next level, navigating this diverse ecosystem is challenging. Join Vikram and Jesse as they present a reference architecture for secure-by-default CI/CD pipelines and show you effective security controls at every step. See firsthand how these industry giants safeguarded their pipelines while maintaining agility and innovation. This talk will showcase their work, and the work of the CNOE (Cloud Native Operational Excellence) group, which aims to build a paved path through this problem space by producing opinionated software collections or “CNOE stacks” that can be adapted to meet you where your technology is.

Speakers

Jesse Sanford

Software Architect, Autodesk

Jesse is a lifelong software engineer focused on site reliability and Infosec. He is currently architecting the juncture of platform engineering and security/compliance for Autodesk's Developer Enablement team. He regularly contributes to open source and frequently speaks about his work...

Vikram Sethi

Principal Scientist, Adobe Inc.

Vikram is a Principal Scientist in the Developer Platforms organization at Adobe. Vikram has been architecting and building the Developer Experience for Adobe's Internal Developer Platform for the last few years. In the last year or so, Vikram has been working on rearchitecting Adobe's...

SDLC

Content Experience Level Any

3:25pm MST

Scale Job Triggering with a Distributed Scheduler - Cassie Coyle & Artur Souza, Diagrid

Wednesday November 13, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 2 | 250 AD

Imagine scheduling thousands or millions of jobs that are persisted, triggered on time, and resilient to downtime. Some jobs might be triggered every second while others need to be reliably triggered on the first day of the month. Achieving high throughput and reliability is critical for the performance and operational efficiency of modern distributed systems. How can traditional cron job scheduling be extended? How can distributed systems handle job scheduling with minimal downtime? What challenges arise when scaling job scheduling to thousands or millions of jobs? In this session, Artur and Cassie will delve into the design of Dapr’s distributed Scheduler and how users can start using it today. You will gain a comprehensive understanding of how Dapr’s Scheduler unblocks scalability of actors and workflows while also enabling new capabilities, like delayed pub/sub and a schedule job API.

Speakers

Artur Souza

Head of Engineering, Diagrid

I have been a maintainer of Dapr since 2019, helping the project reach the 1.0 stable version and keeping up frequent releases since then. Currently Head of Engineering at Diagrid, leading the engineering teams building Conductor and the next generation of managed cloud native APIs via Dapr...

Cassie Coyle

Software Engineer, Diagrid

Cassie, a devoted software engineer at Diagrid, actively contributes to Dapr, focusing on Go backend development to simplify the creation of resilient, event-driven, and microservices-based apps. She is a member of the Dapr Day and AppDeveloperCon 2024 program committees. Her work...

SDLC

Content Experience Level Intermediate

4:30pm MST

Perform Laser Focused Deployments by Deciding in Advance the Blast Radius - Kostis Kapelonis, Octopus Deploy

Wednesday November 13, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 2 | 250 AD

Progressive Delivery is an advanced deployment method that allows for zero-downtime application releases. Argo Rollouts is a Kubernetes controller that allows you to adopt progressive delivery in the form of blue/green and canary deployments. We see a lot of teams that choose an arbitrary number of clients that access the new version of a canary. Yes, it is very easy to send only 10% of the traffic to the new version of a Kubernetes deployment. But sometimes you want to choose WHICH 10% sees the new version. In this talk we will see several approaches to pinning specific clients to the old or new version, as well as advanced scenarios for sending canary traffic only to a specific subset of users, such as internal employees or customers who have expressed interest in seeing brand new releases as soon as possible.

Speakers

Kostis Kapelonis

Developer Advocate, Codefresh by Octopus Deploy

Kostis is a software engineer/technical-writer dual class character. He lives and breathes automation, good testing practices and stress-free deployments with GitOps.

SDLC

Content Experience Level Intermediate

5:25pm MST

Taming Your Application’s Environments - Marcos Lilljedahl, Dagger & Mauricio "Salaboy" Salatino, Diagrid

Wednesday November 13, 2024 5:25pm - 6:00pm MST

Salt Palace | Level 2 | 250 AD

How coupled are your application's code and pipelines to their target cloud or on-prem environment? Kubernetes helps us to abstract how we run our workloads. However, there are other aspects, like infrastructure dependencies, service configuration, build process, deployment descriptors, etc., which need to be considered to make an application portable across multiple environments. Focusing on these aspects makes a big difference when migrating apps to reduce costs, meet compliance requirements, or leverage a specific tech only available somewhere else. Join us to cover three techniques you can implement to level up your SDLC:
- Modularizing and enhancing our delivery pipelines to simplify complex environments (Crossplane and Dagger)
- Building consistent experiences around well-known interfaces (CloudEvents, Dapr, and OpenFeature) to minimize runtime drift
- Designing with separation of concerns to enable fast feedback loops between development and operation teams (Argo CD, Knative)

Speakers

Marcos Lilljedahl

Software Engineer, Dagger

Dad, Docker Captain, OSS lover, helmsman and wine drinker. Father of a joyful kid and wannabe surfer. I like listening to jazz music and tinkering with some fun projects when possible. Avid open source contributor.

Mauricio Salatino

OSS Software Engineer, Diagrid

Mauricio works as an Open Source Software Engineer at @Diagrid, contributing to and driving initiatives for the Dapr OSS project. Mauricio also serves as a Steering Committee member for the Knative Project and co-leads the Knative Functions initiative. He published a book titled...

SDLC

Content Experience Level Intermediate

11:00am MST

How We Made OpenTelemetry Be Our Fitness Tracker for Your CI/CD Pipelines! - Nicolas Woerner, Clario & Andreas Grabner, Dynatrace

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 2 | 250 AD

CI/CD pipelines are the heartbeat of modern cloud-native software delivery. Healthy pipelines ensure rapid and continuous deployments every time code gets committed to the Git repositories! Every new repository and commit puts more load on the CI/CD tool making it more challenging to keep this crucial heartbeat healthy! In this session, engineers from Clario will demonstrate how they leverage OpenTelemetry to observe, validate, report and optimize their CI/CD pipelines, keeping their deployments healthy despite increased scale and unlocking the full potential of modern software delivery on Kubernetes with GitLab.

Speakers

Andi Grabner

CNCF Ambassador and DevRel, Dynatrace

Andreas Grabner (@grabnerandi) has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud scale applications. He is a CNCF ambassador, contributor to the CNCF project Keptn and a DevRel for Dynatrace. Andreas is also a regular...

Nicolas Woerner

Associate DevOps Engineer, Clario

Nicolas Wörner works in the Platform Engineering Team at Clario. With a background in software and DevOps engineering he focuses on continuously enhancing the software delivery workflow at Clario. Nicolas is passionate about leveraging CNCF software to drive efficiency and reliability...

SDLC

Content Experience Level Intermediate

11:55am MST

From Chaos to Calm: Building a Unified and Scalable CI/CD Pipeline at Akamai - Tomer Patel, Akamai Technologies Inc.

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 2 | 250 AD

Are you struggling with a chaotic development process? Join Akamai's talk and discover how we built a unified and scalable CI/CD pipeline, saving 40% of our QA, Performance, Dev, and Ops daily work, and how you can do that in your organization! This session dives into the architecture, key features, and its impact on development efficiency. You will learn how to:
- Conquer cloud-native deployments by adding the right tools, such as Argo Rollouts and Backstage
- Integrate CI/CD tools (Argo CD, Jenkins, DevSpace, Grafana, Prometheus, Thanos) for a smoother workflow
- Leverage best-of-breed, cost-efficient open-source solutions

Speakers

Tomer Patel

Senior Engineering Manager, Akamai Technologies Inc.

Tomer currently works as Senior Engineering Manager at Akamai Technologies, where he leads a group of Data engineers, Software developers and DevOps at scale. Previously Tomer worked as Team Lead at Clarizen (Now Planview).

SDLC

Content Experience Level Intermediate

2:30pm MST

Mastering Cell-Based Architecture: Practical Solutions and Best Practices - Shweta Vohra, Booking.com & Asanka Abeysinghe, WSO2

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 2 | 250 AD

Are you struggling to validate your cell boundaries or facing challenges with greenfield versus brownfield cell-based architectures (CBA)? Do you find it difficult to define enterprise-wide cell boundaries or wish there were best practices to guide you? If these pain points sound familiar, this session is tailored for you. In this talk, we will first guide you through the process of defining an enterprise-wide cell-based architecture for your organization or context. Then we will explore best practices for greenfield, brownfield, and hybrid cell implementations using CBA. By translating common user challenges into actionable implementation references, we aim to elevate your understanding of CBA with real-world use cases and best practices. This session will also cover best practices for the data, security, application, and infrastructure layers, ensuring a comprehensive approach to CBA implementation. Join us to take your knowledge of CBA to the next level!

Speakers

Shweta Vohra

Lead Architect, Booking.com

Shweta Vohra is an Architect, Author, and Inventor with over 20 years of experience in the software industry. Her expertise spans from complex embedded systems design to hybrid cloud-native solutions, and most recently, the creation of data and machine learning platforms. She is the...

Asanka Abeysinghe

CTO, WSO2

Asanka, WSO2's CTO, is a technology visionary with over 20 years of experience designing and implementing scalable distributed systems, microservices, and business integration solutions. He advances WSO2's corporate reference architecture, collaborates with customers and industry...

SDLC

Content Experience Level Intermediate

3:25pm MST

You're Overpaying for CI - Kyle Penfound, Dagger

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 2 | 250 AD

In recent years, the computational power of developer workstations has surged dramatically. With so much compute available at every developer's fingertips, why do we continue to waste time and money with lengthy build times on sluggish CI compute? Some forward-thinking organizations are re-evaluating this approach, questioning the necessity of paying for CI compute when the developers' workstations, which are already more powerful and paid for, remain underutilized. In this technical session we will transition a fully functioning production CI system from cloud-based compute to local workstation compute. We will explore the intricacies of replicating the functionality of a modern CI system, leveraging the power of developer workstations, all using open source software.

Speakers

Kyle Penfound

Solutions Engineer, Dagger

Kyle is part of the ecosystem team at dagger.io working on the future of CICD. He has a background in DevOps and just loves giving demos!

SDLC

Content Experience Level Any

4:30pm MST

Bring the Joy Back to Deployments! - Murriel McCabe, Google Cloud & Elizabeth Ponce, Airbnb

Thursday November 14, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 2 | 250 AD

Destination: deployment! Your feature is complete. Your application is ready. You want to share your hard work with the world. How do you pick the optimal deployment process? Where do you even start? In this talk, Murriel and Elizabeth will be your guides on a brief tour of several open source tools for deploying a workload into Kubernetes. Our journey will begin with manual hello world deployments and from there we will explore some of the most common modern tools for CI/CD, including a demo speedrun! Major destinations on this tour will include Helm, Kustomize, Skaffold, Argo CD, Tekton, Jenkins, and Jenkins X. We will walk through the fundamentals of CI/CD, explore tradeoffs and discuss the process for implementing these tools in your software development lifecycle. By the end of this talk, you'll be equipped to begin navigating the CI/CD landscape and will leave with resources that will enable you to get started quickly and begin testing in your own environment.

Speakers

Murriel McCabe

Customer Engineer, Google Cloud

Murriel is a Customer Engineer with Google Cloud, and works with enterprise customers to solve technical and business challenges and build applications on the cloud. She is currently enthusiastic about DevOps and Platform Engineering, Kubernetes, and the Developer Experience. She...

Elizabeth Ponce

Software Engineer, Airbnb

Elizabeth is a Software Engineer in Search Infrastructure at Airbnb and followed a nontraditional pathway from Customer Support Specialist to Software Engineering at Airbnb. As a Global Co-Chair for GemTech, Airbnb's Genders Marginalized in Tech employee resource group, Elizabeth actively...

SDLC

Content Experience Level Beginner

5:25pm MST

Navigating Failures in Pods with Devices: Challenges and Solutions - Sergey Kanzhelev, Google & Mrunal Patel, Red Hat

Thursday November 14, 2024 5:25pm - 6:00pm MST

Salt Palace | Level 2 | 250 AD

Pods no longer run with just CPU and memory. We provision GPUs and network cards, and request special placement of those devices and allocated memory. The more efficient or effective you want your setup to be, the more complicated those device requirements become, and the greater the chance you will hit an edge case Kubernetes has not accounted for yet. Come to the talk to learn from Node maintainers about some of those shortcomings in Kubernetes. If you are only starting with AI/ML and devices, you will be interested to learn what to expect. If you have lots of experience, you may still learn new things. With the increased focus on AI/ML workloads, highlighting those scenarios is important. As Kubernetes plans to fix those problems, you can give feedback on what would work best for you.

Speakers

Sergey Kanzhelev

Staff Software Engineer, Google

Sergey Kanzhelev is a seasoned open source and cloud native maintainer working actively on Kubernetes. Sergey is serving as co-chair of SIG Node. He is also one of the founders of OpenTelemetry. He is working on the engineering aspects of software and its practical application. He is contributing...

Mrunal Patel

Distinguished Engineer, Red Hat

Mrunal Patel is a Senior Principal Software Engineer at Red Hat working on containers for OpenShift. He is a maintainer of runc/libcontainer and the OCI runtime specification. He started the CRI-O runtime. He is a SIG-Node chair and tech lead.

AI + ML

Content Experience Level Any

11:00am MST

Better Together! GPU, TPU and NIC Topological Alignment with DRA - John Belamaric, Google & Patrick Ohly, Intel

Friday November 15, 2024 11:00am - 11:35am MST

Salt Palace | Level 2 | 250 AD

AI/ML workloads on Kubernetes demand ultra-high performance. If your training or multi-GPU inference job spans nodes, your GPUs will use the network, talking through a NIC over local PCIe. But not all NICs are equal! To get the best performance, you need a NIC which is as "close" to the GPU as possible. Unfortunately, the Kubernetes extended resources API does not have enough information and does not give you control over which specific devices are assigned. Dynamic Resource Allocation, the successor API, gives you this power. Come to this session to learn about DRA, how it is improving overall device support in K8s, and how to use it to allocate multiple GPUs, NICs, and TPUs to get the maximum performance out of your infrastructure.

Speakers

Patrick Ohly

Principal Engineer, Intel

Patrick Ohly is a software engineer at Intel GmbH, Germany. In the past he has worked on performance analysis software for HPC clusters ("Intel Trace Analyzer and Collector") and cluster technology in general (PTP and hardware time stamping). Since January 2009 he has worked for Intel...

John Belamaric

Senior Staff Software Engineer, Google

John is a Sr Staff SWE, co-chair of K8s SIG Architecture and of K8s WG Device Management, helping lead efforts to improve how GPUs, TPUs, NICs and other devices are selected, shared, and configured in Kubernetes. He is also co-founder of Nephio, an LF project for K8s-based automation...

AI + ML

Content Experience Level Intermediate

11:55am MST

Building Massive-Scale Generative AI Services with Kubernetes and Open Source - John McBride, OpenSauced

Friday November 15, 2024 11:55am - 12:30pm MST

Salt Palace | Level 2 | 250 AD

At OpenSauced, we power over 40,000 generative AI inferences every day, all through our in-house platform on top of Kubernetes. The cost of doing this kind of at-scale AI inference with a third-party provider API would be astronomical. Thankfully, using Kubernetes, the public cloud, and open-source technologies, we've been able to scale with relatively low costs and a lean stack. In this talk, John will walk through the journey of building a production-grade generative AI system using open source technologies, open large language models, and Kubernetes. We'll also explore why we chose to build on top of Kubernetes for our AI workloads over using a third-party provider, and how we're running and managing our AI/ML clusters today. Additionally, we'll dive into the techniques we used to groom our Retrieval-Augmented-Generation pipelines for efficiency on top of Kubernetes and other practical tips for deploying your own AI services at scale.

Speakers

John McBride

Sr. Software Engineer, OpenSauced

John is a Sr. Software Engineer at OpenSauced where he also serves as Head of Infrastructure and AI engineer. He is the maintainer of spf13/cobra, the Go CLI bootstrapping library used throughout the CNCF landscape. In the past, he has worked on open source Kubernetes platforms...

AI + ML

Content Experience Level Any

2:00pm MST

Bloomberg’s Journey to Improve Resource Utilization in a Multi-Cluster Platform - Yao Weng & Leon Zhou, Bloomberg

Friday November 15, 2024 2:00pm - 2:35pm MST

Salt Palace | Level 2 | 250 AD

Bloomberg provides an on-premises Data Science Platform (DSP) using cloud-native software to support internal AI model training. It runs on Kubernetes clusters spanning multiple data centers and featuring a diverse range of GPU types. However, managing such a large-scale and heterogeneous GPU environment poses many challenges, such as improving resource utilization, reducing operational costs, and scheduling workloads across different GPU types. In collaboration with the Karmada community, Bloomberg's DSP team has aimed to tackle these challenges by addressing multi-cluster batch job management problems. This talk will delve into the approaches the team has adopted, including:
- Intelligently scheduling GPU workloads across multiple clusters
- Using Karmada's resource interpreter to support Kubernetes Custom Resource Definitions (CRDs) on top of a multi-cluster architecture
- Building a highly available Karmada control plane
- Establishing a consistent training job submission interface

Speakers

Leon Zhou

Software Engineer, Bloomberg

Leon Zhou is a software engineer on the Data Science Platform engineering team at Bloomberg. With prior NLP experience, he is now building ML platforms to facilitate machine learning development. He is interested in ML infrastructure to enable large-scale training and complex pipelines...

Yao Weng

Senior Software Engineer, Bloomberg

Yao Weng is a Senior Software Engineer on Bloomberg’s Data Science Platform engineering team. She has contributed extensively to optimizing the company’s Kubernetes environment for high performance compute, model inference, and workflow orchestration. Yao Weng obtained her Ph.D...

AI + ML

Content Experience Level Intermediate

2:55pm MST

Cloud-Native AI: Wasm in Portable, Secure AI/ML Workloads - Miley Fu, Second State

Friday November 15, 2024 2:55pm - 3:30pm MST

Salt Palace | Level 2 | 250 AD

In this talk, we present Wasm as a pioneering solution for running AI/ML workloads in cloud-native environments. Our focus is on demonstrating how Wasm (on the server) facilitates the execution of AI models, such as Llama 3, Grok by X, and Mixtral, across diverse cloud and edge platforms without sacrificing performance. We will discuss the advantages of using Rust and WebAssembly in AI/ML workloads, highlighting aspects like portability, speed, and security. Real-world examples will illustrate the deployment of AI inference models using a Wasm runtime in Kubernetes environments, showcasing seamless orchestration and execution across varied devices. This session is aimed at cloud-native practitioners and AI/ML enthusiasts eager to explore innovative approaches in AI deployment.

Speakers

Miley Fu

DevRel, WasmEdge

Miley is a Developer Advocate with a passion for empowering developers to build and contribute to open source. With over 5 years of experience working on the WasmEdge runtime in the CNCF sandbox as a founding member, she has spoken at KubeCon, KCD Shenzhen, CloudDay Italy, DevRelCon, Open Source...

AI + ML

Content Experience Level Beginner

4:00pm MST

Best Practices for Deploying LLM Inference, RAG and Fine Tuning Pipelines on K8s - Meenakshi Kaushik & Shiva Krishna Merla, NVIDIA

Friday November 15, 2024 4:00pm - 4:35pm MST

Salt Palace | Level 2 | 250 AD

In this session, we'll cover best practices for deploying, scaling, and managing LLM inference pipelines on Kubernetes (K8s). We'll explore common patterns like inference, retrieval-augmented generation (RAG), and fine-tuning. Key challenges addressed include:
[1] Minimizing initial inference latency with model caching
[2] Optimizing GPU usage with efficient scheduling, multi-GPU/node handling, and auto-quantization
[3] Enhancing security and management with RBAC, monitoring, auto-scaling, and support for air-gapped clusters
We'll also demonstrate building customizable pipelines for inference, RAG, and fine-tuning, and managing them post-deployment. Solutions include [1] a lightweight standalone tool built using the operator pattern and [2] KServe, a robust open-source AI inference platform. This session will equip you to effectively manage LLM inference pipelines on K8s, improving performance, efficiency, and security.

Speakers

Meenakshi Kaushik

Product Management, NVIDIA

Meenakshi Kaushik leads product management for NIM Operator and KServe. Meenakshi is interested in the AI and ML space and is excited to see how the technology can enhance human well-being and productivity.

Shiva Krishna Merla

Senior Software Engineer, NVIDIA

Shiva Krishna Merla is a senior software engineer on the NVIDIA Cloud Native team where he works on GPU cloud infrastructure, orchestration and monitoring. He is focused on enabling GPU-accelerated DL and AI workloads in container orchestration systems such as Kubernetes and OpenShift...

AI + ML

Content Experience Level Beginner

4:55pm MST

Best of Both Worlds: Integrating Slurm with Kubernetes in a Kubernetes Native Way - Eduardo Arango Gutierrez, NVIDIA & Angel Beltre, Sandia National Laboratories

Friday November 15, 2024 4:55pm - 5:30pm MST

Salt Palace | Level 2 | 250 AD

It's not always clear which container orchestration system is best suited for a given use case. Slurm, for example, is often preferred over Kubernetes when running large-scale distributed workloads. As a result, organizations are often faced with a hard choice: do they deploy Slurm or Kubernetes to service the rising demands of their AI/ML workloads? In this talk, we introduce K-Foundry, an open-source custom controller for KCP that translates Kubernetes jobs to Slurm jobs and exposes Slurm nodes and cluster info as Kubernetes Custom Resource Definitions (CRDs). This integration combines Slurm’s robust job scheduling with Kubernetes' dynamic orchestration and API-driven ecosystem, easing the administration of both clusters through a common API. This session will end with a live demo, where attendees will see how this integration bridges the gap between cloud and HPC, facilitating resource management and optimizing performance for large-scale AI and LLM tasks.

Speakers

Eduardo Arango Gutierrez

Senior Systems Software Engineer, NVIDIA

Eduardo is a Senior Systems Software Engineer at NVIDIA, working on the Cloud Native Technologies team. Eduardo has focused on enabling users to build and deploy containers on distributed environments.

Angel Beltre

Senior Member of Technical Staff, Sandia National Laboratories

Angel Beltre serves as a senior member of the technical staff within the Scalable System Software department at Sandia National Laboratories. He is a contributor to the CSSE Computing-as-a-Service (CaaS) initiative, aimed at streamlining the deployment of modeling and simulation tools...

AI + ML

Content Experience Level Intermediate