In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Salt Palace | Level 1 | Grand Ballroom BDF
Wednesday, November 13
 

11:15am MST

All-Your-GPUs-Are-Belong-to-Us: An Inside Look at NVIDIA's Self-Healing GeForce NOW Infrastructure - Ryan Hallisey & Piotr Prokop PL, NVIDIA
Wednesday November 13, 2024 11:15am - 11:50am MST
GeForce NOW is a game streaming platform used by 20+ million gamers worldwide. Kubernetes is at the core of its infrastructure, powering game workloads and other containerized services and tools. The infrastructure includes many regional clusters with tens of thousands of GPUs capable of supporting hundreds of thousands of concurrent gamers. To operate a large Kubernetes infrastructure efficiently, NVIDIA built a GPU maintenance API to enable automated lifecycle management of critical infrastructure components. When combined with a few operators, this API facilitates planning and coordination of crucial driver, GPU, and Kubernetes upgrades at an unprecedented scale, and empowers self-healing operators to detect and remediate failures to avoid outages. In this talk, we will share:
  • How K8s and KubeVirt power NVIDIA GeForce NOW
  • NVIDIA’s GPU Maintenance API solution
  • NVIDIA’s vision for automated GPU maintenance at scale in K8s
Speakers

Ryan Hallisey

Software Engineer, NVIDIA
Ryan is a software engineer at NVIDIA. He works on building data centers powered by Kubernetes and KubeVirt for NVIDIA products.

Piotr Prokop

Senior Software Engineer, NVIDIA
Piotr is a Senior Software Engineer at NVIDIA. He works on running high performance workloads powered by Kubernetes for NVIDIA products.
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

12:10pm MST

Automated Multi-Cloud Blue-Green Cluster Rotations: Zero Downtime Upgrades at Scale - Sourav Khandelwal, Databricks
Wednesday November 13, 2024 12:10pm - 12:45pm MST
I will present the system developed for cluster rotations across Databricks’ fleet of over a thousand cloud-managed k8s clusters on AWS, Azure, and GCP. Blue-green cluster rotations, or cluster swaps (upgrading by creating a new k8s cluster with a new version/configuration and shifting workloads from the old cluster), allow us to implement major infrastructure changes and upgrade k8s versions with low risk through staged rollouts, seamless rollbacks, zero downtime, and minimal operator intervention. Our system includes a k8s-style continuous reconciliation mechanism to manage cluster swap lifecycles, a fast and reliable cluster state change discovery system, and a k8s workload migration system. We will share methodologies and experiences in constructing this loosely coupled system that orchestrates product workloads and cloud provider APIs for automated cluster swaps. This session will explore the challenges we faced and the benefits of automating large-scale, multi-cloud k8s upgrades.
Speakers

Sourav Khandelwal

Sr. Software Engineer, Databricks
I am a seasoned software engineer with over 10 years of experience in designing and managing large-scale platforms in cloud-native environments. At Databricks, my significant contributions have been pivotal in launching our next-generation cloud infrastructure that helped to transition...
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

2:30pm MST

Better Pod Availability: A Survey of the Many Ways to Manage Workload Disruptions - Zach Loafman, Google
Wednesday November 13, 2024 2:30pm - 3:05pm MST
Kubernetes Pods are ephemeral, but some are more ephemeral than others. Kubernetes provides a dizzying array of options to manage and handle Pod disruption. From PodDisruptionBudgets to "safe-to-evict" annotations, GracefulTermination timeouts, and more, it can be incredibly hard to determine the optimal solution for handling Pod disruption and how to terminate your application gracefully. Thankfully, due to the extensible nature of Kubernetes, we can build CRDs and controllers that simplify these complex topics for end users. In this talk, we'll present an in-depth analysis of the built-in options and how they work (or don't). While this problem is not unique to game-serving, we'll deep-dive and explain how Agones (an open-source session orchestration system layered on Kubernetes) solves this problem with a simple abstraction to hide the complexity!
Speakers

Zach Loafman

Staff Software Engineer, Google
Zach leads Google’s GKE Games team. He was previously lead of the Kubernetes Control Plane team for GKE, lead of the GKE Cluster Lifecycle team, worked on Kubernetes prior to GA, and was one of the founding members of the Google Kubernetes Engine team.
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

3:25pm MST

Cash App's Journey Into a Multi-Cluster Ecosystem - Rachel Sheikh, Cash App
Wednesday November 13, 2024 3:25pm - 4:00pm MST
Cash App's Compute team is responsible for the health and maintenance of the company's Kubernetes clusters, and for enabling service owners to deploy their services into these clusters with confidence. Over the past year, we've made strides in improving our reliability and uptime, part of which involved introducing a paradigm for creating new Kubernetes clusters in our service ecosystem that lets us seamlessly transition services into and out of clusters, simplifying cluster upgrades and providing guardrails against common outages. This talk walks you through our experience introducing new Kubernetes clusters for our services at Cash App, migrating and splitting service traffic across clusters with zero downtime, and thinking through tooling adoption and creation to simplify cluster maintenance as our overhead scales.
Speakers

Rachel Sheikh

Ms., Cash App
I'm a software engineer with a decade of experience building and scaling backend services across various industries. When I'm not working on clusters or writing Go, I'm probably watching pro League of Legends or taking pictures of my dog.
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

4:30pm MST

Museum of Weird Bugs: Our Favorites from 8 Years of Service Mesh Debugging - Tom Dean & Alen Haric, Buoyant
Wednesday November 13, 2024 4:30pm - 5:05pm MST
Over the past 8 years, we've fixed a lot of bugs in Linkerd. Many of these were straightforward, but some manifested in strange ways, or only showed up in unique situations, or otherwise surprised us. Some of them were just plain funny. In this talk, we will run through a couple of Linkerd's favorites: the most interesting, weird, and memorable bugs we've found and fixed in Linkerd. We describe how they originally manifested (usually in someone else's production system), how we went about tackling them (often by educating the reporter on how to construct a useful bug report), and the sometimes long and windy path to finally fixing them.
Speakers

Tom Dean

Field Engineer, Buoyant
Tom Dean started programming BASIC on Apple IIs over 40 years ago, and has been hooked on tech since then. A long-time user of Linux and Open Source, he has been expanding his Cloud, Cloud Native and adjacent subject matter knowledge to become a more well-rounded technologist, and...

Alen Haric

Solutions Architect, Buoyant
Salt Palace | Level 1 | Grand Ballroom BDF
  Cloud Native Experience
  • Content Experience Level Any

5:25pm MST

Creating Paved Paths for Platform Engineers - Ritesh Patel, Nirmata; Abby Bangser, Syntasso; Viktor Farcic, Upbound; Nicholas Morey, Akuity; Praseeda Sathaye, Amazon
Wednesday November 13, 2024 5:25pm - 6:00pm MST
The platform engineering team's role has evolved into a pivotal one as the custodian of the internal developer platform. However, these teams often find themselves in a quagmire of identifying the right components to include in their platforms, particularly in the ever-expanding CNCF landscape. This panel session discusses these challenges by exploring the concept of 'Paved Paths' as a strategic approach to guide platform teams in their journey of building an internal developer platform (IDP). 'Paved Paths' offers a solution by providing platform engineering teams with proven reference architectures (e.g. CNOE and the BACK Stack). This approach prevents them from starting from scratch and getting lost in the vast CNCF landscape. By offering proven and opinionated reference architectures, platform teams can focus on enhancing developer experiences and optimizing higher-level workflows rather than grappling with the complexities of identifying foundational components for their IDP.
Speakers

Viktor Farcic

Developer Advocate, Upbound
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.

Ritesh Patel

Co-Founder & VP Product, Nirmata
Ritesh Patel is Co-founder and leads Products at Nirmata, the creators of Kyverno. At Nirmata, he is responsible for commercial products for Kubernetes security, governance, and automation. He also leads key technology partnerships. Ritesh has 20+ years of experience delivering enterprise...

Praseeda Sathaye

Principal Specialist Solution Architect, Amazon (AWS)
Praseeda Sathaye is a Principal Specialist SA for App Modernization and Containers at Amazon Web Services based in Bay Area California. She has been focused on helping customers speed their cloud-native adoption journey by modernizing their platform infrastructure, internal architecture...

Nicholas Morey

Senior Developer Advocate, Akuity
Nicholas Morey is a Platform Engineer with a passion for DevOps practices. He is on the team at Akuity as a Developer Advocate, working with the community on anything Argo and Kargo-related. He is an experienced Argo CD operator and a Certified Kubernetes Administrator.

Abby Bangser

Principal Engineer, Syntasso
Abby is a Principal Engineer at Syntasso delivering Kratix, an open-source cloud-native framework for building internal platforms on Kubernetes. Her keen interest in supporting internal development comes from over a decade of experience in consulting and product delivery roles across...
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
 
