KubeCon + CloudNativeCon North America 2024: Full Schedule

In-person
November 12-15

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in Mountain Standard Time (UTC -7). The schedule is subject to change and session seating is available on a first-come, first-served basis.

11:15am MST

All-Your-GPUs-Are-Belong-to-Us: An Inside Look at NVIDIA's Self-Healing GeForce NOW Infrastructure - Ryan Hallisey & Piotr Prokop PL, NVIDIA

Wednesday November 13, 2024 11:15am - 11:50am MST

Salt Palace | Level 1 | Grand Ballroom H

GeForce NOW is a game streaming platform used by 20+ million gamers worldwide. Kubernetes is at the core of its infrastructure, powering game workloads and other containerized services and tools. The infrastructure includes many regional clusters with tens of thousands of GPUs capable of supporting hundreds of thousands of concurrent gamers. To operate a large Kubernetes infrastructure efficiently, NVIDIA built a GPU maintenance API to enable automated lifecycle management of critical infrastructure components. When combined with a few operators, this API facilitates planning and coordination of crucial driver, GPU, and Kubernetes upgrades at an unprecedented scale, and empowers self-healing operators to detect and remediate failures to avoid outages. In this talk, we will share:
- How K8s and KubeVirt power NVIDIA GeForce NOW
- NVIDIA’s GPU Maintenance API solution
- NVIDIA’s vision for automated GPU maintenance at scale in K8s

Speakers

Ryan Hallisey

Software Engineer, NVIDIA

Ryan is a software engineer at NVIDIA. He works on building data centers powered by Kubernetes and KubeVirt for NVIDIA products.

Piotr Prokop

Senior Software Engineer, NVIDIA

Piotr is a Senior Software Engineer at NVIDIA. He works on running high performance workloads powered by Kubernetes for NVIDIA products.

All Your GPUs Are Belong To Us KubeCon NA '24 pptx

Platform Engineering

Content Experience Level Beginner

12:10pm MST

Automated Multi-Cloud Large Scale K8s Cluster Lifecycle Management - Sourav Khandelwal, Databricks

Wednesday November 13, 2024 12:10pm - 12:45pm MST

Salt Palace | Level 1 | Grand Ballroom H

I will present the system developed for cluster rotations across Databricks’ fleet of over a thousand cloud-managed k8s clusters on AWS, Azure, and GCP. Blue-green cluster rotations, or cluster swaps (upgrading by creating a new k8s cluster with a new version/configuration and shifting workloads from the old cluster), allow us to implement major infrastructure changes and upgrade k8s versions with low risk through staged rollouts, seamless rollbacks, zero downtime, and minimal operator intervention. Our system includes a k8s-style continuous reconciliation mechanism to manage cluster swap lifecycles, a fast and reliable cluster state change discovery system, and a k8s workload migration system. We will share methodologies and experiences in constructing this loosely coupled system that orchestrates product workloads and cloud provider APIs for automated cluster swaps. This session will explore the challenges faced and the benefits of automating large-scale, multi-cloud k8s upgrades.

Speakers

Sourav Khandelwal

Sr. Software Engineer, Databricks

I am a seasoned software engineer with over 10 years of experience in designing and managing large-scale platforms in cloud-native environments. At Databricks, my significant contributions have been pivotal in launching our next-generation cloud infrastructure that helped to transition...

Automated Multi Cloud Large Scale K8s Cluster Lifecycle Management pdf

Platform Engineering

Content Experience Level Intermediate

2:30pm MST

Better Pod Availability: A Survey of the Many Ways to Manage Workload Disruptions - Zach Loafman, Google

Wednesday November 13, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | Grand Ballroom H

Kubernetes Pods are ephemeral, but some are more ephemeral than others. Kubernetes provides a dizzying array of options to manage and handle Pod disruption. From PodDisruptionBudgets to "safe-to-evict" annotations, GracefulTermination timeouts, and more, it can be incredibly hard to determine the optimal solution for handling Pod disruption and how to gracefully terminate your application. Thankfully, due to the extensible nature of Kubernetes, we can build CRDs and controllers that simplify these complex topics for end users. In this talk, we'll present an in-depth analysis of the built-in options and how they work (or don't). While this problem is not unique to game-serving, we'll deep-dive and explain how Agones (an open-source session orchestration system layered on Kubernetes) solves this problem with a simple abstraction to hide the complexity!

Speakers

Zach Loafman

Staff Software Engineer, Google

Zach leads Google’s GKE Games team. He was previously lead of the Kubernetes Control Plane team for GKE, lead of the GKE Cluster Lifecycle team, worked on Kubernetes prior to GA, and was one of the founding members of the Google Kubernetes Engine team.

KCNA24 Better Pod Availability pdf

Platform Engineering

Content Experience Level Intermediate

3:25pm MST

Cash App's Journey Into a Multi-Cluster Ecosystem - Rachel Sheikh, Cash App

Wednesday November 13, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | Grand Ballroom H

Cash App's Compute team is responsible for the health and maintenance of the company's Kubernetes clusters, and for enabling service owners to deploy their services into these clusters with confidence. Over the past year, we've made strides in improving our reliability and uptime, part of which involved introducing a paradigm for creating new Kubernetes clusters in our service ecosystem that allows us to seamlessly transition services into and out of them, simplifying cluster upgrades and providing guardrails against common outages. This talk walks you through our experience introducing new Kubernetes clusters for our services at Cash App, migrating and splitting service traffic across clusters with zero downtime, and thinking through tooling adoption and creation to simplify cluster maintenance as our overhead scales.

Speakers

Rachel Sheikh

Engineer, Cash App

I'm a software engineer with a decade of experience building and scaling backend services across various industries. When I'm not working on clusters or writing Go, I'm probably watching pro League of Legends or taking pictures of my dog.

KubeCon 2024 talk pptx

Platform Engineering

Content Experience Level Any

5:25pm MST

Creating Paved Paths for Platform Engineers - Ritesh Patel, Nirmata; Abby Bangser, Syntasso; Viktor Farcic, Upbound; Nicholas Morey, Akuity; Praseeda Sathaye, Amazon

Wednesday November 13, 2024 5:25pm - 6:00pm MST

Salt Palace | Level 1 | Grand Ballroom H

The platform engineering team's role has evolved into a pivotal one as the custodian of the internal developer platform. However, these teams often find themselves in a quagmire of identifying the right components to include in their platforms, particularly in the ever-expanding CNCF landscape. This panel session discusses these challenges by exploring the concept of 'Paved Paths' as a strategic approach to guide platform teams in their journey of building an internal developer platform (IDP). 'Paved Paths' offers a solution by providing platform engineering teams with proven reference architectures (e.g. CNOE and the BACK Stack). This approach prevents them from starting from scratch and getting lost in the vast CNCF landscape. By offering proven and opinionated reference architectures, platform teams can focus on enhancing developer experiences and optimizing higher-level workflows rather than grappling with the complexities of identifying foundational components for their IDP.

Speakers

Viktor Farcic

Developer Advocate, Upbound

Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.

Ritesh Patel

Co-Founder & VP Product, Nirmata

Ritesh Patel is Co-founder and leads Products at Nirmata, the creators of Kyverno. At Nirmata, he is responsible for commercial products for security and operations (SecOps) automation powered by policy as code. He also leads key technology partnerships. Ritesh has 20+ years of experience...

Praseeda Sathaye

Principal Specialist Solution Architect, Amazon (AWS)

Praseeda Sathaye is a Principal Specialist SA for App Modernization and Containers at Amazon Web Services, based in the Bay Area, California. She has been focused on helping customers speed their cloud-native adoption journey by modernizing their platform infrastructure, internal architecture...

Nicholas Morey

Senior Developer Advocate, Akuity

Nicholas Morey is a Platform Engineer with a passion for DevOps practices. He is on the team at Akuity as a Developer Advocate, working with the community on anything Argo and Kargo-related. He is an experienced Argo CD operator and a Certified Kubernetes Administrator.

Abby Bangser

Principal Engineer, Syntasso

Abby is a Principal Engineer at Syntasso delivering Kratix, an open-source cloud-native framework for building internal platforms on Kubernetes. Her keen interest in supporting internal development comes from over a decade of experience in consulting and product delivery roles across...

Platform Engineering

Content Experience Level Intermediate

11:00am MST

Engineering a Kubernetes Operator: Lessons Learned from Versions 1 to 5 - Andrew L'Ecuyer, Crunchy Data

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | Grand Ballroom H

Join me to uncover insights and hard-learned lessons from our journey through the first five versions of a Kubernetes Operator for Postgres. I will trace the development lifecycle from version 1, started in 2017, to version 5 today. Each version represents a milestone in addressing specific challenges in functionality, stability, and performance. We will discuss the architectural decisions, design patterns, and implementation strategies that shaped the evolution of the Operator. Key topics will include handling stateful applications, ensuring high availability, building for flexible deployment models, scalability, and managing rolling upgrades for both the Operator and underlying software. By the end of this session, participants will be equipped with practical knowledge and actionable strategies for engineering their own Kubernetes Operators, ready to accelerate their development process and avoid common pitfalls.

Speakers

Andrew L'Ecuyer

Sr. Director of Kubernetes Engineering, Crunchy Data

Andrew heads up the Kubernetes Engineering Team at Crunchy Data. With a diverse background spanning both the public and private sectors, Andrew has played a key role in designing, building and integrating complex systems of all shapes and sizes. He holds degrees in both Computer...

Engineering A Kubernetes Operator FINAL pdf

Platform Engineering

Content Experience Level Any

11:00am MST

Yahoo’s Kubernetes Journey from on-Prem to Multi-Cloud at Scale - Nandhakumar Venkatachalam & Payal Patel, Yahoo

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 2 | 251 AD

Yahoo is an early adopter of Kubernetes, operating 37 on-prem and 42 multi-cloud production clusters hosting 2700 applications. Our team offers a simple yet powerful interface for users to deploy applications onto our managed clusters. Since 2015, we have handled multiple complex upgrades, including Operating Systems and Kubernetes, upgrading from version 1.0.3 to 1.30.0. In 2023, Yahoo announced plans to migrate to both GCP and AWS cloud platforms. Leveraging extensive knowledge, our team successfully provisioned Kubernetes clusters in a multi-cloud environment within a short period. Our team faced numerous challenges during the cloud adoption process, including networking, security, cluster autoscaling, and cost. In this talk, we will share our experience managing K8s in a multi-cloud environment and discuss the challenges faced and solutions found. Key topics include Shared VPC, IP Space for K8s, securely accessing private clusters, multi-tenant workload identity, and maintaining a user interface to K8s.

Speakers

Nandhakumar Venkatachalam

Sr Princ Production Engineer, Yahoo Inc

Nandhakumar Venkatachalam is a Senior Principal Production Engineer at Yahoo Inc. As a lead engineer responsible for operating the large-scale Kubernetes cluster, he has played a key architect role in building scalable cloud infrastructure. Nandha has been with Yahoo for over 17 years...

Payal Patel

Principal Software Development Engineer, Yahoo

Payal Patel is a Principal Software Development Engineer in the Cloud Infrastructure team at Yahoo. She is currently developing a hybrid cloud solution for Kubernetes clusters in AWS and GCP to set up the Kubernetes clusters at scale. Before that, she worked on managing the Kubernetes...

Yahoo’s Kubernetes Journey from On prem to Multi cloud at Scale pdf

Platform Engineering

Content Experience Level Any

11:55am MST

Evolving Reddit’s Infrastructure via Principled Platform Abstractions - Karan Thukral & Harvey Xia, Reddit

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | Grand Ballroom H

Reddit’s approach to infrastructure management has grown organically over time, adapted to solve tactical, near-term problems. We have now reached a point where the only way to scale infrastructure capabilities to a growing engineering organization is through platform abstractions offering self-service management of standardized infrastructure patterns. Beginning in 2021, a concerted effort was made to reimagine infrastructure as an internal platform that empowers both application and infrastructure engineers to build impactful and maintainable systems. We present a case study of Reddit’s ongoing journey in evolving its infrastructure management practices from inefficient, human-in-the-loop processes to efficient, self-service interfaces. By treating Kubernetes as a universal control plane and extending it with custom control processes fronted by well-designed interfaces, we are moving the organization towards this vision. This talk will cover the many trade-offs and lessons learnt.

Speakers

Harvey Xia

Staff Engineer, Compute Infrastructure @ Reddit, Reddit

I'm a software engineer with experience across a variety of disciplines including backend engineering, data engineering, and most recently, infrastructure engineering. I specialize in building cloud native infrastructure platform features.

Karan Thukral

Senior Engineer, Compute Infrastructure @ Reddit, Reddit

Karan is a Senior Software Engineer at Reddit working on the Compute team to build an easy to use internal developer platform which is scalable and reliable. He has been working in this problem space since 2017 building both internal and external developer platforms including App...

Achilles Kubecon Talk 2024 pdf

Platform Engineering

Content Experience Level Any

2:30pm MST

Exceeded Your Validation Cost Budget? Now What? - Joel Speed, Red Hat

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | Grand Ballroom H

The introduction of the Common Expression Language (CEL) for writing complex validations also brought in validation cost budgeting. It can be easy to violate this budget and difficult to work out how to reduce your validation cost. This talk will dive into the runtime cost budgeting and help to prevent those pesky errors! In this talk, we will cover the basics of CEL to set some groundwork before taking a look at some relatively simple CEL validations that cause the API server to reject your CRD definition. We will look at why the API server suggests that the runtime cost is over 100x the allowable cost budget, exploring how it came to that conclusion, and what you need to know when building your own APIs to be able to prevent that from happening. When you walk away from this talk, you should understand the various factors that contribute to your CEL runtime cost and be able to prevent errors in the future, improving CRD validation one field at a time!

Speakers

Joel Speed

Principal Software Engineer, Red Hat

Joel has been working with Kubernetes and building controllers since 2017. Joel cut his teeth with Kubernetes as an SRE, before eventually moving into full software development at Red Hat where he leads the Cluster Infrastructure team, responsible for both Cloud Controller Managers...

Exceeded Your Validation Cost Budget Now What pdf

Platform Engineering

Content Experience Level Advanced

3:25pm MST

From Chaos to Harmony, Transforming ML Engineering: A Kubernetes Adoption Journey - Paris Nakita Kejser, JP Politikens Hus

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | Grand Ballroom H

Learn how Ekstra Bladet’s Data Science team went from a small team of ML engineers, who needed to deliver quickly without deep technical infrastructure knowledge, to a rigid and proprietary ML pipeline built from AWS components and triggered by a large and chaotic Infrastructure as Code project. This made it difficult to achieve freedom and required a lot of work to implement and debug. One of the key reasons for adopting Kubernetes for our ML team emerged when we realized that we should serve all stakeholders across the JP/Politikens Hus organization, not just Ekstra Bladet. We then chose Kubernetes as our container infrastructure, which transformed the ML team into a dynamic ML ecosystem with great freedom under responsibility.

Initially, we focused on building robust frameworks for training ML models and deploying them as API services. Today, our ML team operates at the forefront of innovation, where we embrace GitOps principles to streamline our machine learning platform. Through careful adoption of advanced techniques such as autoscaling, scheduling, event triggers, and dynamic service deployment, we ensure seamless integration of new ML models into our infrastructure. This evolution has allowed us to effectively meet our diverse needs, while maintaining agility and scalability in our ML operations.

Speakers

Paris Nakita Kejser

Cloud Engineer, JP | Politiken Media Group

As a certified Cloud Engineer specializing in AWS and Kubernetes, I'm integral to Ekstra Bladet’s Data Science team. My focus lies in optimizing cloud infrastructure, integrating AWS and Kubernetes setups, and driving technological advancements. I contribute to Ekstra Bladet's digital...

From Chaos to Harmony, Transforming ML Engineering A Kubernetes Adoption Journey pdf

Platform Engineering

Content Experience Level Intermediate

4:30pm MST

Gödel Scheduler: A Unified Scheduler for Online and Offline Workloads - Bing Li, Yue Yin & Lintong Jiang, ByteDance

Thursday November 14, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 1 | Grand Ballroom H

Gödel Scheduler, developed by ByteDance, has been open-sourced as a unified system for managing online and offline workloads efficiently. Created to surpass the capabilities of Kubernetes' default scheduler, it enhances resource utilization, operational efficiency, and scheduling throughput. Key features include optimistic concurrency, a two-layer scheduling abstraction, and a robust dispatcher and binder system. Gödel Scheduler aims to improve cloud-native experiences and reduce operational burdens, catering to ByteDance’s extensive and diverse computing needs. Join us to explore how Gödel Scheduler can revolutionize your workload management strategy, ensuring efficient and reliable operations across your cloud-native infrastructure.

Speakers

Yue Yin

Software Engineer, ByteDance

Yue is a software engineer at ByteDance focusing on compute orchestration & resource scheduling. Prior to joining ByteDance, Yue worked at VMware, where she contributed to the development of the Tanzu product. Outside of work, Yue enjoys spending time with her cats, listening to podcasts...

Lintong Jiang

ByteDance

Bing Li

Senior Software Engineer, ByteDance

Software Engineer at ByteDance CloudNative Infrastructure, building Gödel.

KCNA24 Godel Scheduler pdf

Platform Engineering

Content Experience Level Advanced

5:25pm MST

How Google Built a New Cloud on Top of Kubernetes - Jie Yu & Prashanth Venugopal, Google

Thursday November 14, 2024 5:25pm - 6:00pm MST

Salt Palace | Level 1 | Grand Ballroom H

“Build a new air-gapped cloud with open source technologies” – this is what a small team at Google was tasked with in late 2021. The team delivered a private cloud platform, complete with managed VMs, databases, AI services, and more. Moreover, it did so by leveraging a number of CNCF technologies, including Kubernetes and Istio. We’ll share the potential of these technologies, as well as their limitations, by explaining how they were used to build a scalable, reliable, and secure cloud platform. We’ll discuss how to implement cloud tenancy concepts and enforce isolation among tenants, and how we built a cloud API leveraging k8s API machinery and service mesh. A key innovation in building the private cloud platform was the “Kubernetes Defined Networking” (KDN) stack we created: by leveraging existing k8s networking features (e.g. load balancer) along with a few key enhancements, we implemented most of the traditional cloud SDN concepts, like VPC, firewall, and VM support.

Speakers

Prashanth Venugopal

Kubernetes Networking Lead, Google

Prashanth has a career spanning almost two decades across various networking market segments. In his current role as the lead architect of Google's Kubernetes networking stack, he helps drive the networking stack's evolution for Google Kubernetes Engine (for the Public Cloud Market...

Jie Yu

Principal Software Engineer, Google

Jie Yu is currently a Principal Software Engineer at Google. Jie is working on Google Distributed Cloud and is the leading architect for the product. Prior to Google, Jie was a Chief Architect at Mesosphere (D2IQ), and worked at Twitter. Jie joined Kubernetes community...

Kubecon SLC 2024 How Google Built a New Cloud on Top of Kubernetes pdf

Platform Engineering

Content Experience Level Any

11:00am MST

Platform Engineering in Financial Institutions: The Practitioner Panel - Paula Kennedy, Syntasso; Chris Plank, NatWest Bank; Suhail Patel, Monzo; Jinhong Brejnholt, Saxo Bank; Rachael Wonnacott, Fidelity International

Friday November 15, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | Grand Ballroom H

In the world of small and large financial institutions, platform engineering is a driver for shipping quickly, safely, and efficiently. This panel brings together seasoned practitioners from leading banks and financial institutions to share their firsthand platform experiences, successes, and challenges.
- Discover how platform engineering can enhance developer experience, facilitate rapid innovation and drive efficiencies.
- Delve into the complexities of navigating regulatory compliance, specifically when using open source technologies such as Kubernetes.
- Learn from the experts' successes, setbacks and strategies (across technology and people), gaining actionable insights for successful implementation.
Join us as we discuss the journey of adopting and deploying CNCF technologies at scale within the highly regulated financial sector. We’ll explore practical examples of both successes and incidents where things have gone wrong, providing the audience with valuable takeaways.

Speakers

Paula Kennedy

Chief Operating Officer, Syntasso

Paula is Co-Founder & Chief Operating Officer of Syntasso; previous roles include Senior Director at VMware Tanzu, Pivotal and Co-Founder & Chief Operating Officer of CloudCredo. With 20+ years experience in IT, Paula champions community, diversity and inclusion and has a range of...

Suhail Patel

Senior Staff Engineer, Monzo

Suhail is a Staff Engineer at Monzo focused on building the Core Platform. His role involves building and maintaining Monzo's infrastructure which spans over two thousand microservices and leverages key infrastructure components like Kubernetes, Cassandra, Etcd and more. He focuses...

Jinhong Brejnholt

Global Head of Cloud & Container Platforms, Saxo Bank

Jinhong is an accomplished cloud and platform architect, deeply committed to advancing DevSecOps practices and cloud-native technologies. She holds an MSc in Software Development and Technology and is certified as a Kubernetes application developer, administrator, and security specialist...

Chris Plank

Enterprise Architect & Joint Product Owner, NatWest Bank

Chris Plank is an Enterprise Architect working for NatWest Bank in Edinburgh, Scotland. He has been leading a Platform as a Product initiative within the Bank over the last year, looking to radically change the Bank's approach to provisioning and maintaining services. Outside of work...

Rachael Wonnacott

Technical Product Owner, Kubernetes Platform, Fidelity International

Rachael has spent the last decade focused on platform engineering. She places a conscious emphasis on improving flow and is on the quest to smooth the application lifecycle for developers in the enterprise. With a background in astrophysics, Rachael brings her scientific approach...

Platform Engineering

Content Experience Level Any

11:00am MST

Share the Ride: Robust Multi-Tenancy in Kubernetes at Uber - Sashank Appireddy & Apoorva Jindal, Uber

Friday November 15, 2024 11:00am - 11:35am MST

Salt Palace | Level 2 | 251 AD

Multi-tenancy in Kubernetes involves the coexistence of multiple users or teams (tenants) on a single Kubernetes cluster while ensuring isolation, security, and performance. Our use cases at Uber span from scenarios with disruptive neighbors to those with large container sizes, specialized hardware, sticky placement preferences, and dynamic resource scaling demands, necessitating robust isolation measures. In this talk, we present a comprehensive exploration of multi-tenancy in Kubernetes, covering strategies, the challenges we have faced, and the effective solutions implemented to overcome them at Uber. Further, we will deep dive into the key aspects of building and managing multi-tenant Kubernetes clusters by establishing strong tenant boundaries, leveraging the ideas around node pools and tightly integrating with namespaces.

Speakers

Apoorva Jindal

Senior Staff Software Engineer, Uber Inc

Apoorva Jindal is a Senior Staff Software Engineer at Uber, where he leads the Compute platform, which powers all stateless and batch containerized workloads.

Sashank Reddy

Staff Software Engineer, Uber Technologies Inc

I am a software engineer with over a decade of experience specializing in containerization and distributed systems. As a Staff Software Engineer in the container platform team at Uber Technologies Inc, I lead the design, development and deployment of scalable multi-tenant architecture...

KubeCon2024 MultiTenancy pdf

Platform Engineering

Content Experience Level Intermediate

11:55am MST

Kubernetes Upgrades: Less Pain, More Gain (and Maybe a Little Swearing) - Jago Macleod, Google

Friday November 15, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | Grand Ballroom H

Kubernetes upgrades are a major pain point for many users, often due to the complexity of managing multiple, independently versioned components. This talk will delve into the strategies and best practices for minimizing disruption and maximizing success during Kubernetes upgrades. We'll explore:
- Common pitfalls and challenges faced during upgrades
- Practical tips for smoother, more reliable upgrade processes
- The risks of relying solely on Long Term Support (LTS) versions
- Improving upgrade reliability for all Kubernetes users, regardless of their chosen platform
Led by the head of both OSS Kubernetes and GKE Release and Upgrades at Google, this talk will provide valuable insights and actionable advice for anyone looking to create a sustainable and successful upgrade strategy. Whether you're a seasoned Kubernetes veteran or just getting started, this session will equip you with the knowledge and tools to navigate the complex landscape of Kubernetes upgrades.

Speakers

Jago Macleod

Engineering Director, Google

Jago Macleod is an Engineering Director at Google, where he leads much of the Kubernetes and Google Kubernetes Engine (GKE) team, which gives him the opportunity to work with some of Google Cloud’s largest customers. Prior to working at Google, Jago helped make the smart homes that...

KubeCon NA 2024 Kubernetes Upgrades pptx

Platform Engineering

Content Experience Level Any

11:55am MST

Still Don't Do What Charlie Don't Does - Making CRD Changes Safer - Nick Young, Isovalent

Friday November 15, 2024 11:55am - 12:30pm MST

Salt Palace | Level 2 | 251 AD

Many Kubernetes installations use controllers that include Custom Resource Definitions (CRDs) to extend their capabilities. However, because CRDs can only have one version installed in a cluster at any one time, version and change management can be very difficult. This talk will benefit both controller implementers and users. For implementers, I have tips on how to more safely make API changes to their CRDs, and for CRD users, some tips on what to look out for when installing CRD updates. All of this is based on experience from projects like Contour, Gateway API, and Cilium, among others. Learn things like:
- Different CRD version management strategies - what’s worked and what hasn’t
- How to make schema changes like pluralizing a field or changing field validation in a safe way
- How not to make the same mistakes I did
Expect to come away from this talk having learned from my painful experiences handling CRD changes badly, but also having heard a bunch of Simpsons references.

Speakers

Nick Young

Senior Software Engineer, Isovalent at Cisco

Nick has been working to prevent the entropic downfall of systems for 25 years, across datacenters, clouds, networking, and others. He's a Staff Engineer at Isovalent, and a maintainer on the Kubernetes Gateway API project, where he works on improving the ingress and mesh experiences...

Still Don't Do What Charlie Don't Does.pptx pdf

Platform Engineering

Content Experience Level Intermediate

2:00pm MST

Micro-Segmentation and Multi-Tenancy: The Brown M&Ms of Platform Engineering - Jim Bugwadia, Nirmata & Rachael Wonnacott, Fidelity International

Friday November 15, 2024 2:00pm - 2:35pm MST

Salt Palace | Level 1 | Grand Ballroom H

A key requirement for internal developer platforms is that they serve multiple workloads. The reality of platform engineering is that while it seeks to lower the barrier to entry for teams to deliver applications, it must also balance cost and ensure appropriate levels of security. It’s therefore essential to consider how application components running on shared infrastructure are allowed to communicate with each other and weigh up the cost of each architecture. In industry, we have seen differing approaches to deploying Kubernetes to achieve these goals, from multiple single-tenant clusters through to shared clusters that deliver namespaces-as-a-service. Rachael and Jim will define the concepts of multi-tenancy and micro-segmentation for cloud native systems and explain why they are critical to success with platform engineering. They will also show real-world examples of how these concepts can be implemented, and demonstrate full automation using best practices like GitOps and Policy as Code.

Speakers

Jim Bugwadia

Co-founder and CEO, Nirmata

Jim Bugwadia is a co-founder and the CEO of Nirmata, the Kubernetes policy and governance company. Jim is an active contributor in the cloud native community and currently serves as co-chair of the Kubernetes Policy and Multi-Tenancy Working Groups. Jim is also a co-creator and maintainer...

Rachael Wonnacott

Technical Product Owner, Kubernetes Platform, Fidelity International

KCNA24 Brown M&Ms of Platform Engineering Nov 7 2024 pdf

Platform Engineering

Content Experience Level Intermediate

2:00pm MST

The Missing Talk About API Versioning & Evolution in Your Developer Platform - Stefan Schimanski, Upbound & Sergiusz Urbaniak, Independent

Friday November 15, 2024 2:00pm - 2:35pm MST

Salt Palace | Level 2 | 251 AD

In the realm of developer platforms, individuals without extensive experience in the cloud-native ecosystem are now venturing into the creation of Kubernetes-based APIs. Tools like Crossplane are transforming every platform engineer into an API designer. Ten years in, the ecosystem still offers little guidance on Kubernetes versioning and API evolution in practice. A naive understanding is not helpful, and many have been burned by relying on intuition. This talk will provide deep, yet applicable knowledge, starting from the first principles of the invariants to maintain when changing APIs in Kubernetes. It will cover tools like schemas, conversion, validation, and admission, and present very concrete and directly applicable API Evolution Patterns. These patterns will help navigate the life cycle of CRD-based projects. This talk aims to educate on how to evolve APIs effectively and safely without inadvertently breaking users.

Speakers

Sergiusz Urbaniak

Team Lead - Kubernetes, https://mongodb.com

Sergiusz is a Kubernetes Team Lead at MongoDB. He is enthusiastic about modern infrastructure software while still enjoying minimalistic networking techniques like Morse code. He worked on Mesos, container runtimes, Prometheus Operator, Thanos, upstream Kubernetes, Operators, and...

Stefan Schimanski

Senior Principal Software Engineer, Upbound

Stefan is a Senior Principal Engineer at Upbound working on control planes, Kubernetes, and kcp, and serves as a tech lead in SIG API Machinery. He contributed a major part of the CRD feature set. Stefan is a second-time Google Summer of Code mentor with CNCF, loves to teach and help people to learn...

Platform Engineering

Content Experience Level Intermediate

2:55pm MST

Modernization of Intuit Payroll Enterprise Using Event Driven Architecture - Hema Maarimuthu & Vigith Maurice, Intuit

Friday November 15, 2024 2:55pm - 3:30pm MST

Salt Palace | Level 1 | Grand Ballroom H

Intuit's QuickBooks Online Payroll Enterprise, a critical application serving over 2 million customers, processes over a million transactions and $34 billion in payroll taxes. We're modernizing with a heavy investment in event-driven architecture for effective handling of financial data. This major transition extends beyond just the payroll platform; it involves decomposing complex systems across Intuit products using event-driven architecture, with a crucial focus on availability, scalability, and security. To address challenges like autoscaling for high throughput, low latency, better operational excellence, and development productivity, we have built our modernized platform on Numaflow, an open-source, Kubernetes-native, language-agnostic platform. In our presentation, we will share our journey of modernizing our stack using event-driven serverless architecture on Numaflow and highlight the advantages it has brought to our developers and technology infrastructure.

Speakers

Vigith Maurice

Principal Engineer, Intuit

Vigith is a co-creator of Numaproj and Principal Software Engineer for the Intuit Core Platform team in Mountain View, California. One of Vigith's current day-to-day focus areas is the various challenges in building scalable data and AIOps solutions for both batch and high-throughput...

Hema Maarimuthu

Principal Engineer, Intuit

Hema is a Principal Software Engineer for Intuit's Online Payroll Infrastructure team in Mountain View, California. Hema’s current work involves leading cross-functional teams, strategizing, and driving operational excellence initiatives. Her major accomplishments include successfully...

Platform Engineering

Content Experience Level Any

2:55pm MST

This Platform Goes to 11: Boost Developer Productivity with Lessons from Salesforce - Joe Kutner, Salesforce

Friday November 15, 2024 2:55pm - 3:30pm MST

Salt Palace | Level 2 | 251 AD

Internal platforms play an essential role in boosting the productivity of developers who use cloud native technologies. That’s why Salesforce, a global leader in the cloud for more than two decades, evolved its existing collection of managed services and capabilities into a cohesive platform that delights developers. In this talk, you’ll learn how Salesforce's platform removes friction, unifies interfaces, and meets developers where they are with industry standard tooling. As you design and build your own platforms, you’ll be able to use the same principles that guided Salesforce to accelerate day-1 onboarding of new apps, increase the speed of the developer inner-loop and testing cycles, and reduce the time it takes to deliver new code to production. Our lessons learned will help you avoid missteps. Finally, you’ll learn how to measure developer satisfaction, performance, activity, collaboration, and efficiency to ensure that your platform delivers the most value for your developers.

Speakers

Joe Kutner

Software Architect, Salesforce

Joe is co-founder of the Cloud Native Buildpacks project, which aims to make containerization more secure and more developer friendly. He started the project in 2018 while working as DX Architect at Salesforce Heroku, and today is the DX Architect for Salesforce’s Hyperforce platform...

This Platform Goes to 11 Boost Developer Productivity with Lessons from Salesforce pptx

Platform Engineering

Content Experience Level Intermediate

4:00pm MST

Medical Research Computing Infrastructure on Hybrid Kubernetes - Jennings Zhang, Boston Children's Hospital

Friday November 15, 2024 4:00pm - 4:35pm MST

Salt Palace | Level 1 | Grand Ballroom H

Research computing is essential across biomedical research, especially in medical imaging and radiology, where ML and AI are rapidly disrupting the field. But while the research frontier continues moving forward, the computing infrastructure of research and healthcare institutions tends to lag behind. At Boston Children’s Hospital, we are closing the gap by developing the ChRIS Research Integration Service (ChRIS for short). ChRIS is an MIT-licensed platform for medical computation, enabling the use of research software in clinical practice while maximizing the utility of our hybrid-cloud resources. This talk will discuss the cloud-native software ecosystem from the perspective of a medical researcher at a teaching hospital. We will consider the advantages of adopting cloud-native software and Kubernetes for research and healthcare institutions, as well as the challenges in doing so.

Speakers

Jennings Zhang

Research Developer, Boston Children's Hospital

Jennings is a neuroscience researcher and software developer at Boston Children's Hospital. His work and interests are split between biological questions, e.g. human brain development, and all things software development, especially containers and Rust.

KubeCon2024 ChRIS pdf

Platform Engineering

Content Experience Level Beginner

4:55pm MST

Zero Downtime Upgrades at Scale: How Okta Manages Hundreds of Clusters Daily - Jérémy Albuixech & Kahou Lei, Okta

Friday November 15, 2024 4:55pm - 5:30pm MST

Salt Palace | Level 2 | 251 AD

How do you upgrade your K8s clusters? Perhaps a rolling update of nodes, with services moving around? Can you guarantee a zero-downtime upgrade? Will this method scale and support the velocity of production environments? Likely not. But fear not - you are not alone! At Okta, we maintain hundreds of clusters, each hosting more than 130 services, with node counts ranging from 20 to 400, and we update them daily. How do we do it? Without an out-of-the-box solution, we had to build our own, and we want to share what we learned with all of you! In this talk, Kahou and Jeremy will go over the challenges and successes, highlighting how their deployment method provides the foundational blocks to build extra features while reducing the blast radius when something goes wrong, thanks to quick rollbacks and canary rollouts. In this session, attendees will learn how we leverage open source technologies to tackle three main problems: how to scale, how to secure, and how to upgrade clusters with no downtime.

Speakers

Jérémy Albuixech

Staff Software Engineer, Okta

Jeremy is a Staff Software Engineer at Okta. Starting as a full stack programmer with a good foundation in JavaScript, he then gravitated towards a DevOps role and later became a member of the SRE team at Cisco, picking up an IaC, observability, and Kubernetes skillset. With the Okta...

Kahou Lei

Principal Software Engineer, Okta

Kahou Lei is a Principal Software Engineer with a strong background in Cloud infrastructure and Kubernetes. With 20 years of industry experience, he has held significant positions at renowned companies such as Okta and Cisco. Kahou leads critical software engineering initiatives...

Zero Downtime Upgrades at Scale How Okta Manages Hundres of Clusters Daily pdf

Platform Engineering

Content Experience Level Intermediate