Loading…
Attending this event?
In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Platform Engineering clear filter
Wednesday, November 13
 

11:15am MST

All-Your-GPUs-Are-Belong-to-Us: An Inside Look at NVIDIA's Self-Healing GeForce NOW Infrastructure - Ryan Hallisey & Piotr Prokop PL, NVIDIA
Wednesday November 13, 2024 11:15am - 11:50am MST
GeForce Now is a game streaming platform used by 20+ million gamers worldwide. Kubernetes is at the core of its infrastructure powering game workloads and other containerized services and tools. The infrastructure includes many regional clusters with 10s of thousands of GPUs capable of supporting 100s of thousands concurrent gamers. To operate a large Kubernetes infrastructure efficiently, NVIDIA built a GPU maintenance API to enable automated lifecycle management of critical infrastructure components. When combined with a few operators, this API facilitates planning and coordination of crucial driver, GPU, and Kubernetes upgrades at an unprecedented scale, as well as empowering self-healing operators to detect and remediate failures to avoid outages. In this talk, we will share: - How K8s and KubeVirt powers Nvidia GeForce Now - Nvidia’s GPU Maintenance API solution - NVIDIA’s vision for doing automated GPU maintenance at scale in K8s
Speakers
avatar for Ryan Hallisey

Ryan Hallisey

Software Engineer, NVIDIA
Ryan is a software engineer at NVIDIA. He works on building data centers powered by Kubernetes and KubeVirt for NVIDIA products.
avatar for Piotr Prokop

Piotr Prokop

Senior Software Engineer, NVIDIA
Piotr is a Senior Software Engineer at NVIDIA. He works on running high performance workloads powered by Kubernetes for NVIDIA products.
Wednesday November 13, 2024 11:15am - 11:50am MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

12:10pm MST

Automated Multi-Cloud Blue-Green Cluster Rotations: Zero Downtime Upgrades at Scale - Sourav Khandelwal, Databricks
Wednesday November 13, 2024 12:10pm - 12:45pm MST
I will present the system developed for cluster rotations across Databricks’ fleet of over a thousand cloud-managed k8s clusters on AWS, Azure, and GCP. Blue-green cluster rotations, or cluster swaps (upgrading by creating a new k8s cluster with a new version/configuration & shifting workloads from the old cluster), allow us to implement major infrastructure changes and upgrade k8s versions with low risk through staged rollouts, seamless rollbacks, zero downtime, and minimal operator intervention. Our system includes a k8s-style continuous reconciliation mechanism to manage cluster swap lifecycles, a fast and reliable cluster state change discovery system, and a k8s workload migration system. We will share methodologies and experiences in constructing this loosely coupled system that orchestrates product workloads and cloud provider APIs for automated cluster swaps. This session will explore the challenges faced, and the benefits of automating large-scale, multi-cloud k8s upgrades.
Speakers
avatar for Sourav Khandelwal

Sourav Khandelwal

Sr. Software Engineer, Databricks
I am a seasoned software engineer with over 10 years of experience in designing and managing large-scale platforms in cloud-native environments. At Databricks, my significant contributions have been pivotal in launching our next-generation cloud infrastructure that helped to transition... Read More →
Wednesday November 13, 2024 12:10pm - 12:45pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

2:30pm MST

Better Pod Availability: A Survey of the Many Ways to Manage Workload Disruptions - Zach Loafman, Google
Wednesday November 13, 2024 2:30pm - 3:05pm MST
Kubernetes Pods are ephemeral, but some are more ephemeral than others. Kubernetes provides a dizzying array of options to manage and handle Pod disruption. From PodDisruptionBudgets, to "safe-to-evict" annotations, GracefulTermination timeouts and more, it can be incredibly hard to determine the optimal solution for handling Pod disruption and how to manage gracefully terminating your application. Thankfully, due to the extensible nature of Kubernetes we can build CRDs and controllers that can simplify these complex topics for end users. In this talk, we'll present an in-depth analysis of the built-in options and how they work (or don't). While this problem is not unique to game-serving, we'll deep-dive and explain how Agones (an open-source session orchestration system layered on Kubernetes) solves this problem with a simple abstraction to hide the complexity!
Speakers
avatar for Zach Loafman

Zach Loafman

Staff Software Engineer, Google
Zach leads Google’s GKE Games team. He was previously lead of the Kubernetes Control Plane team for GKE, lead of the GKE Cluster Lifecycle team, worked on Kubernetes prior to GA, and was one of the founding members of the Google Kubernetes Engine team.
Wednesday November 13, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

3:25pm MST

Cash App's Journey Into a Multi-Cluster Ecosystem - Rachel Sheikh, Cash App
Wednesday November 13, 2024 3:25pm - 4:00pm MST
Cash App's Compute team is responsible for the health and maintenance of the company's Kubernetes clusters, and the enablement of service owners to deploy their services into these clusters with confidence. Over the past year, we've made strides in improving our reliability and uptime, part of which involved introducing a paradigm around creating new Kubernetes clusters in our service ecosystem that allow us to seamlessly transition services in/out of to simplify cluster upgrades and provide us with guardrails against common outages. This talk intends to walk you through our experience introducing new Kubernetes clusters for our services at Cash App, migrating and splitting service traffic across clusters with zero downtime, and thinking through tooling adoption / creation to simplify cluster maintenance as our overhead scales.
Speakers
avatar for Rachel Sheikh

Rachel Sheikh

Ms., Cash App
I'm a software engineer with a decade of experience building and scaling backend services across various industries. When I'm not working on clusters or writing Go, I'm probably watching pro League of Legends or taking pictures of my dog.
Wednesday November 13, 2024 3:25pm - 4:00pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

5:25pm MST

Creating Paved Paths for Platform Engineers - Ritesh Patel, Nirmata; Abby Bangser, Syntasso; Viktor Farcic, Upbound; Nicholas Morey, Akuity; Praseeda Sathaye, Amazon
Wednesday November 13, 2024 5:25pm - 6:00pm MST
The platform engineering team's role has evolved into a pivotal one as the custodian of the internal developer platform. However, these teams often find themselves in a quagmire of identifying the right components to include in their platforms, particularly in the ever-expanding CNCF landscape. This panel session discusses these challenges by exploring the concept of 'Paved Paths' as a strategic approach to guide platform teams in their journey of building an internal developer platform (IDP). 'Paved Paths' offers a solution by providing platform engineering teams with proven reference architectures (e.g. CNOE and the BACK Stack). This approach prevents them from starting from scratch and getting lost in the vast CNCF landscape. By offering proven and opinionated reference architectures, platform teams can focus on enhancing developer experiences and optimizing higher-level workflows rather than grappling with the complexities of identifying foundational components for their IDP.
Speakers
avatar for Viktor Farcic

Viktor Farcic

Developer Advocate, Upbound
Viktor Farcic is a lead rapscallion at Upbound, a member of the CNCF Ambassadors, Google Developer Experts, CDF Ambassadors, and GitHub Stars groups, and a published author. He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox.
avatar for Ritesh Patel

Ritesh Patel

Co-Founder & VP Product, Nirmata
Ritesh Patel is Co-founder and leads Products at Nirmata, the creators of Kyverno. At Nirmata, he is responsible for commercial products for Kubernetes security, governance, and automation. He also leads key technology partnerships. Ritesh has 20+ years of experience delivering enterprise... Read More →
avatar for Praseeda Sathaye

Praseeda Sathaye

Principal Specialist Solution Architect, Amazon (AWS)
Praseeda Sathaye is a Principal Specialist SA for App Modernization and Containers at Amazon Web Services based in Bay Area California. She has been focused on helping customers speed their cloud-native adoption journey by modernizing their platform infrastructure, internal architecture... Read More →
avatar for Nicholas Morey

Nicholas Morey

Senior Developer Advocate, Akuity
Nicholas Morey is a Platform Engineer with a passion for DevOps practices. He is on the team at Akuity as a Developer Advocate, working with the community on anything Argo and Kargo-related. He is an experienced Argo CD operator and a Certified Kubernetes Administrator.
avatar for Abby Bangser

Abby Bangser

Principal Engineer, Syntasso
Abby is a Principal Engineer at Syntasso delivering Kratix, an open-source cloud-native framework for building internal platforms on Kubernetes. Her keen interest in supporting internal development comes from over a decade of experience in consulting and product delivery roles across... Read More →
Wednesday November 13, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
 
Thursday, November 14
 

11:00am MST

Engineering a Kubernetes Operator: Lessons Learned from Versions 1 to 5 - Andrew L'Ecuyer, Crunchy Data
Thursday November 14, 2024 11:00am - 11:35am MST
Join me to uncover insights and hard-learned lessons from our journey through the first five versions of a Kubernetes Operator for Postgres. I will trace the development lifecycle from version 1 started in 2017 to version 5 now. Each version represents a milestone in addressing specific challenges, functionality, stability, and performance. We will discuss the architectural decisions, design patterns, and implementation strategies that shaped the evolution of the Operator. Key topics will include handling stateful applications, ensuring high availability, building for flexible deployment models, scalability, and managing rolling upgrades for both the Operator and underlying software. By the end of this session, participants will be equipped with practical knowledge and actionable strategies for engineering their own Kubernetes Operators, ready to accelerate their development process and avoid common pitfalls.
Speakers
avatar for Andrew L'Ecuyer

Andrew L'Ecuyer

Sr. Director of Kubernetes Engineering, Crunchy Data
Andrew head’s up the Kubernetes Engineering Team at Crunchy Data. With a diverse background spanning both the public and private sectors, Andrew has played a key role in designing, building and integrating complex systems of all shapes and sizes. He holds degrees in both Computer... Read More →
Thursday November 14, 2024 11:00am - 11:35am MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

11:00am MST

Yahoo’s Kubernetes Journey from on-Prem to Multi-Cloud at Scale - Nandhakumar Venkatachalam & Payal Patel, Yahoo
Thursday November 14, 2024 11:00am - 11:35am MST
Yahoo is an early adopter of Kubernetes, operating 37 on-prem and 42 multi-cloud production clusters hosting 2700 applications. Our team offers a simple yet powerful interface for users to deploy applications onto our managed clusters. Since 2015, we have handled multiple complex upgrades, including Operating Systems and Kubernetes, upgrading from version 1.0.3 to 1.30.0. In 2023, Yahoo announced plans to migrate to both GCP and AWS cloud platforms. Leveraging extensive knowledge, our team successfully provisioned Kubernetes clusters in a multi-cloud environment within a short period. Our team faced numerous challenges during the cloud adoption process, including networking, security, cluster autoscaling, and cost. In this talk, we will share managing K8S in a multi-cloud and discuss the challenges faced and solutions found. Key topics include Shared VPC, IP Space for K8s, securely accessing private clusters, multi-tenant workload identity, and maintaining a user interface to K8S.
Speakers
avatar for Nandhakumar Venkatachalam

Nandhakumar Venkatachalam

Sr Princ Production Engineer, Yahoo Inc
Nandhakumar Venkatachalam is a Senior Principal Production Engineer at Yahoo Inc. As a lead engineer responsible for operating the large-scale Kubernetes cluster, he has played a key architect role in building scalable cloud infrastructure. Nandha has been with Yahoo for over 17 years... Read More →
avatar for Payal Patel

Payal Patel

Principal Software Development Engineer, Yahoo
Payal Patel is a Principal Software Development Engineer in the Cloud Infrastructure team at Yahoo. She is currently developing a hybrid cloud solution for Kubernetes clusters in AWS and GCP to set up the Kubernetes clusters at scale. Before that, she worked on managing the Kubernetes... Read More →
Thursday November 14, 2024 11:00am - 11:35am MST
Salt Palace | Level 2 | 251
  Platform Engineering
  • Content Experience Level Any

11:55am MST

Evolving Reddit’s Infrastructure via Principled Platform Abstractions - Karan Thukral & Harvey Xia, Reddit
Thursday November 14, 2024 11:55am - 12:30pm MST
Reddit’s approach to infrastructure management has grown organically over time, adapted to solve tactical, near term problems. We have now reached a point where the only way to scale infrastructure capabilities to a growing engineering organization is through platform abstractions offering self-service management of standardized infrastructure patterns. Beginning in 2021, a concerted effort was made to reimagine infrastructure as an internal platform that empowers both application and infrastructure engineers to build impactful and maintainable systems. We present a case study of Reddit’s ongoing journey in evolving its infrastructure management practices from inefficient, human-in-the-loop processes to efficient, self-service interfaces. By treating Kubernetes as a universal control plane and extending it with custom control processes fronted by well-designed interfaces, we are moving the organization towards this vision. This will cover the the many trade-offs and lessons learnt.
Speakers
avatar for Harvey Xia

Harvey Xia

Staff Engineer, Compute Infrastructure @ Reddit, Reddit
I'm a software engineer with experience across a variety of disciplines including backend engineering, data engineering, and most recently, infrastructure engineering. I specialize in building cloud native infrastructure platform features.
avatar for Karan Thukral

Karan Thukral

Senior Engineer, Compute Infrastructure @ Reddit, Reddit
Karan is a Senior Software Engineer at Reddit working on the Compute team to build an easy to use internal developer platform which is scalable and reliable. He has been working in this problem space since 2017 building both internal and external developer platforms including App... Read More →
Thursday November 14, 2024 11:55am - 12:30pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

2:30pm MST

Exceeded Your Validation Cost Budget? Now What? - Joel Speed, Red Hat
Thursday November 14, 2024 2:30pm - 3:05pm MST
With the introduction of the common expression language (CEL) for writing complex validations, this is also brought in validation cost budgeting. It can be easy to violate this budget and difficult to work out how to reduce your validation cost. This talk with dive into the runtime cost budgeting and help to prevent those pesky errors! In this talk, we will cover the basics of CEL to set some groundwork before taking a look at some relatively simple CEL validations that cause the API server to reject your CRD definition. We will look at why the API server suggests that the runtime cost is over 100x the allowable cost budget, exploring how it came to that conclusion, and what you need to know when building your own APIs to be able to prevent that from happening. When you walk away from this talk, you should understand the various factors that contribute to your CEL runtime cost and be able to prevent errors in the future, improving CRD validation one field at a time!
Speakers
avatar for Joel Speed

Joel Speed

Principal Software Engineer, Red Hat
Joel has been working with Kubernetes and building controllers since 2017. Joel cut his teeth with Kubernetes as an SRE, before eventually moving into full software development at Red Hat where he leads the Cluster Infrastructure team, responsible for both Cloud Controller Managers... Read More →
Thursday November 14, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

3:25pm MST

From Chaos to Harmony, Transforming ML Engineering: A Kubernetes Adoption Journey
Thursday November 14, 2024 3:25pm - 4:00pm MST
How Ekstra Bladet’s Data Science team went from a small team of ML engineers, who needed to deliver quickly without deep technical infrastructure knowledge, to a rigid and proprietary ML pipeline built from AWS components and triggered by a large and chaotic Infrastructure as Code project. This made it difficult to achieve freedom and required a lot of work to implement and debug. One of the key reasons for adopting Kubernetes for our ML team emerged when we realized that we should serve all stakeholders across the JP/Politikens Hus organization, not just Ekstra Bladet. We then chose Kubernetes as our container infrastructure, which transformed the ML team into a dynamic ML ecosystem with great freedom under responsibility.

Initially, we focused on building robust frameworks for training and deploying ML models as API services and model training. Today, our ML team operates at the forefront of innovation, where we embrace GitOps principles to streamline our machine learning platform. Through careful adoption of advanced techniques such as autoscaling, scheduling, event triggers, and dynamic service deployment, we ensure seamless integration of new ML models into our infrastructure. This evolution has allowed us to effectively meet our diverse needs, while maintaining agility and scalability in our ML operations.
Speakers
avatar for Paris Nakita Kejser

Paris Nakita Kejser

Cloud Engineer, JP Politikens Hus
As a certified Cloud Engineer specializing in AWS and Kubernetes, I'm integral to Ekstra Bladet’s Data Science team. My focus lies in optimizing cloud infrastructure, integrating AWS and Kubernetes setups, and driving technological advancements. I contribute to Ekstra Bladet's digital... Read More →
Thursday November 14, 2024 3:25pm - 4:00pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

4:30pm MST

GÖDel Scheduler: A Unified Scheduler for Online and Offline Workloads - Bing Li, Yue Yin & Lintong Jiang, ByteDance
Thursday November 14, 2024 4:30pm - 5:05pm MST
Gödel Scheduler, developed by ByteDance, has been open-sourced as a unified system for managing online and offline workloads efficiently. Created to surpass the capabilities of Kubernetes' default scheduler, it enhances resource utilization, operational efficiency, and scheduling throughput. Key features include optimistic concurrency, a two-layer scheduling abstraction, and a robust dispatcher and binder system. Gödel Scheduler aims to improve cloud-native experiences and reduce operational burdens, catering to ByteDance’s extensive and diverse computing needs. Join us to explore how Gödel Scheduler can revolutionize your workload management strategy, ensuring efficient and reliable operations across your cloud-native infrastructure.
Speakers
YY

Yue Yin

ByteDance
LJ

Lintong Jiang

ByteDance
avatar for Bing Li

Bing Li

Senior Software Engineer, ByteDance
Software Engineer at ByteDance CloudNative Infrastructure, building Gödel.
Thursday November 14, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

5:25pm MST

How Google Build Its New Cloud on Top of Kubernetes - Saad Ali, Jie Yu & Prashanth Venugopal, Google
Thursday November 14, 2024 5:25pm - 6:00pm MST
“Build a new air-gapped cloud with open source technologies” – this is what a small team at Google was tasked with in late 2021. The team delivered a private cloud platform, complete with managed VMs, databases, AI services, and more. Moreover, it did so by leveraging a number of CNCF technologies, including Kubernetes, Istio, etc. We’ll share the potential of these technologies, as well as their limitations, by explaining how they were used to build a scalable, reliable, and secure cloud platform. We’ll discuss how to implement cloud tenancy concepts, enforce isolation among tenants, and how we built a cloud API leveraging k8s API machinery and service mesh. A key innovation in building the private cloud platform was the “Kubernetes Defined Networking” (KDN) stack we created: by leveraging existing k8s networking features (e.g. load balancer, etc.) along with a few key enhancements, we implemented most of the traditional cloud SDN concepts, like VPC, firewall, VM support, etc.
Speakers
avatar for Saad Ali

Saad Ali

Senior Engineering Manager, Google
Saad Ali is a Senior Engineering Manager at Google. He works on Google Distributed Cloud and the open-source Kubernetes project. He led the development of the Kubernetes storage and volume subsystem. He serves as a lead of the Kubernetes Storage SIG, has served as member of the CNCF... Read More →
avatar for prashanth venugopal

prashanth venugopal

Kubernetes Networking Lead, Google
Prashanth has an almost two decades long career, across various networking market segments. In his current role as the lead architect of Google's Kubernetes networking stack, he helps drive the networking stack's evolution for Google Kubernetes Engine (for the Public Cloud Market... Read More →
avatar for Jie Yu

Jie Yu

Principal Software Engineer, Google
Jie Yu is a currently a Principal Software Engineer at Google. Jie is currently working on Google Distributed Cloud, and is the leading architect for the product. Prior to Google, Jie was a Chief Architect at Mesosphere (D2IQ), and worked at Twitter. Jie joined Kubernetes community... Read More →
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any
 
Friday, November 15
 

11:00am MST

Platform Engineering in Financial Institutions: The Practitioner Panel - Paula Kennedy, Syntasso; Chris Plank, NatWest Bank; Suhail Patel, Monzo; Jinhong Brejnholt, Saxo Bank; Rachael Wonnacott, Fidelity International
Friday November 15, 2024 11:00am - 11:35am MST
In the world of small and large financial institutions, platform engineering is a driver for shipping quickly, safely, and efficiently. This panel brings together seasoned practitioners from leading banks and financial institutions to share their firsthand platform experiences, successes, and challenges. - Discover how platform engineering can enhance developer experience, facilitate rapid innovation and drive efficiencies. - Delve into the complexities of navigating regulatory compliance, specifically when using open source technologies such as Kubernetes. - Learn from the experts' successes, setbacks and strategies (across technology and people), gaining actionable insights for successful implementation. Join us as we discuss the journey of adopting and deploying CNCF technologies at scale within the highly regulated financial sector. We’ll explore practical examples of both successes and incidents where things have gone wrong, providing the audience with valuable takeaways.
Speakers
avatar for Paula Kennedy

Paula Kennedy

Chief Operating Officer, Syntasso
Paula is Co-Founder & Chief Operating Officer of Syntasso; previous roles include Senior Director at VMware Tanzu, Pivotal and Co-Founder & Chief Operating Officer of CloudCredo. With 20+ years experience in IT, Paula champions community, diversity and inclusion and has a range of... Read More →
avatar for Suhail Patel

Suhail Patel

Senior Staff Engineer, Monzo
Suhail is a Staff Engineer at Monzo focused on building the Core Platform. His role involves building and maintaining Monzo's infrastructure which spans over two thousand microservices and leverages key infrastructure components like Kubernetes, Cassandra, Etcd and more. He focuses... Read More →
avatar for Jinhong Brejnholt

Jinhong Brejnholt

Chief Cloud Architect, Saxo Bank
Jinhong is an accomplished cloud and platform architect, deeply committed to advancing DevSecOps practices and cloud-native technologies. She holds an MSc in Software Development and Technology and is certified as a Kubernetes application developer, administrator, and security specialist... Read More →
avatar for Chris Plank

Chris Plank

Enterprise Architect & Joint Product Owner, NatWest Bank
Chris Plank is a Enterprise Architect working for NatWest Bank in Edinburgh, Scotland. He has been leading a Platform as a Product initiative within the Bank over the last year looking to radically change the Banks approach to provisioning and maintaining services. Outside of work... Read More →
avatar for Rachael Wonnacott

Rachael Wonnacott

Technical Product Owner, Kubernetes Platform, Fidelity International
Rachael has spent the last decade focused on platform engineering. She places a conscious emphasis on improving flow and is on the quest to smooth the application lifecycle for developers in the enterprise. With a background in astrophysics, Rachael brings her scientific approach... Read More →
Friday November 15, 2024 11:00am - 11:35am MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

11:00am MST

Still Don't Do What Charlie Don't Does - Making CRD Changes Safer - Nick Young, Isovalent
Friday November 15, 2024 11:00am - 11:35am MST
Many Kubernetes installations use controllers that include Custom Resource Definitions (CRDs) to extend their capabilities. However, because CRDs can only have one version installed in a cluster at any one time, version and change management can be very difficult. This talk will benefit both controller implementers and users. For implementers, I have tips on how to more safely make API changes to their CRDs, and for CRD users, some tips on what to look out for when installing CRD updates. All of this is based on using experience from projects like Contour, Gateway API, and Cilium among others. Learn things like: Different CRD version management strategies - what’s worked and what hasn’t How to make schema changes like pluralizing a field or changing field validation in a safe way How not to make the same mistakes I did Expect to come away from this talk having learned from my painful experiences handling CRD changes badly, but also having heard a bunch of Simpsons references.
Speakers
avatar for Nick Young

Nick Young

Senior Software Engineer, Isovalent at Cisco
Nick has been working to prevent the entropic downfall of systems for 25 years, across datacenters, clouds, networking, and others. He's a Staff Engineer at Isovalent, and a maintainer on the Kubernetes Gateway API project, where he works on improving the ingress and mesh experiences... Read More →
Friday November 15, 2024 11:00am - 11:35am MST
Salt Palace | Level 2 | 251
  Platform Engineering

11:55am MST

Kubernetes Upgrades: Less Pain, More Gain (and Maybe a Little Swearing) - Jago Macleod, Google
Friday November 15, 2024 11:55am - 12:30pm MST
Kubernetes upgrades are a major pain point for many users, often due to the complexity of managing multiple, independently versioned components. This talk will delve into the strategies and best practices for minimizing disruption and maximizing success during Kubernetes upgrades. We'll explore: - Common pitfalls and challenges faced during upgrades - Practical tips for smoother, more reliable upgrade processes - The risks of relying solely on Long Term Support (LTS) versions - Improving upgrade reliability for all Kubernetes users, regardless of their chosen platform Led by the head of both OSS Kubernetes and GKE Release and Upgrades at Google, this talk will provide valuable insights and actionable advice for anyone looking to create a sustainable and successful upgrade strategy. Whether you're a seasoned Kubernetes veteran or just getting started, this session will equip you with the knowledge and tools to navigate the complex landscape of Kubernetes upgrades.
Speakers
avatar for Jago Macleod

Jago Macleod

Engineering Director, Google
Jago Macleod is an Engineering Director at Google, where he leads much of the Kubernetes and Google Kubernetes Engine (GKE) team, which gives him the opportunity to work with some of Google Cloud’s largest customers. Prior to working at Google, Jago helped make the smart homes that... Read More →
Friday November 15, 2024 11:55am - 12:30pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

11:55am MST

Share the Ride: Robust Multi-Tenancy in Kubernetes at Uber - Sashank Appireddy & Apoorva Jindal, Uber
Friday November 15, 2024 11:55am - 12:30pm MST
Multi-tenancy in Kubernetes involves the coexistence of multiple users or teams (tenants) on a single Kubernetes cluster while ensuring isolation, security, and performance. Our use cases at Uber span from scenarios with disruptive neighbors to those with large container sizes, specialized hardware, sticky placement preferences, and dynamic resource scaling demands, necessitating robust isolation measures. In this proposal, we present a comprehensive exploration of multi-tenancy in Kubernetes, covering strategies, the challenges we have faced and the effective solutions implemented to overcome them at Uber. Further, we will deep dive into the key aspects of building and managing multi-tenant Kubernetes clusters, by establishing strong tenant boundaries leveraging the ideas around node pools and tightly integrating with namespaces.
Speakers
avatar for Apoorva Jindal

Apoorva Jindal

Senior Staff Software Engineer, Uber Inc
Apoorva Jindal is working as Senior Staff Software Engineer at Uber. At Uber, he leads the Compute platform which powers all stateless and batch containerized workloads at Uber.
avatar for Sashank Reddy

Sashank Reddy

Staff Software Engineer, Uber Technologies Inc
I am software engineer with over a decade of experience specializing in containerization and distributed systems. As a Staff Software Engineer in the container platform team at Uber Technologies Inc, I lead the design, development and deployment of scalable multi-tenant architecture... Read More →
Friday November 15, 2024 11:55am - 12:30pm MST
Salt Palace | Level 2 | 251
  Platform Engineering

2:00pm MST

Micro-Segmentation and Multi-Tenancy: The Brown M&Ms of Platform Engineering - Jim Bugwadia, Nirmata & Rachael Wonnacott, Fidelity International
Friday November 15, 2024 2:00pm - 2:35pm MST
A key requirement for internal developer platforms is that they serve multiple workloads. The reality of platform engineering is that while it seeks to lower the barrier to entry for teams to deliver applications, it must also balance cost and ensure appropriate levels of security. It’s therefore essential to consider how application components running on shared infrastructure are allowed to communicate with each other and weigh up the cost of each architecture. In industry, we have seen differing approaches to deploying Kubernetes to achieve these goals, from multiple single-tenant clusters through to shared clusters that deliver namespaces-as-a-service. Rachael and Jim will define the concepts of multi-tenancy and micro-segmentation for cloud native systems, explain why they are critical to success with platform engineering. They will also show real-world examples of how they can be implemented, and demonstrate full automation using best practices like GitOps and Policy as Code.
Speakers
avatar for Jim Bugwadia

Jim Bugwadia

Co-founder and CEO, Nirmata
Jim Bugwadia is a co-founder and the CEO of Nirmata, the Kubernetes policy and governance company. Jim is an active contributor in the cloud native community and currently serves as co-chair of the Kubernetes Policy and Multi-Tenancy Working Groups. Jim is also a co-creator and maintainer... Read More →
avatar for Rachael Wonnacott

Rachael Wonnacott

Technical Product Owner, Kubernetes Platform, Fidelity International
Rachael has spent the last decade focused on platform engineering. She places a conscious emphasis on improving flow and is on the quest to smooth the application lifecycle for developers in the enterprise. With a background in astrophysics, Rachael brings her scientific approach... Read More →
Friday November 15, 2024 2:00pm - 2:35pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

2:00pm MST

The Missing Talk About API Versioning & Evolution in Your Developer Platform - Stefan Schimanski, Upbound & Sergiusz Urbaniak, Independent
Friday November 15, 2024 2:00pm - 2:35pm MST
In the realm of developer platforms, individuals without extensive experience in the cloud-native ecosystem are now venturing into the creation of Kubernetes-based APIs. Tools like Crossplane are transforming every platform engineer into an API designer. Ten years in, the ecosystem still offers little guidance on Kubernetes versioning and API evolution in practice. A naive understanding is not helpful, and many have been burned by relying on intuition. This talk will provide deep, yet applicable knowledge, starting from the first principles of the invariants to maintain when changing APIs in Kubernetes. It will cover tools like schemas, conversion, validation, and admission, and present very concrete and directly applicable API Evolution Patterns. These patterns will help navigate the life cycle of CRD-based projects. This talk aims to educate on how to evolve APIs effectively and safely without inadvertently breaking users.
Speakers
avatar for Sergiusz Urbaniak

Sergiusz Urbaniak

Team Lead - Kubernetes, https://mongodb.com
Sergiusz is a Kubernetes Team Lead at MongoDB. He is enthusiastic about modern infrastructure software while still enjoying minimalistic networking techniques like morse code. He worked on Mesos, container runtimes, Prometheus Operator, Thanos, upstream Kubernetes, Operators, and... Read More →
avatar for Stefan Schimanski

Stefan Schimanski

Senior Principal Software Engineer, Upbound
Stefan is a Senior Principal Engineer at Upbound working on control planes, Kubernetes, kcp, and as a tech-lead in Sig API Machinery. He contributed a major part of the CRD feature set. Stefan is a 2nd time GoogleSummer of Code mentor with CNCF, loves to teach and help people to learn... Read More →
Friday November 15, 2024 2:00pm - 2:35pm MST
Salt Palace | Level 2 | 251
  Platform Engineering

2:55pm MST

Modernization of Intuit Payroll Enterprise Using Event Driven Architecture - Hema Maarimuthu & Vigith Maurice, Intuit
Friday November 15, 2024 2:55pm - 3:30pm MST
Intuit's Quickbooks Online Payroll Enterprise, a critical application serving over 2 million customers, processes over a million transactions and $34 billion in payroll taxes. We're modernizing with a heavy investment in event-driven architecture for effective handling of financial data. This major transition extends beyond just the payroll platform; it involves decomposing complex systems across Intuit products using event-driven architecture and a focus on availability, scalability, and security is crucial. To address challenges like autoscaling for high throughput, low latency, better operational excellence, and development productivity, we have built our modernized platform on Numaflow, an open-source, Kubernetes native, language-agnostic platform. In our presentation, we will share our journey of modernizing our stack using event-driven serverless architecture on Numaflow and highlight the advantages it has brought to our developers and technology infrastructure.
Speakers
avatar for Vigith Maurice

Vigith Maurice

Principal Engineer, Intuit
Vigith is a co-creator of Numaproj and Principal Software Engineer for the Intuit Core Platform team in Mountain View, California. One of Vigith's current day-to-day focus areas is the various challenges in building scalable data and AIOps solutions for both batch and high-throughput... Read More →
avatar for Hema Maarimuthu

Hema Maarimuthu

Principal Engineer, Intuit
Hema is a Principal Software Engineer for Intuit's Online Payroll Infrastructure team in Mountain View, California. Hema’s current work involves leading cross-functional teams, strategizing, and driving operational excellence initiatives. Her major accomplishments include successfully... Read More →
Friday November 15, 2024 2:55pm - 3:30pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

2:55pm MST

This Platform Goes to 11: Boost Developer Productivity with Lessons from Salesforce - Joe Kutner, Salesforce
Friday November 15, 2024 2:55pm - 3:30pm MST
Internal platforms play an essential role in boosting the productivity of developers who use cloud native technologies. That’s why Salesforce, a global leader in the cloud for more than two decades, evolved its existing collection of managed services and capabilities into a cohesive platform that delights developers. In this talk, you’ll learn how Salesforce's platform removes friction, unifies interfaces, and meets developers where they are with industry standard tooling. As you design and build your own platforms, you’ll be able to use the same principles that guided Salesforce to accelerate day-1 onboarding of new apps, increase the speed of the developer inner-loop and testing cycles, and reduce the time it takes to deliver new code to production. Our lessons learned will help you avoid missteps. Finally, you’ll learn how to measure developer satisfaction, performance, activity, collaboration, and efficiency to ensure that your platform delivers the most value for your developers.
Speakers
avatar for Joe Kutner

Joe Kutner

Software Architect, Salesforce
Joe is co-founder of the Cloud Native Buildpacks project, which aims to make containerization more secure and more developer friendly. He started the project in 2018 while working as DX Architect at Salesforce Heroku, and today is the DX Architect for Salesforce’s Hyperforce platform... Read More →
Friday November 15, 2024 2:55pm - 3:30pm MST
Salt Palace | Level 2 | 251
  Platform Engineering

4:00pm MST

Medical Research Computing Infrastructure on Hybrid Kubernetes - Jennings Zhang & Rudolph Pienaar, Boston Children's Hospital
Friday November 15, 2024 4:00pm - 4:35pm MST
Research computing is essential across biomedical research, especially in medical imaging and radiology where ML+AI are rapidly disrupting the field. But while the research frontier continues moving forward, the computing infrastructure of research and healthcare institutions tend to lag behind. At the Boston Children’s Hospital, we are closing the gap by developing the ChRIS Research Integration Service (ChRIS for short). ChRIS is an MIT-licensed platform for medical computation, enabling the use of research software in clinical practice, while maximizing the utility of our hybrid-cloud resources. This talk will be a discussion of the cloud-native software ecosystem from the perspective of a medical researcher of a teaching hospital. We will consider the advantages of adopting cloud-native software and Kubernetes for research and healthcare institutions, as well as the challenges in doing so.
Speakers
avatar for Rudolph Pienaar

Rudolph Pienaar

Dr, Boston Children's Hospital
Dr Pienaar is the architect of ChRIS -- a general purpose and MLops platform that is uniquely suited to the needs of both biomedical researcher and clinical users. He leads the Advanced Computing Group at the Fetal Neonatal Neuroimaging Development Science Center at Boston Children's... Read More →
avatar for Jennings Zhang

Jennings Zhang

Research Developer, Boston Children's Hospital
Jennings is a neuroscience researcher and software developer at the Boston Children's Hospital. His work and interests are split between biological questions, e.g. human brain development, and all-things software development, especially containers and Rust.
Friday November 15, 2024 4:00pm - 4:35pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

4:55pm MST

Reducing Cloud Cost for Multi-Tenancy Kubernetes Platform - Simon Ting & Sravan Akinapally, American Airlines
Friday November 15, 2024 4:55pm - 5:30pm MST
A self-service multi-tenancy Kubernetes platform offers many benefits to application teams. In less than 2 years, American Airlines Shared K8 Platform has grown to over 1000+ deployments. Now that we built a resilient and secure platform, we must make it cost-effective to ensure long-term viability. This has the added benefit of reducing the carbon footprint of our platform. In the 2nd year, our platform grew by over 300% but costs increased by 500% as we added security, observability, and other features. How do we start to control costs without violating our self-service model? What is the reasonable amount to spend on Observability? What is a reasonable utilization goal and how do we get there? What level of cost optimization can we achieve without compromising our self-service model and maintaining the resiliency of our platform? We set out to address all these questions and this is our journey. In 4 months, we decreased the total Cost Per Utilized Core (CPUC) by 40%.
Speakers
avatar for Simon Ting

Simon Ting

Principal Product Manager, American Airlines
Simon Ting is the Principal Product Manager for Kubernetes as a Platform and Observability at American Airlines. Simon started his IT career as a developer and moved into configuration management and development platforms manager for over 2 decades. During that time he supported on-site... Read More →
avatar for Sravan Akinapally

Sravan Akinapally

Product Tech Lead, American Airlines
Product Tech Lead
Friday November 15, 2024 4:55pm - 5:30pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering

4:55pm MST

Zero Downtime Upgrades at Scale: How Okta Manages Hundreds of Clusters Daily - Jérémy Albuixech & Kahou Lei, Okta
Friday November 15, 2024 4:55pm - 5:30pm MST
How do you upgrade your K8s clusters? Perhaps a rolling update of nodes, with services moving around? Can you guarantee a zero-downtime upgrade? Will this method scale and support the velocity of production environments? Likely not. But fear not - you are not alone! At Okta, we maintain hundreds of clusters, each hosting >130 services, with node counts ranging from 20-400 and we are updating them daily. How do we do it? Without an out-of-the-box solutions we had to build our own and we want to share what we learned with all of you! In this talk Kahou and Jeremy will go over the challenges and successes, highlighting how their deployment method provides the foundational blocks to build extra features while reducing the blast radius when something goes wrong - thanks to quick rollbacks and a canary rollouts. In this session attendees will learn how we leverage open source technologies to tackle three main problems: how to scale, how to secure and how to upgrade clusters with no downtime.
Speakers
avatar for Jérémy Albuixech

Jérémy Albuixech

Staff Software Engineer, Okta
Jeremy is a Staff Software Engineer at Okta. Starting as a full stack programmer with a good foundation in Javascript, he then gravitated towards a DevOps role and later became a member of the SRE team at Cisco, picking up an IaC, observability and Kubernetes skillset. With the Okta... Read More →
avatar for Kahou Lei

Kahou Lei

Principal Software Engineer, Okta
Kahou Lei is a Principal Software Engineer with a strong background in Cloud infrastructure and Kubernetes. With 20 years of industry experience, he has held significant positions at renowned companies such as Okta and Cisco. Kahou leads critical software engineering initiatives... Read More →
Friday November 15, 2024 4:55pm - 5:30pm MST
Salt Palace | Level 2 | 251
  Platform Engineering
 

Share Modal

Share this link via

Or copy link

Filter sessions
Apply filters to sessions.
  • 🚨 Contribfest
  • 🪧 Poster Sessions
  • AI + ML
  • Breaks
  • ⚡ Lightning Talks
  • Cloud Native Experience
  • Cloud Native Novice
  • CNCF-hosted Co-located Events
  • Connectivity
  • Data Processing + Storage
  • Emerging + Advanced
  • Experiences
  • Keynote Sessions
  • Maintainer Track
  • Observability
  • Operations + Performance
  • Platform Engineering
  • Project Opportunties
  • Registration
  • SDLC
  • Security
  • Solutions Showcase
  • Sponsor-hosted Co-located Event
  • Tutorials