KubeCon + CloudNativeCon North America 2024: Full Schedule

In-person
November 12-15

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is displayed in Mountain Standard Time (UTC -7). The schedule is subject to change, and session seating is available on a first-come, first-served basis.

11:15am MST

ARM-Wrestling: Overcoming CPU Migration Challenges to Reduce Costs - Laurent Bernaille & Eric Mountain, Datadog

Wednesday November 13, 2024 11:15am - 11:50am MST

Salt Palace | Level 1 | 155 B

When you have a significant cloud footprint, you always look for performance improvements and cost reductions. So when ARM instances became commonly available on one of our providers, seemingly providing great performance at a lower cost, we had to take a closer look! In this talk, we will first describe the steps we took to make our clusters ARM-ready and a few interesting issues we encountered during our initial tests, from performance regressions due to compiler behaviors to subtle memory corruption bugs. We will then discuss new challenges, in particular how to achieve load-balancing and auto-scaling when running workloads on a mix of CPUs with different performance characteristics, and share our results. Although migrating real workloads to ARM proved challenging, it was worth the effort, and we now run more than 50% of our workloads on ARM.

Speakers

Laurent Bernaille

Principal Engineer, Datadog

Laurent Bernaille worked several years as a consultant specializing in cloud, containers, and automation and helped organizations migrate to the public cloud and adopt containers. He is now Principal Engineer at Datadog and works closely with infrastructure teams, which are responsible...

Eric Mountain

Staff Engineer, Datadog

Eric Mountain began working with Kubernetes in 2014 helping Amadeus migrate to container and cloud technology. Eric is now a Staff Engineer in Datadog’s Compute team providing large scale Kubernetes to our internal users.

KubeCon NA 2024 ARM Wrestling pdf

Operations + Performance

Content Experience Level Any

12:10pm MST

Automated Multi-Cloud, Multi-Flavor Kubernetes Cluster Upgrades Using Operators - Ziyuan Chen, Databricks

Wednesday November 13, 2024 12:10pm - 12:45pm MST

Salt Palace | Level 1 | 155 B

Databricks manages over a thousand k8s clusters across three major cloud providers which run critical workloads in cloud regions around the world. This talk describes the system we built to upgrade nodes’ operating system, k8s version, and other configs monthly, supporting EKS, AKS, GKE, and self-managed k8s. Our system is built on k8s operators and performs zero-downtime blue-green rolling updates, respects contracts with services with features like PDBs, maintenance windows, deferred node draining, and custom workload handling plugins. It enables easy rollbacks, has good observability, and incurs minimal human operational cost. This has allowed us to patch vulnerabilities and release infrastructure changes quickly and reliably across the fleet. We will also share our lessons learned on building several operators that work together using the controller-runtime framework, designing the declarative interfaces between them, and achieving consistent behavior across three clouds.

Speakers

Ziyuan Chen

Staff Software Engineer, Databricks

Ziyuan Chen is a software engineer at Databricks. He has worked on Databricks' compute and OS infrastructure areas.

Automated Multi Cloud, Multi Flavor Kubernetes Cluster Upgrades Using Operators pptx

Operations + Performance

Content Experience Level Intermediate
Presentation Slides Attached Yes

2:30pm MST

Does My K8s Application Need CPR? Performance Evaluation of a Multi-Cluster Workload Management App - Braulio Dumba & Ezra Silvera, IBM

Wednesday November 13, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | 155 B

KubeStellar (KS) is an open-source Kubernetes multi-cluster workload configuration management system that can be used to manage AI workloads in multi-cluster environments. Understanding KS performance is therefore crucial, especially when managing resource-intensive AI workloads. In this talk, we will present our experience in analyzing the performance metrics of KS across several dimensions of scalability (e.g., number of bindingPolicies, workload description spaces, and number of managed remote clusters) and the challenges that arise when conducting performance experiments in a multi-cluster environment. Our insights will demonstrate the utility of benchmarking the performance of a multi-cluster Kubernetes workload management application. Additionally, we will demonstrate the usefulness of several open-source tools, such as clusterloader2, kube-burner, and kwok, for evaluating the performance of multi-cluster Kubernetes management applications.

Speakers

Ezra Silvera

Senior Technical Staff Member, IBM

Ezra Silvera is a Senior Technical Staff Member at IBM Research. His interests include distributed systems, cloud management, and cloud infrastructure. Ezra is passionate about open-source technologies and has been involved in several notable open source projects such as Docker, KubeVirt...

Braulio Dumba

Staff Research Scientist, IBM

Dr. Braulio Dumba is a Staff Research Scientist at IBM Research. In 2018, he joined IBM under the Hybrid Cloud organization. His current research focuses on edge computing and hybrid cloud computing. Dr. Dumba earned a Ph.D. in Computer Science from University of Minnesota, Twin...

KCNA24 KS Braulio Ezra pdf

Operations + Performance

Content Experience Level Intermediate

3:25pm MST

Setting New Standards for Reliability in Cloud Native Multi-Region Applications - Trey Caliva, Global Payments

Wednesday November 13, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | 155 B

As a multinational FinTech provider, processing over 32 billion card transactions for 816 million accounts, Global Payments requires globally available architectures with quick disaster recovery while maintaining subsecond latencies. In addition, these workloads require strict adherence to compliance standards. This session will explore the high-level architectural decisions implemented in a cloud-native redesign and cloud migration of a mission critical legacy .NET application. Key cloud native tools leveraged include Kubernetes on GCP, and the use of CockroachDB as a cloud native database solution. We will explore how leveraging these cloud native technologies achieved extreme fault tolerance in a multi-region deployment, setting new standards for performance and reliability.

Speakers

Jim Hatcher

Solutions Engineer, Cockroach Labs

Trey Caliva

Principal Cloud Architect, Global Payments

Trey Caliva is an Architect and engineer with 10+ years of hands-on experience planning, developing, managing, and securing deployments in Google Cloud and AWS. He is currently Principal Cloud Architect at Global Payments, a Fortune 500 company and a member of the S&P 500 focused...

KCNA24 GP Setting New Standards for Reliability in Cloud Native Applications pdf

Operations + Performance

Content Experience Level Intermediate

4:30pm MST

Kubernetes at Scale: Practical Solutions for Enhanced CNI and Kubelet Performance - Henrique Santana, Amazon Web Services & Bruno Gabriel da Silva, Sysdig

Wednesday November 13, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 1 | 155 B

In this session, we'll explore challenges faced in maintaining optimal performance for Container Network Interface (CNI) and Kubelet components in Kubernetes clusters. Based on recurring real-world scenarios, we will dive into troubleshooting and mitigations of issues such as IP address allocation delays, registry pull queries per second (QPS), and disk throttling. These significantly affect the performance, scalability, and stability of Kubernetes clusters. Our discussion will revolve around practical strategies for mitigating such challenges: leveraging multiple block storage volumes, adjusting instance types, tuning registryPullQPS settings, and exploring the benefits of prefix mode for faster IP address allocation. Additionally, we'll examine the role of warm IP pools and the implications of WARM_ENI_TARGET settings on CNI performance, providing attendees with a comprehensive understanding of how to optimize CNI and Kubelet performance effectively.
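As a rough illustration of why prefix mode matters for IP allocation, the sketch below applies the per-node pod IP formula commonly documented for the Amazon VPC CNI. It is not taken from the talk. The function name `max_pods` and the example instance limits (3 network interfaces, 10 addresses each) are assumptions chosen for illustration, and the real limits depend on the instance type.

```python
PREFIX_SIZE = 16  # a /28 IPv4 prefix holds 16 addresses


def max_pods(enis: int, ips_per_eni: int, prefix_mode: bool = False) -> int:
    """Estimate the pod IP capacity of one node (hypothetical helper).

    enis:        number of network interfaces the instance supports
    ips_per_eni: IPv4 address slots per interface
    prefix_mode: when True, each secondary slot holds a /28 prefix
                 instead of a single address
    """
    # One slot per interface is taken by the interface's primary address.
    secondary_slots = ips_per_eni - 1
    ips_per_slot = PREFIX_SIZE if prefix_mode else 1
    # The +2 accounts for pods that use host networking.
    return enis * secondary_slots * ips_per_slot + 2


if __name__ == "__main__":
    # Assumed example limits: 3 interfaces with 10 address slots each.
    print(max_pods(3, 10))                    # secondary-IP mode
    print(max_pods(3, 10, prefix_mode=True))  # prefix mode
```

In practice the usable pod count is also bounded by kubelet's configured maximum pods per node, so the prefix-mode figure is an address ceiling rather than a scheduling target.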

Speakers

Bruno Gabriel da Silva

Sr. Solutions Engineer, Sysdig

I have been working as a Solutions Engineer for several years, with my passion for cloud-native technologies igniting around 2018. That year, I transitioned from a traditional IT Windows Sysadmin role to fully embracing DevOps, focusing entirely on Open Source and Cloud. My first...

Henrique Santana

Sr. Cloud Support Engineer, Amazon Web Services

I'm a Containers Specialist with over 15 years of experience in infrastructure operations. Skilled at automating workflows and solving problems through user-centered design and emerging technologies. Currently focusing on containers and container orchestration. Adept at optimizing resource...

KCNA24 Kubernetes at Scale pdf

Operations + Performance

Content Experience Level Advanced

5:25pm MST

Misadventures in Large Scale Cluster Performance - Shane Corbett, AWS & Dima Ilchenko, Lacework

Wednesday November 13, 2024 5:25pm - 6:00pm MST

Salt Palace | Level 1 | 155 B

Join us for our follow-up to one of the highest-rated talks of KubeCon 2022 (73,000 pods a day, lessons from misadventures in multi-tenant). We are on a new misadventure, asking the question: what if some of the most popular advice about Kubernetes was just...wrong? We spent over two years poring over 800-page Linux kernel performance books, tweaking obscure control plane settings, and developing detailed custom monitoring dashboards so you don’t have to! Join us as we take you through real-world findings that took months of research to fully understand, and provide evidence that some of the things we were convinced were best practices were the very things holding us back the most.

Speakers

Dima Ilchenko

SRE, Fortinet

Dima is a staff SRE on a Compute Platform Team focused on troubleshooting, observability, and scalability of a large-scale Kubernetes platform at Fortinet / Lacework. Lacework's unique features create unique challenges that push Kubernetes to its limits, offering Dima a unique perspective...

Shane Corbett

Senior Kubernetes Specialist, AWS

Shane Corbett is a Senior Containers Specialist at AWS focused on helping customers with the finer points of Kubernetes large scale design and performance. When not pushing Kubernetes to extremes you will find Shane pursuing his lifelong obsession of exploring the edge of the extreme...

largeScalePerformance pdf

Operations + Performance

Content Experience Level Advanced

11:00am MST

Navigating the Cgroup Transition: Bridging the Gap Between Kubernetes and User Expectations - Sohan Kunkerkar, Red Hat Inc

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | 155 B

As Kubernetes and container technologies evolve, shifting from cgroup v1 to cgroup v2 has become a pivotal development. With cgroup v2 available in Kubernetes since v1.25, we're at a crossroads where many users and organizations must decide when and how to transition fully to this new system. Despite the benefits of cgroup v2, including better resource management and enhanced capabilities, users frequently encounter unexpected challenges signaling a gap in readiness and understanding. This talk will address the practical implications of moving to cgroup v2, discuss the coordinated efforts to deprecate cgroup v1, and propose actionable strategies to bridge the gap between the Kubernetes community, system administrators, and developers. By focusing on real-world experiences and providing clear guidance, this session aims to equip you with the knowledge and tools to navigate this significant change confidently.
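For readers who want to check which cgroup version a node is using before planning the transition, here is a minimal sketch. It is not from the talk. It relies on the fact that a unified cgroup v2 hierarchy exposes a `cgroup.controllers` file at its mount root, and the function name `cgroup_version` and the default mount path are assumptions for illustration.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a unified cgroup v2 mount, else 1.

    Hypothetical helper: a cgroup v2 hierarchy has a cgroup.controllers
    file at its root, while cgroup v1 has per-controller directories.
    """
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print(f"cgroup v{cgroup_version()} detected")
```

On hosts running in hybrid mode the v2 hierarchy is mounted in a subdirectory, so this check reports 1 there; such hosts would need a more careful check.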

Speakers

Sohan Kunkerkar

Senior Software Engineer, Red Hat Inc

Sohan Kunkerkar is a Senior Software Engineer at Red Hat, bringing expertise in distributed systems, backend engineering, and containers. His active contributions extend to CRI-O, a container runtime engine, and various sub-projects within the Kubernetes SIG Node community. Sohan...

Navigating the Cgroup Transition pdf

Operations + Performance

Content Experience Level Intermediate

11:55am MST

Multi-Zone Clusters Inside and Out - Tom Dean & Phil Henderson, Buoyant

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | 155 B

Multi-zone clusters are a great tool for improving application reliability — and also a great way to spend a ton of cash. Why? What really happens when you set these things up? How do you use them effectively without bankrupting your whole organization? In this session, we'll dig into the nuts and bolts of what goes on under the hood of a multi-zone cluster, including what a zone is, what Kubernetes understands about zones, how zones affect routing, and why multi-zone clusters can drive costs up. We'll spend some time on Kubernetes' Topology Aware Routing, covering its advantages as well as its very real limitations. Finally, we'll dive into how you can influence Kubernetes' choices to take advantage of multi-zone clusters' reliability while containing costs. Join us for learning and live demos!

Speakers

Phil Henderson

Customer Success Engineer, Buoyant

Tom Dean

Field Engineer, Buoyant

Tom Dean started programming BASIC on Apple IIs over 40 years ago, and has been hooked on tech since then. A long-time user of Linux and Open Source, he has been expanding his Cloud, Cloud Native and adjacent subject matter knowledge to become a more well-rounded technologist, and...

Operations + Performance

Content Experience Level Intermediate

2:30pm MST

One Inventory to Rule Them All: Standardizing Multicluster Management - Corentin Debains, Google & Ryan Zhang, Microsoft

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | 155 B

Most Kubernetes users run more than one cluster, and some run hundreds or more. Crossing cluster boundaries has always been a challenge, because most Kubernetes APIs, tools, and operators are cluster-centric. In fact, there’s a remarkable lack of standard tools and patterns for multi-cluster. Over time users have found ways to stitch clusters together, but the community has been asking for standardization. To share multi-cluster tools, Kubernetes sig-multicluster has introduced the “ClusterProfile” API, a critical building block for multi-cluster capabilities. This API provides a canonical way for multicluster controllers and users to iterate over clusters, and to install or manage multi-cluster features. In this talk, we will look at some of the problems inherent to multi-clustering, explain the concepts introduced by this new API, and look at implementations and consumers of it. We dive into real-life examples of patterns and usage, with products such as Kueue, ArgoCD, and Argo Workflows.

Speakers

Ryan Zhang

Principal Software Engineering Manager, Microsoft

Dr. Ryan Zhang is a Principal Software Engineering Manager working in Azure Kubernetes Service at Microsoft. He received his Ph.D. from Rice University, specializing in Grid computing. With over 15 years of experience in software engineering, he has managed teams of software engineers...

Corentin Debains

Software Engineer, Google

Corentin Debains is a software engineer at Google working on the GKE Fleet (multicluster platform). He is an active member of Kubernetes’ special interest group sig-multicluster.

KubeconNA24 One Inventory to Rule Them All.pptx pdf

Operations + Performance

Content Experience Level Intermediate

3:25pm MST

Orchestrating Quasi-Real Time Data Processing in the Computing Farm of the ATLAS Experiment at CERN - Giuseppe Avolio, CERN

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | 155 B

What has Kubernetes got to do with a High Energy Physics experiment collecting one million physics events per second at a data rate of 5 TB/s? That is what we would like to show you! The ATLAS experiment at CERN filters one million complex collision signatures per second provided by the Large Hadron Collider in quasi real-time, using a mixture of custom electronics and a large computing farm (the Event Filter, or EF, farm) consisting of up to 5000 commodity servers. In this talk, we will tell you how we are going to use Kubernetes to orchestrate the ATLAS EF computing farm. In particular, we will focus on the strategy and optimizations we put in place to start more than 25000 pods over more than 2500 worker nodes in about 50 seconds. We will also show the impact of the Kubernetes Scheduler and Controller Manager QPS values on pod start and stop throughput, and we will report on how custom scheduler profiles allow us to schedule pods at an average rate of about 500 Hz.
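The figures quoted in this abstract can be cross-checked with a quick back-of-the-envelope calculation. The sketch below is only an illustration using the rounded numbers from the abstract. It assumes decimal terabytes, and the variable names are hypothetical.

```python
# Rounded figures taken from the abstract ("more than" / "about" values).
pods = 25_000
worker_nodes = 2_500
startup_seconds = 50
events_per_second = 1_000_000
bytes_per_second = 5e12  # 5 TB/s, assuming decimal terabytes

# Pods started per second if 25000 pods come up in about 50 seconds.
pod_rate_hz = pods / startup_seconds

# Average pods placed on each worker node.
pods_per_node = pods / worker_nodes

# Average size of one physics event implied by the data rate.
avg_event_megabytes = bytes_per_second / events_per_second / 1e6

print(f"pod start rate:  {pod_rate_hz:.0f} Hz")
print(f"pods per node:   {pods_per_node:.0f}")
print(f"avg event size:  {avg_event_megabytes:.0f} MB")
```

The implied start rate matches the scheduling rate of about 500 Hz quoted in the abstract.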

Speakers

Giuseppe Avolio

Dr., CERN

Giuseppe Avolio is a physicist working at CERN, with almost 20 years of experience in the field of Data Acquisition (DAQ) systems for High Energy Physics experiments. He is a member of the ATLAS collaboration, and he is currently responsible for coordinating the ATLAS DAQ system upgrade...

K8s ATLAS pdf

Operations + Performance

Content Experience Level Advanced

4:30pm MST

Per-Node Api-Server Proxy: Expand the Cluster's Scale and Stability - Weizhou Lan & Iceber Gu, DaoCloud

Thursday November 14, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 1 | 155 B

In many CNCF projects, various DaemonSets simultaneously synchronize data from the API server on every node. Especially in large-scale clusters, this creates significant pressure on the API server, burdens the network, and can even affect the stability of the cluster. Some projects have implemented optimizations to address this. For instance, Cilium aggregates endpoint information into the CRD CiliumEndpointSlice before distributing it to its DaemonSet. However, many projects have not yet adopted such data aggregation optimizations, and there is currently no project that helps improve communication between all components and the API server. Clusterpedia supports launching per-node API server proxies that serve all local pods and uses eBPF to resolve the API server's clusterIP to the local proxy, transparently redirecting API server access on demand. In large-scale clusters, this can significantly improve the stability of all of the cluster's services.

Speakers

Iceber Gu

Software Engineer, DaoCloud

Senior open source enthusiast, focused on cloud runtime, multi-cloud and WASM. I am a CNCF Ambassador and founded Clusterpedia and promoted it as a CNCF Sandbox project. I also created KasmCloud to promote the integration of WASM with Kubernetes and contributed it to the WasmCloud...

Weizhou Lan

Senior Tech Lead, DaoCloud

Weizhou Lan has 13+ years of engineering experience and has worked with Kubernetes since 2018. He is a senior tech lead at DaoCloud focusing on private cloud, a speaker at KubeCon NA/EU and KCD China, a Program Committee Member for KubeCon, and the initiator and maintainer of the CNCF sandbox project...

Per Node Api Server Proxy Expand The Cluster Scale And Stability pdf

Operations + Performance

Content Experience Level Intermediate

5:25pm MST

Pod Power: Liberating Kubernetes Users from Container Resource Micromanagement - Dixita Narang, Google & Peter Hunt, Red Hat

Thursday November 14, 2024 5:25pm - 6:00pm MST

Salt Palace | Level 1 | 155 B

In the dynamic world of Kubernetes, efficient resource management is crucial for optimizing performance and costs. Traditionally, managing resource requests and limits in Kubernetes has focused on individual containers within a pod. While this approach offers granular control, it can become cumbersome and error-prone, particularly for complex applications with multiple containers. Join us as we examine the challenges and scalability limitations posed by micromanaging resource allocation at the container level. To address this issue, the pod-level resource specification feature is being introduced. In this session, we'll delve into the transition towards pod-level resource specifications, which provide an intuitive method for defining resource requests and limits at the pod level, in conjunction with the existing container-level settings. This approach offers enhanced flexibility and optimized resource utilization for a variety of workloads, including those with init containers and sidecars.

Speakers

Dixita Narang

Software Engineer, Google

Dixita Narang is a Software Engineer at Google on the Kubernetes Node team. With a primary focus on resource management within Kubernetes, Dixita is deeply involved in the development and advancement of the Memory QoS feature, which is currently in the alpha stage. She is a new contributor...

Peter Hunt

Senior Software Engineer, Red Hat

Peter Hunt is a Senior Software Engineer working at Red Hat. Passionate about free software, Peter focuses on maintaining CRI-O, attending SIG node, and ~writing~ squashing bugs. Outside of the virtual world, Peter likes collecting floral-printed pants, gardening, and dancing.

Kubecon NA 2024 Pod Resources pdf

Operations + Performance

Content Experience Level Advanced

11:00am MST

The State of Kubernetes Optimization and the Role of AI - James Wilson, nOps; Haoran Qiu, Microsoft; Katie Gamanji, Apple; Jasmine James, Square; Josh Cypher, Sonos

Friday November 15, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | 155 B

Featuring a diverse panel of experts, this session presents the latest in Kubernetes optimization. It will encourage attendees to challenge conventional wisdom and explore innovative approaches to optimization. Participants will leave with actionable knowledge and new perspectives they can apply to their own environments. Topics include:
- Valuable insights into the current state of AI in optimization, highlighting both its potential and barriers to adoption
- How and when AI can be used for real-time decision-making
- Exploring the intersection of sustainability and optimization, emphasizing the importance of visibility in driving sustainable practices
- The state of multidimensional pod autoscaling and its potential to resolve conflicts between horizontal and vertical autoscaling
- How new computing options and tools like Karpenter have the potential to disrupt the bin packing problem
- How cloud-native projects can leverage new tools to track efficiencies

Speakers

Katie Gamanji

Sr Field Engineer, Apple

Katie is a cloud native leader and practitioner, currently in a Senior Field Engineer role at Apple and a member of the CNCF TOC. As a platform engineer, Katie contributed to Conde Nast and American Express platforms and at CNCF led the End User Community. Katie is the author of the Cloud Native...

Haoran Qiu

Research Engineer, Microsoft

Haoran Qiu is a Research Engineer at Microsoft Azure Systems Research. His research interests are in cloud efficiency, ML systems, and applying ML for cloud systems design and operation. Haoran was a recipient of ML and Systems Rising Star by MLCommons in 2023. Before joining Microsoft...

Jasmine James

Head of Development Infrastructure, Square

Jasmine James is an engineering leader at Square heading the Development Infrastructure for the Devices Platform overseeing CI Infrastructure, Developer Experience, and Test Rack teams aiming to streamline development and foster continuous feedback. She is passionate about diversity...

James Wilson

VP of Engineering, nOps

James has over two decades of experience in tech, with a strong focus in leading engineering teams in building cloud-based solutions. His expertise includes container orchestration, high-speed data transport, and cloud-native architectures. Currently, he leads the engineering team...

Josh Cypher

Senior DevOps Engineer, Sonos

Josh, a Senior DevOps Engineer at Sonos, has a diverse background in quality assurance and automation. Throughout his career, he has held roles such as tester, backend developer, automation engineer, engineering manager, and head of quality before specializing in DevOps and Kubernetes...

nOps KCNA24 Template.pptx pdf

Operations + Performance

Content Experience Level Any

11:55am MST

Love thy (Noisy) Neighbor: Strategies for Mitigating Performance Interference in Cloud-Native Systems - Jonathan Perry, PerfPod

Friday November 15, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | 155 B

In cloud-native environments, application performance often degrades due to contention over shared resources such as CPU caches and memory bandwidth. Current container technologies lack mechanisms to isolate these resources, which compels operators to maintain low utilization by scaling out their deployments. This session explores strategies used by hyperscalers like Google, Microsoft, Facebook, and Alibaba to mitigate such performance interference. We will review their published methodologies, extracting key principles that could guide the development of a Kubernetes-native performance isolator. Participants will gain insights into the design trade-offs and operational impacts of these tools. Additionally, we will discuss integration strategies for deploying such isolators in existing Kubernetes environments, aiming to optimize resource utilization while preserving application performance.

Speakers

Jonathan Perry

Founder & CEO, PerfPod

Jonathan Perry is a maintainer of the OpenTelemetry eBPF network collector. His PhD research at MIT CSAIL focused on performance isolation in datacenter and cloud networks, aiming to enhance network efficiency and reduce latency. Jonathan founded Flowmill, where he developed eBPF-based...

Slides Kubecon NA'24 Love thy (Noisy) Neighbor pdf

Transcript and Slides Love thy (Noisy) Neighbor pdf

Operations + Performance

Content Experience Level Intermediate

2:00pm MST

Supercharge Your Kubernetes Autoscaling with Custom Metrics - Vamshi Krishna Samudrala & Sravan Akinapally, American Airlines

Friday November 15, 2024 2:00pm - 2:35pm MST

Salt Palace | Level 1 | 155 B

Out of the box, Kubernetes provides native horizontal scaling capabilities driven by conventional resource consumption signals like CPU and memory utilization. However, in the real world, numerous applications demand dynamic scaling orchestrated by custom business telemetry such as queue depths, throughput volumes, or other domain-specific indicators. This session will unravel the secrets of extending Kubernetes' Horizontal Pod Autoscaler (HPA) to leverage custom metrics as scaling triggers, unlocking unprecedented scaling autonomy. Attendees will witness live demos showcasing:
- Deploying a custom metrics provider to expose application-centric metrics to the Kubernetes control plane
- Configuring the HPA to consume these custom metrics for intelligent scaling decisions
- A sample application dynamically scaling based on a custom metric like queue length or requests per second
- Best practices for crafting bespoke scaling policies tailored to custom metrics
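To make the scaling decision concrete, the sketch below applies the replica formula described in the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), to a custom metric such as per-pod queue length. It is a simplified illustration, not the speakers' demo. The function name `desired_replicas` and the sample numbers are assumptions, and it ignores min/max replica bounds, stabilization windows, and pod readiness.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    tolerance: float = 0.1,
) -> int:
    """Compute the replica count the HPA formula would suggest.

    current_value: observed average of the custom metric per pod
                   (for example, queue length per pod)
    target_value:  desired average of that metric per pod
    tolerance:     ratio band around 1.0 in which no scaling happens
                   (0.1 is the documented default)
    """
    ratio = current_value / target_value
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Assumed example: 4 pods, 30 queued items per pod, target of 10 per pod.
    print(desired_replicas(4, 30, 10))
```

A metric expressed per pod is used here because an average-value target keeps the formula independent of how many pods are currently running.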

Speakers

Vamshi Krishna Samudrala

Enterprise Cloud Architect, American Airlines

Enterprise Architect with a distinguished career spanning 14 years in the fields of DevOps and Cloud Architecture. Focused on automation, configuration management and innovation with cutting-edge technologies. Worked extensively with leading cloud service providers, including Amazon...

Operations + Performance

Content Experience Level Intermediate

2:55pm MST

The Key Value of etcd Over Custom Resources: Scalability - Jef Spaleta, Isovalent at Cisco

Friday November 15, 2024 2:55pm - 3:30pm MST

Salt Palace | Level 1 | 155 B

Cilium defaults to using Kubernetes Custom Resources to hold Cilium-specific internal state. However, when the cluster is large enough, the Kubernetes API becomes a performance bottleneck. To scale a cluster to hundreds of nodes, Cilium can be configured to use a dedicated external etcd instance. This talk will discuss what the external etcd looks like from an operator perspective and explore why Cilium uses an external etcd for enhanced scalability. It will cover how to manage a cluster by bypassing the Kubernetes API and interacting only with the cluster's etcd key-value store, and also why it might be a bad idea. Get a taste of what's possible by bypassing the Kubernetes API and interacting with the etcd API directly, and learn why Cilium has an option to use a dedicated etcd deployment, not shared by the Kubernetes API, for holding Cilium state and the scalability benefits it can bring to your cluster.

Speakers

Jef Spaleta

Technical Community Advocate, Isovalent at Cisco

Jef Spaleta has more than a decade of experience in the technology industry, in roles spanning software engineering, open source contribution, IoT hardware development, operations, and most recently community advocacy at Isovalent.

Operations + Performance

Content Experience Level Any

4:00pm MST

The Node Tetris Rabbit Hole: Why Your Binpacking Might Be Underperforming - Hannah Taub, Adobe Inc.

Friday November 15, 2024 4:00pm - 4:35pm MST

Salt Palace | Level 1 | 155 B

Have you ever looked at your Kubernetes cluster and thought “I have a perfectly good autoscaler! Why are all my nodes at less than 50% capacity?” When a team moves to the scale of hundreds of clusters with thousands of nodes, efficient binpacking changes from a side task to a financial necessity. From inefficient client apps to long-buried cluster configs, follow the Adobe Ethos team as they track down leads on what’s causing cluster underutilization and how to fix it. You will also learn some tips for designing your clusters to avoid these issues in the first place.

Speakers

Hannah Taub

Ms., Adobe Inc.

As a senior software engineer, Hannah has been working with Adobe’s Cloud Cost Efficiency team for the past several years. After graduating from the University of Edinburgh, she went from writing content APIs at Viacom (now Paramount) to building out Adobe’s Ethos Kubernetes CI/CD...

node tetris rabbit hole kubecon 2024 pptx

Operations + Performance

Content Experience Level Intermediate

4:55pm MST

Service Profiling Based Management and Scheduling in K8s - Jia Deng, Cong Xu & Mingmeng Luo, Bytedance

Friday November 15, 2024 4:55pm - 5:30pm MST

Salt Palace | Level 1 | 155 B

We present an open-source solution for the efficient management of resources and scheduling strategies in K8s. Our solution constructs workload-specific resource profiles based on historical utilization patterns. This approach ensures that workloads receive adequate resources while optimizing overall resource utilization. To accomplish this, we employ a custom resource, Service Profiling Description (SPD), which establishes a direct correlation between workloads, such as Deployments and StatefulSets, and their resource usage. Resource utilization metrics, including CPU, disk I/O, and network I/O, are collected and aggregated. These usage indicators play a pivotal role in informing the scheduler's decisions about workload placement. This solution has been deployed within large-scale K8s clusters, addressing diverse workload demands, ranging from workloads that require dedicated NUMA nodes to those capable of sharing resources among themselves.

Speakers

Mingmeng Luo

Software Engineer, Bytedance

Mingmeng Luo is a software engineer in the Infrastructure Department at ByteDance, where he specializes in the design and development of precision resource management technologies for large-scale Kubernetes clusters. His work focuses on optimizing resource allocation and efficiency...

Cong Xu

Senior Software Engineer, Bytedance

Cong Xu is a Tech Lead and Senior Software Engineer at ByteDance, where he focuses on building and optimizing the container-based cloud platform that hosts internal products such as Douyin and TikTok. From 2016 to 2022, he served as a Staff Research Member at IBM Research, contributing...

Jia Deng

Software Engineer, Bytedance

The speaker currently works on ByteDance's K8s orchestration team. Before that, the speaker worked on Amazon EKSA and VMware Tanzu Mission Control.

KCNA24 2024 Service Profiling Based Resource Management and Scheduling pdf

Operations + Performance

Content Experience Level Intermediate