KubeCon + CloudNativeCon North America 2024: Full Schedule

In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.

11:00am MST

Harnessing the Power of Envoy Proxy for Building an LLM Gateway - Idit Levine, Solo.io

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | 155 EF

As the demand for LLMs continues to soar, the need for secure, cost-conscious, and content-aware control over their usage is paramount. In this talk, we explore why Envoy Proxy is the optimal choice for building an LLM gateway, leveraging its unique architecture and capabilities. Unlike traditional proxies (e.g. NGINX), which rely on scripting languages for customization, Envoy Proxy stands out due to its extensibility features: filter architecture, callout architecture (ext-proc, ext-auth), and ability to dynamically load libraries. Combined with its high-performance, async core (C++), Envoy can run as an ingress, egress and mesh gateway. We'll look at using Envoy Proxy for LLM credential management, prompt guarding/decorating, analyzing content safety, usage controls, context-aware failover, and observability. Ideal for developers, architects, and tech enthusiasts looking to solve challenges around LLM usage and picking the right technologies for their platform infrastructure.

Speakers

Idit Levine

Founder & CEO, Solo.io

Idit Levine is the founder and CEO of Solo.io, a company that creates open-source tools to assist enterprises in adopting and extending innovative cloud-native technologies while modernizing their existing IT investments. Solo.io is a top contributor to CNCF projects such as Envoy...

Connectivity

Content Experience Level Intermediate

11:00am MST

Cooperative Scheduling for Stateful Systems - Michael Youssef & Laxman Prabhu, LinkedIn

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | Grand Ballroom GI

At LinkedIn, we develop many stateful systems and run them on tens of thousands of machines in our datacenters. As we moved LinkedIn’s infrastructure to Kubernetes, we quickly realized that StatefulSet was not going to be enough to support running critical stateful systems and satisfy the safety and durability goals of the teams developing them. We've built first-class support for running stateful workloads on bare metal where the stateful systems can coordinate with Kubernetes to stay available and ensure durability. With our design, we support planned and unplanned maintenance and hardware swaps, and allow stateful systems to customize their rollout policies natively on Kubernetes. This talk covers:
- Our LiStatefulSet API.
- How we allow apps to customize safety checks and deployment policies via an ApplicationClusterManager, our pluggable policy engine.
- The ApplicationClusterManager protocol that allows coordination of the lifecycle of workloads with Kubernetes.

Speakers

Laxman Prabhu

Staff Software Engineer, Systems Infrastructure, LinkedIn

Michael Youssef

Staff Software Engineer, LinkedIn

Michael is a Staff Software Engineer at LinkedIn, currently making management and deployment of sharded systems a touch less painful on Kubernetes. In his free time he enjoys spending time with his cat, inhaling chocolate, and playing tennis.

Data Processing + Storage

Content Experience Level Intermediate

11:00am MST

Kubernetes Workspaces: Enhancing Multi-Tenancy with Intelligent Apiserver Proxying - James Munnelly & Andrea Tosatto, Apple

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 2 | 255 EF

Multi-tenancy in Kubernetes means sacrificing essential features like cluster-scoped list/watches and multi-namespace/cluster-scoped RBAC. This often leads to additional complexity when configuring operators and forces discrepancies and friction with cluster-as-a-service type offerings. In this talk we will go through a demonstration of an intelligent Kubernetes apiserver proxy that introduces the concept of a ‘workspace’. Borrowing the name from the KCP project, a Workspace is a virtual apiserver endpoint that provides a ‘cluster-scoped’ view over a group of namespaces in a remote cluster. We’ll then go on to discuss optimisations and changes that we’d like to make within Kubernetes to better support apiserver proxying for multi-tiered caching, routing and scoping purposes.

Speakers

James Munnelly

Staff Field Engineer, Apple

James Munnelly is a Field Engineer at Apple, helping customers adopt and adapt Kubernetes, and driving adoption of OSS cloud native technologies. James is also the founder of the cert-manager project, a Kubernetes extension for managing x509 certificates. He's an active member of...

Andrea Tosatto

Site Reliability Engineer, Apple

Andrea works at Apple as a Site Reliability Engineer. His day-to-day job consists of managing the lifecycle and ensuring the reliability of a multi-tenant compute platform built on top of Kubernetes. He is deeply passionate about multi-tenancy and any related topic, ranging from runtime...

Emerging + Advanced

Content Experience Level Intermediate

11:00am MST

Navigating the Cgroup Transition: Bridging the Gap Between Kubernetes and User Expectations - Sohan Kunkerkar, Red Hat Inc

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | 155 BC

As Kubernetes and container technologies evolve, shifting from cgroup v1 to cgroup v2 has become a pivotal development. With cgroup v2 available in Kubernetes since v1.25, we're at a crossroads where many users and organizations must decide when and how to transition fully to this new system. Despite the benefits of cgroup v2, including better resource management and enhanced capabilities, users frequently encounter unexpected challenges signaling a gap in readiness and understanding. This talk will address the practical implications of moving to cgroup v2, discuss the coordinated efforts to deprecate cgroup v1, and propose actionable strategies to bridge the gap between the Kubernetes community, system administrators, and developers. By focusing on real-world experiences and providing clear guidance, this session aims to equip you with the knowledge and tools to navigate this significant change confidently.

Speakers

Sohan Kunkerkar

Senior Software Engineer, Red Hat Inc

Sohan Kunkerkar is a Senior Software Engineer at Red Hat, bringing expertise in distributed systems, backend engineering, and containers. His active contributions extend to CRI-O, a container runtime engine, and various sub-projects within the Kubernetes SIG Node community. Sohan...

Operations + Performance

Content Experience Level Intermediate

11:00am MST

How We Made OpenTelemetry Be Our Fitness Tracker for Your CI/CD Pipelines! - Nicolas Woerner, Clario & Andreas Grabner, Dynatrace

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 2 | 250

CI/CD pipelines are the heartbeat of modern cloud-native software delivery. Healthy pipelines ensure rapid and continuous deployments every time code gets committed to the Git repositories! Every new repository and commit puts more load on the CI/CD tool making it more challenging to keep this crucial heartbeat healthy! In this session, engineers from Clario will demonstrate how they leverage OpenTelemetry to observe, validate, report and optimize their CI/CD pipelines, keeping their deployments healthy despite increased scale and unlocking the full potential of modern software delivery on Kubernetes with GitLab.

Speakers

Andi Grabner

CNCF Ambassador and DevRel, Dynatrace

Andreas Grabner (@grabnerandi) has 20+ years of experience as a software developer, tester and architect and is an advocate for high-performing cloud-scale applications. He is a CNCF ambassador, contributor to the CNCF project Keptn and a DevRel for Dynatrace. Andreas is also a regular...

Nicolas Woerner

Associate DevOps Engineer, Clario

Nicolas Wörner works in the Platform Engineering Team at Clario. With a background in software and DevOps engineering he focuses on continuously enhancing the software delivery workflow at Clario. Nicolas is passionate about leveraging CNCF software to drive efficiency and reliability...

SDLC

Content Experience Level Intermediate

11:55am MST

How to Move from Ingress to Gateway API with Minimal Hassle - Keith Mattix, Microsoft

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | 155 EF

For many, the Ingress resource was one of the first Kubernetes APIs they used, adding HTTP routing rules and SSL certs for cluster-external traffic. These APIs are used for production in clusters across the world today, configuring ingress gateways serving hundreds of thousands of connections per second. As of October 2023, the Ingress API has been superseded by the Gateway API, a new set of Kubernetes resources with over 20 implementations that enforces security best practices by design. However, migrating networking APIs is an intimidating task, and doing so safely is every company’s primary concern. Join this session to learn how to make this migration safe by identifying the best migration path, implementing Gateway API best practices, and utilizing community-supported migration tools such as ingress2gateway.

Speakers

Keith Mattix

Senior Software Engineering Lead, Microsoft

Keith Mattix is an Engineering Lead at Microsoft focused on Istio, Gateway API, and other networking projects.

Connectivity

Content Experience Level Intermediate

11:55am MST

Database DevOps: CD for Stateful Applications - Stephen Atwell, Harness.io & Christopher Crow, Pure Storage

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | Grand Ballroom GI

Running stateful applications on Kubernetes can provide many of the same advantages as stateless applications. In this talk, Stephen and Chris will share some thoughts on managing stateful applications as part of a CD pipeline so that applications, and their data, can be versioned and deployed safely and repeatedly. This talk will discuss managing persistent data within Kubernetes, as well as managing structural changes to a database as part of a CD process. With Kubernetes and Liquibase, we can provide something better than before: a more testable, repeatable, and open way to deploy stateful applications. This talk features a practical demo of how CD tooling can empower users to automate data migrations within Kubernetes.

Speakers

Christopher Crow

Technical Marketing Engineer, Pure Storage

Chris Crow works as a cloud architect at Portworx. He previously worked as an education systems administrator. He is a lifelong open-source enthusiast.

Stephen Atwell

Principal Product Manager, Harness.io

With over 26 years of technology experience, Stephen focuses on solving problems encountered in his previous roles. He has worn hats ranging from network administrator, to database administrator, to software engineer, to product manager. Outside of work, Stephen develops open source...

Data Processing + Storage

Content Experience Level Intermediate

11:55am MST

Multi-Zone Clusters Inside and Out - Tom Dean, Buoyant

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | 155 BC

Multi-zone clusters are a great tool for improving application reliability — and also a great way to spend a ton of cash. Why? What really happens when you set these things up? How do you use them effectively without bankrupting your whole organization? In this session, we'll dig into the nuts and bolts of what goes on under the hood of a multi-zone cluster, including what a zone is, what Kubernetes understands about zones, how zones affect routing, and why multi-zone clusters can drive costs up. We'll spend some time on Kubernetes' Topology Aware Routing, covering its advantages as well as its very real limitations. Finally, we'll dive into how you can influence Kubernetes' choices to take advantage of multi-zone clusters' reliability while containing costs. Join us for learning and live demos!

Speakers

Tom Dean

Field Engineer, Buoyant

Tom Dean started programming BASIC on Apple IIs over 40 years ago, and has been hooked on tech since then. A long-time user of Linux and Open Source, he has been expanding his Cloud, Cloud Native and adjacent subject matter knowledge to become a more well-rounded technologist, and...

Operations + Performance

Content Experience Level Intermediate

11:55am MST

From Chaos to Calm: Building a Unified and Scalable CI/CD Pipeline at Akamai - Tomer Patel, Akamai Technologies Inc.

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 2 | 250

Are you struggling with a chaotic development process? Join Akamai's talk and discover how we built a unified and scalable CI/CD pipeline, saving 40% of our QA, Performance, Dev, and Ops daily work, and how you can do that in your organization! This session dives into the pipeline's architecture, key features, and impact on development efficiency. You will learn how to:
- Conquer cloud-native deployments by adding the right tools, such as Argo Rollouts and Backstage.
- Integrate CI/CD tools (ArgoCD, Jenkins, DevSpace, Grafana, Prometheus, Thanos) for a smoother workflow.
- Leverage best-of-breed, cost-efficient open-source solutions.

Speakers

Tomer Patel

Senior Engineering Manager, Akamai Technologies Inc.

Tomer currently works as Senior Engineering Manager at Akamai Technologies, where he leads a group of Data engineers, Software developers and DevOps at scale. Previously Tomer worked as Team Lead at Clarizen (Now Planview).

SDLC

Content Experience Level Intermediate

11:55am MST

What Agent to Trust with Your K8s: Falco, Tetragon or KubeArmor? - Henrik Rexed, Dynatrace

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | 151

In the CNCF landscape we have plenty of eBPF-based security solutions that help us protect our k8s clusters from runtime vulnerabilities. On paper, though, Falco, Tetragon and KubeArmor look very similar. Eventually you have to choose which one best fits your needs. Join this session for additional insights to inform your decision. We have run extensive benchmarks against those three solutions and will answer the following questions that came out of our testing:
- What are the different feature sets?
- What about the performance impact of each agent?
- Which privileges does each solution need?
- What are the pros and cons across the three options?

Speakers

Henrik Rexed

Cloud Native Advocate, Dynatrace

Henrik is a Cloud Native Advocate at Dynatrace, the leading Observability platform. Prior to Dynatrace, Henrik worked for more than 15 years as a Performance Engineer. He is also one of the organizers of the conferences WOPR and KCD Austria and the owner of the YouTube...

Security

Content Experience Level Intermediate

2:30pm MST

Unlocking Potential of Large Models in Production - Yuan Tang, Red Hat & Adam Tetelman, NVIDIA

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | Hall DE

The recent paradigm shift from traditional ML to GenAI and LLMs has brought with it a new set of non-trivial LLMOps challenges around deployment, scaling, and operations that make building an inference platform to meet all business requirements an unsolved problem. This talk highlights these new challenges along with best practices and solutions for building out large, scalable, and reliable inference platforms on top of cloud native technologies such as Kubernetes, Kubeflow, KServe, and Knative. Which tools help effectively benchmark and assess the quality of an LLM? What type of storage and caching solutions enable quick auto-scaling and model downloads? How can you ensure your model is optimized for the specialized accelerators running in your cluster? How can A/B testing or rolling upgrades be accomplished with limited compute? What exactly do you monitor in an LLM? In this session we will use KServe as a case study to answer these questions and more.

Speakers

Yuan Tang

Principal Software Engineer, Red Hat

Yuan is a principal software engineer at Red Hat, working on OpenShift AI. Previously, he has led AI infrastructure and platform teams at various companies. He holds leadership positions in open source projects, including Argo, Kubeflow, and Kubernetes. He's also a maintainer and...

Adam Tetelman

Principal Product Architect, NVIDIA

Adam Tetelman is a principal architect at NVIDIA leading cloud native initiatives and CNCF engagements across the company; building inference platforms for NVIDIA AI Enterprise and DGX Cloud. He has degrees in computational robotics, computer & systems engineering, and cognitive science...

AI + ML

Content Experience Level Intermediate

2:30pm MST

How the Tables Have Turned: Kubernetes Says Goodbye to Iptables - Casey Davenport, Tigera & Dan Winship, Red Hat

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | 155 EF

For decades, iptables has been the preferred packet filtering system in the Linux kernel. Used extensively across the Kubernetes networking ecosystem, iptables is now on the way out and is expected to be removed from the next generation of Linux distributions. With iptables past its prime, where does that leave Kubernetes? The successor to iptables -- nftables -- is ready to carry the torch instead, with a newly released beta kube-proxy implementation in v1.31 and network policy using Calico’s nftables backend. In this talk, Dan and Casey will share what they have learned building Kubernetes Service and NetworkPolicy implementations using nftables. They will cover the history and current status of iptables usage in Kubernetes, the capabilities and performance characteristics of Kubernetes networks running on nftables, and why eBPF may not be the right tool for the job.

Speakers

Casey Davenport

Casey Davenport, Tigera

Casey is a core developer on Calico and has been building Kubernetes networking systems since 2016.

Dan Winship

Senior Principal Software Engineer, Red Hat

Dan is a Tech Lead for Kubernetes SIG Network and has been working on Kubernetes and OpenShift networking for 7 years at Red Hat.

Connectivity

Content Experience Level Intermediate

2:30pm MST

Distributed Cache Empowers AI/ML Workloads on Kubernetes Cluster - Yuichiro Ueno & Toru Komatsu, Preferred Networks, Inc.

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | Grand Ballroom GI

Today, storage technologies play a fundamental role in the realm of AI/ML. Read performance is essential for swiftly moving datasets from storage to AI accelerators. However, the rapid improvement in AI accelerator performance often outpaces I/O, which then bottlenecks training. Because Kubernetes schedules pods across multiple nodes, using node-local storage effectively is a challenge. To address this, we introduce a distributed cache system built atop node-local storage and designed for AI/ML workloads. This cache system has been successfully deployed on our on-premises Kubernetes cluster of 1024+ GPUs within a multi-tenant environment. Throughout our two years of operating this cache system, we have overcome numerous hurdles across several components, including the I/O library, load balancers, and the storage backend. We will share the challenges and the solutions we implemented, leading to a system delivering 50+ GB/s throughput and less than 2ms latency.

Speakers

Toru Komatsu

Engineer, Preferred Networks, Inc.

Toru is a machine learning platform engineer at Preferred Networks in Japan. He is the creator and lead developer of youki, an OCI Runtime in Rust, and a maintainer of the OCI Runtime Specification. He also serves as a reviewer for runwasi and is involved in developing a world that utilizes containers and Wasm. In addition, he is a member of the Kubernetes org and is especially interested in...

Yuichiro Ueno

Engineer, Preferred Networks, Inc.

He is currently a machine learning platform engineer at Preferred Networks in Japan. His research and engineering interests include a range of high-performance computing (distributed deep learning, networking/RDMA, and storage technologies), performance engineering, and Kubernete...

Data Processing + Storage

Content Experience Level Intermediate

2:30pm MST

Low-Overhead, Zero-Instrumentation, Continuous Profiling for OpenTelemetry - Christos Kalkanis, Elastic

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | Grand Ballroom HJ

Elastic has recently donated its whole-system continuous profiling agent to OpenTelemetry. After a thorough community review process, the donation was enthusiastically accepted. Leveraging eBPF, the profiling agent provides unprecedented visibility into the runtime behavior of all applications: it builds stacktraces that go from the kernel to userspace native code, all the way into code running in higher-level runtimes, enabling users to identify performance regressions, reduce wasteful computations, and debug complex issues faster. This session will explore:
- Benefits of eBPF-based continuous profiling compared to conventional approaches that rely on application instrumentation
- How the agent builds profiles that seamlessly span kernel, native code and most widely used application runtimes
- Integration with the rest of OpenTelemetry: OTLP and Collector

Speakers

Christos Kalkanis

Principal Software Engineer, Elastic

Christos is the technical lead for the edge collection group at Elastic, a maintainer for the OpenTelemetry Profiling SIG and a co-author of the donated OpenTelemetry profiling agent previously known as the Elastic Universal Profiling agent. After more than a decade of focusing on...

Observability

Content Experience Level Intermediate

2:30pm MST

One Inventory to Rule Them All: Standardizing Multicluster Management - Corentin Debains, Google & Ryan Zhang, Microsoft

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | 155 BC

Most Kubernetes users run more than one cluster, and some run hundreds or more. Crossing cluster boundaries has always been a challenge, because most Kubernetes APIs, tools, and operators are cluster-centric. In fact, there’s a remarkable lack of standard tools and patterns for multi-cluster. Over time users have found ways to stitch clusters together but the community has been asking for standardization. To share multi-cluster tools, Kubernetes sig-multicluster has introduced the “ClusterProfile” API, a critical building block for multi-cluster capabilities. This API provides a canonical way for multicluster controllers and users to iterate over clusters, and to install or manage multi-cluster features. In this talk, we will look at some of the problems inherent to multi-clustering, explain the concepts introduced by this new API and look at implementations and consumers of it. We dive into real-life examples of patterns and usage, with products such as Kueue, ArgoCD, and Argo Workflows.

Speakers

Ryan Zhang

Principal Software Engineering Manager, Microsoft

Dr. Ryan Zhang is a Principal Software Engineering Manager at Microsoft, working on the Azure Kubernetes Service team. Ryan has been working on Cloud Native open source projects for the past few years including CloudEvents, Open Application Model (OAM) and multi-cluster related initi...

Corentin Debains

Software Engineer, Google

Corentin Debains is a software engineer at Google working on the GKE Fleet (multicluster platform). He is an active member of Kubernetes’ special interest group sig-multicluster.

Operations + Performance

Content Experience Level Intermediate

2:30pm MST

Mastering Cell-Based Architecture: Practical Solutions and Best Practices - Shweta Vohra, Booking.com & Asanka Abeysinghe, WSO2

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 2 | 250

Are you struggling to validate your cell boundaries or facing challenges with greenfield versus brownfield cell-based architectures (CBA)? Do you find it difficult to define enterprise-wide cell boundaries or wish there were best practices to guide you? If these pain points sound familiar, this session is tailored for you. In this talk, we will first guide you through the process of defining an enterprise-wide cell-based architecture for your organization or context. Then we will explore best practices for greenfield, brownfield, and hybrid cell implementations using CBA. By translating common user challenges into actionable implementation references, we aim to elevate your understanding of CBA with real-world use cases and best practices. This session will also cover best practices for the data, security, application, and infrastructure layers, ensuring a comprehensive approach to CBA implementation. Join us to take your knowledge of CBA to the next level!

Speakers

Asanka Abeysinghe

CTO, WSO2

Asanka, WSO2's CTO, is a technology visionary with over 20 years of experience designing and implementing scalable distributed systems, microservices, and business integration solutions. He advances WSO2's corporate reference architecture, collaborates with customers and industry...

Shweta Vohra

Enterprise Architect, Booking.com

Shweta is an Enterprise Architect and a Cloud Navigator! 🚀 A seasoned architect with a vast toolkit in Cloud, Platforms, Data, and ML technologies, she has spent over two decades crafting solutions across various domains and complexity levels. She is a frequent conference speaker...

SDLC

Content Experience Level Intermediate

2:30pm MST

From Standards to Practice: The Journey to Container Maturity - Carmen Chow & Thomas Robinson, Yelp

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | 151

Yelp runs tens of thousands of Docker containers in Kubernetes. How do we track their vulnerabilities, baseline their security needs, and prioritize our most critical findings? Security standards change constantly, so we need a robust model of container maturity to guide our adoption of these standards in a way that addresses Yelp’s specific needs and risk tolerance. Finally, to maximize our model’s value, over 1,000 engineers must understand its practical guidance well enough to apply it to their daily work. This talk covers designing and incorporating a container maturity model into Yelp’s development lifecycle, along with our strategy for proactively improving our security posture. We believe our experiences will assist others in creating similar models that work for their organizations, help evaluate and assess risks to their own containers, and drive next steps towards future risk evaluation platforms.

Speakers

Carmen Chow

Software Engineer, Yelp

Carmen Chow is a Software Engineer on Yelp’s Infrastructure Security team, where she has worked on cost modeling, data lifecycle tools, and Kubernetes observability. Previously, she was an infrastructure developer responsible for containerizing services and migrating them to Kubernetes...

Thomas Robinson

Software Engineer, Yelp

Tom is a software engineer living near Seattle, Washington. Having previously worked in security research and antivirus software, he's spent the last decade helping keep Yelp secure.

Security

Content Experience Level Intermediate

3:25pm MST

Kubernetes Multi-Cluster Networking 101 - Niranjan Shankar, Microsoft & Ram Vennam, Solo.io

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | 155 EF

You’ve (somewhat) grasped the networking model of a single Kubernetes cluster. But how do you enable Pods to communicate across clusters? How do service discovery and DNS work for a multi-cluster setup? How do you secure inter-cluster traffic and manage certificates? Not sure? Don’t worry - this session will have the answers. We’ll start by outlining the core requirements for workloads to communicate across clusters. You’ll then learn some common multi-cluster networking topologies, like flat and multi-network setups, and how inter-cluster connectivity and IP address management differ for each of them. Finally, we’ll cover some popular tools for managing and securing traffic between clusters, like service mesh, CNIs, and gateways, and discuss their use cases. You’ll leave this session with a solid understanding of fundamental terms and concepts - like virtual network peering, external DNS, trust domains, etc. - needed for navigating the multi-cluster networking landscape.

Speakers

Ram Vennam

Solutions Engineer, Solo.io

Ram Vennam is the Director of Solutions Engineering at Solo.io where he helps companies design and build highly scalable, resilient, distributed systems with the latest cloud-native technology. Previously, he was at IBM where he was a Technical Product Manager and Developer Advocate...

Niranjan Shankar

Software Engineer, Microsoft

Niranjan Shankar is a software engineer at Microsoft working on the Istio-based service mesh add-on for Azure Kubernetes Service (AKS). He has experience with multi-cluster operations, edge traffic management and security, GitOps-based patterns, and policy enforcement with Kubernetes...

Connectivity

Content Experience Level Intermediate

3:25pm MST

Elastic Data Streaming: Autoscaling Apache Kafka - Jakub Scholz, Red Hat

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | Grand Ballroom GI

Autoscaling is an important part of modern cloud-native architecture. It allows applications to handle a big load at peak times while helping to optimize costs and make deployments more green and sustainable at the same time. Apache Kafka is well known for its scalability. It can grow with your project from a small cluster up to hundreds of brokers. But it was not very elastic for a long time and using dynamic autoscaling with it was very hard. This talk will guide the attendees through the main challenges of auto-scaling Apache Kafka on Kubernetes. It will show how these challenges can be solved with the help of new features added recently in Strimzi and Apache Kafka projects such as auto-rebalancing, node pools, or tiered storage. And it will help the users get started with the auto-scaling of Apache Kafka.

Speakers

Jakub Scholz

Senior Principal Software Engineer, Red Hat

Jakub works at Red Hat as Senior Principal Software Engineer. He has long-term experience with messaging and currently focuses mainly on Apache Kafka and its integration with Kubernetes. He is one of the maintainers of the Strimzi project which provides tooling for running Apache...

Data Processing + Storage

Content Experience Level Intermediate

3:25pm MST

Load-Aware GPU Fractioning for LLM Inference on Kubernetes - Olivier Tardieu & Yue Zhu, IBM

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 2 | 255 EF

As the popularity of Large Language Models (LLMs) grows, LLM serving systems face challenges in efficiently utilizing GPUs on Kubernetes. In many cases, dedicating an entire GPU to a small or unpopular model is wasteful; however, understanding the relationship between request load and resource requirements has been difficult. This talk will study GPU compute and memory requirements for LLM inference servers, like vLLM, revealing an analytical relationship between key configuration parameters and performance metrics such as throughput and latency. This novel understanding makes it possible to decide, at deployment time, on an optimal GPU fraction based on the model's characteristics and estimated load. We will demo an open-source controller capable of intercepting inference runtime deployments on Kubernetes to automatically replace requests for whole GPUs with fractional requests using MIG (Multi-Instance GPU) slices, increasing density, and hence LLM sustainability, without sacrificing SLOs.

Speakers

Olivier Tardieu

Principal Research Scientist, Manager, IBM

Dr. Olivier Tardieu is a Principal Research Scientist and Manager at IBM T.J. Watson, NY, USA. He joined IBM Research in 2007. His current research focuses on cloud-related technologies, including Serverless Computing and Kubernetes, as well as their application to Machine Learning...

Yue Zhu

Research Scientist, IBM Research

Dr. Yue Zhu is a Research Scientist at IBM Research specializing in foundation model systems and distributed storage systems. Yue obtained a Ph.D. in Computer Science from Florida State University in 2021 and has consistently contributed to sustainability for foundation models and...

Emerging + Advanced

Content Experience Level Intermediate

3:25pm MST

Measuring All the Costs with OpenCost Plugins - Alex Meijer, Stackwatch

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | Grand Ballroom HJ

The CNCF OpenCost project is approaching 5,000 stars on GitHub and has become one of the most popular cost monitoring systems in use. Originally focused on cloud provider and Kubernetes cost monitoring, OpenCost expanded its scope in May 2024 by launching OpenCost Plugins with Datadog as the first reference implementation. These plugins allow users to measure and visualize virtually any cost in OpenCost, without writing a single line of OpenCost code. Alex Meijer, OpenCost and OpenCost Plugins maintainer, will speak on how the OpenCost Plugins ecosystem works and will dive into the use of the open-source FOCUS spec in OpenCost, which is the key to being able to measure nearly any cost. A plugin-enabled OpenCost deployment will be demoed, with an external cost (Datadog) visualized alongside the traditional Kubernetes and cloud provider costs. Alex will also share how to get started with plugins so that users can start analyzing the costs of whatever matters to their unique use case!

Speakers

Alex Meijer

Staff Software Engineer, Stackwatch

Alex Meijer has been working with Kubernetes for his entire career, at various times as a user, as an operator, and currently as someone working to help others use Kubernetes better. He has served in startups ranging in size from 5 to 90 people. Alex contributes to the OpenCost project...

Observability

Content Experience Level Intermediate

3:25pm MST

From Chaos to Harmony, Transforming ML Engineering: A Kubernetes Adoption Journey

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | Grand Ballroom BDF

Ekstra Bladet’s Data Science team started out as a small team of ML engineers who needed to deliver quickly without deep technical infrastructure knowledge, and ended up with a rigid and proprietary ML pipeline built from AWS components and triggered by a large and chaotic Infrastructure as Code project. This made it difficult to achieve freedom and required a lot of work to implement and debug. One of the key reasons for adopting Kubernetes for our ML team emerged when we realized that we should serve all stakeholders across the JP/Politikens Hus organization, not just Ekstra Bladet. We then chose Kubernetes as our container infrastructure, which transformed the ML team into a dynamic ML ecosystem with great freedom under responsibility.

Initially, we focused on building robust frameworks for training ML models and deploying them as API services. Today, our ML team operates at the forefront of innovation, where we embrace GitOps principles to streamline our machine learning platform. Through careful adoption of advanced techniques such as autoscaling, scheduling, event triggers, and dynamic service deployment, we ensure seamless integration of new ML models into our infrastructure. This evolution has allowed us to effectively meet our diverse needs, while maintaining agility and scalability in our ML operations.

Speakers

Paris Nakita Kejser

Cloud Engineer, JP Politikens Hus

As a certified Cloud Engineer specializing in AWS and Kubernetes, I'm integral to Ekstra Bladet’s Data Science team. My focus lies in optimizing cloud infrastructure, integrating AWS and Kubernetes setups, and driving technological advancements. I contribute to Ekstra Bladet's digital...

Platform Engineering

Content Experience Level Intermediate

4:30pm MST

Microsegment Your Network Like Mastercard with AdminNetworkPolicy - John Zaiss, Mastercard & Surya Seetharaman, Red Hat

Thursday November 14, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 1 | 155 EF

Do you manage Kubernetes clusters and need to enforce airtight workload security on a cluster-wide level? This is vital in the Financial Services industry to comply with the PCI Data Security Standard. Mastercard was looking for a built-in Kubernetes solution enabling admins to govern network access between workloads at scale. While exploring different options, they found namespace-scoped NetworkPolicies but wanted to avoid duplicating policies for each namespace. When Kubernetes SIG-Network added AdminNetworkPolicies in v1.25, Mastercard found what they needed! In this session, we will introduce AdminNetworkPolicy and demonstrate applying granular, non-overridable network controls on a live cluster for multi-tenant isolation. Join us to learn how Mastercard is securing microservices in production based on the principle of least privilege and zero trust. We will also share our operational challenges and lessons learnt. Attendees will gain actionable strategies to secure clusters.

Speakers

John Zaiss

Principal Software Engineer, Mastercard

As a Principal Engineer, John brings extensive expertise in Kubernetes, automation, cloud identity architecture, server architecture, VMware ESX, mobile device management, and IT strategy. He is a seasoned information technology professional with a BS in Cybersecurity and an MS in...

Surya Seetharaman

Principal Software Engineer, Red Hat Inc.

Surya is an Open Source advocate and contributor, active in the Kubernetes SIG-Network working group. She is working as a Principal Software Engineer at Red Hat in the OpenShift Networking team. Her areas of interest include Cloud Infrastructure and Networked Services and Systems...

Connectivity

Content Experience Level Intermediate

4:30pm MST

Per-Node Api-Server Proxy: Expand the Cluster's Scale and Stability - Weizhou Lan & Iceber Gu, DaoCloud

Thursday November 14, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 1 | 155 BC

In many CNCF projects, various DaemonSets synchronize data from the API server simultaneously on every node. In large-scale clusters especially, this puts significant pressure on the API server, burdens the network, and can even affect the stability of the cluster. Some projects have implemented optimizations to address this. For instance, Cilium aggregates endpoint information into the CiliumEndpointSlice CRD before distributing it to its DaemonSet. However, many projects have not yet adopted such data aggregation optimizations, and there is still no project that helps improve communication between all components and the API server. Clusterpedia supports launching per-node API server proxies that serve all local Pods, and uses eBPF to resolve the API server's clusterIP to the local proxy, which transparently redirects API server access on demand. In large-scale clusters, this can significantly improve the stability of all of the cluster's services.

Speakers

Iceber Gu

Software Engineer, DaoCloud

Senior open source enthusiast, focused on cloud runtime, multi-cloud and WASM. I am a CNCF Ambassador and founded Clusterpedia and promoted it as a CNCF Sandbox project. I also created KasmCloud to promote the integration of WASM with Kubernetes and contribute it to the WasmCloud...

Weizhou Lan

Senior Tech Lead, DaoCloud

Weizhou Lan has 13+ years of engineering experience and has worked with Kubernetes since 2018. Weizhou is a senior tech lead at DaoCloud focusing on private cloud, a speaker at KubeCon NA/EU and KCD China, a Program Committee Member for KubeCon, the initiator and maintainer of the CNCF sandbox project...

Operations + Performance

Content Experience Level Intermediate

4:30pm MST

Mish-Mesh: Abusing the Service Mesh to Compromise Kubernetes Environments - Hillai Ben-Sasson & Nir Ohfeld, Wiz

Thursday November 14, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 1 | 151

Service mesh solutions are common components in almost every large Kubernetes environment. Many engineers and security teams have adopted solutions like Linkerd and Istio to better segment and isolate their Kubernetes networks. In this talk, we will demonstrate how we were able to exploit common misconfigurations and insecure features in popular service mesh solutions, to escalate low-severity vulnerabilities to critical service takeovers. Our real-life examples include several major cloud service providers, where these vulnerabilities allowed us to gain unauthorized access to internal systems and sensitive secrets. This talk will help engineers understand whether their service mesh deployment acts as a proper security barrier, and how to make sure that it does. Security teams – both attackers and defenders – will learn new techniques for hacking Kubernetes environments, and how to properly defend against them.

Speakers

Hillai Ben-Sasson

Nir Ohfeld

Security Researcher, Wiz

Nir Ohfeld is a 25-year-old senior security researcher at Wiz. Ohfeld focuses on cloud-related security research and specializes in research and exploitation of cloud service providers, web applications, application security, and in finding vulnerabilities in complex high-level systems...

Security

Content Experience Level Intermediate