In-person
November 12-15

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Salt Palace | Level 1 | 155 E
Wednesday, November 13
 

11:15am MST

All Your Routes Are Ready, More or Less - Dave Protasowski
Wednesday November 13, 2024 11:15am - 11:50am MST
Gateway API is the official next-generation Kubernetes API for Ingress, Load Balancing and Service Meshes. Many proxies implement the API and pass conformance with flying colours! But what is it really like to use the API? What isn't covered by the conformance tests that end-users should know? In this talk we'll highlight our experience adopting the Gateway API in the Knative Serving project. We'll talk about the problems we encountered and how we addressed them. Come to the talk and we'll pit some implementations against each other and show some numbers!
Speakers

Dave Protasowski

Staff Engineer
Dave Protasowski is part of the Knative Technical Committee and a Serving Working Group Lead. Previously, he worked on Tanzu at VMware/Broadcom and on Cloud Foundry at Pivotal.
Salt Palace | Level 1 | 155 E
  Connectivity

12:10pm MST

Can Your Kubernetes Network Handle the Heat? Building Resilience with AI Chaos - Lior Lieberman, Google & Surya Seetharaman, Red Hat
Wednesday November 13, 2024 12:10pm - 12:45pm MST
Kubernetes networking is complex with many APIs, numerous configurations and potential failure points. In the rapidly evolving world of cloud-native applications, ensuring your Kubernetes network can withstand unexpected failures is not just an advantage—it is a necessity. In this talk Surya and Lior, holding distinct leadership roles in Gateway API and NetworkPolicy API, will demonstrate how you can leverage AI-powered Chaos Engineering to stress test Gateways, NetworkPolicies, and Services on a live cluster! They will share their experiences and lessons learned from using Litmus and enhancing K8sGPT to design and execute AI Chaos experiments, as well as focusing on how you can proactively find gaps and bottlenecks in the network infrastructure. This is a great opportunity to learn from real-world disruption scenarios and participate in a collaborative discussion on how we can leverage AI to build robust Kubernetes Networks.
Speakers

Surya Seetharaman

Principal Software Engineer, Red Hat Inc.
Surya is an Open Source advocate and contributor, active in the Kubernetes SIG-Network working group. She is working as a Principal Software Engineer at Red Hat in the OpenShift Networking team. Her areas of interest include Cloud Infrastructure and Networked Services and Systems...

Lior Lieberman

Site Reliability Engineer, Google
Lior is a site reliability engineer at Google working on Google Compute Engine. He is a leading maintainer of ingress2gateway and an active contributor to Kubernetes SIG Network, focused on Gateway API.
Salt Palace | Level 1 | 155 E
  Connectivity

2:30pm MST

Cilium, eBPF, WireGuard: Can We Tame the Network Encryption Performance Gap? - Daniel Borkmann & Anton Protopopov, Isovalent
Wednesday November 13, 2024 2:30pm - 3:05pm MST
To increase data security for cloud and hybrid cloud deployments, many companies, governments, standards, and tenders require data in transit to be protected. However, network encryption comes at a cost - what is the performance impact and how can we reduce it? In this session, we explore how network encryption can be efficiently enforced with Cilium, eBPF, and WireGuard. We dive deep into Cilium’s integration of WireGuard and elaborate on both the management plane and Cilium’s eBPF datapath. We analyze and benchmark what performance cost one can expect and explore opportunities in the Linux kernel to reduce that price. This talk is for operators and security teams that need to encrypt network traffic, but also want to minimize its overhead. The audience will walk away understanding whether network encryption needs to come at a high toll and whether there are opportunities for optimizations.
Speakers

Daniel Borkmann

Software Engineer, Isovalent at Cisco
Daniel Borkmann co-created eBPF and is a kernel developer at Isovalent working on eBPF, the Linux kernel and Cilium. He has been a core Linux kernel contributor to the eBPF and networking subsystems for over a decade and co-maintains eBPF and XDP. In his spare time, he loves to...

Anton Protopopov

Software Engineer, Isovalent at Cisco
Anton is a software engineer at Isovalent, which is now part of Cisco. Anton leads a team building a new generation of Isovalent products and also participates in developing the eBPF-based parts of the Cilium stack and eBPF support in the Linux kernel. During his career, Anton played...
Salt Palace | Level 1 | 155 E
  Connectivity

3:25pm MST

Extending the Gateway API: The Power and Challenges of Policies - Kate Osborn, NGINX
Wednesday November 13, 2024 3:25pm - 4:00pm MST
From the beginning, the Gateway API has been designed to be extensible. With over 25 implementations to date, it’s crucial that these implementations have a way to support implementation-specific features without resorting to annotations. Among the various ways to extend the Gateway API, the Policy Attachment mechanism stands out as the most potent and challenging. In this session, we will explain what Policy Attachment is and share the lessons we learned at NGINX when implementing our own Policies. You will learn about:
- The difference between direct and inherited policies.
- How policy inheritance and merging work.
- Corner cases, such as conflicting policies and invalid target refs.
- Techniques to verify if a policy has been successfully applied.
- Strategies for troubleshooting policies.
We will show you examples of Gateway API policies as well as policies from multiple Gateway API implementations.
Speakers

Kate Osborn

Senior Software Engineer, NGINX
Maintainer of NGINX Gateway Fabric. Kubernetes enthusiast since 2018.
Salt Palace | Level 1 | 155 E
  Connectivity

4:30pm MST

From Observability to Performance - Nadia Pinaeva, Red Hat & Antonio Ojea, Google
Wednesday November 13, 2024 4:30pm - 5:05pm MST
No matter how fast the Services on your Kubernetes cluster are, users would love them to be faster. But how do you get from a huge pile of metrics across a distributed system to real user experience improvements? There is a way, and with the right tools and the right approach, you can better understand and evaluate Service performance. In this talk, you'll learn how to identify the performance parameters that directly translate to user experience. We will explore how to collect performance metrics from running Kubernetes clusters without disrupting normal operations using tools like Prometheus, Grafana, kube-burner, and custom instrumentation. We will discuss how to translate the collected metrics and analysis into concrete actions and how to identify bottlenecks and implement optimizations to enhance Service performance. This talk is ideal for k8s networking developers, administrators, SREs, DevOps engineers, and anyone responsible for managing or optimizing Kubernetes networking.
Speakers

Antonio Ojea

Software Engineer, Google
Antonio Ojea is a Software Engineer at Google, where he works on Kubernetes. He is one of the top contributors to the Kubernetes project, with a strong presence in the areas of networking and reliability. He has vast experience in Open Source, networking and distributed systems...

Nadia Pinaeva

Senior Software Engineer, Red Hat
Nadia Pinaeva is a Senior Software Engineer at Red Hat working on OpenShift Networking. She collaborates with the SIG-network-policy to improve network security for Kubernetes clusters, and works on the ovn-kubernetes network plugin.
Salt Palace | Level 1 | 155 E
  Connectivity

5:25pm MST

Building Resilience for Large-Scale AI Training: GPU Management, Failure Detection, and Beyond - Ganeshkumar Ashokavardhanan, Microsoft & Ace Eldeib, Cohere
Wednesday November 13, 2024 5:25pm - 6:00pm MST
As AI training scales to thousands of GPUs across hundreds of machines, hardware failure becomes an expensive risk. From GPU faults to network performance degradation, undetected problems can sabotage training jobs, inflating costs and slowing development. This talk dives into failure and orchestration challenges in the context of ML training, particularly distributed training. We will explore the spectrum of GPU issues and why even minor performance drops can cripple large jobs. Learn how observability (leveraging tools like NVIDIA DCGM) enables proactive problem detection through GPU health checks. Understand principles of fault-tolerant distributed training to mitigate GPU failure fallout. Drawing on experience from cloud providers and training large language models, we will share best practices for efficient identification, remediation, and prevention of GPU failures.
Speakers

Ganeshkumar Ashokavardhanan

Software Engineer, Microsoft
Ganesh is a Software Engineer on the Azure Kubernetes Service team at Microsoft, working on node lifecycle, and is the lead for the GPU workload experience on this Kubernetes platform. He collaborates with partners in the ecosystem like NVIDIA to support operator models for machine...

Ace Eldeib

Staff Software Engineer, Cohere
Ace is a Staff Software Engineer at Cohere working on training and serving infrastructure for large language models. Prior to that, he worked on Azure Kubernetes Service and ran self-managed Kubernetes for other Azure services.
Salt Palace | Level 1 | 155 E
  AI + ML
 
