Attending this event?
In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Thursday, November 14
 

11:00am MST

Shifting Gears: Leveraging CNCF Tools to Streamline Operations at Toyota Connected - Benson Phillips & Rob Heckel, Toyota Connected
Thursday November 14, 2024 11:00am - 11:35am MST
In the evolving landscape of cloud-native ecosystems, aligning teams and standardizing practices is crucial for operational excellence. At Toyota Connected, we faced significant challenges due to inconsistent practices and fragmented collaboration across departments. To address this, we adopted a suite of CNCF tools including ArgoCD, Backstage, Harbor, External Secrets Operator, and OpenCost. This session will delve into our journey of implementing these tools to unify our approach, streamline workflows, and enhance cross-team collaboration. Attendees will gain insights into the practical application of these tools, our successes and failures, and the substantial reduction in time to market achieved. By focusing on the integration of technical solutions and effective team practices, we aim to foster a cohesive and efficient cloud-native environment. This presentation provides actionable strategies for leveraging CNCF tools to drive innovation and excellence in your organization.
Speakers

Benson Phillips

Platform Architect, Toyota Connected
Software oriented, primarily working with cloud native computing. But my interests do not stop there as my love for technology is boundless.

Rob Heckel

Platform Architect, Toyota Connected North America
Rob has over 15 years in technology, specializing in open source and developer enablement. As a Platform Architect for Toyota Connected, he enhances DevOps, SDLC, and SRE practices. He has led the creation of an internal developer platform, streamlined tool integrations, and promoted...
Salt Palace | Level 2 | 255 BC
  Cloud Native Experience
  • Content Experience Level Any

11:00am MST

Lessons Learned Adopting OpenTelemetry at Scale - Alex Arnell, Heroku / Salesforce
Thursday November 14, 2024 11:00am - 11:35am MST
OpenTelemetry makes bold promises to unlock and unleash your observability, providing you with open standards, no vendor lock-in and interoperability with just about everything. You believe that your organization could really benefit from an uplift to modern observability. It would be easy to adopt if you were starting out fresh, but let’s face it, most organizations have sprawling codebases and architectures, with decisions, infrastructure and often engineers that have been in place for decades. How do you even get started? This Heroku case study dives into our OpenTelemetry journey, where you'll discover strategies for adoption, how to deal with internal resistance, and technical guidance on rolling out the change. Learn from our missteps and what we wish we had done differently. You’ll even see how a bit of luck can help drive adoption over the finish line. This session will equip you to navigate OpenTelemetry adoption in the most entrenched environments.
Speakers

Alex Arnell

Principal Engineer, Heroku / Salesforce
Alex Arnell is a Principal Engineer at Heroku / Salesforce with over two decades of software development experience. Alex has spent the last decade specializing in telemetry and observability systems. Alex is the lead engineer of the Telemetry team at Heroku, responsible for the collection...
Salt Palace | Level 1 | Grand Ballroom HJ
  Observability
  • Content Experience Level Any

11:00am MST

Engineering a Kubernetes Operator: Lessons Learned from Versions 1 to 5 - Andrew L'Ecuyer, Crunchy Data
Thursday November 14, 2024 11:00am - 11:35am MST
Join me to uncover insights and hard-learned lessons from our journey through the first five versions of a Kubernetes Operator for Postgres. I will trace the development lifecycle from version 1, started in 2017, to version 5 today. Each version represents a milestone in addressing specific challenges, functionality, stability, and performance. We will discuss the architectural decisions, design patterns, and implementation strategies that shaped the evolution of the Operator. Key topics will include handling stateful applications, ensuring high availability, building for flexible deployment models, scalability, and managing rolling upgrades for both the Operator and underlying software. By the end of this session, participants will be equipped with practical knowledge and actionable strategies for engineering their own Kubernetes Operators, ready to accelerate their development process and avoid common pitfalls.
Speakers

Andrew L'Ecuyer

Sr. Director of Kubernetes Engineering, Crunchy Data
Andrew heads up the Kubernetes Engineering Team at Crunchy Data. With a diverse background spanning both the public and private sectors, Andrew has played a key role in designing, building and integrating complex systems of all shapes and sizes. He holds degrees in both Computer...
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

11:00am MST

Yahoo’s Kubernetes Journey from on-Prem to Multi-Cloud at Scale - Nandhakumar Venkatachalam & Payal Patel, Yahoo
Thursday November 14, 2024 11:00am - 11:35am MST
Yahoo is an early adopter of Kubernetes, operating 37 on-prem and 42 multi-cloud production clusters hosting 2700 applications. Our team offers a simple yet powerful interface for users to deploy applications onto our managed clusters. Since 2015, we have handled multiple complex upgrades, including Operating Systems and Kubernetes, upgrading from version 1.0.3 to 1.30.0. In 2023, Yahoo announced plans to migrate to both GCP and AWS cloud platforms. Leveraging extensive knowledge, our team successfully provisioned Kubernetes clusters in a multi-cloud environment within a short period. Our team faced numerous challenges during the cloud adoption process, including networking, security, cluster autoscaling, and cost. In this talk, we will share our experience managing K8s in a multi-cloud environment and discuss the challenges we faced and the solutions we found. Key topics include Shared VPC, IP Space for K8s, securely accessing private clusters, multi-tenant workload identity, and maintaining a user interface to K8s.
Speakers

Nandhakumar Venkatachalam

Sr Princ Production Engineer, Yahoo Inc
Nandhakumar Venkatachalam is a Senior Principal Production Engineer at Yahoo Inc. As a lead engineer responsible for operating the large-scale Kubernetes cluster, he has played a key architect role in building scalable cloud infrastructure. Nandha has been with Yahoo for over 17 years...

Payal Patel

Principal Software Development Engineer, Yahoo
Payal Patel is a Principal Software Development Engineer in the Cloud Infrastructure team at Yahoo. She is currently developing a hybrid cloud solution for Kubernetes clusters in AWS and GCP to set up the Kubernetes clusters at scale. Before that, she worked on managing the Kubernetes...
Salt Palace | Level 2 | 251
  Platform Engineering
  • Content Experience Level Any

11:55am MST

Democratizing AI Model Training on Kubernetes with Kubeflow TrainJob and JobSet - Andrey Velichkevich, Apple & Yuki Iwai, CyberAgent, Inc.
Thursday November 14, 2024 11:55am - 12:30pm MST
Running model training on Kubernetes is challenging due to the complexity of AI/ML models, large training datasets, and various distributed strategies like data and model parallelism. It is crucial to configure failure handling, success criteria, and gang-scheduling for large-scale distributed training to ensure fault tolerance and elasticity. This talk will introduce the new Kubeflow TrainJob API, which democratizes distributed training and LLM fine-tuning on Kubernetes. The speakers will demonstrate how TrainJob integrates with Kubernetes JobSet to ensure scalable and efficient AI model training with a simplified Python experience for data scientists. Additionally, they will explain the innovative concept of reusable and extendable training runtimes within TrainJob. The speakers will highlight how these capabilities empower data scientists to rapidly iterate on their ML development, making Kubernetes more accessible and beneficial for the entire ML ecosystem.
Speakers

Andrey Velichkevich

Senior Software Engineer, Apple
Andrey Velichkevich is a Senior Software Engineer at Apple and is a key contributor to the Kubeflow open-source project. He is a member of the Kubeflow Steering Committee and a co-chair of Kubeflow AutoML and Training WG. Additionally, Andrey is an active member of the CNCF WG AI. He...

Yuki Iwai

Software Engineer, CyberAgent, Inc.
Yuki is a Software Engineer at CyberAgent, Inc. He works on the internal platform for machine-learning applications and high-performance computing. He is currently a Technical Lead for Kubeflow WG AutoML / Training. He is also a Kubernetes WG Batch active member and a Kubernetes...
Salt Palace | Level 1 | Hall DE
  AI + ML
  • Content Experience Level Any

11:55am MST

Tick, TAG, TOC - Keeping Cloud Native Running - Karena Angell & Emily Fox, Red Hat; Rajas Kakodkar, Broadcom; Alex Chircop, Akamai; Ricardo Aravena, Truera
Thursday November 14, 2024 11:55am - 12:30pm MST
With only so many hours in the day, how does the cloud native community keep things running? Over 190 projects, thousands of contributors, and an array of groups all contribute to what we know as “cloud native,” but there is more going on behind the scenes that keeps the machine of cloud native running smoothly and drives the technical direction of the landscape. In this panel discussion, you’ll hear from Chairs and Technical Leads of Technical Advisory Group (TAG) Runtime, Storage, App Delivery and the chair of the CNCF Technical Oversight Committee (TOC) on: - How they are defining the roadmap for the future - The glue and oil of collaboration between advisory, oversight, and projects’ health - How you can time your engagement with these groups to have an outsized impact! This is not a maintainer track session. While there are separate tracks for specific CNCF TAG and TOC activities, this is meant to be your backstage pass to see how the CNCF landscape gets shaped!
Speakers

Alex Chircop

Chief Product Architect, Akamai
Chief Product Architect at Akamai. Previously a founder and CTO of Ondat (formerly StorageOS), building software defined solutions for cloud native environments. Alex is also a co-chair of the CNCF Storage TAG (previously SIG). Before embarking on the startup adventure he spent over...

Ricardo Aravena

Cloud Native Lead, Truera
Ricardo currently works at TruEra as a Cloud Infrastructure Lead helping automate everything with cloud native technologies. He's an open source enthusiast and co-chair of the CNCF TAG-Runtime. He has been working in tech for more than 20 years and comes from a diverse professional...

Karena Angell

Senior Principal Chief Architect, Red Hat
Karena Angell is a Senior Principal Chief Architect at Red Hat focusing on cloud native application workloads for Kubernetes, open source software projects, as well as solutions for the 'open' hybrid cloud.

Rajas Kakodkar

Senior Member of Technical Staff | Tech Lead TAG Runtime CNCF, Broadcom
Rajas is a senior member of technical staff at Broadcom and a tech lead of the CNCF Technical Advisory Group, Runtime. He is actively involved in the AI working group in the CNCF. He is a Kubernetes contributor and has been a maintainer of the Kube Proxy Next Gen Project. He has also...

Emily Fox

Emerging Technologies Security Lead, Red Hat
Emily Fox is a DevOps enthusiast, security unicorn, and advocate for Women in Technology. She promotes the cross-pollination of development and security practices. She has worked in security for over 14 years to drive a cultural change where security is unobstructive, natural, and...
Salt Palace | Level 2 | 255 BC
  Cloud Native Experience
  • Content Experience Level Any

11:55am MST

Running Quantum-Safe Applications on Kubernetes - Paul Schweigert & Michael Maximilien, IBM Quantum
Thursday November 14, 2024 11:55am - 12:30pm MST
Quantum computers pose a unique threat to computer security, as the encryption standards we rely upon are vulnerable to powerful quantum computers. While those computers are still several years away, "harvest now, decrypt later" attacks put all data not protected using quantum-safe security at risk. So what can we do now to protect our applications? In this talk, Paul will demo how to deploy a quantum-safe application on Kubernetes. He'll provide a brief overview of quantum-safe cryptography and why it's needed, highlight key work being done in the open source community to migrate to quantum-safe cryptography, and conclude with a demo of how to build a quantum-safe cloud-native application. In particular, he'll show where and how to make changes to a Kubernetes environment to ensure users are protected by quantum-safe connections. At the conclusion of this session, listeners will have a set of practical steps they can take to help secure their applications in a post-quantum world.
Speakers

Michael Maximilien

Distinguished Engineer, IBM
My name is Michael Maximilien, better known as max or dr.max, and I am currently a Distinguished Engineer with IBM. I am the leader for IBM’s Open Source team contributing to all things Serverless and Platform-as-a-Service (PaaS). I have worked at various divisions of IBM. At...

Paul Schweigert

Senior Software Engineer, IBM
Paul Schweigert works on quantum and serverless technologies at IBM. He has extensive experience in open source (Knative and Kubernetes in particular) and has spoken at numerous conferences. He has also led various platform engineering and data science teams. In a previous life, he...
Salt Palace | Level 2 | 255 EF
  Emerging + Advanced
  • Content Experience Level Any

11:55am MST

Cognitive and Self-Adaptive System for Effective Distributed-Tracing in Applications - Mitul Tandon & Akash Gusain, VMware; Susobhit Panigrahi, Broadcom
Thursday November 14, 2024 11:55am - 12:30pm MST
In response to the challenge of limited trace capture in dynamic API tracing systems, this solution leverages machine learning and a cognitive approach for unbiased trace collection. Unlike existing implementations with a skewed distribution (~5%) towards normal traces, our self-adaptive system dynamically learns to prioritise and capture diverse traces, which are crucial for effective diagnosis of API failures and performance issues. This approach significantly enhances SREs' ability to triage complex issues, leading to a game-changing reduction in Mean Time to Resolve (MTTR). The Adaptive Sampling approach analyses existing system traces and autonomously adjusts the sampling rate, eliminating manual configuration. The outcomes of this ML-based solution include streamlined trace metric analysis, more efficient reliability work, and considerable infrastructure cost reduction through targeted trace collection, ultimately making a significant impact on operational effectiveness and reliability.
Speakers

Susobhit Panigrahi

Senior Software Engineer
As a Developer and DevOps Engineer at VMware, I specialize in developing scalable cloud software. My focus includes deploying and managing services with Kubernetes, Helm, and Istio. I'm keen to contribute to the open-source community, especially in Kubernetes and other CNCF projects...

Akash Gusain

Software Engineer, VMware
Akash Gusain is a Software Engineer at VMware with over two years of experience in building and deploying cloud-native applications. At VMware, Akash has contributed to the development of scalable and robust cloud solutions, demonstrating expertise in various technologies and fra...

Mitul Tandon

DevOps Engineer, VMware
A DevOps/SRE Engineer at VMware with 2+ years of experience working on distributed systems and containerised applications.
Salt Palace | Level 1 | Grand Ballroom HJ
  Observability
  • Content Experience Level Any

11:55am MST

Evolving Reddit’s Infrastructure via Principled Platform Abstractions - Karan Thukral & Harvey Xia, Reddit
Thursday November 14, 2024 11:55am - 12:30pm MST
Reddit’s approach to infrastructure management has grown organically over time, adapted to solve tactical, near-term problems. We have now reached a point where the only way to scale infrastructure capabilities to a growing engineering organization is through platform abstractions offering self-service management of standardized infrastructure patterns. Beginning in 2021, a concerted effort was made to reimagine infrastructure as an internal platform that empowers both application and infrastructure engineers to build impactful and maintainable systems. We present a case study of Reddit’s ongoing journey in evolving its infrastructure management practices from inefficient, human-in-the-loop processes to efficient, self-service interfaces. By treating Kubernetes as a universal control plane and extending it with custom control processes fronted by well-designed interfaces, we are moving the organization towards this vision. This talk will cover the many trade-offs and lessons learnt.
Speakers

Harvey Xia

Staff Engineer, Compute Infrastructure @ Reddit, Reddit
I'm a software engineer with experience across a variety of disciplines including backend engineering, data engineering, and most recently, infrastructure engineering. I specialize in building cloud native infrastructure platform features.

Karan Thukral

Senior Engineer, Compute Infrastructure @ Reddit, Reddit
Karan is a Senior Software Engineer at Reddit working on the Compute team to build an easy to use internal developer platform which is scalable and reliable. He has been working in this problem space since 2017 building both internal and external developer platforms including App...
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any

2:30pm MST

What Istio Got Wrong: Learnings from the Last Seven Years of Service Mesh - Christian Posta & Louis Ryan, Solo.io
Thursday November 14, 2024 2:30pm - 3:05pm MST
Building complex systems often requires simplicity in components—a lesson the Istio project has learned throughout its seven(plus)-year journey. Although Istio offers a lot of powerful features for application networking, crucial for many organizations, the path to maturity and broader adoption was fraught with challenges. In this talk, we explore the key mistakes made during Istio's development, including its initially complex architecture, an overload of features, premature release of version 1.0, difficulties faced by contributors, and delays in joining the CNCF. We will discuss the impact of these mistakes, how these missteps were addressed, and how they have positioned Istio as a leader in the service mesh market. This presentation will detail how Istio's evolution reflects a shift towards simpler, more modular components that together offer effective solutions for managing APIs and service-to-service communication regardless of platform.
Speakers

Louis Ryan

CTO, Solo.io
Co-creator of Istio and gRPC

Christian Posta

Global Field CTO, Solo.io
Christian Posta (@christianposta) is Global Field CTO at Solo.io. He is the author of Istio in Action and many other books on cloud-native architecture. He's well known in the cloud-native community for being a speaker, blogger (https://blog.christianposta.com) and contributor to...
Salt Palace | Level 2 | 255 BC
  Cloud Native Experience
  • Content Experience Level Any

2:30pm MST

Tutorial: Live with Gateway API V1.2 - Flynn, Buoyant & Mike Morris, Microsoft
Thursday November 14, 2024 2:30pm - 4:00pm MST
Gateway API v1.2 is here! We have GA support for service mesh! We have timeouts in HTTPRoutes! We have GRPCRoutes! And we still have precious few real-world walkthroughs of using Gateway API to get real things done… In this hands-on workshop hosted by Gateway API contributors and GAMMA co-leads, we’ll start with completely unconfigured clusters, walk through installing a demo app with your choice of ingress controller and service mesh (Envoy Gateway + Linkerd, or Istio), then dig into actually using Gateway API for routing, resilience, and progressive delivery with an application using HTTP and gRPC at the same time. You’ll walk away with practical, real-world knowledge about what Gateway API can do and how to use it, and portable skills you’ll be able to apply to the many projects implementing Gateway API!
Speakers

Flynn

Tech Evangelist, Buoyant
Flynn is a tech evangelist at Buoyant, educating developers about Linkerd, Kubernetes, and cloud-native development in general. He has spent 40 years in software engineering (from the kernel up through distributed applications, with a common thread of communications and security throughout...

Mike Morris

Senior Product Manager, Microsoft
Mike is a product manager at Microsoft working on upstream open source projects with a focus on Istio service mesh, and a Gateway API for service mesh co-lead. He is interested in building healthy, sustainable communities and scalable distributed systems, and working collaboratively...
Salt Palace | Level 1 | Grand Ballroom ACE

3:25pm MST

TLS and MTLS: Introduction to Modern Security - Andrew Davis, Independent & Sandeep Kanabar, Gen (formerly NortonLifeLock)
Thursday November 14, 2024 3:25pm - 4:00pm MST
A constant presence in our lives for nearly 25 years, TLS is a cornerstone of modern security practice — especially in a zero-trust world. In cloud native, mTLS comes up every time service meshes get mentioned. Even so, both these technologies are still sources of endless questions. How do they work? How are they related? What problems do they solve – and which others do they not solve? How does it relate to end-user auth? What's all this stuff with certificates anyway? And why should you care about these things? Thankfully, answering these questions isn't that complex. Sandeep Kanabar, Lead Software Engineer at Gen, and Andrew Davis, a Cybersecurity Expert—both Deaf & Hard of Hearing WG members—will discuss what TLS and mTLS are, what they do, how they work, why they matter as standards, and what nearly 25 years of attacking them have to say about security. They'll use Linkerd as an example, but this talk will apply to any situation involving mTLS or TLS, no matter the implementation.
Speakers

Sandeep Kanabar

Lead Software Engineer, Gen (formerly NortonLifeLock)
Hailing from India, Sandeep is a passionate software engineer working at Gen (formerly NortonLifeLock). A frequent meetup speaker, Sandeep enjoys sharing his lessons learned from 15+ years in the tech space with the community. He's a staunch advocate for diversity and inclusion and...

Andrew Davis

Cybersecurity Specialist, Not Applicable
A passionate self-taught cybersecurity expert, Andrew Davis is a big believer in life-long learning. He has worked for various Fortune 500 companies, including DELL and Fidelity Investments. Deaf himself, Andrew is a strong advocate for accessibility. He's an active member of the...
Salt Palace | Level 2 | 251
  Cloud Native Novice
  • Content Experience Level Any

3:25pm MST

You're Overpaying for CI - Kyle Penfound, Dagger
Thursday November 14, 2024 3:25pm - 4:00pm MST
In recent years, the computational power of developer workstations has surged dramatically. With so much compute available at every developer's fingertips, why do we continue to waste time and money with lengthy build times on sluggish CI compute? Some forward-thinking organizations are re-evaluating this approach, questioning the necessity of paying for CI compute when the developers' workstations, which are already more powerful and paid for, remain underutilized. In this technical session we will transition a fully functioning production CI system from cloud-based compute to local workstation compute. We will explore the intricacies of replicating the functionality of a modern CI system, leveraging the power of developer workstations, all using open source software.
Speakers

Kyle Penfound

Solutions Engineer, Dagger
Kyle is part of the ecosystem team at dagger.io working on the future of CICD. He has a background in DevOps and just loves giving demos!
Salt Palace | Level 2 | 250
  SDLC
  • Content Experience Level Any

3:25pm MST

It's Dangerous to Build It Alone, Take This. - Jeremy Rickard & Ashna Mehrotra, Microsoft
Thursday November 14, 2024 3:25pm - 4:00pm MST
You've got high and critical CVEs in open source software packages that are critical to your platform or business. Time is almost up to patch them, and the upstream project hasn't fixed things. If you don't patch, your accreditation might be at risk. You're going to have to do it yourself! But where do you start? Fork the projects? Can you just patch in place? In this session, you'll learn about tools and strategies that can help you respond to CVEs in your container images faster, starting with patching existing images in place with Copacetic and moving on to patching and building projects from scratch. We'll look at challenges to building and testing upstream projects using existing tools and learn from emerging practices in industry. We'll also talk about how to inform your teams to stop using bad images! After this session, you'll have best practices and tools at your disposal and understand some of the pitfalls of owning your entire open source software supply chain.
Speakers

Ashna Mehrotra

Software Engineer, Microsoft
Ashna Mehrotra is a software engineer on the Upstream Security team, working on cloud-native open source security projects at Microsoft.

Jeremy Rickard

Principal Software Engineer, Microsoft
Jeremy Rickard is a principal software engineer at Microsoft where he works on the Azure Container Upstream team. He is currently a co-chair for SIG Release and serves on both the CNCF and the Kubernetes Code of Conduct Committees. He was also the Kubernetes 1.20 Release Lead.
Salt Palace | Level 1 | 151
  Security
  • Content Experience Level Any

4:30pm MST

Which GPU Sharing Strategy Is Right for You? A Comprehensive Benchmark Study Using DRA - Kevin Klues & Yuan Chen, NVIDIA
Thursday November 14, 2024 4:30pm - 5:05pm MST
Dynamic Resource Allocation (DRA) is one of the most anticipated features to ever make its way into Kubernetes. It promises to revolutionize the way hardware devices are consumed and shared between workloads. In particular, DRA unlocks the ability to manage heterogeneous GPUs in a unified and configurable manner without the need for awkward solutions shoehorned on top of the existing device plugin API. In this talk, we use DRA to benchmark various GPU sharing strategies including Multi-Instance GPUs, Multi-Process Service (MPS), and CUDA Time-Slicing. As part of this, we provide guidance on the class of applications that can benefit from each strategy as well as how to combine different strategies in order to achieve optimal performance. The talk concludes with a discussion of potential challenges, future enhancements, and a live demo showcasing the use of each GPU sharing strategy with real-world applications.
Speakers

Kevin Klues

Distinguished Engineer, NVIDIA
Kevin Klues is a distinguished engineer on the NVIDIA Cloud Native team. Kevin has been involved in the design and implementation of a number of Kubernetes technologies, including the Topology Manager, the Kubernetes stack for Multi-Instance GPUs, and Dynamic Resource Allocation (DRA...

Yuan Chen

Principal Software Engineer, NVIDIA
Yuan Chen is a Principal Software Engineer at NVIDIA, working on building NVIDIA GPU Cloud for AI. He served as a Staff Software Engineer at Apple from 2019 to 2024, where he contributed to the development of Apple's Kubernetes infrastructure. Yuan has been an active code contributor...
Salt Palace | Level 1 | Hall DE
  AI + ML
  • Content Experience Level Any

4:30pm MST

The Maintainer Monologues - Sarah Christoff, Defense Unicorns; Karen Chu, Fermyon; Jason Hall, Chainguard; Scott Rigby, Independent; Ryan Nowak, Microsoft
Thursday November 14, 2024 4:30pm - 5:05pm MST
Are maintainers born? Or made? Made. They’re definitely made. Oftentimes it’s a combination of trial and error, luck, and lots of hard work. With a mixed group of first-time and experienced maintainers, join us for a panel covering the origin stories and learnings of CNCF sandbox/incubating/graduated project maintainers. They’ll share their journeys as their projects evolved, and cover topics such as: - Project milestones (inception, MVP, & donation) - Learning the ecosystem - Blind spots - Navigating social dynamics (community building, getting more help, navigating challenges) - Work-life balance / open source burnout. With this knowledge, you’ll be better equipped to become the next open source contributor, maintainer, or creator of projects, ready to navigate the ecosystem.
Speakers

Karen Chu

OSS Community PM
Karen Chu is an OSS Community PM. Having participated in the cloud native community since 2015, she is a CNCF Ambassador, Helm community manager/maintainer, emeritus Kubernetes Code of Conduct Committee member, meet-up organizer, and conference organizer. She has also worked on The...

Sarah Christoff

Software Engineer, Defense Unicorns
Sarah is a software engineer at Defense Unicorns who loves making complex code more digestible. She is the self-proclaimed founder of the Leslie Lamport fan club. When she's not bugbusting, she is running her animal rescue and competing in triathlons. She believes code should be like...
avatar for Scott Rigby

Scott Rigby

Senior Cloud Solutions Architect, NASA / Navteca
Scott is an artist, engineer & dad, collaborating on a different kind of world. Into collective art, activism, therapy & open source nerdy stuff. Scott is a Cloud Native Ambassador, speaker, organizer of CNCF community events including the New York Kubernetes Meetup, and international...

Jason Hall

Principal Software Engineer, Chainguard
Jason is a hopeless container image tooling nerd, living in Brooklyn with his wife, two children and (most importantly) lots of pizza.

Ryan Nowak

Incubations Architect, Microsoft
Ryan is an architect working on open-source projects from the Azure CTO's office. He's passionate about designing software for humans, incubating risky ideas, releasing them in open-source so everyone can benefit. At Microsoft, he's had a 15+ year career building developer-centric...
Thursday November 14, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 2 | 255 BC
  Cloud Native Experience
  • Content Experience Level Any

4:30pm MST

Elevating Kubeflow Spark Operator's Future: Best Practices and Enhancements - Vara Bonthu, AWS & Chaoran Yu, Apple Inc
Thursday November 14, 2024 4:30pm - 5:05pm MST
As Kubernetes becomes the leading platform for data processing, mastering the deployment and management of Apache Spark on it is crucial. In this presentation, you'll hear from the new maintainers of the Kubeflow Spark Operator project, who will provide an overview of scaling the Spark Operator on Kubernetes, emphasizing best practices to optimize performance and efficiency. Attendees will explore the migration of the Spark Operator repository from Google to Kubeflow, gaining insights into the roadmap and key takeaways. The session will cover strategies for achieving multi-tenancy, managing multiple Spark Operator instances for large-scale deployments, ensuring robust security, and performing seamless upgrades. Participants will learn advanced techniques to maximize their Spark on Kubernetes deployments, making their data processing pipelines more efficient, reliable, and secure. This talk is for Data, ML, DevOps, and MLOps pros to enhance their Spark on Kubernetes skills.
Speakers

Chaoran Yu

Software Engineer, Apple Inc
Chaoran Yu is a software engineer at Apple. He leads a team that builds and operates a large-scale batch analytics data platform to meet the demanding requirements of data scientists and engineers. His passion lies in delivering the best value to stakeholders through best-of-breed...

Vara

Principal OSS Specialist, AWS
Vara Bonthu is a dedicated technology professional and Worldwide Tech Leader for Data on EKS, specializing in assisting AWS customers ranging from strategic accounts to diverse organizations. He is passionate about open-source technologies, Data Analytics, AI/ML, and Kubernetes, and...
Thursday November 14, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 1 | Grand Ballroom GI
  Data Processing + Storage
  • Content Experience Level Any

4:30pm MST

Mastering OpenTelemetry Collector Configuration - Steve Flanders, Cisco
Thursday November 14, 2024 4:30pm - 5:05pm MST
Configuring the OpenTelemetry Collector can be a daunting task for both novices and seasoned professionals alike. Yet, mastering this crucial aspect is essential for unlocking the full potential of your observability stack. In this session, you will embark on a journey to gain the knowledge and skills needed to conquer common OpenTelemetry Collector configuration challenges. This session will draw from real-world experiences and best practices and provide live demonstrations to navigate the intricacies of OpenTelemetry Collector configuration. Whether you are a novice looking to get started or a seasoned veteran seeking to level up your skills, this session promises to empower you with the knowledge and confidence needed to properly and efficiently configure the OpenTelemetry Collector.
Speakers

Steve Flanders

Senior Director of Engineering, Cisco
Steve Flanders is a Senior Director of Engineering at Splunk (acquired by Cisco) responsible for the Observability Platform team, which includes contributions to the OpenTelemetry project. He was previously the Head of Product at Omnition (acquired by Splunk). Prior to Omnition, he...
Thursday November 14, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 1 | Grand Ballroom HJ
  Observability
  • Content Experience Level Any

4:30pm MST

Tutorial: No Mess Rollouts with Gateway API: Leveraging Gateway API and Argo Rollouts for Progressive Delivery - Nina Polshakova & Lawrence Gadban, Solo.io
Thursday November 14, 2024 4:30pm - 6:00pm MST
Modern application delivery has many pitfalls: version transitions, traffic management, quality assurance, performance monitoring, and rollbacks. If you encounter an upgrade issue, what can you do? Mirror traffic? Debug locally? Roll back? Argo Rollouts lets teams gradually and safely deploy new versions of applications. A standard Gateway API enables any provider to support Argo Rollouts without provider-specific code. Argo Rollouts monitors Prometheus metrics to verify performance and reverts if success criteria aren’t met. This hands-on lab guides you on integrating Argo Rollouts with applications using different Gateway API implementations. Using Argo and Gateway API resources (HTTPRoute), you’ll learn to adjust traffic weights and gradually direct more traffic to a new version. We will also explore challenges in route delegation and role-based access control within Gateway API and potential extensions to address gaps in traffic shaping, access control, and debugging rollouts.
Speakers

Lawrence Gadban

Software Engineer, Solo.io
Lawrence is a Field Engineer at Solo.io where he works with organizations of all sizes to architect, adopt, and operationalize components such as Envoy proxy, API gateways, and service mesh. Most recently, he has been working directly with several organizations at various stages of...

Nina Polshakova

Software Engineer, Solo.io
Nina is a software engineer working on multi-cluster Istio solutions on the Gloo Platform team at Solo.io. She is a CNCF Ambassador and has also been on several Kubernetes release teams. She led the Enhancements team for the 1.29 release and is the current lead for the Release Notes...
Thursday November 14, 2024 4:30pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom ACE
  Tutorials, Operations + Performance
  • Content Experience Level Any

5:25pm MST

Managing and Distributing AI Models Using OCI Standards and Harbor - Steven Zou & Steven Ren, VMware by Broadcom
Thursday November 14, 2024 5:25pm - 6:00pm MST
Just as container images are vital to cloud-native technology, AI models are crucial to AI technology. Effectively, conveniently, and safely managing, maintaining, and distributing AI models is critical for supporting workflows like AI model training, inference, and application deployment. This presentation explores AI model management based on OCI standards and the Harbor project. It covers standardizing AI model structures and characteristics using OCI specifications and extension mechanisms like OCI Reference to link datasets and dependencies. When large models require efficient loading or raise privacy considerations, model replication or proxying with upstream repositories like Hugging Face becomes essential. Enhancing model distribution security through signing, vulnerability scanning, and policy-based governance is often necessary. Additionally, introducing acceleration mechanisms such as P2P can significantly improve the efficiency of large model loading.
Speakers

Steven Ren

Senior Manager, Broadcom

Steven Zou

Staff II Engineer, VMware by Broadcom
Steven Zou is a senior engineer with years of experience in cloud computing and cloud-native technology. He is currently working as a Staff II engineer at VMware, focusing on cloud-native and Kubernetes-related platform services. In addition, he is a core maintainer of the CNCF open-source...
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Hall DE
  AI + ML
  • Content Experience Level Any

5:25pm MST

Navigating Failures in Pods with Devices: Challenges and Solutions - Sergey Kanzhelev, Google & Mrunal Patel, Red Hat
Thursday November 14, 2024 5:25pm - 6:00pm MST
Pods are no longer running with just CPU and Memory. We provision GPUs and network cards, and request special placement of those devices and allocated memory. And the more efficient or effective you want your setup to be, the more complicated those device requirements become, and the more likely you are to hit an edge case Kubernetes has not accounted for yet. Come to the talk to learn from Node Maintainers about some of those shortcomings in Kubernetes. If you are only starting with AI/ML and devices, you will be interested to learn what to expect. If you have lots of experience, you may still learn new things. With the increased focus on AI/ML workloads, highlighting those scenarios is important. As Kubernetes plans to fix those problems, you can give feedback on what would work best for you.
Speakers

Sergey Kanzhelev

Staff Software Engineer, Google
Sergey Kanzhelev is a seasoned open source and cloud native maintainer working actively on Kubernetes. Sergey is serving as co-chair of SIG node. He is also one of the founders of OpenTelemetry. He is working on engineering aspect of software and its practical application. He is contributing...

Mrunal Patel

Distinguished Engineer, Red Hat
Mrunal Patel is a Senior Principal Software Engineer at Red Hat working on containers for Openshift. He is a maintainer of runc/libcontainer and the OCI runtime specification. He started the CRI-O runtime. He is a SIG-Node chair and tech lead.
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 2 | 250
  AI + ML
  • Content Experience Level Any

5:25pm MST

Engaging the KServe Community, The Impact of Integrating a Solution with Standardized CNCF Projects - Adam Tetelman, NVIDIA; Taneem Ibrahim, Red Hat; Johnu George, Nutanix; Tessa Pham, Bloomberg; Andreea Munteanu, Canonical
Thursday November 14, 2024 5:25pm - 6:00pm MST
Building a new solution and contemplating whether or not the OSS path is right for you? Wondering where to get started with a large cloud initiative and where the pitfalls may lie? Curious to know all the benefits waiting if your organization embraces a rich CNCF ecosystem? In this talk we will discuss the trade-offs between building a product on a full OSS platform vs. a DIY approach. We will delve into the issues of working with internal stakeholders or partners to embrace an OSS community and will cover the benefits and scaling factors that come when embracing open standards. We will use the recent integration of NVIDIA NIM into KServe as a case study and talk through the trials and tribulations that paid off in a win-win-win situation for our solutions, the OSS projects, and our users. We will cover Kubeflow, Knative, Istio, KServe, and wg-serve as well as a network of companies building enterprise K8s platforms and enterprise AI applications on top of these foundations.
Speakers

Andreea Munteanu

AI Product Manager, Canonical
I lead AI at Canonical, the publisher of Ubuntu and a provider of open source security, support and services. With a background in data science across industries like retail and telecommunications, I help enterprises make data-driven decisions with AI. I am passionate about amplifying...

Tessa Pham

Senior Software Engineer, Bloomberg
Tessa Pham is a Senior Software Engineer on Bloomberg's Cloud Native Compute Services organization. She works on building an inference platform for Bloomberg’s Data Science Platform, used by engineers and data scientists for training, deploying and serving ML models. Tessa is a...

Johnu George

Staff Engineer, Nutanix
Johnu George is a staff engineer at Nutanix with a background in distributed systems and large-scale hybrid data pipelines. He is active in open source and has steered several industry collaborations on projects like Kubeflow, Apache Mnemonic and Knative. His research interests...

Adam Tetelman

Principal Product Architect, NVIDIA
Adam Tetelman is a principal architect at NVIDIA leading cloud native initiatives and CNCF engagements across the company; building inference platforms for NVIDIA AI Enterprise and DGX Cloud. He has degrees in computational robotics, computer & systems engineering, and cognitive science...

Taneem Ibrahim

Senior Engineering Manager, Red Hat
Taneem is an engineering leader at Red Hat where his organization is responsible for building and delivering Model Serving, Responsible AI, and Model Registry solutions in OpenShift AI.
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom GI
  Cloud Native Experience
  • Content Experience Level Any

5:25pm MST

Pick My Project! Lessons Learned from Interviewing 20+ End Users for Cloud Native Case Studies - Shedrack Akintayo & Bill Mulligan, Isovalent at Cisco
Thursday November 14, 2024 5:25pm - 6:00pm MST
Cloud native projects can promise the moon in their READMEs, but have you ever wondered what actually causes end users to adopt a project? Shedrack and Bill have interviewed over 20 companies in industries ranging from media to financial services about why they picked a project for their cloud native platform. In this talk, they will reveal what end users truly want when adopting cloud native technologies and what the forcing function was for each of them. You’ll hear firsthand accounts of the triumphs and tribulations faced by companies like Bloomberg, DigitalOcean, The New York Times, and more as well as the specific benefits these organizations are reaping, from enhanced security and observability to improved performance and cost savings. Additionally, they’ll teach other projects their process for creating impactful case studies. By the end, the audience will understand the real-world applications and advantages of cloud native technologies and why end users pick a project.
Speakers

Shedrack Akintayo

Technical Marketing Engineer, Isovalent at Cisco
Shedrack Akintayo is a software engineer and technical writer based in London with six years of experience spanning Web Engineering, DevOps, Technical Writing, and Developer Relations. Shedrack works as a Technical Marketing Engineer at Cisco, via the Isovalent acquisition. He actively...

Bill Mulligan

Community Pollinator, Isovalent at Cisco
Bill Mulligan is a cloud native pollinator and community builder. He has given talks, written articles, and appeared on podcasts on a wide range of topics around cloud native. While at CNCF he restarted the Kubernetes Community Day program. He is currently at Isovalent growing the...
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 2 | 255 BC
  Cloud Native Experience
  • Content Experience Level Any

5:25pm MST

Why Serverless Is Trending Again - Matt Butcher, Fermyon
Thursday November 14, 2024 5:25pm - 6:00pm MST
The idea of serverless computing really took off in 2016. But after an apparent peak in 2019, it seemed to be on the decline. Yet things took an about face again in 2022. The idea of serverless functions not only regained lost ground, but even now it is hitting new levels of interest. Why? In this session, we first get very clear about what “serverless” means as a design pattern. Then we dive into what it is good for, and mention a few of the major successes of serverless computing. From there, we look into the present and future of serverless technology, particularly inside of Kubernetes. WebAssembly is the runtime technology that enables serverless in Kubernetes to outperform Amazon Lambda and other competitors.
Speakers

Matt Butcher

CEO, Fermyon
Matt Butcher (CEO) is a founder of Fermyon. He is one of the original creators of Helm, Brigade, CNAB, OAM, Glide, and Krustlet. He has written or co-written many books, including "Learning Helm" and "Go in Practice." He is a co-creator of the "Illustrated Children’s Guide to Kubernetes...
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 2 | 251
  Cloud Native Novice
  • Content Experience Level Any

5:25pm MST

One Gateway API to Rule Them All (and in the Cluster Configure Them) - Flynn, Buoyant
Thursday November 14, 2024 5:25pm - 6:00pm MST
Ingress, egress, east-west, north-south… Kubernetes has always had a lot of different ways to talk about network traffic, each with its own concerns. For years, the possibility of unifying these kinds of configuration under a single API was a tantalizing but far-off possibility until Gateway API v0.8 took the first step of combining ingress and mesh configuration. Now Gateway API is taking the next step: bringing egress to the party. Join us for a look into how Linkerd is using these new egress capabilities to meet real user needs! We’ll start with a quick overview of what egress policy covers and what people need from it, how Gateway API makes egress work within its existing model, continue to cover how Linkerd implements it, and finish up with a live demo showing off a real-world example of egress management through the Gateway API. Welcome to the grand unified world!
Speakers

Flynn

Tech Evangelist, Buoyant
Flynn is a tech evangelist at Buoyant, educating developers about Linkerd, Kubernetes, and cloud-native development in general. He has spent 40 years in software engineering (from the kernel up through distributed applications, with a common thread of communications and security throughout...
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | 155 EF
  Connectivity
  • Content Experience Level Any

5:25pm MST

Now You See Me: Tame MTTR with Real-Time Anomaly Detection - Kruthika Prasanna Simha & Raj Bhensadadia, Apple Inc.
Thursday November 14, 2024 5:25pm - 6:00pm MST
Picture this! You are running an application on a Kubernetes cluster & you notice that your nodes have been restarting and your users are noticing that your application is unreachable. As an engineer, you want to identify these failures in real-time & differentiate these from known states, at scale. But we know, static thresholds fail for dynamic metrics! This session explores real-time anomaly detection for cloud-native systems. We'll show you how to reduce MTTR and mean time to analyse by proactively identifying abnormal application behavior using statistical & machine learning algorithms on time series data from Prometheus. Learn to pinpoint issues, identify missing instrumentation, and visualize anomalies using Grafana. This session equips you to achieve faster issue resolution and maintain optimal application health. We'll demo practical techniques for metrics selection, anomaly detection and proactive issue identification to manage your cloud-native applications.
Speakers

Raj

Machine Learning Engineer, Apple Inc.
Raj Bhensadadia is a machine learning engineer with a passion for leveraging ML technologies to enhance monitoring and analysis of large-scale systems and ensure robustness and performance of infrastructure and services.

Kruthika Prasanna Simha

Software Engineer, Apple Inc.
Kruthika is a software engineer at Apple specializing in building ML enabled observability solutions. She holds a Masters in Computer Engineering and has specialized in Machine Learning. In her free time, she likes to dabble with Jupyter Notebooks for running experiments with data...
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom HJ
  Observability
  • Content Experience Level Any

5:25pm MST

How Google Built Its New Cloud on Top of Kubernetes - Saad Ali, Jie Yu & Prashanth Venugopal, Google
Thursday November 14, 2024 5:25pm - 6:00pm MST
“Build a new air-gapped cloud with open source technologies” – this is what a small team at Google was tasked with in late 2021. The team delivered a private cloud platform, complete with managed VMs, databases, AI services, and more. Moreover, it did so by leveraging a number of CNCF technologies, including Kubernetes, Istio, etc. We’ll share the potential of these technologies, as well as their limitations, by explaining how they were used to build a scalable, reliable, and secure cloud platform. We’ll discuss how to implement cloud tenancy concepts, enforce isolation among tenants, and how we built a cloud API leveraging k8s API machinery and service mesh. A key innovation in building the private cloud platform was the “Kubernetes Defined Networking” (KDN) stack we created: by leveraging existing k8s networking features (e.g. load balancer, etc.) along with a few key enhancements, we implemented most of the traditional cloud SDN concepts, like VPC, firewall, VM support, etc.
Speakers

Saad Ali

Senior Engineering Manager, Google
Saad Ali is a Senior Engineering Manager at Google. He works on Google Distributed Cloud and the open-source Kubernetes project. He led the development of the Kubernetes storage and volume subsystem. He serves as a lead of the Kubernetes Storage SIG, has served as member of the CNCF...

Prashanth Venugopal

Kubernetes Networking Lead, Google
Prashanth has an almost two-decade-long career across various networking market segments. In his current role as the lead architect of Google's Kubernetes networking stack, he helps drive the networking stack's evolution for Google Kubernetes Engine (for the Public Cloud Market...

Jie Yu

Principal Software Engineer, Google
Jie Yu is currently a Principal Software Engineer at Google. Jie is working on Google Distributed Cloud, and is the leading architect for the product. Prior to Google, Jie was a Chief Architect at Mesosphere (D2IQ), and worked at Twitter. Jie joined Kubernetes community...
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom BDF
  Platform Engineering
  • Content Experience Level Any
 
