Loading…
Attending this event?
In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Salt Palace | Level 1 | Grand Ballroom GI clear filter
arrow_back View All Dates
Thursday, November 14
 

11:00am MST

Cooperative Scheduling for Stateful Systems - Michael Youssef & Laxman Prabhu, LinkedIn
Thursday November 14, 2024 11:00am - 11:35am MST
At LinkedIn, we develop many stateful systems and run them on tens of thousands of machines in our datacenters. As we move LinkedIn’s infrastructure to Kubernetes, we quickly realized that StatefulSet was not going to be enough to support running critical stateful systems and satisfy the safety and durability goals of the teams developing stateful systems. We've built first-class support for running stateful workloads on bare metal where the stateful systems can coordinate with Kubernetes to stay available and ensure durability. With our design, we support planned/unplanned maintenance, swapping out hardware, and allow stateful systems to customize their rollout policies natively on Kubernetes. This talk covers: - Our LiStatefulSet API. - How we allow apps to customize safety checks and deployment policies via an ApplicationClusterManager, our pluggable policy engine. - The ApplicationClusterManager protocol that allows coordination of the lifecycle of workloads with Kubernetes.
Speakers
LP

Laxman Prabhu

Staff Software Engineer, Systems Infrastructure, LinkedIn
avatar for Michael Youssef

Michael Youssef

Staff Software Engineer, LinkedIn
Michael is a Staff Software Engineer at LinkedIn, currently making management and deployment of sharded systems a touch less painful on Kubernetes. In his free time he enjoys spending time with his cat, inhaling chocolate, and playing tennis.
Thursday November 14, 2024 11:00am - 11:35am MST
Salt Palace | Level 1 | Grand Ballroom GI
  Data Processing + Storage

11:55am MST

Database DevOps: CD for Stateful Applications - Stephen Atwell, Harness.io & Christopher Crow, Pure Storage
Thursday November 14, 2024 11:55am - 12:30pm MST
Running stateful applications on Kubernetes can provide many of the same advantages as stateless applications. In this talk, Stephen and Chris will share some thoughts on managing stateful applications as part of a CD Pipeline so that applications - and the application's data - can be versioned and deployed safely and repeatedly. This talk will discuss managing persistent data within kubernetes, as well as managing structural changes to a database as part of a CD process. With Kubernetes and liquibase, we can provide something better than before: A more testable, repeatable, and open way to deploy stateful applications. This talk features a practical demo of how CD tooling can empower users to automate data migrations within Kubernetes.
Speakers
avatar for Christopher Crow

Christopher Crow

Technical Marketing Engineer, Pure Storage
Chris Crow works as a cloud architect at Portworx. He has worked previously as an education, systems administrator. He is a lifelong open-source enthusiast.
avatar for Stephen Atwell

Stephen Atwell

Principal Product Manager, Harness.io
With over 26 years of technology experience, Stephen focuses on solving problems encountered in his previous roles. He has worn hats ranging from network administrator, to database administrator, to software engineer, to product manager. Outside of work, Stephen develops open source... Read More →
Thursday November 14, 2024 11:55am - 12:30pm MST
Salt Palace | Level 1 | Grand Ballroom GI
  Data Processing + Storage

2:30pm MST

Distributed Cache Empowers AI/ML Workloads on Kubernetes Cluster - Yuichiro Ueno & Toru Komatsu, Preferred Networks, Inc.
Thursday November 14, 2024 2:30pm - 3:05pm MST
Today, storage technologies play a fundamental role in the realm of AI/ML. Read performance is essential for swiftly moving datasets from storage to AI accelerators. However, the rapid enhancement of AI accelerators' performance often outpaces I/O, bottlenecks the training. Due to the scheduling of pods in Kubernetes across multiple nodes, utilizing node-local storage effectively presents a challenge. To address this, we introduce a distributed cache system built atop node-local storages, designed for AI/ML workloads. This cache system has been successfully deployed on our on-premise 1024+ GPUs Kubernetes cluster within a multi-tenancy environment. Throughout our two-year experience operating this cache system, we have overcome numerous hurdles across several components, including the I/O library, load balancers, and the storage backend. We will share the challenges and the solutions we implemented, leading to a system delivering 50+ GB/s throughput and less than 2ms latency.
Speakers
avatar for Toru Komatsu

Toru Komatsu

Engineer, Preferred Networks, Inc.
Toru is a machine learning platform engineer at Preferred Networks in Japan. He is the creator and lead developer of youki, an OCI Runtime in Rust, and a maintainer of the OCI Runtime Specification. Additionally, he serves as a reviewer for runwasi and is involved in developing a world that utilizes containers and Wasm. Additionally, he is a member of the Kubernetes org and is especially interested in... Read More →
avatar for Yuichiro Ueno

Yuichiro Ueno

Engineer, Preferred Networks, Inc.
He is currently a machine learning platform engineer at Preferred Networks in Japan. His research and engineering interests include a range of high-performance computing (distributed deep learning, networking/RDMA, and storage technologies), performance engineering, and Kubernete... Read More →
Thursday November 14, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 1 | Grand Ballroom GI
  Data Processing + Storage

3:25pm MST

Elastic Data Streaming: Autoscaling Apache Kafka - Jakub Scholz, Red Hat
Thursday November 14, 2024 3:25pm - 4:00pm MST
Autoscaling is an important part of modern cloud-native architecture. It allows applications to handle a big load at peak times while helping to optimize costs and make deployments more green and sustainable at the same time. Apache Kafka is well known for its scalability. It can grow with your project from a small cluster up to hundreds of brokers. But it was not very elastic for a long time and using dynamic autoscaling with it was very hard. This talk will guide the attendees through the main challenges of auto-scaling Apache Kafka on Kubernetes. It will show how these challenges can be solved with the help of new features added recently in Strimzi and Apache Kafka projects such as auto-rebalancing, node pools, or tiered storage. And it will help the users get started with the auto-scaling of Apache Kafka.
Speakers
avatar for Jakub Scholz

Jakub Scholz

Senior Principal Software Engineer, Red Hat
Jakub works at Red Hat as Senior Principal Software Engineer. He has long-term experience with messaging and currently focuses mainly on Apache Kafka and its integration with Kubernetes. He is one of the maintainers of the Strimzi project which provides tooling for running Apache... Read More →
Thursday November 14, 2024 3:25pm - 4:00pm MST
Salt Palace | Level 1 | Grand Ballroom GI
  Data Processing + Storage

4:30pm MST

Elevating Kubeflow Spark Operator's Future: Best Practices and Enhancements - Vara Bonthu, AWS & Chaoran Yu, Apple Inc
Thursday November 14, 2024 4:30pm - 5:05pm MST
As Kubernetes becomes the leading platform for data processing, mastering the deployment and management of Apache Spark on it is crucial. In this presentation, you'll hear from the new maintainers of the Kubeflow Spark Operator project, who will provide an overview of scaling the Spark Operator on Kubernetes, emphasizing best practices to optimize performance and efficiency. Attendees will explore the migration of the Spark Operator repository from Google to Kubeflow, gaining insights into the roadmap and key takeaways. The session will cover strategies for achieving multi-tenancy, managing multiple Spark Operator instances for large-scale deployments, ensuring robust security, and performing seamless upgrades. Participants will learn advanced techniques to maximize their Spark on Kubernetes deployments, making their data processing pipelines more efficient, reliable, and secure. This talk is for Data, ML, DevOps, and MLOps pros to enhance their Spark on Kubernetes skills.
Speakers
avatar for Chaoran Yu

Chaoran Yu

Software Engineer, Apple Inc
Chaoran Yu is a software engineer at Apple. He leads a team that builds and operates a large-scale batch analytics data platform to meet the demanding requirements of data scientists and engineers. His passion lies in delivering the best value to stakeholders through best-of-breed... Read More →
avatar for Vara

Vara

Principal OSS Specialist, AWS
Vara Bonthu is a dedicated technology professional and Worldwide Tech Leader for Data on EKS, specializing in assisting AWS customers ranging from strategic accounts to diverse organizations. He is passionate about open-source technologies, Data Analytics, AI/ML, and Kubernetes, and... Read More →
Thursday November 14, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 1 | Grand Ballroom GI
  Data Processing + Storage
  • Content Experience Level Any

5:25pm MST

Engaging the KServe Community, The Impact of Integrating a Solutions with Standardized CNCF Projects - Adam Tetelman, NVIDIA; Taneem Ibrahim, Red Hat; Johnu George, Nutanix; Tessa Pham, Bloomberg; Andreea Munteanu, Canonical
Thursday November 14, 2024 5:25pm - 6:00pm MST
Building a new solution and contemplating whether or not the OSS path is right for you? Wondering where to get started with a large cloud initiative and where the pitfalls may lie? Curious to know all the benefits waiting if your organization embraces a rich CNCF ecosystem? In this talk we will discuss the trade-offs between building a product on a full OSS platform vs. a DIY approach. We will delve into the issues of working with internal stakeholders or partners to embrace an OSS community and will cover the benefits and scaling factors that come when embracing open standards. We will use the recent integration of NVIDIA NIM into KServe as a case study and talk through the trials and tribulations that paid off in a win-win-win situation for our solutions, the OSS projects, and our users. We will cover Kubeflow, Knative, Istio, KServe, and wg-serve as well as a network of companies building enterprise K8s platforms and enterprise AI applications on top of these foundations.
Speakers
avatar for Andreea Munteanu

Andreea Munteanu

AI Product Manager, Canonical
I lead AI at Canonical, the publisher of Ubuntu and a provider of open source security, support and services. With a background in data science across industries like retail and telecommunications, I help enterprises make data-driven decisions with AI. I am passionate about amplifying... Read More →
avatar for Tessa Pham

Tessa Pham

Senior Software Engineer, Bloomberg
Tessa Pham is a Senior Software Engineer on Bloomberg's Cloud Native Compute Services organization. She works on building an inference platform for Bloomberg’s Data Science Platform, used by engineers and data scientists for training, deploying and serving ML models. Tessa is a... Read More →
avatar for Johnu George

Johnu George

Staff Engineer, Nutanix
Johnu George is a staff engineer at Nutanix with a background in distributed systems and large-scale hybrid data pipelines. He is an active in open-source and has steered several industry collaborations on projects like Kubeflow, Apache Mnemonic and Knative. His research interests... Read More →
avatar for Adam Tetelman

Adam Tetelman

Principal Product Architect, NVIDIA
Adam Tetelman is a principal architect at NVIDIA leading cloud native initiatives and CNCF engagements across the company; building inference platforms for NVIDIA AI Enterprise and DGX Cloud. He has degrees in computational robotics, computer & systems engineering, and cognitive science... Read More →
avatar for Taneem Ibrahim

Taneem Ibrahim

Senior Engineering Manager, Red Hat
Taneem is an engineering leader at Red Hat where his organization is responsible for building and delivering Model Serving, Responsible AI, and Model Registry solution in OpenShift AI.
Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom GI
  Cloud Native Experience
  • Content Experience Level Any
 

Share Modal

Share this link via

Or copy link

Filter sessions
Apply filters to sessions.
Filtered by Date - 
  • 🚨 Contribfest
  • 🪧 Poster Sessions
  • AI + ML
  • Breaks
  • ⚡ Lightning Talks
  • Cloud Native Experience
  • Cloud Native Novice
  • CNCF-hosted Co-located Events
  • Connectivity
  • Data Processing + Storage
  • Emerging + Advanced
  • Experiences
  • Keynote Sessions
  • Maintainer Track
  • Observability
  • Operations + Performance
  • Platform Engineering
  • Project Opportunties
  • Registration
  • SDLC
  • Security
  • Solutions Showcase
  • Sponsor-hosted Co-located Event
  • Tutorials