KubeCon + CloudNativeCon North America 2024: Full Schedule

In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis.

11:15am MST

Unlocking Cost Savings & New Possibilities: Your Guide to Prometheus Remote Write 2.0 - Callum Styan, Grafana Labs & Bartłomiej Płotka, Google

Wednesday November 13, 2024 11:15am - 11:50am MST

Salt Palace | Level 1 | Grand Ballroom B

Prometheus Remote Write is the protocol used to send Prometheus metrics from Prometheus or any other metric source to compatible remote storage endpoints such as Thanos and Cortex. Remote Write is generally used for metric long term storage, centralization, and cloud services. It also enables users to run Prometheus in an agent mode, reducing local storage requirements. Welcome to Remote Write 2.0! In this talk, Bartek and Callum, Prometheus maintainers and RW2.0 spec. co-authors, will introduce you to the next iteration of the popular protocol which adds more functionality while cutting your egress costs up to 60%, and keeps the previous versions easy-to-implement stateless design! The audience will learn what's changed in the second version of Remote Write, what it unlocks, and how easy it is to update or adopt. Finally, the speakers will share the latest benchmarks and differences with the common alternatives.

Speakers

Bartłomiej Płotka

Senior Software Engineer, Google

Bartek Płotka is a Senior Software Engineer at Google. SWE by heart, with an SRE background, currently working on Cloud Observability. Previously Principal Software Engineer at Red Hat. Author of "Efficient Go" book with O'Reilly. As the co-founder of the CNCF Thanos project and... Read More →

Callum Styan

Senior Software Engineer, Grafana Labs

Callum is a software engineer from Vancouver, Prometheus Team Member/Maintainer, and currently works on Loki at Grafana Labs.

KubeCon NA 2024 Unlocking Cost Savings & New Possibilities Your Guide to Prometheus Remote Write 2.0 pdf

Wednesday November 13, 2024 11:15am - 11:50am MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Advanced

12:10pm MST

Towards Zero Change Incidents: Intuit's Strategy for Implementing AI-Driven Progressive Delivery - Avik Basu & Saravanan Balasubramanian, Intuit

Wednesday November 13, 2024 12:10pm - 12:45pm MST

Salt Palace | Level 1 | Grand Ballroom B

At Intuit, rapid development is essential for swift feature updates and fixes. Yet, 33% of last year's incidents were due to new deployments, highlighting the need for a progressive delivery system with automated rollback capabilities. However, traditional static thresholds fall short for Intuit's ~2500 services, each with unique patterns across multiple key performance metrics. To tackle this, Intuit has implemented an ML-based progressive delivery system that utilizes Prometheus to monitor multivariate metrics, offering a comprehensive view of application health and performance during deployments. The talk will present a case study application, identify its critical metrics, and showcase how Intuit leverages Numaproj and its out-of-the-box ML models to generate anomaly scores during deployments using Argo Rollouts. This strategy enables Intuit to quickly identify and address issues using AIOps techniques, ensuring a smooth and dependable customer experience.

Speakers

Saravanan Balasubramanian

Senior Staff Software Engineer, Intuit

Bala is the lead engineer and maintainer in Argo workflow project , Intuit- leading AIOps, K8s Abstraction and Argo workflow project for open source community and Intuit.

Avik Basu

Staff Machine Learning Engineer, Intuit

Avik is a data scientist and machine learning engineer with expertise across multiple ML domains such as computer vision, natural language understanding, reinforcement learning, and time series. Currently, he leads the machine learning initiatives for open-source AIOps at Intuit... Read More →

AIOps Kubecon 2024 pdf

Wednesday November 13, 2024 12:10pm - 12:45pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Beginner

2:30pm MST

Unifying Observability: Correlating Metrics, Traces, and Logs with Exemplars and OpenTelemetry - Kruthika Prasanna Simha & Charlie Le, Apple

Wednesday November 13, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | Grand Ballroom B

In modern distributed systems, observability is key to understanding application performance and behavior. While metrics, traces, and logs each provide valuable insights, their true power is realized when they are correlated. This talk will dive into the practical benefits and implementation of correlating these signals with exemplars using the OpenTelemetry SDK and Collector, and showcase the results in Grafana. Attendees will learn how to leverage OpenTelemetry to create exemplars which will allow them to navigate from either logs or metrics to their traces.

Speakers

Kruthika Prasanna Simha

Senior Software Engineer, Apple

Kruthika is a software engineer at Apple specializing in building ML enabled observability solutions. She holds a Masters in Computer Engineering and has specialized in Machine Learning. In her free time, she likes to dabble with Jupyter Notebooks for running experiments with data... Read More →

Charlie Le

Senior Software Engineer, Apple

Charlie is a software engineer at Apple, specializing in building and scaling cloud native observability solutions and infrastructure. Deeply inspired by the collaborative spirit of open source, he actively contributes to projects like Cortex and OpenTelemetry, shaping the future... Read More →

Wednesday November 13, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Intermediate

3:25pm MST

Using OpenTelemetry for Deep Observability Within Messaging Queues - Shivanshu Raj Shrivastava & Ekansh Gupta, SigNoz

Wednesday November 13, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | Grand Ballroom B

The recent changes in OpenTelemetry have made new semantic conventions and changes in agents to better monitor messaging queues such as Kafka, RabbitMQ, and Amazon SQS, etc. In this session, we'll discuss how those semantic conventions are standardizing the telemetry collected from producers, consumers, and the messaging queues, and how in-depth observability can be achieved by correlating producer-to-consumer spans with the metrics collected from Kafka. Additionally, We will demonstrate how the Kafka Java client side instrumentation enabled and JMX metrics collected from Kafka how OpenTelemetry instrumentation can help for metrics to trace and trace to metrics correlation and spot reasons for anomalies like increased consumer lag, partition failures, time taken by messaging queues. This will also help in giving the corresponding traces in time that can help end users to better delve into their infrastructures and optimize their asynchronous applications.

Speakers

Ekansh Gupta

SDE, SigNoz

Ekansh is a Software Development Engineer with SigNoz, with active involvement in various open-source and cloud native communities for upwards two years now. He was previously an SDE Intern at SteamLabs. He is also a speaker for a couple of talks at PyCon, KubeCon and MozFests. Ekansh... Read More →

Shivanshu Raj Shrivastava

Founding Engineer, SigNoz

Shivanshu is a Founding Engineer at SigNoz, working on building an OTeL native observability product. He has a keen interest in deep tech and OSS. He is a CNCF ambassador and a member of CNCF projects like OTeL, k8s, and Istio. He has got the opportunity to mentor contributors in... Read More →

Using OTel for Deep observability within messaging queue pdf

Wednesday November 13, 2024 3:25pm - 4:00pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Intermediate

4:30pm MST

Watching the Watchers: How We Do Continuous Reliability at Grafana Labs - Nicole van der Hoeven, Grafana Labs

Wednesday November 13, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 1 | Grand Ballroom B

Nothing is foolproof. Everything fails eventually. Observability tools help predict and lessen the impact of those failures, as the watchers of your software systems. But who watches the watchers? At Grafana Labs, we're not immune to production incidents. Just like any company, we still sometimes move too quickly. We run complex, microservices-based systems ourselves, so we have to eat our own dogfood on a daily basis. In this talk, I reveal: - how we solved a years-long mystery that cost us $100,000+ - how we got our internal Mimir clusters to reliably hold 1.3 billion time series for metrics - what we've had to do to scale our Loki clusters to handle 324 TB of logs a day - what our Grafana dashboards to monitor Grafana Cloud look like Sometimes, it's easier to learn from failures in observability than from successes. This talk is a confession of some of our worst sins as well as a realistic look under the hood at how we're improving the continuous reliability of our stack.

Speakers

Nicole van der Hoeven

Senior Developer Advocate, Grafana Labs

Nicole is a Senior Developer Advocate at Grafana Labs and a performance engineer with over a decade of experience in breaking software and learning to build it back up again. She has lived in the Philippines, the US, Australia, the Netherlands, and Portugal, helping teams all over... Read More →

Wednesday November 13, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Any

5:25pm MST

The OTTL Cookbook: A Collection of Solutions to Common Problems - Tyler Helmuth, Honeycomb & Evan Bradley, Dynatrace

Wednesday November 13, 2024 5:25pm - 6:00pm MST

Salt Palace | Level 1 | Grand Ballroom B

Is your telemetry missing key attributes? Maybe there are details in your log bodies you’d rather have as attributes. It is common to find yourself in situations where your data doesn't look how you expect: it's too large, the wrong shape, or doesn't have everything you want. The OpenTelemetry Collector uses the OpenTelemetry Transformation Language (OTTL) to solve these problems. OTTL enables telemetry transformations based on any field of the payload, utilizing functions to execute the changes. In this session, Tyler and Evan will go over a brief intro to OTTL and then cover example after example of situations where you can use OTTL to solve processing problems in the Collector, like setting attributes, or defining an entire OTLP log record from a kubernetes event. Get ready with situations of your own, as we’ll save time at the end to try writing OTTL statements live on stage for your transformation or filtering issues so we can demonstrate how flexible OTTL truly is.

Speakers

Tyler Helmuth

Staff Software Engineer, Honeycomb

Tyler is a Sr. Software Engineer at Honeycomb with a passion for observability and helping users start their observability journey. He is a maintainer for the OpenTelemetry Collector and OTel Helm Charts, and an active contributor to other OTel repositories. While not its originator... Read More →

Evan Bradley

Senior Software Engineer, Dynatrace

Evan helps maintain the OpenTelemetry Collector, where he is also a primary contributor to the OpenTelemetry Transformation Language (OTTL) and the OpenTelemetry Agent Management Protocol (OpAMP) Collector components. Evan has a background in developing DevOps tooling and observability... Read More →

OTTL Cookbook pdf

examples yaml

Wednesday November 13, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Any

11:00am MST

Lessons Learned Adopting OpenTelemetry at Scale - Alex Arnell, Heroku / Salesforce

Thursday November 14, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | Grand Ballroom B

OpenTelemetry makes bold promises to unlock and unleash your observability, providing you with open standards, no vendor lock-in and interoperability with just about everything. You believe that your organization could really benefit from an uplift to modern observability. It would be easy to adopt if you were was starting out fresh, but let’s face it, most organizations have sprawling codebases and architectures. Decisions, infrastructure and often engineers that have been in place for decades. How do you even get started? This Heroku case study dives into our OpenTelemetry journey where you'll discover strategies on adoption, how to deal with internal resistance, and technical guidance on rolling out the change. Learn from our missteps and what we wished we had done differently. You’ll even see how a bit of luck can help drive adoption over the finish line. This session will equip you to navigate OpenTelemetry adoption in the most entrenched environments.

Speakers

Alex Arnell

Principal Engineer, Heroku / Salesforce

Alex Arnell is a Principal Engineer at Heroku / Salesforce with over two decades of software development experience. Alex has spent the last decade specializing in telemetry and observability systems. Alex is the lead engineer of the Telemetry team at Heroku, responsible for the collection... Read More →

Adopting OTel KCNA24.pptx pdf

Thursday November 14, 2024 11:00am - 11:35am MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Any

11:55am MST

Cognitive and Self-Adaptive System for Effective Distributed-Tracing in Applications - Mitul Tandon & Akash Gusain

Thursday November 14, 2024 11:55am - 12:30pm MST

Salt Palace | Level 1 | Grand Ballroom B

In response to challenges of limited trace capture in dynamic API tracing systems, the solution leverages Machine Learning and Cognitive approach for unbiased trace collection. Unlike existing implementations with a skewed distribution(~5%) towards normal traces, our self-adaptive system dynamically learns to prioritise and capture diverse traces, crucial for effective diagnosis of API failures and performance issues. This innovative approach significantly enhances the SREs ability to triage complex issues, leading to a game-changing reduction in Mean Time to Resolve (MTTR). The Adaptive Sampling approach analyses existing system traces and autonomously adjusts the sampling rate, eliminating manual configs. This ML-based solution outcome includes streamlined trace metric analysis, enhanced reliability work efficiency, and considerable infrastructure cost reduction through targeted trace collection, ultimately making a significant impact on operational effectiveness & reliability

Speakers

Akash Gusain

Software Engineer, Bito

Akash Gusain is a software engineer with over two years of experience in designing and deploying cloud-native applications. At VMware, he contributed to the development of scalable and robust cloud solutions, showcasing his ability to learn and adapt quickly to new technologies while... Read More →

Mitul Tandon

Software Engineer

A DevOps/SRE Engineer at VMware with 2+ years of experience with working on distributed systems and containerised applications.

KubeCon2024 pdf

Thursday November 14, 2024 11:55am - 12:30pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Any

2:30pm MST

Low-Overhead, Zero-Instrumentation, Continuous Profiling for OpenTelemetry - Christos Kalkanis, Elastic

Thursday November 14, 2024 2:30pm - 3:05pm MST

Salt Palace | Level 1 | Grand Ballroom B

Elastic has recently donated its whole-system continuous profiling agent to OpenTelemetry. After a thorough community review process, the donation was enthusiastically accepted. Leveraging eBPF, the profiling agent provides unprecedented visibility into the runtime behavior of all applications: it builds stacktraces that go from the kernel to userspace native code, all the way into code running into higher level runtimes, enabling users to identify performance regressions, reduce wasteful computations, and debug complex issues faster. This session will explore: - Benefits of eBPF-based continuous profiling compared to conventional approaches that rely on application instrumentation - How the agent builds profiles that seamlessly span kernel, native code and most widely used application runtimes - Integration with the rest of OpenTelemetry: OTLP and Collector

Speakers

Christos Kalkanis

Principal Software Engineer, Elastic

Christos is a principal engineer at Elastic, a maintainer for the OpenTelemetry Profiling SIG and a co-author of the donated OpenTelemetry profiling agent previously known as the Elastic Universal Profiling agent. After more than a decade of focusing on cybersecurity offense he moved... Read More →

Continuous Profiling OTel Christos Kalkanis pdf

Thursday November 14, 2024 2:30pm - 3:05pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Intermediate

3:25pm MST

Measuring All the Costs with OpenCost Plugins - Alex Meijer, Stackwatch

Thursday November 14, 2024 3:25pm - 4:00pm MST

Salt Palace | Level 1 | Grand Ballroom B

The CNCF OpenCost project is approaching 5,000 stars on GitHub and has become one of the most popular cost monitoring systems in use. Originally focused on cloud provider and Kubernetes cost monitoring, OpenCost expanded its scope in May 2024 by launching OpenCost Plugins with Datadog as the first reference implementation. These plugins allow users to measure and visualize virtually any cost in OpenCost, without writing a single line of OpenCost code. Alex Meijer, OpenCost and OpenCost Plugins maintainer, will speak on how the OpenCost Plugins ecosystem works and will dive into the use of the open-source FOCUS spec in OpenCost, which is the key to being able to measure nearly any cost. A plugin-enabled OpenCost deployment will be demoed, with an external cost (Datadog) visualized alongside the traditional Kubernetes and cloud provider costs. Alex will also share how to get started with plugins so that users can start analyzing the costs of whatever matters to their unique use case!

Speakers

Alex Meijer

Staff Software Engineer, Stackwatch

Alex Meijer has been working with Kubernetes for his entire career, being at various times a user, operator, and currently as someone working to help others use Kubernetes better. He has served in startups ranging in size from 5-90 people. Alex contributes to the Opencost project... Read More →

Measuring All The Costs w OC Plugins pdf

Thursday November 14, 2024 3:25pm - 4:00pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Intermediate

4:30pm MST

Mastering OpenTelemetry Collector Configuration - Steve Flanders, Cisco

Thursday November 14, 2024 4:30pm - 5:05pm MST

Salt Palace | Level 1 | Grand Ballroom B

Configuring the OpenTelemetry Collector can be a daunting task for both novices and seasoned professionals alike. Yet, mastering this crucial aspect is essential for unlocking the full potential of your observability stack. In this session, you will embark on a journey to gain the knowledge and skills needed to conquer common OpenTelemetry Collector configuration challenges. This session will draw from real-world experiences and best practices and provide live demonstrations to navigate the intricacies of OpenTelemetry Collector configuration. Whether you are a novice looking to get started or a seasoned veteran seeking to level up your skills, this session promises to empower you with the knowledge and confidence needed to properly and efficiently configure the OpenTelemetry Collector.

Speakers

Steve Flanders

Senior Director of Engineering, Splunk

Steve Flanders is a Senior Director of Engineering at Splunk (acquired by Cisco) responsible for the Observability Platform team, which includes contributions to the OpenTelemetry project. He was previously the Head of Product at Omnition (acquired by Splunk). Prior to Omnition, he... Read More →

KubeCon NA 2024 Mastering OpenTelemetry Collector Configuration pdf

Thursday November 14, 2024 4:30pm - 5:05pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Any

5:25pm MST

Now You See Me: Tame MTTR with Real-Time Anomaly Detection - Kruthika Prasanna Simha & Raj Bhensadadia, Apple Inc.

Thursday November 14, 2024 5:25pm - 6:00pm MST

Salt Palace | Level 1 | Grand Ballroom B

Picture this! You are running an application on a Kubernetes cluster & you notice that your nodes have been restarting and your users are noticing that your application is unreachable. As an engineer, you want to identify these failures in real-time & differentiate these from known states, at scale. But we know, static thresholds fail for dynamic metrics! This session explores real-time anomaly detection for cloud-native systems. We'll show you how to reduce MTTR and mean time to analyse by proactively identifying abnormal application behavior using statistical & machine learning algorithms on time series data from Prometheus. Learn to pinpoint issues, identify missing instrumentation, and visualize anomalies using Grafana. This session equips you to achieve faster issue resolution and maintain optimal application health. We'll demo practical techniques for metrics selection, anomaly detection and proactive issue identification to manage your cloud-native applications.

Speakers

Raj

Machine Learning Engineer, Apple Inc.

Raj Bhensadadia, a machine learning engineer with a passion for leveraging ML technologies to enhance monitoring and analysis of large scale systems and ensure robustness and performance of infrastructure and services.

Kruthika Prasanna Simha

Senior Software Engineer, Apple

Thursday November 14, 2024 5:25pm - 6:00pm MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Any

11:00am MST

Shopify’s Open Source Approach to Network Monitoring with eBPF, Vector and ClickHouse - Sebastian Rabenhorst & Matt Franklin, Shopify

Friday November 15, 2024 11:00am - 11:35am MST

Salt Palace | Level 1 | Grand Ballroom B

At Shopify, we’ve successfully implemented a scalable, open-source network monitoring solution for the cloud. In this talk, we will demonstrate how we built a network monitoring solution leveraging eBPF, Vector, ClickHouse, and Grafana. This solution enables us to monitor over 30 million network flow, DNS and other networking-related events per second at the container level for thousands of services across hundreds of Kubernetes clusters in the Shopify Cloud. We will also share the lessons we learned regarding these technologies and provide insights on how you can implement your own purely open-source monitoring solution capable of handling millions of events per second.

Speakers

Matt Franklin

Shopify

Sebastian

Senior Production Engineer, Shopify

Sebastian is a Senior Production Engineer at Shopify mostly working on monitoring and logging solutions as part of the observability team.

Shopify’s Open Source Approach to Network Monitoring with eBPF, Vector and ClickHouse pdf

Friday November 15, 2024 11:00am - 11:35am MST
Salt Palace | Level 1 | Grand Ballroom B

Observability

Content Experience Level Beginner