In-person
November 12-15
Learn More and Register to Attend

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for KubeCon + CloudNativeCon North America 2024 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

Please note: This schedule is automatically displayed in Mountain Standard Time (UTC -7). To see the schedule in your preferred timezone, please select from the drop-down menu to the right, above "Filter by Date." The schedule is subject to change and session seating is available on a first-come, first-served basis. 
Observability
Wednesday, November 13
 

11:15am MST

Unlocking Cost Savings & New Possibilities: Your Guide to Prometheus Remote Write 2.0 - Callum Styan, Grafana Labs & Bartłomiej Płotka, Google
Wednesday November 13, 2024 11:15am - 11:50am MST
Prometheus Remote Write is the protocol used to send Prometheus metrics from Prometheus or any other metric source to compatible remote storage endpoints such as Thanos and Cortex. Remote Write is generally used for long-term metric storage, centralization, and cloud services. It also enables users to run Prometheus in agent mode, reducing local storage requirements. Welcome to Remote Write 2.0! In this talk, Bartek and Callum, Prometheus maintainers and co-authors of the RW 2.0 spec, will introduce you to the next iteration of the popular protocol, which adds more functionality while cutting your egress costs by up to 60% and keeps the previous version's easy-to-implement, stateless design! The audience will learn what's changed in the second version of Remote Write, what it unlocks, and how easy it is to update or adopt. Finally, the speakers will share the latest benchmarks and differences with the common alternatives.
Speakers

Bartłomiej Płotka

Senior Software Engineer, Google
Bartek Płotka is a Senior Software Engineer at Google. SWE by heart, with an SRE background, currently working on Cloud Observability. Previously Principal Software Engineer at Red Hat. Author of "Efficient Go" book with O'Reilly. As the co-founder of the CNCF Thanos project and... Read More →

Callum Styan

Senior Software Engineer, Grafana Labs
Callum is a software engineer from Vancouver, Prometheus Team Member/Maintainer, and currently works on Loki at Grafana Labs.
Salt Palace | Level 1 | Grand Ballroom B
  Observability

12:10pm MST

Towards Zero Change Incidents: Intuit's Strategy for Implementing AI-Driven Progressive Delivery - Avik Basu & Saravanan Balasubramanian, Intuit
Wednesday November 13, 2024 12:10pm - 12:45pm MST
At Intuit, rapid development is essential for swift feature updates and fixes. Yet, 33% of last year's incidents were due to new deployments, highlighting the need for a progressive delivery system with automated rollback capabilities. However, traditional static thresholds fall short for Intuit's ~2500 services, each with unique patterns across multiple key performance metrics. To tackle this, Intuit has implemented an ML-based progressive delivery system that utilizes Prometheus to monitor multivariate metrics, offering a comprehensive view of application health and performance during deployments. The talk will present a case study application, identify its critical metrics, and showcase how Intuit leverages Numaproj and its out-of-the-box ML models to generate anomaly scores during deployments using Argo Rollouts. This strategy enables Intuit to quickly identify and address issues using AIOps techniques, ensuring a smooth and dependable customer experience.
Speakers

Saravanan Balasubramanian

Senior Staff Software Engineer, Intuit
Bala is a lead engineer at Intuit and a maintainer of the Argo Workflows project, leading AIOps, K8s abstraction, and Argo Workflows efforts for the open source community and Intuit.

Avik Basu

Staff Machine Learning Engineer, Intuit
Avik is a data scientist and machine learning engineer with expertise across multiple ML domains such as computer vision, natural language understanding, reinforcement learning, and time series. Currently, he leads the machine learning initiatives for open-source AIOps at Intuit... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability

2:30pm MST

Unifying Observability: Correlating Metrics, Traces, and Logs with Exemplars and OpenTelemetry - Kruthika Prasanna Simha & Charlie Le, Apple
Wednesday November 13, 2024 2:30pm - 3:05pm MST
In modern distributed systems, observability is key to understanding application performance and behavior. While metrics, traces, and logs each provide valuable insights, their true power is realized when they are correlated. This talk will dive into the practical benefits and implementation of correlating these signals with exemplars using the OpenTelemetry SDK and Collector, and showcase the results in Grafana. Attendees will learn how to leverage OpenTelemetry to create exemplars which will allow them to navigate from either logs or metrics to their traces.
Speakers

Kruthika Prasanna Simha

Senior Software Engineer, Apple
Kruthika is a software engineer at Apple specializing in building ML enabled observability solutions. She holds a Masters in Computer Engineering and has specialized in Machine Learning. In her free time, she likes to dabble with Jupyter Notebooks for running experiments with data... Read More →

Charlie Le

Senior Software Engineer, Apple
Charlie is a software engineer at Apple, specializing in building and scaling cloud native observability solutions and infrastructure. Deeply inspired by the collaborative spirit of open source, he actively contributes to projects like Cortex and OpenTelemetry, shaping the future... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability

3:25pm MST

Using OpenTelemetry for Deep Observability Within Messaging Queues - Shivanshu Raj Shrivastava & Ekansh Gupta, SigNoz
Wednesday November 13, 2024 3:25pm - 4:00pm MST
Recent changes in OpenTelemetry have introduced new semantic conventions and agent updates to better monitor messaging queues such as Kafka, RabbitMQ, and Amazon SQS. In this session, we'll discuss how those semantic conventions are standardizing the telemetry collected from producers, consumers, and the messaging queues, and how in-depth observability can be achieved by correlating producer-to-consumer spans with the metrics collected from Kafka. Additionally, we will demonstrate how, with Kafka Java client-side instrumentation enabled and JMX metrics collected from Kafka, OpenTelemetry instrumentation supports metrics-to-trace and trace-to-metrics correlation and helps spot the reasons for anomalies such as increased consumer lag, partition failures, and time spent in messaging queues. It also surfaces the corresponding traces at the relevant time, helping end users dig into their infrastructure and optimize their asynchronous applications.
Speakers

Ekansh Gupta

SDE, SigNoz
Ekansh is a Software Development Engineer at SigNoz who has been actively involved in various open-source and cloud native communities for upwards of two years. He was previously an SDE Intern at SteamLabs. He has also given a couple of talks at PyCon, KubeCon and MozFests. Ekansh... Read More →

Shivanshu Raj Shrivastava

Founding Engineer, SigNoz
Shivanshu is a Founding Engineer at SigNoz, working on building an OTel-native observability product. He has a keen interest in deep tech and OSS. He is a CNCF ambassador and a member of CNCF projects like OTel, k8s, and Istio. He has had the opportunity to mentor contributors in... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability

4:30pm MST

Watching the Watchers: How We Do Continuous Reliability at Grafana Labs - Nicole van der Hoeven, Grafana Labs
Wednesday November 13, 2024 4:30pm - 5:05pm MST
Nothing is foolproof. Everything fails eventually. Observability tools help predict and lessen the impact of those failures, as the watchers of your software systems. But who watches the watchers? At Grafana Labs, we're not immune to production incidents. Just like any company, we still sometimes move too quickly. We run complex, microservices-based systems ourselves, so we have to eat our own dogfood on a daily basis. In this talk, I reveal:
- how we solved a years-long mystery that cost us $100,000+
- how we got our internal Mimir clusters to reliably hold 1.3 billion time series for metrics
- what we've had to do to scale our Loki clusters to handle 324 TB of logs a day
- what our Grafana dashboards to monitor Grafana Cloud look like
Sometimes, it's easier to learn from failures in observability than from successes. This talk is a confession of some of our worst sins as well as a realistic look under the hood at how we're improving the continuous reliability of our stack.
Speakers

Nicole van der Hoeven

Senior Developer Advocate, Grafana Labs
Nicole is a Senior Developer Advocate at Grafana Labs and a performance engineer with over a decade of experience in breaking software and learning to build it back up again. She has lived in the Philippines, the US, Australia, the Netherlands, and Portugal, helping teams all over... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability
  • Content Experience Level Any

5:25pm MST

The OTTL Cookbook: A Collection of Solutions to Common Problems - Tyler Helmuth, Honeycomb & Evan Bradley, Dynatrace
Wednesday November 13, 2024 5:25pm - 6:00pm MST
Is your telemetry missing key attributes? Maybe there are details in your log bodies you’d rather have as attributes. It is common to find yourself in situations where your data doesn't look how you expect: it's too large, the wrong shape, or doesn't have everything you want. The OpenTelemetry Collector uses the OpenTelemetry Transformation Language (OTTL) to solve these problems. OTTL enables telemetry transformations based on any field of the payload, utilizing functions to execute the changes. In this session, Tyler and Evan will go over a brief intro to OTTL and then cover example after example of situations where you can use OTTL to solve processing problems in the Collector, like setting attributes or defining an entire OTLP log record from a Kubernetes event. Get ready with situations of your own, as we’ll save time at the end to try writing OTTL statements live on stage for your transformation or filtering issues so we can demonstrate how flexible OTTL truly is.
Speakers

Tyler Helmuth

Staff Software Engineer, Honeycomb
Tyler is a Sr. Software Engineer at Honeycomb with a passion for observability and helping users start their observability journey. He is a maintainer for the OpenTelemetry Collector and OTel Helm Charts, and an active contributor to other OTel repositories. While not its originator... Read More →

Evan Bradley

Senior Software Engineer, Dynatrace
Evan helps maintain the OpenTelemetry Collector, where he is also a primary contributor to the OpenTelemetry Transformation Language (OTTL) and the OpenTelemetry Agent Management Protocol (OpAMP) Collector components. Evan has a background in developing DevOps tooling and observability... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability
  • Content Experience Level Any
 
Thursday, November 14
 

11:00am MST

Lessons Learned Adopting OpenTelemetry at Scale - Alex Arnell, Heroku / Salesforce
Thursday November 14, 2024 11:00am - 11:35am MST
OpenTelemetry makes bold promises to unlock and unleash your observability, providing you with open standards, no vendor lock-in and interoperability with just about everything. You believe that your organization could really benefit from an uplift to modern observability. It would be easy to adopt if you were starting out fresh, but let’s face it, most organizations have sprawling codebases and architectures, with decisions, infrastructure and often engineers that have been in place for decades. How do you even get started? This Heroku case study dives into our OpenTelemetry journey, where you'll discover strategies for adoption, how to deal with internal resistance, and technical guidance on rolling out the change. Learn from our missteps and what we wish we had done differently. You’ll even see how a bit of luck can help drive adoption over the finish line. This session will equip you to navigate OpenTelemetry adoption in the most entrenched environments.
Speakers

Alex Arnell

Principal Engineer, Heroku / Salesforce
Alex Arnell is a Principal Engineer at Heroku / Salesforce with over two decades of software development experience. Alex has spent the last decade specializing in telemetry and observability systems. Alex is the lead engineer of the Telemetry team at Heroku, responsible for the collection... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability
  • Content Experience Level Any

11:55am MST

Cognitive and Self-Adaptive System for Effective Distributed-Tracing in Applications - Mitul Tandon & Akash Gusain
Thursday November 14, 2024 11:55am - 12:30pm MST
In response to the challenges of limited trace capture in dynamic API tracing systems, the solution leverages machine learning and a cognitive approach for unbiased trace collection. Unlike existing implementations with a skewed distribution (~5%) towards normal traces, our self-adaptive system dynamically learns to prioritise and capture diverse traces, crucial for effective diagnosis of API failures and performance issues. This innovative approach significantly enhances SREs' ability to triage complex issues, leading to a game-changing reduction in Mean Time to Resolve (MTTR). The Adaptive Sampling approach analyses existing system traces and autonomously adjusts the sampling rate, eliminating manual configs. The outcomes of this ML-based solution include streamlined trace metric analysis, more efficient reliability work, and considerable infrastructure cost reduction through targeted trace collection, ultimately making a significant impact on operational effectiveness and reliability.
Speakers

Akash Gusain

Software Engineer, Bito
Akash Gusain is a software engineer with over two years of experience in designing and deploying cloud-native applications. At VMware, he contributed to the development of scalable and robust cloud solutions, showcasing his ability to learn and adapt quickly to new technologies while... Read More →

Mitul Tandon

Software Engineer
A DevOps/SRE Engineer at VMware with 2+ years of experience working on distributed systems and containerised applications.
Salt Palace | Level 1 | Grand Ballroom B
  Observability
  • Content Experience Level Any

2:30pm MST

Low-Overhead, Zero-Instrumentation, Continuous Profiling for OpenTelemetry - Christos Kalkanis, Elastic
Thursday November 14, 2024 2:30pm - 3:05pm MST
Elastic has recently donated its whole-system continuous profiling agent to OpenTelemetry. After a thorough community review process, the donation was enthusiastically accepted. Leveraging eBPF, the profiling agent provides unprecedented visibility into the runtime behavior of all applications: it builds stacktraces that go from the kernel to userspace native code, all the way into code running in higher-level runtimes, enabling users to identify performance regressions, reduce wasteful computations, and debug complex issues faster. This session will explore:
- Benefits of eBPF-based continuous profiling compared to conventional approaches that rely on application instrumentation
- How the agent builds profiles that seamlessly span kernel, native code and the most widely used application runtimes
- Integration with the rest of OpenTelemetry: OTLP and Collector
Speakers

Christos Kalkanis

Principal Software Engineer, Elastic
Christos is a principal engineer at Elastic, a maintainer for the OpenTelemetry Profiling SIG and a co-author of the donated OpenTelemetry profiling agent previously known as the Elastic Universal Profiling agent. After more than a decade of focusing on cybersecurity offense he moved... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability

3:25pm MST

Measuring All the Costs with OpenCost Plugins - Alex Meijer, Stackwatch
Thursday November 14, 2024 3:25pm - 4:00pm MST
The CNCF OpenCost project is approaching 5,000 stars on GitHub and has become one of the most popular cost monitoring systems in use. Originally focused on cloud provider and Kubernetes cost monitoring, OpenCost expanded its scope in May 2024 by launching OpenCost Plugins with Datadog as the first reference implementation. These plugins allow users to measure and visualize virtually any cost in OpenCost, without writing a single line of OpenCost code. Alex Meijer, OpenCost and OpenCost Plugins maintainer, will speak on how the OpenCost Plugins ecosystem works and will dive into the use of the open-source FOCUS spec in OpenCost, which is the key to being able to measure nearly any cost. A plugin-enabled OpenCost deployment will be demoed, with an external cost (Datadog) visualized alongside the traditional Kubernetes and cloud provider costs. Alex will also share how to get started with plugins so that users can start analyzing the costs of whatever matters to their unique use case!
Speakers

Alex Meijer

Staff Software Engineer, Stackwatch
Alex Meijer has been working with Kubernetes for his entire career, at various times as a user, as an operator, and currently as someone working to help others use Kubernetes better. He has served in startups ranging in size from 5-90 people. Alex contributes to the OpenCost project... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability

4:30pm MST

Mastering OpenTelemetry Collector Configuration - Steve Flanders, Cisco
Thursday November 14, 2024 4:30pm - 5:05pm MST
Configuring the OpenTelemetry Collector can be a daunting task for both novices and seasoned professionals alike. Yet, mastering this crucial aspect is essential for unlocking the full potential of your observability stack. In this session, you will embark on a journey to gain the knowledge and skills needed to conquer common OpenTelemetry Collector configuration challenges. This session will draw from real-world experiences and best practices and provide live demonstrations to navigate the intricacies of OpenTelemetry Collector configuration. Whether you are a novice looking to get started or a seasoned veteran seeking to level up your skills, this session promises to empower you with the knowledge and confidence needed to properly and efficiently configure the OpenTelemetry Collector.
Speakers

Steve Flanders

Senior Director of Engineering, Splunk
Steve Flanders is a Senior Director of Engineering at Splunk (acquired by Cisco) responsible for the Observability Platform team, which includes contributions to the OpenTelemetry project. He was previously the Head of Product at Omnition (acquired by Splunk). Prior to Omnition, he... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability
  • Content Experience Level Any

5:25pm MST

Now You See Me: Tame MTTR with Real-Time Anomaly Detection - Kruthika Prasanna Simha & Raj Bhensadadia, Apple Inc.
Thursday November 14, 2024 5:25pm - 6:00pm MST
Picture this! You are running an application on a Kubernetes cluster & you notice that your nodes have been restarting and your users are noticing that your application is unreachable. As an engineer, you want to identify these failures in real-time & differentiate these from known states, at scale. But we know, static thresholds fail for dynamic metrics! This session explores real-time anomaly detection for cloud-native systems. We'll show you how to reduce MTTR and mean time to analyse by proactively identifying abnormal application behavior using statistical & machine learning algorithms on time series data from Prometheus. Learn to pinpoint issues, identify missing instrumentation, and visualize anomalies using Grafana. This session equips you to achieve faster issue resolution and maintain optimal application health. We'll demo practical techniques for metrics selection, anomaly detection and proactive issue identification to manage your cloud-native applications.
Speakers

Raj

Machine Learning Engineer, Apple Inc.
Raj Bhensadadia is a machine learning engineer with a passion for leveraging ML technologies to enhance the monitoring and analysis of large-scale systems and to ensure the robustness and performance of infrastructure and services.

Kruthika Prasanna Simha

Senior Software Engineer, Apple
Kruthika is a software engineer at Apple specializing in building ML enabled observability solutions. She holds a Masters in Computer Engineering and has specialized in Machine Learning. In her free time, she likes to dabble with Jupyter Notebooks for running experiments with data... Read More →
Salt Palace | Level 1 | Grand Ballroom B
  Observability
  • Content Experience Level Any
 
Friday, November 15
 

11:00am MST

Shopify’s Open Source Approach to Network Monitoring with eBPF, Vector and ClickHouse - Sebastian Rabenhorst & Matt Franklin, Shopify
Friday November 15, 2024 11:00am - 11:35am MST
At Shopify, we’ve successfully implemented a scalable, open-source network monitoring solution for the cloud. In this talk, we will demonstrate how we built a network monitoring solution leveraging eBPF, Vector, ClickHouse, and Grafana. This solution enables us to monitor over 30 million network flow, DNS and other networking-related events per second at the container level for thousands of services across hundreds of Kubernetes clusters in the Shopify Cloud. We will also share the lessons we learned regarding these technologies and provide insights on how you can implement your own purely open-source monitoring solution capable of handling millions of events per second.
Speakers

Sebastian

Senior Production Engineer, Shopify
Sebastian is a Senior Production Engineer at Shopify mostly working on monitoring and logging solutions as part of the observability team.
Salt Palace | Level 1 | Grand Ballroom B
  Observability
 
