SREday 2023

14 - 15 Sep London, UK
Site Reliability Engineering Conference

Use continuous profiling to gain a deeper understanding of your incidents

Christian Simon
Grafana Labs

In this talk I will show how continuous profiling can aid investigators during an active incident and reduce the time to recovery. Continuous profiling data can also provide additional clues for finding the root cause in the aftermath. I will share our experiences with real-world outages and performance degradations of our observability cloud platform, and how continuous profiling helped us restore service more quickly. I will also cover how we use profiling data during the Post Incident Review to spot the root cause and to find further improvements to our systems.

Christian draws on broad experience across the Linux ecosystem. Starting as early as Kernel version 2.2, he managed to convert his hobby into a career: working as a Kubernetes engineer (some might remember the kube-lego project) showed him the value a modern observability stack can provide. He now works as a Software Engineer on Grafana Labs' Pyroscope team.
