Metrics are a crucial part of observability: they offer a cost-effective, fast way to answer questions about SDLC and software health. From combating Noisy Neighbors to battling in the Streaming Wars, what are the best workflows for dealing with High Cardinality?
Metrics are a crucial part of observability: they offer a cost-effective, fast way to answer questions about SDLC and software health that can otherwise be hard to answer. With metrics, you inevitably hit high cardinality problems. When we search for deeper insights into our systems, we often run into the cardinality limits of our observability tools. But what makes high cardinality significant, and why is it an inevitable challenge when monitoring systems at a vast scale?
We will delve into the anatomy of a metric and the problems that high cardinality data can help resolve, from combating Noisy Neighbors to battling in the Streaming Wars. The limitations of modern systems, however, leave cardinality an unsolved problem. Finding the best solution for it requires understanding the Metric Lifecycle. Lastly, we will define workflows that enable scaling cardinality to millions, not just thousands.
When software is in production, telemetry and instrumentation are essential for troubleshooting issues. Unfortunately, setting them up can be time-consuming and costly. Often, we resort to generic solutions that do not address the unique needs of our specific system, which leads to missed opportunities for improvement and time wasted looking for answers elsewhere. I've spent a decade working in this field and can help the audience explore new questions and simplify their workflows. Most importantly, this talk will help architects and engineering leaders keep things SIMPLE and reach that 9 with much less pain.
Piyush Verma is co-founder and CTO at Last9.io, an SRE platform that aims to reduce the toil SREs and decision-makers go through and to shorten the time it takes them to reach a decision. Earlier, he led SRE at TrustingSocial.com, producing 600 million credit scores a day across 4 countries. Before that, he built oogway.in (exit to TrustingSocial.com), datascale.io (exit to Datastax), and siminars.com.