The backbone of SRE, like all engineering, is monitoring and observability. When designing and implementing platforms, understanding how monitoring and observability telemetry data impacts your systems is critical to operating at scale.
At modern cloud-fleet scale, and as systems grow more complex, the volume of telemetry data (logs, metrics, and traces) sent to observability (o11y) systems can quickly exceed manageable limits. This talk dives into tried-and-true methods for defining and enforcing telemetry data quotas for OpenTelemetry (OTel) Collectors, based on the CAP theorem. We have been applying the principles of the CAP theorem as a framework for managing telemetry volume and distribution more effectively. Drawing on our research and its application, we'll share strategies for efficiently measuring data from multiple OTel agents, ensuring rule-based data distribution, and addressing the practical implications of the CAP theorem within telemetry pipelines for cloud native systems. By reframing CAP principles in the context of telemetry data, we'll consider trade-offs between consistency, availability, and partition tolerance when scaling and managing quotas across distributed systems.
Erez Rusovsky is a seasoned entrepreneur and technology leader with extensive experience in DevOps, observability, and product innovation. He currently serves as the Co-Founder & CPO of Sawmills, a cutting-edge SaaS platform revolutionizing telemetry data management to help companies reduce costs and improve the quality of their observability data.
Erez has a technical background as a DevOps engineer, monitoring large-scale grid compute infrastructures. He co-founded Rollout.io, a feature-flagging-as-a-service company, where he served as CEO. Under his leadership, Rollout.io raised $7M in funding, launched two developer-focused products, achieved millions in revenue, and was successfully acquired by CloudBees. Following the acquisition, Erez spearheaded the feature flagging product, a modern CI solution, and played a pivotal role in driving the vision and execution of a next-generation cloud-native software delivery platform.