SREday 2023

14-15 Sep, London, UK
Site Reliability Engineering Conference

How We Stopped Thanos from Snapping $100,000 from our Infra Budget

Shubham Srivastava & Deepak Kumar

In a galaxy not so far away, where data is as vast as the cosmos, our team was plagued by observability data chaos. Seeking clarity, we turned to Thanos and Fluentbit, fabled titans against our metric storage and logging woes. Thanos gave us a highly available Prometheus setup with virtually infinite historical data storage. Prometheus ascended to new heights, scaling horizontally without a hitch, and the Thanos Compactor's downsampling promised faster queries over older data. Fluentbit made collecting, filtering, and forwarding logs across multiple sources and destinations effortless. But little did we know that even the most powerful tools, when not wielded correctly, could be double-edged Infinity Stones. Join us for a thrilling tale of blunders as we recount our missteps in configuring these tools, the easily missed caveats of data downsampling and log storage, and how the pursuit of seamless data handling almost cost us over $100,000.
