Monitoring and observability at enterprise scale are challenging, especially when dealing with 500 VMs and 3 Kubernetes clusters totaling 1,500 nodes. In this session, I’ll share how we implemented full-stack observability using Coroot agents, collecting metrics, logs, traces, and CPU/memory profiles out of the box with minimal operational overhead.
I'm passionate about bridging technology and business outcomes. I have over 20 years of experience in Performance Engineering and Site Reliability Engineering. As the founder of CloudElu Labs, I help engineering and product teams embed SRE, FinOps, Chaos Engineering, and Observability practices directly into their DevOps workflows, starting from development.
I also founded From Dev to Ops, a Chennai-based DevOps community that brings together practitioners of DevOps, SRE, and Platform Engineering to learn and share their expertise.
With a strong belief that visibility drives innovation, I specialize in implementing cost-aware infrastructure, OpenTelemetry-based observability platforms, and developer-friendly tooling across AWS, Azure, and GCP.
I work with forward-thinking leaders to eliminate blind spots in cloud spend, improve reliability through modern SRE practices, and enable strategic, forecast-driven cloud decisions — without adding friction to engineering.