SREDAY

Site Reliability, DevOps and Cloud

October 11, 2025, at HCLTech, Chennai, India

1 Day
25+ Speakers
2 Tracks
200 Attendees

Observability 3.0: Where Human-Driven SRE Meets AIOps

Siva Bagavathi & Balaji Venkatesan
GuhaTek

Modern systems are too dynamic for dashboards alone and too business-critical to leave to black-box automation. This session reframes “observability 3.0” as a sociotechnical practice that fuses human-driven SRE with pragmatic AIOps. We’ll explore how rich telemetry (metrics, logs, traces, profiles, and user signals), combined with topology and domain context, evolves from passive visibility into an active decision system. On the human side, SREs define intent via SLOs, failure hypotheses, and reliability narratives, and they design for explainability, guardrails, and post-incident learning. On the AI side, we’ll discuss how correlation, anomaly detection, event deduplication, and LLM-powered triage can cut noise and accelerate root-cause discovery, while runbook automation and safe auto-remediation handle the boring, reversible fixes.

Expect concrete patterns: telemetry pipelines that feed feature stores, knowledge graphs that reflect real service topology, feedback loops that keep models honest, and governance that ensures AI augments, rather than overrides, SRE judgment. We’ll also cover common traps (data debt, alert fatigue, overfitting to incidents) and a rollout playbook that starts small and proves value quickly.

Seasoned, forward-looking professional with around two decades of experience in site reliability engineering, solution architecture, performance engineering, capacity planning, chaos engineering, product development, automation, and setting up SRE for large enterprise applications across various business domains. Consistently recognized as a competent individual, skilled at coordinating with cross-functional teams in a fast-paced environment to steer the timely completion of projects within budgetary constraints.
