Why can't you just use your old monitoring tools for your new AI systems? This session answers that question by introducing AI Observability. You'll learn how to use AIOps to move beyond guesswork and effectively manage your AI agents and infrastructure.
Meiyappan Kannappa is a Technical Director at Ford Motor Company with around 20 years of experience in designing and developing architectural frameworks and cloud applications. He specializes in modernizing applications to cloud-native technologies and has extensive experience building B2C and B2B systems within the e-Commerce, Connected Vehicles, and mobility sectors. As a technical enthusiast, he actively writes about innovative software architecture for digital transformation.
Every Kubernetes deployment starts with good intentions, but the path to production is littered with configuration landmines that can destroy performance, compromise security, and create operational nightmares. This talk exposes the most common—and costly—mistakes that even experienced teams make when working with Kubernetes.
What You'll Learn
Through real-world case studies and live demonstrations, we'll explore:
Security Disasters
- Why default RBAC configurations are a security nightmare waiting to happen
- The hidden dangers of running containers as root and how privilege escalation attacks unfold
- Container image vulnerabilities that slip through CI/CD pipelines
- Network policy misconfigurations that create unintended attack vectors
Configuration Catastrophes
- Resource limits and requests: the difference between "it works on my machine" and production stability
- How improper health checks can cascade into cluster-wide failures
- Storage configuration mistakes that lead to data loss
- The subtle namespace and labeling errors that break everything
Observability Blind Spots
- Why basic CPU/memory metrics aren't enough for Kubernetes troubleshooting
- Missing runtime security monitoring that could have prevented breaches
- Log aggregation anti-patterns that hide critical failure signals
- How to detect anomalous behavior before it impacts users
Scaling and Performance Traps
- HPA configurations that create resource thrashing instead of smooth scaling
- Node scheduling mistakes that lead to resource waste and outages
- Network bottlenecks that aren't obvious until it's too late
Beyond the Problems: Practical Solutions
This isn't just a catalog of disasters—you'll walk away with:
- Actionable checklists for security hardening
- Tool recommendations for continuous monitoring and assessment
- Automation strategies to prevent configuration drift
- Proven patterns for reliable observability
Whether you're just starting your Kubernetes journey or managing enterprise clusters, this session will help you identify potential issues before they become production incidents. We'll cover everything from CIS Benchmark compliance to modern runtime security approaches, ensuring your clusters are both performant and secure.
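To make the hardening guidance concrete, here is a minimal sketch of the non-root and resource-limits advice above, using the Kubernetes Python client; the image, names, and values are illustrative assumptions, not the talk's recommendations:

```python
# pip install kubernetes
# A minimal sketch, assuming a reachable cluster and an image that can
# run unprivileged; values are illustrative, not a production baseline.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

container = client.V1Container(
    name="web",
    image="nginxinc/nginx-unprivileged:1.27",  # assumption: a non-root-capable image
    security_context=client.V1SecurityContext(
        run_as_non_root=True,              # refuse to start as UID 0
        allow_privilege_escalation=False,  # block setuid-style escalation
    ),
    resources=client.V1ResourceRequirements(
        requests={"cpu": "250m", "memory": "256Mi"},  # what the scheduler reserves
        limits={"cpu": "500m", "memory": "512Mi"},    # ceiling before throttling/OOM
    ),
)
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="hardened-web", labels={"app": "web"}),
    spec=client.V1PodSpec(containers=[container]),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```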
This is a story from the trenches of running one of India’s largest ECS fleets—serving millions of requests entirely on infrastructure that can disappear with just two minutes’ notice. We began with the “easy path” of a third-party managed solution, but as we scaled, it quickly became our biggest bottleneck and a massive cost center.
This session is our journey of taking back control. We’ll share the hard-earned lessons from hitting undocumented AWS limits, battling against opaque “black box” algorithms, and enduring production outages that forced us to innovate.
You’ll learn how we:
- Built custom controllers and predictive scaling
- Slashed instance boot times by 75% through EBS optimization
- Developed a “25-second miracle” shutdown process
All of this allowed us not just to survive—but to thrive—in the chaos of a 100% Spot environment, achieving 99.99% uptime while cutting compute costs by 60%.
This is how we transformed ECS from a simple orchestrator into a battle-hardened, intelligent platform.
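One illustration of what surviving on Spot requires: a watcher for the two-minute interruption notice. Here is a minimal sketch (not Zomato's actual code) that polls the EC2 instance metadata endpoint; `drain_and_shutdown()` is a hypothetical hook for your own teardown logic, and production code should use IMDSv2 session tokens:

```python
import time
import urllib.error
import urllib.request

# EC2 exposes a pending Spot interruption at this metadata path; it
# returns 404 until an interruption is actually scheduled.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def drain_and_shutdown() -> None:
    # Hypothetical hook: deregister from load balancers, drain tasks,
    # flush state, then let the instance terminate.
    print("Interruption notice received; starting graceful shutdown")

def watch(poll_seconds: float = 5.0) -> str:
    while True:
        try:
            with urllib.request.urlopen(SPOT_ACTION_URL, timeout=2) as resp:
                notice = resp.read().decode()  # e.g. {"action": "terminate", "time": "..."}
                drain_and_shutdown()
                return notice
        except urllib.error.HTTPError as err:
            if err.code != 404:  # 404 just means "no interruption scheduled"
                raise
        time.sleep(poll_seconds)

if __name__ == "__main__":
    watch()
```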
Gaurav Chauhan is a Software Development Engineer at Zomato, with prior experience at Oracle, CRED, and Capital 2B. An alumnus of IIT Delhi, he has worked across backend systems, data science, and cloud infrastructure, and has held leadership roles driving large-scale initiatives.
Sudip Chakraborty is a Software Development Engineer at Zomato, working on the platform team with a focus on reliability, benchmarking, and large-scale microservice optimizations. Over the past three years at Zomato, he has contributed to ML Ops, backend systems, and platform engineering, leveraging technologies like Golang, Kubernetes, Terraform, and Kafka.
Incidents are inevitable. At scale, they’re not just costly, they’re chaotic, emotionally draining, and often followed by postmortems that feel more like interrogations than opportunities to learn. What if the very process of incident response could be reimagined? Not as human vs. system, but as humans and AI working side by side?
This talk explores how the rise of AI-driven SRE agents is transforming both incident response and the culture of postmortems. These systems don’t just accelerate recovery; they act as impartial witnesses, documenting events in real time, surfacing context, and stripping away the bias that fuels blame. The result? A shift from “who missed what” to “how do we evolve our systems and practices?”
We’ll walk through:
- Why traditional postmortems often fail in high-velocity, AI-heavy environments.
- How AI-powered incident response changes the narrative from firefighting to foresight.
- What happens when AI becomes the first responder, and the cultural ripple effects it creates.
- Practical steps to harness AI for faster resolution, deeper insights, and blameless learning loops.
- How to quantify impact through metrics like mean time to resolution (MTTR), recovery consistency, and learning velocity, and why measurement is critical to long-term success (a back-of-the-envelope MTTR example follows this list).
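The measurement point in the last item is easy to start on; as a back-of-the-envelope sketch, MTTR is just the mean of detection-to-resolution durations (the incident records below are invented for the example):

```python
# A minimal MTTR calculation over hypothetical incident records.
from datetime import datetime, timedelta

# (detected_at, resolved_at) pairs; in practice these come from your
# incident tracker or paging tool.
incidents = [
    (datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 42)),
    (datetime(2024, 5, 9, 23, 15), datetime(2024, 5, 10, 0, 5)),
    (datetime(2024, 6, 2, 14, 30), datetime(2024, 6, 2, 14, 48)),
]

durations = [resolved - detected for detected, resolved in incidents]
mttr = sum(durations, timedelta()) / len(durations)
print(f"MTTR across {len(incidents)} incidents: {mttr}")
```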
Attendees will leave with a fresh perspective: AI not as a replacement for SREs, but as a catalyst for resilience, psychological safety, and collective intelligence. In the age of AI, postmortems aren’t just about the past; they’re about future-proofing reliability.
I’m a Senior Engineering Manager with nearly two decades of experience building and leading platform and storage engineering teams. Alongside that, I mentor with Women in Cloud Native, helping grow the next generation of engineers. I’m also a mom of two, balancing high-scale infrastructure with the chaos of homework and school runs, often with some help from AI (always with healthy boundaries). I’m passionate about speaking and writing technical blogs that make complex topics accessible, and I’m known for sneaking in the occasional dad joke, because reliability and resilience aren’t just for systems; they’re for people too.
Ram Iyengar is an engineer by practice and an educator at heart. He was (cf) pushed into technology evangelism along his journey as a developer and hasn’t looked back since! He enjoys helping engineering teams around the world discover new and creative ways to work. He is a proponent of product development and engineering teams that put the community first.
traceloop — a flight recorder for syscalls
I am a senior SRE at Sematext, where I am responsible for the entire AWS/Kubernetes infrastructure used by hundreds of customers worldwide. We self-host almost everything we run, including Kubernetes. I love solving challenging infrastructure problems with a focus on automation and reliability. I have spoken at several conferences, such as DevOpsDays and cfgmgmtcamp, as well as at local meetups.
Conversational AI has captured widespread attention, but the true potential of AI lies well beyond chatbots and dialogue systems. As enterprises look to build intelligent, adaptive systems, the ability to provide real-time, contextual understanding becomes essential. This is where the Model Context Protocol (MCP) comes in — a standard for building servers that dynamically supply AI models with the contextual signals they need to make relevant, timely, and intelligent decisions.
In this talk, we’ll explore the evolving role of MCP in modern AI architectures and how it can be used not just to improve inference quality, but also to intelligently manage and orchestrate Kubernetes environments. By integrating context-aware models with cloud-native infrastructure, organizations can unlock powerful capabilities — from self-tuning systems and adaptive resource allocation to AI-driven automation across the stack.
We’ll dive into:
- What MCP is and why it matters beyond chat interfaces
- How to build an MCP server (a minimal sketch follows this list)
- How MCP can be used to manage a Kubernetes cluster
- How to integrate MCP with tools like VS Code, Claude, etc.
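As a taste of the "build an MCP server" item, here is a minimal sketch assuming the official `mcp` Python SDK and a local `kubectl`; the server name and tool are hypothetical examples rather than the talk's actual demo:

```python
# pip install mcp  (assumption: the Python SDK for the Model Context Protocol)
import subprocess

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("k8s-helper")  # hypothetical server name

@mcp.tool()
def list_pods(namespace: str = "default") -> str:
    """List pods in a namespace by shelling out to kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "wide"],
        capture_output=True, text=True,
    )
    return result.stdout or result.stderr

if __name__ == "__main__":
    mcp.run()  # stdio transport, so clients like Claude Desktop can attach
```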
Aditya is a Senior Software Engineer at Walmart. As a proud CNCF Kubestronaut, he holds multiple Kubernetes certifications that showcase his deep expertise in the ecosystem. Beyond his work, Aditya actively shares his insights through his YouTube channel, creating tutorials on cloud technologies and software engineering, and writes technical blog posts to help others navigate and master these domains.
Kubernetes promises portability and scalability, but in reality, most production outages happen due to avoidable mistakes. Security gaps, misconfigured health checks, poor scaling strategies—all can derail even experienced teams.
In this session, we’ll uncover:
Security Disasters → The risk of running containers as root, overlooked image vulnerabilities, and RBAC pitfalls.
Configuration Catastrophes → Why “works on my machine” never works, and how resource mismanagement wrecks clusters.
Observability Blind Spots → Missing runtime security monitoring, misleading CPU/memory metrics, and logging anti-patterns.
Scaling Traps → HPA-induced thrashing, node scheduling inefficiencies, and bottlenecks hidden until too late.
But it’s not just about problems—we’ll explore solutions:
Actionable hardening checklists
Tools for continuous monitoring & runtime security
Automation strategies to prevent config drift
Proven observability practices for anomaly detection
Whether you’re new to Kubernetes or running enterprise-scale clusters, you’ll leave this session with practical, battle-tested strategies to keep your systems safe, stable, and observable.
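One concrete counter to HPA-induced thrashing is a scale-down stabilization window. Here is a hedged sketch using the Kubernetes Python client; the target Deployment, thresholds, and windows are illustrative assumptions, not tuned recommendations:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()

hpa = client.V2HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="web-hpa", namespace="default"),
    spec=client.V2HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V2CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="web"),
        min_replicas=2,
        max_replicas=20,
        metrics=[client.V2MetricSpec(
            type="Resource",
            resource=client.V2ResourceMetricSource(
                name="cpu",
                target=client.V2MetricTarget(
                    type="Utilization", average_utilization=70)))],
        behavior=client.V2HorizontalPodAutoscalerBehavior(
            scale_down=client.V2HPAScalingRules(
                stabilization_window_seconds=300,  # wait 5 min before shrinking
                policies=[client.V2HPAScalingPolicy(
                    type="Percent", value=10, period_seconds=60)],  # <=10%/min
                select_policy="Min")),
    ),
)
client.AutoscalingV2Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```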
Kaustubha Shravan is a Cloud Architect who designs and operates resilient, measurable, and cost-efficient platforms across Azure, AWS, and GCP. She blends reliability engineering with data-driven practices—SLOs, error budgets, and ML-assisted incident response—to make outages rare and recovery fast. With 46+ cloud certifications, she has led initiatives such as Benchmarking-as-a-Service and production-grade ML inference pipelines that improved performance while cutting spend. Kaustubha is a Women Techmakers Ambassador and frequent community mentor; her work has been showcased at NeurIPS workshops. She speaks about pragmatic reliability patterns, observability that drives action, and culture—how to turn postmortems into durable engineering improvements. When she’s not shipping guardrails, she’s helping teams adopt sustainable, privacy-aware AI practices and sharing playbooks that teams can put to work immediately.
This session is for SREs, DevOps engineers, and platform teams who want to strengthen Kubernetes security at the network layer. While cloud providers offer firewalls and service meshes, the last line of defense inside the cluster is Network Policies.
The talk balances concepts and live demos, and provides a clear journey: starting from the “default open cluster,” then applying Network Policies step by step to enforce strict communication.
I will use Calico on GKE to illustrate examples, but the learnings apply to any Kubernetes distribution. The session ensures attendees leave with concrete policies they can apply to their workloads.
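For a flavor of that first step, here is a minimal sketch applying a default-deny policy with the Kubernetes Python client; the `demo` namespace is an assumption, and the equivalent manifest works on any distribution whose CNI enforces NetworkPolicies:

```python
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()

# Empty pod selector = every pod in the namespace; listing both policy
# types with no allow rules denies all ingress and egress by default.
deny_all = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-all", namespace="demo"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),
        policy_types=["Ingress", "Egress"],
    ),
)
client.NetworkingV1Api().create_namespaced_network_policy("demo", deny_all)
```

Subsequent policies then allow only the flows each workload actually needs, which is the step-by-step tightening the session walks through.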
Sanket works as a Sr. DevOps Engineer at Lloyds Banking Group and is a Google Developer Expert in GCP. He previously worked as a cloud engineer at Searce (a GCP Premium Partner). He is a Google Cloud Champion Innovator in the Modern Architecture category, a Certified Kubernetes Administrator (CKA), 3x certified in GCP, and 1x certified in Azure. He has been helping small and mid-sized startups adopt and implement best practices in cloud and DevOps culture to speed up their software delivery. He loves to integrate various GCP services, like Vertex AI, Agent Development Kit, MCP servers, networking, compute, storage, containers, and GKE, to implement various use cases. He also loves to explore and deep-dive into GCP services, and helps the community by creating content and writing Medium blogs.
In modern applications, observability is essential. OpenTelemetry (OTel) has emerged as the standard for telemetry data, providing a unified way to collect, process, and export logs, metrics, and traces. Beyond data collection, organizations need effective ways to store, analyze, and visualize this telemetry to drive actionable insights.
This session is for developers looking to deepen their understanding of OpenTelemetry's architecture and its three key pillars: logs, metrics, and traces. We will walk through how OTel fits into an observability stack, the benefits it brings, and common integration patterns. A small demonstration will showcase how telemetry data can be captured and visualized in practice.
Audience Takeaway: A clear understanding of OpenTelemetry’s core components, how it supports observability across modern applications, and hands-on insights through a live demo.
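As a preview of what such a demonstration might look like, here is a minimal OpenTelemetry Python sketch that emits a trace span to the console; the service and span names are illustrative:

```python
# pip install opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire a tracer provider that batches spans and prints them to stdout;
# in production the exporter would typically be OTLP to a collector.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", "12345")  # example attribute
```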
Advisory Software Engineer at IBM Labs with a passion for Observability, APM, IoT, and Automotive solutions. A 3x Patent Holder and innovator in telemetry and distributed systems, specializing in Microservices, Kubernetes, and Cloud technologies. Expert in Java, Kafka, MongoDB, ElasticSearch, and Cassandra. Committed to building scalable, high-performance solutions that drive real-time insights and enhance system reliability.
What We’ll Cover
- Build-intensive repositories consuming 60–70 CPU cores per run, showcasing CI/CD performance at extraordinary scale and speed.
- Solving Critical Edge Cases: Fortifying orchestration against cascading failures caused by unpredictable GitHub service outages.
- Cost Optimization: Implementing intelligent workflow filters to eliminate unnecessary runs and maximize efficiency (see the sketch after this list).
- Performance Enhancements: Leveraging caching of packages and builds to speed up performance and reduce redundant work across workflows.
- Operational Standards
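As referenced in the Cost Optimization item, here is a hedged sketch (not Zomato's implementation) of one such filter: cancelling queued workflow runs that a newer commit on the same branch has superseded, via the public GitHub REST API. Owner, repo, and token are placeholders:

```python
# pip install requests
import requests

OWNER, REPO, TOKEN = "example-org", "example-repo", "ghp_..."  # placeholders
HEADERS = {
    "Authorization": f"Bearer {TOKEN}",
    "Accept": "application/vnd.github+json",
}
BASE = f"https://api.github.com/repos/{OWNER}/{REPO}/actions/runs"

def cancel_superseded(branch: str) -> None:
    """Keep only the newest queued run on a branch; cancel the rest."""
    runs = requests.get(
        BASE, headers=HEADERS,
        params={"branch": branch, "status": "queued"}, timeout=10,
    ).json()["workflow_runs"]
    runs.sort(key=lambda r: r["created_at"], reverse=True)
    for stale in runs[1:]:
        requests.post(f"{BASE}/{stale['id']}/cancel", headers=HEADERS, timeout=10)

cancel_superseded("main")
```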
Key Takeaways
For Engineers and Operators:
- Scalable Infrastructure: Design ephemeral ECS-based self-hosted runners that handle massive CI/CD workloads effortlessly.
- Cost & Performance Optimization: Unlock strategies to reduce costs and maximize efficiency with self-hosted runners.
- Accelerated Job Runtimes: Improve provisioning speed and leverage caching for faster workflows.
- Automation & Standardization: Enforce linters and automate standards across repositories.
- End-to-End Deployment Cycle: Build deployment workflows with safeguards, visibility, and approval gates.
- Advanced Reliability Features: Implement automation to enhance reliability during deployments while reducing human intervention.
GitHub Actions as a Unified Platform
The talk emphasizes GitHub Actions as a single platform for CI/CD, streamlining workflows, improving developer experience, and eliminating the need for multiple tools.
By sharing Zomato’s journey, we aim to inspire practitioners to rethink and enhance their own CI/CD setups—making them more resilient, efficient, and developer-focused.
Akshat Goel is a Software Engineer II at Zomato with over three years of experience in building scalable systems. He holds a Bachelor's degree in Computer Science from IIT Ropar (2018–2022) and has a strong background in backend development and distributed systems.
Nishant Sarraff is an SDE-2 at Zomato with over three years of experience in developing scalable software solutions. He earned his B.Tech in Computer Science from IIT Jodhpur (2018–2022) and has expertise in backend systems and performance optimization.
Why should SREs care about systems thinking applied in aviation safety engineering?
Modern distributed systems face the same challenges as complex safety-critical systems: emergent failures, cascading outages, and the gap between system design and runtime behavior. While traditional monitoring focuses on known failure modes, aviation safety engineering provides systematic approaches to identify unknown risks.
In this talk, you will learn how we applied MIT's System-Theoretic Process Analysis (STPA) — a methodology from aviation safety — to analyze reliability risks in a large-scale eCommerce platform processing millions of transactions daily.
What you will learn:
- How to map STPA concepts (hazards, constraints, control actions) to distributed systems components
- A systematic framework for identifying cascading failure scenarios before they occur
- Practical techniques for analyzing interactions between auto-scaling, load balancing, and circuit breakers
- How cascading failures were reduced using insights from this analysis
- Actionable methods you can apply to your own systems
Real examples covered:
- Circuit breaker coordination failures that created retry storms
- Auto-scaling feedback loops that amplified rather than dampened failures
- Security policy interactions that blocked legitimate traffic during incidents
- Configuration drift detection that prevented silent reliability degradation
This is not theoretical — we'll show concrete code examples, architecture diagrams, and actual incident data. You'll leave with a practical toolkit for systematic reliability analysis that goes beyond traditional SRE approaches.
Whether you are dealing with microservices, serverless architectures, or hybrid cloud deployments, this methodology will help you build and maintain more resilient systems by thinking systematically about failure modes and control structures.
Perfect for: SREs, Platform Engineers, and Engineering Managers who want to move beyond reactive incident response to proactive reliability engineering.
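To ground the retry-storm example above, here is a minimal, illustrative sketch (not the talk's actual code) of a circuit breaker paired with jittered backoff, exactly the kind of control action and feedback loop an STPA analysis examines:

```python
import random
import time

class CircuitBreaker:
    """Opens after N consecutive failures, then probes again after a
    cool-down instead of letting every caller hammer the dependency."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit a probe once the cool-down has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_timeout

    def record(self, success: bool) -> None:
        if success:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()

def backoff_with_jitter(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    # Full jitter desynchronizes clients, preventing the coordinated
    # retry waves ("retry storms") described above.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```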
Mahesh leads innovation in the application of artificial intelligence, data mining, and machine learning to software engineering. He has led successful implementations of natural-language-processing-driven test automation, usage and failure modeling using log analytics, empirical analysis of technical debt, and the application of knowledge graphs to discovering patterns and relationships for optimizing test suites and improving decision making in system integration projects. His passion is bridging the gap between theory and practice, and between academia and industry, along with creative thinking in software. He is a regular keynote speaker at many conferences. He is currently working on addressing uncertainty in fault prognosis and diagnosis.
Ever wonder how top teams keep complex systems running smoothly? This keynote will explore the current landscape of observability, moving beyond traditional pillars to embrace advanced techniques and holistic insights. We'll discuss how SRE teams are leveraging emerging technologies to navigate increasingly complex systems, transforming reactive firefighting into proactive engineering.