SREDAY

Site Reliability, DevOps and Cloud

October 3, 2025 San Francisco, CA, USA

1 Day
16+ Speakers
1 Track
100 Attendees

From Dashboard to Defense: Automating Resilience at Large Scale

Sureshkumar Karuppuchamy
eBay

Modern production systems can no longer rely on static dashboards and reactive on-call rotations to ensure uptime. At large scale, with billions of requests flowing through mission-critical services, reliability must be engineered into the system through autonomous detection, mitigation, and recovery. In this session, I’ll share how our platform team evolved from traditional observability stacks to an integrated, self-defending resilience architecture that transforms metrics into real-time, automated mitigations.

Key topics include:

Actionable observability: Designing high-fidelity Prometheus instrumentation that surfaces actionable SLO breaches and capacity anomalies, not just vanity metrics.
Closed-loop alerting: Building alert pipelines that automatically trigger mitigations, including traffic shaping, circuit breaking, and dynamic configuration changes.
Continuous delivery at scale: How we implemented fully automated CI/CD pipelines with canary deployments, progressive rollouts, and automatic rollback, eliminating manual gates while preserving production stability.
Dynamic rate limiting: Using adaptive throttling to contain abusive or runaway workloads before they impact critical path services.
Proactive incident response: Real-world learnings from production incidents that shaped our automated safeguards, including post-incident automation improvements and resilience patterns.
Operational trust: Governance strategies for enabling engineers to trust self-healing automation, from progressive rollout policies to guardrails for fail-safe operation.

Attendees will gain a practical blueprint for evolving traditional monitoring into an autonomous resilience layer, with concrete patterns, architectural considerations, and lessons learned operating a high-volume, always-on platform.

Whether you’re modernizing your incident response playbooks, tightening your feedback loops, or scaling continuous delivery for critical systems, you’ll leave with actionable strategies to move beyond dashboards and build a production environment that can defend itself.

Key Takeaways

How to evolve from passive observability to automated corrective action.
Designing metrics pipelines that detect and trigger real-time mitigations.
Safe automation of deployments at scale without sacrificing reliability.
Implementing dynamic safeguards like adaptive rate limiting and circuit breaking.
Practical leadership and governance approaches for building trust in self-healing systems.

Sureshkumar Karuppuchamy is a technology leader with more than two decades of experience designing and modernizing large-scale, AI-enabled infrastructure for some of the world’s most complex platforms. In his current role as a senior engineering leader at eBay, he has led critical modernization efforts across core systems—revamping legacy platforms, transitioning to cloud-native data solutions, and reimagining API architectures to improve agility, reliability, and scalability. His work includes the development of advanced compliance systems that support real-time moderation and auditing in alignment with global regulations like the EU Digital Services Act. He’s also helped shape seller experience through intuitive listing flows and AI-powered tools that streamline product onboarding, such as transforming product images into fully generated listings. Sureshkumar’s contributions have been featured in publications including The Guardian, Deloitte WSJ Insights, and marketscreener.com. He began his career at Oracle, building enterprise solutions for global supply chains, and is a graduate of Anna University’s College of Engineering, Guindy. Passionate about knowledge-sharing, he mentors technologists, contributes to peer-reviewed research, and regularly speaks at international conferences on system architecture, AI, data platforms, and compliance tech.
