SREDAY

Site Reliability, DevOps and Cloud

November 27, 2025, PagerDuty, Lisbon, Portugal

1 day, 10+ speakers, 1 track, 50 attendees

Building an Actionable Runbook Platform

Rajat Gupta
Paymenttools

Actionable runbooks close the gap between “what to do” and “doing it.”
This talk shows how to design and ship a runbook platform where steps can be clicked and executed safely during incidents.

What the system does
- Create and manage runbooks with tags and Markdown.
- Blocks include: instruction, command, API call, conditional, and timer.
- Execute a full runbook or a single block with outputs captured in history.
- Use RBAC, an encrypted credential store, versioning, and containerized environments to keep execution safe and repeatable.
- Core entities and API surface: Runbook, RunbookVersion, Block, ExecutionJob, Credentials, plus endpoints for runbooks, versions, execution, and credentials.
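
As a rough illustration of the entities listed above, the data model could be sketched as below. This is only a minimal sketch under assumptions: the field names, the `BlockType` enum, and the `plan_job` helper are hypothetical and not taken from the talk, and the Credentials entity is left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class BlockType(Enum):
    # The five block kinds named in the abstract.
    INSTRUCTION = "instruction"
    COMMAND = "command"
    API_CALL = "api_call"
    CONDITIONAL = "conditional"
    TIMER = "timer"


@dataclass
class Block:
    id: str
    type: BlockType
    content: str  # hypothetical: Markdown text, a command, a URL, and so on


@dataclass
class RunbookVersion:
    number: int
    blocks: list[Block]


@dataclass
class Runbook:
    id: str
    title: str
    tags: list[str] = field(default_factory=list)
    versions: list[RunbookVersion] = field(default_factory=list)

    def latest(self) -> RunbookVersion:
        return self.versions[-1]


@dataclass
class ExecutionJob:
    runbook_id: str
    version: int
    block_ids: list[str]  # one id for a single-block run, all ids for a full run
    outputs: dict[str, str] = field(default_factory=dict)  # block id -> captured output


def plan_job(runbook: Runbook, block_id: Optional[str] = None) -> ExecutionJob:
    """Build a job for the whole latest version, or for one block of it."""
    version = runbook.latest()
    ids = [b.id for b in version.blocks]
    if block_id is not None:
        if block_id not in ids:
            raise KeyError(block_id)
        ids = [block_id]
    return ExecutionJob(runbook.id, version.number, ids)
```

Pinning the job to a version number is one way to keep history reproducible even after the runbook is edited later.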

Architecture at a glance
- A React SPA communicates with a FastAPI backend and MongoDB.
- An execution worker runs jobs and streams results.
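
To make the worker's role more concrete, here is a minimal sketch of running one command and streaming its output line by line. It is an assumption-laden illustration: `stream_command` and `run_job` are hypothetical names, the command runs as a local subprocess rather than inside a container, and an in-memory dict stands in for the history store.

```python
import subprocess
import sys
from typing import Iterator


def stream_command(argv: list[str]) -> Iterator[str]:
    """Run argv without a shell and yield its output lines as they arrive."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
        exit_code = proc.wait()
    finally:
        # If the consumer stops early, do not leave the process running.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()
    yield f"[exit {exit_code}]"


def run_job(job_id: str, argv: list[str], history: dict[str, list[str]]) -> list[str]:
    """Consume the stream and store the captured lines under the job id."""
    lines = []
    for line in stream_command(argv):
        lines.append(line)  # a real worker would also push each line to the client
    history[job_id] = lines
    return lines


if __name__ == "__main__":
    store: dict[str, list[str]] = {}
    run_job("job-1", [sys.executable, "-c", "print('disk ok')"], store)
    print(store["job-1"])
```

Passing the command as an argument list instead of a shell string avoids shell injection, which matters when block content is user-editable.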

Demo flow
1. Create a runbook with tags and Markdown instructions.
2. Add a command block and an API call block that uses a stored credential.
3. Assign a custom Docker execution environment to the runbook.
4. Run a single block, then run the entire runbook and watch outputs land in history.

What you will learn
- Design principles for truly actionable runbooks and how they differ from static docs.
- How to implement safe execution with RBAC, audit, and container isolation.
- Patterns for versioning and rollbacks so teams can iterate without fear.
- How this approach complements existing incident tooling and industry guidance on making runbooks actionable ([Incident][1], [resources.rundeck.com][2]).
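
One common pattern for the versioning, rollback, and RBAC points above is an append-only version list where a rollback creates a new version instead of deleting history. The sketch below assumes this pattern for illustration only; the role names, `ROLE_PERMISSIONS` table, and `VersionedRunbook` class are hypothetical and may differ from what the talk presents.

```python
from dataclasses import dataclass, field

# Hypothetical role table: which actions each role may perform.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "operator": {"read", "execute"},
    "editor": {"read", "execute", "edit"},
}


class PermissionDenied(Exception):
    pass


def require(role: str, action: str) -> None:
    """Raise unless the role is allowed to perform the action."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionDenied(f"{role!r} may not {action}")


@dataclass
class VersionedRunbook:
    versions: list[list[str]] = field(default_factory=list)  # each version: block texts
    audit: list[str] = field(default_factory=list)

    def save(self, role: str, user: str, blocks: list[str]) -> int:
        require(role, "edit")
        self.versions.append(list(blocks))
        number = len(self.versions)
        self.audit.append(f"{user} saved v{number}")
        return number

    def rollback(self, role: str, user: str, to_version: int) -> int:
        """Copy an old version to the top; nothing is ever deleted."""
        require(role, "edit")
        if not 1 <= to_version <= len(self.versions):
            raise ValueError(f"no such version: {to_version}")
        self.versions.append(list(self.versions[to_version - 1]))
        number = len(self.versions)
        self.audit.append(f"{user} rolled back to v{to_version} as v{number}")
        return number
```

Because every change, including a rollback, appends a version and an audit entry, the full edit trail stays reviewable after an incident.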

Who should attend
- SRE, platform, security, and backend engineers who own on-call and incident response.
- Engineering managers who want safer self-service for ops tasks.

I’m a Senior Engineering Manager at Paymenttools in Berlin, leading platform teams across SRE and Security. I focus on reliability, observability, Kubernetes, and policy as code, and I drive greenfield work from idea to production. Lately I’ve been applying GenAI to the SRE and platform domains. I like clear processes, data-backed decisions, and practical solutions, and I write for peers to share what works and what doesn’t.
