Building an Actionable Runbook Platform

Rajat Gupta

Paymenttools

Actionable runbooks close the gap between “what to do” and “doing it.”
This talk shows how to design and ship a runbook platform whose steps can be executed safely, with a click, during incidents.

What the system does
- Create and manage runbooks with tags and Markdown.
- Blocks include: instruction, command, API call, conditional, and timer.
- Execute a full runbook or a single block, with outputs captured in the execution history.
- Keep execution safe and repeatable with RBAC, an encrypted credential store, versioning, and containerized execution environments.
- Core entities: Runbook, RunbookVersion, Block, ExecutionJob, and Credentials, exposed through API endpoints for runbooks, versions, execution, and credentials.
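
A minimal Python sketch of the core entities listed above, using plain dataclasses so it stays storage-agnostic. The field names, the `publish` helper, and the rule that each version is an immutable snapshot are illustrative assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class BlockKind(str, Enum):
    """The five block types named in the talk."""
    INSTRUCTION = "instruction"
    COMMAND = "command"
    API_CALL = "api_call"
    CONDITIONAL = "conditional"
    TIMER = "timer"


@dataclass(frozen=True)
class Block:
    id: str
    kind: BlockKind
    body: str                            # Markdown, shell command, URL, condition, or duration
    credential_id: Optional[str] = None  # hypothetical reference to a Credentials record


@dataclass(frozen=True)
class RunbookVersion:
    version: int
    blocks: tuple[Block, ...]            # immutable snapshot of the blocks


@dataclass
class Runbook:
    id: str
    title: str
    tags: list[str] = field(default_factory=list)
    versions: list[RunbookVersion] = field(default_factory=list)

    def publish(self, blocks: list[Block]) -> RunbookVersion:
        """Store edits as a new version instead of mutating the old one."""
        snapshot = RunbookVersion(version=len(self.versions) + 1, blocks=tuple(blocks))
        self.versions.append(snapshot)
        return snapshot

    @property
    def current(self) -> RunbookVersion:
        return self.versions[-1]


@dataclass
class ExecutionJob:
    id: str
    runbook_id: str
    version: int                         # pin the exact version that was executed
    block_id: Optional[str] = None       # None means "run the whole runbook"
    status: str = "queued"
    output: str = ""
```

Pinning `version` on every `ExecutionJob` is what lets the history answer "which blocks actually ran" after the runbook has been edited again.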

Architecture at a glance
- A React SPA communicates with a FastAPI backend, which stores its data in MongoDB.
- An execution worker runs jobs and streams results.
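
The worker's core loop can be sketched as: wrap a command block in a disposable container, run it, and yield each output line as soon as it appears so it can be shown live and saved to history. This is a hypothetical sketch; `docker_argv`, `stream_job`, and the `--network none --read-only` container policy are assumptions, not the platform's actual code.

```python
import subprocess
import sys
from typing import Iterator


def docker_argv(image: str, command: str) -> list[str]:
    """Build argv that runs a command block in a throwaway container
    (assumed policy: no network access, read-only root filesystem)."""
    return ["docker", "run", "--rm", "--network", "none", "--read-only",
            image, "sh", "-c", command]


def stream_job(argv: list[str]) -> Iterator[str]:
    """Run argv and yield output lines as they are produced,
    followed by a final line carrying the exit code."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1) as proc:
        if proc.stdout is not None:
            for line in proc.stdout:
                yield line.rstrip("\n")
        returncode = proc.wait()
    yield f"[exit {returncode}]"


if __name__ == "__main__":
    # Local demo without Docker: stream a short Python child process.
    for out in stream_job([sys.executable, "-c", "print('step 1'); print('step 2')"]):
        print(out)
```

In a FastAPI backend, a generator like this could be passed to `StreamingResponse`, for example `StreamingResponse((line + "\n" for line in stream_job(argv)), media_type="text/plain")`; WebSockets or server-sent events would work as well. Merging stderr into stdout keeps each job's output a single ordered transcript.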

Demo flow
1. Create a runbook with tags and Markdown instructions.
2. Add a command block and an API call block that uses a stored credential.
3. Assign a custom Docker execution environment to the runbook.
4. Run a single block, then run the entire runbook and watch outputs land in history.
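
Steps 2 and 4 of the demo can be sketched as a small executor that resolves a block's credential only at run time, records every block's output in history, and runs either one block or the whole list in order. The handler functions, the dict-shaped blocks, and the plain dictionary standing in for the encrypted credential store are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

# A handler takes the block body and an optional secret and returns output text.
Handler = Callable[[str, Optional[str]], str]


@dataclass
class HistoryEntry:
    block_id: str
    output: str
    at: datetime


@dataclass
class Executor:
    credentials: dict[str, str]          # hypothetical stand-in for the encrypted store
    handlers: dict[str, Handler]         # one handler per block kind
    history: list[HistoryEntry] = field(default_factory=list)

    def run_block(self, block: dict) -> str:
        cred_id = block.get("credential_id")
        # Resolve the secret only at execution time; an unknown id raises KeyError.
        secret = self.credentials[cred_id] if cred_id else None
        output = self.handlers[block["kind"]](block["body"], secret)
        if secret:
            output = output.replace(secret, "****")  # never persist the raw secret
        self.history.append(HistoryEntry(block["id"], output, datetime.now(timezone.utc)))
        return output

    def run_runbook(self, blocks: list[dict]) -> list[str]:
        # Run blocks in order; an exception from any handler stops the run.
        return [self.run_block(b) for b in blocks]
```

Masking the secret before writing to history means a credential used by an API call block never lands in the stored output, even if the upstream service echoes it back.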

What you will learn
- Design principles for truly actionable runbooks and how they differ from static docs.
- How to implement safe execution with RBAC, audit, and container isolation.
- Patterns for versioning and rollbacks so teams can iterate without fear.
- How this approach complements existing incident tooling and industry guidance on making runbooks actionable ([Incident][1], [resources.rundeck.com][2]).
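
The safety and versioning ideas above can be sketched together: an RBAC check that writes an audit record for every attempt (including denials), and a rollback that republishes an older version as the newest one instead of deleting history. The role names, permission sets, and the `VersionedRunbook` class are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical role model: which actions each role may perform.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "viewer": {"read"},
    "responder": {"read", "execute"},
    "editor": {"read", "execute", "edit"},
}


class PermissionDenied(Exception):
    pass


@dataclass
class VersionedRunbook:
    versions: list[list[str]] = field(default_factory=list)   # each version: block bodies
    audit: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def _authorize(self, user: str, role: str, action: str) -> None:
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        # Record every attempt, allowed or denied, before acting on it.
        self.audit.append((datetime.now(timezone.utc).isoformat(), user, action, allowed))
        if not allowed:
            raise PermissionDenied(f"{user} ({role}) may not {action}")

    def save(self, user: str, role: str, blocks: list[str]) -> int:
        self._authorize(user, role, "edit")
        self.versions.append(list(blocks))
        return len(self.versions)                               # 1-based version number

    def rollback(self, user: str, role: str, to_version: int) -> int:
        """Republish an old version as the newest one; history is never rewritten."""
        self._authorize(user, role, "edit")
        if not 1 <= to_version <= len(self.versions):
            raise ValueError(f"no version {to_version}")
        self.versions.append(list(self.versions[to_version - 1]))
        return len(self.versions)

    def execute(self, user: str, role: str) -> list[str]:
        """Return the newest version's blocks if the caller may execute."""
        self._authorize(user, role, "execute")
        return self.versions[-1]
```

Treating rollback as a roll-forward keeps version numbers monotonic, so an execution record that says "version 2" always refers to the same blocks.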

Who should attend
- SRE, platform, security, and backend engineers who own on-call and incident response.
- Engineering managers who want safer self-service for ops tasks.

I’m a Senior Engineering Manager at Paymenttools in Berlin, leading platform teams across SRE and Security. I focus on reliability, observability, Kubernetes, and policy as code, and I drive greenfield work from idea to production. Lately I’ve been applying GenAI to the SRE and platform domains. I like clear processes, data-backed decisions, and practical solutions, and I write for peers to share what works and what doesn’t.

SREDAY

Site Reliability, DevOps and Cloud

November 27, 2025 · PagerDuty, Lisbon, Portugal
