Inspired by work from hyperscale SRE teams, RunWhen is building the industry's largest open source library of troubleshooting automation. Contributing Authors receive royalties when the company's enterprise customers use their code to automate root cause analysis and remediation.
Hyperscale SRE teams don't use traditional runbooks. Instead, each team hooks into a massive shared library of automated troubleshooting steps covering all layers of their infrastructure, platform and application dependencies. RunWhen is building a (paid) expert community to create an open source equivalent of these shared libraries, along with an enterprise AI platform that runs hundreds of automated troubleshooting steps in response to incoming alerts, questions, CI/CD jobs, etc. When an enterprise user imports automation from the RunWhen community, the original Author receives a royalty. This talk is for prospective Authors.
The RunWhen Authors program is an opportunity for SRE/PE/DevOps engineers who enjoy writing high-quality automation for root cause analysis and remediation, or who simply want to learn how hyperscale SRE teams think about their automation programs. In 2024, the RunWhen Authors program is focused primarily on the Kubernetes ecosystem -- underlying cloud infrastructure, platform, popular open source components (read: the top 100 Helm charts) and common application troubleshooting patterns.
This talk will provide an overview of contributions to the library thus far, typical users of the library, how to spot a good potential contribution and the basics of structuring the automated steps in wrappers that the RunWhen platform can consume. We will go over some of the AI code authoring tools that the RunWhen team has written to speed up the process of creating high-quality troubleshooting automation. Several examples will be shown, along with links to learn more and pointers to hands-on workshops where attendees can write their first automated troubleshooting steps, publish them for the community and watch them executed by RunWhen AI Troubleshooting Assistants.
Kyle Forster is the Founder of RunWhen, a company pioneering the use of AI for troubleshooting mission-critical applications in dev, test and production environments. Prior to RunWhen, Kyle was the Senior Director for Product Management in Google's Kubernetes team. He is a second-time founder, having previously started Big Switch Networks, a pioneer in Software Defined Networking (acquired by Arista Networks). Kyle started his career at Cisco after earning an MBA and MS in Computer Science from Stanford University and an MSE in Electrical Engineering from Princeton University. He currently holds six patents in wireless and software defined networking technologies and lives with his wife, three children and two dogs in the greater London area, where he recently settled after relocating from Silicon Valley.