Modern software is all about delivering moments of delight to customers. When those moments are constantly competing for users' attention, reliability is not just a luxury; it is a necessity. But what does reliability truly mean for stakeholders, and how do we ensure it in our software systems? This talk explores the evolution of reliability from the perspective of someone who has navigated the trenches of shipping reliable products, managing large SRE teams, and building an organization dedicated to SRE platforms.
Reliability in software is a multifaceted journey that begins with understanding when and why it becomes critical. This talk will address key milestones in this journey:
- When Does Software Require Reliability?
  - Understanding the tipping points where reliability moves from being an optional feature to a core necessity.
  - Identifying the stakeholders who benefit the most from reliable systems.
- Investing in Observability: The Next Step in Reliability
  - Exploring the reasons why reliability demands investment in observability.
  - Discussing the core components of observability: logs, metrics, and traces.
  - Real-world examples of how observability transforms reliability.
- From Observability to Controllability: Bridging the Gap
  - Defining controllability and its significance in maintaining reliable systems.
  - How observability enables controllability and proactive management.
- Practitioner Needs vs. Jargon Overload
  - Cutting through the noise: Focusing on the practical needs of practitioners rather than getting lost in the technical jargon.
  - Sharing insights and lessons learned from real-world experiences in SRE and product management.
- Building a Reliable Organization
  - Strategies for fostering a culture of reliability within teams.
  - Tools and practices that support reliable software delivery.
Everyone talks about how to do SRE. Here, we talk about why to do SRE at all.