SREDAY

Site Reliability, DevOps and Cloud

March 27-28, 2025, London, UK

2 Days
50+ Speakers
6 Tracks
200 Attendees

Schedule

Day 1

Keynote: The Evolution of Ops

SRE Author
Let's explore how the Ops landscape has evolved in recent years and examine key trends shaping its future.

09:30

Coffee break

Main lobby


What Can We Learn from Formula 1 Incident Management

FanDuel / Blip.pt
July 18th 2020, Max Verstappen qualifies 7th for the Hungarian Grand Prix. With Red Bull fighting Mercedes for the Constructors’ Championship and Max fighting Lewis Hamilton and Valtteri Bottas for the Drivers’ Championship, this wasn’t his best qualifying session. July 19th 2020, during the...

Bolstering Your Workload's Resilience via Chaos Engineering: AWS Insights from 2024

AWS
Fancy a peek into the crystal ball for 2025's resilience planning? Join us as we unpack the valuable lessons gleaned from Amazon's best practices and customers' experiences in 2024. We'll explore the Chaos Engineering mechanisms AWS has developed to fortify your workload's resilience. Ready to...

11:00

From Diapers to Delivery: Parenting Lessons for Effective Management

The Curious Coffee Club
Parenthood often arrives with little time to prepare. Our idea of 'good' parenting usually involves emulating others, whilst hoping we don’t do any permanent damage. Stepping into management is remarkably similar: we emulate others, but it never feels quite right. Fortunately, there’s a better way.

11:30

Platform Engineering for Private Cloud

VMware / Pivotal
“Platform engineering” is the art of building and managing the infrastructure that powers your applications: a mix of cloud, a handful of DevOps, a pinch of SRE, and a thick glaze of product management. While it’s “nothing new,” many organizations are just starting to practice it—and for good...

12:00

Lunch & networking

Main lobby


13:00

Money-saving tips for the frugal serverless developer

Lumigo
Dive into the world of serverless, explore common, costly mistakes, and learn actionable tips for cutting waste and reducing your AWS bill. Whether you're looking to cut down on CloudWatch costs or improve cost-efficiency for your serverless application, we've got some helpful tips for you.

13:30

The Human API: Engineering Better Conversations

Steven Wade Consulting
What if the principles that make great software could transform the way we connect with people? Join Steve for an illuminating exploration of how technical expertise can unlock extraordinary human connections. Drawing from years of experience bridging the gap between technical brilliance and...

14:00

What a submarine commander taught me about effective software teams

Tech Momentum
My colleagues were right to recommend L. David Marquet's "Turn the Ship Around" as a good read for software team leaders. I encourage you to read it too; here are my key takeaways and how they apply to software. 1) Fix the environment, not the people. We've got a lot to learn from other...

14:30

Lessons Learned from Changing 3 Service Meshes in 7 Years

Avito
I'm a member of a platform team that has changed service mesh solutions 3 times in the past 7 years. We did it seamlessly for the other 1,500 engineers who work at Avito; our solution manages over 3,000 microservices and more than 3 million RPS. I will share the difficulties we faced and the lifehacks that helped us get there.

15:00

09:30

Coffee break

Main lobby


Why you should own an internal platform for your External AI and SaaS Providers

IBM
Today there are thousands of AI and SaaS services out there, and they are used throughout your business. This session will explain what, why and how platform teams need to productise their external AI and SaaS providers to provide flexibility and control over these external applications. These AI and...

Building an Open-Source, DIY Private DBaaS with Dapr

Severalnines
Tired of DBaaS lock-in? Divine explores Sovereign DBaaS, a model putting you back in control. Learn how to build your own private DBaaS with Dapr, overcoming compliance and licensing headaches. Gain insights from real-world feedback and design considerations.

Raw-dogging the Linux proc filesystem

Ebury
The goal of this talk is to show the source of information many tools use to display process information. I will go through the most interesting files in the /proc filesystem and show what information is there, along with standard tools for displaying this information. This comes in handy when...

11:30

The operator pattern is here to stay: Building a foundational cloud-native Streaming Platform

Dojo
This session delivers critical insights into leveraging Kubernetes Operators in the data space. I'll cover the Kubernetes operator pattern, KubeBuilder, data governance, how we built our connector ecosystem, and much more.

12:00

Lunch & networking

Main lobby


13:00

Building Smarter Kubernetes Workflows: Pepr for the Modern SRE

Defense Unicorns
Pepr simplifies Kubernetes operations by consolidating admission controllers and operators into one lightweight framework. Enforce global security postures, leverage a full-fledged programming language, and offload operational expertise into code. Pepr makes administering Kubernetes clusters easy!

13:30

Speeding Up CI Pipelines: Testing Kubernetes Apps with vCluster

Loft Labs
Accelerate your CI workflows with vCluster—create lightweight, on-demand Kubernetes clusters for faster testing and development, reducing build times and overhead while supporting CRDs for production-like environments.

Emulation, Containerization and Virtualization - do you know the differences?

SpeakAura
Let’s journey back to the basics and explore the fascinating realms of emulation, virtualization, and containerization. Together, we’ll uncover how these three pillars revolutionized the technical landscape and continue to drive innovation today.

15:00

09:30

Coffee break

Main lobby


10:00

The DevOps Organisation

RiverSafe
DevOps has many benefits for software engineering, but is rarely talked about outside of that context. In this talk we’ll explore why DevOps is not a purely technical endeavour, what it means to apply DevOps across the whole organisation, and how you can use these ideas to deliver change where you work.

10:30

The Missing Chapter in the Platform Engineering Playbook

Ariga
Databases power nearly every cloud-native application, yet they remain one of the most overlooked components in platform engineering. While advancements in Operator development have made managing stateful applications on Kubernetes more feasible, database schema management still lags behind. This...

Defining Reliability through User Objectives

Tenable
In this talk, we’ll explore how we revolutionized our SLO practices by introducing User Objectives—customer-experience-focused metrics that transcend individual services. This approach transformed our SRE function from a traditional embedded model to a centralized Application SRE team, fostering...

Why we skipped SRE and switched to Platform Engineering?

Electrolux Group
We work in the IoT space at Electrolux Group, a leader in the home appliance industry, and scaled from 10 to 300 developers with just 5 Ops engineers in 4 years. Along the way, we faced challenges in promoting SRE principles to development teams. This led us to transition from SRE to Platform Engineering....

12:00

Lunch & networking

Main lobby


Designing, Building and Launching a Hyper Scaling DevOps Platform

Triform
What does it take to design and launch a DevOps platform capable of running alien code—written by both users and AI—fast, securely, and at scale? This talk dives deep into the journey of creating Triform, a platform redefining DevOps for a new era of AI-driven development. You’ll learn how we: -...

The Human Side of the Cloud: Why Soft Skills Are the Key to Success

Teesside University London
In a world increasingly defined by complex technology and rapid innovation, it is easy to focus entirely on the technical aspects of success. Yet, the most advanced cloud infrastructure, the most cutting-edge tools, and the most sophisticated algorithms are only as effective as the people behind...

No More Heroes: Why Team Composition is a BIG Deal

Pegasystems
This talk covers a topic that's universal across any team, company and industry that deals with technology: Team Composition. I bring relevant data and proven sources to the discussion to explain what the key concepts are, and why they matter so much to the outcomes delivered...

14:30

Day 2

09:00

Keynote: The Future of Observability: Trends, AI, and New Relic’s Vision for a Smarter Stack

New Relic
As cloud-native development accelerates, observability is no longer a nice-to-have, but a necessity. This session explores key trends shaping the observability space, including the role of AI in transforming monitoring practices, the rise of open standards like OpenTelemetry, and how platforms...

09:30

Coffee break

Main lobby


Observability is not just for Backend!

Elastic
Observability is the ability to measure the current state of a system. Backend engineers are becoming more familiar with the primary signals and with technologies, such as OpenTelemetry, that can be used to instrument applications and diagnose issues. Yet, in the frontend world, we're behind the curve....

Stateful Workloads Made Easy: A Practical Demo of Live Migration

CAST AI
Kubernetes works great for stateless applications, but stateful workloads like databases or long-running jobs pose a challenge. These applications rely on persistent data and can’t afford interruptions, making Kubernetes’ “ephemeral” approach risky. Downtime can lead to data loss,...

Evolving Shift Left: Integrating Observability into Modern Software Development

Coralogix
The concept of “Shift Left” has long guided developers to address issues early in the software development lifecycle (SDLC), catching bugs before they reach production. But as modern software ecosystems become more complex—with microservices, serverless architectures, and global...

Don't Over-Engineer your Observability stack period

KubeCloud
In the cloud-native space, there is a plethora of tools available for observing Kubernetes applications & Infra. However, the choices often involve either opting for service meshes that increase architectural complexity or selecting tools with exorbitant costs. What if there was a one-stop...

12:00

Lunch & networking

Main lobby


When Platform Engineers meet SREs: The Birth of Observability-as-a-Service Superpowers

Chronosphere & Mia-Platform
Monitoring the behavior of a system is essential to ensuring its long-term effectiveness. However, managing an end-to-end observability stack can feel like stepping into quicksand: without a clear plan, you risk sinking deeper into system complexities. In this talk, we’ll explore how...

13:30

Real-time earthquake alert system: Leveraging Serverless architecture with Confluent Kafka

DataIceberg
In our upcoming presentation, we'll explore a cutting-edge architectural solution for real-time SMS and email notifications, particularly geared towards responding to earthquake events. This system is designed to handle rapid data transmission, listening for event changes every second, making it...

Business-Driven Monitoring: An SRE’s Secret Weapon

Vettabase
In this talk, I’ll share how focusing on business metrics, not just technical ones, can transform Site Reliability Engineering. By tracking business-centric metrics, we identified issues early and resolved them before they significantly impacted users or revenue. Real-World Cases from Experience...

How to tame chaos effectively?

Pegasystems
Imagine a self-healing system that handles surprises, letting you sleep peacefully. If that sounds appealing, chaos engineering could be the answer. Trusted by Netflix, LinkedIn, Google, and Facebook, it's key for business resilience. In this session, we'll explore its history, learn how to apply...

15:00

09:30

Coffee break

Main lobby


10:00

It's Friday! CI/CD as an unfinished journey

ZenCity
I'll start with a show of hands and some questions, move on to challenge the common premise about Friday production deployments, explain the downward spiral of slow deployment and WIP, and then discuss how we can fix it. And yes, it's an unfinished journey. Expect some mentions of the seminal...

How to Build Cloud Native Platforms with Kubernetes

Loft Labs
In this talk, I will explore how to build cloud-native platforms using Kubernetes. I will discuss creating self-service portals, leveraging programmatic APIs, and automating workflows to enhance productivity and reliability. We’ll cover best practices for infrastructure management, security...

Embracing the chaos: How Chaos Engineering could have saved Jurassic Park

Supercharged
What do dinosaurs and distributed systems have in common? Both are complex, unpredictable, and prone to catastrophic incidents without proper safeguards. In this talk, we’ll explore how the principles of Chaos Engineering could have prevented the incident at Jurassic Park. We’ll dissect critical...

Incident Groundhog Day

Uptime Labs
Learning how to respond effectively to incidents is hard. One of the reasons is that we never see the same incident twice. While we can learn vital lessons during and after an incident, we can’t hop into a time machine, and apply these lessons to the same incident to discover their impact. What...

12:00

Lunch & networking

Main lobby


NebulOuS Meta Operating System for cloud continuum ops based on Kubernetes

7bulls.com
In this talk, I present a novel, meta-operating system approach to the cloud continuum, showcasing the NebulOuS project vision and the first results that enable cloud continuum ops. NebulOuS makes substantial research contributions in the realm of cloud continuum brokerage by introducing...

Taming Noisy Neighbors: Accelerating Response Times With Memory Performance Isolation

Unvariance
We think of containers as providing isolation for our applications; however, a major source of performance interference remains unaddressed, significantly degrading performance. Contention for CPU caches and memory bandwidth has been shown to increase tail response times by 4-13x and reduce...

From Spot Ocean to Karpenter - Zero Downtime Migration

adjoe
From Spot Ocean to Karpenter: adjoe's zero-downtime migration story. Learn how we switched autoscalers in production, the challenges we faced along the way, and why we built a custom controller to fix broken nodes.

14:30

Understanding Zero Trust: From Physical Security to Identity-Aware Proxies

Pomerium
Zero Trust doesn't have to be intimidating. Learn how Identity-Aware Proxies transform service access from perimeter-based to continuous verification, explained through the universal experience of airport security.

15:00

09:30

Coffee break

Main lobby


DevSecOps in the Multi-Cloud Era: Securing Applications and Ensuring Compliance at Scale

iCrossing
The multi-cloud era brings unparalleled opportunities for agility, scalability, and redundancy. However, it also introduces unique security and compliance challenges as organizations navigate diverse cloud platforms. This session explores how DevSecOps serves as the foundation for addressing...

Serverless Security Flaws - A Noob's Guide to Hacking Serverless

SecurityWall
Serverless breaches expose dangerous missteps in securing function chains, IAM policies, and API gateways. We unravel serverless compromises to reveal the overlooked risks lurking in your infrastructureless apps. Arm yourself with actionable lessons to lock down your functions and avoid headlines.

Automating SRE Operations with Multi-Agent AI: InfraAssistant Approach

Electrolux Group
SRE teams often face challenges with a high volume of routine tasks and requests, making it difficult to focus on critical, high-priority issues. At Electrolux, we faced the same challenge, which led us to develop InfraAssistant, a multi-agent AI-powered solution designed to automate...

11:30

DevOps for the GenAI Age

LinearB
GenAI is disrupting how we write, review, accept and deliver code. DevOps practices must evolve to keep up. There are new kinds of bottlenecks to open, new bends in the pipelines to navigate, and new technologies at our disposal. Join us to learn how.

12:00

Lunch & networking

Main lobby


AIRE: AI Reliability Engineering. Bringing SRE to AI

GfK - An NIQ Company
AI products are becoming critical for businesses to maintain a competitive edge, yet integrating them into an organization’s ecosystem brings unique challenges. Ensuring the reliability, security, and alignment of AI systems with business goals and ethical standards demands new approaches and...

13:30

Taking Machine Learning to production: Cloud MLOps for speed and efficiency

DoiT International

14:00

Workshop: Hands-on guide to monitor your API-driven AI/LLM applications

New Relic
In this workshop, we will focus on leveraging New Relic's AI Monitoring to confidently build and run AI applications. You'll learn how to achieve comprehensive observability across your stack to maintain peak performance, ensure compliance, promote quality, and observe costs. Through hands-on...

14:30

Speakers

Agnieszka Welian (Pegasystems)
Aivars Kalvans (Ebury)
Alina Astapovich & Markus Makela (Electrolux Group)
Alon Nisser (ZenCity)
Babar Khan Akhunzada & Muhammad Khizer Javed (SecurityWall)
Carly Richmond (Elastic)
Casey Wylie (Defense Unicorns)
Chris Phillips (IBM)
Denys Vasyliev (GfK - An NIQ Company)
Divine Odazie (Severalnines)
Elad Leev (Dojo)
Eric D. Schabell & Graziano Casto (Chronosphere & Mia-Platform)
Harry Kimpel (New Relic)
Hrittik Roy (Loft Labs)
Iggy Gullstrand (Triform)
Igor Baliuk (Avito)
James Eastham (Datadog)
Jonathan Perry (Unvariance)
Joshua Fox (DoiT International)
Kristina Kondrashevich & Gang Luo (Electrolux Group)
Laura Thomson & Vladislav Nedosekin (AWS)
Marius Kimmina (adjoe)
Mark Faiers (RiverSafe)
Martin McLarnon (Coralogix)
Mateusz Solnica (SpeakAura)
Michael Cote (VMware / Pivotal)
Miko Pawlikowski (SRE Author)
Mykhaylo Rykmas (Vettabase)
Nick Taylor (Pomerium)
Pawel Hajdan (Tech Momentum)
Pawel Skrzypek (7bulls.com)
Pedro Ivo Raimundo (Pegasystems)
Piotr Zaniewski (Loft Labs)
Prerit Munjal (KubeCloud)
Ricardo Castro (FanDuel / Blip.pt)
Rob Charlwood (Supercharged)
Rotem Tamir (Ariga)
Simon Copsey (The Curious Coffee Club)
Steve Wade (Steven Wade Consulting)
Stuart Rimell (Uptime Labs)
Tomasz Czajka, Ciaran Gaffney & Pascal Schlumpf (Tenable)
Vamsi Anumolu (iCrossing)
Victor Onyenagubom (Teesside University London)
Vlad Onetiu (DataIceberg)
Vladimir Klevko (CAST AI)
Yan Cui (Lumigo)
Yishai Beeri (LinearB)

Venue

Everyman Canary Wharf

Crossrail Place,
Canary Wharf,
E14 5AR, London, UK
Level -2

Tube access
Jubilee, Elizabeth and DLR lines: Canary Wharf station

Sponsors & Partners

Want to become a sponsor? Get in touch!