SREDAY

Site Reliability, DevOps and Cloud

March 27-28, 2025 London, UK

2 Days
50+ Speakers
6 Tracks
200 Attendees

Tickets

Schedule

Day 1

Keynote: SRE’s worst nightmare

SRE Author
What's the worst that can happen? Join me for a story about reliability, secrecy and potential 200k fatalities.

09:30

Coffee break

Main lobby


Bolstering Your Workload's Resilience via Chaos Engineering: AWS Insights from 2024

AWS
Fancy a peek into the crystal ball for 2025's resilience planning? Join us as we unpack the valuable lessons gleaned from Amazon's best practices and customers' experiences in 2024. We'll explore the Chaos Engineering mechanisms AWS has developed to fortify your workload's resilience. Ready to...

10:30

From Diapers to Delivery: Parenting Lessons for Effective Management

The Curious Coffee Club
Parenthood often arrives with little time to prepare. Our idea of 'good' parenting usually involves emulating others, whilst hoping we don’t do any permanent damage. Stepping into management is remarkably similar: we emulate others, but it never feels quite right. Fortunately, there’s a better way.

11:00

Platform Engineering for Private Cloud

VMware / Pivotal
“Platform engineering” is the art of building and managing the infrastructure that powers your applications: a mix of cloud, a handful of DevOps, a pinch of SRE, and a thick glaze of product management. While it’s “nothing new,” many organizations are just starting to practice it—and for good...

11:30

Money-saving tips for the frugal serverless developer

Lumigo
Dive into the world of serverless: explore common, costly mistakes and learn actionable tips for cutting waste and reducing your AWS bill. Whether you're looking to cut down on CloudWatch costs or improve the cost-efficiency of your serverless application, we've got some helpful tips for you.

12:00

Lunch & networking

Main lobby


13:00

The Human API: Engineering Better Conversations

Steven Wade Consulting
What if the principles that make great software could transform the way we connect with people? Join Steve for an illuminating exploration of how technical expertise can unlock extraordinary human connections. Drawing from years of experience bridging the gap between technical brilliance and...

13:30

What a submarine commander taught me about effective software teams

Tech Momentum
My colleagues were right to recommend L. David Marquet's "Turn the Ship Around" as a good read for software team leaders. I encourage you to read it too; here are my key takeaways and how they apply to software. 1) Fix the environment, not the people: We've got a lot to learn from other...

14:00

Lessons Learned from Changing 3 Service Meshes in 7 Years

Avito
I'm a member of a platform team that has changed service mesh solutions 3 times over the past 7 years. We did it seamlessly for the other 1,500 engineers who work at Avito; our solution manages over 3,000 microservices and more than 3 million RPS. I will share the difficulties we faced and the lifehacks that helped us achieve that.

So You Want to Maintain a Reliable Event Driven System

Datadog
Building an event-driven system is the easy part. You build producers that produce messages and consumers that consume messages, and you leverage managed services as the message channels between your systems. But what does this mean for your operations? The things that keep your systems online,...

15:00

DevOps for the GenAI Age

LinearB
GenAI is disrupting how we write, review, accept and deliver code. DevOps practices must evolve to keep up. There are new kinds of bottlenecks to open, new bends in the pipelines to navigate, and new technologies at our disposal. Join us to learn how.

15:30

Scaling Analytics with ClickHouse in Cloud-Native Environments

Altinity
Learn strategies to optimize ClickHouse for massive data streams. Explore Kubernetes orchestration, query optimization, and real-world solutions for scalable, cost-efficient, and reliable analytics in modern systems.

16:00

09:30

Coffee break

Main lobby


Stateful Workloads Made Easy: A Practical Demo of Live Migration

CAST AI
Kubernetes works great for stateless applications, but stateful workloads like databases or long-running jobs pose a challenge. These applications rely on persistent data and can’t afford interruptions, making Kubernetes’ “ephemeral” approach risky. Downtime can lead to data loss,...

Building an Open-Source, DIY Private DBaaS with Dapr

EverythingDevOps
Tired of DBaaS lock-in? Divine explores Sovereign DBaaS, a model putting you back in control. Learn how to build your own private DBaaS with Dapr, overcoming compliance and licensing headaches. Gain insights from real-world feedback and design considerations.

Raw-dogging the Linux proc filesystem

Ebury
The goal of this talk is to show the source of information many tools use to display process information. I will go through the most interesting files in the /proc filesystem and show what information is there, along with standard tools for displaying this information. This comes in handy when...

11:30

The operator pattern is here to stay: Building a foundational cloud-native Streaming Platform

Dojo
This session delivers critical insights into leveraging Kubernetes Operators in the data space. I'll cover the Kubernetes operator pattern, KubeBuilder, data governance, how we built our connector ecosystem, and much more.

12:00

Lunch & networking

Main lobby


13:00

Building Smarter Kubernetes Workflows: Pepr for the Modern SRE

Defense Unicorns
Pepr simplifies Kubernetes operations by consolidating admission controllers and operators into one lightweight framework. Enforce global security postures, leverage a full-fledged programming language, and offload operational expertise into code. Pepr makes administering Kubernetes clusters easy!

13:30

Speeding Up CI Pipelines: Testing Kubernetes Apps with vCluster

Loft Labs
Accelerate your CI workflows with vCluster—create lightweight, on-demand Kubernetes clusters for faster testing and development, reducing build times and overhead while supporting CRDs for production-like environments.

Emulation, Containerization and Virtualization - do you know the differences?

SpeakAura
Let’s journey back to the basics and explore the fascinating realms of emulation, virtualization, and containerization. Together, we’ll uncover how these three pillars revolutionized the technical landscape and continue to drive innovation today.

14:30

kubenetmon: How we built a tool to meter data transfer in ClickHouse Cloud

ClickHouse
In this talk, we are going to discuss how to go from zero to hero in understanding which of your workloads send how much data to each other. We'll talk about popular observability solutions, such as Cilium Hubble, Flow Logs, tools like Retina, and others, and where they fall short. We will then...

15:00

Cost-Effective Monitoring in Kubernetes

VictoriaMetrics
Discover cost-effective monitoring in Kubernetes! Learn to optimize expenses with practical strategies. We'll explore efficient resource utilization, smart data retention, and more, aimed at maximizing your monitoring investment. Join us to enhance your monitoring approach without breaking the bank!

15:30

09:30

Coffee break

Main lobby


10:00

The DevOps Organisation

RiverSafe
DevOps has many benefits for software engineering, but is rarely talked about outside of that context. In this talk we’ll explore why DevOps is not a purely technical endeavour, what it means to apply DevOps across the whole organisation, and how you can use these ideas to deliver change where you work.

10:30

The Missing Chapter in the Platform Engineering Playbook

Atlas by Ariga
Databases power nearly every cloud-native application, yet they remain one of the most overlooked components in platform engineering. When LLM-powered co-pilots author schema changes alongside our devs, building proper platform guardrails is no longer optional - it's essential. This talk uncovers...

Defining Reliability through User Objectives

Tenable
In this talk, we’ll explore how we revolutionized our SLO practices by introducing User Objectives—customer-experience-focused metrics that transcend individual services. This approach transformed our SRE function from a traditional embedded model to a centralized Application SRE team, fostering...

11:30

Taking Machine Learning to production: Cloud MLOps for speed and efficiency

DoiT International

12:00

Lunch & networking

Main lobby


Why you should own an internal platform for your External AI and SaaS Providers

IBM
Today there are thousands of AI and SaaS services out there, and they are used throughout your business. This session will explain what, why and how platform teams need to productise their external AI and SaaS providers to provide flexibility and control of these external applications. These AI and...

Why we skipped SRE and switched to Platform Engineering?

Electrolux Group
We work in the IoT space at Electrolux Group, a leader in the home appliance industry, scaling from 10 to 300 developers with just 5 Ops engineers in 4 years. Along the way, we faced challenges in promoting SRE principles to development teams. This led us to transition from SRE to Platform Engineering....

No More Heroes: Why Team Composition is a BIG Deal

Pegasystems
This talk covers a topic that's universal across any team, company and industry that deals with technology - Team Composition. And with this talk, I bring relevant data and proven sources to the discussion to explain what the key concepts are, and why they matter so much on the outcomes delivered...

Search and Rescue: from mountain peaks to protocols

Rootly
Getting paged at 3 a.m. is tough—but imagine it’s because you need to rescue a lost hiker in a vast mountain range with freezing temperatures. Search and Rescue (SAR) operations depend on protocols, communication channels, defined team roles, and other concepts familiar to any SRE—except SAR...

DevSecOps in the Multi-Cloud Era: Securing Applications and Ensuring Compliance at Scale

iCrossing
The multi-cloud era brings unparalleled opportunities for agility, scalability, and redundancy. However, it also introduces unique security and compliance challenges as organizations navigate diverse cloud platforms. This session explores how DevSecOps serves as the foundation for addressing...

15:30

Day 2

09:00

Keynote: The Future of Observability: Trends, AI, and New Relic’s Vision for a Smarter Stack

New Relic
As cloud-native development accelerates, observability is no longer a nice-to-have, but a necessity. This session explores key trends shaping the observability space, including the role of AI in transforming monitoring practices, the rise of open standards like OpenTelemetry, and how platforms...

09:30

Coffee break

Main lobby


10:00

2h Workshop: Hands-on guide to monitor your API-driven AI/LLM applications

New Relic
In this workshop, we will focus on leveraging New Relic's AI Monitoring to confidently build and run AI applications. You'll learn how to achieve comprehensive observability across your stack to maintain peak performance, ensure compliance, promote quality, and observe costs. Through hands-on...

12:00

Lunch & networking

Main lobby


Serverless Security Flaws - A Noob's Guide to Hacking Serverless

SecurityWall
Serverless breaches expose dangerous missteps in securing function chains, IAM policies, and API gateways. We unravel serverless compromises to reveal the overlooked risks lurking in your infrastructureless apps. Arm yourself with actionable lessons to lock down your functions and avoid headlines.

Automating SRE Operations with Multi-Agent AI: InfraAssistant Approach

Electrolux Group
SRE teams often face challenges with a high volume of routine tasks and requests, making it difficult to focus on critical, high-priority issues. At Electrolux, we faced the same challenge, which led us to develop InfraAssistant, a multi-agent AI-powered solution designed to automate...

14:00

Zero Trust: From Airports to Identity-Aware Proxies

Pomerium
Zero Trust doesn't have to be intimidating. Learn how Identity-Aware Proxies transform service access from perimeter-based to continuous verification, explained through the universal experience of airport security.

Incident Groundhog Day

Uptime Labs
Learning how to respond effectively to incidents is hard. One of the reasons is that we never see the same incident twice. While we can learn vital lessons during and after an incident, we can’t hop into a time machine, and apply these lessons to the same incident to discover their impact. What...

Don't Over-Engineer your Observability stack period

KubeCloud
In the cloud-native space, there is a plethora of tools available for observing Kubernetes applications and infrastructure. However, the choices often involve either opting for service meshes that increase architectural complexity or selecting tools with exorbitant costs. What if there was a one-stop...

15:30

09:30

Coffee break

Main lobby


10:00

It's Friday! CI/CD as an unfinished journey

ZenCity
It's Friday afternoon, and you've got plans for this evening. You've just finished the feature, you push to main, and click deploy. OR DO YOU? Let's talk about Friday deployments and what they can teach us. We'll talk about being stuck in slow deployment cycles and how to break free from them....

How to Build Cloud Native Platforms with Kubernetes

Loft Labs
In this talk, I will explore how to build cloud-native platforms using Kubernetes. I will discuss creating self-service portals, leveraging programmatic APIs, and automating workflows to enhance productivity and reliability. We’ll cover best practices for infrastructure management, security...

Embracing the chaos: How Chaos Engineering could have saved Jurassic Park

Supercharged
What do dinosaurs and distributed systems have in common? Both are complex, unpredictable, and prone to catastrophic incidents without proper safeguards. In this talk, we’ll explore how the principles of Chaos Engineering could have prevented the incident at Jurassic Park. We’ll dissect critical...

A story of hundreds of thousands of SLOs across the globe

Elastic
This talk dives into the technical architecture and operational strategies behind Elastic's global-scale SLO management system, designed to handle hundreds of thousands of SLOs across 60+ regions and major cloud providers. The system empowers development teams to define, monitor, and manage SLOs...

12:00

Lunch & networking

Main lobby


NebulOuS Meta Operating System for cloud continuum ops based on Kubernetes

7bulls.com
In this talk, I present a novel, meta-operating system approach to the cloud continuum - showcasing the NebulOuS project vision and the first results that enable cloud continuum ops. NebulOuS accomplishes substantial research contributions in the realms of cloud continuum brokerage by introducing...

Taming Noisy Neighbors: Accelerating Response Times With Memory Performance Isolation

Unvariance
We think of containers as providing isolation for our applications; however, a major source of performance interference remains unaddressed, significantly degrading performance. Contention for CPU caches and memory bandwidth has been shown to increase tail response times by 4-13x and reduce...

From Spot Ocean to Karpenter - Zero Downtime Migration

adjoe
From Spot Ocean to Karpenter: adjoe's zero-downtime migration story. Learn how we switched autoscalers in production, the challenges we faced along the way, and why we built a custom controller to fix broken nodes.

Mastering Kubernetes Cost Optimization for Sustainable Cloud Operations

Randoli
Cloud-native platforms like Kubernetes offer unparalleled flexibility and scalability, but they often come with a hidden price tag. Without intentional cost management, organizations risk overspending due to inefficient resource utilization, over-provisioning, and lack of visibility into cloud...

Business-Driven Monitoring: An SRE’s Secret Weapon

Vettabase
In this talk, I’ll share how focusing on business metrics, not just technical ones, can transform Site Reliability Engineering. By tracking business-centric metrics, we identified issues early and resolved them before they significantly impacted users or revenue. Real-World Cases from Experience...

SRE in Gaming Tech: Handling Millions of Real-Time Requests

Baazi Games
In the gaming industry, every millisecond matters. When managing a real-time, high-traffic platform like PokerBaazi (India's biggest online poker platform), where millions of users are simultaneously playing, the need for Site Reliability Engineering (SRE) becomes critical. Achieving seamless...

16:00

09:30

Coffee break

Main lobby


Observability is not just for Backend!

Elastic
Observability is the ability to measure the current state of a system. Backend engineers are becoming more familiar with the primary signals and technologies, such as OpenTelemetry, that can be used to instrument applications and diagnose issues. Yet, in the frontend world, we're behind the curve....

Evolving Shift Left: Integrating Observability into Modern Software Development

Coralogix
The concept of “Shift Left” has long guided developers to address issues early in the software development lifecycle (SDLC), catching bugs before they reach production. But as modern software ecosystems become more complex—with microservices, serverless architectures, and global...

When Platform Engineers meet SREs: The Birth of Observability-as-a-Service Superpowers

Chronosphere & Mia-Platform
Monitoring the behavior of a system is essential to ensuring its long-term effectiveness. However, managing an end-to-end observability stack can feel like stepping into quicksand: without a clear plan, you risk sinking deeper into system complexities. In this talk, we’ll explore how...

How to tame chaos effectively?

Pegasystems
Imagine a self-healing system that handles surprises, letting you sleep peacefully. If that sounds appealing, chaos engineering could be the answer. Trusted by Netflix, LinkedIn, Google, and Facebook, it's key for business resilience. In this session, we'll explore its history, learn how to apply...

12:00

Lunch & networking

Main lobby


Plan for Unplanned Work: Game Days with Chaos Engineering

PagerDuty
How do you plan for unplanned incidents? You practice with Chaos Engineering. Strong incident response doesn't just happen; you have to build the skills and train your team. Practicing for major incidents gives your team insight into how your applications will behave when something goes wrong as...

AIRE: AI Reliability Engineering. Bringing SRE to AI

GfK - An NIQ Company
AI products are becoming critical for businesses to maintain a competitive edge, yet integrating them into an organization’s ecosystem brings unique challenges. Ensuring the reliability, security, and alignment of AI systems with business goals and ethical standards demands new approaches and...

The Human Side of the Cloud: Why Soft Skills Are the Key to Success

Teesside University London
In a world increasingly defined by complex technology and rapid innovation, it is easy to focus entirely on the technical aspects of success. Yet, the most advanced cloud infrastructure, the most cutting-edge tools, and the most sophisticated algorithms are only as effective as the people behind...

Behaviour-Driven Automation and Commerce as Code

Saleor Commerce
Behaviour-driven automation builds on practices like configuration as code and infrastructure as code, where entire systems—hosts, services, and resources—are defined in code rather than configured manually. This makes deployments consistent, testable, and reproducible. However, traditional code-...

15:00

Scaling Community: The K8SUG Story

K8SUG
What does it really take to build a global cloud-native community from the ground up — with no funding, no big-name backing, and no playbook? In this talk, I’ll share the journey of growing K8SUG (Kubernetes & Cloud Native User Group) from a small local meetup into a global movement, now spanning...

15:30

Speakers

Agnieszka Welian (Pegasystems)
Aivars Kalvans (Ebury)
Alina Astapovich & Markus Makela (Electrolux Group)
Alon Nisser (ZenCity)
Babar Khan Akhunzada & Muhammad Khizer Javed (SecurityWall)
Carly Richmond (Elastic)
Casey Wylie (Defense Unicorns)
Chris Phillips & Simon Kapadia (IBM)
Daniel Afonso (PagerDuty)
Denys Vasyliev (GfK - An NIQ Company)
Diego Nieto (Altinity)
Divine Odazie (EverythingDevOps)
Dmytri Kleiner (Saleor Commerce)
Elad Leev (Dojo)
Eric D. Schabell & Graziano Casto (Chronosphere & Mia-Platform)
Harry Kimpel (New Relic)
Hrittik Roy (Loft Labs)
Igor Baliuk (Avito)
Ilya Andreev (ClickHouse)
James Eastham (Datadog)
Jonathan Perry (Unvariance)
Jorge Lainfiesta (Rootly)
Joshua Fox (DoiT International)
Kristina Kondrashevich & Gang Luo (Electrolux Group)
Laura Thomson & Vladislav Nedosekin (AWS)
Marius Kimmina (adjoe)
Mark Faiers (RiverSafe)
Martin McLarnon (Coralogix)
Mateusz Solnica (SpeakAura)
Max Golionko (VictoriaMetrics)
Michael Cote (VMware / Pivotal)
Miko Pawlikowski (SRE Author)
Mykhaylo Rykmas (Vettabase)
Nick Taylor (Pomerium)
Panagiotis Moustafellos (Elastic)
Pawel Hajdan (Tech Momentum)
Pawel Skrzypek (7bulls.com)
Pedro Ivo Raimundo (Pegasystems)
Piotr Zaniewski (Loft Labs)
Prerit Munjal (KubeCloud)
Rajith Muditha Attapattu (Randoli)
Rob Charlwood (Supercharged)
Rotem Tamir (Atlas by Ariga)
Siddharth Vijay (Baazi Games)
Simon Copsey (The Curious Coffee Club)
Steve Wade (Steven Wade Consulting)
Stuart Rimell (Uptime Labs)
Tomasz Czajka, Ciaran Gaffney & Pascal Schlumpf (Tenable)
Vamsi Anumolu (iCrossing)
Victor Onyenagubom (Teesside University London)
Vladimir Klevko (CAST AI)
Yan Cui (Lumigo)
Yishai Beeri (LinearB)
Yongkang HE (K8SUG)

Venue

Everyman Canary Wharf

Crossrail Place,
Canary Wharf,
E14 5AR, London, UK
Level -2

Tube access
Jubilee, Elizabeth and DLR lines: Canary Wharf station

Sponsors & Partners

Want to become a sponsor? Get in touch!