SREday

Schedule

Day 1

09:30

Daniele Tonella

Keynote - TBD

ING

Coming soon...

10:00

Peter Marshall

Keynote: What Observability Can Learn From BI: Decoupling for Speed, Scale, and Flexibility

Imply

Today’s observability platforms are often vertically integrated—binding data storage, query, and visualization layers into a single stack. This tight coupling drives up costs, makes integrations painful, and slows teams down. But it doesn’t have to be this way. In this talk, we’ll explore how SRE...

10:30

Tasmia Niazi & Roheel I

Panel Discussion - SRE practices in Banking

Lloyds Banking Group

11:00

Coffee break

Main lobby

11:30

Alayshia Knighten

Autonomy with Assurance for Reliability

SREs are on the frontlines of uptime, performance, cost efficiency, and incident response. Yet policies for security and compliance often live in stale docs, enforced inconsistently, if at all, until something breaks or an audit comes around. Policy as Code (PaC) replaces that mess with...

12:00

Diana Todea

Cutting Through Metrics Cardinality Noise with VictoriaMetrics

VictoriaMetrics

In high-scale environments, metrics cardinality isn’t just a resource concern, it’s an architectural challenge. Left unchecked, it can impact performance, query latency, and even system stability. This talk takes a deep technical dive into how VictoriaMetrics enables advanced observability...

12:30

Alessandro Vozza

The USB-c for your Copilot: Securing MCP Servers with API Management

Microsoft

The Model Context Protocol (MCP) has seen a meteoric rise to become the de-facto connecting mycelium for GenAI models and applications. In this session, we will explore the critical role of API management in securing MCP servers. Learn how to leverage API management as the "USB-c" connector for...

13:00

Michael Cote

Platform Engineering and AI - Two Buzzwords Finally Meet!

Tanzu

Two Buzzwords Finally Meet! How should we manage AI in large organizations? What are the "services" developers need to add AI to enterprise apps, and what role do platform engineers take? Where do data scientists fit in? How do MCP, A2A, and whatever the latest AI API is fit in? There are so...

13:30

Lunch & networking

Main lobby

14:30

Kevin van der Vlist

From Incident Response to Preventive Mitigation: Leveraging CodeQL and LLMs at Scale

ING

After recovering from incidents, we must ensure that effective mitigation measures are in place to prevent similar issues in the future. However, clearly expressing the characteristics of past problems and identifying similar occurrences across large codebases is a significant challenge,...

15:00

Yishai Beeri

Your AI Code Reviews Are Missing the Point (And How to Fix It)

LinearB

Most AI code review implementations focus on the wrong metrics—counting comments generated or code accepted rather than measuring developer velocity and code quality improvements. The real value lies in intelligent context integration and organizational learning at scale. This talk examines...

15:30

Marius Kimmina

From Spot Ocean to Karpenter - One Year Later

adjoe

Over one year of operating Karpenter in production across multiple Kubernetes clusters has taught us at adjoe many valuable lessons that we want to share in this talk. We will talk about how we handled the migration process, why we built a custom controller to handle broken nodes, as well as how...

16:00

Leo Visser

Secure your cloud automation

OGD ict-diensten

So you decided to automate tasks in your Azure cloud platform? Are you sure this automation is secured properly and not creating additional attack surface for malicious actors? In this talk I will cover different ways of automating your cloud platform and the risks involved with these techniques. I...

16:30

Networking & sponsor crawl

Main lobby

17:00

Yan Cui

Patterns and Practices for Building Resilient AWS Serverless Applications

Lumigo

Lambda provides multi-AZ support out of the box, but even then, things can still go wrong in production. Region-wide outages and performance degradations can render your applications non-responsive. And what if you're dealing with downstream systems that aren't as scalable as your system and...

17:30

Wrap up

Scan each other's QR codes & head to a nearby pub!

11:00

Coffee break

Main lobby

11:30

Marcel Koert

It’s Not the Tools — It’s Us: How Human Biases Undermine Reliability

MeloMar IT

In the world of DevOps and SRE, we pride ourselves on automation, observability, and engineering excellence. But even the most sophisticated infrastructure can be derailed by something far more human: our own brains. This talk explores the invisible enemies of reliability — cognitive biases that...

12:00

Joris Bonnefoy

From experimentation to continuous verification: how to benefit from the entire spectrum of Chaos Engineering

Datadog

Chaos Engineering is often misunderstood as simply “breaking things on purpose.” This talk challenges that perception and repositions Chaos Engineering as a critical pillar of reliability and resilience engineering. Rather than focusing on failure injection alone, we explore how to leverage...

12:30

Maxim Schepelin

How to set SLOs, drive improvements, and make friends with business stakeholders

Booking.com

We, tech people, have internalized the concept of reliability so deeply that we don't need an explanation for why it's bad to have services failing in production. It doesn't matter what your software is doing — whether it controls train schedules, allows people to make money transfers, or serves...

13:00

Daniel Afonso

Full Service Ownership & The Lifecycle of a Service

PagerDuty

Services are the backbone of our systems. They are the pieces that make up our businesses—whether they are literal microservices or functional components of a traditional application, we can’t do the computer thing without services. When it comes to a service in your company or organization,...

13:30

Lunch & networking

Main lobby

14:30

Lukasz Groszkowski

SRE at Scale: Keeping e‑Commerce Alive Across 20+ Countries

Inter Cars

A deep dive into how we ensure reliability and availability in a complex distributed environment, with lessons learned from real-world incidents and rollouts.

15:00

Renato Losio

Nobody Ever Got Fired for Implementing Multi-AZ

Funambol

Using multiple Availability Zones (AZs) is often seen as essential for building resilient and highly available cloud systems. This is true, until it is not. While Multi-AZ is a proven architectural choice, there are important drawbacks to consider and common assumptions that don’t always hold up....

15:30

Praveen Kottarathil

Coming soon...

ING

16:00

Mert Polat & Batuhan Apaydin

The Invisible Layer: Securing Kubernetes with eBPF, Cilium, and Tetragon

Sufle

Kubernetes is dynamic—and so are its threats. Traditional security tools struggle to keep up. In this session, we’ll explore how eBPF powers next-gen observability and security by running directly in the Linux kernel. Using Cilium for networking and Tetragon for runtime security, we’ll trace...

Site Reliability, DevOps and Cloud

November 7, 2025 ING Cedar, Amsterdam, Netherlands

Speakers

Alayshia Knighten

Alessandro Vozza

Daniel Afonso

Daniele Tonella

Diana Todea

Joris Bonnefoy

Kevin van der Vlist

Leo Visser

Lukasz Groszkowski

Marcel Koert

Marius Kimmina

Maxim Schepelin

Mert Polat & Batuhan Apaydin

Michael Cote

Peter Marshall

Praveen Kottarathil

Renato Losio

Tasmia Niazi & Roheel I

Yan Cui

Yishai Beeri

Venue

ING Cedar - Hosting Sponsor

Sponsors & Partners