32 Rue Blanche
75009 Paris, France
In modern distributed systems, the volume and fragmentation of production data can quickly overwhelm human operators. This talk introduces a LangChain agent built to autonomously investigate production issues by orchestrating multiple tools across organizations’ stacks. We'll walk through how to build a modular, composable multi-tool agent, dig deeper into real-world reliability challenges, and share strategies and lessons learned in making these agents production-ready.
Tomaz is the Founding Engineer at Ewake.ai, working actively on advancing Ewake’s mission of building an AI teammate that brings peace of mind to engineering teams by investigating issues reactively and watching production proactively.
Prior to that, Tomaz worked as a Software Engineer for Doctolib, Benie Saúde, and Abbiamo in Brazil.
Observability is the cornerstone of reliable systems. It lets teams identify and resolve issues before they impact a broader group of users. Yet building an ideal observability stack is far from easy. It demands time and effort: instrumenting every app, service, and component that emits telemetry. Many teams default to “store ’em all, just in case”: logs that no one reads, traces that no one queries, metrics that never inform action. The result? Costs escalate, operational clarity fades, and the ROI on observability plateaus or even declines. So shouldn’t we be asking ourselves: are we really investing in observability, or just paying for distributed noise?
The issue isn’t lack of telemetry; it’s unchecked volume without purpose. This talk explores the telemetry pipeline as a strategy to take back control. At the OpenTelemetry Collector level, we can filter, transform, sample, redact sensitive data, and route telemetry with intent. The goal is to extract clear business value from every signal and every dollar spent. By aligning observability with outcomes, we get an adaptive, efficient, and cost-aware setup. Whether you’re just starting out or operating at scale, this talk will show how to turn observability into a strategic asset instead of a liability.
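To make the pipeline idea concrete, here is a minimal OpenTelemetry Collector configuration sketch. The processor names (`filter`, `redaction`, `probabilistic_sampler`) are real Collector components, but the specific rules, patterns, and the backend endpoint are illustrative assumptions, not recommendations from the talk:

```yaml
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  # Drop noisy, low-value logs before they cost anything downstream.
  filter/logs:
    logs:
      exclude:
        match_type: regexp
        bodies: [".*healthcheck.*"]
  # Redact sensitive values before telemetry leaves the cluster.
  redaction:
    allow_all_keys: true
    blocked_values: ["[0-9]{16}"]   # e.g. card-number-like strings
  # Keep a representative fraction of traces instead of all of them.
  probabilistic_sampler:
    sampling_percentage: 10

exporters:
  otlphttp:
    endpoint: https://backend.example.com   # hypothetical backend

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [filter/logs, redaction]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlphttp]
```

Each processor here maps to one of the levers the talk discusses: filtering, redaction, and sampling, all applied with intent at the Collector rather than paid for at the backend.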
Yash is a software engineer and researcher with a deep interest in distributed systems. His focus is on observability and performance, areas where he constantly seeks new insights. As an active advocate of OpenTelemetry, Yash contributes to both the project and the wider community. Outside of tech, he’s an avid explorer, whether in the kitchen experimenting with new recipes or traveling the world to taste diverse cuisines.
OpenTelemetry semantic conventions cover many layers of your stack but fall flat when it comes to business logic. It doesn’t have to be this way! The OpenTelemetry Weaver project gives you the tools to build your own semantic conventions. With auto-generated instrumentation libraries and documentation, developers no longer have to worry about whether an attribute is called customer_id, customerID, accountNumber, or something completely different: it’s all in the schema! Built-in support for policy validation also ensures your alerts never break because a metric has been renamed.
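As a flavor of what such a schema looks like, here is a small sketch of a custom attribute group in the registry YAML format that Weaver consumes. The group and attribute names (`checkout.*`) are invented for illustration; consult the Weaver documentation for the full schema:

```yaml
# Hypothetical business-level semantic conventions, Weaver registry style.
groups:
  - id: registry.checkout
    type: attribute_group
    brief: Attributes describing a checkout transaction.
    attributes:
      - id: checkout.customer.id
        type: string
        stability: stable
        brief: Unique identifier of the customer.
        examples: ["cust-1234"]
      - id: checkout.cart.value
        type: double
        stability: development
        brief: Total cart value in the account currency.
        examples: [99.95]
```

From a registry like this, Weaver can generate documentation and instrumentation helpers, so the attribute name is decided once, in the schema, instead of per codebase.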
Dominik started his journey in technology as an SRE, working on projects ranging from warehouse logistics and photobook designers to analyzing satellite imagery. During this time, he discovered his passion for developer tooling and making sure developers can focus on what they do best - build great software! Now he is working as a Developer Experience Engineer at Grafana Labs, building tools to see clearly in the ever-changing world of software.
AI is making its way into platform engineering—not just as a workload, but as a smart automation layer for how platforms are built, operated, and optimized. Promises of intelligent autoscaling, self-tuning systems, and AI-assisted remediation are everywhere. But how do these claims hold up in real-world Kubernetes environments?
In this deeply technical session, the latest generation of AI-powered features and patterns will be benchmarked and stress-tested in the context of platform operations. From scaling decisions to observability-driven automation and adaptive infrastructure behavior, the focus will be on how these systems perform under load, how they handle edge cases, and what the operational overhead truly looks like.
Attendees will walk away with a clear-eyed view of the strengths and limitations of AI-driven platforms, grounded in data—not just demos.
Because if your infrastructure says “I’ll be back,” it’s worth knowing what it’s planning.
Annie Talvasto is an award-winning international technology speaker and leader. She has spoken at more than 100 tech events worldwide, including KubeCon + CloudNativeCon and Microsoft Build & Ignite, and has been recognized as a CNCF Ambassador and an Azure & AI Platform MVP. She has co-organized the Kubernetes & CNCF Finland meetup since 2017. She has also served as a track chair for KubeCon + CloudNativeCon and Program Chair for the Secure AI Summit (powered by Cloud Native), and has hosted Cloud Native Live, a weekly livestream by CNCF, since 2021.
How many times have you been woken up in the middle of the night, either to spend more time than you'd like figuring out what exactly broke, or to bash your keyboard in frustration once you realize it was actually a false positive? What if there was a better way? AI is everywhere nowadays, so what if we used it to solve these problems? Let's talk about AIOps and, through real-world case studies and industry research, see why it enables companies to reduce their MTTR by 62%, cut their alert noise by 91%, and predict 87% of potential service degradations before they impact customers. We will take a journey through the evolution of AIOps: from early AI analytics, to the rise of GenAI, to the latest promised savior: agents. But how do agents help us when incidents come knocking? Will they speed us up? Are they even able to fix incidents without waking us up? And even if agents are here to change everything, where are the gaps? Can everything be "agentified"?
Daniel Afonso is a Senior Developer Advocate at PagerDuty, SolidJS DX team member, Instructor at Egghead.io, and Author of State Management with React Query. Daniel has a full-stack background, having worked with different languages and frameworks on various projects from IoT to Fraud Detection. He is passionate about learning and teaching and has spoken at multiple conferences around the world about topics he loves. In his free time, when he's not learning new technologies or writing about them, he's probably reading comics or watching superhero movies and shows.
SREs are on the frontlines of uptime, performance, cost efficiency, and incident response. Yet policies for security and compliance often live in stale docs, enforced inconsistently, if at all, until something breaks or an audit lands. Policy as Code (PaC) replaces that mess with real-time automation right where the work happens.
We will discuss how PaC bridges the gap between security and operations by making policies transparent, codified, versioned, and customizable so you can enforce external standards (CIS, SOC 2, HIPAA) and your own internal rules (like requiring HA in prod but not in dev). You'll see how to apply policy intent consistently from the pipeline directly to runtime, giving teams proactive control, not reactive fire drills or unnecessary "vibe killers".
We'll walk through a full-lifecycle live demo showing how PaC can:
- Prevent misconfigurations with real-time checks
- Enforce policies consistently across repos, infra, and runtime
- Customize and codify controls that reflect unique needs
- Control cloud costs through autoscaling, right-sizing, and cleanup
- Unify security, dev, and ops with shared policies
What will we accomplish? Well, no more stale docs or arguments around policies, faster deploys, tighter cost controls, and safer infrastructure without slowing anyone down.
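The talk is tool-agnostic, but to give policy as code a concrete shape, here is a tiny OPA-style Rego sketch of the "HA in prod but not in dev" rule mentioned above. The input document shape (`env`, `replicas`) is a hypothetical deployment descriptor, not any particular product's schema:

```rego
package deployment.policy

# Require high availability in production, but not in dev.
# `input.env` and `input.replicas` are assumed fields of a
# hypothetical deployment document fed to the policy engine.
deny[msg] {
    input.env == "prod"
    input.replicas < 2
    msg := sprintf("prod deployments need >= 2 replicas, got %d", [input.replicas])
}
```

Because the rule is code, it can be versioned, reviewed in a pull request, and evaluated identically in CI and at runtime, which is exactly the consistency the session argues for.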
Alayshia Knighten is a seasoned engineering leader and customer success strategist with a strong background in DevOps, cloud architecture, and technical enablement. Currently serving as the Founding Principal Customer Success Engineer at Mondoo, she brings over a decade of experience helping organizations build and scale secure, efficient, and reliable infrastructure.
Previously, Alayshia held key roles at Pulumi as a Senior Customer Architect and at Honeycomb.io, where she led implementation engineering and partnerships architecture. Her expertise spans across engineering consulting at Chef Software and hands-on DevOps engineering at Verisign. Passionate about empowering teams and driving technical excellence, Alayshia is known for bridging the gap between engineering and customer success.
Traditional syslog systems have long been opaque — exporting minimal, fixed-format metrics that rarely reflect what users actually care about. AxoSyslog, a high-performance fork of syslog-ng, has taken a different path: not only adopting native Prometheus metrics, but also enabling metric emission directly from the user’s log processing logic.
In this talk, I’ll share how we transitioned from CSV-style global and per-driver metrics to full Prometheus integration. But more importantly, I’ll explore a less common mindset: treating metrics not as static artifacts of a system, but as programmable, user-defined views of what matters. Users can emit fine-grained, label-rich metrics from log routing logic itself — for example, tracking per-tenant message volume, labeling metrics with custom log- or environment-related info, or observing how often certain fields are missing.
We’ll walk through:
* What syslog metrics looked like historically (and why they fell short)
* Our journey to integrating Prometheus natively
* How update_metric() works and why it's powerful
* Real-world use cases where dynamic metrics made debugging and policy enforcement dramatically easier
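As a rough illustration of the idea, an AxoSyslog-style configuration sketch that emits a per-tenant counter straight from routing logic might look like the following. This is illustrative pseudo-configuration: the metric name, label, and exact update_metric() signature should be checked against the AxoSyslog documentation:

```
log {
    source(s_network);
    filterx {
        # Count messages per tenant directly in the processing path.
        # Metric name and label key are invented for this sketch.
        update_metric("tenant_messages_total",
                      labels={"tenant": $tenant_id});
    };
    destination(d_store);
};
```

The point is less the syntax than the shift it represents: the metric is defined by the user, inside the pipeline, rather than hard-coded by the syslog daemon.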
If you want to make observability accessible in deeply traditional parts of the stack — or want to let users code their own metrics — this talk is for you.
Founding Engineer at Axoflow, leading the dataplane team, specializing in scalable log ingestion and processing pipelines for enterprise and cloud-native environments.
Longtime syslog-ng and AxoSyslog developer with a strong passion for open source and community-driven innovation. Focused on building reliable, high-performance log management and observability systems that address real-world operational challenges.
Session Overview
Kubernetes offers many ways to share GPUs, but a single, cluster-wide scheduler often forces trade-offs between utilization, stability, and team autonomy. This talk shows how vCluster makes the NVIDIA Kubernetes AI Scheduler (KAI) run as an opt-in service for each tenant—so platform teams can raise GPU density while keeping operations predictable.
What We’ll Cover
- Problem statement – why mixed workloads leave GPUs under-used and complicate on-call
- vCluster fundamentals – lightweight control planes that isolate scheduling logic, not hardware
- KAI at a glance – fractional GPU allocation, gang queues, topology awareness
- Live demonstration – two vClusters on one host
Key Takeaways
- A reproducible pattern for running different schedulers side-by-side
- Practical steps to increase GPU utilisation without adding more clusters
- An isolation model that lets teams experiment safely
Why It Matters
As GPU demand grows, platform engineers must balance cost efficiency with reliability. Combining vCluster and KAI delivers both—turning idle accelerators into productive capacity while preserving operational control.
An active contributor to OpenSource projects on GitHub, blogger and content creator, focusing on practical, scalable solutions in cloud-native environments. DevOps and Platform Engineering practitioner and advocate. Visit: cloudrumble.net
Today’s observability stacks are rich in telemetry but poor in semantic alignment. This talk introduces the Intent Graph—a new visualization paradigm that traces the propagation of design decisions across system layers, from infrastructure to application logic to business outcomes. The Intent Graph makes dependencies between architectural choices, test coverage gaps, and runtime anomalies explicit and navigable. We walk through real-world scenarios where misaligned design intent caused failures—even when metrics looked green. Using techniques from MIT's STAMP/STPA, causal inference, and Generative AI, we show how to move from metric dashboards to intent-aware observability fabric. This talk will resonate with those seeking to build truly self-explaining systems.
It maps how design decisions, quality trade-offs, and business goals flow across the system stack, from infrastructure to app to outcomes. When this flow is broken, the system may still "work"—but it works wrong.
What you'll learn:
- How to capture intent traces at design time
- How to correlate runtime telemetry to intent deviation
- How to prevent false positives/negatives in alerting through purpose-aware thresholds
- How to respond to incidents based on drift from intent, not just error rates
This is observability with semantics. And it is what SRE needs next.
Short description: Go beyond dashboards—this talk introduces Intent Graphs to trace how design decisions shape runtime behavior and business outcomes.
Mahesh Venkataraman leads innovation in applying artificial intelligence, data mining, and machine learning to software engineering. He has led successful implementations of natural-language-processing-driven test automation, usage and failure modeling using log analytics, empirical analysis of technical debt, and the application of knowledge graphs to discover patterns and relationships for optimizing test suites and improving decision making in system integration projects. His passion is bridging the gap between theory and practice, and between academia and industry, and bringing creative thinking to software. He is a regular keynote speaker at many conferences. He is currently working on addressing uncertainty in fault prognosis and diagnosis.
Koushik Vijayaraghavan is a Senior Managing Director at Accenture, where he has spent over 20 years driving product innovation, engineering, and digital transformation for global clients. He began his career at Cognizant and has completed Harvard Business School’s Disruptive Strategy program, strengthening his expertise in guiding organizations through change.
In our presentation, we'll explore a cutting-edge architectural solution for real-time SMS and email notifications, particularly geared towards responding to earthquake events. The system is designed for rapid data transmission, listening for event changes every second, making it ideal for real-time critical-alert scenarios. Central to our discussion will be the integration of Lambda functions and Confluent Kafka, coupled with advanced multithreading techniques and DynamoDB lock strategies. A focal point will be the challenges, and our solutions, involved in integrating Confluent Kafka with Lambda functions so that both producers and consumers run serverlessly; this is key to distributing notifications quickly and in parallel. We will also delve into an automated scaling mechanism, which is vital for optimising the performance of the serverless notification ecosystem. Our aim is to show how these technologies can be combined into a robust and efficient system capable of delivering critical real-time alerts for events like earthquakes, ultimately playing a crucial role in saving human lives.
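The DynamoDB lock strategy mentioned above boils down to a conditional write: a consumer acquires the lock only if no unexpired holder exists. Here is a minimal Python sketch of that pattern against an in-memory stand-in for the table; with boto3, the same idea would use put_item with a ConditionExpression, and the lock/owner names below are invented for illustration:

```python
import time


class LockTable:
    """In-memory stand-in for a DynamoDB table keyed by lock name."""

    def __init__(self):
        self.items = {}

    def conditional_put(self, key, item, now):
        # Mirrors a DynamoDB ConditionExpression along the lines of:
        #   attribute_not_exists(lock_id) OR expires_at < :now
        existing = self.items.get(key)
        if existing is None or existing["expires_at"] < now:
            self.items[key] = item
            return True
        return False  # ConditionalCheckFailedException in real DynamoDB


def acquire_lock(table, lock_id, owner, ttl_seconds, now=None):
    """Try to take the lock; returns True when this owner now holds it."""
    now = time.time() if now is None else now
    item = {"owner": owner, "expires_at": now + ttl_seconds}
    return table.conditional_put(lock_id, item, now)


table = LockTable()
assert acquire_lock(table, "earthquake-batch", "consumer-a", 30, now=100)
# A second consumer is rejected while the lock is live...
assert not acquire_lock(table, "earthquake-batch", "consumer-b", 30, now=110)
# ...but can take over once the TTL has expired.
assert acquire_lock(table, "earthquake-batch", "consumer-b", 30, now=200)
```

The TTL makes the lock self-healing: if a Lambda invocation dies mid-work, another consumer can claim the item once the lease expires, without any coordinator process.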
Vlad Onetiu, a DevSecOps and Software Automation Engineer from Cluj-Napoca, Romania, is renowned for his expertise in cloud technology, cybersecurity, and software automation. Since embarking on his career in 2018, he has been instrumental in conducting security research for Romania's major banks, significantly bolstering their cybersecurity measures. Vlad has also contributed to the field through his research papers on malware and phishing, shedding light on these critical cyber threats. His proficiency in employing cloud-based solutions for system automation, combined with his skillful handling of CI/CD processes and cloud architecture, reflects his commitment to fostering secure and resilient digital environments. Known for his passion for technology and relentless innovation, Vlad stands out as a leading figure in cybersecurity, continuously exploring and implementing cutting-edge strategies to address the challenges of evolving cyber threats.
This talk presents the real-world story behind countX, a B2B fintech company that grew from first commit to successful private equity exit in under four years, without VC funding and with a lean, empowered team. From day one, we built on a fully serverless AWS-native architecture: Lambda, SNS/SQS, API Gateway, CloudFront, Cognito, and CDK. On top of that, we implemented a Continuous Deployment pipeline using GitHub, CodeBuild, and CodePipeline, enabling us to deploy to production dozens of times per week with high confidence and zero manual gates.
But the talk isn’t just about architecture or tooling. What made this setup truly powerful was how we paired CD with Continuous Discovery - customer interviews, fake doors, lightweight A/B tests, and KPIs tied to actual product outcomes. This combination created a feedback-driven loop that allowed us to ship fast, iterate with purpose, and align engineering with business goals.
The main takeaway: how a pure DevOps practice like Continuous Deployment, when paired with the right product mindset, can significantly increase not just delivery velocity, but team performance, product-market fit, and ultimately revenue and ROI. This is not a theoretical or aspirational talk, it’s a practical case study showing how modern SRE practices can become strategic drivers of business success.
I’m a Ukrainian entrepreneur, software engineer by background, with over 20 years of experience. I started my career in Kyiv, spent several years working in Moscow, and have spent the last decade in Berlin. Most recently, I was the co-founder and CTO of countX, a B2B fintech company that went from first commit to a successful exit in under four years. I’m also pursuing an Executive MBA at London Business School, where I’m deepening my focus on fintech, financial systems, and venture strategy.
Machine Learning (ML) solutions often start on a simple platform like a virtual machine, which is great for initial research. However, as the system scales and enters production, automation becomes crucial. Cloud suites such as Google Vertex AI, Azure Machine Learning, and AWS Sagemaker, can streamline this process.
For example, model training is more efficient with a managed service that automatically scales compute resources to your training needs, eliminating the cost of idle resources, as happens when you train on a Jupyter notebook or a VM alone.
We’ll cover all parts of the ML process, including development, hypertuning, deployment for inference, experiments, model management, monitoring, performance, and operating the entire pipeline.
Joshua Fox has 20 years experience as a software architect in software product companies, and now advises tech companies on their gnarliest cloud challenges as a senior cloud architect at DoiT International. See more at joshuafox.com/publications
Setting up continuous integration is now a common practice in the industry. However, there are still only a few effective solutions for doing so across hundreds of repositories encompassing thousands of projects. How do we manage dependencies between projects? How do we assess the quality of each one? How do we automate the validation of a project's clients even before merging a pull request? In this session, we’ll quickly revisit the fundamentals of the problem. Then, we’ll share Criteo's own journey on this topic, along with the pros and cons of the different approaches we explored.
Emmanuel Guérin is a Staff Site Reliability Engineer at Criteo. Over the past 25 years, he has been a strong advocate for automation, working with a number of small French startups. Frustrated by the lack of progress as an individual contributor, he continued to champion better practices wherever he could. At Criteo, he has helped scale the primary build system used by the R&D organization. He now focuses on the company’s main scheduler for data jobs.
This session explores how Happening completely revamped their edge Kubernetes infrastructure by implementing EKS Hybrid to centrally manage all their on-premise clusters across different markets. Faced with regulatory requirements to store data locally at the edge while maintaining operational efficiency, we designed a sophisticated hybrid architecture with a centralized AWS control plane managing edge data planes. We'll dive deep into our technical implementation, including our mixed-mode CNI setup with VPC-CNI and Cilium, multi-pool IPAM to handle cross-cloud networking, and how we leverage Kyverno policies to ensure workload placement across markets. Learn how we established seamless connectivity between AWS and on-premise environments through Wireguard VPN tunnels, coupled with BGP routing policies to efficiently route edge workloads. By centralizing management of geographically distributed Kubernetes clusters, we've reduced our on-premise management burden by 40%, delivering substantial cost savings and dramatically decreasing the team's operational toil. Cluster upgrades, which were always tedious, are now done in just a few hours across all environments simultaneously, significantly improving our maintenance windows and capacity management.
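To illustrate the workload-placement piece, here is a sketch of the kind of Kyverno policy involved. The structure follows Kyverno's ClusterPolicy API, but the market label keys and values are invented for this example, not Happening's actual configuration:

```yaml
# Illustrative Kyverno policy: keep market-scoped workloads on that
# market's edge nodes. Label keys and values are hypothetical.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: pin-market-workloads
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-market-node-selector
      match:
        any:
          - resources:
              kinds: [Deployment]
              namespaceSelector:
                matchLabels:
                  market: de
      validate:
        message: "Workloads in market 'de' must run on de edge nodes."
        pattern:
          spec:
            template:
              spec:
                nodeSelector:
                  edge.example.com/market: de
```

A policy like this turns a regulatory requirement ("this market's data stays on this market's nodes") into an admission-time guarantee rather than a convention someone has to remember.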
Laurent Godet is a seasoned Site Reliability Engineer with nearly a decade of experience building and scaling cloud-native infrastructure for high-growth companies. Currently at Happening, he focuses on reliability, automation, and scalable systems design.
Previously, Laurent led SRE initiatives at LiveRamp, where he managed the company’s largest Kubernetes cluster on GKE, introduced GitOps best practices, and built the award-winning Tenant Factory service that reduced onboarding time from a full day to just one hour.
His career spans pivotal DevOps and SRE roles at Earnd (Greensill), iov42, and Babylon Health, where he drove cloud migrations, built high-availability platforms, and implemented modern observability and CI/CD pipelines. Laurent’s expertise lies in Kubernetes, AWS, serverless architectures, and data-intensive systems, always with a focus on reliability, scalability, and developer productivity.
This talk introduces **telemetry as code**: bringing the same declarative principles that transformed infrastructure to your observability stack.
Using OpenTelemetry Collector Custom Resources and the Telemetry Controller, we'll demonstrate how to eliminate configuration drift, enable true multi-tenancy, and make observability as reliable and repeatable as your deployments.
What You'll Learn
Transform Your Telemetry Pipeline
- Replace brittle YAML with declarative Kubernetes CRDs that abstract complexity while maintaining flexibility
- Build tenant-aware routing that scales from single teams to enterprise-wide deployments
Master Production Patterns
- Design secure, multi-tenant Prometheus integration using Remote Write protocols
- Leverage automated configuration validation and testing strategies that catch issues before production
- Navigate the hidden complexities of cross-namespace telemetry routing and security
Avoid Costly Mistakes
- Learn battle-tested approaches for managing collector configurations across diverse environments
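To give a sense of the declarative model, here is a sketch of two of the custom resources involved. The API group follows the Telemetry Controller project, but field names vary between versions, so treat this as a shape rather than a reference; the tenant and output names are invented:

```yaml
# Illustrative sketch: a tenant scoped to its own namespaces...
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Tenant
metadata:
  name: team-a
spec:
  logSourceNamespaceSelectors:
    - matchLabels:
        tenant: team-a
---
# ...and a subscription routing that tenant's telemetry to an output.
apiVersion: telemetry.kube-logging.dev/v1alpha1
kind: Subscription
metadata:
  name: team-a-logs
  namespace: team-a
spec:
  condition: "true"
  outputs:
    - name: team-a-otlp
      namespace: team-a
```

Because routing lives in version-controlled CRDs instead of hand-edited collector YAML, tenants can be added, reviewed, and rolled back like any other Kubernetes deployment.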
Passionate Software Engineer who loves building reliable systems and actively engages with the cloud-native community to advance Kubernetes security and observability. Maintainer of multiple CNCF sandbox projects, including Bank-Vaults, a project dedicated to simplifying the complex world of secret management, and the Logging operator, which solves logging-related problems in Kubernetes environments.
Currently, working on the Logging operator and the new Telemetry controller at Axoflow.
Coming soon...
Matthieu Blumberg is Senior Vice President of Engineering at Criteo, where he leads Infrastructure, Security, and Internal IT initiatives to drive business transformation and empower teams with a world-class digital workplace. With over 14 years at Criteo and a strong background in engineering leadership and cybersecurity, Matthieu plays a key role in scaling technology platforms and ensuring the integrity and efficiency of global operations.
Production today is messy. There’s noise, complexity, and a constant stream of change. And while we’ve come a long way with observability, it still leans heavily on human foresight. Logs, metrics, alerts, they’re all things we had to think of ahead of time. But when we don’t? That’s where blind spots are born.
Ambient agents try to shift that model. These are always-on, proactive teammates who don’t wait for a prompt. They listen to everything happening in production. They surface things we’d likely miss.
In this talk, we’ll dive into what it takes to bring an ambient agent into your stack, how it listens, learns, and acts, and why this might just be the layer of intelligence your system’s been missing.
Pooné Mokari is the CEO and co-founder of Ewake.ai, an AI Reliability Teammate on a mission to bring real peace of mind to engineering teams. Drawing on her experience as an SRE at Criteo, she founded Ewake to offer engineers their dream teammate, which investigates issues reactively and watches production proactively. Throughout her career, she was active as a speaker in different tech conferences, such as Devoxx Belgium and Devoxx France. She’s also been engaged in mentoring women in tech.