SREDAY

Site Reliability, DevOps and Cloud

November 19-20, 2025
Criteo, 32 Rue Blanche, 75009 Paris, France

2 Days · 20+ Speakers · 2 Tracks · 90 Attendees

Schedule

Day 1

Keynote: The Infrastructure Renaissance: Security, Sustainability, and Scale

Criteo
As infrastructure evolves from cost center to innovation engine, the stakes have never been higher. This keynote will unpack the forces reshaping the infrastructure landscape—from the rise of sovereign clouds and liquid cooling to the imperative of secure-by-default systems. Drawing on lessons...

10:00

Keynote: Listen to Production the Way It Deserves

Ewake.ai
Production today is messy. There’s noise, complexity, and a constant stream of change. And while we’ve come a long way with observability, it still leans heavily on human foresight. Logs, metrics, alerts, they’re all things we had to think of ahead of time. But when we don’t? That’s where blind...

10:30

Coffee break

Main lobby


Autonomy with Assurance for Reliability

Mondoo
SREs are on the front lines of uptime, performance, cost efficiency, and incident response. As a result, policies for security and compliance often live in stale docs and are enforced inconsistently, if at all, until something breaks or someone faces an audit. Policy as Code (PaC) replaces that mess with...

I’ll Be Backoff: Benchmarking AI-Powered Platform Engineering Tools

Waovo
AI is making its way into platform engineering—not just as a workload, but as a smart automation layer for how platforms are built, operated, and optimized. Promises of intelligent autoscaling, self-tuning systems, and AI-assisted remediation are everywhere. But how do these claims hold up in...

Crash-Proofing Your OpenTelemetry Collector

Datadog & OllyGarden
When planning observability for a distributed system, it's common to avoid having each microservice send telemetry data directly to the backend. Instead, a Collector is typically deployed per host or node to receive, process, and forward telemetry data. This approach improves bandwidth usage...

12:30

Google Cloud SRE - AI in incident prevention, automation, and anomaly detection. Migrations to Cloud (GCP, Dynatrace, ServiceNow, Monaco YAML, etc.)

Lloyds Banking Group
Resilience, trust, and scale matter as systems move to the cloud and AI reshapes how we build and operate software. SRE teams use AI to stop incidents, and Google Cloud blends automation, observability, and security to build systems. AI/ML anomaly detection helps prevent outages and reduce MTTR, and SLOs help reduce incidents.

13:00

Lunch & networking

Main lobby


The Intent Graph: Visualizing Cross-Layer Impact in Observability

Accenture
Today’s observability stacks are rich in telemetry but poor in semantic alignment. This talk introduces the Intent Graph—a new visualization paradigm that traces the propagation of design decisions across system layers, from infrastructure to application logic to business outcomes. The Intent...

Instant KAI Sandboxes with vCluster: Multi-Tenant, Multi-Scheduler GPU Sharing

vCluster
Kubernetes offers many ways to share GPUs, but a single, cluster-wide scheduler often forces trade-offs between utilization, stability, and team autonomy. This talk shows how vCluster makes the NVIDIA Kubernetes AI Scheduler (KAI) run as an opt-in service for each tenant, so...

Orchestrating the Edge: A Hybrid Kubernetes Journey

Happening
This session explores how Happening completely revamped their edge Kubernetes infrastructure by implementing EKS Hybrid to centrally manage all their on-premise clusters across different markets. Faced with regulatory requirements to store data locally at the edge while maintaining operational...

15:30

Counting What You Care About in Your Security Data Pipeline

Axoflow
Traditional syslog systems have long been opaque — exporting minimal, fixed-format metrics that rarely reflect what users actually care about. AxoSyslog, a high-performance fork of syslog-ng, has taken a different path: not only adopting native Prometheus metrics, but also enabling metric...

16:30

Wrap up

Scan each other's QR codes & head to a nearby pub!


Day 2

09:00

Coffee break

Main lobby


Dependencies galore: Behind the scenes of large-scale multi-repository CI

Criteo
Setting up continuous integration is now a common practice in the industry. However, there are still only a few effective solutions for doing so across hundreds of repositories encompassing thousands of projects. How do we manage dependencies between projects? How do we assess the quality of each...

Production-Ready LangChain Agents - Multi-Tool Architectures for SRE Investigations

Ewake.ai
In modern distributed systems, the volume and fragmentation of production data can easily and frequently overwhelm human operators. This talk introduces a LangChain agent built to autonomously investigate production issues by orchestrating multiple tools across organizations’ stacks. We'll walk...

10:30

Telemetry as Code: Declarative Observability with OpenTelemetry

Axoflow
This talk introduces telemetry as code: bringing the same declarative principles that transformed infrastructure to your observability stack. Using OpenTelemetry Collector Custom Resources and the Telemetry Controller, we'll demonstrate how to eliminate configuration drift, enable true...

Schema driven Observability with OpenTelemetry Weaver

Grafana Labs
OpenTelemetry semantic conventions cover many layers of your stack but fall flat when it comes to business logic. But this doesn’t have to be the case! The OpenTelemetry Weaver project gives you the tools to build your own semantic conventions. With auto-generated instrumentation libraries and...

11:30

Lunch & networking

Main lobby


12:30

Leveraging the edge for observability

Varnish Software
Most organizations adopt CDNs or HTTP caching proxies to boost the performance and scalability of their web platforms. But there’s another powerful advantage that often goes underused: centralized observability. This presentation will demonstrate how to go beyond the traditional performance...

13:00

Optimizing Kubernetes with Container Live Migration

CAST AI
Containers are immutable by design, making Kubernetes the standard execution platform for stateless workloads. We do see more and more stateful applications designed for Kubernetes, like databases and jobs. But it still comes with some challenges. What about a long-running process that you want to...

The Age of AIOps: The State of AI in Incident Response

PagerDuty
How many times were you woken up during the night to either spend more time than you would like trying to figure out what exactly broke, or just bash your keyboard in frustration once you figure out it was actually a false positive? What if there was a better way? I mean, AI is everywhere...

14:00

14:30

Real-time earthquake alert system: Leveraging Serverless architecture with Confluent Kafka

DataIceberg
In our upcoming presentation, we'll explore a cutting-edge architectural solution for real-time SMS and email notifications, particularly geared towards responding to earthquake events. This system is designed to handle rapid data transmission, listening for event changes every second, making it...

From Git Push to Exit: How Continuous Deployment Converted into Financial Success

countX GmbH
This talk presents the real-world story behind countX, a B2B fintech company that grew from first commit to successful private equity exit in under four years, without VC funding and with a lean, empowered team. From day one, we built on a fully serverless AWS-native architecture: Lambda,...

15:30

Growing Machine Learning to production: Cloud MLOps for speed and efficiency

DoiT International
Machine Learning (ML) solutions often start on a simple platform like a virtual machine, which is great for initial research. However, as the system scales and enters production, automation becomes crucial. Cloud suites such as Google Vertex AI, Azure Machine Learning, and AWS SageMaker can...

16:00

Drowning in Observability Costs? Build a Cost-Aware Telemetry Pipeline to Keep You Afloat ft. OpenTelemetry

Independent
Observability is the cornerstone of reliable systems. It lets teams identify and resolve issues before they impact a broader group of users. Yet building an ideal observability stack is far from easy. It demands time and effort, instrumenting every app, service, and component that emits...

16:30

Wrap up

Scan each other's QR codes & head to a nearby pub!


Speakers

Alayshia Knighten, Mondoo
Annie Talvasto, Waovo
Attila Szakacs, Axoflow
Bence Csati, Axoflow
Connal Murphy, Criteo
Daniel Afonso, PagerDuty
Dima Malyshenko, countX GmbH
Dominik Süß, Grafana Labs
Emmanuel Guérin, Criteo
Jerome Baude, CAST AI
Joshua Fox, DoiT International
Juliano Costa & Yuri Oliveira Sa, Datadog & OllyGarden
Laurent Godet, Happening
Mahesh Venkataraman & Koushik Vijayaraghavan, Accenture
Matthieu Blumberg, Criteo
Piotr Zaniewski, vCluster
Poone Mokari, Ewake.ai
Tasmia Niazi, Lloyds Banking Group
Thijs Feryn, Varnish Software
Tomaz Medrado, Ewake.ai
Vlad Onetiu, DataIceberg
Yash Verma, Independent

Venue

Criteo

32 Rue Blanche
75009 Paris, France

Sponsors & Partners

Want to become a sponsor? Get in touch!