SREDAY

Site Reliability, DevOps and Cloud

April 11, 2025 San Francisco, CA, USA

1 Day
16+ Speakers
2 Tracks
100 Attendees

Schedule

Day 1

09:00

Coffee break

Main lobby


From Growing Pains to Enterprise Scale: How Harness Transformed Its Infrastructure

Harness
Scaling infrastructure is never just about adding more machines—it’s about evolving architecture, managing complexity, and maintaining reliability while growing rapidly. In this session, we’ll take you through the journey of how Harness scaled its infrastructure to support growing customer...

AI Teammates for SREs: How will they impact SRE Teams?

Neubird AI
Explore how AI teammates are revolutionizing SRE work by handling routine investigations, providing context-aware analysis, and enabling teams to focus on engineering. Learn about real implementation challenges and how to prepare your team for this transformation.

A proactive approach to resilience in modern applications

AWS
Discover how adopting a proactive resilience strategy can help modern applications withstand failures and maintain performance. This talk explores key techniques for identifying vulnerabilities, implementing fault tolerance, and ensuring continuous availability in dynamic environments.

11:00

Awareness Security in the age of A.I.

RSI Security
In an increasingly digital world, where technology and data have become integral parts of our daily lives, the importance of cybersecurity cannot be overstated. This talk on "Awareness Security" is an opportunity to learn about the significance of this subject, the...

11:30

Lunch & networking

Main lobby


12:30

Visibility, Insight, and Action with Cast AI, Prometheus, and Grafana

CAST AI
Achieving full-stack observability requires seamless integration of monitoring, analysis, and automation. In this session, we’ll explore how Cast AI, Prometheus, and Grafana work together to provide real-time visibility, actionable insights, and automated optimizations for cloud-native...

13:00

Managing Databases in the Cloud is Broken

Rapydo
In this talk, Matan Nataf, co-founder and CEO of Rapydo, addresses the complexities of managing relational databases in the cloud era. Focusing on the challenges brought by microservices and multi-tenant architectures, he underscores the limitations of current tools in achieving scalability and...

Streamlining Telemetry Data: Building a Telemetry Pipeline to Handle High-Cardinality Metrics

Mezmo
Handling high-cardinality telemetry data efficiently is crucial for modern observability systems. In this session, we will explore strategies for designing a scalable telemetry pipeline that can process large volumes of diverse metrics without performance bottlenecks. We’ll cover best practices...

14:00

Zero Trust: From Airports to Identity-Aware Proxies

Pomerium
Zero Trust doesn't have to be intimidating. Learn how Identity-Aware Proxies transform service access from perimeter-based to continuous verification, explained through the universal experience of airport security.

14:30

Fast, Cheap, DIY Observability with Open Source Analytics and Visualization

Altinity
Commercial observability tools are expensive and complex - but you can build a fast, cost-effective solution yourself! This talk shows how to use ClickHouse, OpenTelemetry, and Grafana to create scalable, DIY monitoring with open source tools.

Building a distributed persistent queue on FoundationDB

Tigris Data
Tigris is a globally distributed object storage system where objects are stored all over the world. The system needs to have asynchronous tasks to keep objects cached, redundantly stored, and moved around in response to changes in access patterns. To solve this, we built an asynchronous task...

The duality of adopting AI: Can SREs become AI/ML engineers?

Kyndryl
AI/ML is not new in the business world. It has been used for some time, but Generative AI (GenAI) has emerged as a new disruptive force in recent years. Many business and technical processes are taking advantage of embedded GenAI capabilities. However, companies must put more effort into leveraging...

16:30

Wrap up

Scan each other's QR codes & head to a nearby pub!


09:00

Coffee break

Main lobby


Optimizing Cloud MLOps Costs: Tackling Kubernetes Challenges

in-n-out.cloud
In my experience, machine learning (ML) workloads on Kubernetes in the cloud offer unparalleled flexibility and scalability, which is great, but they can also lead to higher cloud spend. In this session, I'll be addressing the hidden financial and technical challenges of running...

Case Study: Re-Thinking Our Infrastructure Tooling

Fairwinds
When you're managing dozens of Kubernetes clusters, across three different clouds, for dozens of individual companies in their own accounts, the challenge of (re)designing tooling is complex. Come hear how we worked through all the many possible options (centralized IaC vs templating, whether to...

CAP Theorem Reloaded - AKA How to Optimize Distribution of Telemetry Data

Sawmills
The backbone of SRE, like all engineering, is monitoring and observability. When designing and implementing platforms, understanding how monitoring and observability telemetry data impacts your systems is critical to scale.

11:00

Cloud Integration Testing Made Easy for Your AWS Cloud Apps

Postman
Integration testing for cloud-native AWS applications is complex due to service dependencies, deployment intricacies, and high costs. To make integration testing faster and easier, this presentation shows you how to emulate AWS environments locally using Testcontainers and LocalStack. By...

11:30

Lunch & networking

Main lobby


12:30

Reliable Serverless Needs Distributed Transactions

reboot.dev
Cloud native and serverless application platforms give teams encapsulation, flexibility, and reduced deployment dependencies. But the movement onto the cloud has so far trained us to accept that decomposing your application into multiple loosely coupled functions or services requires eventual...

Scaling Data Intelligence with Conversational AI: A Reliable and Efficient Approach to Enterprise Insights

As enterprise data volumes surge toward 175 zettabytes by 2025, organizations face mounting challenges in data accessibility, governance, and real-time decision-making. AskDataAI is an AI-driven data intelligence platform that bridges this gap by leveraging vector search, role-based access...

13:30

Fearless SREs

Arista Networks
In this talk, we'll explore both the technical and the human, very emotional side of what makes mythical SREs perform at their best in an incident, and how we can not only find such engineers but also help others grow into them. Together, we will explore the following: - **Fearless ≠...

14:00

DragonCrawl: Revolutionizing Mobile Testing with Generative AI

Revyl.ai
- The challenges of traditional mobile testing at scale (3,000+ simultaneous experiments)
- Architecture and implementation of DragonCrawl using MPNet and embedding techniques
- Real-world examples of DragonCrawl's adaptive behavior and problem-solving capabilities
- Practical strategies for handling LLM...

14:30

15:00

Driving Solar Energy Efficiency with Predictive Analytics and AI: The Role of Site Reliability in Renewable Energy

Ironridge
The solar energy sector is at the forefront of innovation, leveraging predictive analytics and artificial intelligence (AI) to tackle the operational complexities of large-scale solar farms. As global solar capacity surpasses 760 GW, these advanced technologies are transforming site reliability,...

Mastering Kubernetes Cost Optimization for Sustainable Cloud Operations

Randoli
Cloud-native platforms like Kubernetes offer unparalleled flexibility and scalability, but they often come with a hidden price tag. Without intentional cost management, organizations risk overspending due to inefficient resource utilization, over-provisioning, and lack of visibility into cloud...

16:00

Wrap up

Scan each other's QR codes & head to a nearby pub!


Speakers

Anam Hira, Revyl.ai
Anca Ghenade, Postman
Andrew Suderman, Fairwinds
Francois Martel, Neubird AI
Gunnar Grosch, AWS
Himank Chaudhary, Tigris Data
John Shin, RSI Security
Matan Nataf, Rapydo
Natalie Serebryakova, in-n-out.cloud
Nick Taylor, Pomerium
Praveen Payili
Puneet Saraswat, Harness
Rajith Muditha Attapattu, Randoli
Ren Lee, Arista Networks
Rich Prillinger, Mezmo
Riley Scheid, reboot.dev
Robert Hodges, Altinity
Robin Sarkar, Ironridge
Rod Anami & Gregory Pruett, Kyndryl
Scott Davis, CAST AI
Shubhanshu Surana, Sawmills

Venue

The offices of Harness.io

55 Stockton St, San Francisco,
CA 94108, United States
