SREDAY

Site Reliability, DevOps and Cloud

March 27-28, 2025 London, UK

2 days, 50+ speakers, 1 track, 200 attendees

AIRE: AI Reliability Engineering. Bringing SRE to AI

Denys Vasyliev
GfK - An NIQ Company

AI products are becoming critical for businesses to maintain a competitive edge, yet integrating them into an organization’s ecosystem brings unique challenges. Ensuring the reliability, security, and alignment of AI systems with business goals and ethical standards demands new approaches and tools.

This talk explores the concept of AI Reliability Engineering (AIRE), which adapts SRE principles to AI systems. Based on our experience as an AI-based startup building a mentoring platform, we’ll discuss the challenges we encountered and the solutions we found when managing LLMs, language chain tracing, and AI gateways.

Key challenges include:
- lack of visibility and control over the AI lifecycle: data collection, model deployment, and monitoring.
- ensuring quality and robustness of AI models, addressing issues like prompt attacks, data drift, and evolving performance.
- managing complexity in dependencies, configurations, and resources across environments.

The AIRE approach combines the CNCF ecosystem with established SRE practices to address these challenges. By leveraging tools like OpenInference and AI gateways, AIRE introduces processes that enhance reliability, mitigate risks, and improve the security of AI systems.
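As one illustration of carrying an established SRE practice over to an AI system, the sketch below computes the remaining error budget for LLM calls observed at a gateway. It is not taken from the talk: the `LLMCall` record, the `is_good` criteria, the 2000 ms latency threshold, and the 99% SLO target are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class LLMCall:
    """Hypothetical record of one LLM request as seen by an AI gateway."""
    latency_ms: float   # end-to-end latency of the call
    ok: bool            # True if the call returned without an error
    passed_eval: bool   # True if the response passed a quality check


def is_good(call: LLMCall, max_latency_ms: float = 2000.0) -> bool:
    # Assumed SLI definition: a call is "good" only if it succeeded,
    # passed the quality check, and stayed within the latency threshold.
    return call.ok and call.passed_eval and call.latency_ms <= max_latency_ms


def error_budget_remaining(calls: list[LLMCall], slo_target: float = 0.99) -> float:
    """Fraction of the error budget left: 1.0 is untouched, below 0 is overspent."""
    if not calls:
        return 1.0
    bad = sum(1 for c in calls if not is_good(c))
    allowed_bad = (1.0 - slo_target) * len(calls)
    if allowed_bad == 0:
        # A 100% target leaves no budget: any bad call overspends it.
        return 1.0 if bad == 0 else float("-inf")
    return 1.0 - bad / allowed_bad


if __name__ == "__main__":
    window = [LLMCall(120.0, True, True)] * 99 + [LLMCall(5000.0, True, True)]
    print(f"budget remaining: {error_budget_remaining(window, 0.98):.2f}")
```

Counting a failed quality check as a "bad" event alongside errors and slow responses is the design choice that makes this an AI-specific SLI; a team would substitute whatever evaluation signal it actually collects.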

A seasoned professional with over 15 years of experience in Software Development, DevOps, Site Reliability Engineering (SRE), AIOps and Kubernetes.

Proven track record in leadership roles including Technical Manager, Operations Team Lead, and CTO. Five years as a co-founder working on cloud B2B/B2C application projects.

DevOps/SRE/Kubernetes Coach and Public Speaker
