SREDAY

Site Reliability, DevOps and Cloud

October 3, 2025 | San Francisco, CA, USA

1 Day | 16+ Speakers | 2 Tracks | 100 Attendees

Building Self-Healing Data Pipelines: How Reinforcement Learning Reduces Operational Overhead While Improving Performance

Deepika Annam
Nike

Site Reliability Engineers face escalating challenges managing data pipelines that must adapt to dynamic workloads, absorb traffic spikes, and maintain high availability across distributed cloud environments. Traditional static optimization and rule-based approaches fall short against the complexity of modern streaming systems: they often require extensive manual intervention and create reliability risks during peak demand periods.
This presentation demonstrates how Deep Reinforcement Learning (DRL) is changing pipeline reliability engineering by enabling self-healing systems that automatically optimize performance while reducing operational burden. Drawing on production deployments and research findings, we'll explore how DRL-powered systems consistently outperform conventional approaches across critical reliability metrics.
Research demonstrates significant operational improvements. Apache Spark implementations using RL-based resource allocation show measurable performance gains over heuristic policies while achieving better resource efficiency. Apache Flink deployments with RL-based flow control demonstrate substantial latency reductions and throughput increases compared to rule-based systems. Modern lightweight DRL architectures can significantly improve data ingestion throughput while converging rapidly on complex production pipelines.
The operational benefits extend beyond immediate performance gains. With maintenance activities consuming a substantial portion of production costs, research interest in DRL-based predictive maintenance has grown significantly. Data center implementations show meaningful power savings in thermal management scenarios, and advanced algorithms demonstrate high success rates in automated remediation. These results highlight DRL's potential to reduce operational overhead while improving system reliability.
Attendees will learn practical implementation strategies for neural network integration in production environments, adaptive resource allocation patterns for cloud-native systems, and multi-objective optimization frameworks that balance performance, cost, and reliability. We'll cover real-world applications of Deep Q-Networks, Proximal Policy Optimization, and Soft Actor-Critic algorithms across various industries. The session includes implementation frameworks, reliability benchmarking methodologies, and strategies for addressing operational challenges such as hyperparameter tuning and model deployment. Attendees will leave with actionable insights for building next-generation self-managing pipeline systems that reduce manual intervention, improve system resilience, and enable proactive operational practices.

Deepika Annam is a Senior Data Engineer at Nike Inc with over 14 years of experience in data engineering, software development, and SAP consulting. Currently based in Portland, OR, she specializes in architecting scalable data solutions and developing robust data pipelines that support business intelligence and reporting across financial, inventory, sales, and planning domains. At Nike, Deepika has led major platform transformations, including the migration of approximately 1,500 SAP BusinessObjects reports to 250+ Power BI reports and the upgrade from EMR to Databricks, achieving cost savings of approximately 80%. Her work on real-time dashboards and data solutions has directly contributed to business growth, including a 15% increase in North America Thanksgiving sales. Deepika holds an MBA from Andhra University and a Bachelor's degree in Engineering from SRKR Engineering College, India. She is skilled in SQL, Python, PySpark, AWS, Airflow, and various SAP technologies, and holds additional certifications in data analysis and Python for data engineering from Great Learning Academy and Duke University. Her expertise spans modern data warehouse solutions, ETL pipeline development, and business intelligence reporting, supporting data-driven decision-making in enterprise-level organizations.
