Kubernetes promises portability and scalability, but in practice many production outages trace back to avoidable mistakes. Security gaps, misconfigured health checks, and poor scaling strategies can all derail even experienced teams.
In this session, we’ll uncover:
Security Disasters → The risk of running containers as root, overlooked image vulnerabilities, and RBAC pitfalls.
Configuration Catastrophes → Why “works on my machine” never works, and how resource mismanagement wrecks clusters.
Observability Blind Spots → Missing runtime security monitoring, misleading CPU/memory metrics, and logging anti-patterns.
Scaling Traps → HPA-induced thrashing, node scheduling inefficiencies, and bottlenecks that stay hidden until it is too late.
But it’s not just about problems—we’ll explore solutions:
Actionable hardening checklists
Tools for continuous monitoring & runtime security
Automation strategies to prevent config drift
Proven observability practices for anomaly detection
Whether you’re new to Kubernetes or running enterprise-scale clusters, you’ll leave this session with practical, battle-tested strategies to keep your systems safe, stable, and observable.
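To give a flavor of what an automated hardening checklist might look like, here is a minimal sketch in Python. It is an illustration only, not material from the session: the function name `audit_pod_spec` and the choice of checks are assumptions. It inspects a pod spec supplied as a plain dictionary (for example, one parsed from a manifest) and flags three of the pitfalls named above: containers not forced to run as non-root, missing resource requests or limits, and missing health probes. The field names (`securityContext.runAsNonRoot`, `resources.requests`, `resources.limits`, `livenessProbe`, `readinessProbe`) are standard Kubernetes pod spec fields.

```python
from typing import Any


def audit_pod_spec(pod_spec: dict[str, Any]) -> list[str]:
    """Return findings for a pod spec given as a plain dict (hypothetical helper)."""
    findings: list[str] = []
    pod_security = pod_spec.get("securityContext") or {}

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # A container-level runAsNonRoot takes precedence over the pod-level value.
        security = container.get("securityContext") or {}
        run_as_non_root = security.get(
            "runAsNonRoot", pod_security.get("runAsNonRoot")
        )
        if run_as_non_root is not True:
            findings.append(f"{name}: runAsNonRoot is not set to true")

        # Without requests and limits, scheduling and eviction become unpredictable.
        resources = container.get("resources") or {}
        if not resources.get("requests"):
            findings.append(f"{name}: no resource requests")
        if not resources.get("limits"):
            findings.append(f"{name}: no resource limits")

        # Missing probes mean Kubernetes cannot tell a hung container from a healthy one.
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                findings.append(f"{name}: missing {probe}")

    return findings


if __name__ == "__main__":
    # Illustrative spec with none of the safeguards in place.
    example = {"containers": [{"name": "web", "image": "example/web:1.0"}]}
    for finding in audit_pod_spec(example):
        print(finding)
```

A check like this could run in CI against rendered manifests so that risky specs are caught before they reach a cluster. In practice, teams often rely on admission controllers or policy engines for enforcement, so treat this as a starting point under the stated assumptions.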
Kaustubha Shravan is a Cloud Architect who designs and operates resilient, measurable, and cost-efficient platforms across Azure, AWS, and GCP. She blends reliability engineering with data-driven practices—SLOs, error budgets, and ML-assisted incident response—to make outages rare and recovery fast. With 46+ cloud certifications, she has led initiatives such as Benchmarking-as-a-Service and production-grade ML inference pipelines that improved performance while cutting spend. Kaustubha is a Women Techmakers Ambassador and frequent community mentor; her work has been showcased at NeurIPS workshops. She speaks about pragmatic reliability patterns, observability that drives action, and culture—how to turn postmortems into durable engineering improvements. When she’s not shipping guardrails, she’s helping teams adopt sustainable, privacy-aware AI practices and sharing playbooks that teams can put to work immediately.