On March 8th, Datadog had a massive global outage. The incident response involved more than 500 engineers, split across many teams, coordinating over two days. In this talk, I will go over what triggered the incident, why it took such a large-scale effort to resolve, and some of the technical and social lessons learned from the event.