What happens when the tools and services you depend on to drive Site Reliability Engineering turn out to be susceptible to reliability failures of their own?

That’s the question that teams at about 400 businesses have presumably had to ask themselves this month in the wake of a major outage in Atlassian Cloud. The incident offers SREs several insights into the reliability risks that lurk within reliability management software itself. It also shows how to work through a complex outage efficiently and transparently, as Atlassian has done since the incident.
