10 votes

GitHub: October 21 post-incident analysis

4 comments

  1. spit-evil-olive-tips
    Link
    I love stories of systematic failures like these:

    Connectivity between these locations was restored in 43 seconds, but this brief outage triggered a chain of events that led to 24 hours and 11 minutes of service degradation.

    9 votes
  2. [3]
    unknown user
    Link
    As always:

    GitHub published a blog post to provide more context. We use GitHub Pages internally and all builds had been paused several hours earlier, so publishing this took additional effort.

    Reminds me of the AWS failure where they couldn't update the status page because it was hosted on AWS...

    4 votes
    1. [3]
      Comment deleted by author
      Link Parent
      1. [2]
        spit-evil-olive-tips
        (edited)
        Link Parent
        It was the same outage... S3 was impacted severely enough that the AWS Status page lost its red/yellow/green icons because they were stored in S3.

        7 votes