Good retry, bad retry: an incident story

  skybrian
    From the article:

    John explained that exponential backoff for retries doesn’t eliminate load amplification: it merely delays it. After a certain downtime threshold, exponential backoff becomes ineffective in reducing server load.
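
    To see why backoff only delays the amplification, here is a toy model (my own, not taken from the article). It assumes new requests arrive at a fixed rate, every attempt fails while the server is down, and each request is retried up to `max_retries` times with doubling delays and no jitter. The function name and parameters are hypothetical.

```python
def attempts_per_second(rate, max_retries, base_delay, outage_seconds):
    """Count attempts landing in each 1-second bucket during an outage.

    Toy model: `rate` new requests start every second, every attempt fails
    while the server is down, and retry i is sent base_delay * 2**i seconds
    after the previous attempt (plain exponential backoff, no jitter).
    """
    load = [0] * outage_seconds
    for start in range(outage_seconds):       # a new batch of requests each second
        t = start
        for attempt in range(max_retries + 1):  # first try plus max_retries retries
            if t >= outage_seconds:
                break
            load[t] += rate
            t += base_delay * 2 ** attempt      # back off before the next attempt
    return load


load = attempts_per_second(rate=100, max_retries=3, base_delay=1, outage_seconds=20)
print(load[:8])  # [100, 200, 200, 300, 300, 300, 300, 400]
print(load[-1])  # 400
```

    Once the outage outlasts the whole backoff schedule (here 1 + 2 + 4 = 7 seconds), the server sees 1 + max_retries = 4 times its normal load, the same amplification immediate retries would cause. Backoff only changed how quickly it got there.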

    Here is one solution they discuss:

    • Retry budget (or adaptive retry): Retries are always allowed, but only within a budget, for example, no more than 10% of the number of successful requests. If the service runs into problems, it receives no more than 10% additional traffic.
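
    A minimal sketch of such a budget, assuming a token-bucket style implementation (one common approach, not necessarily the article's): every success deposits a fraction of a retry, and every retry must withdraw a whole one. `RetryBudget`, `retry_percent`, and `max_burst` are hypothetical names. Integer units avoid floating-point drift (ten deposits of 0.1 do not quite sum to 1.0).

```python
class RetryBudget:
    """Hypothetical retry budget: retries stay within retry_percent% of successes.

    Balances are kept in integer units where one retry costs 100 units,
    and each successful request deposits `retry_percent` units.
    """

    def __init__(self, retry_percent=10, max_burst=10):
        self.retry_percent = retry_percent
        self.cost = 100               # units consumed by a single retry
        self.cap = 100 * max_burst    # never bank more than max_burst retries
        self.balance = 0

    def record_success(self):
        # Each success earns retry_percent/100 of a retry, up to the cap.
        self.balance = min(self.cap, self.balance + self.retry_percent)

    def can_retry(self):
        # Spend one retry's worth of units if available; otherwise refuse.
        if self.balance >= self.cost:
            self.balance -= self.cost
            return True
        return False


budget = RetryBudget()
for _ in range(50):
    budget.record_success()
allowed = sum(budget.can_retry() for _ in range(20))
print(allowed)  # 5
```

    After 50 successes only 5 of 20 attempted retries go through, so a failing dependency sees at most about 10% extra traffic. The cap keeps a long healthy period from banking an unbounded burst of retries.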

    The article discusses the tradeoffs compared to other solutions. On Hacker News, commenters suggest further options, like the server sending feedback about when to retry.
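
    One familiar form of server feedback is the HTTP `Retry-After` response header. The sketch below (an illustration, not something from the article or the thread) prefers that hint when present and otherwise falls back to capped exponential backoff with full jitter. It handles only the delta-seconds form of `Retry-After`, not the HTTP-date form; `retry_delay`, `base`, and `cap` are hypothetical names.

```python
import random


def retry_delay(attempt, retry_after=None, base=0.1, cap=10.0):
    """Seconds to wait before retry number `attempt` (0-based).

    retry_after: raw Retry-After header value, if the server sent one.
    Only the delta-seconds form (e.g. "5") is honored in this sketch.
    """
    if retry_after is not None:
        value = retry_after.strip()
        if value.isascii() and value.isdigit():
            # The server said when to come back, so trust it.
            return float(value)
    # No usable hint: capped exponential backoff with full jitter.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

    Honoring the server's hint lets an overloaded service spread returning clients out on its own schedule, instead of every client guessing from a local backoff formula.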