9 votes

When consuming an API with stated rate limits, how should one handle not exceeding them?

My typical approach is one that I believe is pretty common: reading the response headers for the current request count and waiting if the limit is reached.

However, I am currently working with a couple of APIs that don't implement that and instead have their rate limits on an honesty system.

Is it a case of throwing sleep statements into your code, or using some kind of "bucket" and "lock" system?

I'd be interested to see any simple implementation people have used (the simpler the better).

8 comments

  1. [3]
    spit-evil-olive-tips

    two big ways of doing this, proactive and reactive

    the proactive way is typically using a "token bucket". tokens are added to the bucket at some rate, and requests take tokens from the bucket. if there aren't enough tokens, they wait.

    for API calls, one token per request is common, but you can also have each token represent one byte, or one megabyte, if the goal is to limit bandwidth usage. then the thing being rate-limited deducts tokens corresponding to its usage from the bucket, instead of one token at a time.

    if rolling your own token bucket implementation, an important consideration is to cap the maximum number of tokens in the bucket. otherwise, a long idle period can fill the bucket up with a huge number of tokens, and then a burst of requests might have effectively no rate limit.
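
    as a rough sketch in Python (the rate and capacity numbers are made up, and this version uses a dedicated refill thread - more on the design options below):

    ```python
    import threading
    import time

    class TokenBucket:
        def __init__(self, rate_per_sec, capacity):
            self.capacity = capacity
            self.tokens = capacity
            self.cond = threading.Condition()
            # background thread adds tokens at a fixed rate, capped at capacity
            threading.Thread(target=self._refill, args=(rate_per_sec,), daemon=True).start()

        def _refill(self, rate_per_sec):
            while True:
                time.sleep(1.0 / rate_per_sec)
                with self.cond:
                    self.tokens = min(self.capacity, self.tokens + 1)
                    self.cond.notify()

        def acquire(self, n=1):
            # block until n tokens are available, then deduct them
            with self.cond:
                while self.tokens < n:
                    self.cond.wait()
                self.tokens -= n

    bucket = TokenBucket(rate_per_sec=5, capacity=10)
    bucket.acquire()  # call before each request
    ```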

    the reactive way is with backoff - you get a 429 Too Many Requests or whatever, and sleep for N seconds before retrying.

    very often, backoff is exponential - on your first rate-limited response you sleep for only 1 second, but if the retry is also rate limited, you sleep for 2 seconds, and so on. you want to cap this at a maximum as well, because sleeping 65536 seconds or whatever is rarely what you want.

    exponential backoff is also very useful for dealing with error responses beyond rate limits - if you're doing some operation say once every 10 seconds, if it fails you might retry after 1 second. but if it continues to fail, there's no sense sitting there retrying every second. instead you'd back off to maybe once a minute.
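
    a minimal sketch of capped exponential backoff (the requests library, retry counts, and delays here are just for illustration):

    ```python
    import time
    import requests  # assumed HTTP client; any would do

    def get_with_backoff(url, max_retries=8, base_delay=1.0, max_delay=60.0):
        for attempt in range(max_retries):
            resp = requests.get(url)
            if resp.status_code != 429:
                return resp
            # 1s, 2s, 4s, ... capped so we never end up sleeping 65536 seconds
            time.sleep(min(base_delay * 2 ** attempt, max_delay))
        raise RuntimeError(f"still rate limited after {max_retries} retries")
    ```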

    and of course the two options aren't mutually exclusive - you can implement proactive limits, which lets you control the rate yourself, and try to stay below whatever limit the service actually enforces. you might still get a rate limit response though, so ideally you'd also make sure a 429 response is handled.
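
    combining the two sketches above might look something like:

    ```python
    bucket = TokenBucket(rate_per_sec=5, capacity=10)

    def fetch(url):
        bucket.acquire()              # proactive: stay under the limit ourselves
        return get_with_backoff(url)  # reactive: still handle a 429 if we misjudged
    ```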

    18 votes
    1. [2]
      bugsmith

      Just the kind of answer I'd hoped to receive when I posted this topic.

      Exponential backoff is a technique I was already familiar with, and is effectively the only one I've ever implemented (both homespun and via libraries).

      The "token bucket" you mention first is exactly what I was searching for, though, and is in line with what I had imagined when I alluded to a "bucket and lock" technique.

      Thanks for the tips regarding the maximum limits.

      5 votes
      1. spit-evil-olive-tips

        glad it's helpful! there's lots of third party libraries you can probably use, but if you want to roll your own for whatever reason, there's a neat trick you can use to simplify it:

        keep track of the number of tokens in the bucket, and the timestamp of the last time tokens were added.

        when a thread goes to acquire a token, it can take the lock on the bucket, compute how long it's been since the last refill and how many tokens should be added for that elapsed time (which might include fractional tokens), then add those to the bucket before deducting the token it acquires.

        the alternate way to do it is to have a dedicated thread that does nothing but token refreshes. that works too, but the "inline" refill can be simpler in many ways.

        another important consideration is to make sure threads never sleep while holding the lock that protects the bucket. when I've implemented this in the past, the way I've handled that is to allow the bucket count to go slightly negative. if a request takes the last token from the bucket, it computes an "overdraft penalty" of how long it should sleep to cover the negative token balance, releases the lock, and then sleeps for that duration before continuing.
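
        a sketch of that inline-refill-with-overdraft approach (class and parameter names are mine, not from any particular library):

        ```python
        import threading
        import time

        class LazyTokenBucket:
            def __init__(self, rate_per_sec, capacity):
                self.rate = rate_per_sec
                self.capacity = capacity
                self.tokens = capacity           # allowed to go slightly negative
                self.last_refill = time.monotonic()
                self.lock = threading.Lock()

            def acquire(self, n=1):
                with self.lock:
                    # inline refill: add tokens for the elapsed time, capped at
                    # capacity, possibly including fractional tokens
                    now = time.monotonic()
                    self.tokens = min(self.capacity,
                                      self.tokens + (now - self.last_refill) * self.rate)
                    self.last_refill = now
                    self.tokens -= n
                    # "overdraft penalty": time to sleep to cover a negative balance
                    penalty = max(0.0, -self.tokens) / self.rate
                # sleep only after releasing the lock
                if penalty > 0:
                    time.sleep(penalty)
        ```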

        2 votes
  2. [3]
    Maxi

    If the API doesn’t give you a count on requests in the response, then you’ll have to keep track yourself of how many requests you have remaining.

    Alternatively, you can just keep sending requests until you hit the rate limit, and then apply a successively increasing delay to further request attempts.

    2 votes
    1. [2]
      drannex

      I've done the alternative more times than I can count, and I feel wrong every time, but it's likely the best option in the long run.

      Otherwise, if you need to, you can set up a FIFO request queue: once the limit is reached, you wait and retry until successful, remove the item from the queue, and continue until you reach the limit again.

      This works well especially for those terrifying services that have a variable rate limit system in place.
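
      A rough sketch of that, assuming an HTTP API that answers 429 when you're over the limit (the function name and delay are illustrative):

      ```python
      import time
      from collections import deque

      import requests  # assumed HTTP client

      def drain(urls, retry_delay=30):
          queue = deque(urls)
          results = []
          while queue:
              url = queue[0]               # peek; only remove on success
              resp = requests.get(url)
              if resp.status_code == 429:
                  time.sleep(retry_delay)  # wait out the limit, then retry
                  continue
              results.append(resp)
              queue.popleft()              # success: remove and continue
          return results
      ```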

      3 votes
      1. Maxi

        That request queue system can work well if the cooldown is short enough, but if the limit is actually set on an hourly or daily basis, that's a long wait :p

        1 vote
  3. fourcandles

    I've encountered 'honesty' APIs before and I think the best approach is to keep things as thin as possible on your end in case they ever do introduce rate limits in the future. Or in other words, whatever is implemented should be easy to get out of when there's a change on their side.

    Most commonly I'll introduce small sleeps between requests and, depending on the type of data, cache on my side to avoid requests that aren't needed.
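
    A minimal sketch of that, with an illustrative delay and a naive dict cache:

    ```python
    import time

    import requests  # assumed HTTP client

    _cache = {}

    def fetch(url, delay=0.5):
        if url in _cache:   # skip the request entirely when we already have the data
            return _cache[url]
        time.sleep(delay)   # small fixed sleep; trivial to remove later
        resp = requests.get(url)
        _cache[url] = resp
        return resp
    ```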

    1 vote
  4. winther

    I work with a bunch of systems that have a requests-per-second limit. Our approach is to check a counter in Redis; if the limit for the current second has been reached, we simply return the message to RabbitMQ with a nack and requeue=true.
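
    A sketch of that pattern with redis-py and pika; the key scheme, queue name, limit, and handle() are illustrative:

    ```python
    import time

    import pika   # RabbitMQ client
    import redis  # Redis client

    LIMIT_PER_SECOND = 10
    r = redis.Redis()

    def on_message(ch, method, properties, body):
        # one counter key per wall-clock second, expired automatically
        key = f"rate:{int(time.time())}"
        count = r.incr(key)
        r.expire(key, 2)
        if count > LIMIT_PER_SECOND:
            # over the limit: hand the message back to RabbitMQ for later
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
            return
        handle(body)  # hypothetical: the actual call to the rate-limited API
        ch.basic_ack(delivery_tag=method.delivery_tag)

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.basic_consume(queue="work", on_message_callback=on_message)
    channel.start_consuming()
    ```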