aiohttp: rate limiting parallel requests

APIs often have rate limits that users have to follow. As an example, let's take 50 requests/second. Sequential requests take 0.5-1 second and thus are too slow to come close…

3 Answers

走了就别回头了

2020-12-05 11:55
If I understand you correctly, you want to limit the number of simultaneous requests?

asyncio has an object named Semaphore. It works like an asynchronous lock that up to N coroutines can hold at the same time.
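```
import asyncio

# At most 50 requests in flight at any moment.
# On Python < 3.10, create this inside the running event loop instead.
semaphore = asyncio.Semaphore(50)
#...
async def limit_wrap(url):
    async with semaphore:
        # do what you want, e.g. the aiohttp request for url
        ...
#...
async def main(urls):
    # gather() takes the awaitables as separate arguments and must be awaited
    return await asyncio.gather(*[limit_wrap(url) for url in urls])
```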
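To see that a semaphore caps concurrency rather than rate, here is a minimal self-contained sketch. It assumes `asyncio.sleep` as a stand-in for the real HTTP request; `run_all`, `fake_request` and the `peak` counter are hypothetical names for illustration only.

```python
import asyncio

async def run_all(urls, max_concurrent=50, request_time=0.05):
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fake_request(url):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(request_time)  # stand-in for the real HTTP call
            in_flight -= 1
            return url

    results = await asyncio.gather(*(fake_request(u) for u in urls))
    return results, peak

results, peak = asyncio.run(run_all(range(200), max_concurrent=50))
print(peak)  # never more than 50
```

With 50 slots and 0.05 s per simulated request, 200 requests finish in roughly 0.2 s, i.e. on the order of 1000 requests/second: the semaphore bounds how many requests are in flight, not how many start per second, which is why the update below looks at timing.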
Update:

Suppose I make 50 concurrent requests and they all finish in 2 seconds. Then I don't reach the limit (only 25 requests per second).

That means I should make 100 concurrent requests, which would also all finish in 2 seconds (50 requests per second). But before you actually make those requests, how could you know how long they will take to finish?
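The arithmetic here is Little's law: in steady state, requests in flight = request rate × average latency. A small sketch with hypothetical helper names makes the numbers explicit; it assumes the latency is known or estimated in advance, which is exactly the difficulty raised above.

```python
import math

def achieved_rate(concurrency: int, latency_s: float) -> float:
    """Requests completed per second when `concurrency` requests are always in flight."""
    return concurrency / latency_s

def required_concurrency(target_rate: float, latency_s: float) -> int:
    """Smallest number of in-flight requests needed to sustain `target_rate`."""
    return math.ceil(target_rate * latency_s)

print(achieved_rate(50, 2.0))         # 25.0 requests/second
print(required_concurrency(50, 2.0))  # 100
print(required_concurrency(50, 0.5))  # 25
print(required_concurrency(50, 1.0))  # 50
```

For the 0.5-1 s latency in the question, that means somewhere between 25 and 50 concurrent requests, and the right number shifts whenever the server's latency does.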

Alternatively, if what you care about is not requests finished per second but requests started per second, you can do this:
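```
import asyncio

# download() and urls are your own coroutine and list of URLs
async def loop_wrap(urls):
    tasks = []
    for url in urls:
        # start download(url) in the background without waiting for it to finish
        tasks.append(asyncio.ensure_future(download(url)))
        await asyncio.sleep(1/50)  # start at most 50 requests per second
    # wait for the stragglers and collect their results
    return await asyncio.gather(*tasks)

results = asyncio.run(loop_wrap(urls))
```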
The code above starts a new download task every 1/50 second (50 request starts per second), no matter how long each request takes to finish.
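A self-contained version of the same idea, assuming a hypothetical `download()` that only sleeps in place of a real aiohttp request, and recording start times so the pacing can be checked. It uses `asyncio.create_task`, the modern equivalent of `ensure_future` for coroutines.

```python
import asyncio

async def download(url):
    # hypothetical stand-in for a real aiohttp request taking 0.2 s
    await asyncio.sleep(0.2)
    return url

async def start_at_rate(urls, max_per_second):
    loop = asyncio.get_running_loop()
    tasks, start_times = [], []
    for url in urls:
        start_times.append(loop.time())
        tasks.append(asyncio.create_task(download(url)))  # start, don't wait
        await asyncio.sleep(1 / max_per_second)
    results = await asyncio.gather(*tasks)
    return results, start_times

results, start_times = asyncio.run(start_at_rate(range(10), max_per_second=50))
gaps = [b - a for a, b in zip(start_times, start_times[1:])]
print(results)
print(min(gaps))  # roughly 1/50 = 0.02 s between starts
```

Because each request takes 0.2 s but a new one starts every 0.02 s, about ten run at once; the start rate stays fixed while concurrency floats with latency.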

悲&欢浪女

2020-12-05 11:57

I approached the problem by creating a subclass of aiohttp.ClientSession with a built-in rate limiter based on the leaky-bucket algorithm. I use asyncio.Queue() for rate limiting instead of semaphores, and on the request path I only override the _request() method. I find this approach cleaner, since you only replace session = aiohttp.ClientSession() with session = ThrottledClientSession(rate_limit=15).

Code:

```
import asyncio
import time
from typing import Optional

import aiohttp


class ThrottledClientSession(aiohttp.ClientSession):
    """Rate-throttled client session class inherited from aiohttp.ClientSession.

    Must be created inside a running event loop, because __init__ starts
    the bucket-filler task with asyncio.create_task().
    """
    MIN_SLEEP = 0.1

    def __init__(self, rate_limit: Optional[float] = None, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self.rate_limit = rate_limit
        self._fillerTask = None
        self._queue = None
        self._start_time = time.time()
        if rate_limit is not None:
            if rate_limit <= 0:
                raise ValueError('rate_limit must be positive')
            # Bucket size must cover one sleep interval's worth of tokens; a size
            # of 2 would cap throughput at 2 / MIN_SLEEP = 20 requests/s.
            self._queue = asyncio.Queue(max(2, int(rate_limit * self.MIN_SLEEP) + 1))
            self._fillerTask = asyncio.create_task(self._filler(rate_limit))

    def _get_sleep(self) -> Optional[float]:
        if self.rate_limit is not None:
            return max(1 / self.rate_limit, self.MIN_SLEEP)
        return None

    async def close(self) -> None:
        """Close rate-limiter's "bucket filler" task, then the session"""
        if self._fillerTask is not None:
            self._fillerTask.cancel()
            try:
                await asyncio.wait_for(self._fillerTask, timeout=0.5)
            except (asyncio.TimeoutError, asyncio.CancelledError) as err:
                print(str(err))
        await super().close()

    async def _filler(self, rate_limit: float = 1):
        """Filler task that refills the bucket at rate_limit tokens per second"""
        try:
            if self._queue is None:
                return
            self.rate_limit = rate_limit
            sleep = self._get_sleep()
            updated_at = time.monotonic()
            fraction = 0
            extra_increment = 0
            # start with a full bucket
            for i in range(0, self._queue.maxsize):
                self._queue.put_nowait(i)
            while True:
                if not self._queue.full():
                    now = time.monotonic()
                    # tokens earned since the last refill; carry the fractional part over
                    increment = rate_limit * (now - updated_at)
                    fraction += increment % 1
                    extra_increment = fraction // 1
                    items_2_add = int(min(self._queue.maxsize - self._queue.qsize(),
                                          int(increment) + extra_increment))
                    fraction = fraction % 1
                    for i in range(0, items_2_add):
                        self._queue.put_nowait(i)
                    updated_at = now
                await asyncio.sleep(sleep)
        except asyncio.CancelledError:
            print('Cancelled')
        except Exception as err:
            print(str(err))

    async def _allow(self) -> None:
        """Wait for a token from the bucket before letting a request through"""
        if self._queue is not None:
            await self._queue.get()
            self._queue.task_done()

    async def _request(self, *args, **kwargs):
        """Throttled _request()"""
        await self._allow()
        return await super()._request(*args, **kwargs)

# Usage, inside a coroutine:
#     async with ThrottledClientSession(rate_limit=15) as session:
#         async with session.get(url) as resp: ...
```
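If subclassing feels too invasive (the subclass above overrides the private `_request()`, and aiohttp 3.x emits a DeprecationWarning saying that inheriting from `ClientSession` is discouraged), the same bucket idea can live in a separate object that you await before each request. The sketch below is a token bucket whose tokens are computed lazily from `loop.time()` instead of by a filler task; `TokenBucket`, `rate` and `capacity` are hypothetical names, not part of aiohttp.

```python
import asyncio

class TokenBucket:
    """Allow at most `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int = 1) -> None:
        if rate <= 0 or capacity < 1:
            raise ValueError("rate must be positive and capacity at least 1")
        self.rate = rate
        self.capacity = capacity
        self._tokens = float(capacity)  # start with a full bucket
        self._updated_at = None         # set on first use, inside the running loop
        self._lock = asyncio.Lock()     # hand out tokens to waiters one at a time

    async def acquire(self) -> None:
        async with self._lock:
            loop = asyncio.get_running_loop()
            while True:
                now = loop.time()
                if self._updated_at is not None:
                    # refill in proportion to elapsed time, never above capacity
                    elapsed = now - self._updated_at
                    self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
                self._updated_at = now
                if self._tokens >= 1:
                    self._tokens -= 1
                    return
                # sleep just long enough for the missing part of a token to refill
                await asyncio.sleep((1 - self._tokens) / self.rate)

    async def __aenter__(self):
        await self.acquire()
        return self

    async def __aexit__(self, *exc_info) -> None:
        return None  # nothing to release: the bucket only spaces out starts

async def demo(n=20, rate=50.0, capacity=5):
    bucket = TokenBucket(rate, capacity)  # created inside the running loop
    loop = asyncio.get_running_loop()
    start = loop.time()
    for _ in range(n):
        async with bucket:
            pass  # e.g. async with session.get(url) as resp: ...
    return loop.time() - start

elapsed = asyncio.run(demo())
print(f"{elapsed:.2f} s")  # roughly 0.30 s: 5 burst tokens, then 15 more at 50/s
```

With aiohttp you would share one bucket across all requests to the same API and launch the requests concurrently with asyncio.gather; they can still overlap, the bucket only limits how many start per second.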

悲哀的现实

2020-12-05 12:02

I liked @sraw's approach of solving this with asyncio, but their answer didn't quite cut it for me. Since I don't know whether each of my calls to download will be faster or slower than the rate limit, I want the option to run many in parallel when requests are slow and one at a time when requests are very fast, so that I'm always right at the rate limit.

I do this with a queue: a producer adds new jobs at the rate limit, and many consumers take jobs from it. If the consumers are fast, they all wait on the next job; if they are slow, work backs up in the queue and they run as fast as the processor/network allows:

```
import asyncio
from datetime import datetime


async def download(url):
    # download or whatever; a 0.1 s sleep stands in for the request here
    task_time = 1/10
    await asyncio.sleep(task_time)
    result = datetime.now()
    return result, url


async def producer_fn(queue, urls, max_per_second):
    # enqueue one job every 1/max_per_second seconds
    for url in urls:
        await queue.put(url)
        await asyncio.sleep(1/max_per_second)


async def consumer(work_queue, result_queue):
    while True:
        url = await work_queue.get()
        try:
            result = await download(url)
            await result_queue.put(result)
        finally:
            # always mark the job done so work_queue.join() cannot hang on a failed download
            work_queue.task_done()


urls = range(20)


async def main():
    work_queue = asyncio.Queue()
    result_queue = asyncio.Queue()

    num_consumer_tasks = 10
    max_per_second = 5
    consumers = [asyncio.create_task(consumer(work_queue, result_queue))
                 for _ in range(num_consumer_tasks)]
    producer = asyncio.create_task(producer_fn(work_queue, urls, max_per_second))
    await producer

    # wait for the remaining tasks to be processed
    await work_queue.join()
    # cancel the consumers, which are now idle
    for c in consumers:
        c.cancel()

    while not result_queue.empty():
        result, url = await result_queue.get()
        print(f'{url} finished at {result}')

asyncio.run(main())
```
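To check the claim that consumers overlap when requests are slow while starts stay at the configured rate, here is a minimal instrumented variant of the same producer/consumer pattern. It assumes a fixed simulated latency (`task_time`) instead of real network calls; `run`, `starts` and the `in_flight`/`peak` counters are hypothetical additions for measurement only.

```python
import asyncio

async def run(urls, max_per_second, num_consumers, task_time):
    loop = asyncio.get_running_loop()
    work_queue = asyncio.Queue()
    starts = []
    state = {"in_flight": 0, "peak": 0}

    async def download(url):
        starts.append(loop.time())
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(task_time)  # simulated request latency
        state["in_flight"] -= 1
        return url

    async def consumer():
        while True:
            url = await work_queue.get()
            try:
                await download(url)
            finally:
                work_queue.task_done()

    consumers = [asyncio.create_task(consumer()) for _ in range(num_consumers)]
    for url in urls:  # producer: one job every 1/max_per_second seconds
        await work_queue.put(url)
        await asyncio.sleep(1 / max_per_second)
    await work_queue.join()
    for c in consumers:
        c.cancel()
    await asyncio.gather(*consumers, return_exceptions=True)
    return starts, state["peak"]

# slow requests (0.2 s) at 50 starts/second: roughly 0.2 * 50 = 10 overlap
starts, peak = asyncio.run(run(range(30), max_per_second=50, num_consumers=20, task_time=0.2))
print(peak)
print(f"{(len(starts) - 1) / (starts[-1] - starts[0]):.1f} starts/second")
```

With a task_time well below 1/max_per_second, peak should typically stay at 1, i.e. the same code degrades to one request at a time, which is the behaviour the answer is after. num_consumers is the ceiling on concurrency, so it needs to be at least rate × worst-case latency.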
