Making 1 million requests with aiohttp/asyncio - literally

后悔当初 2021-02-03 15:11

I followed this tutorial: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html and everything works fine when I do around 50 000 requests. But when I try to make 1 million requests, the script breaks before finishing.

2 Answers
  • 2021-02-03 15:33

    asyncio is memory bound (like any other program). You cannot spawn more tasks than memory can hold. My guess is that you hit a memory limit. Check dmesg for more information.

    1 million RPS doesn't mean there are 1M tasks. A task can make several requests in the same second; the sketch below illustrates this.

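    For illustration, here is a minimal sketch of that point: a small, fixed pool of worker tasks, each making many requests one after another, so the number of task objects never grows with the number of requests. The worker count and the localhost URL are assumptions (borrowed from the other answer), not part of this answer.

    import asyncio
    from aiohttp import ClientSession
    
    N_WORKERS = 50          # assumption: 50 concurrent worker tasks
    TOTAL_REQUESTS = 10**6  # one million requests in total
    
    async def worker(worker_id, session):
        # each task handles TOTAL_REQUESTS / N_WORKERS requests sequentially
        for i in range(worker_id, TOTAL_REQUESTS, N_WORKERS):
            async with session.get(f"http://localhost:8080/?id={i}") as response:
                await response.read()
    
    async def main():
        async with ClientSession() as session:
            # only N_WORKERS tasks exist at any moment, so memory use stays flat
            await asyncio.gather(*(worker(w, session) for w in range(N_WORKERS)))
    
    if __name__ == '__main__':
        asyncio.run(main())
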
  • 2021-02-03 15:45

    Schedule all 1 million tasks at once

    This is the code you are talking about. It takes up to 3 GB of RAM, so it is quite possible that the operating system will kill it if you are low on free memory.

    import asyncio
    from aiohttp import ClientSession
    
    MAX_SIM_CONNS = 50
    LAST_ID = 10**6
    
    async def fetch(url, session):
        async with session.get(url) as response:
            return await response.read()
    
    async def bound_fetch(sem, url, session):
        async with sem:
            await fetch(url, session)
    
    async def fetch_all():
        url = "http://localhost:8080/?id={}"
        tasks = set()
        async with ClientSession() as session:
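            # the semaphore caps the number of requests in flight at MAX_SIM_CONNS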
            sem = asyncio.Semaphore(MAX_SIM_CONNS)
            for i in range(1, LAST_ID + 1):
                task = asyncio.create_task(bound_fetch(sem, url.format(i), session))
                tasks.add(task)
            return await asyncio.gather(*tasks)
    
    if __name__ == '__main__':
        asyncio.run(fetch_all())
    

    Use a queue to streamline the work

    This is my suggestion for how to use asyncio.Queue to pass URLs to worker tasks. The queue is filled as needed; there is no pre-made list of URLs.

    It takes only 30 MB RAM :)

    import asyncio
    from aiohttp import ClientSession
    
    MAX_SIM_CONNS = 50
    LAST_ID = 10**6
    
    async def fetch(url, session):
        async with session.get(url) as response:
            return await response.read()
    
    async def fetch_worker(url_queue):
        async with ClientSession() as session:
            while True:
                url = await url_queue.get()
                try:
                    if url is None:
                        # all work is done
                        return
                    response = await fetch(url, session)
                    # ...do something with the response
                finally:
                    url_queue.task_done()
                    # calling task_done() is necessary for the url_queue.join() to work correctly
    
    async def fetch_all():
        url = "http://localhost:8080/?id={}"
        url_queue = asyncio.Queue(maxsize=100)
        worker_tasks = []
        for i in range(MAX_SIM_CONNS):
            wt = asyncio.create_task(fetch_worker(url_queue))
            worker_tasks.append(wt)
        for i in range(1, LAST_ID + 1):
            await url_queue.put(url.format(i))
        for i in range(MAX_SIM_CONNS):
            # tell the workers that the work is done
            await url_queue.put(None)
        await url_queue.join()
        await asyncio.gather(*worker_tasks)
    
    if __name__ == '__main__':
        asyncio.run(fetch_all())
    