asyncio/aiohttp - create_task() blocks event loop, gather results in "This event loop is already running"

Submitted by 假如想象 on 2021-01-01 08:12:18

Question


I cannot get both my consumer and my producer running at the same time; it seems that either worker() or the aiohttp server is blocking, even when they are executed simultaneously with asyncio.gather().

If instead I do loop.create_task(worker), the call blocks and the server is never started.

I've tried every variation I can imagine, including the nest_asyncio module, and I can only ever get one of the two components running.

What am I doing wrong?

async def worker():
    batch_size = 30

    print("running worker")
    while True:
        if queue.qsize() > 0:
            future_map = {}

            size = min(queue.qsize(), batch_size)
            batch = []
            for _ in range(size):
                item = await queue.get()
                print("Item: "+str(item))
                future_map[item["fname"]] = item["future"]
                batch.append(item)

            print("processing", batch)
            results = await process_files(batch)
            for dic in results:
                for key, value in dic.items():
                    print(str(key)+":"+str(value))
                    future_map[key].set_result(value)

            # mark the tasks done
            for _ in batch:
                queue.task_done()



def start_worker():
    loop.create_task(worker())

def create_app():
    app = web.Application()
    routes = web.RouteTableDef()
    @routes.post("/decode")
    async def handle_post(request):
        return await decode(request)
    app.add_routes(routes)
    app.on_startup.append(start_worker())
    return app

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    queue = asyncio.Queue()
    app = create_app()
    web.run_app(app)

The above prints "running worker" and does not start the aiohttp server.

def run(loop, app, port=8001):
    handler = app.make_handler()
    f = loop.create_server(handler, '0.0.0.0', port)
    srv = loop.run_until_complete(f)
    print('serving on', srv.sockets[0].getsockname())
    try:
        loop.run_forever()
    except KeyboardInterrupt:
        pass
    finally:
        loop.run_until_complete(handler.finish_connections(1.0))
        srv.close()
        loop.run_until_complete(srv.wait_closed())
        loop.run_until_complete(app.finish())
    loop.close()

def main(app):
    asyncio.gather(run(loop, app), worker())

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    queue = asyncio.Queue()
    app = create_app()
    main(app)

The above starts server, but not the worker.


Answer 1:


Although await asyncio.sleep(0) fixes the immediate issue, it's not an ideal fix; in fact, it's somewhat of an anti-pattern. To understand why, let's examine why the problem occurs in more detail. The heart of the matter is the worker's while loop - as soon as the queue becomes empty, it effectively boils down to:

while True:
    pass

Sure, the part marked as pass contains a check for qsize() that executes additional code if the queue is non-empty, but once qsize() first reaches 0, that check will always evaluate to false. This is because asyncio is single-threaded: once qsize() == 0, the while loop no longer encounters a single await. Without an await, it's impossible to relinquish control to a coroutine or callback that might populate the queue, so the while loop becomes infinite.

This is why await asyncio.sleep(0) inside the loop helps: it forces a context switch, guaranteeing that other coroutines will get a chance to run and eventually re-populate the queue. However, it also keeps the while loop constantly running, which means that the event loop will never go to sleep, even if the queue remains empty for hours on end. The event loop will remain in a busy-waiting state for as long as the worker is active. You could alleviate the busy-wait by adjusting the sleep interval to a non-zero value, as suggested by dirn, but that will introduce latency, and will still not allow the event loop to go to sleep when there's no activity.
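
For reference, this is roughly what the workaround looks like when applied to the worker from the question (a minimal sketch for illustration, not a recommendation):

async def worker():
    batch_size = 30

    print("running worker")
    while True:
        if queue.qsize() > 0:
            ...  # drain and process the batch exactly as before
        # Yield control back to the event loop so other coroutines
        # (e.g. the request handlers filling the queue) get a chance
        # to run. With 0 this busy-waits; with a non-zero value it
        # trades CPU churn for latency, as discussed above.
        await asyncio.sleep(0)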

The proper fix is to not check for qsize(), but to use queue.get() to get the next item. This will sleep as long as needed until the item appears, and immediately wake up the coroutine once it does. Don't worry that this will "block" the worker - it's precisely the point of asyncio that you can have multiple coroutines and that one being "blocked" on an await simply allows others to proceed. For example:

async def worker():
    batch_size = 30

    while True:
        # wait for an item and add it to the batch
        batch = [await queue.get()]
        # batch up more items if available
        while not queue.empty() and len(batch) < batch_size:
            batch.append(await queue.get())
        # process the batch
        future_map = {item["fname"]: item["future"] for item in batch}
        results = await process_files(batch)
        for dic in results:
            for key, value in dic.items():
                print(str(key)+":"+str(value))
                future_map[key].set_result(value)
        for _ in batch:
            queue.task_done()

In this variant we await something on every iteration of the loop, and no sleeping is needed.
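
As a side note, the wiring in the question has a second problem worth fixing while you're at it: app.on_startup.append(start_worker()) calls start_worker immediately and appends its return value (None), whereas aiohttp expects the callable itself, and on_startup handlers are coroutines that receive the application as an argument. A minimal sketch of the corrected wiring, reusing the names from the question:

async def start_worker(app):
    # on_startup handlers run inside the already-running event loop,
    # so asyncio.create_task() is safe here; keep a reference to the
    # task so it can be cancelled on shutdown if desired.
    app["worker"] = asyncio.create_task(worker())

def create_app():
    app = web.Application()
    # ... register routes as in the question ...
    app.on_startup.append(start_worker)  # note: no parentheses
    return app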



Source: https://stackoverflow.com/questions/63322094/asyncio-aiohttp-create-task-blocks-event-loop-gather-results-in-this-event
