Python aiohttp/asyncio - how to process returned data

-上瘾入骨i 2021-02-04 05:07

I'm in the process of moving some synchronous code to asyncio using aiohttp. The synchronous code was taking 15 minutes to run, so I'm hoping to improve this.

I have so

2 Answers
  •  北恋 (OP)  2021-02-04 05:47

    Here's an example with concurrent.futures.ProcessPoolExecutor. If it's created without specifying max_workers, it defaults to the number of processors reported by os.cpu_count(). Also note that asyncio.wrap_future is public but undocumented. Alternatively, there's AbstractEventLoop.run_in_executor; a sketch using it follows the code below.

    import asyncio
    from concurrent.futures import ProcessPoolExecutor
    
    import aiohttp
    import lxml.html
    
    
    def process_page(html):
        '''Meant for CPU-bound workload'''
        tree = lxml.html.fromstring(html)
        return tree.find('.//title').text
    
    
    async def fetch_page(url, session):
        '''Meant for IO-bound workload'''
        async with session.get(url, timeout=15) as res:
            return await res.text()
    
    
    async def process(url, session, pool):
        html = await fetch_page(url, session)
        return await asyncio.wrap_future(pool.submit(process_page, html))
    
    
    async def dispatch(urls):
        pool = ProcessPoolExecutor()
        async with aiohttp.ClientSession() as session:
            coros = (process(url, session, pool) for url in urls)
            return await asyncio.gather(*coros)
    
    
    def main():
        urls = [
            'https://stackoverflow.com/',
            'https://serverfault.com/',
            'https://askubuntu.com/',
            'https://unix.stackexchange.com/'
        ]
        result = asyncio.run(dispatch(urls))
        print(result)
    
    if __name__ == '__main__':
        main()
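
    As a rough sketch of the run_in_executor alternative mentioned above, process() can be rewritten as a drop-in replacement that reuses the same fetch_page, process_page and pool from the example (asyncio.get_running_loop requires Python 3.7+):

    async def process(url, session, pool):
        '''Variant of process() above using run_in_executor instead of wrap_future.'''
        html = await fetch_page(url, session)
        loop = asyncio.get_running_loop()
        # run_in_executor already returns an awaitable asyncio future,
        # so asyncio.wrap_future is not needed here
        return await loop.run_in_executor(pool, process_page, html)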
    
