Python aiohttp/asyncio - how to process returned data

-上瘾入骨i 2021-02-04 05:07

I'm in the process of moving some synchronous code to asyncio using aiohttp. The synchronous code was taking 15 minutes to run, so I'm hoping to improve this.

I have so

2 Answers
  •  北恋 (original poster)
    2021-02-04 05:47

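    Here's an example using concurrent.futures.ProcessPoolExecutor. If it's created without specifying max_workers, the implementation falls back to the machine's CPU count (os.cpu_count()). asyncio.wrap_future converts the concurrent.futures.Future returned by pool.submit into a future that can be awaited on the event loop. Alternatively, there's loop.run_in_executor, which submits the call and returns an awaitable in one step.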

    import asyncio
    from concurrent.futures import ProcessPoolExecutor
    
    import aiohttp
    import lxml.html
    
    
    def process_page(html):
        '''Meant for CPU-bound workload'''
        tree = lxml.html.fromstring(html)
        return tree.find('.//title').text
    
    
    async def fetch_page(url, session):
        '''Meant for IO-bound workload'''
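        # ClientTimeout(total=15) caps the whole request at 15 seconds.
        timeout = aiohttp.ClientTimeout(total=15)
        async with session.get(url, timeout=timeout) as res:
            return await res.text()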
    
    
    async def process(url, session, pool):
        html = await fetch_page(url, session)
        return await asyncio.wrap_future(pool.submit(process_page, html))
    
    
    async def dispatch(urls):
        # The with block shuts the worker processes down once every page is processed.
        with ProcessPoolExecutor() as pool:
            async with aiohttp.ClientSession() as session:
                coros = (process(url, session, pool) for url in urls)
                return await asyncio.gather(*coros)
    
    
    def main():
        urls = [
            'https://stackoverflow.com/',
            'https://serverfault.com/',
            'https://askubuntu.com/',
            'https://unix.stackexchange.com/',
        ]
        # asyncio.run creates the event loop, runs dispatch, and closes the loop.
        result = asyncio.run(dispatch(urls))
        print(result)
    
    if __name__ == '__main__':
        main()
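
    The run_in_executor alternative mentioned above could look roughly like the sketch below. To keep it runnable without network access or lxml, it makes a few assumptions that are not part of the answer: extract_title is a hypothetical regex-based stand-in for process_page, extract_titles is a hypothetical helper name, and the inline HTML strings stand in for the text returned by fetch_page.

```python
import asyncio
import re
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import List, Optional

# Hypothetical stand-in for process_page that avoids the lxml dependency.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> Optional[str]:
    """CPU-bound step: pull the <title> text out of an HTML string."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


async def extract_titles(pages: List[str], executor: Executor) -> List[Optional[str]]:
    """Run extract_title for every page in the executor, keeping input order."""
    loop = asyncio.get_running_loop()
    # run_in_executor submits the call and returns an awaitable asyncio future,
    # so no separate asyncio.wrap_future call is needed.
    futures = [loop.run_in_executor(executor, extract_title, html) for html in pages]
    return await asyncio.gather(*futures)


if __name__ == "__main__":
    # Inline HTML stands in for pages that fetch_page would have downloaded.
    pages = [
        "<html><head><title>Stack Overflow</title></head></html>",
        "<html><head><title>Server Fault</title></head></html>",
    ]
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(extract_titles(pages, pool)))
```

    Passing the executor in as a parameter means the same coroutine also works with a ThreadPoolExecutor, which can be handy when the processing step turns out to be light enough that process start-up and pickling overhead dominate.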
    
