I'm in the process of moving some synchronous code to asyncio using aiohttp. The synchronous code was taking 15 minutes to run, so I'm hoping to improve this.
Here's an example with concurrent.futures.ProcessPoolExecutor. If it's created without specifying max_workers, the implementation will use os.cpu_count instead. Also note that asyncio.wrap_future is public but undocumented. Alternatively, there's AbstractEventLoop.run_in_executor; a sketch of that variant follows the example below.
import asyncio
from concurrent.futures import ProcessPoolExecutor

import aiohttp
import lxml.html


def process_page(html):
    '''Meant for CPU-bound workload'''
    tree = lxml.html.fromstring(html)
    return tree.find('.//title').text


async def fetch_page(url, session):
    '''Meant for IO-bound workload'''
    async with session.get(url, timeout=15) as res:
        return await res.text()


async def process(url, session, pool):
    html = await fetch_page(url, session)
    return await asyncio.wrap_future(pool.submit(process_page, html))


async def dispatch(urls):
    pool = ProcessPoolExecutor()
    async with aiohttp.ClientSession() as session:
        coros = (process(url, session, pool) for url in urls)
        return await asyncio.gather(*coros)


def main():
    urls = [
        'https://stackoverflow.com/',
        'https://serverfault.com/',
        'https://askubuntu.com/',
        'https://unix.stackexchange.com/'
    ]
    result = asyncio.get_event_loop().run_until_complete(dispatch(urls))
    print(result)


if __name__ == '__main__':
    main()
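
If you'd rather use run_in_executor instead of wrapping the concurrent.futures future yourself, only process needs to change. A minimal sketch, reusing the same fetch_page, session and pool as above (asyncio.get_event_loop() returns the running loop here because it's called from inside a coroutine):

async def process(url, session, pool):
    html = await fetch_page(url, session)
    loop = asyncio.get_event_loop()
    # run_in_executor already returns an asyncio-compatible future,
    # so there's no need for asyncio.wrap_future
    return await loop.run_in_executor(pool, process_page, html)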