Asynchronous Requests with Python requests

予麋鹿 2020-11-22 08:47

I tried the sample provided in the documentation of the requests library for Python.

With async.map(rs), I get the response codes, but I want to get

12 Answers
  • 2020-11-22 08:55

    I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.

    list_of_requests = ['http://moop.com', 'http://doop.com', ...]
    
    from simple_requests import Requests
    for response in Requests().swarm(list_of_requests):
        print(response.content)
    

    The docs are here: http://pythonhosted.org/simple-requests/

  • 2020-11-22 08:58

    I tested both requests-futures and grequests. Grequests is faster but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own and simply wrapped requests into ThreadPoolExecutor and it was almost as fast as grequests, but without external dependencies.

    import requests
    import concurrent.futures
    
    def get_urls():
        return ["url1","url2"]
    
    def load_url(url, timeout):
        return requests.get(url, timeout = timeout)
    
    # counters for failed and successful requests
    resp_err = 0
    resp_ok = 0

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        # map each future back to the URL it was submitted for
        future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                resp_err += 1
            else:
                resp_ok += 1
    
  • 2020-11-22 08:58

    Unfortunately, as far as I know, the requests library is not equipped for performing asynchronous requests. You can wrap async/await syntax around requests, but that will make the underlying requests no less synchronous. If you want truly asynchronous requests, you must use other tooling that provides them. One such solution is aiohttp (Python 3.5.3+). It works well in my experience using it with the Python 3.7 async/await syntax. Below I write three implementations of performing n web requests:

    1. Purely synchronous requests (sync_requests_get_all) using the Python requests library
    2. Synchronous requests (async_requests_get_all) using the Python requests library wrapped in Python 3.7 async/await syntax and asyncio
    3. A truly asynchronous implementation (async_aiohttp_get_all) with the Python aiohttp library wrapped in Python 3.7 async/await syntax and asyncio

    import time
    import asyncio
    import requests
    import aiohttp
    
    from types import SimpleNamespace
    
    durations = []
    
    
    def timed(func):
        """
        records approximate durations of function calls
        """
        def wrapper(*args, **kwargs):
            start = time.time()
            print(f'{func.__name__:<30} started')
            result = func(*args, **kwargs)
            duration = f'{func.__name__:<30} finsished in {time.time() - start:.2f} seconds'
            print(duration)
            durations.append(duration)
            return result
        return wrapper
    
    
    async def fetch(url, session):
        """
        asynchronous get request
        """
        async with session.get(url) as response:
            response_json = await response.json()
            return SimpleNamespace(**response_json)
    
    
    async def fetch_many(loop, urls):
        """
        many asynchronous get requests, gathered
        """
        async with aiohttp.ClientSession() as session:
            tasks = [loop.create_task(fetch(url, session)) for url in urls]
            return await asyncio.gather(*tasks)
    
    
    @timed
    def asnyc_aiohttp_get_all(urls):
        """
        performs asynchronous get requests
        """
        loop = asyncio.get_event_loop()
        return loop.run_until_complete(fetch_many(loop, urls))
    
    
    @timed
    def sync_requests_get_all(urls):
        """
        performs synchronous get requests
        """
        # use session to reduce network overhead
        session = requests.Session()
        return [SimpleNamespace(**session.get(url).json()) for url in urls]
    
    
    @timed
    def async_requests_get_all(urls):
        """
        asynchronous wrapper around synchronous requests
        """
        loop = asyncio.get_event_loop()
        # use session to reduce network overhead
        session = requests.Session()
    
        async def async_get(url):
            return session.get(url)
    
        async_tasks = [loop.create_task(async_get(url)) for url in urls]
        return loop.run_until_complete(asyncio.gather(*async_tasks))
    
    
    if __name__ == '__main__':
        # this endpoint takes ~3 seconds to respond,
        # so a purely synchronous implementation should take
        # little more than 30 seconds and a purely asynchronous
        # implementation should take little more than 3 seconds.
        urls = ['https://postman-echo.com/delay/3']*10
    
        sync_requests_get_all(urls)
        async_requests_get_all(urls)
        asnyc_aiohttp_get_all(urls)
        print('----------------------')
        for duration in durations:
            print(duration)
    

    On my machine, this is the output:

    sync_requests_get_all          started
    sync_requests_get_all          finsished in 30.92 seconds
    async_requests_get_all         started
    async_requests_get_all         finsished in 30.87 seconds
    asnyc_aiohttp_get_all          started
    asnyc_aiohttp_get_all          finsished in 3.22 seconds
    ----------------------
    sync_requests_get_all          finsished in 30.92 seconds
    async_requests_get_all         finsished in 30.87 seconds
    asnyc_aiohttp_get_all          finsished in 3.22 seconds
    
  • 2020-11-22 08:59

    requests-futures may be another option.

    from requests_futures.sessions import FuturesSession
    
    session = FuturesSession()
    # first request is started in background
    future_one = session.get('http://httpbin.org/get')
    # second request is started immediately
    future_two = session.get('http://httpbin.org/get?foo=bar')
    # wait for the first request to complete, if it hasn't already
    response_one = future_one.result()
    print('response one status: {0}'.format(response_one.status_code))
    print(response_one.content)
    # wait for the second request to complete, if it hasn't already
    response_two = future_two.result()
    print('response two status: {0}'.format(response_two.status_code))
    print(response_two.content)
    

    It is also recommended in the official documentation. If you don't want to involve gevent, it's a good choice.

  • 2020-11-22 08:59

    I have a lot of issues with most of the answers posted: they either use deprecated libraries that have been ported over with limited features, or provide a solution with too much magic in the execution of the request, making error handling difficult. If they do not fall into one of the above categories, they're third-party libraries or deprecated.

    Some of the solutions work fine for plain HTTP requests but fall short for any other kind of request, which is ludicrous. A highly customized solution is not necessary here.

    Simply using the Python built-in library asyncio is sufficient to perform asynchronous requests of any type, and it leaves enough flexibility for complex, use-case-specific error handling.

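    import asyncio
    import requests

    # perform_grpc_call, do_chores and URL are placeholders to be defined by the caller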
    
    loop = asyncio.get_event_loop()
    
    def do_thing(params):
        async def get_rpc_info_and_do_chores(id):
            # do things
            response = perform_grpc_call(id)
            do_chores(response)
    
        async def get_httpapi_info_and_do_chores(id):
            # do things
            response = requests.get(URL)
            do_chores(response)
    
        async_tasks = []
        for element in list(params.list_of_things):
            # pass each element to both coroutines (assumes the element is the id they expect)
            async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(element)))
            async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(element)))
    
        loop.run_until_complete(asyncio.gather(*async_tasks))
    

    How it works is simple: you create a series of tasks you'd like to run asynchronously, then ask a loop to execute those tasks and exit upon completion. No extra libraries that may go unmaintained, and no loss of functionality.
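
    A caveat, offered as a hedged aside: a coroutine that calls a blocking function such as requests.get does not yield to the event loop while the call is in flight, so tasks built this way still run one after another. The following is a minimal sketch of one way around that, using the standard library's loop.run_in_executor to push each blocking call onto a worker thread. The names blocking_call, run_all and timed_run are hypothetical, and time.sleep stands in for the real network call.

```python
import asyncio
import time


def blocking_call(item):
    # Hypothetical stand-in for a blocking call such as requests.get(url).
    time.sleep(0.2)
    return f"done:{item}"


async def run_all(items):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool executor,
    # so each blocking call runs on its own worker thread.
    futures = [loop.run_in_executor(None, blocking_call, item) for item in items]
    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*futures)


def timed_run(items):
    # Returns the results together with the elapsed wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(run_all(items))
    return results, time.perf_counter() - start
```

    In real use, blocking_call would wrap requests.get or a gRPC client call. With four items the elapsed time should be close to that of a single call rather than four times as long, since the sleeps overlap across threads.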

  • 2020-11-22 09:02

    Note

    The answer below is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.

    I've left this answer as is to reflect the original question which was about using requests < v0.13.0.


    To do multiple tasks with async.map asynchronously you have to:

    1. Define a function for what you want to do with each object (your task)
    2. Add that function as an event hook in your request
    3. Call async.map on a list of all the requests / actions

    Example:

    from requests import async
    # If using requests >= v0.13.0, use grequests instead:
    # import grequests, then call grequests.get(...) and grequests.map(...)
    
    urls = [
        'http://python-requests.org',
        'http://httpbin.org',
        'http://python-guide.org',
        'http://kennethreitz.com'
    ]
    
    # A simple task to do to each response object
    def do_something(response):
        print response.url
    
    # A list to hold our things to do via async
    async_list = []
    
    for u in urls:
        # The "hooks = {..." part is where you define what you want to do
        # 
        # Note the lack of parentheses following do_something, this is
        # because the response will be used as the first argument automatically
        action_item = async.get(u, hooks = {'response' : do_something})
    
        # Add the task to our list of things to do via async
        async_list.append(action_item)
    
    # Do our list of things to do via async
    async.map(async_list)
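
    On current Python versions, where async is a reserved word and requests no longer ships this module, the same three steps can be approximated with the standard library. This is a sketch of a swapped-in technique (a thread pool), not the requests or grequests API. The names fetch and map_with_hook and the fake response dictionary are hypothetical stand-ins, with fetch taking the place of requests.get.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    # Hypothetical stand-in for requests.get(url); returns a fake response.
    return {"url": url, "status": 200}


def map_with_hook(urls, hook, max_workers=4):
    """Run fetch over urls in a thread pool, calling hook on each response."""
    def task(url):
        response = fetch(url)
        # Step 2: the hook receives the response as its only argument.
        hook(response)
        return response

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Step 3: executor.map yields results in input order.
        return list(executor.map(task, urls))


# Step 1: the task to run on each response (here, just collect it).
seen = []
responses = map_with_hook(
    ["http://httpbin.org", "http://python-guide.org"], seen.append
)
```

    The hook runs on the worker thread that fetched the response, so it should be quick and thread-safe, as list.append is in CPython.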
    