Making multiple API calls in parallel using Python (IPython)

醉梦人生 2021-01-31 12:22

I am working with Python (IPython & Canopy) and a RESTful content API, on my local machine (Mac).

I have an array of 3000 unique IDs to pull data for from the API an

1 answer
  •  北荒 (original poster)
    2021-01-31 13:02

    Without more information about what you are doing in particular, it is hard to say for sure, but a simple threaded approach may make sense.

    Assuming you have a simple function that processes a single ID:

    import requests
    
    url_t = "http://localhost:8000/records/%i"
    
    def process_id(id):
        """process a single ID"""
        # fetch the data
        r = requests.get(url_t % id)
        # parse the JSON reply
        data = r.json()
        # and update some data with PUT, sending the dict back as a JSON body
        # (data=data would form-encode it instead)
        requests.put(url_t % id, json=data)
        return data
    

    You can expand that into a simple function that processes a range of IDs:

    def process_range(id_range, store=None):
        """process a number of ids, storing the results in a dict"""
        if store is None:
            store = {}
        for id in id_range:
            store[id] = process_id(id)
        return store
    

    Finally, you can map sub-ranges onto threads so that several requests run concurrently:

    from threading import Thread
    
    def threaded_process_range(nthreads, id_range):
        """process the id range in a specified number of threads"""
        store = {}
        threads = []
        # create the threads
        for i in range(nthreads):
            ids = id_range[i::nthreads]
            t = Thread(target=process_range, args=(ids,store))
            threads.append(t)
    
        # start the threads
        for t in threads:
            t.start()
        # wait for the threads to finish
        for t in threads:
            t.join()
        return store
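
To see how the `id_range[i::nthreads]` slicing above divides the work, here is a minimal sketch that applies the same strided slice to a small list. The helper name `split_round_robin` is hypothetical and used only for illustration; no requests are made.

```python
def split_round_robin(ids, nthreads):
    """return one strided slice of ids per thread"""
    # thread i gets ids[i], ids[i + nthreads], ids[i + 2*nthreads], ...
    return [ids[i::nthreads] for i in range(nthreads)]

chunks = split_round_robin(list(range(10)), 3)
print(chunks)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Every ID lands in exactly one slice, and the slice sizes differ by at most one, so each thread gets roughly the same number of IDs regardless of how long each one takes.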
    

    A full example in an IPython Notebook: http://nbviewer.ipython.org/5732094

    If your individual tasks take widely varying amounts of time, you may want to use a ThreadPool, which assigns jobs to threads one at a time (often slower if individual tasks are very small, but it gives better load balance in heterogeneous cases).
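
A rough sketch of the pool-based variant, assuming the ThreadPool meant here is `multiprocessing.pool.ThreadPool` from the standard library. The function name `pooled_process_ids` and its `worker` parameter are hypothetical; in the setup above you would pass `process_id` as the worker. A stand-in lambda is used so the sketch runs without a server.

```python
from multiprocessing.pool import ThreadPool

def pooled_process_ids(nthreads, ids, worker):
    """run worker(id) for every id on a pool of nthreads threads"""
    ids = list(ids)
    with ThreadPool(nthreads) as pool:
        # chunksize=1 hands ids to idle threads one at a time
        results = pool.map(worker, ids, chunksize=1)
    # map returns results in input order, so they line up with ids
    return dict(zip(ids, results))

# stand-in worker (hypothetical) in place of process_id
squares = pooled_process_ids(4, range(8), lambda i: i * i)
```

Because each idle thread picks up the next pending ID, one slow request delays only the thread handling it rather than a whole pre-assigned sub-range.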
