Multithread python requests [duplicate]


星月不相逢


3 Answers

深忆病人

2021-02-11 03:29

I think it's a good idea to use multiple threads or processes, for example with the threading or multiprocessing modules. Alternatively, you can use grequests (asynchronous requests), which is built on gevent.
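
As one way to picture the thread-based option, here is a minimal sketch using the standard library's concurrent.futures.ThreadPoolExecutor. The fetch function and the site names are hypothetical placeholders (a short sleep stands in for a real call such as requests.get), so treat this as an illustration under those assumptions rather than a drop-in solution.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Hypothetical stand-in for an HTTP call such as requests.get(url)."""
    time.sleep(0.1)  # simulate network latency
    return 'response from {}'.format(url)


def fetch_all(urls, max_workers=4):
    # executor.map runs fetch in worker threads and yields the results
    # in the same order as the input urls.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, urls))


if __name__ == '__main__':
    print(fetch_all(['site1', 'site2', 'site3']))
```

The max_workers argument caps how many requests are in flight at once, which plays the same role as a hand-written thread limit.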


北恋

2021-02-11 03:31

You can use asyncio to run tasks concurrently. asyncio.wait() returns two sets of tasks, completed and pending, and you can read each URL's response from the completed ones. The results come back in no particular order, but this is faster than fetching the URLs one at a time.

import asyncio


async def parse(url):
    print('in parse for url {}'.format(url))

    # Placeholder for your fetching logic: await the real request here.
    # asyncio.sleep stands in for the network wait and returns a mock value.
    info = await asyncio.sleep(1, result='mock info')

    print('done with url {}'.format(url))
    return 'parse {} result from {}'.format(info, url)


async def main(sites):
    print('starting main')
    # asyncio.wait() needs tasks, not bare coroutines, on Python 3.11+
    parses = [
        asyncio.create_task(parse(url))
        for url in sites
    ]
    print('waiting for parses to complete')
    completed, pending = await asyncio.wait(parses)

    results = [t.result() for t in completed]
    print('results: {!r}'.format(results))


websites = ['site1', 'site2', 'site3']
# asyncio.run creates the event loop, runs main, and closes the loop
asyncio.run(main(websites))
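
If the order of the results matters, asyncio.gather is an alternative worth knowing: it returns results in the order the awaitables were passed in, regardless of which finishes first. The sketch below is only an illustration; parse here is a hypothetical mock that sleeps instead of making a request, and the delays are chosen so that later sites finish earlier.

```python
import asyncio


async def parse(url, delay):
    # Hypothetical stand-in for a real asynchronous fetch.
    await asyncio.sleep(delay)
    return 'result from {}'.format(url)


async def main(sites):
    # Later sites get shorter delays, so they complete first,
    # yet gather still returns the results in input order.
    coros = [
        parse(url, 0.01 * (len(sites) - index))
        for index, url in enumerate(sites)
    ]
    return await asyncio.gather(*coros)


if __name__ == '__main__':
    print(asyncio.run(main(['site1', 'site2', 'site3'])))
```

Unlike asyncio.wait, gather accepts bare coroutines and returns a plain list, so there is no need to call result() on each task.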


南方客

2021-02-11 03:33

Well, you can use threads, since this is an I/O-bound problem. The built-in threading library is a good choice. I used a Semaphore object to limit how many threads can run at the same time.

import time
import threading

# Maximum number of threads allowed to run at the same time
lock = threading.Semaphore(2)


def parse(url):
    """
    Change to your logic, I just use sleep to mock an HTTP request.
    """
    try:
        print('getting info', url)
        time.sleep(2)
    finally:
        # When we are done, release the slot so another thread can start
        lock.release()


def parse_pool():
    # List of all your urls
    list_of_urls = ['website1', 'website2', 'website3', 'website4']

    # List of thread objects so we can join them later
    thread_pool = []

    for url in list_of_urls:
        # Take a slot before starting; this blocks while two threads are running
        lock.acquire()

        # Create a new thread that calls your function with a url
        thread = threading.Thread(target=parse, args=(url,))
        thread_pool.append(thread)
        thread.start()

    for thread in thread_pool:
        thread.join()

    print('done')


if __name__ == '__main__':
    parse_pool()
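
To check that a semaphore really caps concurrency, a variation is to acquire it inside the worker with a with statement and record how many workers are active at once. This is a sketch under stated assumptions, not the answer's exact approach: all threads are started immediately and only the guarded section is limited, and the names MAX_PARALLEL, active, peak and run are hypothetical.

```python
import threading
import time

MAX_PARALLEL = 2
semaphore = threading.Semaphore(MAX_PARALLEL)
counter_lock = threading.Lock()
active = 0  # workers currently inside the guarded section
peak = 0    # highest value of active seen so far


def parse(url):
    global active, peak
    with semaphore:  # blocks while MAX_PARALLEL workers are inside
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # mock HTTP request
        with counter_lock:
            active -= 1


def run(urls):
    threads = [threading.Thread(target=parse, args=(url,)) for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak


if __name__ == '__main__':
    print('peak concurrency:', run(['website1', 'website2', 'website3', 'website4']))
```

Using the semaphore as a context manager guarantees the slot is released even if the request raises an exception.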

