问题
I am writing a GUI app in Pyglet that has to display tens to hundreds of thumbnails from the Internet. Right now, I am using urllib.urlretrieve to grab them, but this blocks each time until they are finished, and only grabs one at a time.
I would prefer to download them in parallel and have each one display as soon as it's finished, without blocking the GUI at any point. What is the best way to do this?
I don't know much about threads, but it looks like the threading module might help? Or perhaps there is some easy way I've overlooked.
回答1:
You'll probably benefit from threading
or multiprocessing modules. You don't actually need to create all those Thread
-based classes by yourself, there is a simpler method using Pool.map
:
from multiprocessing import Pool
def fetch_url(url):
# Fetch the URL contents and save it anywhere you need and
# return something meaningful (like filename or error code),
# if you wish.
...
pool = Pool(processes=4)
result = pool.map(f, image_url_list)
回答2:
As you suspected, this is a perfect situation for threading. Here is a short guide I found immensely helpful when doing my own first bit of threading in python.
回答3:
As you rightly indicated, you could create a number of threads, each of which is responsible for performing urlretrieve operations. This allows the main thread to continue uninterrupted.
Here is a tutorial on threading in python: http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf
回答4:
Here's an example of how to use threading.Thread. Just replace the class name with your own and the run function with your own. Note that threading is great for IO restricted applications like your's and can really speed it up. Using pythong threading strictly for computation in standard python doesn't help because only one thread can compute at a time.
import threading, time
class Ping(threading.Thread):
def __init__(self, multiple):
threading.Thread.__init__(self)
self.multiple = multiple
def run(self):
#sleeps 3 seconds then prints 'pong' x times
time.sleep(3)
printString = 'pong' * self.multiple
pingInstance = Ping(3)
pingInstance.start() #your run function will be called with the start function
print "pingInstance is alive? : %d" % pingInstance.isAlive() #will return True, or 1
print "Number of threads alive: %d" % threading.activeCount()
#main thread + class instance
time.sleep(3.5)
print "Number of threads alive: %d" % threading.activeCount()
print "pingInstance is alive?: %d" % pingInstance.isAlive()
#isAlive returns false when your thread reaches the end of it's run function.
#only main thread now
回答5:
You have these choices:
- Threads: easiest but doesn't scale well
- Twisted: medium difficulty, scales well but shares CPU due to GIL and being single threaded.
- Multiprocessing: hardest. Scales well if you know how to write your own event loop.
I recommend just using threads unless you need an industrial scale fetcher.
回答6:
You either need to use threads, or an asynchronous networking library such as Twisted. I suspect that using threads might be simpler in your particular use case.
来源:https://stackoverflow.com/questions/983144/how-to-do-a-non-blocking-url-fetch-in-python