Python/Urllib2/Threading: Single download thread faster than multiple download threads. Why?

Question

I am working on a project that requires me to create multiple threads to download a large remote file. I have done this already, but I cannot understand why it takes longer to download the file with multiple threads than with just a single thread. I used my XAMPP localhost to carry out the elapsed-time test. I would like to know if this is normal behaviour or if it is because I have not tried downloading from a real server.

Thanks, Kennedy
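For reference, the kind of downloader the question describes can be sketched in Python 3 (where `urllib.request` replaces `urllib2`) by splitting the file into HTTP byte ranges and fetching each range in its own thread. This is a minimal sketch, not the asker's code: it assumes the server supports `Range` requests and that the total size is already known (for example from a `Content-Length` header). `split_ranges`, `fetch_range`, and `threaded_download` are hypothetical names.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, n_parts):
    """Split bytes [0, total_size) into at most n_parts inclusive (start, end) ranges."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(total_size, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        if length == 0:
            continue  # more parts than bytes: skip empty ranges
        ranges.append((start, start + length - 1))  # HTTP Range end is inclusive
        start += length
    return ranges


def fetch_range(url, byte_range):
    """Download one byte range with a blocking urllib request."""
    start, end = byte_range
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:  # 206 Partial Content: the server honoured the Range header
            raise RuntimeError(f"server ignored the Range header (HTTP {resp.status})")
        return resp.read()


def threaded_download(url, total_size, n_threads):
    """Fetch the file as n_threads byte ranges in parallel and reassemble it."""
    ranges = split_ranges(total_size, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() yields results in input order, so the chunks stay in byte order;
        # an exception raised in any worker is re-raised here.
        return b"".join(pool.map(lambda r: fetch_range(url, r), ranges))
```

Timing `threaded_download(url, size, 1)` against `threaded_download(url, size, 8)` with `time.perf_counter()` compares the single-threaded and multi-threaded cases on equal terms; `url` and `size` stand in for the asker's file.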

Answer 1:

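Nine women can't combine to make a baby in one month. If the network link is the bottleneck and you have 10 threads, each one gets only about 10% of the bandwidth a single thread would have, so nothing is gained overall, and you also pay extra overhead for context switching, opening more connections, etc.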
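The arithmetic behind this can be written out, assuming the N threads share a single bottleneck of bandwidth B fairly (an idealization). The symbols S (file size), B, N, and t_overhead(N) (extra cost of more connections and context switches, growing with N) are introduced here for illustration:

```latex
t_{\text{thread}} = \frac{S/N}{B/N} = \frac{S}{B},
\qquad
t_{\text{total}}(N) \approx \frac{S}{B} + t_{\text{overhead}}(N)
```

For example, with S = 100 MB and B = 10 MB/s, one thread needs 10 s; ten threads each move 10 MB at 1 MB/s, which is also 10 s before overhead. Splitting a download typically pays off only when the limit is per connection (for example, a server that throttles each connection) rather than the shared link.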

Answer 2:

Python threading uses something called the GIL (Global Interpreter Lock), which allows only one thread to execute Python bytecode at a time and can sometimes degrade a multithreaded program's execution time.

Rather than go into detail here, I invite you to read this and this; they may help you understand your problem. You can also watch the two conference talks here and here.
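A small sketch can show where the GIL does and does not get in the way. It assumes a standard (GIL-enabled) CPython build and uses `time.sleep()` as a stand-in for a blocking network read, since both release the GIL while waiting; `run_in_threads`, `blocking_wait`, and `cpu_work` are hypothetical helpers.

```python
import threading
import time


def run_in_threads(fn, n_threads):
    """Start n_threads threads running fn, wait for all of them, return wall-clock seconds."""
    threads = [threading.Thread(target=fn) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def blocking_wait():
    # Stand-in for a blocking network read: time.sleep() releases the GIL,
    # as CPython's blocking socket calls do, so other threads keep running.
    time.sleep(0.2)


def cpu_work(n=1_000_000):
    # Pure-Python arithmetic holds the GIL while it runs,
    # so threads doing this mostly take turns instead of running in parallel.
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    print(f"4 blocking waits in 4 threads: {run_in_threads(blocking_wait, 4):.2f} s")
    print(f"CPU work, 1 thread:  {run_in_threads(cpu_work, 1):.2f} s")
    print(f"CPU work, 4 threads: {run_in_threads(cpu_work, 4):.2f} s")
```

On such a build the four waits should finish in roughly the time of one, while four threads of `cpu_work` take roughly four times as long as one. This suggests the GIL matters most when the download threads also do CPU-bound work, such as parsing.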

Hope this helps :)

Answer 3:

Twisted uses non-blocking I/O: if data is not available on a socket right now, the read does not block the entire thread, so one thread can handle many socket connections that are waiting for I/O simultaneously. But if you do something other than I/O (for example, parsing large amounts of data), you still block the thread.
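Twisted is a third-party framework, so as a stdlib-only sketch of the non-blocking behaviour described here (not Twisted's API), the snippet puts one end of a `socket.socketpair()` into non-blocking mode. Reading before any data has arrived returns at once, with Python raising `BlockingIOError`, instead of parking the thread. `try_read` and `nonblocking_demo` are hypothetical names.

```python
import select
import socket


def try_read(sock, bufsize=1024):
    """Return whatever bytes are available, or None if none have arrived yet."""
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        # Non-blocking socket with nothing buffered: recv returns control
        # immediately instead of blocking the whole thread.
        return None


def nonblocking_demo():
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    try:
        before = try_read(reader)              # nothing has been sent yet
        writer.sendall(b"chunk")
        select.select([reader], [], [], 1.0)   # wait up to 1 s until reader is readable
        after = try_read(reader)
        return before, after
    finally:
        reader.close()
        writer.close()


print(nonblocking_demo())  # (None, b'chunk')
```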

When you use the stdlib's socket module, it does blocking I/O: when you call socket.recv and no data is available at the moment, it blocks the entire thread, so you need one thread per connection to handle concurrent downloads.
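A minimal stdlib sketch of the thread-per-connection model: each connection (simulated with `socket.socketpair()` so it runs without a server) gets its own thread that loops on blocking `recv()` calls, and while one thread waits the others keep running. `read_all` and `thread_per_connection` are hypothetical names for illustration.

```python
import socket
import threading


def read_all(sock, results, key):
    """Blocking reader: each recv() parks this thread until data or EOF arrives."""
    chunks = []
    while True:
        data = sock.recv(4096)  # blocks this thread only
        if not data:            # b"" means the peer closed the connection
            break
        chunks.append(data)
    results[key] = b"".join(chunks)


def thread_per_connection(payloads):
    """Simulate one connection per payload, each read by its own thread."""
    results = {}
    pairs = [socket.socketpair() for _ in payloads]
    threads = [
        threading.Thread(target=read_all, args=(reader, results, i))
        for i, (reader, _writer) in enumerate(pairs)
    ]
    for t in threads:
        t.start()
    # Feed each "connection" from the main thread, then close it to signal EOF.
    for (_reader, writer), payload in zip(pairs, payloads):
        writer.sendall(payload)
        writer.close()
    for t in threads:
        t.join()
    for reader, _writer in pairs:
        reader.close()
    return [results[i] for i in range(len(payloads))]
```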

These are two approaches to concurrency:

Start a new thread for each new connection (threading + socket from the stdlib).
Multiplex I/O and handle many connections in one thread (Twisted).
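The second approach can be sketched without Twisted by using the stdlib `selectors` module, which wraps the OS readiness APIs (select, epoll, kqueue) that event loops such as Twisted's reactors are built on. This is a swapped-in illustration, not Twisted's API: one thread registers several non-blocking sockets and only calls `recv()` on those the selector reports as readable. Connections are again simulated with `socket.socketpair()`, the payloads are assumed small enough to fit in the socket buffer, and `multiplexed` is a hypothetical name.

```python
import selectors
import socket


def multiplexed(payloads):
    """Read from several connections in a single thread using selectors."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in payloads]
    buffers = {}
    for i, ((reader, writer), payload) in enumerate(zip(pairs, payloads)):
        reader.setblocking(False)
        sel.register(reader, selectors.EVENT_READ, data=i)
        buffers[i] = []
        # Assumes the payload fits in the socket buffer, so sendall won't block
        # this single thread before anything is read.
        writer.sendall(payload)
        writer.close()  # EOF after the payload
    open_count = len(pairs)
    while open_count:
        # Only sockets that are ready are returned, so recv() never waits here.
        for key, _mask in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                buffers[key.data].append(chunk)
            else:  # peer closed: stop watching this connection
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_count -= 1
    sel.close()
    return [b"".join(buffers[i]) for i in range(len(payloads))]
```

Compared with one thread per connection, this keeps a single thread busy only while some socket actually has data, which is why one thread can serve many slow connections.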

Source: https://stackoverflow.com/questions/4219134/python-urllib2-threading-single-download-thread-faster-than-multiple-download-t