So I knocked up some test code to see how the multiprocessing module would scale on cpu bound work compared to threading. On linux I get the performance increase that I\'d e
Currently, your counter() function is not modifying much state. Try changing counter() so that it modifies many pages of memory. Then run a cpu bound loop. See if there is still a large disparity between linux and windows.
I'm not running python 2.6 right now, so I can't try it myself.
The python documentation for multiprocessing blames the lack of os.fork() for the problems in Windows. It may be applicable here.
See what happens when you import psyco. First, easy_install it:
C:\Users\hughdbrown>\Python26\scripts\easy_install.exe psyco
Searching for psyco
Best match: psyco 1.6
Adding psyco 1.6 to easy-install.pth file
Using c:\python26\lib\site-packages
Processing dependencies for psyco
Finished processing dependencies for psyco
Add this to the top of your python script:
import psyco
psyco.full()
I get these results without:
serialrun took 1191.000 ms
parallelrun took 3738.000 ms
threadedrun took 2728.000 ms
I get these results with:
serialrun took 43.000 ms
parallelrun took 3650.000 ms
threadedrun took 265.000 ms
Parallel is still slow, but the others burn rubber.
Edit: also, try it with the multiprocessing pool. (This is my first time trying this and it is so fast, I figure I must be missing something.)
@print_timing
def parallelpoolrun(reps):
pool = multiprocessing.Pool(processes=4)
result = pool.apply_async(counter, (reps,))
Results:
C:\Users\hughdbrown\Documents\python\StackOverflow>python 1289813.py
serialrun took 57.000 ms
parallelrun took 3716.000 ms
parallelpoolrun took 128.000 ms
threadedrun took 58.000 ms
Processes are much more lightweight under UNIX variants. Windows processes are heavy and take much more time to start up. Threads are the recommended way of doing multiprocessing on windows.
Just starting the pool takes a long time. I have found in 'real world' programs if I can keep a pool open and reuse it for many different processes,passing the reference down through method calls (usually using map.async) then on Linux I can save a few percent but on Windows I can often halve the time taken. Linux is always quicker for my particular problems but even on Windows I get net benefits from multiprocessing.
It's been said that creating processes on Windows is more expensive than on linux. If you search around the site you will find some information. Here's one I found easily.