So I just finished watching this talk on the Python Global Interpreter Lock (GIL) http://blip.tv/file/2232410.
The gist of it is that the GIL is a pretty good design
I've found the following rule of thumb sufficient over the years: If the workers are dependent on some shared state, I use one multiprocessing process per core (CPU bound), and per core a fix pool of worker threads (I/O bound). The OS will take care of assigining the different Python processes to the cores.