I have a python script that I've written using the multiprocessing module, for faster execution. The calculation is embarrassingly parallel, so the efficiency scales with the number of cores used.
OpenMPI's mpirun, v1.7 and later, defaults to binding processes to cores - that is, when it launches the python junk.py process, it binds it to the core it will run on. That's fine, and the right default behaviour for most MPI use cases. But here each MPI task then forks more processes (through the multiprocessing package), and those forked processes inherit the binding state of their parent - so they're all bound to the same core, fighting amongst themselves. (The "P" column in top will show you they're all on the same processor.)
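If you want to see that inherited binding directly from Python, a small diagnostic along these lines (just a sketch, and Linux-only, since os.sched_getaffinity is Linux-specific) can be launched under mpirun; every pool worker should report the same single-core CPU set it inherited from its bound parent:

    import os
    import multiprocessing

    def report(i):
        # sched_getaffinity(0) returns the set of CPUs this process may run on;
        # a forked worker inherits it unchanged from the MPI-launched parent.
        return i, os.getpid(), sorted(os.sched_getaffinity(0))

    if __name__ == "__main__":
        with multiprocessing.Pool(3) as pool:
            for i, pid, cpus in pool.map(report, range(3)):
                print("worker %d (pid %d) allowed CPUs: %s" % (i, pid, cpus))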
If you run mpirun -np 2, you'll find two sets of three processes, each set pinned to a different core, with the processes in each set contending amongst themselves.
With OpenMPI, you can avoid this by turning off binding,
mpirun -np 1 --bind-to none junk.py
or by choosing some other binding that makes sense given the final geometry of your run. MPICH has similar options through its Hydra process manager.
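If adjusting the mpirun binding isn't convenient, another workaround (not part of the answer above, just a hedged sketch for Linux) is to widen the affinity mask from inside the script itself, before the pool forks its workers, using os.sched_setaffinity:

    import os
    import multiprocessing

    def work(x):
        return x * x   # stands in for the real embarrassingly-parallel task

    if __name__ == "__main__":
        # Widen the one-core mask inherited from mpirun so that this process,
        # and every worker it forks, may run on any CPU on the node.
        os.sched_setaffinity(0, range(os.cpu_count()))
        with multiprocessing.Pool(3) as pool:
            print(pool.map(work, range(8)))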
Note that fork()ing subprocesses from an MPI process isn't always safe or supported, particularly on clusters with InfiniBand interconnects, but OpenMPI's mpirun/mpiexec will warn you if it isn't safe.