Question
I have a Python script that uses MPI for parallel calculations. The scheme of the calculations is as follows: data processing round 1 - data exchange between processes - data processing round 2. I have a machine with 16 logical cores (2 x Intel Xeon E5520 2.27GHz). For certain reasons, round 1 cannot be run in parallel, so 15 cores stay idle. Despite this, the calculations experience a more than 2-fold slowdown.
The problem is illustrated by this script (saved as test.py):
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

comm.barrier()
stime = time.time()
if rank == 0:
    print('begin calculations at {:.3f}'.format(time.time() - stime))
    for i in range(1000000000):
        a = 2 * 2
    print('end calculations at {:.3f}'.format(time.time() - stime))
    comm.bcast(a, root=0)
    print('end data exchange at {:.3f}'.format(time.time() - stime))
else:
    a = comm.bcast(None, root=0)
When I run it on 2 cores, I observe:
$ mpiexec -n 2 python3 test.py
begin calculations at 0.000
end calculations at 86.954
end data exchange at 86.954
When I run it on 16 cores, I observe:
$ mpiexec -n 16 python3 test.py
begin calculations at 0.000
end calculations at 174.156
end data exchange at 174.157
Can anyone explain this difference? An idea of how to get rid of it would also be useful.
Answer 1:
OK, I have finally figured it out.
There are several factors contributing to the slowdown:
- Waiting for an incoming message is active: a waiting process constantly polls to check whether the data has arrived, so the "idle" processes are in fact spinning at full CPU load.
- Intel virtual (hyper-threaded) cores do not contribute to calculation speed. An 8-core machine is still an 8-core machine and behaves as such, irrespective of the virtual cores (in some cases, for example when the multithreading module is applied, they can give a modest boost, but not with MPI).
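Since only physical cores contribute here, it can help to launch mpiexec with one rank per physical core rather than per logical core. A minimal, Linux-only sketch for counting physical cores (the /proc/cpuinfo parsing and the fallback are illustrative assumptions, not part of the original answer; psutil.cpu_count(logical=False) is a portable alternative):

```python
import os

def physical_cores():
    # Count distinct (physical id, core id) pairs in /proc/cpuinfo.
    # Linux-only sketch; falls back to the logical count when the
    # fields are absent (e.g. in some virtual machines).
    cores = set()
    phys = None
    with open('/proc/cpuinfo') as f:
        for line in f:
            if line.startswith('physical id'):
                phys = line.split(':')[1].strip()
            elif line.startswith('core id'):
                cores.add((phys, line.split(':')[1].strip()))
    return len(cores) or os.cpu_count()

print(physical_cores())  # on the 2 x E5520 machine above: 8, not 16
```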
Taking this into account, I modified the code, introducing the sleep() function into the waiting processes. Results are represented on the chart (10 measurements were done in each case).
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

comm.barrier()
stime = time.time()
if rank == 0:
    for i in range(1000000000):
        a = 2 * 2
    print('end calculations at {:.3f}'.format(time.time() - stime))
    for i in range(1, size):
        comm.send(a, dest=i)
    print('end data exchange at {:.3f}'.format(time.time() - stime))
else:
    # poll once per second instead of busy-waiting inside recv()
    while not comm.Iprobe(source=0):
        time.sleep(1)
    a = comm.recv(source=0)
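The sleep-polling idea is not specific to MPI. Below is a minimal, self-contained sketch of the same pattern, with a background thread standing in for the sending rank and an Event standing in for the incoming message (all names here are illustrative, not mpi4py API):

```python
import threading
import time

# Instead of spinning in a tight probe loop (busy-waiting, which keeps a
# core at 100%), the waiting side sleeps between checks, yielding the core.
mailbox = {}
ready = threading.Event()

def sender():
    time.sleep(0.2)      # stand-in for the serial "round 1" work on rank 0
    mailbox['a'] = 4     # the result to hand over
    ready.set()          # the message has "arrived"

t = threading.Thread(target=sender)
t.start()

while not ready.is_set():    # analogous to: while not comm.Iprobe(source=0)
    time.sleep(0.05)         # sleep instead of spinning

a = mailbox['a']             # analogous to: a = comm.recv(source=0)
t.join()
print(a)  # -> 4
```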
Source: https://stackoverflow.com/questions/29170492/mpi4py-substantial-slowdown-by-idle-cores