Question
I have a Python script that uses MPI for parallel calculations. The scheme of the calculations is as follows: data processing round 1 - data exchange between processes - data processing round 2. I have a machine with 16 logical cores (2 x Intel Xeon E5520 2.27GHz). For certain reasons, round 1 cannot be run in parallel, so 15 cores stay idle. Despite this, the calculations experience a more than 2-fold slowdown.
The problem is illustrated by this script (saved as test.py):
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

comm.barrier()
stime = time.time()
if rank == 0:
    print('begin calculations at {:.3f}'.format(time.time() - stime))
    for i in range(1000000000):
        a = 2 * 2
    print('end calculations at {:.3f}'.format(time.time() - stime))
    comm.bcast(a, root=0)
    print('end data exchange at {:.3f}'.format(time.time() - stime))
else:
    a = comm.bcast(None, root=0)
When I run it on 2 cores, I observe:
$ mpiexec -n 2 python3 test.py
begin calculations at 0.000
end calculations at 86.954
end data exchange at 86.954
When I run it on 16 cores, I observe:
$ mpiexec -n 16 python3 test.py
begin calculations at 0.000
end calculations at 174.156
end data exchange at 174.157
Can anyone explain this difference? An idea of how to get rid of it would also be useful.
Answer 1:
OK, I have finally figured it out.
There are several factors contributing to the slowdown:
- Waiting for an incoming message is active: a waiting process constantly polls to check whether the data has arrived, so the "idle" processes are in fact spinning at full CPU load.
- Intel virtual (hyper-threaded) cores do not contribute to calculation speed. An 8-core machine is still an 8-core machine and behaves as such, irrespective of the virtual cores (in some cases, for example when the multithreading module is applied, they can give a modest boost, but not with MPI).
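Since only physical cores contribute here, it can help to launch mpiexec with one rank per physical core rather than per logical core. A minimal, Linux-only sketch for counting physical cores (the /proc/cpuinfo parsing and the fallback are illustrative assumptions, not part of the original answer; psutil.cpu_count(logical=False) is a portable alternative):

```python
import os

def physical_cores():
    # Count distinct (physical id, core id) pairs in /proc/cpuinfo.
    # Linux-only sketch; falls back to the logical count when the
    # fields are absent (e.g. in some virtual machines).
    cores = set()
    phys = None
    with open('/proc/cpuinfo') as f:
        for line in f:
            if line.startswith('physical id'):
                phys = line.split(':')[1].strip()
            elif line.startswith('core id'):
                cores.add((phys, line.split(':')[1].strip()))
    return len(cores) or os.cpu_count()

print(physical_cores())  # on the 2 x E5520 machine above: 8, not 16
```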
Taking this into account, I modified the code, introducing the sleep() function into the waiting processes. Results are represented on the chart (10 measurements were done in each case).
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

comm.barrier()
stime = time.time()
if rank == 0:
    for i in range(1000000000):
        a = 2 * 2
    print('end calculations at {:.3f}'.format(time.time() - stime))
    for i in range(1, size):
        comm.send(a, dest=i)
    print('end data exchange at {:.3f}'.format(time.time() - stime))
else:
    # poll once per second instead of busy-waiting inside recv()
    while not comm.Iprobe(source=0):
        time.sleep(1)
    a = comm.recv(source=0)
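The sleep-polling idea is not specific to MPI. Below is a minimal, self-contained sketch of the same pattern, with a background thread standing in for the sending rank and an Event standing in for the incoming message (all names here are illustrative, not mpi4py API):

```python
import threading
import time

# Instead of spinning in a tight probe loop (busy-waiting, which keeps a
# core at 100%), the waiting side sleeps between checks, yielding the core.
mailbox = {}
ready = threading.Event()

def sender():
    time.sleep(0.2)      # stand-in for the serial "round 1" work on rank 0
    mailbox['a'] = 4     # the result to hand over
    ready.set()          # the message has "arrived"

t = threading.Thread(target=sender)
t.start()

while not ready.is_set():    # analogous to: while not comm.Iprobe(source=0)
    time.sleep(0.05)         # sleep instead of spinning

a = mailbox['a']             # analogous to: a = comm.recv(source=0)
t.join()
print(a)  # -> 4
```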
Source: https://stackoverflow.com/questions/29170492/mpi4py-substantial-slowdown-by-idle-cores