mpi4py: substantial slowdown by idle cores

Submitted by 半城伤御伤魂 on 2020-07-10 03:50:51

Question


I have a Python script that uses MPI for parallel calculations. The calculations proceed as follows: data processing round 1, then data exchange between processes, then data processing round 2. I have a machine with 16 logical cores (2 x Intel Xeon E5520, 2.27 GHz). For certain reasons, round 1 cannot be run in parallel, so 15 cores stay idle. Despite this, the calculations slow down more than 2-fold.

The problem is illustrated by this script (saved as test.py):

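from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
comm.barrier()  # start all ranks together before timing
stime = time.time()

if rank == 0:
    # Round 1: serial work done by rank 0 only.
    print('begin calculations at {:.3f}'.format(time.time() - stime))
    for i in range(1000000000):
        a = 2 * 2
    print('end calculations at {:.3f}'.format(time.time() - stime))
    comm.bcast(a, root=0)
    print('end data exchange at {:.3f}'.format(time.time() - stime))
else:
    # All other ranks wait here until rank 0 broadcasts.
    # The object argument is passed explicitly because newer
    # mpi4py versions require it; its value is ignored on non-root ranks.
    a = comm.bcast(None, root=0)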

When I run it on 2 cores, I observe:

$ mpiexec -n 2 python3 test.py
begin calculations at 0.000
end calculations at 86.954
end data exchange at 86.954

When I run it on 16 cores, I observe:

$ mpiexec -n 16 python3 test.py
begin calculations at 0.000
end calculations at 174.156
end data exchange at 174.157

Can anyone explain this difference? An idea of how to get rid of it would also be useful.


Answer 1:


OK, I have finally figured it out.

Two factors contribute to the slowdown:

  • Waiting to receive data is active: a waiting process constantly checks whether the data has arrived, so the waiting processes are not actually idle.
  • Intel virtual (Hyper-Threading) cores do not add to calculation speed here. An 8-core machine is still an 8-core machine and behaves like one, regardless of the virtual cores (in some cases, for example when the multithreading module is used, they can give a modest boost, but not with MPI). With 16 processes on 8 physical cores, rank 0 presumably shares its physical core with a busy-waiting rank, which would account for the roughly 2-fold slowdown.
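
The first point can be checked without MPI. The following is a minimal sketch, not the MPI library's actual receive loop: a deadline stands in for "the message has arrived", and the helper names `wait_until` and `compare_waits` are made up for illustration. It compares the CPU time consumed by a wait that spins with that of a wait that sleeps between checks.

```python
import time


def wait_until(deadline, poll_interval=None):
    """Poll until `deadline` is reached and return the CPU seconds used.

    `deadline` is a time.monotonic() value standing in for "data arrived".
    With poll_interval=None the loop spins (active waiting); otherwise it
    sleeps for poll_interval seconds between checks.
    """
    cpu_start = time.process_time()
    while time.monotonic() < deadline:
        if poll_interval is not None:
            time.sleep(poll_interval)
    return time.process_time() - cpu_start


def compare_waits(wait_seconds=0.5, poll_interval=0.01):
    """Return (busy_cpu, sleepy_cpu) for two waits of equal wall-clock length."""
    busy_cpu = wait_until(time.monotonic() + wait_seconds)
    sleepy_cpu = wait_until(time.monotonic() + wait_seconds, poll_interval)
    return busy_cpu, sleepy_cpu


if __name__ == "__main__":
    busy_cpu, sleepy_cpu = compare_waits()
    print("CPU time while spinning: {:.3f} s".format(busy_cpu))
    print("CPU time while sleeping: {:.3f} s".format(sleepy_cpu))
```

On an otherwise quiet machine the spinning wait should use CPU time close to the full waiting period, while the sleeping wait uses almost none. This is the kind of load that 15 actively waiting ranks would place on the cores.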

Taking this into account, I modified the code by adding a sleep() call to the waiting processes. The results are shown in the chart (10 measurements were taken in each case).

from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
comm.barrier()  # start all ranks together before timing
stime = time.time()

if rank == 0:
    for i in range(1000000000):
        a = 2 * 2
    print('end calculations at {:.3f}'.format(time.time() - stime))
    # Send the result to every other rank individually.
    for dest in range(1, size):
        comm.send(a, dest=dest)
    print('end data exchange at {:.3f}'.format(time.time() - stime))
else:
    # Check for a message once per second and sleep in between,
    # so that the waiting ranks do not keep their cores busy.
    while not comm.Iprobe(source=0):
        time.sleep(1)
    a = comm.recv(source=0)

[Chart: results of the measurements, 10 per case]
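
A fixed `time.sleep(1)` means a waiting rank may notice the message up to a second late. One possible refinement, offered as a sketch and not as part of the measurements above, is to start with a short delay and double it after every unsuccessful check. The helper names `wait_with_backoff` and `recv_when_ready` and the delay values are assumptions made for illustration; `Iprobe` and `recv` are the same mpi4py calls used in the script above, and `comm` is expected to be a communicator such as `MPI.COMM_WORLD`.

```python
import time


def wait_with_backoff(probe, first_delay=0.001, max_delay=0.1, sleep=time.sleep):
    """Call probe() until it returns True, sleeping between calls.

    The delay doubles after each unsuccessful check and is capped at
    max_delay. Returns the number of unsuccessful checks.
    """
    delay = first_delay
    misses = 0
    while not probe():
        sleep(delay)
        misses += 1
        delay = min(delay * 2, max_delay)
    return misses


def recv_when_ready(comm, source=0, **backoff_options):
    """Sleep-poll with Iprobe until a message from `source` is pending, then receive it.

    backoff_options are forwarded to wait_with_backoff (hypothetical helper above).
    """
    wait_with_backoff(lambda: comm.Iprobe(source=source), **backoff_options)
    return comm.recv(source=source)
```

In the script above, the `else` branch would then reduce to `a = recv_when_ready(comm, source=0)`. Short waits are answered within milliseconds, while long waits settle at one check every `max_delay` seconds, which should still leave the cores essentially free for rank 0.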



Source: https://stackoverflow.com/questions/29170492/mpi4py-substantial-slowdown-by-idle-cores
