Processor/socket affinity in Open MPI?

广开言路 2021-02-02 03:35

I know there are some basic functions in the Open MPI implementation for mapping different processes to different cores of different sockets (if the system has more than one socke

2 answers
  •  死守一世寂寞
    2021-02-02 04:01

    1. It depends on so many factors that no single "silver bullet" answer can exist. Among them are the computational intensity (FLOPS/byte) and the ratio of the amount of local data to the amount of data passed between the processes. It also depends on the architecture of the system. Computational intensity can be estimated analytically or measured with a profiling tool such as PAPI or Likwid. The system's architecture can be examined with the lstopo utility, part of the hwloc library, which comes with Open MPI. Unfortunately, lstopo cannot tell you how fast each memory channel is, or how fast the links between the NUMA nodes are and what latency they have.
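
    As a rough illustration of estimating computational intensity analytically, here is a minimal sketch in Python. The kernel (a double-precision y = a*x + y update) and the helper name arithmetic_intensity are assumptions chosen for the example; they do not come from the question.

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return floating-point operations performed per byte of memory traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flops / bytes_moved


# Hypothetical kernel: y[i] = a * x[i] + y[i] on double-precision vectors.
# Per element: 2 FLOPs (one multiply, one add) and 24 bytes of traffic
# (load x[i], load y[i], store y[i], 8 bytes each), ignoring cache effects.
n = 1_000_000
daxpy_intensity = arithmetic_intensity(flops=2 * n, bytes_moved=24 * n)
print(f"{daxpy_intensity:.3f} FLOP/byte")  # 0.083 FLOP/byte
```

    A kernel with intensity this low is usually limited by memory bandwidth rather than by the cores, which is the kind of case where placement across sockets tends to matter most.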

    2. Yes, there is: --report-bindings makes each rank print to its standard error output the affinity mask that applies to it. The output varies a bit among the different Open MPI versions:

    Open MPI 1.5.x shows the hexadecimal value of the affinity mask:

    mpiexec --report-bindings --bind-to-core --bycore

    [hostname:00599] [[10634,0],0] odls:default:fork binding child [[10634,1],0] to cpus 0001
    [hostname:00599] [[10634,0],0] odls:default:fork binding child [[10634,1],1] to cpus 0002
    [hostname:00599] [[10634,0],0] odls:default:fork binding child [[10634,1],2] to cpus 0004
    [hostname:00599] [[10634,0],0] odls:default:fork binding child [[10634,1],3] to cpus 0008
    

    This shows that rank 0 has its affinity mask set to 0001 which allows it to run on CPU 0 only. Rank 1 has its affinity mask set to 0002 which allows it to run on CPU 1 only. And so on.

    mpiexec --report-bindings --bind-to-socket --bysocket

    [hostname:21302] [[30955,0],0] odls:default:fork binding child [[30955,1],0] to socket 0 cpus 003f
    [hostname:21302] [[30955,0],0] odls:default:fork binding child [[30955,1],1] to socket 1 cpus 0fc0
    [hostname:21302] [[30955,0],0] odls:default:fork binding child [[30955,1],2] to socket 0 cpus 003f
    [hostname:21302] [[30955,0],0] odls:default:fork binding child [[30955,1],3] to socket 1 cpus 0fc0
    

    In that case the affinity mask alternates between 003f and 0fc0. 003f in binary is 0000000000111111, so such an affinity mask allows each even rank to execute on CPUs 0 to 5. 0fc0 is 0000111111000000, and therefore odd ranks are only scheduled on CPUs 6 to 11.
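
    The hexadecimal masks in the 1.5.x output can be decoded mechanically: bit n of the mask being set means the rank may run on CPU n. A small sketch, where cpus_from_mask is a hypothetical helper name and not part of Open MPI:

```python
from typing import List


def cpus_from_mask(hex_mask: str) -> List[int]:
    """Return the CPU numbers whose bits are set in a hexadecimal affinity mask."""
    mask = int(hex_mask, 16)
    # Bit n set means the process is allowed to run on CPU n.
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


for hex_mask in ("0001", "0002", "003f", "0fc0"):
    print(hex_mask, "->", cpus_from_mask(hex_mask))
# 0001 -> [0]
# 0002 -> [1]
# 003f -> [0, 1, 2, 3, 4, 5]
# 0fc0 -> [6, 7, 8, 9, 10, 11]
```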

    Open MPI 1.6.x uses a nicer graphical display instead:

    mpiexec --report-bindings --bind-to-core --bycore

    [hostname:39646] MCW rank 0 bound to socket 0[core 0]: [B . . . . .][. . . . . .]
    [hostname:39646] MCW rank 1 bound to socket 0[core 1]: [. B . . . .][. . . . . .]
    [hostname:39646] MCW rank 2 bound to socket 0[core 2]: [. . B . . .][. . . . . .]
    [hostname:39646] MCW rank 3 bound to socket 0[core 3]: [. . . B . .][. . . . . .]
    

    mpiexec --report-bindings --bind-to-socket --bysocket

    [hostname:13888] MCW rank 0 bound to socket 0[core 0-5]: [B B B B B B][. . . . . .]
    [hostname:13888] MCW rank 1 bound to socket 1[core 0-5]: [. . . . . .][B B B B B B]
    [hostname:13888] MCW rank 2 bound to socket 0[core 0-5]: [B B B B B B][. . . . . .]
    [hostname:13888] MCW rank 3 bound to socket 1[core 0-5]: [. . . . . .][B B B B B B]
    

    Each socket is represented graphically as a set of square brackets with each core represented by a dot. The core(s) that each rank is bound to is/are denoted by the letter B. Processes are bound to the first hardware thread only.
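
    If you want to check these maps in a script rather than by eye, the bracket notation is easy to parse. The sketch below assumes the 1.6.x line layout shown above (the map follows the last ": " on the line); bound_cores is a hypothetical helper, not an Open MPI API.

```python
import re
from typing import List, Tuple


def bound_cores(report_line: str) -> List[Tuple[int, int]]:
    """Return (socket, core-within-socket) pairs marked 'B' in a 1.6.x binding line."""
    # The graphical map is everything after the last ": " on the line.
    binding_map = report_line.rsplit(": ", 1)[-1]
    pairs = []
    # Each [...] group is one socket; cores inside it are separated by spaces.
    for socket_index, cores in enumerate(re.findall(r"\[([^\[\]]*)\]", binding_map)):
        for core_index, mark in enumerate(cores.split()):
            if mark == "B":
                pairs.append((socket_index, core_index))
    return pairs


line = "[hostname:39646] MCW rank 1 bound to socket 0[core 1]: [. B . . . .][. . . . . .]"
print(bound_cores(line))  # [(0, 1)]
```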

    Open MPI 1.7.x is a bit more verbose and also knows about hardware threads:

    mpiexec --report-bindings --bind-to-core

    [hostname:28894] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../..][../../../../../..]
    [hostname:28894] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../..][../../../../../..]
    [hostname:28894] MCW rank 2 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../..][../../../../../..]
    [hostname:28894] MCW rank 3 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../..][../../../../../..]
    

    mpiexec --report-bindings --bind-to-socket

    [hostname:29807] MCW rank 0 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..]
    [hostname:29807] MCW rank 1 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB]
    [hostname:29807] MCW rank 2 bound to socket 0[core 0[hwt 0-1]], socket 0[core 1[hwt 0-1]], socket 0[core 2[hwt 0-1]], socket 0[core 3[hwt 0-1]], socket 0[core 4[hwt 0-1]], socket 0[core 5[hwt 0-1]]: [BB/BB/BB/BB/BB/BB][../../../../../..]
    [hostname:29807] MCW rank 3 bound to socket 1[core 6[hwt 0-1]], socket 1[core 7[hwt 0-1]], socket 1[core 8[hwt 0-1]], socket 1[core 9[hwt 0-1]], socket 1[core 10[hwt 0-1]], socket 1[core 11[hwt 0-1]]: [../../../../../..][BB/BB/BB/BB/BB/BB]
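
    In the 1.7.x map, cores within a socket are separated by "/" and each character stands for one hardware thread. A minimal sketch that lists the bound hardware threads, assuming the line layout shown above and numbering cores across sockets as the sample labels do; bound_hwthreads is a hypothetical helper, not an Open MPI API.

```python
import re
from typing import List, Tuple


def bound_hwthreads(report_line: str) -> List[Tuple[int, int, int]]:
    """Return (socket, core, hardware thread) triples marked 'B' in a 1.7.x binding line."""
    # The graphical map is everything after the last ": " on the line.
    binding_map = report_line.rsplit(": ", 1)[-1]
    triples = []
    core_index = 0  # running core number across all sockets
    for socket_index, socket in enumerate(re.findall(r"\[([^\[\]]*)\]", binding_map)):
        for core in socket.split("/"):
            # Each character of a core group is one hardware thread.
            for hwt_index, mark in enumerate(core):
                if mark == "B":
                    triples.append((socket_index, core_index, hwt_index))
            core_index += 1
    return triples


line = ("[hostname:28894] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: "
        "[../BB/../../../..][../../../../../..]")
print(bound_hwthreads(line))  # [(0, 1, 0), (0, 1, 1)]
```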
    

    Open MPI 1.7.x also replaces the --bycore and --bysocket options with the more general --rank-by option.
