Executing hybrid OpenMP/MPI jobs in MPICH

Since libgomp is missing an equivalent of the respect clause of Intel's KMP_AFFINITY, you can work around it with a wrapper script that reads the list of allowed CPUs from /proc/self/status (Linux-specific) and exports it as GOMP_CPU_AFFINITY:

#!/bin/sh

# Extract the CPU list from the affinity mask set by mpiexec,
# e.g. "Cpus_allowed_list: 0-3,8-11" -> "0-3,8-11"
GOMP_CPU_AFFINITY=$(grep ^Cpus_allowed_list /proc/self/status | grep -Eo '[0-9,-]+')
export GOMP_CPU_AFFINITY

# "$@" preserves quoted arguments, unlike $*
exec "$@"

This should then work with -bind-to numa.
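For instance, assuming the script is saved as gomp_wrap.sh (a name I'm making up here) and made executable, launching could look like this:

mpiexec -n 4 -bind-to numa ./gomp_wrap.sh ./a.out

Each rank then inherits the CPU set chosen by Hydra, and libgomp pins its threads within that set.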

I do have a somewhat different solution for binding OpenMP threads to sockets / NUMA nodes when running a mixed MPI / OpenMP code, whenever the MPI library and the OpenMP runtime do not collaborate well by default. The idea is to use numactl and its binding properties. This has the added advantage of binding not only the threads to the socket but also the memory, enforcing good memory locality and maximising bandwidth.

To that end, I first disable any MPI and/or OpenMP binding (with the corresponding mpiexec option for the former, and by setting OMP_PROC_BIND to false for the latter). Then I use the following omp_bind.sh shell script:

#!/bin/bash

# Bind each MPI rank (and therefore its OpenMP threads) to one NUMA
# node, alternating between nodes 0 and 1; memory is bound as well.
numactl --cpunodebind=$(( $PMI_ID % 2 )) --membind=$(( $PMI_ID % 2 )) "$@"

And I run my code this way:

OMP_PROC_BIND="false" OMP_NUM_THREADS=8 mpiexec -ppn 2 -bind-to none omp_bind.sh a.out args

Depending on the number of sockets on the machine, the 2 would need to be adjusted in the script. Likewise, the name of the PMI_ID variable depends on the version of mpiexec used; I have sometimes seen MPI_RANK, PMI_RANK, etc. instead.
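If you want the wrapper to survive different launchers, one option (a sketch; the variable names below are the ones I believe MPICH/Hydra, Open MPI, MVAPICH2 and Slurm export, but check your own environment) is to probe them in turn:

#!/bin/bash

# Try the per-rank variables of the common launchers in turn;
# fall back to 0 if none of them is set.
RANK=${PMI_RANK:-${OMPI_COMM_WORLD_RANK:-${MV2_COMM_WORLD_RANK:-${SLURM_PROCID:-0}}}}
NODE=$(( RANK % 2 ))   # 2 = number of NUMA nodes, adjust to your machine

exec numactl --cpunodebind=$NODE --membind=$NODE "$@"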

But anyway, I always found a way to get it to work, and the memory binding comes in very handy at times, especially to avoid the pitfall of the IO buffers eating up all the memory on the first NUMA node, which would force the process running on the first socket to allocate its memory on the second NUMA node.
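To double-check that both bindings actually took effect, a quick sanity check with standard Linux tools (here <pid> stands for the PID of one of the running ranks) could be:

numactl --hardware                          # show the NUMA nodes and their free memory
grep Cpus_allowed_list /proc/<pid>/status   # CPU mask actually applied to a rank
numastat -p <pid>                           # per-node memory consumption of that rank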
