Spreading OpenMP threads among NUMA nodes

Question

I have a matrix spread across the local memories of four NUMA nodes. Now I want to start 4 threads, each on a CPU belonging to a different NUMA node, so that each thread can access its part of the matrix as fast as possible. OpenMP has the proc_bind(spread) clause, but it places the threads on CPUs that are far apart yet still on the same NUMA node.

How can I force the threads to bind to different NUMA nodes?

Or, if that is not possible: when I use all cores on all nodes (256 threads in total), I know how to get the ID of the NUMA node a thread runs on, but I can't control which thread gets which indices, e.g. in a for loop. How could I distribute my workload efficiently with respect to the NUMA configuration?

Answer 1:

Here is what I'd do:

1. Check which cores are attached to which NUMA node using numactl -H.
2. Assuming, for example, that cores 0, 1, 2 and 3 each sit on a different one of the 4 NUMA nodes you want to use, set the environment variable OMP_PLACES to bind the threads to these cores: export OMP_PLACES="{0},{1},{2},{3}"
3. Finally, launch your OpenMP binary with numactl's local memory allocation policy: numactl -l myBinary

From what I understood of your question, that should work.

Source: https://stackoverflow.com/questions/50090515/spreading-openmp-threads-among-numa-nodes

Tags

c++

multithreading

openmp

numa
