Question
I have a matrix spread across the local memories of four NUMA nodes. I want to start 4 threads, each on a CPU belonging to a different NUMA node, so that each thread can access its part of the matrix as fast as possible. OpenMP has the "proc_bind(spread)" option, but it places all the threads on the same NUMA node, just on CPUs that are far apart.
How can I force the threads to bind to different NUMA nodes?
Or, if that is not possible: When I use all cores on all nodes (256 threads total), I know how to get the ID of the NUMA node, but I can't control which thread gets which indices e.g. in a for loop. How could I distribute my workload efficiently with respect to the NUMA configuration?
Answer 1:
Here is what I'd do:
- Check which cores are attached to which NUMA node using
numactl -H
- Assuming, for example, that cores 0, 1, 2 and 3 each sit on a different one of the 4 NUMA nodes you want to use, set the environment variable OMP_PLACES to bind the threads to these cores:
export OMP_PLACES="{0},{1},{2},{3}"
- Finally, launch your OpenMP binary with numactl's local memory allocation policy:
numactl -l myBinary
From what I understood of your question, that should work.
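The steps above could be scripted roughly as follows. This is only a sketch, not something from the original answer: the helper name places_from_numactl is hypothetical, it assumes that numactl -H prints one line of the form "node <N> cpus: <list>" per node, and it simply picks the first CPU listed for each node. Setting OMP_PROC_BIND and OMP_NUM_THREADS explicitly is also an assumption, added so that binding does not depend on implementation defaults.

```shell
#!/bin/sh
# Hypothetical helper: read `numactl -H` style output on stdin and print an
# OMP_PLACES value with one single-CPU place per NUMA node.
places_from_numactl() {
    awk '
        # Lines of interest look like: "node 0 cpus: 0 1 2 3".
        # Nodes without CPUs (no 4th field) are skipped.
        $1 == "node" && $3 == "cpus:" && NF >= 4 {
            printf "%s{%s}", sep, $4   # take the first CPU of this node
            sep = ","
        }
        END { print "" }
    '
}

# Possible usage (assumes numactl is installed, 4 NUMA nodes, and that
# ./myBinary is the OpenMP program):
#   export OMP_PLACES="$(numactl -H | places_from_numactl)"
#   export OMP_PROC_BIND=true    # assumption: request binding explicitly
#   export OMP_NUM_THREADS=4     # one thread per place
#   numactl -l ./myBinary
```

Picking the first CPU of each node is arbitrary; any CPU on the node would do as long as each place belongs to a different node.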
Source: https://stackoverflow.com/questions/50090515/spreading-openmp-threads-among-numa-nodes