How to get the ID of the GPU allocated to a SLURM job on a multi-GPU node?

Submitted by 妖精的绣舞 on 2019-12-10 16:56:16

Question


When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU that is allocated to the job? Is there an environment variable for this purpose? The GPUs I'm using are all NVIDIA GPUs. Thanks.


Answer 1:


You can get the GPU ID from the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma-separated list of the GPU IDs assigned to the job.
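
As a minimal sketch of reading that variable inside the job script: the helper name list_assigned_gpus is hypothetical and not part of Slurm or CUDA. It splits the comma-separated list into one ID per line. Depending on how the cluster is configured (for example, when devices are constrained through cgroups), the IDs may be numbered relative to the job and so may not match the node's physical numbering.

```shell
#!/bin/bash

# list_assigned_gpus: hypothetical helper (not part of Slurm or CUDA).
# Prints one GPU ID per line from CUDA_VISIBLE_DEVICES, or nothing if
# the variable is unset or empty.
list_assigned_gpus() {
    if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        # Turn the comma-separated list into one ID per line.
        printf '%s\n' "$CUDA_VISIBLE_DEVICES" | tr ',' '\n'
    fi
}

echo "GPUs visible to this job: ${CUDA_VISIBLE_DEVICES:-none}"
list_assigned_gpus
```

With --gres=gpu:1 the list should contain a single entry, so the loop-free form `echo "$CUDA_VISIBLE_DEVICES"` is usually enough; the helper is only useful when more than one GPU is requested.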




Answer 2:


Slurm stores this information in an environment variable, SLURM_JOB_GPUS.

One way to keep track of such information is to log all SLURM-related variables when running a job. For example (following Kaldi's slurm.pl, which is a great script for wrapping Slurm jobs), include the following command in the script run by sbatch:

set | grep SLURM | while read -r line; do echo "# $line"; done
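
A slightly narrower variant, offered as a sketch rather than as what slurm.pl itself does: it reads only exported variables via env, anchors the match at the start of the name, and also logs CUDA_VISIBLE_DEVICES so both answers' variables end up in the job output. The function name log_gpu_env is hypothetical, and the #SBATCH line simply mirrors the --gres=gpu:1 option from the question.

```shell
#!/bin/bash
#SBATCH --gres=gpu:1

# log_gpu_env: hypothetical helper. Prints every exported variable whose
# name starts with SLURM_, plus CUDA_VISIBLE_DEVICES, as "# NAME=value"
# comment lines so they are easy to spot in the job's output file.
log_gpu_env() {
    env | grep -E '^(SLURM_|CUDA_VISIBLE_DEVICES=)' | sort | while read -r line; do
        echo "# $line"
    done
}

log_gpu_env
```

Whether SLURM_JOB_GPUS appears in that output depends on the Slurm version and configuration, so it is worth checking the log on your own cluster before relying on it.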


Source: https://stackoverflow.com/questions/43967405/how-to-get-the-id-of-gpu-allocated-to-a-slurm-job-on-a-multiple-gpus-node
