Question
When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU which is allocated for the job? Is there an environment variable for this purpose? The GPUs I'm using are all nvidia GPUs. Thanks.
Answer 1:
You can get the GPU ID from the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma-separated list of the GPU IDs assigned to the job.
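A minimal job-script sketch that reads CUDA_VISIBLE_DEVICES and prints one line per GPU index in the list. The helper name `print_gpu_ids` is hypothetical, and the script assumes it runs under a Slurm GPU allocation. When the variable is unset, it only writes a note to stderr.

```shell
#!/bin/bash
#SBATCH --gres=gpu:1
# Sketch of a job script that reports the GPU indices Slurm exposed
# through CUDA_VISIBLE_DEVICES (a comma-separated list such as "0" or "0,1").

# print_gpu_ids (hypothetical helper): print one "GPU index: N" line per entry.
print_gpu_ids() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r id; do
    # Skip empty fields, e.g. when the list itself is empty.
    if [ -n "$id" ]; then
      echo "GPU index: $id"
    fi
  done
}

if [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
  print_gpu_ids "$CUDA_VISIBLE_DEVICES"
else
  echo "CUDA_VISIBLE_DEVICES is not set" >&2
fi
```

With `--gres=gpu:1` the list holds a single entry, so the job's output file should contain one `GPU index:` line.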
Answer 2:
Slurm stores this information in an environment variable, SLURM_JOB_GPUS.
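A minimal sketch, assuming a batch script submitted with `sbatch`, that prints SLURM_JOB_GPUS next to CUDA_VISIBLE_DEVICES so the two can be compared. The sbatch documentation describes SLURM_JOB_GPUS as the global IDs of the allocated GPUs, not relative to any device cgroup. On clusters that constrain devices, the two values may therefore differ. For example, CUDA_VISIBLE_DEVICES may show `0` while SLURM_JOB_GPUS shows `1`. The helper name `gpu_report` is hypothetical.

```shell
#!/bin/bash
#SBATCH --gres=gpu:1
# Sketch: print the GPU IDs Slurm recorded for this job next to the ones
# exposed to CUDA, so the two can be compared in the job's output file.
# gpu_report is a hypothetical helper name.
gpu_report() {
  # ${VAR:-unset} prints "unset" when the variable is missing or empty.
  echo "SLURM_JOB_GPUS=${SLURM_JOB_GPUS:-unset}"
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-unset}"
}

gpu_report
```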
One way to keep track of such information is to log all Slurm-related variables when running a job. For example, following Kaldi's slurm.pl (a useful script for wrapping Slurm jobs), you can include the following command in the script run by sbatch:
set | grep SLURM | while read -r line; do echo "# $line"; done
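The following is a variant of the command above, written as a reusable function. The name `log_slurm_env` is hypothetical and not part of Kaldi's slurm.pl. It swaps `set` for `env`. In bash, `set` also prints shell functions and unexported variables, so `grep SLURM` can pick up function bodies that merely mention SLURM. By contrast, `env | grep '^SLURM_'` lists only exported variables whose names start with `SLURM_`.

```shell
#!/bin/bash
# Variant of the logging command: print every exported SLURM_* variable,
# sorted, as a "# "-prefixed line in the job's output file.
# log_slurm_env is a hypothetical helper name.
log_slurm_env() {
  # env lists only exported variables; the anchored pattern keeps names
  # that start with SLURM_ and ignores other lines that mention SLURM.
  env | grep '^SLURM_' | sort | while IFS= read -r line; do
    echo "# $line"
  done
}

log_slurm_env
```

Calling `log_slurm_env` near the top of the sbatch script records these values, including SLURM_JOB_GPUS when Slurm sets it, alongside the job's other output.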
Source: https://stackoverflow.com/questions/43967405/how-to-get-the-id-of-gpu-allocated-to-a-slurm-job-on-a-multiple-gpus-node