Question
I would like to run a job on the cluster. Different nodes have different numbers of CPUs, and I have no idea which nodes will be assigned to me. What are the proper options so that the job can create as many tasks as there are CPUs across all allocated nodes?
#!/bin/bash -l
#SBATCH -p normal
#SBATCH -N 4
#SBATCH -t 96:00:00
srun -n 128 ./run
Answer 1:
One dirty hack to achieve the objective is to use the environment variables provided by Slurm. Here is a sample sbatch file:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --time=10:00
#SBATCH --nodes=2

# CPUs per node and node count, as reported by Slurm for this allocation
echo $SLURM_CPUS_ON_NODE
echo $SLURM_JOB_NUM_NODES

num_core=$SLURM_CPUS_ON_NODE
num_node=$SLURM_JOB_NUM_NODES

# total tasks = CPUs per node * number of nodes
proc_num=$(( num_core * num_node ))
echo $proc_num

srun -n $proc_num ./run
Only the number of nodes is requested in the job script. $SLURM_CPUS_ON_NODE provides the number of CPUs per node. You can use it together with other environment variables (e.g. $SLURM_JOB_NUM_NODES) to work out how many tasks are possible. The script above calculates the task count dynamically under the assumption that the nodes are homogeneous (i.e. $SLURM_CPUS_ON_NODE yields a single number).
For heterogeneous nodes, $SLURM_CPUS_ON_NODE yields multiple values (e.g. 2,3 if the allocated nodes have 2 and 3 CPUs). In that scenario, $SLURM_JOB_NODELIST can be used to find out the number of CPUs on each allocated node, and from that you can calculate the required number of tasks, as sketched below.
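A minimal sketch of that idea, assuming scontrol is available inside the job; the hostname expansion and the CPUTot field come from standard scontrol output, but verify the field name on your cluster:

#!/bin/bash
#SBATCH --nodes=2

# Expand the compressed node list (e.g. node[01-02]) into one hostname
# per line, then sum each node's CPU count from its scontrol record.
proc_num=0
for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    cpus=$(scontrol show node "$node" | grep -o 'CPUTot=[0-9]*' | cut -d= -f2)
    proc_num=$(( proc_num + cpus ))
done
echo $proc_num

srun -n $proc_num ./run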
Source: https://stackoverflow.com/questions/57466957/make-use-of-all-cpus-on-slurm