SLURM sbatch job array for the same script but with different input arguments run in parallel


I have a problem where I need to launch the same script but with different input arguments.

Say I have a script myscript.py -p -i


                      
3 Answers

死守一世寂寞, 2021-02-04 17:38

The best approach is to use job arrays.

One option is to pass the parameter p1 when submitting the job script, so you will only have one script, but will have to submit it multiple times, once for each p1 value.

The code will be like this (untested):

#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j-%a.out
#SBATCH --error=cv_analysis_eis-%j-%a.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH -a 0-150:5

python myscript.py -p "$1" -v "$SLURM_ARRAY_TASK_ID"


and you will submit it with:

sbatch my_jobscript.sh 0.05
sbatch my_jobscript.sh 0.075
...


Another approach is to define all the p1 parameters in a bash array and submit NxM jobs (untested)

#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j-%a.out
#SBATCH --error=cv_analysis_eis-%j-%a.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
# Make the array N x M: 5 values of p1 times 31 values of v = 155 tasks (IDs 0-154)
#SBATCH -a 0-154

PARRAY=(0.05 0.075 0.1 0.25 0.5)

# p1 is the element of PARRAY at index (task ID mod length of PARRAY)
p1=${PARRAY[$((SLURM_ARRAY_TASK_ID % ${#PARRAY[@]}))]}
# v is the integer division of the task ID by the length of PARRAY,
# scaled by 5 to reproduce the 0, 5, ..., 150 values of the first script
v=$((SLURM_ARRAY_TASK_ID / ${#PARRAY[@]} * 5))
python myscript.py -p "$p1" -v "$v"
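
The modulo and integer-division mapping can be checked off-cluster before submitting anything. The following is a minimal Python sketch and not part of the original answer: `decode_task_id` is a hypothetical helper name, and the step of 5 for v is an assumption taken from the `0-150:5` array range in the first script.

```python
# Hypothetical helper that mirrors the bash arithmetic for one array task.
P_VALUES = [0.05, 0.075, 0.1, 0.25, 0.5]  # same values as PARRAY
V_STEP = 5   # assumption: v runs over 0, 5, ..., 150
N_V = 31     # number of v values in that range


def decode_task_id(task_id):
    """Return (p1, v) for a given SLURM_ARRAY_TASK_ID."""
    p1 = P_VALUES[task_id % len(P_VALUES)]
    v = (task_id // len(P_VALUES)) * V_STEP
    return p1, v


n_tasks = len(P_VALUES) * N_V  # 155 tasks, so the array range is 0-154
pairs = [decode_task_id(t) for t in range(n_tasks)]

# Every (p1, v) combination must appear exactly once.
assert len(set(pairs)) == n_tasks
print(n_tasks, pairs[0], pairs[-1])
```

If the assertion holds, the array range and the index arithmetic agree, and each task receives a distinct parameter pair.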

                                                                        
                                                        
            
            
              
                
误落风尘, 2021-02-04 17:56

If you use SLURM job arrays, you could linearise the index of your two for loops, and then do a comparison of the loop index and the array task id:

#!/bin/bash
#SBATCH --job-name=cv_01
#SBATCH --output=cv_analysis_eis-%j.out
#SBATCH --error=cv_analysis_eis-%j.err
#SBATCH --partition=gpu2
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH -a 0-154

# NxM = 5 * 31 = 155 tasks, numbered 0-154

p1_arr=(0.05 0.075 0.1 0.25 0.5)

# SLURM_ARRAY_TASK_ID=154 # comment in for testing

for ip1 in {0..4} # 5 steps
do
    for i in {0..150..5} # 31 steps
    do
        let task_id=$i/5+31*$ip1

        # printf $task_id"\n" # comment in for testing

        if [ "$task_id" -eq "$SLURM_ARRAY_TASK_ID" ]
        then
          p1=${p1_arr[ip1]}
          # printf "python myscript.py -p $p1 -v $i\n" # comment in for testing
          python myscript.py -p $p1 -v $i
        fi
    done
done
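
Because the task ID is built as `i/5 + 31*ip1`, the two loop indices can also be recovered directly with one division and one remainder, without scanning both loops. Below is a rough Python sketch of that inverse, assuming the same 5 x 31 grid; the function name `indices_from_task_id` is made up for illustration.

```python
P1_ARR = [0.05, 0.075, 0.1, 0.25, 0.5]  # same values as p1_arr
N_I = 31                                 # i takes the values 0, 5, ..., 150


def indices_from_task_id(task_id):
    """Invert task_id = i/5 + 31*ip1 and return (p1, i)."""
    ip1, i_step = divmod(task_id, N_I)
    return P1_ARR[ip1], 5 * i_step


# Cross-check against the nested loops used in the job script.
for ip1 in range(5):
    for i in range(0, 151, 5):
        task_id = i // 5 + N_I * ip1
        assert indices_from_task_id(task_id) == (P1_ARR[ip1], i)
```

The same arithmetic could be written in bash with `$(( ))`, which would replace the double loop in the job script by two assignments.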


This answer is pretty similar to Carles's. I would have preferred to write it as a comment, but I do not have enough reputation.
                                                                        
                                                        
            
            
              
                
一向, 2021-02-04 18:03

According to this page, job arrays incur significant overhead:


  If the running time of your program is small, say ten minutes or less, creating a job array will incur a lot of overhead and you should consider packing your jobs.


That page provides a few examples to run your kind of job, using both arrays and "packed jobs."
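
As an illustration of what packing could look like (this is not taken from the page mentioned above), each array task can run a small batch of parameter combinations instead of a single one. The following is a minimal Python sketch, assuming a pack size of 5 and the 5 x 31 parameter grid used in the other answers; `combos_for_task` and `PACK_SIZE` are hypothetical names.

```python
from itertools import product

P_VALUES = [0.05, 0.075, 0.1, 0.25, 0.5]
V_VALUES = list(range(0, 151, 5))  # assumed grid: 0, 5, ..., 150
PACK_SIZE = 5                      # assumed number of runs per array task

ALL_COMBOS = list(product(P_VALUES, V_VALUES))  # 155 combinations


def combos_for_task(task_id):
    """Return the slice of combinations handled by one array task."""
    start = task_id * PACK_SIZE
    return ALL_COMBOS[start:start + PACK_SIZE]


# Ceiling division gives the number of array tasks needed.
n_tasks = -(-len(ALL_COMBOS) // PACK_SIZE)
print(n_tasks, combos_for_task(0))
```

With these assumed numbers the array shrinks from 155 tasks to 31 (`-a 0-30`), and each task would loop over its own slice, reading its index from `SLURM_ARRAY_TASK_ID`.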

If you don't want or need to specify the resources for your job, here is another approach. I'm not sure whether it's a use case that Slurm intended, but it appears to work, and the submission script looks a little nicer since we don't have to linearize the indices to fit the job-array paradigm. It also works well with nested loops of arbitrary depth.

Run this directly as a shell script:

#!/bin/bash
FLAGS="--ntasks=1 --cpus-per-task=1"
for i in 1 2 3 4 5; do
    for j in 1 2 3 4 5; do
        for k in 1 2 3 4 5; do
            sbatch $FLAGS testscript.py $i $j $k
        done
    done
done


where you need to make sure that the first line of testscript.py points to the correct interpreter using the #! line, e.g.

#!/usr/bin/env python
import time
import sys
time.sleep(5)
# print() with parentheses so the script also runs under Python 3
print("This is my script")
print(sys.argv[1], sys.argv[2], sys.argv[3])


Alternatively (untested), you can use the --wrap flag like this 

sbatch $FLAGS --wrap="python testscript.py $i $j $k"


and you won't need the #!/usr/bin/env python line in testscript.py
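
The nested loops can also be generated rather than written out by hand, which keeps the script the same length for any loop depth. Here is a hedged Python sketch of that idea. It only builds and prints the `sbatch` command lines as a dry run; `build_commands` and `AXES` are hypothetical names, while the flags and script name are the ones used above.

```python
from itertools import product
import shlex

FLAGS = ["--ntasks=1", "--cpus-per-task=1"]
# One entry per loop level: i, j, k as in the bash loops above.
AXES = [range(1, 6), range(1, 6), range(1, 6)]


def build_commands(axes, script="testscript.py"):
    """Return one sbatch argument list per parameter combination."""
    return [
        ["sbatch", *FLAGS, script, *map(str, combo)]
        for combo in product(*axes)
    ]


commands = build_commands(AXES)
for cmd in commands[:3]:
    print(shlex.join(cmd))  # dry run: print instead of submitting
```

To actually submit, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where `sbatch` is available. Adding a fourth loop level only means appending another range to `AXES`.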
                                                                        
                                                        
            
            
              
                