Wait for set of qsub jobs to complete

Submitted by 社会主义新天地 on 2019-11-28 04:54:55

Launch your qsub jobs, using the -N option to give them arbitrary names (job1, job2, etc.):

qsub -N job1 -cwd ./job1_script
qsub -N job2 -cwd ./job2_script
qsub -N job3 -cwd ./job3_script

Launch your script and tell it to wait until the jobs named job1, job2 and job3 are finished before it starts:

qsub -hold_jid job1,job2,job3 -cwd ./results_script

Another alternative (from here) is as follows:

FIRST=$(qsub job1.pbs)
echo $FIRST
SECOND=$(qsub -W depend=afterany:$FIRST job2.pbs)
echo $SECOND
THIRD=$(qsub -W depend=afterany:$SECOND job3.pbs)
echo $THIRD

The insight is that qsub prints the job ID to standard output. Instead of letting it scroll by, capture it in a variable ($FIRST, $SECOND, $THIRD) and pass the -W depend=afterany:[JOBID] flag when you enqueue later jobs, to control the dependency structure of when they are dequeued.
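On PBS/Torque, depend accepts a colon-separated list of job IDs, so a single job can wait on several predecessors at once. A minimal sketch of that (the stub qsub function only exists so the snippet runs outside a cluster; on a real PBS system, delete it and the real qsub is used):

```shell
rm -f .qsub_counter
# Stub qsub: emits sequential fake job ids, persisted in a file because
# each $( ) command substitution runs in its own subshell. Remove this
# function on a real PBS system.
qsub() {
    local n=$(( $(cat .qsub_counter 2>/dev/null || echo 0) + 1 ))
    echo "$n" > .qsub_counter
    echo "${n}.pbsserver"
}

FIRST=$(qsub job1.pbs)
SECOND=$(qsub job2.pbs)
# afterany accepts a colon-separated list: job3 waits on both jobs above
THIRD=$(qsub -W depend=afterany:${FIRST}:${SECOND} job3.pbs)
echo "job3 ($THIRD) waits on $FIRST and $SECOND"
```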

qsub -hold_jid job1,job2,job3 -cwd ./myscript

If all the jobs share a common pattern in their names, you can provide that pattern when you submit the holding job. https://linux.die.net/man/1/sge_types shows which patterns you can use. Example:

-hold_jid "job_name_pattern*"
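For instance, you could give every job in a batch a shared name prefix and then hold on that prefix. A sketch (the job and script names are made up; the stub qsub just echoes its arguments so the snippet runs without a cluster, and should be removed on a real SGE system):

```shell
# Stub qsub so the example runs anywhere; remove on a real SGE system.
qsub() { echo "submitted: $*"; }

# All jobs in this batch share the prefix "batch42_":
qsub -N batch42_align -cwd ./align.sh
qsub -N batch42_sort  -cwd ./sort.sh

# Quote the pattern so the shell doesn't expand the * itself:
qsub -hold_jid "batch42_*" -cwd ./collect_results.sh
```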

This works in bash, but the ideas should be portable. Use -terse to facilitate building up a string of job IDs to wait on; then submit a dummy job that uses -hold_jid to wait on the previous jobs and -sync y so that qsub doesn't return until it (and therefore all of its prerequisites) has finished:

# example where each of three jobs just sleeps for some time:
job_ids=$(qsub -terse -b y sleep 10)
job_ids=${job_ids},$(qsub -terse -b y sleep 20)
job_ids=${job_ids},$(qsub -terse -b y sleep 30)
qsub -hold_jid ${job_ids} -sync y -b y echo "DONE"
  • -terse option makes the output of qsub just be the job id
  • -hold_jid option (as mentioned in other answers) makes a job wait on specified job ids
  • -sync y option (referenced by the OP) asks qsub not to return until the submitted job is finished
  • -b y specifies that the command is not a path to a script file (for instance, I'm using sleep 30 as the command)

See the man page for more details.
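The same pattern scales to any number of jobs by building the comma-separated ID list in a loop. A sketch (again with a stub standing in for `qsub -terse` so it runs outside a cluster; on a real SGE system, remove the stub and uncomment the final hold job):

```shell
rm -f .qsub_counter
# Stub for `qsub -terse` (which prints just the job id). Ids are kept in
# a file because each $( ) substitution runs in its own subshell.
# Remove this function on a real SGE cluster.
qsub() {
    local n=$(( $(cat .qsub_counter 2>/dev/null || echo 0) + 1 ))
    echo "$n" > .qsub_counter
    echo $((1000 + n))
}

job_ids=""
for t in 10 20 30; do
    id=$(qsub -terse -b y sleep "$t")
    # append with a comma only when the list is already non-empty
    job_ids=${job_ids:+${job_ids},}${id}
done
echo "waiting on: $job_ids"
# on a real cluster, finish with:
# qsub -hold_jid "$job_ids" -sync y -b y echo "DONE"
```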

If you have 150 files to process but can only run 15 at a time, while the others wait in the queue, you can set something up like this.

# split my list file into chunks of smaller lists with 10 files each
awk 'NR%10==1 {x="F"++i;}{ print > "list_part"x".txt" }' list.txt
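The split step is easy to check on its own. A self-contained run with a synthetic 25-line list.txt in a throwaway directory (parentheses added around the awk file name for portability across awk implementations):

```shell
# Demo of the split step: a 25-line list becomes chunks of 10, 10, and 5.
cd "$(mktemp -d)"
seq 1 25 | sed 's/^/sample_/' > list.txt

awk 'NR%10==1 {x="F"++i;}{ print > ("list_part" x ".txt") }' list.txt

wc -l list_part*.txt   # list_partF1/F2 have 10 lines, list_partF3 has 5
```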

qsub all the jobs in such a way that the first job of each list_part*.txt holds the second one, the second holds the third, and so on.

for list in $(ls list_part*.txt); do
    PREV_JOB=$(qsub start.sh) # create a dummy script start.sh just for starting
    for file in $(cat $list); do
        NEXT_JOB=$(qsub -v file=$file -W depend=afterany:$PREV_JOB myscript.sh)
        PREV_JOB=$NEXT_JOB
    done
done

This is useful if myscript.sh contains a procedure that needs to move or download many files, or otherwise creates heavy traffic on the cluster LAN.

I needed more flexibility, so I built a Python module for this and other purposes here. You can run the module directly as a script (python qsub.py) for a demo.

Usage:

$ git clone https://github.com/stevekm/util.git
$ cd util
$ python
Python 2.7.3 (default, Mar 29 2013, 16:50:34)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import qsub
>>> job = qsub.submit(command = 'echo foo; sleep 60', print_verbose = True)
qsub command is:

qsub -j y -N "python" -o :"/home/util/" -e :"/home/util/" <<E0F
set -x
echo foo; sleep 60
set +x
E0F

>>> qsub.monitor_jobs(jobs = [job], print_verbose = True)
Monitoring jobs for completion. Number of jobs in queue: 1
Number of jobs in queue: 0
No jobs remaining in the job queue
([Job(id = 4112505, name = python, log_dir = None)], [])

Designed with Python 2.7 and SGE since that's what our system runs. The only non-standard Python libraries required are the included tools.py and log.py modules, and sh.py (also included).

Obviously this is not as helpful if you wish to stay purely in bash, but if you need to wait on qsub jobs, I would imagine your workflow is edging toward a complexity that would benefit from using Python instead.
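If you do want to stay purely in bash, a common fallback is to capture the job IDs at submit time and poll qstat until none of them appear any more. A rough sketch (assumes the first column of plain qstat output is the job ID, as on SGE; the wait_for_jobs name and the 30-second poll interval are arbitrary choices, not part of any tool's API):

```shell
# Sketch: block until every given job id has left the queue.
wait_for_jobs() {
    local id remaining
    while :; do
        remaining=0
        for id in "$@"; do
            # still listed by qstat? (first column is the job id on SGE)
            if qstat 2>/dev/null | awk '{print $1}' | grep -qx "$id"; then
                remaining=$((remaining+1))
            fi
        done
        [ "$remaining" -eq 0 ] && return 0
        sleep 30
    done
}

# usage: j1=$(qsub -terse ./job1.sh); j2=$(qsub -terse ./job2.sh)
#        wait_for_jobs "$j1" "$j2"
```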
