Can Snakemake work if a rule's shell command is a cluster job?

Submitted by 不打扰是莪最后的温柔 on 2020-06-13 09:03:05

Question


In the example below, if the shell script shell_script.sh submits a job to the cluster, is it possible to make snakemake aware of that cluster job's completion? That is, file a should first be created by shell_script.sh, which submits its own job to the cluster, and file b should be created only once that cluster job has completed.

For simplicity, let's assume that snakemake is run locally, meaning that the only cluster job is the one submitted by shell_script.sh, not by snakemake.

localrules: that_job

rule all:
    input:
        "output_from_shell_script.txt",
        "file_after_cluster_job.txt"

rule that_job:
    output:
        a = "output_from_shell_script.txt",
        b = "file_after_cluster_job.txt"
    shell:
        """
        shell_script.sh {output.a}
        touch {output.b}
        """

PS - At the moment, I am using a sleep command to add a waiting time before the job is considered "completed". But this is an awful workaround: the job may take longer than the sleep, or the wait may be needlessly long.


Answer 1:


Snakemake can manage this for you with the --cluster argument on the command line: Snakemake then submits the jobs itself and tracks their completion.
You can supply a template (job script) in which the jobs to be executed on the cluster are wrapped.
As an example, here is how I use snakemake on an SGE-managed cluster.

The template that wraps each job, which I named sge.sh:

#$ -S /bin/bash
#$ -cwd
#$ -V

{exec_job}

Then I run this directly on the login node:

snakemake -rp --cluster "qsub -e ./logs/ -o ./logs/" -j 20 --jobscript sge.sh --latency-wait 30

--cluster gives the command used to submit jobs to the queuing system (here, qsub)
--jobscript is the template in which each job is wrapped
--latency-wait matters if the file system takes some time to make written files visible. A job might finish and return before the outputs of its rule are visible on the file system, which would cause an error; with this option Snakemake waits up to the given number of seconds (here, 30) for them to appear
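
The idea behind a latency wait can be sketched in plain shell. This is only an illustration of the concept, not Snakemake's implementation; the function name wait_for_file and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait up to $2 seconds (default 30) for file $1 to appear.
# Returns 0 as soon as the file exists, 1 if the timeout expires first.
wait_for_file() {
    local path="$1"
    local timeout="${2:-30}"
    local waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Unlike a fixed sleep, this returns as soon as the file shows up and fails explicitly if it never does.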

Note that you can use the localrules keyword in the Snakefile to mark rules that should run locally rather than on the cluster nodes.

Otherwise, if your script must submit the job itself, there are ways to wait for a submitted job to finish, depending on your queuing system:
SGE: Wait for set of qsub jobs to complete
SLURM: How to hold up a script until a slurm job (start with srun) is completely finished?
LSF: https://superuser.com/questions/46312/wait-for-one-or-all-lsf-jobs-to-complete
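
As a rough sketch of what those threads suggest: each of these schedulers has a submit flag that blocks until the job finishes (qsub -sync y for SGE, sbatch --wait for SLURM, bsub -K for LSF). If shell_script.sh submits its job this way, it only returns once the cluster job is done, so the touch in the rule runs afterwards. The helper name blocking_submit_cmd and the job script name are hypothetical; check your scheduler's man page for exact behaviour on your site.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a blocking submit command for a scheduler.
#   $1 = scheduler name (sge, slurm, or lsf)
#   $2 = path to the job script (assumed name, supplied by the caller)
blocking_submit_cmd() {
    local scheduler="$1"
    local script="$2"
    case "$scheduler" in
        sge)   echo "qsub -sync y $script" ;;    # SGE: wait for job completion
        slurm) echo "sbatch --wait $script" ;;   # SLURM: do not exit until the job ends
        lsf)   echo "bsub -K < $script" ;;       # LSF: submit and wait for completion
        *)     echo "unknown scheduler: $scheduler" >&2; return 1 ;;
    esac
}
```

For example, inside shell_script.sh on a SLURM cluster one would call sbatch --wait on the job script directly instead of a plain sbatch followed by sleep.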



Source: https://stackoverflow.com/questions/50034797/can-snakemake-work-if-a-rules-shell-command-is-a-cluster-job
