Snakemake cannot handle very long command line?

Posted by 不羁岁月 on 2021-02-08 21:56:04

Question


This is a very strange problem. When the {input} specified in the rule is a list of fewer than 200 files, snakemake works fine. But when {input} has more than 500 files, snakemake just quits with the message "one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!". The complete log does not contain any further error message.

For the log, please see: https://github.com/snakemake/snakemake/files/5285271/2020-09-25T151835.613199.snakemake.log

The rule that worked is (NOTE the input is capped to 200 files):

rule combine_fastq:
    input:
        lambda wildcards: samples.loc[(wildcards.sample), ["fq"]].dropna()[0].split(',')[:200]
    output:
        "combined.fastq/{sample}.fastq.gz"
    group: "minion_assemble"
    shell:
        """
echo {input} >  {output}
        """

The rule that failed is:

rule combine_fastq:
    input:
        lambda wildcards: samples.loc[(wildcards.sample), ["fq"]].dropna()[0].split(',')
    output:
        "combined.fastq/{sample}.fastq.gz"
    group: "minion_assemble"
    shell:
        """
echo {input} >  {output}
        """

My question is also posted in GitHub: https://github.com/snakemake/snakemake/issues/643.


Answer 1:


I second Maarten's answer: with that many files you are running up against a shell limit, and snakemake is just doing a poor job of helping you identify the problem.
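To get a rough idea of how close you are to the limit, something like the following may help. It is only a sketch: arg_max and joined_length are names I made up for illustration, SC_ARG_MAX is the POSIX limit on the combined size of arguments and environment, and some systems apply additional, smaller limits to a single argument, so treat the numbers as an estimate rather than a guarantee.

```python
import os


def arg_max():
    """Return the OS limit (bytes) for arguments plus environment, or None if unknown."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on some platforms, or the name is not supported.
        return None


def joined_length(files):
    """Approximate size in bytes of the file list once it is space-joined into a command."""
    return sum(len(f.encode()) + 1 for f in files)


if __name__ == "__main__":
    # Hypothetical file names, only to show the call.
    files = ["run1/reads_%04d.fastq.gz" % i for i in range(500)]
    print("joined length:", joined_length(files), "limit:", arg_max())
```

If the joined length is anywhere near the reported limit, passing the paths through a file, as below, is the safer route.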

Based on the issue you reference, it seems like you are using cat to combine all of your files. Maybe following the answer here would help:

rule combine_fastq_list:
    input:
        lambda wildcards: samples.loc[(wildcards.sample), ["fq"]].dropna()[0].split(',')
    output:
        temp("{sample}.tmp.list")
    group: "minion_assemble"
    run:
        # Write one input path per line; the file must be opened in write mode.
        with open(output[0], "w") as out:
            out.write('\n'.join(input))

rule combine_fastq:
    input:
        "{sample}.tmp.list"  # temp() belongs on the output of the rule above, not here
    output:
        'combined.fastq/{sample}.fastq.gz'
    group: "minion_assemble"
    shell:
        'cat {input} | '  # this is reading the list of files from the file
            'xargs zcat -f | '
            '...'

Hope it gets you on the right track.

Edit

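With the first option, xargs may invoke your command more than once, because it splits the file list into batches that fit within the system's argument-length limit. A different option, which runs the command exactly once on the whole list, is shown next. Note that the expanded list is still passed as arguments, so it has to fit within the operating system's limit: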

rule combine_fastq:
    ...
    shell:
        """
        command $(< {input}) ...
        """
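If that is still a concern, the files can be combined without putting the paths on any command line at all. Below is a minimal sketch, assuming plain byte-for-byte concatenation is all you need (concatenated gzip files form a valid gzip stream, so this also works for .fastq.gz inputs). concat_listed_files is a hypothetical helper name, not part of snakemake.

```python
import shutil


def concat_listed_files(list_path, out_path):
    """Append the raw bytes of every file named in list_path (one path per line) to out_path.

    Returns the number of files that were concatenated.
    """
    with open(list_path) as listing:
        # Skip blank lines so a trailing newline in the list is harmless.
        paths = [line.strip() for line in listing if line.strip()]
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                # Stream in chunks so large files are never held in memory at once.
                shutil.copyfileobj(src, out)
    return len(paths)
```

Inside a run: block this could be called as concat_listed_files(input[0], output[0]), with the list file produced by combine_fastq_list as the only input.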


Source: https://stackoverflow.com/questions/64073269/snakemake-cannot-handle-very-long-command-line
