execute snakemake rule as last rule

强颜欢笑 提交于 2021-01-29 12:54:44

问题


I tried to create a snakemake file to run sortmeRNA pipeline:

SAMPLES = ['test']
READS=["R1", "R2"]

rule all:
    input: expand("Clean/4.Unmerge/{exp}.non_rRNA_{read}.fastq", exp = SAMPLES, read = READS)

rule unzip:
    input: 
        fq = "trimmed/{exp}.{read}.trimd.fastq.gz"
    output: 
        ofq = "Clean/1.Unzipped/{exp}.{read}.trimd.fastq"
    shell: "gzip -dkc < {input.fq} > {output.ofq}"

rule merge_paired:
    input: 
        read1 = "Clean/1.Unzipped/{exp}.R1.trimd.fastq",
        read2 = "Clean/1.Unzipped/{exp}.R2.trimd.fastq"
    output: 
        il = "Clean/2.interleaved/{exp}.il.trimd.fastq"
    shell: "merge-paired-reads.sh {input.read1} {input.read2} {output.il}"

rule sortmeRNA:
    input: 
        ilfq = "Clean/2.interleaved/{exp}.il.trimd.fastq"
    output:
        reads_rRNA = "Clean/3.sorted/{exp}_reads_rRNA",
        non_rRNA = "Clean/3.sorted/{exp}_reads_nonRNA"
    params:
        silvabac = "rRNA_databases/silva-bac-16s-id90.fasta,index/silva-bac-16s-db:rRNA_databases/silva-bac-23s-id98.fasta,index/silva-bac-23s-db",
        silvaarc = "rRNA_databases/silva-arc-16s-id95.fasta,index/silva-arc-16s-db:rRNA_databases/silva-arc-23s-id98.fasta,index/silva-arc-23s-db",
        silvaeuk = "rRNA_databases/silva-euk-18s-id95.fasta,index/silva-euk-18s-db:rRNA_databases/silva-euk-28s-id98.fasta,index/silva-euk-28s-db",
        rfam = "rRNA_databases/rfam-5s-database-id98.fasta,index/rfam-5s-db:rRNA_databases/rfam-5.8s-database-id98.fasta,index/rfam-5.8s-db",
        acc = "--num_alignments 1 --fastx --log -a 20 -m 64000 --paired_in -v"
    log:
        "Clean/sortmeRNAlogs/{exp}_sortmeRNA.log"
        shell:'''
        sortmerna --ref {params.silvabac}:{params.silvaarc}:{params.silvaeuk}:{params.rfam} --reads {input.ilfq} --aligned {output.reads_rRNA} --other {output.non_rRNA} {params.acc}
        '''
rule unmerge_paired:
    input:
        inun = "Clean/3.sorted/{exp}_reads_nonRNA.fastq"
    output:
        R1 = "Clean/4.Unmerge/{exp}.non_rRNA_R1.fastq",
        R2 = "Clean/4.Unmerge/{exp}.non_rRNA_R2.fastq"
    shell:"unmerge-paired-reads.sh {input.inun} {output.R1} {output.R2}"

This worked fine !. But for 1 sample it produced an output with size ~53 GB. I have 90 samples to run and cannot afford huge disk space. I tried to make output of rules unzip,merge_paired,sortmeRNA as temp(), but upon executing unmerge_paired raises "Missing input files exception" error. I also tried to add rule_remove to delete all those intermediate directories. But that is not executed as last rule, rather somewhere in the middle raising error again !. Is there any efficient way to do this ?

The error that occurs is:

MissingInputException in line 45 of sortmeRNA_pipeline_memv2.0.snakefile:
Missing input files for rule unmerge_paired:
Clean/3.sorted/test_reads_nonRNA.fastq

Also please note that, rule sortmeRNA requires a string for output and produces string.fastq file, which is then input into rule unmerge_paired ! Thanks.


回答1:


For Snakemake to connect the input of one rule to the output of another, they will need to be identical. The way you describe the output of sortmeRNA and the input unmerge_paired do not work together, whether you put temp() around it or not.

rule sortmeRNA:
    input: 
        ilfq = "Clean/2.interleaved/{exp}.il.trimd.fastq"
    output:
        reads_rRNA = temp("Clean/3.sorted/{exp}_reads_rRNA.fastq"),
        non_rRNA = temp("Clean/3.sorted/{exp}_reads_nonRNA.fastq")
    params:
        reads_rRNA = "Clean/3.sorted/{exp}_reads_rRNA",
        non_rRNA = "Clean/3.sorted/{exp}_reads_nonRNA"            
    shell:
        '''
        sortmerna --aligned {params.reads_rRNA} --other {params.non_rRNA} ...
        '''

rule unmerge_paired:
    input:
        inun = "Clean/3.sorted/{exp}_reads_nonRNA.fastq"  # or rules.sortmeRNA.output.non_rRNA
    output:
        R1 = "Clean/4.Unmerge/{exp}.non_rRNA_R1.fastq",
        R2 = "Clean/4.Unmerge/{exp}.non_rRNA_R2.fastq"
    shell:
        "unmerge-paired-reads.sh {input.inun} {output.R1} {output.R2}"

I removed all the stuff which isn't necessary to understand what is going on, you will have to place those back obviously. I changed the output of sortmeRNA to the actual output of the rule (and made them temp). I also added two params, which are the same as the output but then without the fastq extension.



来源:https://stackoverflow.com/questions/59766049/execute-snakemake-rule-as-last-rule

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!