Snakemake: HISAT2 alignment of many RNAseq reads against many genomes UPDATED

Submitted by 匆匆过客 on 2020-01-25 07:31:05

Question


I have several HISAT2 genome index files with suffixes .1.ht2l to .8.ht2l

bob.1.ht2l
...
bob.8.ht2l
steve.1.ht2l 
...
steve.8.ht2l

and several RNA-seq samples

flower_kevin_1.fastq.gz
flower_kevin_2.fastq.gz
flower_daniel_1.fastq.gz
flower_daniel_2.fastq.gz 

I need to align all RNA-seq reads against each genome. Updated as dariober suggested:

workdir: "/path/to/aligned"
(HISAT2_INDEX_PREFIX,) = glob_wildcards("/path/to/index/{prefix}.1.ht2l")
(SAMPLES,) = glob_wildcards("/path/to/{sample}_1.fastq.gz")
print(HISAT2_INDEX_PREFIX)
print(SAMPLES)

rule all:
    input: 
        expand("{prefix}.{sample}.bam", zip, prefix=HISAT2_INDEX_PREFIX, sample=SAMPLES)

rule hisat2:
    input:
        hisat2_index=expand("%s.{ix}.ht2l" % "/path/to/index/{prefix}", ix=range(1, 9), prefix = HISAT2_INDEX_PREFIX),
        fastq1="/path/to/{sample}_1.fastq.gz",
        fastq2="/path/to/{sample}_2.fastq.gz"
    output:
        bam = "{prefix}.{sample}.bam",
        txt = "{prefix}.{sample}.txt",
    log: "{prefix}.{sample}.snakemake_log.txt"
    threads: 5
    shell:
      "/Tools/hisat2-2.1.0/hisat2 -p {threads} -x /path/to/index/{wildcards.prefix}"
      " -1 {input.fastq1} -2 {input.fastq2}  --summary-file {output.txt} |"
      "/Tools/samtools-1.9/samtools sort -@ {threads} -o {output.bam}"

The problem is that, when the workflow runs, HISAT2 takes all of bob.1.ht2l to bob.8.ht2l and steve.1.ht2l to steve.8.ht2l as -x input at once, whereas the RNA-seq reads should be mapped against each genome separately. Where is the error? NB: my previous question: Snakemake: HISAT2 alignment of many RNAseq reads against many genomes


Answer 1:


I think your confusion comes from the fact that HISAT2 wants the prefix of the index files, not the full list of index files. So instead of -x {input.hisat2_index} (i.e. the list of index files), use something like -x /path/to/{wildcards.prefix}.
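To see why the input list ends up covering every genome, here is a minimal plain-Python sketch. The helper expand_all is hypothetical: it only imitates how Snakemake's expand() fills a pattern with every combination of values, and it is not Snakemake's own code. The prefixes bob and steve are taken from the question.

```python
from itertools import product


def expand_all(pattern, **values):
    """Stand-in for Snakemake's expand(): fill pattern with every combination."""
    keys = list(values)
    combos = product(*(values[key] for key in keys))
    return [pattern.format(**dict(zip(keys, combo))) for combo in combos]


prefixes = ["bob", "steve"]

# As in the question: prefix is filled with every genome, so the rule's
# input lists the index files of all genomes for every job.
all_genomes = expand_all("/path/to/index/{prefix}.{ix}.ht2l",
                         prefix=prefixes, ix=range(1, 9))

# Doubled braces survive formatting as a literal {prefix} placeholder,
# which is later filled per job with a single genome.
templates = expand_all("/path/to/index/{{prefix}}.{ix}.ht2l", ix=range(1, 9))
bob_only = [template.format(prefix="bob") for template in templates]

print(len(all_genomes), len(bob_only))
```

The first list mixes the bob and steve files, while the second holds only the eight files of one genome, which is what a single alignment job should depend on.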

In other words, the input hisat2_index=expand(...) should be there only to tell Snakemake to start this rule once these files are ready; you don't use them directly (HISAT2 does read them, of course, but you don't pass them on the command line).
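Putting this together, the workflow might look roughly like the sketch below. It rests on a few assumptions that are not confirmed by the original post: the index files live under /path/to/index/ as in the question, every sample should be aligned against every genome (hence no zip in rule all), and HISAT2's stderr is sent to the declared log file. The doubled braces in {{prefix}} keep prefix as a wildcard, so each job depends only on the index files of its own genome.

```snakemake
workdir: "/path/to/aligned"

# Discover index prefixes and sample names from the files on disk.
(HISAT2_INDEX_PREFIX,) = glob_wildcards("/path/to/index/{prefix}.1.ht2l")
(SAMPLES,) = glob_wildcards("/path/to/{sample}_1.fastq.gz")

rule all:
    input:
        # Without zip, expand() builds every prefix x sample combination.
        expand("{prefix}.{sample}.bam", prefix=HISAT2_INDEX_PREFIX, sample=SAMPLES)

rule hisat2:
    input:
        # Only {ix} is expanded here; {{prefix}} remains a wildcard, so this
        # lists the eight index files of the one genome used by this job.
        hisat2_index=expand("/path/to/index/{{prefix}}.{ix}.ht2l", ix=range(1, 9)),
        fastq1="/path/to/{sample}_1.fastq.gz",
        fastq2="/path/to/{sample}_2.fastq.gz"
    output:
        bam="{prefix}.{sample}.bam",
        txt="{prefix}.{sample}.txt"
    log: "{prefix}.{sample}.snakemake_log.txt"
    threads: 5
    # -x receives the index prefix, not the list of index files.
    # Redirecting stderr to {log} is an assumption, not part of the question.
    shell:
        "/Tools/hisat2-2.1.0/hisat2 -p {threads} -x /path/to/index/{wildcards.prefix}"
        " -1 {input.fastq1} -2 {input.fastq2} --summary-file {output.txt} 2> {log} |"
        " /Tools/samtools-1.9/samtools sort -@ {threads} -o {output.bam} -"
```

With the two genomes and two samples from the question, rule all would then request four BAM files, one per genome and sample pair. A dry run (snakemake -n) is a cheap way to check that each job lists a single genome's index files before anything is aligned.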



Source: https://stackoverflow.com/questions/59754604/snakemake-hisat2-alignment-of-many-rnaseq-reads-against-many-genomes-updated
