Question
I have several genome index files with suffixes .1.ht2l to .8.ht2l:
bob.1.ht2l
...
bob.8.ht2l
steve.1.ht2l
...
steve.8.ht2l
and several RNA-seq samples:
flower_kevin_1.fastq.gz
flower_kevin_2.fastq.gz
flower_daniel_1.fastq.gz
flower_daniel_2.fastq.gz
I need to align all RNA-seq reads against each genome. Updated as dariober suggested:
workdir: "/path/to/aligned"

(HISAT2_INDEX_PREFIX,) = glob_wildcards("/path/to/index/{prefix}.1.ht2l")
(SAMPLES,) = glob_wildcards("/path/to/{sample}_1.fastq.gz")
print(HISAT2_INDEX_PREFIX)
print(SAMPLES)

rule all:
    input:
        expand("{prefix}.{sample}.bam", zip, prefix=HISAT2_INDEX_PREFIX, sample=SAMPLES)

rule hisat2:
    input:
        hisat2_index=expand("%s.{ix}.ht2l" % "/path/to/index/{prefix}", ix=range(1, 9), prefix=HISAT2_INDEX_PREFIX),
        fastq1="/path/to/{sample}_1.fastq.gz",
        fastq2="/path/to/{sample}_2.fastq.gz"
    output:
        bam="{prefix}.{sample}.bam",
        txt="{prefix}.{sample}.txt",
    log: "{prefix}.{sample}.snakemake_log.txt"
    threads: 5
    shell:
        "/Tools/hisat2-2.1.0/hisat2 -p {threads} -x {/path/to/index/{wildcards.prefix}"
        " -1 {input.fastq1} -2 {input.fastq2} --summary-file {output.txt} |"
        "/Tools/samtools-1.9/samtools sort -@ {threads} -o {output.bam}"
The problem is that, when the rule runs, HISAT2 receives all of bob.1.ht2l to bob.8.ht2l and steve.1.ht2l to steve.8.ht2l at once as the -x input, whereas the RNA-seq reads should be mapped against each genome separately. Where is the error? NB: my previous question: Snakemake: HISAT2 alignment of many RNAseq reads against many genomes
Answer 1:
I think your confusion comes from the fact that hisat wants the prefix of the index files, not the full list of index files. So instead of -x {input.hisat2_index} (i.e. the list of index files), use something like -x /path/to/{wildcards.prefix}.
In other words, the input hisat2_index=expand(...) should be there only to tell snakemake to start this rule after these files are ready; you don't pass them on the command line yourself (hisat does use them, of course, but it finds them through the prefix).
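The distinction between "the index files of one prefix" and "the prefix passed to -x" can be illustrated with a small plain-Python sketch. It does not import Snakemake. The helper names index_files_for and hisat2_command are hypothetical, the paths are the placeholders from the question, and the command string is shortened to the HISAT2 part only.

```python
# Hypothetical helpers for illustration; paths mirror the question's placeholders.
INDEX_DIR = "/path/to/index"


def index_files_for(prefix, n_parts=8):
    """List the .1.ht2l .. .8.ht2l files that belong to ONE index prefix."""
    return [f"{INDEX_DIR}/{prefix}.{i}.ht2l" for i in range(1, n_parts + 1)]


def hisat2_command(prefix, fastq1, fastq2, threads=5):
    """Build the hisat2 call: -x receives the index prefix, not the file list."""
    return (
        f"hisat2 -p {threads} -x {INDEX_DIR}/{prefix} "
        f"-1 {fastq1} -2 {fastq2}"
    )


if __name__ == "__main__":
    # The files a job for the "bob" genome should wait for (no "steve" files).
    print(index_files_for("bob"))
    # The command for that job names only the prefix.
    print(
        hisat2_command(
            "bob",
            "/path/to/flower_kevin_1.fastq.gz",
            "/path/to/flower_kevin_2.fastq.gz",
        )
    )
```

In a Snakefile, a per-prefix list like this could be supplied as the rule's input either through an input function, for example lambda wildcards: index_files_for(wildcards.prefix), or with expand("/path/to/index/{{prefix}}.{ix}.ht2l", ix=range(1, 9)), where the doubled braces leave {prefix} as a wildcard instead of expanding it over every genome. Either way, this is one possible arrangement rather than the only one.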
Source: https://stackoverflow.com/questions/59754604/snakemake-hisat2-alignment-of-many-rnaseq-reads-against-many-genomes-updated