snakemake

Snakemake: HISAT2 alignment of many RNAseq reads against many genomes

﹥>﹥吖頭↗ posted on 2020-01-25 08:02:34
Question: I have several genome files, all with suffixes .1.ht2l to .8.ht2l: bob.1.ht2l bob.2.ht2l bob.3.ht2l bob.4.ht2l bob.5.ht2l bob.6.ht2l bob.7.ht2l bob.8.ht2l steve.1.ht2l ... steve.8.ht2l and so on, and several RNAseq samples, such as flower_kevin_1.fastq.gz flower_kevin_2.fastq.gz flower_daniel_1.fastq.gz flower_daniel_2.fastq.gz and so on, also from different tissues. I would like to align all RNAseq reads against the genomes. UPDATED: workdir: "/path/to/aligned" (HISAT2_INDEX_PREFIX,)=glob_wildcards("

Calling another pipeline within a Snakefile results in missing output errors

∥☆過路亽.° posted on 2020-01-25 07:47:05
Question: I am using an assembly pipeline called Canu inside my Snakemake pipeline, but when it reaches the rule calling Canu, Snakemake exits with a MissingOutputException. Canu submits multiple jobs to the cluster itself, so it seems Snakemake expects the output as soon as the first job has finished. Is there a way to avoid this? I know I could use a very long --latency-wait option, but that is not optimal. Snakefile code: #!/miniconda/bin/python workdir: config["path_to_files"]

Snakemake: HISAT2 alignment of many RNAseq reads against many genomes UPDATED

匆匆过客 posted on 2020-01-25 07:31:05
Question: I have several genome files with suffixes .1.ht2l to .8.ht2l (bob.1.ht2l ... bob.8.ht2l, steve.1.ht2l ... steve.8.ht2l) and several RNAseq samples (flower_kevin_1.fastq.gz, flower_kevin_2.fastq.gz, flower_daniel_1.fastq.gz, flower_daniel_2.fastq.gz). I need to align all RNAseq reads against each genome. UPDATED as dariober suggested: workdir: "/path/to/aligned" (HISAT2_INDEX_PREFIX,)=glob_wildcards("/path/to/index/{prefix}.1.ht2l") (SAMPLES,)=glob_wildcards("/path/to/{sample}_1.fastq.gz") print(HISAT2
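The all-against-all pairing described in the question can be sketched in plain Python. This is only an illustration of the combinations a Snakefile would need to request (for example through Snakemake's `expand`). The index and sample names come from the question; the function `alignment_targets` and the output pattern `{prefix}.{sample}.bam` are hypothetical choices made for this sketch and do not come from the post.

```python
from itertools import product

# Names taken from the question; in the Snakefile they come from glob_wildcards.
HISAT2_INDEX_PREFIX = ["bob", "steve"]
SAMPLES = ["flower_kevin", "flower_daniel"]


def alignment_targets(prefixes, samples, pattern="{prefix}.{sample}.bam"):
    """Return one output name per (genome index, sample) pair.

    The default pattern is a hypothetical naming scheme for illustration.
    """
    return [
        pattern.format(prefix=p, sample=s)
        for p, s in product(prefixes, samples)
    ]


targets = alignment_targets(HISAT2_INDEX_PREFIX, SAMPLES)
for target in targets:
    print(target)
```

Each name pairs one index with one sample, so two indexes and two samples give four alignments. A rule whose output carries both wildcards can then be asked for every name in the list.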

Snakemake checkpoint (exited with non-zero exit code)

徘徊边缘 posted on 2020-01-24 22:43:05
Question: I need to make a checkpoint in Snakemake at the step where chromosomes are scattered to call copy number variants with GATK: rule all: input: 'aggregated/chr1' # step that gives non-zero exit code error checkpoint scattering: input: interval = 'gcfiltered_{chr}.interval_list' output: directory('scatter_{chr}') shell: 'mkdir -p {output} && ' 'gatk --java-options "-Xmx8G" IntervalListTools ' '--INPUT {input.interval} ' '--SUBDIVISION_MODE INTERVAL_COUNT ' '--SCATTER_CONTENT 600 ' '--OUTPUT

Handling parallelization

Deadly posted on 2020-01-24 15:34:26
Question: I'm fairly new to Snakemake. Imagine that I have a rule like the one below (I've set the number of threads to 10). Is there any way to make Snakemake handle the parallelization of the for loop in this rule? rule MY_RULE: input: input_file=TRAIN_DATA output: output_file=OUTPUT_DATA threads: 10 run: for f,o in zip(input.input_file, output.output_file): DO_SOMETHING_AND_SAVE(f,o) Thanks Answer 1: I guess your rule could be re-written as (with additional code to make a small self-contained
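As a rough illustration of the question (not necessarily what the answer goes on to propose): the body of a `run:` block is ordinary Python, so setting `threads: 10` reserves cores for the job but does not parallelize the loop by itself. One way to use those threads inside a single job is a thread pool. In the sketch below, `do_something_and_save` is a hypothetical stand-in for the poster's `DO_SOMETHING_AND_SAVE`, `process_all` is a made-up helper, and the files are temporary demo files.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def do_something_and_save(in_path, out_path):
    """Hypothetical stand-in: upper-case the input file into the output file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.write(src.read().upper())
    return out_path


def process_all(inputs, outputs, threads=10):
    """Run the per-file work concurrently, at most `threads` tasks at a time."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() pairs each input with its output, like zip() in the original loop
        return list(pool.map(do_something_and_save, inputs, outputs))


# Small self-contained demo with temporary files
tmp = tempfile.mkdtemp()
inputs = [os.path.join(tmp, f"in_{i}.txt") for i in range(3)]
outputs = [os.path.join(tmp, f"out_{i}.txt") for i in range(3)]
for i, path in enumerate(inputs):
    with open(path, "w") as fh:
        fh.write(f"sample {i}")

written = process_all(inputs, outputs, threads=3)
print(written)
```

Threads mainly help when the per-file work is I/O-bound or calls external programs; for CPU-bound pure Python, a process pool or one Snakemake job per file (a rule with a wildcard) lets the scheduler spread the work across cores instead.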

Snakemake “Missing files after X seconds” error

▼魔方 西西 posted on 2020-01-16 12:00:11
Question: I am getting the following error every time I try to run my Snakemake script: Building DAG of jobs... Using shell: /usr/bin/bash Provided cores: 16 Rules claiming more threads will be scaled down. Job counts: count jobs 1 pear 1 [Wed Dec 4 17:32:54 2019] rule pear: input: Unmap_41_1.fastq, Unmap_41_2.fastq output: merged_reads/Unmap_41.fastq jobid: 0 wildcards: sample=Unmap_41, extension=fastq Waiting at most 120 seconds for missing files. MissingOutputException in line 14 of /faststorage
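The log line "Waiting at most 120 seconds for missing files" reflects a simple idea: after a job finishes, the declared outputs are checked repeatedly until they appear or a time limit expires, because shared filesystems can be slow to show new files. The sketch below illustrates that idea only. It is not Snakemake's actual implementation, and `wait_for_files` with its parameters is a hypothetical helper.

```python
import os
import tempfile
import time


def wait_for_files(paths, latency_wait=120, poll_interval=1.0):
    """Poll until every path exists or `latency_wait` seconds have passed.

    Returns the list of paths that are still missing (empty on success).
    """
    deadline = time.monotonic() + latency_wait
    missing = [p for p in paths if not os.path.exists(p)]
    while missing and time.monotonic() < deadline:
        time.sleep(poll_interval)
        missing = [p for p in missing if not os.path.exists(p)]
    return missing


# Demo: one file that exists and one that is never written
tmp = tempfile.mkdtemp()
present = os.path.join(tmp, "present.fastq")
absent = os.path.join(tmp, "absent.fastq")
with open(present, "w") as fh:
    fh.write("@read\n")

still_missing = wait_for_files([present, absent], latency_wait=0.2, poll_interval=0.05)
print(still_missing)
```

If the command never writes the exact path declared under `output:`, no waiting period will make the file appear, so comparing the declared path with what the tool actually produces is a reasonable first check.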

Problem using a Conda environment in Snakemake on an SGE cluster

独自空忆成欢 posted on 2020-01-15 23:02:07
Question: Related: SnakeMake rule with Python script, conda and cluster. I have been trying to set up my Snakemake pipelines to run on SGE clusters (qsub). With simple commands or tools that are installed directly on the compute nodes, there is no problem. However, there is a problem when I try to set up Snakemake to download tools through Conda on SGE nodes. My testing Snakefile is: rule bwa_sge_c_test: conda: "bwa.yaml" shell: "bwa > snaketest.txt" The "bwa.yaml" file is: channels: - bioconda

How can I run multiple runs of pipeline with different config files - issue with lock on .snakemake directory

雨燕双飞 posted on 2020-01-15 11:26:06
Question: I am running a Snakemake pipeline from the same working directory but with different config files, and the input and output are in different directories too. The issue seems to be that although both runs use data in different folders, Snakemake creates the lock on the pipeline folder because of the .snakemake folder and the lock folder within it. Is there a way to force separate .snakemake folders? Code example below: Both runs are run from within /home/pipelines/qc_pipeline : run 1: /home/apps
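Since the `.snakemake` folder (and the lock inside it) lives in the working directory, one possible approach is to give each run its own working directory with the `--directory` option while pointing `--snakefile` at the shared pipeline. The sketch below only builds the command lines and does not run them. The config file names, the per-run directories under `/tmp/qc_runs`, the file name `Snakefile`, and the helper `snakemake_command` are all assumptions for illustration, not details from the post.

```python
import shlex


def snakemake_command(snakefile, configfile, workdir, cores=1):
    """Build a snakemake call that keeps its .snakemake folder in `workdir`."""
    return [
        "snakemake",
        "--snakefile", snakefile,
        "--configfile", configfile,
        "--directory", workdir,  # working directory, where .snakemake is created
        "--cores", str(cores),
    ]


# Shared pipeline location from the question; the file name is an assumption.
SNAKEFILE = "/home/pipelines/qc_pipeline/Snakefile"

# Hypothetical config files and per-run working directories
runs = [
    ("config_run1.yaml", "/tmp/qc_runs/run1"),
    ("config_run2.yaml", "/tmp/qc_runs/run2"),
]

commands = [snakemake_command(SNAKEFILE, cfg, wd) for cfg, wd in runs]
for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

With `--directory`, relative paths in the Snakefile are resolved against the given directory, so this only fits pipelines whose inputs and outputs are absolute paths or are meant to live under the per-run directory.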

Include Parameters and source code in Snakemake HTML Report

|▌冷眼眸甩不掉的悲伤 posted on 2020-01-15 11:12:07
Question: I want to include the shell command as well as the source code of external scripts of Snakemake rules in my HTML report (I saw that people have those in the table of the RULE segment). The example below is part of the Basic Example from the docs. https://snakemake.readthedocs.io/en/stable/tutorial/basics.html rule bcftools_call: input: fa="data/genome.fa", bam=expand("sorted_reads/{sample}.bam", sample=SAMPLES), bai=expand("sorted_reads/{sample}.bam.bai", sample=SAMPLES) output: "calls/all.vcf