snakemake

snakemake list rules to execute in cluster and local

牧云@^-^@ Submitted on 2021-02-05 09:40:00
Question: I know there is a way to declare rules that need to be executed on the local machine using localrules, as in: localrules: all, foo. Is there a similar option to declare rules that need to be executed on the cluster? Perhaps a clusterrules option? I have a bunch of rules in my pipeline that don't need to be executed on the cluster, and while I can list them all with localrules, it would be easier to just enter the one or two rules that need to be executed on the cluster. An alternative option is the
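For reference, the localrules form mentioned in the question can be sketched as below; the rule names and target file are illustrative, and to my knowledge Snakemake offers no clusterrules counterpart, so every rule that should stay off the cluster has to be listed explicitly:

```snakemake
# Rules named in "localrules" run on the submission host rather than
# being dispatched to the cluster scheduler.
localrules: all, foo

rule all:
    input:
        "results/summary.txt"   # illustrative final target

rule foo:
    output:
        "results/summary.txt"
    shell:
        "touch {output}"
```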

How to use dict value and key in Snakemake rules

雨燕双飞 Submitted on 2021-01-29 13:32:01
Question: I have been stuck on a Snakemake issue for a long while. Suppose I have a dict like: dict_A = {A:"id1","id2","id3", B:"id2","id3","id4","id5", C:"id1","id4","id5"} and I want to write rules like: input: "{dict_A.keys()}/{dict_A[key]}_R1.txt" output: "{dict_A.keys()}/{dict_A[key]}_R1_filter.txt" shell: "XXX {input} > {output}" I have tried searching Google and StackOverflow but I can't figure out this problem. I really hope someone can help me! Many thanks! Answer 1: Snakemake works by providing
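The dict in the question is not valid Python as written; a minimal sketch of how it could be expressed, and of how the (key, id) pairs expand into the file paths the rule's output pattern describes (the path pattern is taken from the question):

```python
# The question's dict rewritten as valid Python: each key maps to a list of ids.
dict_A = {
    "A": ["id1", "id2", "id3"],
    "B": ["id2", "id3", "id4", "id5"],
    "C": ["id1", "id4", "id5"],
}

# Expand every (key, id) pair into the output pattern from the question.
# In a Snakefile, the same list could be built in an input function or with
# expand() and handed to "rule all" as targets.
targets = [f"{key}/{sample}_R1_filter.txt"
           for key, ids in dict_A.items()
           for sample in ids]

print(targets[:2])  # ['A/id1_R1_filter.txt', 'A/id2_R1_filter.txt']
```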

Parallelise output of input function in Snakemake

南楼画角 Submitted on 2021-01-29 13:30:51
Question: Hello Snakemake community, I am having quite some trouble defining a function correctly in Snakemake and calling it in the params section. The output of the function is a list, and my aim is to use each item of the list as a parameter of a shell command. In other words, I would like to run multiple jobs of the same shell command in parallel, each with a different parameter. This is the function: import os, glob def get_scontigs_names(wildcards): scontigs = glob.glob(os.path.join("reference,
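The function definition is cut off in the excerpt; a hypothetical completion is sketched below. The glob pattern and the base-directory parameter are assumptions, not from the original:

```python
import glob
import os

def get_scontigs_names(wildcards, base="reference"):
    # Collect files matching the assumed pattern under the reference directory.
    scontigs = glob.glob(os.path.join(base, "*.fasta"))
    # Return bare names (no directory, no extension) so that each item can
    # parameterise one job, e.g. as a per-contig wildcard in a rule.
    return sorted(os.path.splitext(os.path.basename(p))[0] for p in scontigs)
```

In practice, parallelism over such a list usually comes from turning each item into a wildcard of a separate output file, rather than passing the whole list to a single params entry.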

execute snakemake rule as last rule

强颜欢笑 Submitted on 2021-01-29 12:54:44
Question: I tried to create a Snakemake file to run the sortmeRNA pipeline: SAMPLES = ['test'] READS=["R1", "R2"] rule all: input: expand("Clean/4.Unmerge/{exp}.non_rRNA_{read}.fastq", exp = SAMPLES, read = READS) rule unzip: input: fq = "trimmed/{exp}.{read}.trimd.fastq.gz" output: ofq = "Clean/1.Unzipped/{exp}.{read}.trimd.fastq" shell: "gzip -dkc < {input.fq} > {output.ofq}" rule merge_paired: input: read1 = "Clean/1.Unzipped/{exp}.R1.trimd.fastq", read2 = "Clean/1.Unzipped/{exp}.R2.trimd.fastq" output:
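Re-indented, the flattened Snakefile above reads as follows; only the rules that are complete in the excerpt are reproduced, and the truncated merge_paired rule is omitted rather than guessed at:

```snakemake
SAMPLES = ["test"]
READS = ["R1", "R2"]

rule all:
    input:
        expand("Clean/4.Unmerge/{exp}.non_rRNA_{read}.fastq",
               exp=SAMPLES, read=READS)

rule unzip:
    input:
        fq = "trimmed/{exp}.{read}.trimd.fastq.gz"
    output:
        ofq = "Clean/1.Unzipped/{exp}.{read}.trimd.fastq"
    shell:
        "gzip -dkc < {input.fq} > {output.ofq}"
```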

InputFunctionException: unexpected EOF while parsing

僤鯓⒐⒋嵵緔 Submitted on 2021-01-28 13:47:23
Question: Major EDIT: Having fixed a couple of issues thanks to comments and written a minimal reproducible example to help my helpers, I've narrowed the issue down to a difference between local execution and execution via DRMAA. Here is a minimal reproducible pipeline that does not require any external file download and can be run out of the box after cloning the following git repository: git clone git@github.com:kevinrue/snakemake-issue-all.git When I run the pipeline using DRMAA I get the following error:

Run Snakemake rule one sample at a time

痞子三分冷 Submitted on 2021-01-28 10:36:27
Question: I'm creating a Snakemake workflow that will wrap up some of the tools in the NVIDIA Clara Parabricks pipelines. Because these tools run on GPUs, they can typically only handle one sample at a time; otherwise the GPU will run out of memory. However, Snakemake shoves all the samples through to Parabricks at once, seemingly unaware of the GPU memory limits. One solution would be to tell Snakemake to process one sample at a time, hence the question: How do I get Snakemake to process one
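One common approach (a sketch under assumptions, not taken from the truncated question: the rule, file, and command names are illustrative) is to declare a GPU resource on the heavy rule and cap it on the command line so that only one such job runs at a time:

```snakemake
rule parabricks_step:
    input:
        "input/{sample}.bam"
    output:
        "output/{sample}.vcf"
    resources:
        gpu = 1                              # each job claims one GPU "slot"
    shell:
        "some_gpu_tool {input} > {output}"   # placeholder for the real command
```

Running with `snakemake --resources gpu=1` then serialises these jobs while leaving rules without the gpu resource free to run in parallel.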

Snakemake - rule that downloads data

梦想与她 Submitted on 2021-01-27 19:39:08
Question: I am having some trouble implementing a pipeline in which the first step is downloading the data from some server. As far as I understand, all rules must have inputs which are files. However, in my case the "input" is an ID string given to a script which accesses the server and downloads the data. I am aware of the remote files option in Snakemake, but the server I am downloading from (ENA) is not on that list. Moreover, I am using a script which calls aspera in order to improve download
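A common pattern for such a first step (a sketch; the script path, its flags, and the accession wildcard are assumptions, not the asker's actual script) is a rule with no input whose output is the downloaded file, with the ID supplied through a wildcard and passed on via params:

```snakemake
rule download_fastq:
    # No input section: the ENA run accession is carried by the wildcard,
    # so the rule fires whenever a matching output file is requested.
    output:
        "raw/{accession}.fastq.gz"
    params:
        acc = "{accession}"
    shell:
        "python scripts/ena_download.py --id {params.acc} --out {output}"
```

Downstream rules that list raw/{accession}.fastq.gz as input will then trigger the download automatically for each accession they need.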