Snakemake passes only the first path in the output to shell command

Posted by 喜你入骨 on 2019-12-25 00:15:57

Question


I am trying to pass all of the output paths at once, in a single variable, to a Python script in Snakemake, like this:

rule neo4j:
  input:
      script = 'python/neo4j.py',
      path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
      path_to_umap = 'results/umap/{sample}_umap.csv',
      path_to_mtx = 'data_files/normalized/{sample}.csv'
  output: 'results/neo4j/{sample}/cells.csv', 'results/neo4j/{sample}/genes.csv', 
      'results/neo4j/{sample}/cl_nodes.csv', 'results/neo4j/{sample}/cl_contains.csv',
      'results/neo4j/{sample}/cl_isin.csv', 'results/neo4j/{sample}/expr_by.csv',
      'results/neo4j/{sample}/expr_ess.csv'
  shell:
      "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -output {output}"

When I access the output parameter in the Python script, it sees only the first path: 'results/neo4j/{sample}/cells.csv'. I have also tried naming each path, but that did not fix the issue. How can I pass all the paths in the rule's output as an array or as a dictionary, so that I can access them later in Python?


Answer 1:


If I understand your issue correctly, the problem is that the neo4j.py script doesn't accept more than one file for its -output argument. The shell command probably does end with the full list of files (you can check this by running snakemake with the -p option, which prints the shell commands), but the script only takes the first one into account.

If that is indeed the case, a possibly cleaner approach would be to modify the interface of your neo4j.py script so that it uses one argument for each of its output files.

You would then modify your rule as follows:

rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv'
    output:
        cells = 'results/neo4j/{sample}/cells.csv',
        genes = 'results/neo4j/{sample}/genes.csv',
        nodes = 'results/neo4j/{sample}/cl_nodes.csv',
        contains = 'results/neo4j/{sample}/cl_contains.csv',
        isin = 'results/neo4j/{sample}/cl_isin.csv',
        by = 'results/neo4j/{sample}/expr_by.csv',
        ess = 'results/neo4j/{sample}/expr_ess.csv'
    shell:
        """
        python {input.script} \\
            --path_to_cl {input.path_to_cl} \\
            --path_to_umap {input.path_to_umap} \\
            --path_to_mtx {input.path_to_mtx} \\
            --cells {output.cells} \\
            --genes {output.genes} \\
            --nodes {output.nodes} \\
            --contains {output.contains} \\
            --isin {output.isin} \\
            --by {output.by} \\
            --ess {output.ess}
        """

Some potentially useful Python modules for setting up the interface of your script:

  • docopt
  • argparse
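
As a rough illustration of the one-argument-per-file interface, here is a minimal argparse sketch. The option names are copied from the rule above; the function names and the idea that neo4j.py would be organised this way are assumptions, since the actual script is not shown.

```python
import argparse

# Option names mirror the named inputs and outputs of the rule above.
INPUT_OPTIONS = ("path_to_cl", "path_to_umap", "path_to_mtx")
OUTPUT_OPTIONS = ("cells", "genes", "nodes", "contains", "isin", "by", "ess")


def build_parser():
    """Build a parser with one required option per input and output file."""
    parser = argparse.ArgumentParser(
        description="Hypothetical command-line interface for neo4j.py"
    )
    for name in INPUT_OPTIONS + OUTPUT_OPTIONS:
        parser.add_argument(f"--{name}", required=True, help=f"path for {name}")
    return parser


def parse_cli(argv=None):
    """Parse argv (defaults to sys.argv[1:]) into a namespace of paths."""
    return build_parser().parse_args(argv)
```

Inside the script, calling parse_cli() with no arguments reads the real command line, and each path is then available as an attribute such as args.cells or args.ess.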

Edit

If you don't want to pass each output file as an individual argument, you could simply pass the output directory and let your script build the output paths from this single parameter. Given the file names you want, this seems possible:

rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv'
    output:
        'results/neo4j/{sample}/cells.csv',
        'results/neo4j/{sample}/genes.csv',
        'results/neo4j/{sample}/cl_nodes.csv',
        'results/neo4j/{sample}/cl_contains.csv',
        'results/neo4j/{sample}/cl_isin.csv',
        'results/neo4j/{sample}/expr_by.csv',
        'results/neo4j/{sample}/expr_ess.csv'
    shell:
        """
        python {input.script} \\
            --path_to_cl {input.path_to_cl} \\
            --path_to_umap {input.path_to_umap} \\
            --path_to_mtx {input.path_to_mtx} \\
            --out_dir results/neo4j/{wildcards.sample}
        """

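On the script side, building the paths from the directory could look roughly like the sketch below. The file names come from the rule's output section; the helper name output_paths and the sample directory used in the example call are made up for illustration.

```python
from pathlib import Path

# Base names of the files listed in the rule's output section.
OUTPUT_NAMES = (
    "cells", "genes", "cl_nodes", "cl_contains",
    "cl_isin", "expr_by", "expr_ess",
)


def output_paths(out_dir):
    """Map each output name to its CSV path inside out_dir."""
    out_dir = Path(out_dir)
    return {name: out_dir / f"{name}.csv" for name in OUTPUT_NAMES}


# Example call with a made-up sample directory.
paths = output_paths("results/neo4j/sampleA")
print(paths["cells"])
```

The script must write exactly the file names the rule declares, otherwise Snakemake will report the outputs as missing once the job finishes.
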


Answer 2:


rule hello:
    output:
        "woot", "hoot"
    run:
        for f in output:
            print(f)
        print(output[1])

This prints "woot", "hoot", "hoot": inside the rule, output holds every declared path and can be iterated over or indexed.
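
In a shell command, {output} should expand to all of these paths separated by spaces, so another option is to let the script accept several values for a single option. The sketch below assumes the script uses argparse and keeps the -output option name from the question; the helper name parse_outputs is hypothetical.

```python
import argparse


def parse_outputs(argv):
    """Return every path given after -output as a list."""
    parser = argparse.ArgumentParser()
    # nargs="+" collects one or more values following -output into a list.
    parser.add_argument("-output", nargs="+", required=True)
    return parser.parse_args(argv).output


# Simulated command line, as it might look once {output} has been expanded.
print(parse_outputs(["-output", "woot", "hoot"]))  # ['woot', 'hoot']
```

With this approach the script relies on the order of the paths, so it has to match the order in which the outputs are declared in the rule.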



Source: https://stackoverflow.com/questions/52088953/snakemake-passes-only-the-first-path-in-the-output-to-shell-command
