Snakemake passes only the first path in the output to shell command

Question

I am trying to pass all of the output paths at once, in a single variable, to a Python script in Snakemake, like this:

rule neo4j:
  input:
      script = 'python/neo4j.py',
      path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
      path_to_umap = 'results/umap/{sample}_umap.csv',
      path_to_mtx = 'data_files/normalized/{sample}.csv'
  output: 'results/neo4j/{sample}/cells.csv', 'results/neo4j/{sample}/genes.csv', 
      'results/neo4j/{sample}/cl_nodes.csv', 'results/neo4j/{sample}/cl_contains.csv',
      'results/neo4j/{sample}/cl_isin.csv', 'results/neo4j/{sample}/expr_by.csv',
      'results/neo4j/{sample}/expr_ess.csv'
  shell:
      "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -output {output}"

When I access the output parameter in the Python script, it sees only the first path: 'results/neo4j/{sample}/cells.csv'. I have also tried naming each path, but that did not fix the issue. How can I pass all of the rule's output paths as an array or as a dictionary, so that I can access them later in Python?

Answer 1:

If I understand your issue correctly, the problem is that the neo4j.py script doesn't accept more than one file for its -output argument. The shell command probably does end with the full list of files (you can check this with snakemake's -p option, which prints the shell commands), but the script only takes the first one into account.
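
For illustration, here is a minimal sketch of how a script could collect every path that follows -output. It assumes the script parses its arguments with argparse; the question does not show neo4j.py, so the parse_args helper and the example paths are hypothetical. Since {output} expands to a space-separated list of paths, declaring the option with nargs="+" gathers them all into a Python list:

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; hypothetical stand-in for neo4j.py's parser."""
    parser = argparse.ArgumentParser()
    # nargs="+" makes argparse collect one or more values into a list,
    # instead of keeping only a single value for the option.
    parser.add_argument("-output", nargs="+", required=True)
    return parser.parse_args(argv)


# Simulate what the shell command would pass after {output} is expanded.
example = parse_args([
    "-output",
    "results/neo4j/s1/cells.csv",
    "results/neo4j/s1/genes.csv",
])
print(example.output)
```

With this approach the script has to rely on the order of the paths to know which file is which, which is one reason the named-argument approach may be more robust.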

If that is indeed the case, a possibly cleaner approach would be to modify the interface of your neo4j.py script so that it uses one argument for each of its output files.

You would then modify your rule as follows:

rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv'
    output:
        cells = 'results/neo4j/{sample}/cells.csv',
        genes = 'results/neo4j/{sample}/genes.csv',
        nodes = 'results/neo4j/{sample}/cl_nodes.csv',
        contains = 'results/neo4j/{sample}/cl_contains.csv',
        isin = 'results/neo4j/{sample}/cl_isin.csv',
        by = 'results/neo4j/{sample}/expr_by.csv',
        ess = 'results/neo4j/{sample}/expr_ess.csv'
    shell:
        """
        python {input.script} \\
            --path_to_cl {input.path_to_cl} \\
            --path_to_umap {input.path_to_umap} \\
            --path_to_mtx {input.path_to_mtx} \\
            --cells {output.cells} \\
            --genes {output.genes} \\
            --nodes {output.nodes} \\
            --contains {output.contains} \\
            --isin {output.isin} \\
            --by {output.by} \\
            --ess {output.ess}
        """

Some potentially useful Python modules for setting up the interface of your script:

docopt
argparse
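
As a rough sketch of what the script side could look like with argparse, assuming the option names match the ones used in the rule above (build_parser and output_paths are hypothetical helper names, and the real neo4j.py may be organised differently):

```python
import argparse

# Option names mirror the named inputs and outputs of the rule above.
INPUT_NAMES = ["path_to_cl", "path_to_umap", "path_to_mtx"]
OUTPUT_NAMES = ["cells", "genes", "nodes", "contains", "isin", "by", "ess"]


def build_parser():
    """Create a parser with one required option per input and output file."""
    parser = argparse.ArgumentParser(description="Export CSV files for neo4j")
    for name in INPUT_NAMES + OUTPUT_NAMES:
        parser.add_argument(f"--{name}", required=True)
    return parser


def output_paths(args):
    """Collect the output paths into a dictionary keyed by output name."""
    return {name: getattr(args, name) for name in OUTPUT_NAMES}


# Example: simulate the command line that the rule would generate.
argv = []
for name in INPUT_NAMES + OUTPUT_NAMES:
    argv += [f"--{name}", f"some_dir/{name}.csv"]
paths = output_paths(build_parser().parse_args(argv))
print(paths["cells"])
```

This gives the dictionary-style access asked for in the question, for example paths["genes"], without depending on the order in which the outputs are listed.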

Edit

If you don't want to pass each output file as an individual argument, you could simply pass the output directory and let your script build the output paths from this single parameter. Given the file names you want, this seems possible:

rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv'
    output:
        'results/neo4j/{sample}/cells.csv',
        'results/neo4j/{sample}/genes.csv',
        'results/neo4j/{sample}/cl_nodes.csv',
        'results/neo4j/{sample}/cl_contains.csv',
        'results/neo4j/{sample}/cl_isin.csv',
        'results/neo4j/{sample}/expr_by.csv',
        'results/neo4j/{sample}/expr_ess.csv'
    shell:
        """
        python {input.script} \\
            --path_to_cl {input.path_to_cl} \\
            --path_to_umap {input.path_to_umap} \\
            --path_to_mtx {input.path_to_mtx} \\
            --out_dir results/neo4j/{wildcards.sample}
        """
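
On the script side, the paths could then be rebuilt from the directory along these lines. This is only a sketch: build_output_paths and the dictionary keys are hypothetical names, and the file names are copied from the rule's output section, so they must be kept in sync with it for Snakemake to find the files it expects.

```python
from pathlib import Path

# File names taken from the rule's output section; the keys are arbitrary labels.
OUTPUT_FILES = {
    "cells": "cells.csv",
    "genes": "genes.csv",
    "nodes": "cl_nodes.csv",
    "contains": "cl_contains.csv",
    "isin": "cl_isin.csv",
    "by": "expr_by.csv",
    "ess": "expr_ess.csv",
}


def build_output_paths(out_dir):
    """Return a dict mapping each label to its full path inside out_dir."""
    out_dir = Path(out_dir)
    return {label: out_dir / filename for label, filename in OUTPUT_FILES.items()}


# Example with a made-up sample name "s1", as passed via --out_dir.
paths = build_output_paths("results/neo4j/s1")
for label, path in paths.items():
    print(label, path)
```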

Answer 2:

rule hello:
    output:
        "woot", "hoot"
    run:
        for f in output:
            print(f)
        print(output[1])

This prints "woot", "hoot", "hoot": inside a run block, output can be iterated over and indexed like a list.

Source: https://stackoverflow.com/questions/52088953/snakemake-passes-only-the-first-path-in-the-output-to-shell-command

Tags

python

snakemake