Question
I am trying to pass all of the output paths at once, in a single variable, to a Python script in Snakemake, like this:
rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv'
    output:
        'results/neo4j/{sample}/cells.csv', 'results/neo4j/{sample}/genes.csv',
        'results/neo4j/{sample}/cl_nodes.csv', 'results/neo4j/{sample}/cl_contains.csv',
        'results/neo4j/{sample}/cl_isin.csv', 'results/neo4j/{sample}/expr_by.csv',
        'results/neo4j/{sample}/expr_ess.csv'
    shell:
        "python {input.script} -path_to_cl {input.path_to_cl} -path_to_umap {input.path_to_umap} -path_to_mtx {input.path_to_mtx} -output {output}"
When I access the output parameter in the Python script, it sees only the first path: 'results/neo4j/{sample}/cells.csv'. I have also tried naming each path, but that did not fix the issue. How can I pass all paths in the output of the rule as an array or a dictionary, so that I can access them later in Python?
Answer 1:
If I understand your issue correctly, the problem is that the neo4j.py script doesn't accept more than one file for its -output argument: the shell command probably ends with the full list of files (check this with Snakemake's -p option, which prints the shell commands it runs), but the script only takes the first one into account.
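If you would rather keep a single output option, here is a minimal sketch assuming the script parses its command line with argparse (the actual parser in neo4j.py is not shown in the question). Snakemake expands {output} into space-separated paths, and declaring the option with nargs="+" collects all of them into one Python list. The function name parse_args is hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser for the command line built by the question's rule."""
    parser = argparse.ArgumentParser(description="Write neo4j CSV files (sketch)")
    # Single-dash long options, matching the flags in the question's shell command.
    parser.add_argument("-path_to_cl", required=True)
    parser.add_argument("-path_to_umap", required=True)
    parser.add_argument("-path_to_mtx", required=True)
    # nargs="+" gathers every path that follows -output into a list,
    # instead of keeping only the first one.
    parser.add_argument("-output", nargs="+", required=True)
    return parser.parse_args(argv)


# Example: roughly what the expanded shell command hands to the script.
args = parse_args([
    "-path_to_cl", "cl.csv",
    "-path_to_umap", "umap.csv",
    "-path_to_mtx", "mtx.csv",
    "-output", "cells.csv", "genes.csv", "cl_nodes.csv",
])
for path in args.output:
    print(path)
```

The drawback of this design is that the script must rely on the order of the paths to know which file is which.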
If that is indeed the case, a possibly cleaner approach would be to change the interface of your neo4j.py script so that it takes one argument for each of its output files. You would then modify your rule as follows:
rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv'
    output:
        cells = 'results/neo4j/{sample}/cells.csv',
        genes = 'results/neo4j/{sample}/genes.csv',
        nodes = 'results/neo4j/{sample}/cl_nodes.csv',
        contains = 'results/neo4j/{sample}/cl_contains.csv',
        isin = 'results/neo4j/{sample}/cl_isin.csv',
        by = 'results/neo4j/{sample}/expr_by.csv',
        ess = 'results/neo4j/{sample}/expr_ess.csv'
    shell:
        """
        python {input.script} \\
            --path_to_cl {input.path_to_cl} \\
            --path_to_umap {input.path_to_umap} \\
            --path_to_mtx {input.path_to_mtx} \\
            --cells {output.cells} \\
            --genes {output.genes} \\
            --nodes {output.nodes} \\
            --contains {output.contains} \\
            --isin {output.isin} \\
            --by {output.by} \\
            --ess {output.ess}
        """
Some Python modules that may help you set up the command-line interface of your script:
- docopt
- argparse
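A minimal sketch of the script side for the rule with named outputs, assuming argparse is the module chosen. It declares one required option per output file and then gathers them into a dictionary, which is what the question asks for. The names OUTPUT_NAMES, parse_args and output_paths are hypothetical helpers, not part of Snakemake.

```python
import argparse

# Hypothetical list mirroring the named outputs of the rule
# (cells, genes, nodes, contains, isin, by, ess).
OUTPUT_NAMES = ["cells", "genes", "nodes", "contains", "isin", "by", "ess"]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Write neo4j CSV files (sketch)")
    parser.add_argument("--path_to_cl", required=True)
    parser.add_argument("--path_to_umap", required=True)
    parser.add_argument("--path_to_mtx", required=True)
    # One required option per output file: --cells, --genes, ...
    for name in OUTPUT_NAMES:
        parser.add_argument(f"--{name}", required=True)
    return parser.parse_args(argv)


def output_paths(args):
    """Collect the output options into a dict keyed by output name."""
    return {name: getattr(args, name) for name in OUTPUT_NAMES}
```

In the script you would then call outputs = output_paths(parse_args()) and write, for example, to outputs["cells"]. Because every option is required, argparse fails early if the rule forgets to pass one of them.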
Edit
If you don't want to pass each output file as an individual argument, you could simply pass the output directory and let your script build the output paths from this single parameter. Given the file names you want, this seems feasible:
rule neo4j:
    input:
        script = 'python/neo4j.py',
        path_to_cl = 'results/clusters/umap/{sample}_umap_clusters.csv',
        path_to_umap = 'results/umap/{sample}_umap.csv',
        path_to_mtx = 'data_files/normalized/{sample}.csv'
    output:
        'results/neo4j/{sample}/cells.csv',
        'results/neo4j/{sample}/genes.csv',
        'results/neo4j/{sample}/cl_nodes.csv',
        'results/neo4j/{sample}/cl_contains.csv',
        'results/neo4j/{sample}/cl_isin.csv',
        'results/neo4j/{sample}/expr_by.csv',
        'results/neo4j/{sample}/expr_ess.csv'
    shell:
        """
        python {input.script} \\
            --path_to_cl {input.path_to_cl} \\
            --path_to_umap {input.path_to_umap} \\
            --path_to_mtx {input.path_to_mtx} \\
            --out_dir results/neo4j/{wildcards.sample}
        """
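On the script side, the paths could then be rebuilt from --out_dir. This is a minimal sketch assuming argparse; the dictionary keys and the helper names OUTPUT_FILES, output_paths and parse_args are hypothetical, while the file names are copied from the rule's output list.

```python
import argparse
import os

# File names copied from the rule's output list; the script must write exactly these.
OUTPUT_FILES = {
    "cells": "cells.csv",
    "genes": "genes.csv",
    "nodes": "cl_nodes.csv",
    "contains": "cl_contains.csv",
    "isin": "cl_isin.csv",
    "by": "expr_by.csv",
    "ess": "expr_ess.csv",
}


def output_paths(out_dir):
    """Build every output path from the single --out_dir argument."""
    return {key: os.path.join(out_dir, name) for key, name in OUTPUT_FILES.items()}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Write neo4j CSV files (sketch)")
    parser.add_argument("--path_to_cl", required=True)
    parser.add_argument("--path_to_umap", required=True)
    parser.add_argument("--path_to_mtx", required=True)
    parser.add_argument("--out_dir", required=True)
    return parser.parse_args(argv)
```

The trade-off is that the file names now live in two places: if the script's names drift from the rule's output list, Snakemake will report the expected files as missing after the job runs.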
Answer 2:
rule hello:
    output:
        "woot", "hoot"
    run:
        for f in output:
            print(f)
        print(output[1])
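This prints "woot", "hoot", "hoot": inside a run: block, output behaves like a Python list, so you can iterate over all paths or index a single one.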
Source: https://stackoverflow.com/questions/52088953/snakemake-passes-only-the-first-path-in-the-output-to-shell-command