Question
I'm working on writing a benchmarking report as part of a workflow, and one of the things I'd like to include is information about the amount of resources requested for each job.
Right now, I can manually require the cluster config file ('cluster.json') as a hardcoded input. Ideally, though, I would like to be able to access the per-rule cluster config information that is passed through the --cluster-config argument. In __init__.py, this is accessed as a dict called cluster_config.
Is there any way of importing or copying this dict directly into the rule?
Answer 1:
From the documentation, it looks like you can now use a custom wrapper script to access the job properties (including the cluster config data) when submitting the script to the cluster. Here is an example from the documentation:
#!/usr/bin/env python3
import os
import sys

from snakemake.utils import read_job_properties

jobscript = sys.argv[1]
job_properties = read_job_properties(jobscript)

# do something useful with the threads
threads = job_properties["threads"]

# access a property defined in the cluster configuration file (Snakemake >= 3.6.0)
time = job_properties["cluster"]["time"]

os.system("qsub -t {threads} {script}".format(threads=threads, script=jobscript))
During submission (the last line of the previous example) you could either pass the arguments you want from cluster.json to the script, or dump the whole dict into a JSON file, pass the location of that file to the script during submission, and parse the JSON file inside your script. Here is an example of how the submission script could be changed to do the latter (untested code):
#!/usr/bin/env python3
import json
import os
import sys
import tempfile

from snakemake.utils import read_job_properties

jobscript = sys.argv[1]
job_properties = read_job_properties(jobscript)
threads = job_properties["threads"]

# dump the job properties to a temporary JSON file;
# mkstemp returns a file descriptor and a path
fd, job_json = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as fh:
    json.dump(job_properties, fh)

os.system("qsub -t {threads} {script} -- {job_json}".format(
    threads=threads, script=jobscript, job_json=job_json))
job_json should now appear as the first argument to the job script. Make sure to delete the job_json file at the end of the job.
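On the job-script side, reading that file back is straightforward. The sketch below is an illustration, not part of the original answer; the function name read_job_json and the argument position are assumptions about how you wire up your own script:

```python
#!/usr/bin/env python3
import json
import os
import sys


def read_job_json(path):
    """Load the job-properties dict dumped by the submission wrapper."""
    with open(path) as fh:
        return json.load(fh)


if __name__ == "__main__":
    # the wrapper passed the JSON path as the first argument after "--"
    job_json = sys.argv[1]
    job_properties = read_job_json(job_json)

    # e.g. record the requested cluster resources alongside the job's output
    print(job_properties.get("cluster", {}))

    # clean up the temporary file once it is no longer needed
    os.remove(job_json)
```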
From a comment on another answer, it appears that you just want to store the job_json somewhere along with the job's output. In that case, it might not be necessary to pass job_json to the job script at all; just store it in a place of your choosing.
Answer 2:
You can easily manage cluster resources per rule using the resources: keyword, like this:
rule one:
    input: ...
    output: ...
    resources:
        gpu=1,
        time="HH:MM:SS"
    threads: 4
    shell: "..."
You can also pull resource values from the YAML cluster configuration file given via the --cluster-config parameter, like this:
rule one:
    input: ...
    output: ...
    resources:
        time=cluster_config["one"]["time"]
    threads: 4
    shell: "..."
When you call snakemake, you can then reference these resources in the cluster submission command (example for a Slurm cluster):
snakemake --cluster "sbatch -c {threads} -t {resources.time} " --cluster-config cluster.yml
Each rule will then be submitted to the cluster with its specific resources.
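For reference, a minimal cluster.yml to go with the command above might look like the following sketch; the rule name "one" and the time values are assumptions, and the __default__ section provides fallbacks for rules without their own entry:

```yaml
__default__:
    time: "01:00:00"

one:
    time: "04:00:00"
```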
For more information, see the documentation: http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html
Best regards
Source: https://stackoverflow.com/questions/44785833/how-to-access-cluster-config-dict-within-rule