How to trace back the exact software version(s) used to generate result files in a snakemake workflow

Question

Say I'm following the best-practice workflow structure suggested for snakemake. Now I'd like to know how (i.e. with which software versions) a given file, say plots/myplot.pdf, was generated. I found this surprisingly hard, if not impossible, with only the result folder at hand.

In more detail, say I generated the results using snakemake --use-conda --conda-prefix ~/.conda/myenvs, which will resolve and download the conda environments specified in the rule below (copied from the documentation):

rule NAME:
    input:
        "table.txt"
    output:
        "plots/myplot.pdf"
    conda:
        "envs/ggplot.yaml"
    script:
        "scripts/plot-stuff.R"

Say the content of envs/ggplot.yaml is the following:

channels:
  - conda-forge
dependencies:
  - r-ggplot2

After completion, the ggplot environment will have been saved under, say, ~/.conda/myenvs/d2d1d57b (note that the env name d2d1d57b is assigned automatically by snakemake).

The problem is that if I ship the workflow subfolder to someone else, e.g. as the result (or as a supplement to a paper), I don't know which ggplot version was used for that run. All I know is the content of the yaml file (which is also reported when using --report). Also, since ggplot depends on other software, such as R, I wouldn't know which R version was used for a given rule using this environment, since the yaml file doesn't list indirect dependencies.

Ideally, I'd like to have the complete environment's software versions shipped with the workflow results. As a workaround one could use conda env export -n name_of_env and copy the output into the result folder, but strangely conda list -n ~/.conda/myenvs/d2d1d57b does not work (due to the error Characters not allowed: ('/', ' ', ':', '#')).

Creating an environment manually and inspecting it indeed gives me (among other info):

r-base                    4.0.2                he766273_1    conda-forge
r-ggplot2                 3.3.2             r40h6115d3f_0    conda-forge

That's exactly what I'm after, but doing this manually would of course be too tedious.

This is also true when using wrappers as far as I can tell.

In summary, given a workflow, or even a single file within the workflow, how can I trace back which exact software version(s) were used to generate it? Ideally, this information would be shipped automatically with the results of a workflow by default.

Maybe I'm even missing something very obvious, so hopefully someone can shed some light on this.

Answer 1:

Based on our discussion in the comments, you could export your environment to a log file:

rule NAME:
    input:
        "table.txt"
    output:
        "plots/myplot.pdf"
    log:
        "mylog.txt"
    conda:
        "envs/ggplot.yaml"
    shell:
        """
        conda env export > {log} 
        yourcode
        """

~~However as you indicate this won't work if people do not use --use-conda, plus it is tedious to add this to each rule, so you could try something like this (not tested, might not work):~~

if workflow.use_conda:
    shell.prefix("set -o pipefail; conda env export > {log}; ")

Which adds the export to each shell command!

Now if you use scripts, I am not so sure how to continue. The "easiest" option might be to just call conda env export in a shell command from inside Python/R.
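For a Python script, one way to avoid shelling out to conda at all is to read the package records that conda stores in the environment's conda-meta folder. The following is only a sketch under assumptions: installed_packages and write_versions are hypothetical helper names, the script is assumed to run with the Python interpreter of the rule's conda environment (so that sys.prefix points at that environment), and each record is assumed to carry the usual name, version and build fields.

```python
import json
import sys
from pathlib import Path


def installed_packages(env_prefix):
    """Return sorted 'name=version=build' strings for a conda environment.

    Reads the JSON records that conda keeps in <env_prefix>/conda-meta.
    """
    specs = []
    for record_file in (Path(env_prefix) / "conda-meta").glob("*.json"):
        record = json.loads(record_file.read_text())
        specs.append(f"{record['name']}={record['version']}={record['build']}")
    return sorted(specs)


def write_versions(log_path, env_prefix=sys.prefix):
    """Write one package spec per line to log_path."""
    Path(log_path).write_text("\n".join(installed_packages(env_prefix)) + "\n")
```

Inside a script run by snakemake this could be called as write_versions(snakemake.log[0]), provided the rule defines a log file.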

Edit

The shell prefix trick does not seem to work, so I struck through the text.
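Regarding the Characters not allowed error from the question: -n expects an environment name, whereas an environment addressed by its path needs -p/--prefix instead. Building on that, here is a rough, untested-against-snakemake sketch that exports every environment found under the --conda-prefix directory after a run. The helper names find_envs, export_command and export_all are made up for illustration, and the sketch assumes that conda is on the PATH and that each environment directory contains a conda-meta folder.

```python
import subprocess
from pathlib import Path


def find_envs(conda_prefix):
    """Return environment directories under conda_prefix, sorted by name.

    A directory counts as an environment if it contains a conda-meta folder.
    """
    root = Path(conda_prefix).expanduser()
    return sorted(p for p in root.iterdir() if (p / "conda-meta").is_dir())


def export_command(env_dir):
    """Build the conda call that exports an environment addressed by path."""
    return ["conda", "env", "export", "--prefix", str(env_dir)]


def export_all(conda_prefix, out_dir):
    """Write <out_dir>/<env name>.yaml for every environment found."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for env_dir in find_envs(conda_prefix):
        result = subprocess.run(export_command(env_dir), check=True,
                                capture_output=True, text=True)
        (out / f"{env_dir.name}.yaml").write_text(result.stdout)
```

For the example in the question, export_all("~/.conda/myenvs", "results/envs") would be expected to produce results/envs/d2d1d57b.yaml, where results/envs is an arbitrary output folder chosen here.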

Answer 2:

As @Maarten-vd-Sande mentioned, versions should be specified in the conda env file. Just as you may have thought, you will also need to define r-base and its version in the conda env file to ensure that a specific version of R is used. The env files of snakemake-wrappers provide examples of this.

As part of best practices for reproducible research, it is highly recommended to specify tool versions in conda env files. Snakemake-wrappers typically follow this rule, but you may find some that do not.
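To illustrate what such pinning could look like, here is a small sketch that turns rows of conda list output, like the two shown in the question, into a pinned env file. pin_specs and env_yaml are hypothetical helper names, and the conda-forge default channel is simply taken from the question's envs/ggplot.yaml.

```python
def pin_specs(conda_list_lines):
    """Turn `conda list` rows (name, version, build, channel) into 'name=version' pins."""
    pins = []
    for line in conda_list_lines:
        fields = line.split()
        # Skip comment/header lines and anything without a version column.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        pins.append(f"{fields[0]}={fields[1]}")
    return pins


def env_yaml(pins, channels=("conda-forge",)):
    """Render a minimal conda env file with pinned dependencies."""
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines += ["dependencies:"]
    lines += [f"  - {pin}" for pin in pins]
    return "\n".join(lines) + "\n"


rows = [
    "r-base                    4.0.2                he766273_1    conda-forge",
    "r-ggplot2                 3.3.2             r40h6115d3f_0    conda-forge",
]
print(env_yaml(pin_specs(rows)))
```

With the two rows from the question this lists r-base=4.0.2 and r-ggplot2=3.3.2 under dependencies, so both the direct dependency and the R version are fixed.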

Source: https://stackoverflow.com/questions/64043879/how-to-trace-back-exact-software-versions-used-to-generate-result-files-in-a-s

Tags

conda

snakemake

reproducible-research