Best practices for (bio)conda versions in Snakemake wrappers?

Posted by 六眼飞鱼酱① on 2020-12-31 15:04:55

Question


What would be best environment.yml practices for specifying packages in Snakemake wrappers using conda? I understand that the channels should be:

channels:    
  - conda-forge
  - bioconda
  - base

However, what is a good choice for specifying packages? Do I specify no version? Full versions?

Pinning full versions has previously led to extremely long, seemingly endless conda environment resolution. However, not pinning versions carries the risk of implicitly upgrading to an incompatible version of a package.

Do I specify only direct dependencies or should I put the output of conda env export there so everything is frozen?


Answer 1:


For package version numbers, I would usually opt for pinning the major and minor version. This way, users will get the newest security patches and bug fixes whenever they create an environment, while nothing should change in a backward incompatible way (wherever developers properly follow semantic versioning).

Also, I would only specify direct dependencies and let the environment solver handle any implicit dependencies. This provides a certain level of freedom to meet different needs for different packages, while usually the packages' recipes should specify any restrictions to particular versions.
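
As a rough sketch of the two recommendations above, the dependencies section of a wrapper's environment.yml might look like the fragment below. The package names and version numbers are only illustrative assumptions, not taken from any particular wrapper. In conda's match syntax, a single `=` followed by a major.minor version is a fuzzy match, so patch releases can still be picked up.

```yaml
# Illustrative fragment only: packages and versions are assumptions.
dependencies:
  # Direct dependencies only, pinned to major.minor.
  # "=1.10" is a fuzzy match: 1.10, 1.10.1, ... but not 1.11.
  - samtools =1.10
  - bwa =0.7
  # No transitive dependencies are listed here;
  # the solver picks them based on the package recipes.
```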

Another way to avoid (future) conflicts and keep environment creation quick is to keep environments as small and granular as possible (see Johannes' comment below). If different rules share only some dependencies but not others, I would rather create a separate minimal environment for each rule than reuse a bigger environment. Snakemake wrappers do this anyway, as each wrapper has its own environment definition.

As Johannes pointed out, the same applies to channels: specify only the channels that you actually use. It is no longer necessary to specify the base channel. And when using mamba, you can specify bioconda as the first channel.
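
Applied to the channel list from the question, this advice could give something like the fragment below. It is a minimal sketch that assumes the environment only needs packages from these two channels. The order is the one given in the question.

```yaml
# Sketch: only the channels that are actually used, without "base".
channels:
  - conda-forge
  - bioconda
# Per the answer, bioconda may be listed first when mamba
# is used as the solver.
```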

Speaking of mamba: if speed matters, I would currently use mamba to do the environment solving. It is usually much faster than conda and is better at ensuring that you get the most up-to-date versions of packages. In Snakemake, you can use it via --conda-frontend mamba, as also pointed out in Maarten's comment on the question.

But, of course, it always depends. If you have known version incompatibilities that are not handled by the packages' recipes, specifying and pinning implicit dependencies can be necessary. And if you have software whose output can change with a patch version, then you of course have to pin the patch version.
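
For these exceptions, a stricter fragment could look like the following sketch. The choice of packages, the versions, and the idea that htslib needs constraining here are purely hypothetical and serve only to show the syntax. This combination has not been checked against a solver.

```yaml
# Hypothetical example of the exceptions described above.
dependencies:
  # Patch version pinned because output could differ between patches.
  - bwa =0.7.17
  - samtools =1.10
  # Implicit (transitive) dependency pinned explicitly, assuming a
  # known incompatibility that the recipe does not express.
  - htslib =1.10
```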



Source: https://stackoverflow.com/questions/64594146/best-practices-for-bioconda-versions-in-snakemake-wrappers
