Can conda environment inherit base packages?

后端 未结 1 1768
没有蜡笔的小新
没有蜡笔的小新 2021-02-08 06:14

I\'m looking for a solution where environments do inherit from root, but searching for the answer there seems to be a lot of confusion. Many OP questions believe they a

相关标签:
1条回答
  • 2021-02-08 06:52

    Should Conda environments inherit base packages?

    No. The recommended workflow is to use conda create --clone to create a new standalone environment, and then mutate that environment to add additional packages. Alternatively, one can dump the template environment to a YAML (conda env export > env.yaml), edit it to include or remove packages, and then create a new environment from that (conda env create -f env.yaml -n foo).

    Concern about this wasting storage is unfounded in most situations.1 There can be a mirage of new environments taking up more space than they really do, due to Conda's use of hardlinks to minimize redundancy. A more detailed analysis of this can be found in the question, Why are packages installed rather than just linked to a specific environment?.

    Can Conda environments inherit base packages?

    It's not supported, but it's possible. First, let's explicitly state that nested activation of Conda environments via the conda activate --stack command does not enable or help to allow inheritance of Python packages across environments. This is because it does not manipulate PYTHONPATH, but instead only keeps the previous active environment on PATH and skips the deactivate scripts. A more detailed discussion of this is available in this GitHub Issue.

    Now that we've avoided that red herring, let's talk about PYTHONPATH. One can use this environment variable to include additional site-packages directories to search. So, naively, something like

    conda activate foo
    PYTHONPATH=$CONDA_ROOT/lib/python3.7/site-packages python
    

    should launch Python with the packages of both base and foo available to it. A key constraint for this to work is that the Python in the new environment must match that of base up to and including the minor version (in this case 3.7.*).

    Thinking through the details

    While this will achieve package inheritance, we need to consider: Will this actually conserve space? I'd argue that in practice it likely won't, and here's why.

    Presumably, we don't want to physically duplicate the Python installation, but the new environment must have a Python installed in order to help constrain solving for the new packages we want. To do this, we should not only match the Python version (conda create -n foo python=3.7), but rather the exact same build as base:

    # first check base's python
    conda list -n base '^python$'
    # EXAMPLE RESULT
    # Name                    Version                   Build  Channel
    python                    3.7.6                h359304d_2 
    
    # use this when creating the environment
    conda create -n foo python=3.7.6=h359304d_2
    

    This will let Conda do its linking thing and use the same physical copy in both environments. However, there is no guarantee that Python's dependencies will also reuse the packages in base. In fact, if any compatible newer versions are available, it will download and install those.

    Furthermore, let's say that we now install scikit-learn:

    conda install -n foo scikit-learn
    

    This again is going to check for the newest versions of it and its dependencies, independent of whether older but compatible versions of those dependencies are already available through base. So, more packages are unnecessarily being installed into the package cache.

    The pattern here seems to be that we really want to find a way to have the foo env install new packages, but use as many of the existing packages to satisfy dependencies. And that is exactly what conda create --clone already does.2

    Hence, I lose the motivation to bother with inheritance altogether.


    Note

    I'd speculate that for the special case of pure Python packages it may be plausible to use pip install --target from the base environment to install packages compatible with base to a location outside of base. The user could then add this directory to PYTHONPATH before launching python from base.

    This would not be my first choice. I know the clone strategy is manageable; I wouldn't know what to expect with this going long-term.


    [1] This will hold as long as the locations of the package cache (pkgs_dirs) and where the environment is created (which defaults to envs_dirs) are on the same volume. Configurations with multiple volumes should be using softlinks, which will ultimately have the same effect. Unless one has manually disabled linking of both types, Conda will do a decent job at silently minimizing redundancy.

    [2] Technically, one might also have a stab at using the --offline flag to force Conda to use what it already has cached. However, the premise of OP is that the additional package is new, so it may not be wise to assume we already have a compatible version in the cache.

    0 讨论(0)
提交回复
热议问题