Question
I've tested various ways to manage my project dependencies in Python so far:
- Installing everything globally with pip (saves space, but sooner or later gets you in trouble)
- pip & venv or virtualenv (a bit of a pain to manage, but fine for many cases)
- pipenv & Pipfile (a little easier than venv/virtualenv, but slow and with some vendor lock-in; the virtual envs are hidden somewhere other than the actual project folder)
- conda as package and environment manager (great as long as all packages are available in conda; mixing pip & conda is a bit hacky)
- Poetry - I haven't tried this one
- ...
My problem with all of these (except the first) is that my hard drive space fills up pretty fast: I am not a developer; I use Python for my daily work. Therefore, I have hundreds of small projects that all do their own thing. Unfortunately, for 80% of the projects I need the "big" packages: numpy, pandas, scipy, matplotlib - you name it. A typical small project is about 1000 to 2000 lines of code, but has 800 MB of package dependencies in venv/virtualenv/pipenv. Altogether, I have about 100+ GB of my HDD filled with Python virtual dependencies.
Moreover, installing all of these in each virtual environment takes time. I am working on Windows, and many packages cannot be easily installed from pip on Windows: Shapely, Fiona, GDAL - I need the precompiled wheels from Christoph Gohlke. This is easy, but it breaks most workflows (e.g. pip install -r requirements.txt or pipenv install from a Pipfile). I feel like I spend 40% of my time installing/updating package dependencies and only 60% writing code. Further, none of these package managers really help with publishing & testing code, so I need other tools, e.g. setuptools, tox, semantic-release, twine ...
I've talked to colleagues, but they all face the same problem and no one seems to have a real solution. I was wondering if there is an approach to have some packages, e.g. the ones you use in most projects, installed globally - for example, numpy, pandas, scipy and matplotlib would be installed with pip in C:\Python36\Lib\site-packages or with conda in C:\ProgramData\Miniconda3\Lib\site-packages - these are well-developed packages that don't often break things. And if they do, I would want to fix that in my projects soon anyway.
Other things would go in local virtualenv-folders - I am tempted to move my current workflow from pipenv to conda.
Does such an approach make sense at all? There has been a lot of development in Python packaging lately; perhaps something has emerged that I haven't seen yet.
Is there any best-practice guidance on how to set up files in such a mixed global-local environment, e.g. how to maintain setup.py, requirements.txt or pyproject.toml for sharing development projects through GitLab, GitHub etc.? What are the pitfalls/caveats?
There's also this great blog post from Chris Warrick that explains it pretty much fully.
[Update]
After half a year, I can say that working with Conda (Miniconda) has solved most of my problems:
- it runs on every system: WSL, Windows, native Linux etc.; conda env create -f myenv.yml is the same on every platform
- most packages are already available on conda-forge, and it is easy to get your own packages accepted on conda-forge
- for those packages not on conda, I can install pip in the conda environment and add packages from PyPI with pip. Hint: conda update --all -n myenv -c conda-forge will only update packages from conda, not those installed with pip. Pip-installed dependencies must be updated manually with pip install pack_name --upgrade. Note that installing packages with pip in conda is an emergency solution that should typically be avoided
- I can create strict or open environment.yml files, specifying the conda channel priority, the packages from conda and the packages from pip (a sketch of such a file follows below)
- I can create conda environments from those ymls in a single statement, e.g. to set up a dev environment in GitLab Continuous Integration using the Miniconda3 Docker image - this makes test runs very simple and straightforward
- package versions in the ymls can be defined strictly or openly, depending on the situation. E.g. you can fix the env to Python 3.6, but still have it retrieve security updates within that version range (e.g. 3.6.9)
- I found that conda solves almost all problems with C-compiled dependencies in Windows; conda envs in Windows do allow freezing Python code into an executable (tested!) that can be distributed to Windows end-users who cannot use package managers for some reason
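A minimal sketch of such an environment.yml - the env name and version pins here are just placeholders, adjust channels, packages and pins to your project; it can be created in one statement with conda env create -f environment.yml:

name: myenv
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6            # pinned to the 3.6 series, patch releases still allowed
  - numpy                 # left open so conda resolves a compatible version
  - pandas>=0.24          # illustrative lower bound, not a recommendation
  - matplotlib
  - pip                   # install pip explicitly before listing pip packages
  - pip:
    - some-pypi-only-package    # hypothetical package only available on PyPI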
- regarding the issue with "big dependencies": I ended up creating many specific (i.e. small) and a few unspecific (i.e. big) conda environments. For example, I have a quite big jupyter_env, where JupyterLab and most of my scientific packages are installed (numpy, geos, pandas, scipy etc.) - I activate it whenever I need access to these tools, and I can keep them up to date in a single place. For development of specific packages, I have extra environments that are only used for the package dependencies (e.g. package1_env). I have about 10 environments overall, which is manageable. Some general-purpose tools are installed in the base conda environment, e.g. pylint. Be warned: to make pylint/pycodestyle/autopep8 etc. work (e.g.) in VS Code, they must be installed into the same env that contains the Python code's dependencies - otherwise, you'll get unresolved import warnings
- I installed Miniconda with the Chocolatey package manager for Windows. I keep it up to date with conda update -n base conda, and my envs with conda update --all -n myenv -c conda-forge once a week - works like a charm!
- New update: there's a --stack flag available (as of 2019-02-07) that allows stacking conda environments, e.g. conda activate my_big_env then conda activate --stack dev_tools_env makes some general-purpose packages available in many envs. However, use with caution - I found that code linters, such as pylint, must be in the same env as the dependencies of the code that is linted
- New update 2: I started using conda from the Windows Subsystem for Linux (WSL), and this again improved my workflow significantly: packages install faster, I can work with VS Code Insiders in Windows directly connected to WSL, and there are far fewer bugs with Python packages in the Linux environment
- Another update, on a side note: the Miniconda Docker image allows converting local conda env workflows flawlessly into containerized infrastructure (CI & CD). I've tested this for a while now and am pretty happy with it - the Dockerfile is cleaner than with the Python Docker image, because conda manages more of the dependency work than pip does. I use this more and more nowadays, for example when working with JupyterLab, which is started from within a container (a sketch of such a Dockerfile follows below)
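A minimal sketch of such a container build, assuming the public continuumio/miniconda3 base image, a placeholder env name myenv, and an environment.yml in the build context:

FROM continuumio/miniconda3

# copy the environment spec into the image and create the env in one statement
COPY environment.yml /tmp/environment.yml
RUN conda env create -f /tmp/environment.yml

# run commands (e.g. CI test runs) inside the env without sourcing an activate script
CMD ["conda", "run", "-n", "myenv", "python", "--version"]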
- yes, I have stumbled into compatibility problems between certain packages in a conda env, but very rarely. There are two approaches: if it is an important env that must remain stable, enable conda config --env --set channel_priority strict - this will only install versions that are compatible. With very few and rare package combinations, this may result in unsolvable dependency conflicts (i.e. the env cannot be created). In that case, I usually create smaller envs for experimental development, with fewer packages and channel_priority set to flexible (the default). Sometimes package subsets exist that are easier to solve, such as geoviews-core (instead of geoviews) or matplotlib-base (instead of matplotlib). It's also a good approach to try lower Python versions for those experimental envs that are unsolvable with strict, e.g. conda create -n jupyter_exp_env python=3.6 -c conda-forge. A last-resort hack is installing packages with pip, which bypasses conda's package resolver (but may result in unstable environments and other issues - you've been warned!). Make sure to explicitly install pip in your env first.
One overall drawback is that conda gets kind of slow when using the large conda-forge channel. They're working on it, but at the same time the conda-forge index is growing really fast.
Answer 1:
Problem
You have listed a number of issues that no one approach may be able to completely resolve:
- space
'I need the "big" packages: numpy, pandas, scipy, matplotlib... Virtually I have about 100+ GB of my HDD filled with python virtual dependencies'
- time
... installing all of these in each virtual environment takes time
- publishing
... none of these package managers really help with publishing & testing code ...
- workflow
I am tempted to move my current workflow from pipenv to conda.
Thankfully, what you have described is not quite the classic dependency problem that plagues package managers - circular dependencies, pinning dependencies, versioning, etc.
Details
I have used conda on Windows for many years now under similar restrictions, with reasonable success. Conda was originally designed to make installing scipy-related packages easier. It still does.
If you are using the "scipy stack" (scipy, numpy, pandas, ...), conda is your most reliable choice.
Conda can:
- install scipy packages
- install C-extensions and non-Python packages (needed to run numpy and other packages)
- integrate conda packages, conda channels (you should look into this) and pip to access packages
- separate dependencies via virtual environments
Conda can't:
- help with publishing code
Reproducible Envs
The following steps should help reproduce virtualenvs if needed:
- Do not install scipy packages with pip. I would rely on conda to do the heavy lifting. It is much faster and more stable. You can pip install less common packages inside conda environments.
- On occasion, a pip package may conflict with conda packages within an environment (see release notes addressing this issue).
Avoid pip-issues:
I was wondering if there is an approach to have some packages, e.g. the ones you use in most projects, installed globally ... Other things would go in local virtualenv-folders
A. Make a working environment separate from your base environment, e.g. workenv
. Consider this your goto, "global" env to do a bulk of your daily work.
> conda create -n workenv python=3.7 numpy pandas matplotlib scipy
> activate workenv
(workenv)>
B. Test installations of uncommon pip packages (or weighty conda packages) within a clone of the working env
> conda create --name testenv --clone workenv
> activate testenv
(testenv)> pip install pint
Alternatively, make new environments with minimal packages using a requirements.txt file.
C. Make a backup of dependencies into a requirements.txt-like file called environment.yaml per virtualenv. Optionally make a script to run this command per environment. See docs. Create environments in the future from this file:
> conda env create --name testenv --file environment.yml
> activate testenv
(testenv)> conda list
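The export step itself is not shown above; a minimal sketch, assuming the working env from step A is named workenv and that conda env export is used to write the backup file (repeat or script this per environment):

> conda env export --name workenv > environment.yml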
Publishing
The packaging problem is an ongoing, separate issue that has gained traction with the advent of the pyproject.toml file via PEP 518 (see the related blog post by author B. Cannon). Packaging tools such as flit or poetry have adopted this modern convention to make distributions and publish them to a server or packaging index (PyPI). The pyproject.toml concept tries to move away from traditional setup.py files with their specific dependence on setuptools.
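For illustration only, a minimal sketch of what such a pyproject.toml might look like when poetry is the build tool - the project name, version and pins are hypothetical, and the exact build-backend string depends on the poetry version (flit uses an analogous but different table):

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"

[tool.poetry]
name = "mypackage"            # hypothetical package name
version = "0.1.0"
description = "Example package"
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"
numpy = "*"                   # illustrative, left open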
Dependencies
Tools like pipenv and poetry have a unique modern approach to addressing the dependency problem via a "lock" file. This file allows you to track and reproduce the state of your dependency graph, something novel in the Python packaging world so far (see more on Pipfile vs. setup.py here). Moreover, there are claims that you can still use these tools in conjunction with conda, although I have not tested the extent of these claims. The lock file isn't standardized yet, but according to core developer B. Cannon in an interview on The future of Python packaging (~33m), "I'd like to get us there."
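As a rough sketch of how those lock files come about in practice (the package name is just an example): pipenv resolves the dependency graph and writes Pipfile + Pipfile.lock, while poetry writes pyproject.toml + poetry.lock:

> pipenv install requests
> poetry add requests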
Summary
If you are working with any package from the scipy stack, use conda (Recommended):
- To conserve space and time and to avoid workflow issues, use conda or miniconda.
- To deploy applications or use a "lock" file on your dependencies, consider the following in conjunction with conda:
  - pipenv: use to deploy and make Pipfile.lock
  - poetry: use to deploy and make poetry.lock
- To publish a library on PyPI, consider:
  - pipenv: develop via pipenv install -e . and manually publish with twine
  - flit: automatically package and publish
  - poetry: automatically package and publish
See Also
- Podcast interview with B. Cannon discussing the general packaging problem, pyproject.toml, lock files and tools.
- Podcast interview with K. Reitz discussing packaging tools (pipenv vs. pip, 37m) and dev environments.
Answer 2:
I was wondering if there is an approach to have some packages, e.g. the ones you use in most projects, installed globally ... Other things would go in local virtualenv-folders
Yes, virtualenv supports this. Install the globally-needed packages globally, and then, whenever you create a virtualenv, supply the --system-site-packages option so that the resulting virtualenv will still be able to use globally-installed packages. When using tox, you can set this option for the created virtualenvs by including sitepackages=true in the appropriate [testenv] section(s).
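A minimal sketch of both variants (the env folder name .venv is just illustrative); the same flag also works for the standard-library venv module:

> python -m venv --system-site-packages .venv

and in tox.ini:

[testenv]
sitepackages = true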
Answer 3:
An update on my progress:
The conda package manager turned out to work better for me than pipenv, for the following reasons:
- by default, global dependencies are available from within conda virtual envs
- it is faster than pipenv when installing/updating dependencies
- combining pip and conda is really not that problematic: for anything where a conda package is available, install with conda; if not, simply install with pip
- by using environment.yml, it is possible to have an environment and its dependencies re-created on both Linux and Windows in seconds
- environment.yml allows specifying pip and conda dependencies separately (e.g. this solves the above problems with Fiona, Shapely, GDAL etc. in Windows, by using the conda versions)
- conda solves most of the difficulties of maintaining packages/dependencies across platforms (e.g. Linux, Mac, Windows)
- it was no problem to have conda (e.g. Miniconda) installed side-by-side with an independent Python install and use conda through conda run
- if environment.yml is missing, it is possible to create an env from a requirements.txt (conda create -n new_env --file requirements.txt)
Unfortunately, the process of creating the environment.yml does not seem to be described consistently anywhere. After a while, I realized that the automatically exported file (conda env export > environment.yml) should be manually edited to contain the smallest possible list of dependencies (and let conda solve the rest on install). Otherwise, the environment.yml will not be cross-system compatible.
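A sketch of that trimming workflow, assuming an env named myenv; the --no-builds flag (and, in newer conda versions, --from-history) already removes some platform-specific noise from the export, but hand-editing the list down to the top-level packages you actually requested is usually still needed:

> conda env export -n myenv --no-builds > environment.yml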
Anyway, this workflow solves most of my problems described above and I am kind of happy that I don't need to use pipenv or virtualenv anymore.
There are still some drawbacks. One needs to maintain dependencies in multiple files:
- setup.py
- environment.yml
- It is not possible to execute a program directly (e.g. with a shortcut) in its environment; this works without problems with pipenv run, but conda run will not automatically source activate env - this is an open issue and may be solved sometime
- cx_freeze will not correctly include global dependencies from outside the conda env
- conda will be difficult if you need dependencies that require compilation (e.g. C-extensions, etc.); see below or here
Source: https://stackoverflow.com/questions/54475042/python-dependency-hell-a-compromise-between-virtualenv-and-global-dependencies