Install pandas in a Dockerfile

Posted by 主宰稳场 on 2020-05-28 13:50:14

Question


I am trying to create a Docker image. The Dockerfile is the following:

# Use the official Python 3.6.5 image
FROM python:3.6.5-alpine3.7

# Set the working directory to /app
WORKDIR /app

# Get the 
COPY requirements.txt /app
RUN pip3 install --no-cache-dir -r requirements.txt

# Configuring access to Jupyter
RUN mkdir /notebooks
RUN jupyter notebook --no-browser --ip 0.0.0.0 --port 8888 /notebooks

The requirements.txt file is:

jupyter
numpy==1.14.3
pandas==0.23.0rc2
scipy==1.0.1
scikit-learn==0.19.1
pillow==5.1.1
matplotlib==2.2.2
seaborn==0.8.1

Running the command docker build -t standard . gives me an error when Docker tries to install pandas. The error is the following:

Collecting pandas==0.23.0rc2 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/46/5c/a883712dad8484ef907a2f42992b122acf2bcecbb5c2aa751d1033908502/pandas-0.23.0rc2.tar.gz (12.5MB)
    Complete output from command python setup.py egg_info:
    /bin/sh: svnversion: not found
    /bin/sh: svnversion: not found
    non-existing path in 'numpy/distutils': 'site.cfg'
    Could not locate executable gfortran
    ... (loads of other stuff)
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-xb6f6a5o/pandas/
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1

When I try to install a lower version, pandas==0.22.0, I get this error:

Step 5/7 : RUN pip3 install --no-cache-dir -r requirements.txt
 ---> Running in 5810ea896689
Collecting jupyter (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/83/df/0f5dd132200728a86190397e1ea87cd76244e42d39ec5e88efd25b2abd7e/jupyter-1.0.0-py2.py3-none-any.whl
Collecting numpy==1.14.3 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/b0/2b/497c2bb7c660b2606d4a96e2035e92554429e139c6c71cdff67af66b58d2/numpy-1.14.3.zip (4.9MB)
Collecting pandas==0.22.0 (from -r requirements.txt (line 3))
  Downloading https://files.pythonhosted.org/packages/08/01/803834bc8a4e708aedebb133095a88a4dad9f45bbaf5ad777d2bea543c7e/pandas-0.22.0.tar.gz (11.3MB)
  Could not find a version that satisfies the requirement Cython (from versions: )
No matching distribution found for Cython
The command '/bin/sh -c pip3 install --no-cache-dir -r requirements.txt' returned a non-zero code: 1

I also tried to install Cython and setuptools before pandas, but it gave the same No matching distribution found for Cython error at the pip3 install pandas line.

How can I get pandas installed?


Answer 1:


Alpine doesn't include build tools by default. Install the build tools and create a symbolic link for the locale header:

$ apk add --update curl gcc g++
$ ln -s /usr/include/locale.h /usr/include/xlocale.h
$ pip install numpy

Based on https://wired-world.com/?p=100
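The `Could not locate executable gfortran` line in the question's log is one symptom of the missing toolchain. As a rough diagnostic, the following Python sketch reports which compilers are absent from `PATH` inside the container. The tool list and the function name `missing_build_tools` are illustrative assumptions, not an exhaustive list of what numpy or pandas need.

```python
import shutil

# Illustrative list only; the exact toolchain depends on the packages being built.
DEFAULT_TOOLS = ("gcc", "g++", "gfortran")


def missing_build_tools(tools=DEFAULT_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All listed build tools were found.")
```

Running it before `pip install` makes it easier to tell a missing compiler apart from other build failures.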




Answer 2:


I realize this question has been answered, but I recently had a similar issue with numpy and pandas dependencies in a dockerized project, so I hope this will be of benefit to someone in the future.

My solution:

As pointed out by Aviv Sela, Alpine does not include build tools by default, so they need to be added through the Dockerfile. Below is my Dockerfile with the build packages required for numpy and pandas to install successfully on Alpine.

FROM python:3.6-alpine3.7

RUN apk add --no-cache --update \
    python3 python3-dev gcc \
    gfortran musl-dev g++ \
    libffi-dev openssl-dev \
    libxml2 libxml2-dev \
    libxslt libxslt-dev \
    libjpeg-turbo-dev zlib-dev

RUN pip install --upgrade pip

ADD requirements.txt .
RUN pip install -r requirements.txt

The requirements.txt file:

numpy==1.17.1
pandas==0.25.1
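
After the build, a quick smoke test can confirm that the packages are actually importable inside the image. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `find_missing` and the package list are illustrative assumptions.

```python
import importlib.util


def find_missing(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(["numpy", "pandas"])
    print("Missing packages:", missing if missing else "none")
```

This only checks that the modules can be found, not that their compiled extensions load correctly, so an explicit `import pandas` remains the stronger check.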



Answer 3:


I could create the Docker image now. There must have been some version incompatibilities between FROM python:3.6.5-alpine3.7 and pandas. I changed the Python version to FROM python:3, then it worked fine (also had to downgrade the pillow version to 5.1.0).
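
One likely reason the Debian-based `python:3` image worked: Alpine uses musl libc rather than glibc, and at the time the binary wheels on PyPI for numpy and pandas targeted glibc, so pip on Alpine fell back to compiling from source. A small sketch to see which case an image falls into, using the standard library's `platform.libc_ver()`. The helper name `is_glibc` is hypothetical, and treating any non-glibc result as "no prebuilt wheel" is an assumption, not a guaranteed musl detector.

```python
import platform


def is_glibc(libc_ver=platform.libc_ver):
    """Return True when the interpreter reports glibc as its C library."""
    name, _version = libc_ver()
    return name == "glibc"


if __name__ == "__main__":
    print("libc reported by Python:", platform.libc_ver())
    print("glibc detected:", is_glibc())
```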




Answer 4:


You're probably going to be better off building from a pandas image instead of a base Python image. This will make iteration much faster and easier, because you won't ever have to reinstall pandas. I like amancevice/pandas ( https://hub.docker.com/r/amancevice/pandas/tags ). There are Alpine and Debian images available for every pandas tag, although I think they may all be Python 3.7 now.




Answer 5:


Using a new version of Python that pandas does not yet support will cause problems.

I found it does not work with a development version of Python:

FROM python:3.9.0a6-buster


RUN apt-get update && \
    apt-get -y install python3-pandas

COPY requirements.txt ./
RUN pip3 install --no-cache-dir -r requirements.txt

requirements.txt:

numpy==1.18
pandas

I found it DOES work with an officially released version of Python:

FROM python:3.8-buster
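
To check whether an image ships a pre-release interpreter such as `3.9.0a6`, `sys.version_info.releaselevel` can be inspected. This is a minimal sketch; `is_prerelease` is a hypothetical helper name.

```python
import sys


def is_prerelease(releaselevel=sys.version_info.releaselevel):
    """Return True for alpha, beta, or release-candidate interpreters."""
    # releaselevel is one of 'alpha', 'beta', 'candidate', or 'final'.
    return releaselevel != "final"


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    print("Pre-release interpreter:", is_prerelease())
```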


Source: https://stackoverflow.com/questions/50190676/install-pandas-in-a-dockerfile
