I've noticed that installing Pandas and Numpy (its dependency) in a Docker container using the base OS Alpine vs. CentOS or Debian takes much longer. I created a little t
Just going to bring some of these answers together in one answer and add a detail I think was missed. The reason certain Python libraries, particularly optimized math and data libraries, take so long to build on Alpine is that the pip wheels for these libraries include binaries precompiled from C/C++ and linked against glibc, the GNU implementation of the C standard library. Debian, Fedora, and CentOS all (typically) use glibc, but Alpine, in order to stay lightweight, uses musl-libc instead. C/C++ binaries built on a glibc system will not work on a system without glibc, and the same goes for musl.
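To see which side of the glibc/musl split a given container falls on, here is a minimal sketch using the standard library's platform.libc_ver(). The function name describe_libc is made up for illustration, and treating an empty result as "possibly musl" is a heuristic assumption: libc_ver() typically reports glibc when it is present and returns empty strings otherwise.

```python
import platform


def describe_libc():
    """Return a short label for the C library the running interpreter uses."""
    # libc_ver() returns a (name, version) tuple, e.g. ('glibc', '2.31').
    name, version = platform.libc_ver()
    if name:
        return f"{name} {version}".strip()
    # Assumption: an empty name means glibc was not detected, which is
    # what you would usually see on a musl-based image such as Alpine.
    return "unknown (possibly musl)"


if __name__ == "__main__":
    print(describe_libc())
```

Running this inside a Debian-based image and an Alpine-based image should give different labels, which is the same difference that decides whether pip can use a precompiled wheel.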
Pip looks first for a wheel with the correct binaries. If it can't find one, it tries to compile the binaries from the C/C++ source and link them against musl. In many cases, this won't even work unless you have the Python headers from python3-dev or build tools like make.
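The wheel lookup described above hinges on the platform tag at the end of the wheel filename. As a rough sketch (not pip's actual resolution logic), the helper below reads that tag and reports which libc the wheel targets. The function names and the example filenames are hypothetical; the manylinux prefix marks glibc builds, and the musllinux prefix (from PEP 656) is included for completeness, without claiming that any particular library publishes such wheels.

```python
def wheel_platform_tags(filename):
    """Extract the platform tags from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    stem = filename[: -len(".whl")]
    # Wheel names are dash-separated; the last field is the platform tag,
    # which may hold several tags joined by dots.
    platform_field = stem.split("-")[-1]
    return platform_field.split(".")


def libc_for_wheel(filename):
    """Classify a wheel as targeting glibc, musl, any platform, or other."""
    tags = wheel_platform_tags(filename)
    if any(tag.startswith("manylinux") for tag in tags):
        return "glibc"
    if any(tag.startswith("musllinux") for tag in tags):
        return "musl"
    if tags == ["any"]:
        return "any"  # pure-Python wheel, no compiled binaries
    return "other"


if __name__ == "__main__":
    # Hypothetical filenames, for illustration only.
    for name in (
        "example_pkg-1.0.0-cp38-cp38-manylinux2014_x86_64.whl",
        "example_pkg-1.0.0-cp38-cp38-musllinux_1_1_x86_64.whl",
        "example_pkg-1.0.0-py3-none-any.whl",
    ):
        print(name, "->", libc_for_wheel(name))
```

If the only binary wheels a project publishes are classified as glibc, pip on Alpine has nothing it can use and falls back to building from source.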
Now the silver lining: as others have mentioned, there are apk packages with the proper binaries provided by the community. Using these will save you the (sometimes lengthy) process of building the binaries.
pandas is considered a community-supported package, so the answers pointing to edge/testing are not going to work, as Alpine does not officially support pandas as a core package (it still works, it's just not supported by the core Alpine developers).
Try this Dockerfile:
FROM python:3.8-alpine
RUN echo "@community http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories \
&& apk add py3-pandas@community
This works for the vanilla Alpine image too, using FROM alpine:3.12.