I need to create a Docker image (and consequently containers from that image) that uses large files (genomic data, roughly 10 GB in size).
How am I suppo
Is there a better way of referencing such files?
If you already have some way to distribute the data, I would use a "bind mount" to attach the host directory to the containers as a volume.
docker run -v /path/to/data/on/host:/path/to/data/in/container ...
That way you can change the image and you won't have to re-download the large data set each time.
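As a rough sketch of the bind-mount approach, the helper below only prints the docker run command it would use, after checking that the host path is absolute and actually exists (with -v, Docker silently creates a missing host path as an empty directory, which is easy to miss with a large data set). The function name, the paths, and the image name my-app are hypothetical, and the :ro suffix assumes the containers only need to read the data.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints a docker run command that bind-mounts
# a host data directory read-only. Nothing is executed against Docker.
bind_mount_cmd() {
    host_dir=$1       # directory on the host holding the large files
    container_dir=$2  # where the files should appear inside the container
    image=$3          # application image (assumed name)

    # -v treats a non-absolute source as a named volume, not a host path.
    case $host_dir in
        /*) ;;
        *) echo "error: $host_dir is not an absolute path" >&2; return 1 ;;
    esac

    # Refuse paths that do not exist instead of letting Docker create them.
    if [ ! -d "$host_dir" ]; then
        echo "error: $host_dir is not a directory" >&2
        return 1
    fi

    # :ro mounts the data read-only inside the container.
    printf '%s\n' "docker run --rm -v $host_dir:$container_dir:ro $image"
}
```

Calling bind_mount_cmd /path/to/data/on/host /path/to/data/in/container my-app prints the command so it can be reviewed before running it.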
If you want to use the registry to distribute the large data set, but manage changes to the data set separately from the application image, you could use a data volume container with a Dockerfile like this:
FROM tianon/true
COPY dataset /dataset
VOLUME /dataset
From your application container you can attach that volume using:
docker run -d --name dataset <data volume image name>
docker run --volumes-from dataset ...
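Putting the data volume container steps together, a minimal sketch might look like the following. The function names and the image and container names used in the usage note are hypothetical; the DOCKER variable is an assumption added here so the commands can be previewed with DOCKER=echo before running them for real.

```shell
#!/bin/sh
# Sketch of the data volume container workflow.
# Set DOCKER=echo to print the docker arguments instead of running them.
DOCKER=${DOCKER:-docker}

# Build the data image from the directory holding the Dockerfile and dataset/.
# $1 = image tag, $2 = build context directory
build_dataset_image() {
    "$DOCKER" build -t "$1" "$2"
}

# Create the data container once. With a base image like tianon/true it
# exits right away, but the stopped container still owns the volume.
# $1 = container name, $2 = data image
create_dataset_container() {
    "$DOCKER" run -d --name "$1" "$2"
}

# Start an application container with the data container's volumes attached.
# $1 = data container name, $2 = application image
run_app_with_dataset() {
    "$DOCKER" run --rm --volumes-from "$1" "$2"
}
```

For example, build_dataset_image my-dataset:v1 . followed by create_dataset_container dataset my-dataset:v1 and run_app_with_dataset dataset my-app. When the data changes, only the data image needs to be rebuilt and pushed; the application image stays untouched.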
Either way, I think Docker volumes (https://docs.docker.com/engine/tutorials/dockervolumes/) are what you want.