We can have a data volume in docker:
$ docker run -v /path/to/data/in/container --name test_container debian
$ docker inspect test_container
...
Mounts\": [
The difference between host directory and a data volume is in that that Docker manages the latter by placing it into the $DOCKER-DATA-DIR/volumes
directory and attaching a reference to it (names or randomly generated ids). That is you get a little bit of convenience.
Both host directories and data volumes are directories on the host. Both are host dependent. You can't reference either of them in a Dockerfile
; the VOLUME
directive creates a new nameless (with randomly generated id) volume every time you launch a new container and cannot reference an existing volume.
* $DOCKER-DATA-DIR
is /var/lib/docker
here unless you changed the defaults.
is it any different from having the data in a folder mounted using -v /path/to/data/in/container:/home/user/a_good_place_to_have_data?
It is because, as mentioned in "Mount a host directory as a data volume"
The host directory is, by its nature, host-dependent. For this reason, you can’t mount a host directory from Dockerfile because built images should be portable. A host directory wouldn’t be available on all potential hosts.
If you have some persistent data that you want to share between containers, or want to use from non-persistent containers, it’s best to create a named Data Volume Container, and then to mount the data from it.
You can combine both approaches:
docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
Here we’ve launched a new container and mounted the volume from the
dbdata
container.
We’ve then mounted a local host directory as/backup
.
Finally, we’ve passed a command that usestar
to backup the contents of thedbdata
volume to abackup.tar
file inside our/backup
directory. When the command completes and the container stops we’ll be left with a backup of ourdbdata
volume.
Although when using them it feels the same, with the only change of the location of the directory, there is a different.
Volumes vs Bind Mounts
Volumes advantages over bind mounts:
EDIT (9.9.2019):
According to @Sebi2020 comment, Bind mounts are much easier to backup. Docker doesn't provide any command to backup volumes. You have to use temporary containers with a bind mount to create backups.
Volumes
Created and managed by Docker. You can create a volume explicitly using the docker volume create command, or Docker can create a volume during container or service creation.
When you create a volume, it is stored within a directory on the Docker host. When you mount the volume into a container, this directory is what is mounted into the container. This is similar to the way that bind mounts work, except that volumes are managed by Docker and are isolated from the core functionality of the host machine.
A given volume can be mounted into multiple containers simultaneously. When no running container is using a volume, the volume is still available to Docker and is not removed automatically. You can remove unused volumes using docker volume prune.
When you mount a volume, it may be named or anonymous. Anonymous volumes are not given an explicit name when they are first mounted into a container, so Docker gives them a random name that is guaranteed to be unique within a given Docker host. Besides the name, named and anonymous volumes behave in the same ways.
Volumes also support the use of volume drivers, which allow you to store your data on remote hosts or cloud providers, among other possibilities.
Bind mounts
Available since the early days of Docker. Bind mounts have limited functionality compared to volumes. When you use a bind mount, a file or directory on the host machine is mounted into a container. The file or directory is referenced by its full path on the host machine. The file or directory does not need to exist on the Docker host already. It is created on demand if it does not yet exist. Bind mounts are very performant, but they rely on the host machine’s filesystem having a specific directory structure available. If you are developing new Docker applications, consider using named volumes instead. You can’t use Docker CLI commands to directly manage bind mounts.
There is also tmpfs mounts
.
tmpfs mounts
A tmpfs mount is not persisted on disk, either on the Docker host or within a container. It can be used by a container during the lifetime of the container, to store non-persistent state or sensitive information. For instance, internally, swarm services use tmpfs mounts to mount secrets into a service’s containers.
Reference:
https://docs.docker.com/storage/
Yes, this is quiet different from a few perspectives. Like you wrote in the question's title, it is all about understanding why we need data volumes vs bind mount to host.
Lets take 2 scenarios.
Case 1: Web server.
We want to provide our web server a configuration file that might change frequently.
For example: exposing ports according to the current environment.
We can rebuild the image each time with the relevant setup or create 2 different images for each environment. Both of this solutions aren’t very efficient.
With Bind mounts Docker mounts the given source directory into a location inside the container.
(The original directory / file in the read-only layer inside the union file system will simply be overridden).
For example - binding a dynamic port to nginx:
version: "3.7"
services:
web:
image: nginx:alpine
volumes:
- type: bind #<-----Notice the type
source: ./mysite.template
target: /etc/nginx/conf.d/mysite.template
ports:
- "9090:8080"
environment:
- PORT=8080
command: /bin/sh -c "envsubst < /etc/nginx/conf.d/mysite.template >
/etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'"
(*) Notice that this example could also be solved using Volumes.
Case 2 : Databases.
Docker containers do not store persistent data: any data that will be written to the writable layer in container’s union file system will be lost once the container stop running.
But what if we have a database running on a container, and the container stops - that means that all the data will be lost?
Volumes to the rescue.
Those are named file system trees which are managed for us by Docker.
For example - persisting Postgres SQL data:
services:
db:
image: postgres:latest
volumes:
- "dbdata:/var/lib/postgresql/data"
volumes:
- type: volume #<-----Notice the type
source: dbdata
target: /var/lib/postgresql/data
volumes:
dbdata:
Notice that in this case, for named volumes, the source is the name of the volume (For anonymous volumes, this field is omitted).
Differences in management and isolation on the host
Bind mounts exist on the host file system and being managed by the host maintainer.
Applications / processes outside of Docker can also modify it.
Volumes can also be implemented on the host, but Docker will manage them for us and they can not be accessed outside of Docker.
Volumes are a much wider solution
Although both solutions help us to separate the data lifecycle from containers, by using Volumes you gain much more power and flexibility over your system.
With Volumes we can design our data effectively and decouple it from other parts of the system by storing it dedicated remote locations (Cloud for example) and integrate it with external services like backups, monitoring, encryption and hardware management.