I am evaluating Kubernetes as a platform for our new application. For now, it looks all very exciting! However, I’m running into a problem: I’m hosting my cluster on GCE an
Update: The best choice is probably Cloud Filestore, a managed NFS system. This gives you full random read/write access to files, unlike GCS which only supports upload/download. See docs here.
Original: Have you tried Google Cloud Storage? You might even be able to use the FUSE adapter to map it like a network disk.
I just achieve this with an application made with 3 containerized micro-services, I have one of this that is responsible to store and share files, so the application is storing files and retrieving them on a folder, this folder is passed via application property. There is a secured rest entry point that is allowing submission and retrieving of files (basically at every submission it is creating a unique id that is returned and can be used to scan the folder for a file). Passing this application from docker-compose to kubernetes I had your same problem : I need a global disk so I can have multiple replica of the container, so when the other micro-services will send a request to one of the replica, they will always be able to send any submitted file, not only the replica file managed at submission. I solved by creating a persistent volume, associated to an persistent volume claim, this volume claim is associated to a deployment (note: not a Statefulset, that it will create a disk for every pod), at this point you have to associate the mounted volume path with the container storing folder path.
So what is important is just the persistent volume claim name and the fact that PV has more available memory of PVC, and obviously the matching with the deployment with the right labels. Then in the deployment you can pass in the spec:
volumes:
- name: store-folder
persistentVolumeClaim:
claimName: [pvc_name]
into the container settings:
volumeMounts:
- name: store-folder
mountPath: "/stored-files"
and in env. block:
containers:
....
- env:
- name: any-property-used-inside-the-application-for-saving-files
value: /stored-files
So, from volume, you bind the pvc to the deployment, than from volume mounts, you bind the disk to a directory, than via environment variable you are able to pass the persistent disk directory. It is fundamental that your declare both PVC and PV, without PV it will work like any pods has its own folder.
If it is logs that you are looking to write to disk, I suggest you look at logspout https://github.com/gliderlabs/logspout. This will collect each pod's logging and then you can use google cloud platforms' fairly new logging service that uses fluentd. That way all logs from each pod are collected into a single place.
If it is data that would normally write to a database or something of that nature, I recommend having a separate server outside of the kubernetes cluster that runs the database.
EDIT
For sharing files amongst pods, I recommend mounting a google cloud storage drive to each node in your kubernetes cluster, then setting that up as a volume into each pod that mounts to that mounted directory on the node and not directly to the drive. Having it mount to each node is good because pods do not run on designated nodes, so it's best to centralize it in that case.
NFS is a built-in volume plugin and supports multiple pod writers. There are no special build options to get NFS working in Kube.
I work at Red Hat on Kubernetes, focused mainly on storage.
@Marco - in regards to the Maven related question my advice would be to stop looking at this as a centralized storage problem and perhaps think of it as a service issue.
I've run Maven repositories under HTTP in the past (read-only). I would simply create a Maven repo and expose it over Apache/Nginx in its own pod (docker container) with what ever dedicated storage you need for just that pod and then use service discovery to link it to your application and build systems.
Have you looked at kubernetes Volumes ? You are probably looking at creating a gcePersistentDisk
A gcePersistentDisk volume mounts a Google Compute Engine (GCE) Persistent Disk into your pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of a PD are preserved and the volume is merely unmounted. This means that a PD can be pre-populated with data, and that data can be “handed off” between pods. Important: You must create a PD using gcloud or the GCE API or UI before you can use it There are some restrictions when using a gcePersistentDisk: the nodes on which pods are running must be GCE VMs those VMs need to be in the same GCE project and zone as the PD A feature of PD is that they can be mounted as read-only by multiple consumers simultaneously. This means that you can pre-populate a PD with your dataset and then serve it in parallel from as many pods as you need. Unfortunately, PDs can only be mounted by a single consumer in read-write mode - no simultaneous writers allowed. Using a PD on a pod controlled by a ReplicationController will fail unless the PD is read-only or the replica count is 0 or 1.
To support multiple writes from various pods you will probably need to create one beefy pod which exposes a thrift or socket types service which exposes readFromDisk and WriteToDisk methods.