Question
I am trying to deploy Weaviate on Azure Kubernetes Service. During the helm deployment I run into a problem where I get the following error messages:
Multi-Attach error for volume "pvc-69db6155-4f28-11ea-b829-b2b3d6f12b6f" Volume is already exclusively attached to one node and can't be attached to another
Unable to mount volumes for pod "esvector-master-0_weaviate(20dafc44-4f58-11ea-b829-b2b3d6f12b6f)": timeout expired waiting for volumes to attach or mount for pod "weaviate"/"esvector-master-0". list of unmounted volumes=[esvector-master]. list of unattached volumes=[esvector-master default-token-ckf7v]
The only thing I changed in values.yaml is the Storage Class Name:
pvc:
  size: 2Gi
  storageClassName: default
I made this change as Azure does not have an NFS storage class installed. Instead I used the default Kubernetes class, which is leveraging Azure Managed Disks.
Does anyone have an idea on how to solve this issue? Thanks!
Answer 1:
We've updated our docs as they weren't complete around the topic of etcd disaster recovery in the helm chart. With the updated docs in mind, let me try to explain what's going on here:
No nfs volumes required by default
By default Weaviate uses Persistent Volumes for its backing databases. The storage classes for those use the defaults, i.e. not nfs. Therefore, when using the default values.yaml, no nfs support is required on the cluster.
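For example (a minimal sketch mirroring the snippet from the question; the exact key names depend on the chart version you deploy), the regular volumes can simply use whatever ReadWriteOnce-capable class your cluster provides, such as the Azure Managed Disks default:

pvc:
  size: 2Gi
  storageClassName: default   # any ReadWriteOnce-capable class works for the regular volumes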
etcd Disaster Recovery
At the time of writing this answer, one of the storage backends for Weaviate is etcd. We use the bitnami etcd chart, which is referenced in the Weaviate chart as a subchart. Etcd does not survive a failure of a quorum of nodes (Source). Especially in a small deployment (e.g. 3 or fewer etcd pods), regular Kubernetes maintenance can easily lead to a disastrous etcd failure. To combat this, the above-mentioned chart from Bitnami contains a disaster recovery mode.
Note that etcd.disasterRecovery.enabled defaults to false, but we recommend setting it to true in production.
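As a rough sketch (assuming the subchart values are nested under the etcd key, as is the usual Helm convention for subcharts; check the chart's own values.yaml for the exact structure), turning this on looks like:

etcd:
  disasterRecovery:
    enabled: true   # defaults to false; recommended for production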
Deploy an nfs provisioner, if etcd disaster recovery is required
The etcd disaster recovery feature, which is part of the bitnami etcd helm chart, requires ReadWriteMany access for the snapshot volumes. The recommendation is to use an nfs provisioner as outlined in the Weaviate Helm Docs.
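In practice this means pointing the snapshot volume at a storage class backed by that provisioner. A sketch, where the pvc keys and the class name nfs are assumptions based on the bitnami etcd chart and should be checked against its values:

etcd:
  disasterRecovery:
    enabled: true
    pvc:
      size: 2Gi
      storageClassName: nfs   # must be backed by a ReadWriteMany-capable (nfs) provisioner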
Why is the nfs-provisioner not part of the Weaviate chart?
It might seem counter-intuitive that disaster recovery is a crucial part of a stable production setup, yet the provisioner is not included in the Weaviate chart as a sub-chart. This has multiple reasons:
- A mix of concerns: The Weaviate chart installs Weaviate with the goal to isolate all effects to a single namespace. The nfs-provisioner makes cluster-wide changes that might not be entirely obvious.
- Multi-tenancy: We can make no assumption that your Kubernetes cluster runs only a single Weaviate instance or even only Weaviate instances. It might be a big shared cluster with multiple tenants. In this case bundling the provisioner would lead to the installation of multiple provisioners, when the cluster can and should have only a single one.
- Different lifecycles lead to circular dependencies: If the provisioner were bundled up with Weaviate, it would become impossible to delete the Weaviate chart. This is because deleting the Weaviate chart also deletes the etcd subchart. The latter removes the nfs volumes used for snapshotting. However, if the provisioner were part of the chart, it would already have been deleted, rendering the cluster unable to delete the nfs volumes.
tl;dr: Deploy the provisioner once in a different namespace, deploy as many Weaviate instances as you like in separate namespaces. This avoids lifecycle differences, issues with multi-tenancy and circular dependencies.
Source: https://stackoverflow.com/questions/60231923/issues-while-deploying-weaviate-on-aks-azure-kubernetes-service