Issues while deploying Weaviate on AKS (Azure Kubernetes Service)

Submitted by 给你一囗甜甜゛ on 2020-04-16 06:10:21

Question


I am trying to deploy Weaviate on Azure Kubernetes Service (AKS). During the Helm deployment I run into a problem where I get the following error messages:

Multi-Attach error for volume "pvc-69db6155-4f28-11ea-b829-b2b3d6f12b6f" Volume is already exclusively attached to one node and can't be attached to another
Unable to mount volumes for pod "esvector-master-0_weaviate(20dafc44-4f58-11ea-b829-b2b3d6f12b6f)": timeout expired waiting for volumes to attach or mount for pod "weaviate"/"esvector-master-0". list of unmounted volumes=[esvector-master]. list of unattached volumes=[esvector-master default-token-ckf7v]

The only thing I changed in values.yaml is the Storage Class Name:

pvc:
  size: 2Gi
  storageClassName: default

I made this change because Azure does not have an NFS storage class installed. Instead I used the default Kubernetes class, which leverages Azure Managed Disks.

Does anyone have an idea on how to solve this issue? Thanks!


Answer 1:


We've updated our docs, as they weren't complete on the topic of etcd disaster recovery in the Helm chart. With the updated docs in mind, let me try to explain what's going on here:

No nfs volumes required by default

By default, Weaviate uses Persistent Volumes for its backing databases. The storage classes for those use the defaults, i.e. not nfs. Therefore, when using the default values.yaml, no nfs support is required on the cluster.

etcd Disaster Recovery

At the time of writing this answer, one of the storage backends for Weaviate is etcd. We use the Bitnami etcd chart, which is referenced in the Weaviate chart as a subchart. Etcd does not survive a failure of a quorum of nodes (Source). Especially in a small deployment (e.g. 3 or fewer etcd pods), regular Kubernetes maintenance can easily lead to a disastrous etcd failure. To combat this, the above-mentioned Bitnami chart contains a disaster recovery mode.

Note that etcd.disasterRecovery.enabled defaults to false, but we recommend setting it to true in production.
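
As a minimal sketch, enabling it in the Weaviate chart's values.yaml might look like the snippet below. Only the etcd.disasterRecovery.enabled key is confirmed here; verify any further disaster-recovery settings against the chart's own values.yaml rather than treating this as a complete configuration:

etcd:
  disasterRecovery:
    enabled: true   # defaults to false; recommended true in production (see note above)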

Deploy an nfs provisioner if etcd disaster recovery is required

The etcd disaster recovery feature, which is part of the Bitnami etcd Helm chart, requires ReadWriteMany access for the snapshot volumes. The recommendation is to use an nfs provisioner, as outlined in the Weaviate Helm docs.
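
To make the access-mode requirement concrete, the sketch below shows the kind of PersistentVolumeClaim the snapshot feature needs. The claim name, namespace, size, and the storage class name "nfs" are illustrative assumptions that depend on how you deploy and name your provisioner; the actual claim is created by the chart, not by hand:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd-snapshot-example    # hypothetical name, for illustration only
  namespace: weaviate
spec:
  accessModes:
    - ReadWriteMany              # Azure managed disks only offer ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs          # assumes your nfs provisioner registers a class named "nfs"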

Why is the nfs-provisioner not part of the Weaviate chart?

It might seem counter-intuitive that disaster recovery is a crucial part of a stable production setup, yet the provisioner is not included in the Weaviate chart as a subchart. There are multiple reasons for this:

  • A mix of concerns: The Weaviate chart installs Weaviate with the goal of isolating all effects to a single namespace. The nfs-provisioner makes cluster-wide changes that might not be entirely obvious.
  • Multi-tenancy: We can make no assumption that your Kubernetes cluster runs only a single Weaviate instance, or even only Weaviate instances. It might be a big shared cluster with multiple tenants. In this case, bundling the provisioner would lead to the installation of multiple provisioners, even though the cluster can and should have only a single one.
  • Different lifecycles lead to circular dependencies: If the provisioner were bundled with Weaviate, it would become impossible to cleanly delete the Weaviate chart. Deleting the Weaviate chart also deletes the etcd subchart, which in turn removes the nfs volumes used for snapshotting. However, if the provisioner were part of the same chart, it would already have been deleted at that point, leaving the cluster unable to clean up the nfs volumes.

tl;dr: Deploy the provisioner once in its own namespace, then deploy as many Weaviate instances as you like in separate namespaces. This avoids lifecycle differences, multi-tenancy issues, and circular dependencies.



Source: https://stackoverflow.com/questions/60231923/issues-while-deploying-weaviate-on-aks-azure-kubernetes-service
