Issues while deploying Weaviate on AKS (Azure Kubernetes Service)

Submitted by Deadly on 2020-04-16 06:08:21

Question


I am trying to deploy Weaviate on Azure Kubernetes Service. During the Helm deployment I run into a problem and get the following error messages:

Multi-Attach error for volume "pvc-69db6155-4f28-11ea-b829-b2b3d6f12b6f" Volume is already exclusively attached to one node and can't be attached to another
Unable to mount volumes for pod "esvector-master-0_weaviate(20dafc44-4f58-11ea-b829-b2b3d6f12b6f)": timeout expired waiting for volumes to attach or mount for pod "weaviate"/"esvector-master-0". list of unmounted volumes=[esvector-master]. list of unattached volumes=[esvector-master default-token-ckf7v]

The only thing I changed in values.yaml is the Storage Class Name:

pvc:
  size: 2Gi
  storageClassName: default

I made this change because Azure does not have an NFS storage class installed by default. Instead I used the default Kubernetes storage class, which leverages Azure Managed Disks.
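
For reference, the storage classes available on the cluster can be listed like this (on AKS the default class is typically backed by azure-disk, which only supports ReadWriteOnce access):

# List available storage classes and inspect the default one
kubectl get storageclass
kubectl describe storageclass default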

Does anyone have an idea on how to solve this issue? Thanks!


Answer 1:


We've updated our docs as they weren't complete around the topic of etcd disaster recovery in the helm chart. With the updated docs in mind, let me try to explain what's going on here:

No nfs volumes required by default

By default Weaviate uses Persistent Volumes for its backing databases. The storage classes for those use the defaults, i.e. not nfs. Therefore when using the default values.yaml no nfs support is required on the cluster.

etcd Disaster Recovery

At the time of writing this answer, one of the storage backends for Weaviate is etcd. We use the Bitnami etcd chart, which is referenced in the Weaviate chart as a subchart. etcd does not survive a failure of a quorum of nodes (Source). Especially in a small deployment (e.g. 3 or fewer etcd pods), regular Kubernetes maintenance can easily lead to a disastrous etcd failure. To combat this, the above-mentioned chart from Bitnami contains a disaster recovery mode.
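
If you want to check whether the etcd members are still healthy after such an event, you can inspect the etcd pods directly. The label, pod name and namespace below are placeholders and depend on your release; additional auth flags may be required depending on how the subchart is configured, so treat this as a rough sketch:

# List the etcd pods of the Weaviate release (label/namespace are examples)
kubectl get pods -n weaviate -l app.kubernetes.io/name=etcd

# Ask one member about cluster health (pod name is a placeholder)
kubectl exec -n weaviate <etcd-pod-0> -- etcdctl endpoint health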

Note that etcd.disasterRecovery.enabled defaults to false, but we recommend setting it to true in production.
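
As an illustration, enabling it in the Weaviate chart's values.yaml could look roughly like the excerpt below; the exact keys come from the bundled Bitnami etcd subchart and may differ between chart versions, so treat this as a sketch rather than the authoritative configuration:

etcd:
  disasterRecovery:
    enabled: true
    pvc:
      size: 2Gi
      storageClassName: nfs   # must be a ReadWriteMany-capable class (see below)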

Deploy an nfs provisioner if etcd disaster recovery is required

The etcd disaster recovery feature, which is part of the Bitnami etcd Helm chart, requires ReadWriteMany access for the snapshot volumes. The recommendation is to use an nfs provisioner as outlined in the Weaviate Helm Docs.
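
As a rough sketch of what that could look like (the chart source, release name and values are illustrative assumptions; follow the Weaviate Helm docs for the supported setup):

# Install an nfs-server-provisioner once, in its own namespace
helm repo add stable https://charts.helm.sh/stable
helm install nfs-provisioner stable/nfs-server-provisioner \
  --namespace nfs-provisioner --create-namespace \
  --set persistence.enabled=true \
  --set persistence.size=5Gi \
  --set storageClass.name=nfs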

Why is the nfs-provisioner not part of the Weaviate chart?

It might seem counter-intuitive that disaster recovery is a crucial part of a stable production setup, yet the provisioner is not included in the Weaviate chart as a sub-chart. There are multiple reasons for this:

  • A mix of concerns: The Weaviate chart installs Weaviate with the goal of isolating all effects to a single namespace. The nfs-provisioner makes cluster-wide changes that might not be entirely obvious.
  • Multi-tenancy: We can make no assumption that your Kubernetes cluster runs only a single Weaviate instance, or even only Weaviate instances. It might be a big shared cluster with multiple tenants. In that case, bundling the provisioner would lead to the installation of multiple provisioners, when the cluster can and should have only a single one.
  • Different lifecycles lead to circular dependencies: If the provisioner were bundled with Weaviate, it would become impossible to cleanly delete the Weaviate chart. Deleting the Weaviate chart also deletes the etcd subchart, which in turn removes the nfs volumes used for snapshotting. However, if the provisioner were part of the same chart, it would already have been deleted at that point, leaving the cluster unable to clean up the nfs volumes.

tl;dr: Deploy the provisioner once in a different namespace, deploy as many Weaviate instances as you like in separate namespaces. This avoids lifecycle differences, issues with multi-tenancy and circular dependencies.
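
In practice the order of operations could look something like this (release names, namespaces and the chart location are placeholders):

# 1. Install the nfs provisioner once, in its own namespace (see the sketch above).
# 2. Install each Weaviate instance into its own namespace with its own values file.
helm install weaviate-team-a ./weaviate -f values-team-a.yaml \
  --namespace weaviate-team-a --create-namespace
helm install weaviate-team-b ./weaviate -f values-team-b.yaml \
  --namespace weaviate-team-b --create-namespace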



Source: https://stackoverflow.com/questions/60231923/issues-while-deploying-weaviate-on-aks-azure-kubernetes-service
