I updated a GKE cluster from 1.13 to 1.15.9-gke.12. In the process I switched from legacy logging to Stackdriver Kubernetes Engine Monitoring. Now I have the problem that the
The issue is caused by the memory limit set on the metadata-agent deployment being too low: the pod needs more memory than the limit allows, so it gets OOM-killed.
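To confirm that the pod really is being OOM-killed, something like the following sketch may help. It assumes kubectl access to the cluster; `find_oom_killed` is a hypothetical helper name, and the filter simply matches pod names containing "metadata-agent".

```shell
#!/bin/sh
# find_oom_killed (hypothetical helper): reads "pod-name<TAB>reason" lines on
# stdin and prints the names of metadata-agent pods whose last container
# termination reason was OOMKilled.
find_oom_killed() {
  awk -F '\t' '$1 ~ /metadata-agent/ && $2 ~ /OOMKilled/ { print $1 }'
}

# Against a real cluster (left commented out, since it needs kubectl access):
# kubectl get pods -n kube-system \
#   -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
#   | find_oom_killed
```

If the pipeline prints a pod name, the last restart of that pod was due to the memory limit.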
There is a workaround for this issue until it is fixed.
You can override the base resources in the metadata-agent ConfigMap with:
kubectl edit cm -n kube-system metadata-agent-config
Setting baseMemory: 50Mi should be enough. If it doesn't work, use a higher value such as 100Mi or 200Mi.
The metadata-agent-config ConfigMap should then look something like this:
apiVersion: v1
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
    baseMemory: 50Mi
kind: ConfigMap
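If you would rather not use the interactive editor, the same change can probably be made with `kubectl patch`. This is a minimal sketch, not a tested procedure for every GKE version; `nanny_patch` is a hypothetical helper name, and the patch replaces the whole NannyConfiguration value with the three lines shown above.

```shell
#!/bin/sh
# nanny_patch (hypothetical helper): prints a JSON merge patch that sets the
# NannyConfiguration entry of the ConfigMap with the given baseMemory value.
nanny_patch() {
  printf '{"data":{"NannyConfiguration":"apiVersion: nannyconfig/v1alpha1\\nkind: NannyConfiguration\\nbaseMemory: %s"}}' "$1"
}

# Apply it against the cluster (left commented out, since it needs kubectl access):
# kubectl patch configmap metadata-agent-config -n kube-system \
#   --type merge -p "$(nanny_patch 50Mi)"
```

Calling `nanny_patch 100Mi` or `nanny_patch 200Mi` instead gives the patch for the higher values.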
Note also that you need to restart the deployment, as the ConfigMap change doesn't get picked up automatically:
kubectl delete deployment -n kube-system stackdriver-metadata-agent-cluster-level
For more details, see the addon-resizer documentation.