Monitoring and alerting on pod status or restart with Google Container Engine (GKE) and Stackdriver

邮差的信 submitted on 2020-08-01 03:07:52

Question


Is there a way to monitor the pod status and restart count of pods running in a GKE cluster with Stackdriver?

While I can see CPU, memory and disk usage metrics for all pods in Stackdriver, there seems to be no way of getting metrics about crashing pods, or about pods in a replica set being restarted due to crashes.

I'm using a Kubernetes replica set to manage the pods, so they are respawned under a new name when they crash. As far as I can tell, the metrics in Stackdriver are keyed by pod name (which is unique for the lifetime of the pod), which doesn't seem very sensible.

Alerting on pod failures seems like such a natural thing that it is hard to believe it is not supported at the moment. The monitoring and alerting capabilities that I get from Stackdriver for Google Container Engine, as they stand, seem rather useless because they are all bound to pods whose lifetime can be very short.

So if this doesn't work out of the box, are there known workarounds or best practices for monitoring continuously crashing pods?


Answer 1:


In my cluster (a bare-metal k8s cluster), I use kube-state-metrics https://github.com/kubernetes/kube-state-metrics to do what you want. The project lives under the kubernetes organization on GitHub and is quite easy to use. Once it is deployed, you can use the kube_pod_container_status_restarts metric (newer releases expose it as kube_pod_container_status_restarts_total) to find out whether a container has restarted.
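As a rough illustration of reading that metric, here is a minimal Python sketch that scans text scraped from the kube-state-metrics /metrics endpoint and lists containers whose restart count has reached a threshold. The function name `restarted_containers`, the threshold default and the sample data are made up for this example; it also assumes the usual `namespace`, `pod` and `container` labels are present on the metric.

```python
import re

# One sample line of the Prometheus text exposition format, e.g.
#   metric_name{label="value",other="x"} 3
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Both spellings are accepted because the metric name differs between releases.
RESTART_METRICS = (
    "kube_pod_container_status_restarts",
    "kube_pod_container_status_restarts_total",
)


def restarted_containers(metrics_text, threshold=1):
    """Return (namespace, pod, container, restarts) for counts >= threshold.

    Hypothetical helper for illustration; not part of kube-state-metrics.
    """
    hits = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") not in RESTART_METRICS:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        restarts = int(float(match.group("value")))
        if restarts >= threshold:
            hits.append((
                labels.get("namespace"),
                labels.get("pod"),
                labels.get("container"),
                restarts,
            ))
    return hits


if __name__ == "__main__":
    # Made-up sample data, not output from a real cluster.
    sample = (
        'kube_pod_container_status_restarts'
        '{namespace="default",pod="web-abc12",container="app"} 4\n'
        'kube_pod_container_status_restarts'
        '{namespace="default",pod="web-def34",container="app"} 0\n'
    )
    for hit in restarted_containers(sample):
        print(hit)
```

In practice this check would more likely live in an alerting rule of whatever system scrapes kube-state-metrics; the sketch only shows what the metric carries.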




Answer 2:


You can achieve this manually with the following:

1) In Logs Viewer, create the following filter:

resource.labels.project_id="<PROJECT_ID>"
resource.labels.cluster_name="<CLUSTER_NAME>"
resource.labels.namespace_name="<NAMESPACE, or default>"
jsonPayload.message:"Killing container"

2) Create a metric by clicking the Create Metric button above the filter input and filling in the details.

3) You may now track this metric in Stackdriver.

I would be happy to hear of a built-in metric that could replace this.
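If the same filter is needed for several clusters or namespaces, the text from step 1 can be generated instead of typed by hand. The following is a small Python sketch under that assumption; the function name `killing_container_filter` and the example values are hypothetical, and the output is meant to be pasted into the Logs Viewer filter box exactly as in step 1.

```python
def killing_container_filter(project_id, cluster_name, namespace="default"):
    """Build the log filter from step 1 for one project, cluster and namespace.

    Hypothetical helper for illustration; it only formats text and does not
    call any Google Cloud API.
    """
    clauses = [
        f'resource.labels.project_id="{project_id}"',
        f'resource.labels.cluster_name="{cluster_name}"',
        f'resource.labels.namespace_name="{namespace}"',
        # Substring match on the log message, as in the original filter.
        'jsonPayload.message:"Killing container"',
    ]
    return "\n".join(clauses)


if __name__ == "__main__":
    # Placeholder values, not a real project or cluster.
    print(killing_container_filter("my-project", "my-cluster"))
```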




Answer 3:


Remember that you can always raise a feature request if the available options are not enough.



Source: https://stackoverflow.com/questions/43789276/monitoring-and-alerting-on-pod-status-or-restart-with-google-container-engine-g
