Question
I was originally trying to run a Job that seemed to get stuck in a CrashLoopBackOff. Here is the Job manifest:
apiVersion: batch/v1
kind: Job
metadata:
  name: es-setup-indexes
  namespace: elk-test
spec:
  template:
    metadata:
      name: es-setup-indexes
    spec:
      containers:
      - name: es-setup-indexes
        image: appropriate/curl
        command: ['curl -H "Content-Type: application/json" -XPUT http://elasticsearch.elk-test.svc.cluster.local:9200/_template/filebeat -d@/etc/filebeat/filebeat.template.json']
        volumeMounts:
        - name: configmap-volume
          mountPath: /etc/filebeat/filebeat.template.json
          subPath: filebeat.template.json
      restartPolicy: Never
      volumes:
      - name: configmap-volume
        configMap:
          name: elasticsearch-configmap-indexes
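As an aside, the crash loop most likely comes from the command: field above: a single-element array makes the runtime look for a binary literally named after the entire string. Below is a minimal sketch of the call the container is meant to make, assuming the Elasticsearch service and the mounted template file are in place; the sh -c wrapping shown in the comment is one possible fix, not the original manifest:
# What the Job is meant to run. In the manifest this could be wrapped in a
# shell instead of a single-string exec array, e.g.
#   command: ["sh", "-c", "curl -H 'Content-Type: application/json' -XPUT http://elasticsearch.elk-test.svc.cluster.local:9200/_template/filebeat -d @/etc/filebeat/filebeat.template.json"]
curl -H "Content-Type: application/json" \
  -XPUT http://elasticsearch.elk-test.svc.cluster.local:9200/_template/filebeat \
  -d @/etc/filebeat/filebeat.template.json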
I tried deleting the job but it would only work if I ran the following command:
kubectl delete job es-setup-indexes --cascade=false
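For context, --cascade=false deletes only the Job object and orphans the pods it created, which is why the Error pods below were left behind. The default cascading form, sketched here for comparison (presumably the variant that did not complete in this case), also removes those pods:
# Default cascading delete (for comparison; not the command that worked here):
# it deletes the Job's pods along with the Job.
kubectl delete job es-setup-indexes -n elk-test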
After that I noticed when running:
kubectl get pods -w
I would get a TON of pods in an Error state, with no obvious way to clean them up. Here is just a small sample of the output when I run get pods:
es-setup-indexes-zvx9c 0/1 Error 0 20h
es-setup-indexes-zw23w 0/1 Error 0 15h
es-setup-indexes-zw57h 0/1 Error 0 21h
es-setup-indexes-zw6l9 0/1 Error 0 16h
es-setup-indexes-zw7fc 0/1 Error 0 22h
es-setup-indexes-zw9bw 0/1 Error 0 12h
es-setup-indexes-zw9ck 0/1 Error 0 1d
es-setup-indexes-zwf54 0/1 Error 0 18h
es-setup-indexes-zwlmg 0/1 Error 0 16h
es-setup-indexes-zwmsm 0/1 Error 0 21h
es-setup-indexes-zwp37 0/1 Error 0 22h
es-setup-indexes-zwzln 0/1 Error 0 22h
es-setup-indexes-zx4g3 0/1 Error 0 11h
es-setup-indexes-zx4hd 0/1 Error 0 21h
es-setup-indexes-zx512 0/1 Error 0 1d
es-setup-indexes-zx638 0/1 Error 0 17h
es-setup-indexes-zx64c 0/1 Error 0 21h
es-setup-indexes-zxczt 0/1 Error 0 15h
es-setup-indexes-zxdzf 0/1 Error 0 14h
es-setup-indexes-zxf56 0/1 Error 0 1d
es-setup-indexes-zxf9r 0/1 Error 0 16h
es-setup-indexes-zxg0m 0/1 Error 0 14h
es-setup-indexes-zxg71 0/1 Error 0 1d
es-setup-indexes-zxgwz 0/1 Error 0 19h
es-setup-indexes-zxkpm 0/1 Error 0 23h
es-setup-indexes-zxkvb 0/1 Error 0 15h
es-setup-indexes-zxpgg 0/1 Error 0 20h
es-setup-indexes-zxqh3 0/1 Error 0 1d
es-setup-indexes-zxr7f 0/1 Error 0 22h
es-setup-indexes-zxxbs 0/1 Error 0 13h
es-setup-indexes-zz7xr 0/1 Error 0 12h
es-setup-indexes-zzbjq 0/1 Error 0 13h
es-setup-indexes-zzc0z 0/1 Error 0 16h
es-setup-indexes-zzdb6 0/1 Error 0 1d
es-setup-indexes-zzjh2 0/1 Error 0 21h
es-setup-indexes-zzm77 0/1 Error 0 1d
es-setup-indexes-zzqt5 0/1 Error 0 12h
es-setup-indexes-zzr79 0/1 Error 0 16h
es-setup-indexes-zzsfx 0/1 Error 0 1d
es-setup-indexes-zzx1r 0/1 Error 0 21h
es-setup-indexes-zzx6j 0/1 Error 0 1d
kibana-kq51v 1/1 Running 0 10h
But if I look at the jobs I get nothing related to that anymore:
$ kubectl get jobs --all-namespaces
NAMESPACE NAME DESIRED SUCCESSFUL AGE
kube-system configure-calico 1 1 46d
I've also noticed that kubectl seems much slower to respond. I don't know whether the pods are continuously being restarted or are stuck in some broken state, but it would be great if someone could tell me how to troubleshoot this, as I have not come across an issue like this in Kubernetes before.
Kube info:
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:44:38Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"6", GitVersion:"v1.6.1", GitCommit:"b0b7a323cc5a4a2019b2e9520c21c7830b7f708e", GitTreeState:"clean", BuildDate:"2017-04-03T20:33:27Z", GoVersion:"go1.7.5", Compiler:"gc", Platform:"linux/amd64"}
Answer 1:
Here is a quick way to fix it :)
kubectl get pods | grep Error | cut -d' ' -f 1 | xargs kubectl delete pod
Edit: add the -a flag to kubectl get pods if you are using an old version of k8s.
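A variant of the same pipeline scoped to the namespace from the question (an illustrative substitution, not part of the original answer):
kubectl get pods -n elk-test | grep Error | cut -d' ' -f 1 | xargs kubectl delete pod -n elk-test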
Answer 2:
kubectl delete pods --field-selector status.phase=Failed -n <your-namespace>
This cleans up any failed pods in <your-namespace>.
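Two optional variations on the same idea, assuming a reasonably recent kubectl (these flags are not part of the original answer): preview what would be removed, or clean up failed pods across all namespaces:
# Preview only (newer kubectl; older releases used a bare --dry-run flag):
kubectl delete pods --field-selector status.phase=Failed -n elk-test --dry-run=client
# Failed pods in every namespace:
kubectl delete pods --field-selector status.phase=Failed --all-namespaces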
Answer 3:
I usually remove all the Error pods with this command.
kubectl delete pod `kubectl get pods --namespace <yournamespace> | awk '$3 == "Error" {print $1}'` --namespace <yournamespace>
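For example, with the namespace from the question filled in (illustrative only):
kubectl delete pod `kubectl get pods --namespace elk-test | awk '$3 == "Error" {print $1}'` --namespace elk-test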
Answer 4:
The solution was what @johnharris85 suggested in the comments: I had to delete all the pods manually. To do that, I ran the following:
kubectl get pods -w | tee all-pods.txt
That dumped all my pods to a file; then I filtered it and deleted only the ones I wanted:
kubectl delete pod $(more all-pods.txt | grep es-setup-index | awk '{print $1}')
Note: I had about 9292 pods; it took about 1-2 hours to delete them all.
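A rough sketch of how a bulk delete of that size could be sped up (not part of the original answer): batch the names with xargs and, on newer kubectl, skip waiting for each pod to finish terminating:
# Batch 50 names per kubectl call; --wait=false (newer kubectl) returns without
# waiting for each pod's termination to complete.
grep es-setup-index all-pods.txt | awk '{print $1}' | xargs -n 50 kubectl delete pod --wait=false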
Source: https://stackoverflow.com/questions/44379805/kubernetes-has-a-ton-of-pods-in-error-state-that-cant-seem-to-be-cleared