Question
I use Google Kubernetes Engine, and I intentionally put an error in the code. I was hoping the rolling update would stop when it discovered the CrashLoopBackOff status, but it didn't.
This page says:
The Deployment controller will stop the bad rollout automatically, and will stop scaling up the new ReplicaSet. This depends on the rollingUpdate parameters (maxUnavailable specifically) that you have specified.
But that's not happening. Does it only apply when the status is ImagePullBackOff?
Below is my configuration.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
P.S. I have already read about liveness/readiness probes, but I don't think they can stop a rolling update. Or can they?
Answer 1:
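It turns out I just needed to set minReadySeconds. With it, the rolling update stops when the pods of the new ReplicaSet are in CrashLoopBackOff or something like Exited with status code 1. A new pod counts as available only after it has stayed ready for that many seconds, so a pod that crashes shortly after starting never counts. The old ReplicaSet is now still available and not updated.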
Here is the new config.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: volume-service
  labels:
    group: volume
    tier: service
spec:
  replicas: 4
  minReadySeconds: 60
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 2
  template:
    metadata:
      labels:
        group: volume
        tier: service
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
Thank you, everyone, for your help!
Answer 2:
The explanation you quoted is correct. It means that the new ReplicaSet (the one with the error) will not proceed to completion; its progression stops at a count of maxSurge + maxUnavailable pods. The old ReplicaSet stays present too.
Here is the example I tried:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
And these are the results:
NAME                                  READY   STATUS             RESTARTS   AGE
pod/volume-service-6bb8dd677f-2xpwn   0/1     ImagePullBackOff   0          42s
pod/volume-service-6bb8dd677f-gcwj6   0/1     ImagePullBackOff   0          42s
pod/volume-service-c98fd8d-kfff2      1/1     Running            0          59s
pod/volume-service-c98fd8d-wcjkz      1/1     Running            0          28m
pod/volume-service-c98fd8d-xvhbm      1/1     Running            0          28m

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.extensions/volume-service-6bb8dd677f   2         2         0       26m
replicaset.extensions/volume-service-c98fd8d      3         3         3       28m
The new ReplicaSet starts only 2 new pods (1 slot from maxUnavailable and 1 slot from maxSurge).
The old ReplicaSet keeps running 3 pods (4 - 1 unavailable).
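The same arithmetic can be applied to the settings from the question. The fragment below is only a sketch: the counts in the comments are derived from the two limits described above, not taken from a cluster run, and they assume that none of the new pods ever becomes available.

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2   # floor on available pods: 4 - 2 = 2
      maxSurge: 2         # ceiling on total pods:   4 + 2 = 6
# Expected stall point if no new pod ever becomes available:
#   old ReplicaSet: 4 - 2 = 2 pods still running
#   new ReplicaSet: 2 + 2 = 4 pods, none ready
```

With these values the new ReplicaSet can reach the full replica count of 4 even though every pod is failing, which can look as if the rollout did not stop.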
The two parameters you set in the rollingUpdate section are the key point, but you can also play with other factors such as readinessProbe, livenessProbe, minReadySeconds, and progressDeadlineSeconds.
For those, see the reference documentation.
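As a rough sketch of how those fields fit together, the fragment below adds a readiness probe and the two timing fields to the Deployment from the question. The probe path /healthz, the port 8080, and all timing values are assumptions chosen for illustration; they do not come from the original config, and the fragment leaves out the metadata and strategy sections.

```yaml
spec:
  replicas: 4
  minReadySeconds: 60            # a pod must stay Ready this long before it counts as available
  progressDeadlineSeconds: 120   # illustrative value; must be greater than minReadySeconds
  template:
    spec:
      containers:
      - name: volume-service
        image: gcr.io/example/volume-service:latest
        readinessProbe:
          httpGet:
            path: /healthz       # hypothetical health endpoint
            port: 8080           # hypothetical container port
          initialDelaySeconds: 5
          periodSeconds: 10
```

When progressDeadlineSeconds is exceeded, the Deployment reports the reason ProgressDeadlineExceeded in its Progressing condition. It is not rolled back automatically.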
Answer 3:
I agree with @Nicola_Ben - I would also consider changing to the setup below:
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1  # I want at least (4)-[1] = 3 available pods.
      maxSurge: 1        # I want maximum (4)+[1] = 5 total running pods.
Or even change maxSurge to 0.
This will help us expose fewer possibly nonfunctional pods (as we would in a canary release).
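A sketch of the maxSurge: 0 variant, assuming the same 4 replicas. The counts in the comments are derived from the two limits, not from a cluster run.

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at least 4 - 1 = 3 pods stay available (cannot be 0 when maxSurge is 0)
      maxSurge: 0         # never more than 4 + 0 = 4 pods in total
# Expected stall point if the new pod never becomes available:
#   old ReplicaSet: 3 pods, new ReplicaSet: 1 pod
```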
As @Hana_Alaydrus suggested, it's important to set minReadySeconds.
In addition, we sometimes need to take more actions after the rollout has executed.
(For example, there are cases when the new pods are not functioning properly but the process running inside the container hasn't crashed.)
A suggestion for a general debug process:
1) First of all, pause the rollout with: kubectl rollout pause deployment <name>
2) Debug the relevant pods and decide how to continue (maybe we can continue with the new release, maybe not).
3) Resume the rollout with: kubectl rollout resume deployment <name>
   Even if we decide to return to the previous release with the undo command (4.B), we first need to resume the rollout.
4.A) Continue with the new release.
4.B) Return to the previous release with: kubectl rollout undo deployment <name>
Source: https://stackoverflow.com/questions/52121422/how-to-automatically-stop-rolling-update-when-crashloopbackoff