Question
I have a Kubernetes CronJob that runs on GKE and runs Cucumber JVM tests. If a step fails due to an assertion failure, an unavailable resource, etc., Cucumber rightly throws an exception, which causes the CronJob's job to fail and the Kubernetes pod's status to change to ERROR. This leads to the creation of a new pod that tries to run the same Cucumber tests again, which fails again and retries again.
I don't want any of these retries to happen. If a CronJob job fails, I want it to remain in the failed status and not retry at all. Based on this, I have already tried setting backoffLimit: 0 in combination with restartPolicy: Never and concurrencyPolicy: Forbid, but it still retries by creating new pods and running the tests again.
What am I missing? Here's my kube manifest for the Cronjob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: quality-apatha
  namespace: default
  labels:
    app: quality-apatha
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
            - name: quality-apatha
              image: FOO-IMAGE-PATH
              imagePullPolicy: "Always"
              resources:
                limits:
                  cpu: 500m
                  memory: 512Mi
              env:
                - name: FOO
                  value: BAR
              volumeMounts:
                - name: FOO
                  mountPath: BAR
              args:
                - java
                - -cp
                - qe_java.job.jar:qe_java-1.0-SNAPSHOT-tests.jar
                - org.junit.runner.JUnitCore
                - com.liveramp.qe_java.RunCucumberTest
          restartPolicy: Never
          volumes:
            - name: FOO
              secret:
                secretName: BAR
Is there any other Kubernetes Kind I can use to stop the retrying?
Thank you!
Answer 1:
To keep things as simple as possible, I tested this using the CronJob example from the official Kubernetes documentation, with minor modifications to illustrate what really happens in different scenarios.
I can confirm that when backoffLimit is set to 0 and restartPolicy to Never, everything works exactly as expected and there are no retries. Note that each scheduled run of your Job, which in your example happens at intervals of 60 seconds (schedule: "*/1 * * * *"), IS NOT considered a retry.
Let's take a closer look at the following example (based on the YAML from the official documentation):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          containers:
            - name: hello
              image: busybox
              args:
                - /bin/sh
                - -c
                - non-existing-command
          restartPolicy: Never
It spawns a new Job every 60 seconds according to the schedule, regardless of whether the previous one failed or ran successfully. In this particular example it is configured to fail, because we are trying to run non-existing-command.
You can check what's happening by running:
$ kubectl get pods
NAME                     READY   STATUS              RESTARTS   AGE
hello-1587558720-pgqq9   0/1     Error               0          61s
hello-1587558780-gpzxl   0/1     ContainerCreating   0          1s
As you can see, there are no retries. Although the first Pod failed, the new one was spawned exactly 60 seconds later, according to our specification. I'd like to emphasize it again: this is not a retry.
On the other hand, when we modify the above example and set backoffLimit: 3, we can observe the retries. As you can see, new Pods are now created much more often than every 60 seconds. These are retries.
$ kubectl get pods
NAME                     READY   STATUS   RESTARTS   AGE
hello-1587565260-7db6j   0/1     Error    0          106s
hello-1587565260-tcqhv   0/1     Error    0          104s
hello-1587565260-vnbcl   0/1     Error    0          94s
hello-1587565320-7nc6z   0/1     Error    0          44s
hello-1587565320-l4p8r   0/1     Error    0          14s
hello-1587565320-mjnb6   0/1     Error    0          46s
hello-1587565320-wqbm2   0/1     Error    0          34s
What we can see above are 3 Pod creation attempts for the hello-1587565260 job and 4 attempts (the original first try, which is not counted in backoffLimit: 3, plus 3 retries) for the hello-1587565320 job.
As you can see the jobs themselves are still run according to the schedule, at 60 second intervals:
$ kubectl get jobs
NAME               COMPLETIONS   DURATION   AGE
hello-1587565260   0/1           2m12s      2m12s
hello-1587565320   0/1           72s        72s
hello-1587565380   0/1           11s        11s
However, because backoffLimit is set to 3 this time, whenever the Pod responsible for running the job fails, up to 3 additional retries occur.
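For reference, the retry scenario differs from the first example only in the backoffLimit value. A minimal sketch of the modified manifest, assuming the same hello example and the same batch/v1beta1 API version used above (newer clusters serve CronJob under batch/v1 instead):

```yaml
apiVersion: batch/v1beta1   # assumption: same API version as the example above
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"   # a new Job is still created every minute
  jobTemplate:
    spec:
      backoffLimit: 3       # each failed Job may be retried up to 3 times
      template:
        spec:
          containers:
            - name: hello
              image: busybox
              args:
                - /bin/sh
                - -c
                - non-existing-command   # fails on purpose
          restartPolicy: Never           # failed Pods are replaced, not restarted in place
```

Setting backoffLimit back to 0 removes the extra Pods per Job, while the one-Job-per-minute schedule stays unchanged.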
I hope this helps to clear up any confusion about running CronJobs in Kubernetes.
If you are instead interested in running something just once, not at regular intervals, take a look at a simple Job instead of a CronJob.
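A one-off run without retries could be sketched as a plain Job like the one below. This is only an illustration: the name hello-once is hypothetical, and the container simply reuses the failing busybox command from the example above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-once            # hypothetical name, not from the original example
spec:
  backoffLimit: 0             # do not create replacement Pods after a failure
  template:
    spec:
      containers:
        - name: hello
          image: busybox
          args:
            - /bin/sh
            - -c
            - non-existing-command   # fails on purpose
      restartPolicy: Never           # do not restart the container in place
```

Because a Job has no schedule, nothing creates another Pod after the failure; the Job simply stays in the failed state until it is deleted.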
Also consider changing your cron schedule if you still want to run this particular job on a regular basis but, let's say, once every 24 hours rather than every minute.
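As a rough sketch of that change, the hello example could be scheduled once a day by editing only the schedule field. The choice of midnight here is an assumption for illustration; any standard cron expression works.

```yaml
apiVersion: batch/v1beta1   # assumption: same API version as above; newer clusters use batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "0 0 * * *"     # minute 0, hour 0, every day: one run per 24 hours
  jobTemplate:
    spec:
      backoffLimit: 0       # no retries within a single run
      template:
        spec:
          containers:
            - name: hello
              image: busybox
              args:
                - /bin/sh
                - -c
                - non-existing-command
          restartPolicy: Never
```

With this schedule, a failed run stays failed and the next attempt happens only at the next scheduled time.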
Source: https://stackoverflow.com/questions/61355744/how-do-i-make-sure-my-cronjob-job-does-not-retry-on-failure