Question
I have a Kubernetes CronJob that runs a scheduled task every 5 minutes. I want to make sure that when a new pod is created at the next scheduled time, the earlier pod has already been terminated. Can Kubernetes terminate the earlier pod before creating the new one?
My YAML is:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-scheduled
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cmm-callout
            env:
            - name: SCHEDULED
              value: "true"
            livenessProbe:
              httpGet:
                path: /myapp/status
                port: 7070
                scheme: HTTPS
              initialDelaySeconds: 120
              timeoutSeconds: 30
              periodSeconds: 120
            image: gcr.io/projectid/folder/my-app:9.0.8000.34
          restartPolicy: Never
How can I make sure the earlier pod is terminated before the new one is created?
Answer 1:
If I understood your case correctly, you want the earlier pod to be terminated before the new one is created.
1. Use spec.jobTemplate.spec.activeDeadlineSeconds.
Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status becomes type: Failed with reason: DeadlineExceeded.
example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      activeDeadlineSeconds: 60
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster && sleep 420
          restartPolicy: Never
2. The second solution is to set concurrencyPolicy: Replace, which replaces the currently running job with a new one.
example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster && sleep 420
          restartPolicy: Never
Resources:
Job Termination
Concurrency Policy
Answer 2:
Did you try setting concurrencyPolicy to Replace? Forbid means the new job run is skipped if the previous one hasn't finished yet.
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy
Allow (default): The cron job allows concurrently running jobs
Forbid: The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
Replace: If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
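Applied to the CronJob from the question, this would be a one-line change (a sketch showing only the top-level spec fields; the jobTemplate stays exactly as in the question):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-scheduled
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Replace  # was Forbid: the still-running job is deleted before the new run starts
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  # jobTemplate: unchanged from the question
```

Note that with Replace the old pod's termination still goes through the normal SIGTERM/grace-period sequence, so a short overlap with the new pod is still possible.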
Answer 3:
I'm using Mark's solution with spec.jobTemplate.spec.activeDeadlineSeconds.
There is one more thing to it, though. From the K8s docs:
Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
What actually happens when the pod is terminated is that Kubernetes sends SIGTERM to the container's main process (PID 1). It does not wait for the process to actually exit. If your container does not terminate gracefully, it stays in the Terminating state for terminationGracePeriodSeconds (30 seconds by default), after which Kubernetes sends SIGKILL. In the meantime, Kubernetes may already schedule the next pod, so the terminating pod overlaps with the newly scheduled one for up to 30 seconds.
This is easily reproducible with this CronJob definition:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cj-sleep
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      activeDeadlineSeconds: 50
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - command:
            - "/usr/local/bin/bash"
            - "-c"
            - "--"
            args:
            - "tail -f /dev/null & wait $!"
            image: bash
            imagePullPolicy: IfNotPresent
            name: cj-sleep
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '* * * * *'
  startingDeadlineSeconds: 100
  successfulJobsHistoryLimit: 5
This is how the scheduling happens:
while true; do date; kubectl get pods -A | grep cj-sleep; sleep 1; done
Thu Sep 3 09:50:51 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Running 0 49s
Thu Sep 3 09:50:53 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 50s
Thu Sep 3 09:50:54 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 51s
Thu Sep 3 09:50:55 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 52s
Thu Sep 3 09:50:56 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 54s
Thu Sep 3 09:50:58 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 56s
Thu Sep 3 09:51:00 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 57s
Thu Sep 3 09:51:01 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 58s
Thu Sep 3 09:51:02 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 59s
Thu Sep 3 09:51:03 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 60s
default cj-sleep-1599126660-l69gd 0/1 ContainerCreating 0 0s
Thu Sep 3 09:51:04 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 61s
default cj-sleep-1599126660-l69gd 0/1 ContainerCreating 0 1s
Thu Sep 3 09:51:05 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 62s
default cj-sleep-1599126660-l69gd 1/1 Running 0 2s
....
Thu Sep 3 09:51:29 UTC 2020
default cj-sleep-1599126600-kzzxg 0/1 Terminating 0 86s
default cj-sleep-1599126660-l69gd 1/1 Running 0 26s
Thu Sep 3 09:51:30 UTC 2020
default cj-sleep-1599126660-l69gd 1/1 Running 0 28s
Thu Sep 3 09:51:32 UTC 2020
default cj-sleep-1599126660-l69gd 1/1 Running 0 29s
There is a detail specific to processes running as PID 1: the kernel does not apply the default SIGTERM action to PID 1, so you have to provide your own handler. In the case of bash, that means adding a trap:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cj-sleep
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      activeDeadlineSeconds: 50
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - command:
            - "/usr/local/bin/bash"
            - "-c"
            - "--"
            args:
            - "trap 'exit' SIGTERM; tail -f /dev/null & wait $!"
            image: bash
            imagePullPolicy: IfNotPresent
            name: cj-sleep
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '* * * * *'
  startingDeadlineSeconds: 100
  successfulJobsHistoryLimit: 5
And now this is how the scheduling happens:
Thu Sep 3 09:47:54 UTC 2020
default cj-sleep-1599126420-sm887 1/1 Terminating 0 52s
Thu Sep 3 09:47:56 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 54s
Thu Sep 3 09:47:57 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 55s
Thu Sep 3 09:47:58 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 56s
Thu Sep 3 09:47:59 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 57s
Thu Sep 3 09:48:00 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 58s
Thu Sep 3 09:48:01 UTC 2020
Thu Sep 3 09:48:02 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 1s
Thu Sep 3 09:48:04 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 2s
Thu Sep 3 09:48:05 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 3s
Thu Sep 3 09:48:06 UTC 2020
default cj-sleep-1599126480-rlhlw 1/1 Running 0 4s
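The effect of the trap can also be checked locally, without a cluster. This is a minimal sketch (assuming a local bash is available) mirroring the `trap 'exit' SIGTERM` in the container args: the trapped process exits promptly on SIGTERM instead of waiting to be SIGKILLed.

```shell
# Start a bash process that traps SIGTERM, like the fixed container command.
# Output is redirected so the orphaned background sleep cannot hold a pipe open.
bash -c 'trap "exit 0" TERM; sleep 30 & wait $!' >/dev/null 2>&1 &
child=$!

sleep 1              # give the child time to install the trap
kill -TERM "$child"  # the same signal Kubernetes sends on pod termination

wait "$child"        # collect the child's exit status: 0, from the trap
echo "child exited with $?"
```

Without the trap, the same kill would rely on bash's default SIGTERM handling, which a PID 1 process inside a container never receives.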
Source: https://stackoverflow.com/questions/57255323/kubernetes-cron-job-terminate-pod-before-creation-of-next-schedule