Question
I have a Kubernetes CronJob that runs a scheduled task every 5 minutes. I want to make sure that when a new pod is created at the next scheduled time, the earlier pod has already been terminated. Can Kubernetes terminate the earlier pod before creating the new one?
My YAML is:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-scheduled
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cmm-callout
            env:
            - name: SCHEDULED
              value: "true"
            livenessProbe:
              httpGet:
                path: /myapp/status
                port: 7070
                scheme: HTTPS
              initialDelaySeconds: 120
              timeoutSeconds: 30
              periodSeconds: 120
            image: gcr.io/projectid/folder/my-app:9.0.8000.34
          restartPolicy: Never
How can I make sure the earlier pod is terminated before the new one is created?
Answer 1:
If I understood your case correctly, you want the earlier pod to be terminated before the new one is created.
1. Use spec.jobTemplate.spec.activeDeadlineSeconds.
Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status becomes type: Failed with reason: DeadlineExceeded.
example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      activeDeadlineSeconds: 60
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster && sleep 420
          restartPolicy: Never
2. The second solution is to set concurrencyPolicy: Replace, which replaces the currently running job with a new one.
example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster && sleep 420
          restartPolicy: Never
Resources:
Job Termination
Concurrency Policy
Answer 2:
Did you try setting concurrencyPolicy to Replace? Forbid means the new job run is skipped if the previous one hasn't finished yet.
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy
Allow (default): The cron job allows concurrently running jobs
Forbid: The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
Replace: If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
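Applied to the CronJob from the question, this would be a one-line change (a sketch showing only the top-level spec fields; the jobTemplate stays exactly as in the question):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-scheduled
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Replace  # was Forbid: the still-running job is deleted before the new run starts
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  # jobTemplate: unchanged from the question
```

Note that with Replace the old pod's termination still goes through the normal SIGTERM/grace-period sequence, so a short overlap with the new pod is still possible.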
Answer 3:
I'm using Mark's solution with spec.jobTemplate.spec.activeDeadlineSeconds.
There is one more thing to it, though. From the K8s docs:
Once a Job reaches activeDeadlineSeconds, all of its running Pods are terminated and the Job status will become type: Failed with reason: DeadlineExceeded.
What actually happens when the pod is terminated is that Kubernetes sends SIGTERM to the container's main process (PID 1). It does not wait for the process to actually exit. If your container does not terminate gracefully, it stays in the Terminating state for terminationGracePeriodSeconds (30 seconds by default), after which Kubernetes sends SIGKILL. In the meantime, Kubernetes may already schedule the next pod, so the terminating pod overlaps with the newly scheduled one for up to 30 seconds.
This is easily reproducible with this CronJob definition:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cj-sleep
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      activeDeadlineSeconds: 50
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - command:
            - "/usr/local/bin/bash"
            - "-c"
            - "--"
            args:
            - "tail -f /dev/null & wait $!"
            image: bash
            imagePullPolicy: IfNotPresent
            name: cj-sleep
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '* * * * *'
  startingDeadlineSeconds: 100
  successfulJobsHistoryLimit: 5
This is how the scheduling happens:
while true; do date; kubectl get pods -A | grep cj-sleep; sleep 1; done
Thu Sep 3 09:50:51 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Running 0 49s
Thu Sep 3 09:50:53 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 50s
Thu Sep 3 09:50:54 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 51s
Thu Sep 3 09:50:55 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 52s
Thu Sep 3 09:50:56 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 54s
Thu Sep 3 09:50:58 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 56s
Thu Sep 3 09:51:00 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 57s
Thu Sep 3 09:51:01 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 58s
Thu Sep 3 09:51:02 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 59s
Thu Sep 3 09:51:03 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 60s
default cj-sleep-1599126660-l69gd 0/1 ContainerCreating 0 0s
Thu Sep 3 09:51:04 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 61s
default cj-sleep-1599126660-l69gd 0/1 ContainerCreating 0 1s
Thu Sep 3 09:51:05 UTC 2020
default cj-sleep-1599126600-kzzxg 1/1 Terminating 0 62s
default cj-sleep-1599126660-l69gd 1/1 Running 0 2s
....
Thu Sep 3 09:51:29 UTC 2020
default cj-sleep-1599126600-kzzxg 0/1 Terminating 0 86s
default cj-sleep-1599126660-l69gd 1/1 Running 0 26s
Thu Sep 3 09:51:30 UTC 2020
default cj-sleep-1599126660-l69gd 1/1 Running 0 28s
Thu Sep 3 09:51:32 UTC 2020
default cj-sleep-1599126660-l69gd 1/1 Running 0 29s
There is a detail specific to processes running as PID 1: the kernel does not apply the default SIGTERM action to PID 1, so you have to provide your own handler. In the case of bash, that means adding a trap:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cj-sleep
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 5
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      activeDeadlineSeconds: 50
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - command:
            - "/usr/local/bin/bash"
            - "-c"
            - "--"
            args:
            - "trap 'exit' SIGTERM; tail -f /dev/null & wait $!"
            image: bash
            imagePullPolicy: IfNotPresent
            name: cj-sleep
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '* * * * *'
  startingDeadlineSeconds: 100
  successfulJobsHistoryLimit: 5
And now this is how the scheduling happens:
Thu Sep 3 09:47:54 UTC 2020
default cj-sleep-1599126420-sm887 1/1 Terminating 0 52s
Thu Sep 3 09:47:56 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 54s
Thu Sep 3 09:47:57 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 55s
Thu Sep 3 09:47:58 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 56s
Thu Sep 3 09:47:59 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 57s
Thu Sep 3 09:48:00 UTC 2020
default cj-sleep-1599126420-sm887 0/1 Terminating 0 58s
Thu Sep 3 09:48:01 UTC 2020
Thu Sep 3 09:48:02 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 1s
Thu Sep 3 09:48:04 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 2s
Thu Sep 3 09:48:05 UTC 2020
default cj-sleep-1599126480-rlhlw 0/1 ContainerCreating 0 3s
Thu Sep 3 09:48:06 UTC 2020
default cj-sleep-1599126480-rlhlw 1/1 Running 0 4s
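The effect of the trap can also be checked locally, without a cluster. This is a minimal sketch (assuming a local bash is available) mirroring the `trap 'exit' SIGTERM` in the container args: the trapped process exits promptly on SIGTERM instead of waiting to be SIGKILLed.

```shell
# Start a bash process that traps SIGTERM, like the fixed container command.
# Output is redirected so the orphaned background sleep cannot hold a pipe open.
bash -c 'trap "exit 0" TERM; sleep 30 & wait $!' >/dev/null 2>&1 &
child=$!

sleep 1              # give the child time to install the trap
kill -TERM "$child"  # the same signal Kubernetes sends on pod termination

wait "$child"        # collect the child's exit status: 0, from the trap
echo "child exited with $?"
```

Without the trap, the same kill would rely on bash's default SIGTERM handling, which a PID 1 process inside a container never receives.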
Source: https://stackoverflow.com/questions/57255323/kubernetes-cron-job-terminate-pod-before-creation-of-next-schedule