Is it possible to prevent killing some pods when scaling down?

[愿得一人] 2020-12-31 12:53

I need to scale a set of pods that run queue-based workers. Jobs for workers can run for a long time (hours) and should not get interrupted. The number of pods is based on t…

2 Answers
  • 2020-12-31 13:11

    During the termination of a pod, Kubernetes sends a SIGTERM signal to the containers of your pod. You can use that signal to gracefully shut down your app. The problem is that Kubernetes does not wait forever for your application to finish, and in your case the workers may take hours to exit.
    In this case I recommend you use a preStop hook, which must complete before Kubernetes sends the TERM and, eventually, the KILL signal to the container. Here is an example of how to use lifecycle handlers:

    apiVersion: v1
    kind: Pod
    metadata:
      name: lifecycle-demo
    spec:
      containers:
      - name: lifecycle-demo-container
        image: nginx
        lifecycle:
          postStart:
            exec:
              command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
          preStop:
            exec:
              command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]
    
  • 2020-12-31 13:20

    There is a kind of workaround that can give some control over pod termination. I'm not quite sure if it is the best practice, but you can at least try it and test whether it suits your app.

    1. Increase the Deployment's grace period with terminationGracePeriodSeconds: 3600, where 3600 is the duration in seconds of the longest possible task in the app. This makes sure that the pods are not forcibly killed before the grace period ends. Read the docs about the pod termination process in detail. A combined Deployment sketch follows this list.
    2. Define a preStop handler. More details about lifecycle hooks can be found in the docs as well as in the example above. In my case, I've used the script below to create a file which is later used as a trigger to terminate the pod (there are probably more elegant solutions).
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "touch /home/node/app/preStop"]
      
      
    3. Stop the app as soon as the condition is met. When the app exits, the pod terminates as well. It is not possible to kill the PID 1 process from the preStop shell script, so you need to add some logic to the app to terminate itself. In my case it is a NodeJS app: a scheduler runs every 30 seconds and checks whether two conditions are met. !isNodeBusy indicates that the app is allowed to finish, and fs.existsSync('/home/node/app/preStop') indicates that the preStop hook was triggered. The logic might be different for your app, but you get the basic idea.
      // Assuming the node-schedule package; isNodeBusy is app state that is
      // true while a job is being processed.
      const schedule = require('node-schedule');
      const fs = require('fs');
      schedule.scheduleJob('*/30 * * * * *', () => {
        // Exit only when the worker is idle and the preStop hook has fired.
        if (!isNodeBusy && fs.existsSync('/home/node/app/preStop')) {
          process.exit();
        }
      });
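
    Putting steps 1 and 2 together, the relevant part of the Deployment could look roughly like the sketch below. This is an illustration under the answer's assumptions, not the exact manifest; the name, labels, replica count and image are placeholders.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: queue-worker
      spec:
        replicas: 3
        selector:
          matchLabels:
            app: queue-worker
        template:
          metadata:
            labels:
              app: queue-worker
          spec:
            # Step 1: long enough for the longest possible task.
            terminationGracePeriodSeconds: 3600
            containers:
            - name: worker
              image: my-node-worker:latest   # placeholder image
              lifecycle:
                preStop:
                  exec:
                    # Step 2: create the trigger file checked by the app's scheduler.
                    command: ["/bin/sh", "-c", "touch /home/node/app/preStop"]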
      

    Keep in mind that this workaround only helps with voluntary disruptions and is obviously of no help with involuntary disruptions. More info in the docs.
