Wait for a Kubernetes job to complete on either failure or success using the command line

暖寄归人 2021-02-18 17:39

What is the best way to wait for a Kubernetes job to complete? I noticed a lot of suggestions to use:

kubectl wait --for=condition=complete job/myjob


        
3 Answers
  • 2021-02-18 18:02

    You can leverage the behaviour of kubectl wait when --timeout=0.

    In this scenario, the command returns immediately with exit code 0 (condition met) or 1 (condition not met) instead of blocking. Here's an example:

    retval_complete=1
    retval_failed=1
    # Poll both conditions every 5 seconds; --timeout=0 makes each check
    # non-blocking, so the loop exits as soon as either condition holds.
    while [[ $retval_complete -ne 0 ]] && [[ $retval_failed -ne 0 ]]; do
      sleep 5
      output=$(kubectl wait --for=condition=failed job/job-name --timeout=0 2>&1)
      retval_failed=$?
      output=$(kubectl wait --for=condition=complete job/job-name --timeout=0 2>&1)
      retval_complete=$?
    done
    
    if [ $retval_failed -eq 0 ]; then
        echo "Job failed. Please check logs."
        exit 1
    fi
    

    So when either condition=failed or condition=complete becomes true, the while loop exits (retval_failed or retval_complete will be 0).

    Next, you only need to check and act on the condition you want. In my case, I want to fail fast and stop execution when the job fails.
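The polling logic above can be exercised locally by stubbing kubectl as a shell function, which is useful for testing the control flow without a cluster. This is a sketch under that assumption: the stub simulates a job that reaches condition=complete on the second polling round, and in real use you would delete the stub so the actual kubectl binary is called.

```shell
#!/bin/bash
attempt=0

# Hypothetical stub standing in for the real kubectl binary: it pretends
# the job's Complete condition becomes true on the second polling round.
kubectl() {
  attempt=$((attempt + 1))
  case "$*" in
    *condition=complete*) [ "$attempt" -ge 4 ] && return 0 || return 1 ;;
    *condition=failed*)   return 1 ;;
  esac
}

retval_complete=1
retval_failed=1
# Same loop as the answer; sleep shortened so the sketch runs quickly.
while [[ $retval_complete -ne 0 ]] && [[ $retval_failed -ne 0 ]]; do
  sleep 0.1
  kubectl wait --for=condition=failed job/myjob --timeout=0
  retval_failed=$?
  kubectl wait --for=condition=complete job/myjob --timeout=0
  retval_complete=$?
done

if [ "$retval_failed" -eq 0 ]; then
  echo "Job failed. Please check logs."
else
  echo "Job completed."
fi
```

The stub is called twice per loop iteration (once per condition), so the loop exits on the second iteration when the complete check finally returns 0.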

  • 2021-02-18 18:03

    Run the first wait condition as a subprocess and capture its PID. If the condition is met, this process will exit with an exit code of 0.

    kubectl wait --for=condition=complete job/myjob &
    completion_pid=$!
    

    Do the same for the failure wait condition. The trick here is to add && exit 1 so that the subprocess returns a non-zero exit code when the job fails.

    kubectl wait --for=condition=failed job/myjob && exit 1 &
    failure_pid=$! 
    

    Then use the Bash builtin wait -n $PID1 $PID2 to wait for one of the conditions to succeed (note: passing PIDs to wait -n requires Bash 5.1 or later). The command will capture the exit code of the first process to exit:

    wait -n $completion_pid $failure_pid
    

    Finally, you can check the actual exit code of wait -n to see whether the job failed or not:

    exit_code=$?
    
    if (( $exit_code == 0 )); then
      echo "Job completed"
    else
      echo "Job failed with exit code ${exit_code}, exiting..."
    fi
    
    exit $exit_code
    

    Complete example:

    # wait for completion as background process - capture PID
    kubectl wait --for=condition=complete job/myjob &
    completion_pid=$!
    
    # wait for failure as background process - capture PID
    kubectl wait --for=condition=failed job/myjob && exit 1 &
    failure_pid=$! 
    
    # capture exit code of the first subprocess to exit
    wait -n $completion_pid $failure_pid
    
    # store exit code in variable
    exit_code=$?
    
    if (( $exit_code == 0 )); then
      echo "Job completed"
    else
      echo "Job failed with exit code ${exit_code}, exiting..."
    fi
    
    exit $exit_code
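One detail worth noting: whichever watcher loses the race keeps running in the background after wait -n returns. A minimal sketch of the same first-of-two pattern, with short sleeps standing in for the two kubectl wait commands (an assumption so it runs without a cluster), shows how to clean up the loser:

```shell
#!/bin/bash
# Placeholder for: kubectl wait --for=condition=complete job/myjob
sleep 0.2 &
completion_pid=$!

# Placeholder for: kubectl wait --for=condition=failed job/myjob && exit 1
{ sleep 2; exit 1; } &
failure_pid=$!

# Returns when the first of the listed processes exits (Bash 5.1+ for PID args).
wait -n "$completion_pid" "$failure_pid"
exit_code=$?

# Stop whichever watcher lost the race so it does not linger.
kill "$completion_pid" "$failure_pid" 2>/dev/null

if (( exit_code == 0 )); then
  echo "Job completed"
else
  echo "Job failed with exit code ${exit_code}"
fi
```

Here the completion placeholder finishes first, so wait -n reports exit code 0 and the failure watcher is killed.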
    
  • 2021-02-18 18:09

    kubectl wait --for=condition=<condition-name> waits for one specific condition, so as far as I know it cannot watch multiple conditions at the same time.

    My workaround is oc get --wait: with --wait, the command exits once the target resource is updated. I monitor the status section of the Job using oc get --wait until the status is updated; an update to the status section means the Job has finished with some status condition.

    If the Job completes successfully, status.conditions.type is immediately set to Complete. But if the Job fails, its pod is restarted automatically regardless of whether restartPolicy is OnFailure or Never. Still, we can treat the Job as Failed if the first status update is not Complete.

    Here is my test evidence:

    • Job yaml for testing successful complete
        # vim job.yml
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: pi
        spec:
          parallelism: 1
          completions: 1
          template:
            metadata:
              name: pi
            spec:
              containers:
              - name: pi
                image: perl
                command: ["perl",  "-wle", "exit 0"]
              restartPolicy: Never
    
    • It shows Complete if the Job completed successfully.
        # oc create -f job.yml &&
          oc get job/pi -o=jsonpath='{.status}' -w &&
          oc get job/pi -o=jsonpath='{.status.conditions[*].type}' | grep -i -E 'failed|complete' || echo "Failed" 
    
        job.batch/pi created
        map[startTime:2019-03-09T12:30:16Z active:1]Complete
    
    • Job yaml for testing failed complete
        # vim job.yml
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: pi
        spec:
          parallelism: 1
          completions: 1
          template:
            metadata:
              name: pi
            spec:
              containers:
              - name: pi
                image: perl
                command: ["perl",  "-wle", "exit 1"]
              restartPolicy: Never
    
    • It shows Failed if the first status update is not Complete. Delete the existing Job resource before re-running the test.
        # oc delete job pi
        job.batch "pi" deleted
    
        # oc create -f job.yml &&
          oc get job/pi -o=jsonpath='{.status}' -w &&
          oc get job/pi -o=jsonpath='{.status.conditions[*].type}' | grep -i -E 'failed|complete' || echo "Failed" 
    
        job.batch/pi created
        map[active:1 startTime:2019-03-09T12:31:05Z]Failed
    
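The grep fallback used in the commands above can be sketched on its own against the sample status strings from the test output (classify is a hypothetical helper name, and the two input strings mimic what oc get -o jsonpath printed above):

```shell
#!/bin/bash
# If the status text contains Failed or Complete, print the match;
# otherwise fall back to printing "Failed", as the || branch does above.
classify() {
  echo "$1" | grep -o -i -E 'failed|complete' || echo "Failed"
}

ok=$(classify 'map[startTime:2019-03-09T12:30:16Z active:1]Complete')
bad=$(classify 'map[active:1 startTime:2019-03-09T12:31:05Z]')
echo "$ok"   # Complete
echo "$bad"  # Failed
```

The second string never received a Complete condition, so grep finds no match, exits non-zero, and the || echo "Failed" branch fires.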

    I hope this helps. :)
