Cannot understand the deadlock scenario in `When should you use a startup probe?` in the Kubernetes docs

允我心安, submitted on 2021-02-19 05:26:11

Question


In the official Kubernetes docs, I was reading the page about container probes, specifically the section "When should you use a startup probe?", which states:

If your container usually starts in more than initialDelaySeconds + failureThreshold × periodSeconds, you should specify a startup probe that checks the same endpoint as the liveness probe. The default for periodSeconds is 10s. You should then set its failureThreshold high enough to allow the container to start, without changing the default values of the liveness probe. This helps to protect against deadlocks.
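To make the docs' rule concrete, here is a small Python sketch that evaluates initialDelaySeconds + failureThreshold × periodSeconds. The defaults used (initialDelaySeconds = 0, failureThreshold = 3, periodSeconds = 10) are the Kubernetes defaults for these fields; the helper names are hypothetical, and the result is only the docs' rough budget, not the kubelet's exact timing.

```python
def probe_budget(initial_delay_seconds: int = 0,
                 failure_threshold: int = 3,
                 period_seconds: int = 10) -> int:
    """Rough time (s) before a failing probe acts, per the docs' formula.

    Defaults mirror the Kubernetes field defaults; real timing also depends
    on timeoutSeconds and kubelet scheduling, which this ignores.
    """
    return initial_delay_seconds + failure_threshold * period_seconds


def needs_startup_probe(typical_startup_seconds: int, **liveness) -> bool:
    """True if the container usually starts slower than the liveness budget."""
    return typical_startup_seconds > probe_budget(**liveness)


if __name__ == "__main__":
    print(probe_budget())                                     # 30 with all defaults
    print(needs_startup_probe(45))                            # True: 45 s > 30 s
    print(needs_startup_probe(45, initial_delay_seconds=20))  # False: 45 s <= 50 s
```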

I understand why we need a startup probe: startup probes are useful for Pods whose containers take a long time to come into service. If a startup probe is provided, all other probes are disabled until it succeeds. So if a container takes a long time to start, we use a startup probe so that the liveness and readiness probes stay disabled until the container has started.
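The gating rule described here (while a startup probe is configured and has not yet succeeded, the liveness and readiness probes are not run) can be written as a tiny Python function. It is only a model of the documented behaviour, not kubelet code; the function name and its time parameters are hypothetical, and it ignores the other probes' own initialDelaySeconds.

```python
from typing import Optional, Set


def probes_running(t: float,
                   startup_succeeded_at: Optional[float] = None,
                   has_startup_probe: bool = True) -> Set[str]:
    """Model which probes the kubelet runs at time t (s since container start).

    startup_succeeded_at is None while the startup probe has not succeeded yet.
    """
    startup_pending = startup_succeeded_at is None or t < startup_succeeded_at
    if has_startup_probe and startup_pending:
        # Only the startup probe runs; liveness and readiness are disabled.
        return {"startup"}
    # No startup probe, or it has already succeeded: the other probes take over.
    return {"liveness", "readiness"}


if __name__ == "__main__":
    print(sorted(probes_running(5, startup_succeeded_at=90)))    # ['startup']
    print(sorted(probes_running(120, startup_succeeded_at=90)))  # ['liveness', 'readiness']
    print(sorted(probes_running(5, has_startup_probe=False)))    # ['liveness', 'readiness']
```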

What I don't understand is the deadlock: where and why does it happen? Can anyone explain the deadlock scenario the docs are talking about? Which deadlock are we preventing by using a startup probe?


Answer 1:


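The startup probe runs only while the container is starting: it is retried every periodSeconds until it succeeds once, and after that it is not run again for the lifetime of that container.

Readiness and liveness probes, by contrast, keep running for the whole life of the container, not only at startup.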

If a startup probe exceeds the configured failureThreshold without succeeding, the container is killed and restarted, subject to the pod's restartPolicy, a behavior analogous to the liveness probe.
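The following Python sketch models that behaviour: the startup probe is retried every periodSeconds until it succeeds once or fails failureThreshold times in a row, in which case the container is killed. It assumes the container answers the probe successfully from ready_at seconds onward and ignores timeoutSeconds; the function and parameter names are hypothetical.

```python
from typing import Tuple


def startup_probe_outcome(ready_at: float,
                          period_seconds: int = 10,
                          failure_threshold: int = 3,
                          initial_delay_seconds: int = 0) -> Tuple[str, float]:
    """Return ("started", t) or ("killed", t) for one container start.

    Simplified model: the probe first runs at initial_delay_seconds, then every
    period_seconds, and succeeds once t >= ready_at.
    """
    t = float(initial_delay_seconds)
    failures = 0
    while True:
        if t >= ready_at:
            # First success: the startup probe stops, other probes take over.
            return ("started", t)
        failures += 1
        if failures >= failure_threshold:
            # Too many consecutive failures: the kubelet kills the container.
            return ("killed", t)
        t += period_seconds


if __name__ == "__main__":
    print(startup_probe_outcome(ready_at=15))                         # ('started', 20.0)
    print(startup_probe_outcome(ready_at=25))                         # ('killed', 20.0)
    print(startup_probe_outcome(ready_at=100, failure_threshold=30))  # ('started', 100.0)
```

Because the first attempt happens at t = initial_delay_seconds in this model, the default settings give up at about 20 s rather than the 30 s suggested by initialDelaySeconds + failureThreshold × periodSeconds; treat that formula as a rough budget.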

The readiness probe result can be used, for example by a load balancer, to decide when traffic may be sent to the container.

Startup probe use-cases

One example reason to use a startup probe:

Your application takes a long time to start. You could increase the initial delays of the readiness and liveness probes, but then you don't know when your container is actually ready, because those probes are not performed at all during the delay.

So a startup probe is commonly used together with readiness and liveness probes. It runs until your container has started (until the startup probe returns Success), so you no longer need long delays.

External dependencies

Let's say your application takes 1 to 3 minutes to start (it may depend on an external API, other resources, a slow network, etc.). You could set the delays to 190 seconds, but if your container is ready after 60 seconds, you waste more than 2 minutes (130 seconds). The startup probe was designed to solve that issue.
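The arithmetic in this example can be checked with a short Python sketch. It uses the 190-second fixed delay from the text and, for the startup probe, assumes a periodSeconds of 10 with probes at t = 0, 10, 20, and so on; the helper names are illustrative only.

```python
import math


def wasted_with_fixed_delay(ready_at: int, fixed_delay: int) -> int:
    """Seconds the container sits ready but unprobed when the delay is fixed."""
    return max(0, fixed_delay - ready_at)


def startup_probe_lag(ready_at: int, period_seconds: int = 10) -> int:
    """Seconds between becoming ready and the next startup probe (t = 0, p, 2p, ...)."""
    first_success = math.ceil(ready_at / period_seconds) * period_seconds
    return first_success - ready_at


if __name__ == "__main__":
    for ready_at in (60, 95, 180):
        print(f"ready at {ready_at} s: "
              f"fixed delay wastes {wasted_with_fixed_delay(ready_at, 190)} s, "
              f"startup probe lags {startup_probe_lag(ready_at)} s")
```

With a startup probe, the lag is bounded by one periodSeconds no matter when the application becomes ready, while the fixed delay wastes more time the faster the application starts.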

First initialization

Sometimes, you have to deal with legacy applications that might require an additional startup time on their first initialization. In such cases, it can be tricky to set up liveness probe parameters without compromising the fast response to deadlocks that motivated such a probe. The trick is to set up a startup probe with the same command, HTTP or TCP check, with a failureThreshold * periodSeconds long enough to cover the worst-case startup time.
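As a sketch of that trick, the Python snippet below builds the probe part of a container spec as a plain dict, using the Kubernetes field names (livenessProbe, startupProbe, httpGet, failureThreshold, periodSeconds). The /healthz path, port 8080, the threshold values, and the 180-second worst case are assumptions for illustration only.

```python
import json

# Hypothetical health endpoint shared by both probes (same check, as advised).
HEALTH_CHECK = {"httpGet": {"path": "/healthz", "port": 8080}}
WORST_CASE_STARTUP_SECONDS = 180  # assumed worst case (3 minutes)

container_probes = {
    # Liveness stays strict so a hung container is restarted quickly.
    "livenessProbe": {**HEALTH_CHECK, "failureThreshold": 1, "periodSeconds": 10},
    # Startup probe: same check, but failureThreshold * periodSeconds covers
    # the worst-case startup time (30 * 10 = 300 s >= 180 s).
    "startupProbe": {**HEALTH_CHECK, "failureThreshold": 30, "periodSeconds": 10},
}

startup = container_probes["startupProbe"]
assert startup["failureThreshold"] * startup["periodSeconds"] >= WORST_CASE_STARTUP_SECONDS

print(json.dumps(container_probes, indent=2))
```

Serialized to JSON or YAML, these dicts correspond to the livenessProbe and startupProbe fields of a container in a Pod manifest.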

Your question

The deadlock is the situation where your container has not finished starting, but the liveness probe is already running and exceeds its failure threshold because the delay is too short. The kubelet then restarts the container, the next start is cut short in the same way, and the container keeps restarting without ever coming up. To prevent that, you should use a startup probe and set its failureThreshold high enough.
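The restart loop can be simulated in a few lines of Python. This is a deliberately simplified model: it assumes every restart needs just as long to start as the first one, that the probe active during startup kills the container as soon as its rough budget is exceeded, and it ignores the kubelet's restart back-off and probe timeouts. All names and numbers are hypothetical.

```python
from typing import List, Tuple


def restart_timeline(startup_seconds: int,
                     kill_after_seconds: int,
                     max_attempts: int = 3) -> List[Tuple[int, str]]:
    """Simplified model of repeated container starts.

    startup_seconds: how long the app needs before its health check passes.
    kill_after_seconds: rough budget of the probe active during startup
        (the liveness probe if there is no startup probe).
    """
    events: List[Tuple[int, str]] = []
    t = 0
    for attempt in range(1, max_attempts + 1):
        events.append((t, f"attempt {attempt}: launched"))
        if startup_seconds <= kill_after_seconds:
            events.append((t + startup_seconds, f"attempt {attempt}: ready"))
            return events
        # The probe gives up before startup finishes: kill and restart,
        # and the next attempt faces exactly the same race.
        t += kill_after_seconds
        events.append((t, f"attempt {attempt}: killed by probe"))
    return events


if __name__ == "__main__":
    # Liveness budget 30 s, app needs 90 s: every attempt is killed.
    for event in restart_timeline(startup_seconds=90, kill_after_seconds=30):
        print(event)
    # Startup probe budget 300 s: the first attempt gets through.
    for event in restart_timeline(startup_seconds=90, kill_after_seconds=300):
        print(event)
```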




Answer 2:


My question is now fully answered, so I would like to explain the full scenario as I understand it (I hope it helps others in the future). @Daniel's answer is correct; I just want to explain it more comprehensively.

Explanation of the Terms:

  1. initialDelaySeconds: Number of seconds after the container has started before the probe is first run.
  2. failureThreshold: The number of consecutive times the probe is allowed to fail before the kubelet acts: for a liveness probe it restarts the container; for a readiness probe it marks the Pod as unavailable.
  3. periodSeconds: How often (in seconds) the kubelet performs the probe.
  4. initialDelaySeconds + failureThreshold × periodSeconds: the approximate total time after which the probe takes action according to its type (the liveness probe restarts the container; a readiness probe marks the Pod as unavailable).

As @Daniel pointed out in a comment, remember also that each probe has its own failureThreshold and periodSeconds. So for the liveness probe those values can be small, to kill the container as soon as it stops working properly, while for the startup probe they can be higher, to wait long enough for startup.

How deadlock is happening

Now, suppose no startup probe is used and the container takes longer than initialDelaySeconds + failureThreshold × periodSeconds to start. Once that time has passed, the liveness probe has failed failureThreshold times, so the kubelet kills the container before it has finished starting up, and the container is subject to its restart policy.

The same thing happens when the container restarts, again and again, so the container never manages to start up. This is the deadlock.

So the deadlock happens because the container takes longer than initialDelaySeconds + failureThreshold × periodSeconds to start up and no startup probe is used.

Preventing the deadlock

To prevent the deadlock, we can do one of two things:

  1. We can give the liveness probe a long delay or interval, but because the time the container needs to start is not fixed, this is not a good approach (and a long interval also makes the liveness probe slow to react once the container is running).

  2. We can use a startup probe. As we know, if a startup probe is provided, all other probes are disabled until it succeeds. So with a startup probe, the liveness probe cannot kill the container while it is still starting, and the deadlock described above does not happen.
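The trade-off between the two options can be quantified with a rough Python sketch. The numbers are hypothetical: a worst-case startup of 180 s; option 1 is modelled as raising the liveness probe's failureThreshold until its budget covers startup; option 2 keeps the liveness probe at the Kubernetes defaults and adds a startup probe with failureThreshold 30. Both use the docs' rough formula, and the sketch compares how long a container that hangs after startup would go unnoticed.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    initial_delay_seconds: int = 0  # Kubernetes default
    failure_threshold: int = 3      # Kubernetes default
    period_seconds: int = 10        # Kubernetes default

    def startup_budget(self) -> int:
        """Rough time allowed from container start (docs' formula)."""
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds

    def hang_detection(self) -> int:
        """Rough time to notice a hang once the probe is already running."""
        return self.failure_threshold * self.period_seconds


WORST_CASE_STARTUP = 180  # hypothetical worst-case startup, seconds

# Option 1: no startup probe, so the liveness probe itself is loosened.
option1_liveness = Probe(failure_threshold=18)
# Option 2: default liveness probe, plus a startup probe sized for startup.
option2_liveness = Probe()
option2_startup = Probe(failure_threshold=30)

assert option1_liveness.startup_budget() >= WORST_CASE_STARTUP
assert option2_startup.startup_budget() >= WORST_CASE_STARTUP

print("option 1: hang noticed after about", option1_liveness.hang_detection(), "s")
print("option 2: hang noticed after about", option2_liveness.hang_detection(), "s")
```

Both options survive startup, but only option 2 keeps the liveness probe's fast reaction afterwards.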

One more thing remains: the startup probe needs a high failureThreshold, because the startup probe itself also fails if the container takes longer than its own initialDelaySeconds + failureThreshold × periodSeconds to start up (this is a general formula, computed separately for each probe with that probe's own values). So we also need to set a high failureThreshold on the startup probe, high enough to cover the worst-case startup time. This way the deadlock problem is solved, and the container gets enough time to start up, as long as its startup stays within that budget.
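Sizing the startup probe's failureThreshold from a worst-case startup time is a one-line calculation. The Python sketch below makes initialDelaySeconds + failureThreshold × periodSeconds at least the worst case and adds one extra attempt as headroom; the headroom and the function name are assumptions of this sketch, not Kubernetes requirements.

```python
import math


def startup_failure_threshold(worst_case_startup_seconds: float,
                              period_seconds: int = 10,
                              initial_delay_seconds: int = 0,
                              extra_attempts: int = 1) -> int:
    """Smallest failureThreshold whose rough budget covers the worst case,
    plus extra attempts as a safety margin (an assumption of this sketch)."""
    remaining = max(0.0, worst_case_startup_seconds - initial_delay_seconds)
    return max(1, math.ceil(remaining / period_seconds) + extra_attempts)


if __name__ == "__main__":
    print(startup_failure_threshold(180))                    # 19
    print(startup_failure_threshold(180, period_seconds=5))  # 37
```

Only the startup probe gets this large threshold; the liveness probe keeps its own small values, as the docs recommend.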



Source: https://stackoverflow.com/questions/65846097/cannot-understand-the-deadlock-scenario-in-when-should-you-use-a-startup-probe
