Handling failure after the maximum number of retries in Google App Engine task queues

Question

I am using google-api-python-client, and I use Google App Engine task queues for some async operations.

For a specific task queue, I also set the maximum number of times a task should be retried (in my case retries are unlikely to succeed, so I want to limit them).

Is there a way to write a handler which can handle the case where the task is still failing even after the specified number of retries?

Basically, if my retry limit is 5, then after 5 unsuccessful retries I want to move the task to a different queue where it can be retried more times with a larger interval between retries, so that it is more likely to succeed.

From the documentation, I believe I can read the X-AppEngine-TaskExecutionCount header on each retry and write some custom logic to detect when the task is about to execute for the last time, but I am trying to find out if there is a cleaner way.

By the way, according to the doc, X-AppEngine-TaskExecutionCount is "the number of times this task has previously failed during the execution phase. This number does not include failures due to a lack of available instances."

Answer 1:

At present, there is no support for automatically moving a task from one queue to another.

One option is to keep the task on the same queue, increase the max number of retries and use the retry_parameters to customize the retry backoff policy (i.e. the increase of time between retries):

retry_parameters

Optional. Configures retry attempts for failed tasks. This addition allows you to specify the maximum number of times to retry failed tasks in a specific queue. You can also set a time limit for retry attempts and control the interval between attempts.

The retry parameters can contain the following subelements:

task_retry_limit

The maximum number of retry attempts for a failed task. If specified with task_age_limit, App Engine retries the task until both limits are reached. If 0 is specified, the task will not be retried.

task_age_limit (push queues)

The time limit for retrying a failed task, measured from when the task was first run. The value is a number followed by a unit of time, where the unit is s for seconds, m for minutes, h for hours, or d for days. For example, the value 5d specifies a limit of five days after the task's first execution attempt. If specified with task_retry_limit, App Engine retries the task until both limits are reached.

min_backoff_seconds (push queues)

The minimum number of seconds to wait before retrying a task after it fails.

max_backoff_seconds (push queues)

The maximum number of seconds to wait before retrying a task after it fails.

max_doublings (push queues)

The maximum number of times that the interval between failed task retries will be doubled before the increase becomes constant. The constant is: 2**max_doublings * min_backoff_seconds.

However, the increase is gradual: the interval doubles after each failure, so you can't get a significant "step"-like jump in the time between retries. Still, it may be a good enough solution, and it requires no additional coding. Personally, I'd go with this approach.
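To make the max_doublings constant quoted above concrete, here is a small sketch that only evaluates that formula. The function name constant_interval is made up for illustration, and this is not a model of the full retry schedule App Engine actually applies:

```python
def constant_interval(max_doublings, min_backoff_seconds):
    """Evaluate the constant from the quoted doc: 2**max_doublings * min_backoff_seconds."""
    return 2 ** max_doublings * min_backoff_seconds


# Example values chosen for illustration only.
print(constant_interval(3, 10))  # 2**3 * 10 = 80
```

So with a larger min_backoff_seconds or max_doublings the constant grows quickly, but it is still bounded by max_backoff_seconds.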

Another approach is to add logic that determines whether the current execution is the final retry of the original task and, if so, enqueues a new corresponding task on a different queue with the desired "slower" retry policy. I'm unsure whether this is what you were referring to in the question and wanted to avoid.
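A minimal sketch of that second approach, kept framework-free so the logic is easy to test. The names RETRY_LIMIT, SLOW_QUEUE, is_final_attempt and run_task are hypothetical, and the sketch assumes the final attempt is the one where X-AppEngine-TaskExecutionCount has reached the queue's task_retry_limit; it is worth verifying that off-by-one against your own queue before relying on it:

```python
RETRY_LIMIT = 5                  # assumed to match task_retry_limit of the fast queue
SLOW_QUEUE = "slow-retry-queue"  # hypothetical name of the queue with the slower policy


def is_final_attempt(headers, retry_limit=RETRY_LIMIT):
    """Return True once the header reports that the retry budget is used up."""
    failures = int(headers.get("X-AppEngine-TaskExecutionCount", 0))
    return failures >= retry_limit


def run_task(headers, payload, do_work, enqueue):
    """Run the task body and return the HTTP status the handler should send.

    do_work and enqueue are passed in so the sketch runs outside App Engine;
    in a real handler, enqueue would presumably wrap taskqueue.add(...) with
    queue_name set to the slow queue.
    """
    try:
        do_work(payload)
        return 200
    except Exception:
        if is_final_attempt(headers):
            # Hand the work over to the slow queue and report success,
            # so the fast queue stops retrying this task.
            enqueue(SLOW_QUEUE, payload)
            return 200
        # Any status outside 200-299 makes the push queue retry the task.
        return 500
```

In a request handler you would pass the request headers and body to run_task and set the response status to its return value. Note that if the enqueue call itself fails on the last attempt, the task is lost, so you may want to log that case.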

Source: https://stackoverflow.com/questions/46979539/handling-failure-after-maximum-number-of-retries-in-google-app-engine-task-queue

Tags

python-2.7

google-app-engine

task-queue