Error conditions and retries in gearman?

血红的双手。 提交于 2019-12-04 17:16:10

问题


Can someone guide me on how gearman does retries when exceptions are thrown or when errors occur?

I use the python gearman client in a Django app and my workers are initiated as a Django command. I read from this blog post that retries from error conditions are not straight forward and that it requires sys.exit from the worker side.

Has this been fixed to retry perhaps with sendFail or sendException? Also does gearman support retries with exponentials algorithm – say if an SMTP failure happens its retries after 2,4,8,16 seconds etc?


回答1:


To my understanding, Gearman employs a very "it's not my business" approach - e.g., it does not intervene with jobs performed, unless workers crash. Any success / failure messages are supposed to be handled by the client, not Gearman server itself.

In foreground jobs, this implies that all sendFail() / sendException() and other send*() are directed to the client and it's up to the client to decide whether to retry the job or not. This makes sense as sometimes you might not need to retry.

In background jobs, all the send*() functions lose their meaning, as there is no client that would be listening to the callbacks. As a result, the messages sent will be just ignored by Gearman. The only condition on which the job will be retried is when the worker crashes (which can by emulated with a exit(XX) command, where XX is a non-zero value). This, of course, is not something you want to do, because workers are usually supposed to be long-running processes, not the ones that have to be restarted after each unsuccessful job.

Personally, I have solved this problem by extending the default GearmanJob class, where I intercept the calls to send*() functions and then implementing the retry mechanism myself. Essentially, I pass all the retry-related data (max number of retries, times already retried) together with a workload and then handle everything myself. It is a bit cumbersome, but I understand why Gearman works this way - it just allows you to handle all the application logic.

Finally, regarding the ability to retry jobs with exponential timeout (or any timeout for that matter). Gearman has a feature to add delayed jobs (look for SUBMIT_JOB_EPOCH in the protocol documentation), yet I am not sure about its status - the PHP extension and, I think, the Python module do not support it and the docs say it can be removed in the future. But I understand it works at the moment - you just need to submit raw socket requests to Gearman to make it happen (and the exponential part should be implemented on your side, too).

However, this blog post argues that SUBMIT_JOB_EPOCH implementation does not scale well. He uses node.js and setTimeout() to make it work, I've seen others use the unix utility at to do the same. In any way - Gearman will not do it for you. It will focus on reliability, but will let you focus on all the logic.



来源:https://stackoverflow.com/questions/8870132/error-conditions-and-retries-in-gearman

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!