How do I detect unexpected worker role failures and reprocess data in those cases?

梦谈多话 2021-01-21 23:01

I want to create a web service hosted in Windows Azure. The clients will upload files for processing, the cloud will process those files, produce resulting files, the client wil

4 answers
  •  生来不讨喜
    2021-01-21 23:25

    I believe this problem is not specific to any particular technology.
    Since your processing jobs are long-running, I suggest that they report their progress periodically while they execute. That way, a job that has not reported progress for a substantial period becomes a clear candidate for cleanup, and it can then be restarted on another worker role instance.
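
    Below is a minimal sketch of the worker side, in Python for illustration only. All names (`JobProgress`, `process_file`, `is_stalled`) are hypothetical. The job refreshes a timestamp after every chunk it processes. Any job whose last report is older than a chosen threshold is then treated as stalled.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JobProgress:
    # Hypothetical progress record; in practice it would live in shared storage.
    job_id: str
    last_report: float  # Unix time of the most recent progress report
    percent: float = 0.0


def process_file(job: JobProgress, chunks: List[bytes],
                 handle_chunk: Callable[[bytes], None],
                 clock: Callable[[], float] = time.time) -> None:
    """Process a job chunk by chunk, reporting progress after every chunk."""
    total = len(chunks)
    for i, chunk in enumerate(chunks, start=1):
        handle_chunk(chunk)
        # The report mainly tells the monitor that the job is still alive.
        job.percent = 100.0 * i / total
        job.last_report = clock()


def is_stalled(job: JobProgress, now: float, max_silence: float) -> bool:
    """A job silent for longer than max_silence is a candidate for cleanup."""
    return now - job.last_report > max_silence
```

    The `clock` parameter exists only to make the sketch easy to test. In a real worker, each report would be written somewhere a monitor on another machine can read it.
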
    How you record progress and reassign jobs is up to you. One approach is to use a database as the recording mechanism and to run a separate agent process that periodically polls the job progress table. If the agent detects a problem, such as a job that has stopped reporting progress, it can take corrective action, for example by resetting the job so that another worker picks it up.
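
    The following is a minimal sketch of the database approach. It uses `sqlite3` from the Python standard library as a stand-in for whatever database you actually use. The table layout, the 15-minute threshold, and all function names are assumptions made for illustration. Workers call `report_progress`, and the agent repeatedly calls `requeue_stalled_jobs`.

```python
import sqlite3
import time
from typing import List, Optional

# Hypothetical threshold: a running job silent for 15 minutes is treated as stalled.
STALL_SECONDS = 15 * 60


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical job progress table; adapt the columns to your own job store."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_progress (
               job_id        TEXT PRIMARY KEY,
               status        TEXT NOT NULL,               -- 'pending', 'running' or 'done'
               last_progress REAL NOT NULL,               -- Unix time of the last report
               attempts      INTEGER NOT NULL DEFAULT 0   -- how often the job was requeued
           )"""
    )
    conn.commit()


def report_progress(conn: sqlite3.Connection, job_id: str,
                    now: Optional[float] = None) -> None:
    """Called by a worker whenever it finishes a unit of work on `job_id`."""
    conn.execute(
        "UPDATE job_progress SET last_progress = ? WHERE job_id = ?",
        (time.time() if now is None else now, job_id),
    )
    conn.commit()


def requeue_stalled_jobs(conn: sqlite3.Connection, now: Optional[float] = None,
                         stall_seconds: float = STALL_SECONDS) -> List[str]:
    """One pass of the monitoring agent: reset stalled 'running' jobs to 'pending'."""
    now = time.time() if now is None else now
    cutoff = now - stall_seconds
    stalled = [row[0] for row in conn.execute(
        "SELECT job_id FROM job_progress WHERE status = 'running' AND last_progress < ?",
        (cutoff,),
    )]
    requeued = []
    for job_id in stalled:
        # Re-check the condition so a job that reported progress meanwhile is left alone.
        cur = conn.execute(
            "UPDATE job_progress SET status = 'pending', attempts = attempts + 1 "
            "WHERE job_id = ? AND status = 'running' AND last_progress < ?",
            (job_id, cutoff),
        )
        if cur.rowcount == 1:
            requeued.append(job_id)
    conn.commit()
    return requeued


def run_agent(conn: sqlite3.Connection, poll_seconds: float = 60.0) -> None:
    """The monitoring agent: poll the table forever and requeue stalled jobs."""
    while True:
        for job_id in requeue_stalled_jobs(conn):
            print(f"requeued stalled job {job_id}")
        time.sleep(poll_seconds)
```

    The `attempts` column lets you cap retries. Without a cap, a file that always crashes the processor would be requeued forever.
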

    Another approach is to associate each long-running job with the ID of the worker role instance processing it. The worker instances can then report their health using some sort of heartbeat, so that jobs owned by an instance whose heartbeat stops can be picked up again.
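
    Here is a minimal in-memory sketch of the heartbeat variant. The class and method names are hypothetical, and the dictionaries stand in for shared storage that every worker instance and the monitor can reach. Job ownership is tracked per worker ID. When a worker stays silent longer than `timeout`, all of its jobs are released for reprocessing.

```python
import time
from typing import Dict, List, Optional


class HeartbeatRegistry:
    """In-memory stand-in for shared storage that records worker heartbeats
    and which worker instance owns each job (hypothetical design)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                  # seconds of silence before a worker counts as dead
        self.last_beat: Dict[str, float] = {}   # worker_id -> time of last heartbeat
        self.owner: Dict[str, str] = {}         # job_id -> worker_id currently processing it

    def heartbeat(self, worker_id: str, now: Optional[float] = None) -> None:
        """Called periodically by each worker, independently of job progress."""
        self.last_beat[worker_id] = time.time() if now is None else now

    def assign(self, job_id: str, worker_id: str) -> None:
        self.owner[job_id] = worker_id

    def complete(self, job_id: str) -> None:
        self.owner.pop(job_id, None)

    def _is_dead(self, worker_id: str, now: float) -> bool:
        # A worker that never sent a heartbeat is treated as dead as well.
        last = self.last_beat.get(worker_id)
        return last is None or now - last > self.timeout

    def release_orphaned_jobs(self, now: Optional[float] = None) -> List[str]:
        """Drop ownership of jobs held by dead workers and return their IDs
        so the caller can put them back on the work queue."""
        now = time.time() if now is None else now
        orphans = sorted(j for j, w in self.owner.items() if self._is_dead(w, now))
        for job_id in orphans:
            del self.owner[job_id]
        return orphans
```

    Unlike per-job progress reports, a heartbeat detects a crashed instance even when the job itself is simply slow.
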
    If the jobs were not long-running, you could simply record each job's start time instead of a status flag, and use a timeout to determine whether processing has failed.
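
    For the short-job case, a sketch of the start-time check could look like this. `Job`, `has_timed_out`, and `timed_out_jobs` are hypothetical names. The timeout must be longer than the slowest legitimate job, which is why this approach breaks down for long-running work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    job_id: str
    started_at: float                    # Unix time when a worker picked up the job
    finished_at: Optional[float] = None  # set by the worker when the job completes


def has_timed_out(job: Job, now: float, timeout: float) -> bool:
    """An unfinished job that has run longer than `timeout` is assumed to have failed."""
    return job.finished_at is None and now - job.started_at > timeout


def timed_out_jobs(jobs: List[Job], now: float, timeout: float) -> List[str]:
    """IDs of jobs that should be reprocessed."""
    return [job.job_id for job in jobs if has_timed_out(job, now, timeout)]
```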
