How do I detect unexpected worker role failures and reprocess data in those cases?

梦谈多话 2021-01-21 23:01

I want to create a web service hosted in Windows Azure. The clients will upload files for processing, the cloud will process those files, produce resulting files, the client wil…

4 answers
  •  猫巷女王i
    2021-01-21 23:31

    The problem you describe is best handled with Azure Queues; Azure Table Storage gives you no built-in mechanism for tracking or reclaiming work that is in progress.

    With Azure Queues, you set a visibility timeout when you get an item off the queue (default: 30 seconds). Once you read a queue item (e.g. "process file x waiting for you in blob at url y"), that item becomes invisible for the specified period, so other worker role instances won't try to grab it at the same time. Once you complete processing, you simply delete the queue item.

    Now, let's say you're almost done and haven't deleted the queue item yet. Suddenly your role instance crashes (or the hardware fails, or the instance is rebooted for some reason), and the queue-item processing code stops. Once the timeout you set has elapsed since the item was originally read, the item becomes visible again, and one of your worker role instances will read it and process it.
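    The visibility-timeout behaviour described above can be simulated in a few lines. This is a minimal in-memory sketch, not the Azure SDK: `VisibilityQueue`, `put`, `get` and `delete` are hypothetical names, and time is passed in explicitly (`now`) so the crash-and-retry sequence is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int
    body: str
    dequeue_count: int = 0
    visible_at: float = 0.0  # time at which the message becomes visible again


class VisibilityQueue:
    """Hypothetical in-memory stand-in for a queue with visibility timeouts."""

    def __init__(self):
        self._messages = {}
        self._next_id = 0

    def put(self, body, now=0.0):
        msg = Message(self._next_id, body, visible_at=now)
        self._messages[msg.msg_id] = msg
        self._next_id += 1
        return msg.msg_id

    def get(self, now, visibility_timeout=30.0):
        """Return the first visible message and hide it for visibility_timeout seconds."""
        for msg in self._messages.values():
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout
                msg.dequeue_count += 1
                return msg
        return None  # nothing is visible right now

    def delete(self, msg_id):
        self._messages.pop(msg_id, None)

    def __len__(self):
        return len(self._messages)


queue = VisibilityQueue()
queue.put("process file x waiting for you in blob at url y", now=0.0)

first = queue.get(now=0.0)    # worker A takes the message at t=0...
hidden = queue.get(now=10.0)  # ...so worker B sees nothing at t=10 (None)
# Worker A crashes here without ever calling delete().
retry = queue.get(now=31.0)   # after the 30 s timeout, worker B gets the same message
queue.delete(retry.msg_id)    # B finishes and removes it for good
```

    The message is never lost: it is only hidden, and the sole way to remove it is an explicit `delete` after successful processing.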

    A few things to keep in mind:

    • Queue items have a dequeue count. Pay attention to this. Once you hit a certain number of dequeues for a specific queue item (I like to use 3 times as my limit), you should move this queue item to a 'poison queue' or table storage for offline evaluation - there could be something wrong with the message or the process around handling that message.
    • Make sure your processing is idempotent (i.e. you can process the same message multiple times with no side effects).
    • Because a queue item can go invisible and then become visible again later, queue items don't necessarily get processed in FIFO order.
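    The first two points can be combined in a message handler. The sketch below is illustrative only: `QueueMessage`, `handle`, `blob_url` and the `results` dict (standing in for result blobs) are hypothetical. It allows up to three attempts and diverts a message to a poison list once it has been read more often than that. The write is keyed by the input blob, so reprocessing a message overwrites the same result instead of producing a duplicate.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # the answer's suggested limit; tune for your workload


@dataclass
class QueueMessage:
    msg_id: str
    blob_url: str       # hypothetical payload: where the uploaded file lives
    dequeue_count: int  # incremented by the queue each time the message is read


def handle(msg, results, poison, process):
    """Handle one message; return True if the caller should delete it from the queue.

    results: dict of outputs keyed by input blob URL (stand-in for result blobs)
    poison:  list of messages set aside for offline evaluation
    process: function turning a blob URL into an output; it may raise
    """
    if msg.dequeue_count > MAX_ATTEMPTS:
        # Read too many times: park it for offline evaluation instead of retrying forever.
        poison.append(msg)
        return True
    try:
        output = process(msg.blob_url)
    except Exception:
        # Keep the message; it becomes visible again after the visibility timeout.
        return False
    # Idempotent write: keyed by input, so reprocessing overwrites rather than duplicates.
    results[msg.blob_url] = output
    return True


def to_upper(url):
    return url.upper()  # placeholder for real file processing


results, poison = {}, []
url = "blob/file-x.txt"  # hypothetical blob path
# The same message seen twice, e.g. a crash happened after the write but before delete.
handle(QueueMessage("m1", url, dequeue_count=1), results, poison, to_upper)
handle(QueueMessage("m1", url, dequeue_count=2), results, poison, to_upper)
# A message that has already been read four times goes to the poison list.
handle(QueueMessage("m2", "blob/broken.bin", dequeue_count=4), results, poison, to_upper)
```

    Returning a flag instead of deleting inside `handle` keeps the delete as the last step, after the output has been written, which is the ordering that makes crash recovery safe.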

    EDIT: Per Ryan's answer, Azure queue messages max out at a 2-hour visibility timeout (later Storage service versions raised this limit to 7 days). Service Bus queue messages have a far greater timeout; Service Bus queues went to CTP just a few days ago.
