Question
I'm using TPL Dataflow to process items off a queue in an Azure worker role. Should I have a single long-running dataflow, or spawn a new flow for every message I receive?
If an unhandled exception is thrown inside a block, that block faults and stops accepting new messages, so a single exception in one block can halt the whole dataflow.
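To illustrate that behavior, here is a minimal sketch (the message values and the fixed delay are made up for the example) showing an ActionBlock faulting on a bad message and declining everything after it:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class FaultDemo
{
    static async Task Main()
    {
        var block = new ActionBlock<string>(msg =>
        {
            if (msg == "bad") throw new InvalidOperationException("invalid queue input");
            Console.WriteLine($"Processed {msg}");
        });

        block.Post("ok");      // processed normally
        block.Post("bad");     // throws inside the delegate and faults the block
        await Task.Delay(100); // crude pause so the fault has time to take effect
        Console.WriteLine(block.Post("never")); // False: a faulted block declines new messages

        try { await block.Completion; }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine($"Flow is dead: {ex.Message}");
        }
    }
}
```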
I need to be able to withstand exceptions from things like invalid queue input without locking up my dataflow. I see two options:
- I start a single dataflow and send messages to it as they come off the queue. The contents of each block are wrapped in a try-catch that logs the exception and continues processing. This seems clumsy, and I assume there's a better way.
- For each message, I start a new dataflow and process the queue message. If an exception is thrown in any block, that dataflow completes, and I only have to recover a single message. Most Dataflow examples I've seen send multiple messages through one flow, so this doesn't feel right either.
I've seen lots of documentation on how to complete a dataflow after an exception, but very little on how to recover from exceptions.
Answer 1:
You should definitely go with the first option and have only one flow.
With the second option, a dataflow adds no value over simply calling several methods one after another, and you also pay the overhead of constructing a full flow for each and every item.
It's better to build the flow once and use it throughout the app's lifetime. There's nothing wrong with handling exceptions per block, but if you want, you can let the whole flow fail and only then create a new one.
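For reference, here is a minimal sketch of that first option, assuming a two-stage flow where a parse stage catches bad input per message, logs it, and drops it instead of faulting (the stage names and the int-parsing logic are made up for illustration):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Pipeline
{
    static async Task Main()
    {
        // Parse stage: catch per message and emit null instead of
        // letting the exception fault the block.
        var parse = new TransformBlock<string, int?>(body =>
        {
            try
            {
                return int.Parse(body);
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"Bad input '{body}': {ex.Message}");
                return null; // poison message is logged and discarded below
            }
        });

        var process = new ActionBlock<int?>(n => Console.WriteLine($"Processing {n}"));

        // Only successfully parsed values flow on; nulls go to a null target
        // so they don't clog the parse block's output buffer.
        parse.LinkTo(process,
                     new DataflowLinkOptions { PropagateCompletion = true },
                     n => n.HasValue);
        parse.LinkTo(DataflowBlock.NullTarget<int?>());

        // The flow lives for the app's lifetime; feed it as messages arrive.
        parse.Post("42");
        parse.Post("not-a-number"); // logged, does not kill the flow
        parse.Post("7");

        parse.Complete();
        await process.Completion;
    }
}
```

The second LinkTo matters: with a filtered link, a message that matches no target would otherwise sit in the parse block's output buffer forever, so failed parses are routed to NullTarget and discarded.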
Source: https://stackoverflow.com/questions/23961620/multiple-short-lived-tpl-dataflows-versus-single-long-running-flow