Question
I'm using TPL Dataflow to process items off a queue in an Azure worker role. Should I have a single long-running dataflow, or spawn a new flow for every message I receive?
If an unhandled exception is thrown inside a block, that block faults and stops accepting new messages, so a single exception in one block can halt the whole dataflow.
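To illustrate that behavior, here is a minimal sketch (the message values and the fixed delay are made up for the example) showing an ActionBlock faulting on a bad message and declining everything after it:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class FaultDemo
{
    static async Task Main()
    {
        var block = new ActionBlock<string>(msg =>
        {
            if (msg == "bad") throw new InvalidOperationException("invalid queue input");
            Console.WriteLine($"Processed {msg}");
        });

        block.Post("ok");      // processed normally
        block.Post("bad");     // throws inside the delegate and faults the block
        await Task.Delay(100); // crude pause so the fault has time to take effect
        Console.WriteLine(block.Post("never")); // False: a faulted block declines new messages

        try { await block.Completion; }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine($"Flow is dead: {ex.Message}");
        }
    }
}
```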
I need to be able to withstand exceptions from things like invalid queue input without locking up my dataflow. I see two options:
- I start a single dataflow and send messages to it as they come off the queue. The contents of each block are wrapped in a try-catch that logs the exception and continues processing. This seems clumsy, and I assume there's a better way.
- For each message, I start a new dataflow and process the queue message. If an exception is thrown in any block, that dataflow completes, and I only have to recover a single message. Most Dataflow examples I've seen send multiple messages through one flow, so this doesn't feel right either.
I've seen lots of documentation on how to complete a dataflow after an exception, but very little on how to recover from exceptions.
Answer 1:
You should definitely go with the first option and have only one flow.
With the second option, a dataflow adds no value over simply calling several methods one after another, and you also pay the overhead of constructing a full flow for each and every item.
It's better to build the flow once and use it throughout the app's lifetime. There's nothing wrong with handling exceptions per block, but if you want, you can let the whole flow fail and only then create a new one.
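For reference, here is a minimal sketch of that first option, assuming a two-stage flow where a parse stage catches bad input per message, logs it, and drops it instead of faulting (the stage names and the int-parsing logic are made up for illustration):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Pipeline
{
    static async Task Main()
    {
        // Parse stage: catch per message and emit null instead of
        // letting the exception fault the block.
        var parse = new TransformBlock<string, int?>(body =>
        {
            try
            {
                return int.Parse(body);
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine($"Bad input '{body}': {ex.Message}");
                return null; // poison message is logged and discarded below
            }
        });

        var process = new ActionBlock<int?>(n => Console.WriteLine($"Processing {n}"));

        // Only successfully parsed values flow on; nulls go to a null target
        // so they don't clog the parse block's output buffer.
        parse.LinkTo(process,
                     new DataflowLinkOptions { PropagateCompletion = true },
                     n => n.HasValue);
        parse.LinkTo(DataflowBlock.NullTarget<int?>());

        // The flow lives for the app's lifetime; feed it as messages arrive.
        parse.Post("42");
        parse.Post("not-a-number"); // logged, does not kill the flow
        parse.Post("7");

        parse.Complete();
        await process.Completion;
    }
}
```

The second LinkTo matters: with a filtered link, a message that matches no target would otherwise sit in the parse block's output buffer forever, so failed parses are routed to NullTarget and discarded.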
Source: https://stackoverflow.com/questions/23961620/multiple-short-lived-tpl-dataflows-versus-single-long-running-flow