For a TPL Dataflow: How do I get my hands on all the output produced by a TransformBlock while blocking until all inputs have been processed?

问题

I'm submitting a series of select statements (queries - thousands of them) to a single database synchronously and getting back one DataTable per query (Note: This program is such that it has knowledge of the DB schema it is scanning only at run time, hence the use of DataTables). The program runs on a client machine and connects to DBs on a remote machine. It takes a long time to run so many queries. So, assuming that executing them async or in parallel will speed things up, I'm exploring TPL Dataflow (TDF). I want to use the TDF library because it seems to handle all of the concerns related to writing multi-threaded code that would otherwise need to be done by hand.

The code shown is based on http://blog.i3arnon.com/2016/05/23/tpl-dataflow/. Its minimal and is just to help me understand the basic operations of TDF. Please do know I've read many blogs and coded many iterations trying to crack this nut.

None-the-less, with this current iteration, I have one problem and a question:

Problem

The code is inside a button click method (Using a UI, a user selects a machine, a sql instance, and a database, and then kicks off the scan). The two lines with the await operator return an error at build time: The 'await' operator can only be used within an async method. Consider marking this method with the 'async' modifier and changing its return type to 'Task'. I can't change the return type of the button click method. Do I need to somehow isolate the button click method from the async-await code?

Question

Although I've found beau-coup write-ups describing the basics of TDF, I can't find an example of how to get my hands on the output that each invocation of the TransformBlock produces (i.e., a DataTable). Although I want to submit the queries async, I do need to block until all queries submitted to the TransformBlock are completed. How do I get my hands on the series of DataTables produced by the TransformBlock and block until all queries are complete?

Note: I acknowledge that I have only one block now. At a minimum, I'll be adding a cancellation block and so do need/want to use TPL.

private async Task ToolStripButtonStart_Click(object sender, EventArgs e)
{

    UserInput userInput = new UserInput
    {
        MachineName = "gat-admin",
        InstanceName = "",
        DbName = "AdventureWorks2014",
    };

    DataAccessLayer dataAccessLayer = new DataAccessLayer(userInput.MachineName, userInput.InstanceName);

    //CreateTableQueryList gets a list of all tables from the DB and returns a list of 
    // select statements, one per table, e.g., SELECT * from [schemaname].[tablename]
    IList<String> tableQueryList = CreateTableQueryList(userInput);

    // Define a block that accepts a select statement and returns a DataTable of results
    // where each returned record is: schemaname + tablename + columnname + column datatype + field data
    // e.g., if the select query returns one record with 5 columns, then a datatable with 5 
    // records (one per field) will come back 

    var transformBlock_SubmitTableQuery = new TransformBlock<String, Task<DataTable>>(
        async tableQuery => await dataAccessLayer._SubmitSelectStatement(tableQuery),
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 2,
        });

    // Add items to the block and start processing
    foreach (String tableQuery in tableQueryList)
    {
        await transformBlock_SubmitTableQuery.SendAsync(tableQuery);
    }

    // Enable the Cancel button and disable the Start button.
    toolStripButtonStart.Enabled = false;
    toolStripButtonStop.Enabled = true;

    //shut down the block (no more inputs or outputs)
    transformBlock_SubmitTableQuery.Complete();

    //await the completion of the task that procduces the output DataTable
    await transformBlock_SubmitTableQuery.Completion;
}

public async Task<DataTable> _SubmitSelectStatement(string queryString )
{
    try
    {

        .
        .
        await Task.Run(() => sqlDataAdapter.Fill(dt));

        // process dt into the output DataTable I need

        return outputDt;
    }
    catch
    {
        throw;
    }

}

回答1:

As it turns out, to meet my requirements, TPL Dataflow is a bit overkill. I was able to meet my requirements using async/await and Task.WhenAll. I used the Microsoft How-To How to: Extend the async Walkthrough by Using Task.WhenAll (C#) as a model.

Regarding my "Problem"

My "problem" is not a problem. An event method signature (in my case, a "Start" button click method that initiates my search) can be modified to be async. In the Microsoft How-To GetURLContentsAsync solution, see the startButton_Click method signature:

private async void startButton_Click(object sender, RoutedEventArgs e)  
{  
    .
    .
}

Regarding my question

Using Task.WhenAll, I can wait for all my queries to finish then process all the outputs for use on my UI. In the Microsoft How-To GetURLContentsAsync solution, see the SumPageSizesAsync method, i.e,, the array of int named lengths is the sum of all outputs.

private async Task SumPageSizesAsync()  
{  
    .
    .
    // Create a query.   
    IEnumerable<Task<int>> downloadTasksQuery = from url in urlList select ProcessURLAsync(url);  

    // Use ToArray to execute the query and start the download tasks.  
    Task<int>[] downloadTasks = downloadTasksQuery.ToArray();  

    // Await the completion of all the running tasks.  
    Task<int[]> whenAllTask = Task.WhenAll(downloadTasks);  

    int[] lengths = await whenAllTask;  
    .
    .
}

回答2:

The correct way to retrieve the output of a TransformBlock is to perform a nested loop using the methods OutputAvailableAsync and TryReceive. It is a bit messy, so you could consider hiding this complexity from your application code by copy-pasting the extension method below in some static class of your project:

public static async Task<List<T>> ToListAsync<T>(this IReceivableSourceBlock<T> block,
    CancellationToken cancellationToken = default)
{
    var list = new List<T>();
    while (await block.OutputAvailableAsync(cancellationToken).ConfigureAwait(false))
    {
        while (block.TryReceive(out var item))
        {
            list.Add(item);
        }
    }
    await block.Completion.ConfigureAwait(false); // Propagate possible exception
    return list;
}

Then you could use the ToListAsync method like this:

private async Task ToolStripButtonStart_Click(object sender, EventArgs e)
{
    var transformBlock = new TransformBlock<string, DataTable>(async query => //...
    //...
    transformBlock.Complete();

    foreach (DataTable dataTable in await transformBlock.ToListAsync())
    {
        // Do something with the dataTable
    }
}

If you have upgraded your project to C# 8 then you have also the option to retrieve the output in a streaming fashion, as an IAsyncEnumerable:

public static async IAsyncEnumerable<T> ToAsyncEnumerable<T>(
    this IReceivableSourceBlock<T> block,
    [EnumeratorCancellation] CancellationToken cancellationToken = default)
{
    while (await block.OutputAvailableAsync(cancellationToken).ConfigureAwait(false))
    {
        while (block.TryReceive(out var item))
        {
            yield return item;
        }
    }
    await block.Completion.ConfigureAwait(false); // Propagate possible exception
}

This way you will be able to get your hands to each DataTable immediately after it has been cooked, without having to wait for the processing of all queries. To consume an IAsyncEnumerable you simply move the await before the foreach:

await foreach (DataTable dataTable in transformBlock.ToAsyncEnumerable())
{
    // Do something with the dataTable
}

来源：https://stackoverflow.com/questions/49389273/for-a-tpl-dataflow-how-do-i-get-my-hands-on-all-the-output-produced-by-a-transf

标签

task-parallel-library

tpl-dataflow