How to use C# 8 IAsyncEnumerable to async-enumerate tasks run in parallel

孤独总比滥情好 2021-02-09 15:00

If possible, I want to create an async enumerator for tasks launched in parallel, so that the first task to complete is the first element of the enumeration, the second to finish is the second element of …

5 Answers
  • 2021-02-09 15:14

    Is this what you're looking for?

    public static async IAsyncEnumerable<T> ParallelEnumerateAsync<T>(
        this IEnumerable<Task<T>> tasks)
    {
        var remaining = new List<Task<T>>(tasks);
    
        while (remaining.Count != 0)
        {
            var task = await Task.WhenAny(remaining);
            remaining.Remove(task);
            yield return (await task);
        }
    }
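
    As a rough illustration of how this behaves, here is a minimal, self-contained sketch. The extension method is repeated from the answer so the sample compiles on its own; the DelayedAsync helper, the class name, and the delays are assumptions made up for the demo. With the delays chosen here it should print 3, 2, 1, which is the completion order rather than the start order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelEnumerateDemo
{
    // Same extension method as in the answer, repeated so the sample is self-contained.
    public static async IAsyncEnumerable<T> ParallelEnumerateAsync<T>(
        this IEnumerable<Task<T>> tasks)
    {
        var remaining = new List<Task<T>>(tasks);

        while (remaining.Count != 0)
        {
            // WhenAny returns whichever of the remaining tasks finishes first.
            var task = await Task.WhenAny(remaining);
            remaining.Remove(task);
            yield return await task;
        }
    }

    // Hypothetical worker (not from the answer): returns its id after a delay.
    public static async Task<int> DelayedAsync(int id, int milliseconds)
    {
        await Task.Delay(milliseconds);
        return id;
    }

    public static async Task Main()
    {
        // Started in the order 1, 2, 3, but task 3 has the shortest delay.
        var tasks = new[]
        {
            DelayedAsync(1, 900),
            DelayedAsync(2, 500),
            DelayedAsync(3, 100),
        };

        await foreach (var id in tasks.ParallelEnumerateAsync())
        {
            Console.WriteLine(id);
        }
    }
}
```

    One thing to keep in mind: Task.WhenAny is called again on the whole remaining list for every element, and List.Remove is linear, so the total cost grows quadratically with the number of tasks. For a handful of tasks that is unlikely to matter.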
    
  • 2021-02-09 15:19

    If you want to take an async stream (IAsyncEnumerable) and run Select in parallel, so that the first result to finish is the first to come out:

    /// <summary>
    /// Runs the selectors in parallel and yields in completion order
    /// </summary>
    public static async IAsyncEnumerable<TOut> SelectParallel<TIn, TOut>(
        this IAsyncEnumerable<TIn> source,
        Func<TIn, Task<TOut>> selector)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
    
        var enumerator = source.GetAsyncEnumerator();
    
        var sourceFinished = false;
        var tasks = new HashSet<Task<TOut>>();
    
        Task<bool> sourceMoveTask = null;
        Task<Task<TOut>> pipeCompletionTask = null;
    
        try
        {
            while (!sourceFinished || tasks.Any())
            {
                if (sourceMoveTask == null && !sourceFinished)
                {
                    sourceMoveTask = enumerator.MoveNextAsync().AsTask();
                }
    
                if (pipeCompletionTask == null && tasks.Any())
                {
                    pipeCompletionTask = Task.WhenAny<TOut>(tasks);
                }
    
                var coreTasks = new Task[] { pipeCompletionTask, sourceMoveTask }
                    .Where(t => t != null)
                    .ToList();
    
                if (!coreTasks.Any())
                {
                    break;
                }
    
                await Task.WhenAny(coreTasks);
    
                if (sourceMoveTask != null && sourceMoveTask.IsCompleted)
                {
                    sourceFinished = !sourceMoveTask.Result;
    
                    if (!sourceFinished)
                    {
                        try
                        {
                            tasks.Add(selector(enumerator.Current));
                        }
                        catch { } // Best effort: an item whose selector throws synchronously is skipped.
                    }
    
                    sourceMoveTask = null;
                }
                
                if (pipeCompletionTask != null && pipeCompletionTask.IsCompleted)
                {
                    var completedTask = pipeCompletionTask.Result;
    
                    // Faulted or canceled tasks are dropped silently; only successful results are yielded.
                    if (completedTask.IsCompletedSuccessfully)
                    {
                        yield return completedTask.Result;
                    }
    
                    tasks.Remove(completedTask);
                    pipeCompletionTask = null;
                }
            }
        }
        finally
        {
            await enumerator.DisposeAsync();
        }
    }
    

    It can be used like this:

        static async Task Main(string[] args)
        {
            var source = GetIds();
            var strs = source.SelectParallel(Map);
    
            await foreach (var str in strs)
            {
                Console.WriteLine(str);
            }
        }
    
        static async IAsyncEnumerable<int> GetIds()
        {
            foreach (var i in Enumerable.Range(1, 20))
            {
                await Task.Delay(200);
                yield return i;
            }
        }
    
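        // Shared generator used by Map; it is only called synchronously, before Map's first await.
        static readonly Random rnd = new Random();

        static async Task<string> Map(int id)
        {
            await Task.Delay(rnd.Next(1000, 2000));
            return $"{id}_{Thread.CurrentThread.ManagedThreadId}";
        }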
    

    Possible output (the timestamps are not printed by the code above):

    [6:31:03 PM] 1_5
    [6:31:03 PM] 2_6
    [6:31:04 PM] 3_6
    [6:31:04 PM] 6_4
    [6:31:04 PM] 5_4
    [6:31:04 PM] 4_5
    [6:31:05 PM] 8_6
    [6:31:05 PM] 7_6
    [6:31:05 PM] 11_6
    [6:31:05 PM] 10_4
    [6:31:05 PM] 9_6
    [6:31:06 PM] 14_6
    [6:31:06 PM] 12_4
    [6:31:06 PM] 13_4
    [6:31:06 PM] 15_4
    [6:31:07 PM] 17_4
    [6:31:07 PM] 20_4
    [6:31:07 PM] 16_6
    [6:31:07 PM] 18_6
    [6:31:08 PM] 19_6
    
  • 2021-02-09 15:22

    Here is a version that also lets you specify the maximum degree of parallelism. The idea is that the tasks are enumerated with a lag. For example, with degreeOfParallelism: 4 the first 4 tasks are enumerated immediately, which causes them to be created, and then the first of these is awaited. Next the 5th task is enumerated and the 2nd is awaited, and so on.

    To keep things tidy, the Lag method is embedded inside the ParallelEnumerateAsync method as a static local function (a feature introduced in C# 8).

    public static async IAsyncEnumerable<TResult> ParallelEnumerateAsync<TResult>(
        this IEnumerable<Task<TResult>> tasks, int degreeOfParallelism)
    {
        if (degreeOfParallelism < 1)
            throw new ArgumentOutOfRangeException(nameof(degreeOfParallelism));
    
        if (tasks is ICollection<Task<TResult>>) throw new ArgumentException(
            "The enumerable should not be materialized.", nameof(tasks));
    
        foreach (var task in Lag(tasks, degreeOfParallelism - 1))
        {
            yield return await task.ConfigureAwait(false);
        }
    
        static IEnumerable<T> Lag<T>(IEnumerable<T> source, int count)
        {
            var queue = new Queue<T>();
            using (var enumerator = source.GetEnumerator())
            {
                int index = 0;
                while (enumerator.MoveNext())
                {
                    queue.Enqueue(enumerator.Current);
                    index++;
                    if (index > count) yield return queue.Dequeue();
                }
            }
            while (queue.Count > 0) yield return queue.Dequeue();
        }
    }
    

    Note: this implementation does not maintain a consistent degree of parallelism. It relies on all tasks having similar durations. A single long-running task will eventually drop the degree of parallelism to one until it completes.
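
    To make the lag idea concrete, here is a small sketch that runs the same Lag logic on plain integers instead of tasks. Lag is rewritten with foreach and queue.Count, which should be equivalent to the index counter in the answer; the Produce helper and the log list are assumptions added only to record when each element is pulled from the source (the moment a task would be created). With count: 2, which corresponds to degreeOfParallelism: 3, three elements are produced before the first one is consumed.

```csharp
using System;
using System.Collections.Generic;

public static class LagDemo
{
    // Same idea as the Lag local function in the answer, lifted out so it can run on its own.
    public static IEnumerable<T> Lag<T>(IEnumerable<T> source, int count)
    {
        var queue = new Queue<T>();
        foreach (var item in source)
        {
            queue.Enqueue(item);
            // Start yielding only once more than `count` items are buffered.
            if (queue.Count > count) yield return queue.Dequeue();
        }
        // The source is exhausted: drain whatever is still buffered.
        while (queue.Count > 0) yield return queue.Dequeue();
    }

    // Hypothetical source that logs when each element is produced,
    // standing in for the creation (start) of a task.
    public static IEnumerable<int> Produce(int count, List<string> log)
    {
        for (var i = 1; i <= count; i++)
        {
            log.Add($"produce {i}");
            yield return i;
        }
    }

    public static void Main()
    {
        var log = new List<string>();
        foreach (var item in Lag(Produce(5, log), 2))
        {
            log.Add($"consume {item}");
        }

        // Prints: produce 1, produce 2, produce 3, consume 1, produce 4,
        // consume 2, produce 5, consume 3, consume 4, consume 5
        Console.WriteLine(string.Join(", ", log));
    }
}
```

    The log also hints at the flaw mentioned in the note: a new element is produced only when the oldest buffered one is consumed, so if that oldest task is slow, no new task is started even when the others have already finished.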

  • 2021-02-09 15:27

    My take on this task. It borrows heavily from the other answers in this topic, with (hopefully) some enhancements. The idea is to start the tasks and put them in a queue, as in the other answers, and, like Theodor Zoulias, I also limit the maximum degree of parallelism. However, I tried to overcome the limitation he mentioned in his comment by using a task continuation to queue the next task as soon as any of the previous tasks completes. This way the number of simultaneously running tasks is maximized, within the configured limit of course.

    I'm not an async expert. This solution might have multithreading deadlocks and other Heisenbugs, and I did not test exception handling etc., so you've been warned.

    public static async IAsyncEnumerable<TResult> ExecuteParallelAsync<TResult>(IEnumerable<Task<TResult>> coldTasks, int degreeOfParallelism)
    {
        if (degreeOfParallelism < 1)
            throw new ArgumentOutOfRangeException(nameof(degreeOfParallelism));
    
        if (coldTasks is ICollection<Task<TResult>>) throw new ArgumentException(
            "The enumerable should not be materialized.", nameof(coldTasks));
    
        var queue = new ConcurrentQueue<Task<TResult>>();
    
        using var enumerator = coldTasks.GetEnumerator();
        
        for (var index = 0; index < degreeOfParallelism && EnqueueNextTask(); index++) ;
    
        while (queue.TryDequeue(out var nextTask)) yield return await nextTask;
    
        bool EnqueueNextTask()
        {
            lock (enumerator)
            {
                if (!enumerator.MoveNext()) return false;
    
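                var nextTask = enumerator.Current
                    .ContinueWith(t =>
                    {
                        EnqueueNextTask();
                        // GetAwaiter().GetResult() rethrows the original exception
                        // instead of wrapping it in an AggregateException.
                        return t.GetAwaiter().GetResult();
                    });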
                queue.Enqueue(nextTask);
                return true;
            }
        }
    }
    

    We use this method to generate testing tasks (borrowed from DK's answer):

    IEnumerable<Task<int>> GenerateTasks(int count)
    {
        return Enumerable.Range(1, count).Select(async n =>
        {
            Console.WriteLine($"#{n} started");
            await Task.Delay(new Random().Next(100, 1000));
            Console.WriteLine($"#{n} is complete");
            return n;
        });
    }
    

    And also their test runner:

    async Task Main()
    {
        await foreach (var n in ExecuteParallelAsync(GenerateTasks(9), 3))
        {
            Console.WriteLine($"#{n} is returned");
        }
    }
    

    And we get this result in LINQPad (which is awesome, BTW):

    #1 started
    #2 started
    #3 started
    #3 is complete
    #4 started
    #2 is complete
    #5 started
    #1 is complete
    #6 started
    #1 is returned
    #2 is returned
    #3 is returned
    #4 is complete
    #7 started
    #4 is returned
    #6 is complete
    #8 started
    #7 is complete
    #9 started
    #8 is complete
    #5 is complete
    #5 is returned
    #6 is returned
    #7 is returned
    #8 is returned
    #9 is complete
    #9 is returned
    

    Note how the next task starts as soon as any of the previous tasks completes, and how the order in which they return is still preserved.

  • 2021-02-09 15:32

    If I understand your question right, your focus is to launch all tasks, let them all run in parallel, but make sure the return values are processed in the same order as the tasks were launched.

    Going by the specs, with C# 8.0 asynchronous streams, queuing tasks for parallel execution but sequential return can look like this.

    /// Demonstrates Parallel Execution - Sequential Results with test tasks
    async Task RunAsyncStreams()
    {
        await foreach (var n in RunAndPreserveOrderAsync(GenerateTasks(6)))
        {
            Console.WriteLine($"#{n} is returned");
        }
    }
    
    /// Returns an enumerator that will produce a number of test tasks running
    /// for a random time.
    IEnumerable<Task<int>> GenerateTasks(int count)
    {
        return Enumerable.Range(1, count).Select(async n =>
        {
            await Task.Delay(new Random().Next(100, 1000));
            Console.WriteLine($"#{n} is complete");
            return n;
        });
    }
    
    /// Launches all tasks in order of enumeration, then waits for the results
    /// in the same order: Parallel Execution - Sequential Results.
    async IAsyncEnumerable<T> RunAndPreserveOrderAsync<T>(IEnumerable<Task<T>> tasks)
    {
        var queue = new Queue<Task<T>>(tasks);
        while (queue.Count > 0) yield return await queue.Dequeue();
    }
    

    Possible output:

    #5 is complete
    #1 is complete
    #1 is returned
    #3 is complete
    #6 is complete
    #2 is complete
    #2 is returned
    #3 is returned
    #4 is complete
    #4 is returned
    #5 is returned
    #6 is returned
    

    On a practical note, there doesn't seem to be any new language-level support for this pattern. Besides, since asynchronous streams deal with IAsyncEnumerable<T>, a base Task would not work here, and all the worker async methods must have the same Task<T> return type, which somewhat limits asynchronous streams-based design.

    Because of this, and depending on your situation (Do you want to be able to cancel long-running tasks? Is per-task exception handling required? Should there be a limit on the number of concurrent tasks?), it might make sense to check out @TheGeneral's suggestions up there.

    Update:

    Note that RunAndPreserveOrderAsync<T> does not necessarily have to use a Queue of tasks - this was only chosen to better show coding intentions.

    var queue = new Queue<Task<T>>(tasks);
    while (queue.Count > 0) yield return await queue.Dequeue();
    

    Converting the enumerable to a List would produce the same result; the body of RunAndPreserveOrderAsync<T> can be replaced with this one line:

    foreach(var task in tasks.ToList()) yield return await task;
    

    In this implementation it is important that all the tasks are generated and launched first, which happens when the Queue is initialized or when the tasks enumerable is converted to a List. However, it might be hard to resist simplifying the above foreach line like this:

    foreach(var task in tasks) yield return await task;
    

    which would cause the tasks to be executed sequentially instead of running in parallel.
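
    The difference between the two foreach variants can be seen in a small sketch like the following. The class and method names, the log list, and the 50 ms delay are assumptions made up for the demo; only the two foreach bodies come from the answer. Each task logs when it is started, and the consumer logs when a value is returned.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class EagerVersusLazyDemo
{
    // Hypothetical task source: each task logs synchronously when it is started.
    public static IEnumerable<Task<int>> GenerateTasks(int count, List<string> log)
    {
        return Enumerable.Range(1, count).Select(async n =>
        {
            log.Add($"start {n}");
            await Task.Delay(50);
            return n;
        });
    }

    // ToList() walks the whole sequence first, so every task is started up front.
    public static async IAsyncEnumerable<T> EnumerateEagerly<T>(IEnumerable<Task<T>> tasks)
    {
        foreach (var task in tasks.ToList()) yield return await task;
    }

    // Without ToList(), each task is created only when the loop reaches it,
    // which is after the previous task has been awaited.
    public static async IAsyncEnumerable<T> EnumerateLazily<T>(IEnumerable<Task<T>> tasks)
    {
        foreach (var task in tasks) yield return await task;
    }

    public static async Task<string> RunAsync(bool eager)
    {
        var log = new List<string>();
        var tasks = GenerateTasks(3, log);
        var results = eager ? EnumerateEagerly(tasks) : EnumerateLazily(tasks);

        await foreach (var n in results)
        {
            log.Add($"return {n}");
        }
        return string.Join(", ", log);
    }

    public static async Task Main()
    {
        // Prints: start 1, start 2, start 3, return 1, return 2, return 3
        Console.WriteLine(await RunAsync(eager: true));

        // Prints: start 1, return 1, start 2, return 2, start 3, return 3
        Console.WriteLine(await RunAsync(eager: false));
    }
}
```

    In the lazy case the Select lambda only runs when the enumerator advances, so at most one task is in flight at any time.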
