Create batches in LINQ

傲寒 2020-11-22 02:50

Can someone suggest a way to create batches of a certain size in LINQ?

Ideally I want to be able to perform operations in chunks of a configurable size.

16 answers
  • 2020-11-22 03:27

    I know everybody has used complex approaches for this, and I really don't understand why. Take and Skip can do all of it, using the ordinary Select overload whose transform function is a Func<TSource,Int32,TResult> (it also receives the item's index). Like:

    public IEnumerable<IEnumerable<T>> Buffer<T>(IEnumerable<T> source, int size)=>
        source.Select((item, index) => source.Skip(size * index).Take(size)).TakeWhile(bucket => bucket.Any());
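
    Because each bucket is its own Skip/Take query over source, the source is walked again for every batch. A minimal sketch of that effect, assuming a hypothetical counting iterator (Numbers and Pulled are illustration names, not part of the answer):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class BufferDemo
    {
        // Hypothetical counter: how many items the source has handed out in total.
        public static int Pulled;

        // Deferred source that counts every item it yields.
        public static IEnumerable<int> Numbers(int n)
        {
            for (int i = 0; i < n; i++)
            {
                Pulled++;
                yield return i;
            }
        }

        // Same Skip/Take approach as the one-liner in this answer.
        public static IEnumerable<IEnumerable<T>> Buffer<T>(IEnumerable<T> source, int size) =>
            source.Select((item, index) => source.Skip(size * index).Take(size))
                  .TakeWhile(bucket => bucket.Any());

        public static void Main()
        {
            var batches = Buffer(Numbers(10), 3).Select(b => b.ToList()).ToList();
            Console.WriteLine(batches.Count); // 4
            // Pulled ends up well above 10, because the source is restarted repeatedly.
            Console.WriteLine(Pulled > 10);   // True
        }
    }
    ```

    For a small in-memory list this hardly matters; for a query or a file it means the work is repeated.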
    
  • 2020-11-22 03:28

    You don't need to write any code. Use the MoreLINQ Batch method, which splits the source sequence into buckets of a given size (MoreLINQ is available as a NuGet package you can install):

    int size = 10;
    var batches = sequence.Batch(size);
    

    Which is implemented as:

    public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
                      this IEnumerable<TSource> source, int size)
    {
        TSource[] bucket = null;
        var count = 0;
    
        foreach (var item in source)
        {
            if (bucket == null)
                bucket = new TSource[size];
    
            bucket[count++] = item;
            if (count != size)
                continue;
    
            yield return bucket;
    
            bucket = null;
            count = 0;
        }
    
        if (bucket != null && count > 0)
            yield return bucket.Take(count).ToArray();
    }
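
    If adding a package is not an option, .NET 6 and later also ship a built-in Enumerable.Chunk that returns arrays of at most the given size. A minimal sketch, assuming a .NET 6+ target (the sample range is only for illustration):

    ```csharp
    using System;
    using System.Linq;

    public static class ChunkDemo
    {
        public static void Main()
        {
            var sequence = Enumerable.Range(1, 23);

            // Chunk yields int[] batches; only the last one may be shorter.
            foreach (int[] batch in sequence.Chunk(10))
            {
                Console.WriteLine(batch.Length); // 10, 10, 3
            }
        }
    }
    ```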
    
  • 2020-11-22 03:28

    I wonder why nobody has ever posted an old school for-loop solution. Here is one:

    List<int> source = Enumerable.Range(1,23).ToList();
    int batchsize = 10;
    for (int i = 0; i < source.Count; i+= batchsize)
    {
        var batch = source.Skip(i).Take(batchsize);
    }
    

    This simplicity is possible because the Take method:

    ... enumerates source and yields elements until count elements have been yielded or source contains no more elements. If count exceeds the number of elements in source, all elements of source are returned

    Disclaimer:

    Using Skip and Take inside the loop means that the enumerable is enumerated multiple times. This is dangerous if the enumerable is deferred: it may result in multiple executions of a database query, a web request, or a file read. This example is written explicitly for a List, which is not deferred, so it is less of a problem. It is still a slow solution, since Skip enumerates the collection each time it is called.

    This can also be solved using the GetRange method, but it requires an extra calculation to extract a possible rest batch:

    for (int i = 0; i < source.Count; i += batchsize)
    {
        int remaining = source.Count - i;
        var batch = remaining > batchsize  ? source.GetRange(i, batchsize) : source.GetRange(i, remaining);
    }
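
    The extra branch for the rest batch can be folded into Math.Min. A minimal sketch that wraps the GetRange loop in a helper (InBatches is a hypothetical name used only for illustration):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class GetRangeDemo
    {
        // Hypothetical helper: slices a List<T> into copies of at most batchSize items.
        // Being an iterator, it runs the argument check on first enumeration.
        public static IEnumerable<List<T>> InBatches<T>(List<T> source, int batchSize)
        {
            if (batchSize <= 0)
                throw new ArgumentOutOfRangeException(nameof(batchSize));

            for (int i = 0; i < source.Count; i += batchSize)
            {
                // Never ask GetRange for more items than are left.
                yield return source.GetRange(i, Math.Min(batchSize, source.Count - i));
            }
        }

        public static void Main()
        {
            List<int> source = Enumerable.Range(1, 23).ToList();
            foreach (var batch in InBatches(source, 10))
            {
                Console.WriteLine(batch.Count); // 10, 10, 3
            }
        }
    }
    ```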
    

    Here is a third way to handle this, which works with two loops. This ensures that the collection is traversed only once:

    int batchsize = 10;
    List<int> batch = new List<int>(batchsize);
    
    for (int i = 0; i < source.Count; i += batchsize)
    {
        // clamp the last batch to the remaining items to avoid an ArgumentOutOfRangeException
        int count = Math.Min(batchsize, source.Count - i);
        for (int j = i; j < i + count; j++)
        {
            batch.Add(source[j]);
        }
        // process the batch here, before it is cleared for reuse
        batch.Clear();
    }
    
  • 2020-11-22 03:29

    Same approach as MoreLINQ, but using List instead of Array. I haven't done benchmarking, but readability matters more to some people:

        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> source, int size)
        {
            List<T> batch = new List<T>();
    
            foreach (var item in source)
            {
                batch.Add(item);
    
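                if (batch.Count >= size)
                {
                    yield return batch;
                    // start a new list instead of clearing: the caller may still hold the one just yielded
                    batch = new List<T>(size);
                }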
            }
    
            if (batch.Count > 0)
            {
                yield return batch;
            }
        }
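
    One thing to watch with any list-backed batcher is whether the yielded list object is reused between batches. A minimal sketch comparing the two behaviours (BatchReusing and BatchFresh are hypothetical names for illustration):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class ListBatchDemo
    {
        // Clears and reuses one list: every yielded reference is the same object.
        public static IEnumerable<List<T>> BatchReusing<T>(IEnumerable<T> source, int size)
        {
            var batch = new List<T>(size);
            foreach (var item in source)
            {
                batch.Add(item);
                if (batch.Count >= size)
                {
                    yield return batch;
                    batch.Clear();
                }
            }
            if (batch.Count > 0) yield return batch;
        }

        // Allocates a new list per batch, so stored batches keep their contents.
        public static IEnumerable<List<T>> BatchFresh<T>(IEnumerable<T> source, int size)
        {
            var batch = new List<T>(size);
            foreach (var item in source)
            {
                batch.Add(item);
                if (batch.Count >= size)
                {
                    yield return batch;
                    batch = new List<T>(size);
                }
            }
            if (batch.Count > 0) yield return batch;
        }

        public static void Main()
        {
            var numbers = Enumerable.Range(0, 10);
            var reused = BatchReusing(numbers, 3).ToList();
            var fresh = BatchFresh(numbers, 3).ToList();

            // prints "9 | 9 | 9 | 9": four references to one list
            Console.WriteLine(string.Join(" | ", reused.Select(b => string.Join(",", b))));
            // prints "0,1,2 | 3,4,5 | 6,7,8 | 9"
            Console.WriteLine(string.Join(" | ", fresh.Select(b => string.Join(",", b))));
        }
    }
    ```

    The reusing version only looks correct when each batch is fully consumed inside the foreach, before the next one is requested.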
    
  • 2020-11-22 03:32
    public static class MyExtensions
    {
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items,
                                                           int maxItems)
        {
            return items.Select((item, inx) => new { item, inx })
                        .GroupBy(x => x.inx / maxItems)
                        .Select(g => g.Select(x => x.item));
        }
    }
    

    and the usage would be:

    List<int> list = new List<int>() { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    
    foreach(var batch in list.Batch(3))
    {
        Console.WriteLine(String.Join(",",batch));
    }
    

    OUTPUT:

    0,1,2
    3,4,5
    6,7,8
    9
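
    One property worth knowing about this version: GroupBy has to read the whole source before it can yield the first group, so the batches are not streamed. A minimal sketch of that, assuming a hypothetical counting source (Numbers and Pulled are illustration names):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class GroupByDemo
    {
        // Hypothetical counter: how many items the source has handed out.
        public static int Pulled;

        public static IEnumerable<int> Numbers(int n)
        {
            for (int i = 0; i < n; i++)
            {
                Pulled++;
                yield return i;
            }
        }

        // Same GroupBy-based batching as in this answer.
        public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> items, int maxItems)
        {
            return items.Select((item, inx) => new { item, inx })
                        .GroupBy(x => x.inx / maxItems)
                        .Select(g => g.Select(x => x.item));
        }

        public static void Main()
        {
            var first = Numbers(10).Batch(3).First();
            Console.WriteLine(string.Join(",", first)); // 0,1,2
            Console.WriteLine(Pulled);                  // 10: all items were read for the first batch
        }
    }
    ```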
    
  • 2020-11-22 03:33

    I wrote a custom IEnumerable implementation that works without LINQ and guarantees a single enumeration over the data. It also does this without backing lists or arrays, which can cause memory explosions on large data sets.

    Here are some basic tests:

        [Fact]
        public void ShouldPartition()
        {
            var ints = new List<int> {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
            var data = ints.PartitionByMaxGroupSize(3);
            data.Count().Should().Be(4);
    
            data.Skip(0).First().Count().Should().Be(3);
            data.Skip(0).First().ToList()[0].Should().Be(0);
            data.Skip(0).First().ToList()[1].Should().Be(1);
            data.Skip(0).First().ToList()[2].Should().Be(2);
    
            data.Skip(1).First().Count().Should().Be(3);
            data.Skip(1).First().ToList()[0].Should().Be(3);
            data.Skip(1).First().ToList()[1].Should().Be(4);
            data.Skip(1).First().ToList()[2].Should().Be(5);
    
            data.Skip(2).First().Count().Should().Be(3);
            data.Skip(2).First().ToList()[0].Should().Be(6);
            data.Skip(2).First().ToList()[1].Should().Be(7);
            data.Skip(2).First().ToList()[2].Should().Be(8);
    
            data.Skip(3).First().Count().Should().Be(1);
            data.Skip(3).First().ToList()[0].Should().Be(9);
        }
    

    The extension method that partitions the data:

    /// <summary>
    /// A set of extension methods for <see cref="IEnumerable{T}"/>. 
    /// </summary>
    public static class EnumerableExtender
    {
        /// <summary>
        /// Splits an enumerable into chunks, by a maximum group size.
        /// </summary>
        /// <param name="source">The source to split</param>
        /// <param name="maxSize">The maximum number of items per group.</param>
        /// <typeparam name="T">The type of item to split</typeparam>
        /// <returns>A lazily evaluated sequence of groups of the original items.</returns>
        public static IEnumerable<IEnumerable<T>> PartitionByMaxGroupSize<T>(this IEnumerable<T> source, int maxSize)
        {
            return new SplittingEnumerable<T>(source, maxSize);
        }
    }
    

    This is the implementing class:

        using System.Collections;
        using System.Collections.Generic;
    
        internal class SplittingEnumerable<T> : IEnumerable<IEnumerable<T>>
        {
            private readonly IEnumerable<T> backing;
            private readonly int maxSize;
            private bool hasCurrent;
            private T lastItem;
    
            public SplittingEnumerable(IEnumerable<T> backing, int maxSize)
            {
                this.backing = backing;
                this.maxSize = maxSize;
            }
    
            public IEnumerator<IEnumerable<T>> GetEnumerator()
            {
                return new Enumerator(this, this.backing.GetEnumerator());
            }
    
            IEnumerator IEnumerable.GetEnumerator()
            {
                return this.GetEnumerator();
            }
    
            private class Enumerator : IEnumerator<IEnumerable<T>>
            {
                private readonly SplittingEnumerable<T> parent;
                private readonly IEnumerator<T> backingEnumerator;
                private NextEnumerable current;
    
                public Enumerator(SplittingEnumerable<T> parent, IEnumerator<T> backingEnumerator)
                {
                    this.parent = parent;
                    this.backingEnumerator = backingEnumerator;
                    this.parent.hasCurrent = this.backingEnumerator.MoveNext();
                    if (this.parent.hasCurrent)
                    {
                        this.parent.lastItem = this.backingEnumerator.Current;
                    }
                }
    
                public bool MoveNext()
                {
                    if (this.current == null)
                    {
                        this.current = new NextEnumerable(this.parent, this.backingEnumerator);
                        return true;
                    }
                    else
                    {
                        if (!this.current.IsComplete)
                        {
                            using (var enumerator = this.current.GetEnumerator())
                            {
                                while (enumerator.MoveNext())
                                {
                                }
                            }
                        }
                    }
    
                    if (!this.parent.hasCurrent)
                    {
                        return false;
                    }
    
                    this.current = new NextEnumerable(this.parent, this.backingEnumerator);
                    return true;
                }
    
                public void Reset()
                {
                    throw new System.NotImplementedException();
                }
    
                public IEnumerable<T> Current
                {
                    get { return this.current; }
                }
    
                object IEnumerator.Current
                {
                    get { return this.Current; }
                }
    
                public void Dispose()
                {
                }
            }
    
            private class NextEnumerable : IEnumerable<T>
            {
                private readonly SplittingEnumerable<T> splitter;
                private readonly IEnumerator<T> backingEnumerator;
                private int currentSize;
    
                public NextEnumerable(SplittingEnumerable<T> splitter, IEnumerator<T> backingEnumerator)
                {
                    this.splitter = splitter;
                    this.backingEnumerator = backingEnumerator;
                }
    
                public bool IsComplete { get; private set; }
    
                public IEnumerator<T> GetEnumerator()
                {
                    return new NextEnumerator(this.splitter, this, this.backingEnumerator);
                }
    
                IEnumerator IEnumerable.GetEnumerator()
                {
                    return this.GetEnumerator();
                }
    
                private class NextEnumerator : IEnumerator<T>
                {
                    private readonly SplittingEnumerable<T> splitter;
                    private readonly NextEnumerable parent;
                    private readonly IEnumerator<T> enumerator;
                    private T currentItem;
    
                    public NextEnumerator(SplittingEnumerable<T> splitter, NextEnumerable parent, IEnumerator<T> enumerator)
                    {
                        this.splitter = splitter;
                        this.parent = parent;
                        this.enumerator = enumerator;
                    }
    
                    public bool MoveNext()
                    {
                        this.parent.currentSize += 1;
                        this.currentItem = this.splitter.lastItem;
                        // remember whether an item was available before advancing the shared enumerator
                        var hadCurrent = this.splitter.hasCurrent;
    
                        this.parent.IsComplete = this.parent.currentSize > this.splitter.maxSize;
    
                        if (this.parent.IsComplete)
                        {
                            return false;
                        }
    
                        if (hadCurrent)
                        {
                            var result = this.enumerator.MoveNext();
    
                            this.splitter.lastItem = this.enumerator.Current;
                            this.splitter.hasCurrent = result;
                        }
    
                        return hadCurrent;
                    }
    
                    public void Reset()
                    {
                        throw new System.NotImplementedException();
                    }
    
                    public T Current
                    {
                        get { return this.currentItem; }
                    }
    
                    object IEnumerator.Current
                    {
                        get { return this.Current; }
                    }
    
                    public void Dispose()
                    {
                    }
                }
            }
        }
    