Can someone suggest a way to create batches of a certain size in LINQ?
Ideally I want to be able to perform operations in chunks of some configurable size.
If you start with a sequence defined as an IEnumerable<T>, and you know that it can safely be enumerated multiple times (e.g. because it is an array or a list), you can use this simple pattern to process the elements in batches. Be aware that each iteration re-enumerates the skipped prefix, so for large inputs the stacked Skip calls can make this approach quadratic:
while (sequence.Any())
{
    var batch = sequence.Take(10);
    sequence = sequence.Skip(10);
    // do whatever you need to do with each batch here
}
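As a quick illustrative sketch (the list contents and batch size of 3 are just examples), the pattern can be exercised like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        IEnumerable<int> sequence = new List<int> { 1, 2, 3, 4, 5, 6, 7 };

        // Process the list in batches of 3 using the Take/Skip pattern above.
        // Note: batch is lazy, but it captures the sequence reference before
        // it is reassigned, so each batch sees the correct elements.
        while (sequence.Any())
        {
            var batch = sequence.Take(3);
            sequence = sequence.Skip(3);
            Console.WriteLine(string.Join(", ", batch));
        }
        // Prints: "1, 2, 3", then "4, 5, 6", then "7"
    }
}
```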
Here is an attempted improvement of Nick Whaley's (link) and infogulch's (link) lazy Batch implementations. This one is strict: you either enumerate the batches in the correct order, or you get an exception.
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IEnumerable<TSource> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
    using (var enumerator = source.GetEnumerator())
    {
        int i = 0;
        while (enumerator.MoveNext())
        {
            if (i % size != 0) throw new InvalidOperationException(
                "The enumeration is out of order.");
            i++;
            yield return GetBatch();
        }

        IEnumerable<TSource> GetBatch()
        {
            while (true)
            {
                yield return enumerator.Current;
                if (i % size == 0 || !enumerator.MoveNext()) break;
                i++;
            }
        }
    }
}
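A minimal usage sketch, assuming the extension method above is in scope (the input values are arbitrary). Each batch must be fully consumed before moving on to the next one; otherwise the "out of order" exception fires:

```csharp
using System;
using System.Linq;

class Demo
{
    static void Main()
    {
        var numbers = Enumerable.Range(1, 7);

        // Enumerate batches strictly in order, consuming each one fully.
        foreach (var batch in numbers.Batch(3))
        {
            Console.WriteLine(string.Join(", ", batch));
        }
        // Prints: "1, 2, 3", then "4, 5, 6", then "7"
    }
}
```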
And here is a lazy Batch implementation for sources of type IList<T>. This one imposes no restrictions on the enumeration. The batches can be enumerated partially, in any order, and more than once. The restriction of not modifying the collection during the enumeration is still in place though. This is achieved by making a dummy call to enumerator.MoveNext() before yielding any chunk or element. The downside is that the enumerator is left undisposed, since it is unknown when the enumeration is going to finish.
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
    this IList<TSource> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
    var enumerator = source.GetEnumerator();
    for (int i = 0; i < source.Count; i += size)
    {
        enumerator.MoveNext();
        yield return GetChunk(i, Math.Min(i + size, source.Count));
    }

    IEnumerable<TSource> GetChunk(int from, int toExclusive)
    {
        for (int j = from; j < toExclusive; j++)
        {
            enumerator.MoveNext();
            yield return source[j];
        }
    }
}
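A quick sketch of how the IList<T> overload could be exercised (the list contents are illustrative), demonstrating the partial and out-of-order enumeration that this version allows:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        IList<int> list = new List<int> { 1, 2, 3, 4, 5, 6, 7 };

        // Materialize the batch wrappers; the elements themselves stay lazy.
        var batches = list.Batch(3).ToList();

        // Batches can be enumerated in any order, partially, and repeatedly.
        Console.WriteLine(string.Join(", ", batches[2]));         // 7
        Console.WriteLine(string.Join(", ", batches[0].Take(2))); // 1, 2
    }
}
```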
Another way is to use the Rx Buffer operator:

//using System.Linq;
//using System.Reactive.Linq;
//using System.Reactive.Threading.Tasks;
var observableBatches = anEnumerable.ToObservable().Buffer(size);
var batches = aList.ToObservable().Buffer(size).ToList().ToTask().GetAwaiter().GetResult();
Here is an easy version to use and understand:
public static List<List<T>> chunkList<T>(List<T> listToChunk, int batchSize)
{
    List<List<T>> batches = new List<List<T>>();
    if (listToChunk.Count == 0) return batches;

    bool moreRecords = true;
    int fromRecord = 0;
    int countRange = 0;

    // The first range is either a full batch or the whole (smaller) list.
    if (listToChunk.Count >= batchSize)
    {
        countRange = batchSize;
    }
    else
    {
        countRange = listToChunk.Count;
    }

    while (moreRecords)
    {
        List<T> batch = listToChunk.GetRange(fromRecord, countRange);
        batches.Add(batch);

        if ((fromRecord + batchSize) >= listToChunk.Count)
        {
            moreRecords = false;
        }

        fromRecord = fromRecord + batch.Count;

        // Shrink the last range so it does not run past the end of the list.
        if ((fromRecord + batchSize) > listToChunk.Count)
        {
            countRange = listToChunk.Count - fromRecord;
        }
        else
        {
            countRange = batchSize;
        }
    }
    return batches;
}
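A small usage sketch (the input list is just an example); unlike the lazy versions above, this one eagerly copies every batch into its own List<T>:

```csharp
using System;
using System.Collections.Generic;

class Demo
{
    static void Main()
    {
        var items = new List<int> { 1, 2, 3, 4, 5, 6, 7 };

        // Eagerly split into fully materialized sub-lists of up to 3 items.
        List<List<int>> batches = chunkList(items, 3);

        foreach (var batch in batches)
        {
            Console.WriteLine(string.Join(", ", batch));
        }
        // Prints: "1, 2, 3", then "4, 5, 6", then "7"
    }
}
```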