Split a List into smaller lists of N size

后悔当初 2020-11-22 16:55

I am attempting to split a list into a series of smaller lists.

My problem: my function to split lists doesn't split them into lists of the correct size.

17 answers
  • 2020-11-22 17:49
    public static List<List<float[]>> SplitList(List<float[]> locations, int nSize=30)  
    {        
        var list = new List<List<float[]>>(); 
    
        for (int i = 0; i < locations.Count; i += nSize) 
        { 
            list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i))); 
        } 
    
        return list; 
    } 
    

    Generic version:

    public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize=30)  
    {        
        for (int i = 0; i < locations.Count; i += nSize) 
        { 
            yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i)); 
        }  
    } 
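
    As a quick check of the generic version, here is a minimal usage sketch. The class name `ListSplitDemo` and the sample data are assumptions made for illustration; the method body is the one shown above.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class ListSplitDemo
    {
        // Same generic method as in the answer above.
        public static IEnumerable<List<T>> SplitList<T>(List<T> locations, int nSize = 30)
        {
            for (int i = 0; i < locations.Count; i += nSize)
            {
                // The last chunk may be shorter than nSize.
                yield return locations.GetRange(i, Math.Min(nSize, locations.Count - i));
            }
        }

        public static void Main()
        {
            List<int> numbers = Enumerable.Range(1, 7).ToList();

            foreach (List<int> chunk in SplitList(numbers, 3))
            {
                Console.WriteLine(string.Join(",", chunk));
            }
            // prints:
            // 1,2,3
            // 4,5,6
            // 7
        }
    }
    ```

    Each chunk is an independent copy made by `GetRange`, so changing a chunk does not affect the original list.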
    
  • 2020-11-22 17:51

    One more

    public static IList<IList<T>> SplitList<T>(this IList<T> list, int chunkSize)
    {
        var chunks = new List<IList<T>>();
        List<T> chunk = null;
        for (var i = 0; i < list.Count; i++)
        {
            if (i % chunkSize == 0)
            {
                chunk = new List<T>(chunkSize);
                chunks.Add(chunk);
            }
            chunk.Add(list[i]);
        }
        return chunks;
    }
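
    One thing to watch: with `chunkSize` equal to 0, the `i % chunkSize` test above throws a `DivideByZeroException` instead of a descriptive error. A guarded variant might look like the sketch below. The name `SplitListChecked` and the choice of exceptions are assumptions for illustration, not part of the answer.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class CheckedSplit
    {
        // Hypothetical guarded variant: validates its arguments before chunking.
        public static IList<IList<T>> SplitListChecked<T>(this IList<T> list, int chunkSize)
        {
            if (list == null) throw new ArgumentNullException(nameof(list));
            if (chunkSize <= 0)
                throw new ArgumentOutOfRangeException(nameof(chunkSize), "Chunk size must be positive.");

            var chunks = new List<IList<T>>();
            for (int i = 0; i < list.Count; i += chunkSize)
            {
                // The last chunk may hold fewer than chunkSize items.
                int size = Math.Min(chunkSize, list.Count - i);
                var chunk = new List<T>(size);
                for (int j = 0; j < size; j++)
                {
                    chunk.Add(list[i + j]);
                }
                chunks.Add(chunk);
            }
            return chunks;
        }
    }
    ```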
    
  • 2020-11-22 17:51
    List<int> originalList = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12 };
    Dictionary<int, List<int>> dic = new Dictionary<int, List<int>>();
    // Number of parts to split into; use 3 if you want three parts.
    int parts = 2;
    // Round up so that a remainder does not create an extra batch.
    int batchCount = (originalList.Count + parts - 1) / parts;
    int threadId = 0;
    List<int> lst = new List<int>();
    for (int i = 0; i < originalList.Count; i++)
    {
        lst.Add(originalList[i]);
        if ((i + 1) % batchCount == 0) // the current batch is full
        {
            dic.Add(threadId, lst);
            lst = new List<int>();
            threadId++;
        }
    }
    if (lst.Count > 0)
        dic.Add(threadId, lst); // in case any items are left over
    foreach (int batchId in dic.Keys)
    {
        Console.WriteLine("BatchId:" + batchId);
        Console.WriteLine("Batch Count:" + dic[batchId].Count);
    }
    
  • 2020-11-22 17:53

    I find the accepted answer (Serj-Tm) the most robust, but I'd like to suggest a generic version.

    public static List<List<T>> splitList<T>(List<T> locations, int nSize = 30)
    {
        var list = new List<List<T>>();
    
        for (int i = 0; i < locations.Count; i += nSize)
        {
            list.Add(locations.GetRange(i, Math.Min(nSize, locations.Count - i)));
        }
    
        return list;
    }
    
  • 2020-11-22 17:54

    Addition after very useful comment of mhand at the end

    Original answer

    Although most solutions might work, I think they are not very efficient. Suppose you only want the first few items of the first few chunks. Then you wouldn't want to iterate over all (zillion) items in your sequence.

    The following will enumerate at most twice: once for the Take and once for the Skip. It won't enumerate over any more elements than you will use:

    public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>
        (this IEnumerable<TSource> source, int chunkSize)
    {
        while (source.Any())                     // while there are elements left
        {   // still something to chunk:
            yield return source.Take(chunkSize); // return a chunk of chunkSize
            source = source.Skip(chunkSize);     // skip the returned chunk
        }
    }
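
    To see the lazy behaviour in action, here is a minimal sketch that takes only the first three items of the first two chunks of a large sequence. The class name `ChunkByDemo` and the sample sizes are assumptions for illustration; `ChunkBy` is the method shown above.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class ChunkByDemo
    {
        // Same method as in the answer above.
        public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>(
            this IEnumerable<TSource> source, int chunkSize)
        {
            while (source.Any())
            {
                yield return source.Take(chunkSize);
                source = source.Skip(chunkSize);
            }
        }

        public static void Main()
        {
            // A large lazy sequence; it is never materialised as a whole.
            IEnumerable<int> numbers = Enumerable.Range(0, 1000000);

            // Only the first two chunks are requested, and only three items of each.
            foreach (IEnumerable<int> chunk in numbers.ChunkBy(5).Take(2))
            {
                Console.WriteLine(string.Join(",", chunk.Take(3)));
            }
            // prints:
            // 0,1,2
            // 5,6,7
        }
    }
    ```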
    

    How many times will this enumerate the sequence?

    Suppose you divide your source into chunks of chunkSize. You enumerate only the first N chunks. From every enumerated chunk you'll only enumerate the first M elements.

    while (source.Any())
    {
         ...
    }
    

    Any() gets the enumerator, calls MoveNext() once, and returns the result after disposing the enumerator. This is done N times.

    yield return source.Take(chunkSize);
    

    According to the reference source this will do something like:

    public static IEnumerable<TSource> Take<TSource>(this IEnumerable<TSource> source, int count)
    {
        return TakeIterator<TSource>(source, count);
    }
    
    static IEnumerable<TSource> TakeIterator<TSource>(IEnumerable<TSource> source, int count)
    {
        foreach (TSource element in source)
        {
            yield return element;
            if (--count == 0) break;
        }
    }
    

    This doesn't do a lot until you start enumerating over the fetched Chunk. If you fetch several Chunks, but decide not to enumerate over the first Chunk, the foreach is not executed, as your debugger will show you.

    If you decide to take the first M elements of the first chunk then the yield return is executed exactly M times. This means:

    • get the enumerator
    • call MoveNext() and Current M times.
    • Dispose the enumerator

    After the first chunk has been yield returned, we skip this first Chunk:

    source = source.Skip(chunkSize);
    

    Once again, we'll take a look at the reference source to find the SkipIterator:

    static IEnumerable<TSource> SkipIterator<TSource>(IEnumerable<TSource> source, int count)
    {
        using (IEnumerator<TSource> e = source.GetEnumerator()) 
        {
            while (count > 0 && e.MoveNext()) count--;
            if (count <= 0) 
            {
                while (e.MoveNext()) yield return e.Current;
            }
        }
    }
    

    As you see, the SkipIterator calls MoveNext() once for every element in the Chunk. It doesn't call Current.

    So per Chunk we see that the following is done:

    • Any(): GetEnumerator(); one MoveNext(); Dispose enumerator;
    • Take():

      • nothing if the content of the chunk is not enumerated.
      • If the content is enumerated: GetEnumerator(), one MoveNext() and one Current per enumerated item, Dispose enumerator;

    • Skip(): for every chunk that is enumerated (NOT the contents of the chunk): GetEnumerator(), MoveNext() chunkSize times, no Current! Dispose enumerator

    If you look at what happens with the enumerator, you'll see that there are a lot of calls to MoveNext(), and only calls to Current for the TSource items you actually decide to access.

    If you take N Chunks of size chunkSize, then MoveNext() is called:

    • N times for Any()
    • not at all for Take(), as long as you don't enumerate the Chunks
    • N times chunkSize for Skip()

    If you decide to enumerate only the first M elements of every fetched chunk, then you need to call MoveNext M times per enumerated Chunk.

    The total

    MoveNext calls: N + N*M + N*chunkSize
    Current calls: N*M; (only the items you really access)
    

    So if you decide to enumerate all elements of all chunks:

    MoveNext: numberOfChunks + all elements + all elements = about twice the sequence
    Current: every item is accessed exactly once
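
    Rather than rely on the arithmetic alone, the number of MoveNext() calls can be measured. The sketch below wraps the source in a hypothetical `CountingEnumerable<T>` helper (not part of LINQ; the name and the sample sizes are assumptions) and enumerates every element of every chunk. One caveat: `source = source.Skip(chunkSize)` stacks each Skip on top of the previous one, so every later Any() and Take() starts again from the beginning of the original sequence. The measured count can therefore come out higher than the estimate above, and it may differ between .NET versions.

    ```csharp
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical helper: counts MoveNext() calls made on the wrapped sequence.
    public sealed class CountingEnumerable<T> : IEnumerable<T>
    {
        private readonly IEnumerable<T> _inner;
        public int MoveNextCalls { get; private set; }

        public CountingEnumerable(IEnumerable<T> inner) { _inner = inner; }

        public IEnumerator<T> GetEnumerator()
        {
            using (IEnumerator<T> e = _inner.GetEnumerator())
            {
                while (true)
                {
                    MoveNextCalls++;               // one increment per MoveNext() on this wrapper
                    if (!e.MoveNext()) yield break;
                    yield return e.Current;
                }
            }
        }

        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    public static class MoveNextCountDemo
    {
        // Same method as in the answer above.
        public static IEnumerable<IEnumerable<TSource>> ChunkBy<TSource>(
            this IEnumerable<TSource> source, int chunkSize)
        {
            while (source.Any())
            {
                yield return source.Take(chunkSize);
                source = source.Skip(chunkSize);
            }
        }

        public static void Main()
        {
            var counted = new CountingEnumerable<int>(Enumerable.Range(0, 1000));
            int items = 0;

            // Enumerate all elements of all chunks.
            foreach (IEnumerable<int> chunk in counted.ChunkBy(10))
            {
                foreach (int item in chunk) items++;
            }

            Console.WriteLine("Items seen: " + items);
            Console.WriteLine("MoveNext() calls on the source: " + counted.MoveNextCalls);
        }
    }
    ```

    Running this with a few different sequence lengths and comparing the counts shows how the cost scales for your runtime.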
    

    Whether MoveNext is a lot of work or not, depends on the type of source sequence. For lists and arrays it is a simple index increment, with maybe an out of range check.

    But if your IEnumerable is the result of a database query, make sure that the data is really materialized on your computer, otherwise the data will be fetched several times. DbContext and Dapper will properly transfer the data to local process before it can be accessed. If you enumerate the same sequence several times it is not fetched several times. Dapper returns an object that is a List, DbContext remembers that the data is already fetched.

    It depends on your Repository whether it is wise to call AsEnumerable() or ToList() before you start to divide the items into Chunks.
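
    The effect of materializing first can be illustrated without a database. In the sketch below, `FetchRows` is a hypothetical stand-in for a deferred query whose body runs again on every enumeration, and `Fetches` counts how often that happens; both names are assumptions for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class MaterializeDemo
    {
        public static int Fetches { get; private set; }

        // Hypothetical stand-in for a deferred query: the body runs once per enumeration.
        public static IEnumerable<int> FetchRows()
        {
            Fetches++;
            for (int i = 0; i < 5; i++) yield return i;
        }

        public static void Main()
        {
            // Deferred: Any() and Take() each start a new enumeration.
            IEnumerable<int> deferred = FetchRows();
            bool any = deferred.Any();
            List<int> firstTwo = deferred.Take(2).ToList();
            Console.WriteLine(Fetches); // prints 2

            // Materialized: ToList() enumerates once; later calls work on the local list.
            List<int> local = FetchRows().ToList();
            any = local.Any();
            firstTwo = local.Take(2).ToList();
            Console.WriteLine(Fetches); // prints 3
        }
    }
    ```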

  • 2020-11-22 17:54
    public static IEnumerable<IEnumerable<T>> SplitIntoSets<T>
        (this IEnumerable<T> source, int itemsPerSet) 
    {
        var sourceList = source as List<T> ?? source.ToList();
        for (var index = 0; index < sourceList.Count; index += itemsPerSet)
        {
            yield return sourceList.Skip(index).Take(itemsPerSet);
        }
    }
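
    Note that each set returned here is a lazy Skip/Take view, and when the source already is a `List<T>` the view reads from that same list. The sketch below (class name and sample data are assumptions for illustration) shows that a set reflects later changes to the list unless it is copied with `ToList()`.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class SplitIntoSetsDemo
    {
        // Same method as in the answer above.
        public static IEnumerable<IEnumerable<T>> SplitIntoSets<T>(
            this IEnumerable<T> source, int itemsPerSet)
        {
            var sourceList = source as List<T> ?? source.ToList();
            for (var index = 0; index < sourceList.Count; index += itemsPerSet)
            {
                yield return sourceList.Skip(index).Take(itemsPerSet);
            }
        }

        public static void Main()
        {
            var numbers = new List<int> { 1, 2, 3, 4, 5 };

            // Snapshot: every set is copied into its own list right away.
            List<List<int>> copies = numbers.SplitIntoSets(2).Select(s => s.ToList()).ToList();

            // Lazy: the sets are views that read the list when enumerated.
            List<IEnumerable<int>> views = numbers.SplitIntoSets(2).ToList();

            numbers[0] = 99; // change the source after splitting

            Console.WriteLine(string.Join(",", views[0]));  // prints 99,2
            Console.WriteLine(string.Join(",", copies[0])); // prints 1,2
        }
    }
    ```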
    