Split List into Sublists with LINQ

灰色年华 2020-11-21 06:26

Is there any way I can separate a List into several separate lists of SomeObject, using the item index as the delimiter of each s

30 answers
  • 2020-11-21 06:43

    You could use a number of queries that use Take and Skip, but that would add too many iterations on the original list, I believe.

    Rather, I think you should create an iterator of your own, like so:

    public static IEnumerable<IEnumerable<T>> GetEnumerableOfEnumerables<T>(
      IEnumerable<T> enumerable, int groupSize)
    {
       // The list to return.
       List<T> list = new List<T>(groupSize);
    
       // Cycle through all of the items.
       foreach (T item in enumerable)
       {
         // Add the item.
         list.Add(item);
    
         // If the list has reached the group size, return it.
         if (list.Count == groupSize)
         {
           // Return the list.
           yield return list;
    
           // Set the list to a new list.
           list = new List<T>(groupSize);
         }
       }
    
       // Return the remainder, if there is any.
       if (list.Count != 0)
       {
         // Return the list.
         yield return list;
       }
    }
    

    You can then call this and it is LINQ enabled so you can perform other operations on the resulting sequences.
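
    As a rough usage sketch (the class name ChunkingSketch, the helper SumGroupsOfThree and the sample data are assumptions made for illustration; the iterator is the one above, repeated so the example compiles on its own), the result can be fed straight into other LINQ operators:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkingSketch
{
    // Same iterator as above, repeated so this example is self-contained.
    public static IEnumerable<IEnumerable<T>> GetEnumerableOfEnumerables<T>(
        IEnumerable<T> enumerable, int groupSize)
    {
        List<T> list = new List<T>(groupSize);

        foreach (T item in enumerable)
        {
            list.Add(item);

            // A full group is handed out and a fresh list is started.
            if (list.Count == groupSize)
            {
                yield return list;
                list = new List<T>(groupSize);
            }
        }

        // Hand out the shorter trailing group, if any.
        if (list.Count != 0)
        {
            yield return list;
        }
    }

    // Hypothetical helper: sums each group of three items.
    public static List<int> SumGroupsOfThree(IEnumerable<int> numbers)
    {
        return GetEnumerableOfEnumerables(numbers, 3)
            .Select(group => group.Sum())
            .ToList();
    }

    public static void Main()
    {
        // The groups are {1, 2, 3}, {4, 5, 6} and {7}.
        List<int> sums = SumGroupsOfThree(Enumerable.Range(1, 7));
        Console.WriteLine(string.Join(", ", sums)); // prints "6, 15, 7"
    }
}
```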


    In light of Sam's answer, I felt there was an easier way to do this without:

    • Iterating through the list again (which I didn't do originally)
    • Materializing the items in groups before releasing the chunk (for large chunks of items, there would be memory issues)
    • All of the code that Sam posted

    That said, here's another pass, which I've codified in an extension method to IEnumerable<T> called Chunk:

    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, 
        int chunkSize)
    {
        // Validate parameters.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize),
            "The chunkSize parameter must be a positive value.");
    
        // Call the internal implementation.
        return source.ChunkInternal(chunkSize);
    }
    

    Nothing surprising up there, just basic error checking.

    Moving on to ChunkInternal:

    private static IEnumerable<IEnumerable<T>> ChunkInternal<T>(
        this IEnumerable<T> source, int chunkSize)
    {
        // Validate parameters.
        Debug.Assert(source != null);
        Debug.Assert(chunkSize > 0);
    
        // Get the enumerator.  Dispose of when done.
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        do
        {
            // Move to the next element.  If there's nothing left
            // then get out.
            if (!enumerator.MoveNext()) yield break;
    
            // Return the chunked sequence.
            yield return ChunkSequence(enumerator, chunkSize);
        } while (true);
    }
    

    Basically, it gets the IEnumerator<T> and manually iterates through each item. It checks whether there are any items left to enumerate. After each chunk has been enumerated, if there aren't any items left, it breaks out.

    Once it detects there are items in the sequence, it delegates the responsibility for the inner IEnumerable<T> implementation to ChunkSequence:

    private static IEnumerable<T> ChunkSequence<T>(IEnumerator<T> enumerator, 
        int chunkSize)
    {
        // Validate parameters.
        Debug.Assert(enumerator != null);
        Debug.Assert(chunkSize > 0);
    
        // The count.
        int count = 0;
    
        // There is at least one item.  Yield and then continue.
        do
        {
            // Yield the item.
            yield return enumerator.Current;
        } while (++count < chunkSize && enumerator.MoveNext());
    }
    

    Since MoveNext was already called on the IEnumerator<T> passed to ChunkSequence, it yields the item returned by Current and then increments the count. The loop condition makes sure it never returns more than chunkSize items: it moves to the next item in the sequence after every iteration, but short-circuits (without calling MoveNext) once the number of items yielded reaches the chunk size.

    If there are no items left, then the ChunkInternal method will make another pass in the outer loop, but when MoveNext is called a second time, it will still return false, as per the documentation (emphasis mine):

    If MoveNext passes the end of the collection, the enumerator is positioned after the last element in the collection and MoveNext returns false. When the enumerator is at this position, subsequent calls to MoveNext also return false until Reset is called.

    At this point, the loop will break, and the sequence of sequences will terminate.

    This is a simple test:

    static void Main()
    {
        string s = "agewpsqfxyimc";
    
        int count = 0;
    
        // Group by three.
        foreach (IEnumerable<char> g in s.Chunk(3))
        {
            // Print out the group.
            Console.Write("Group: {0} - ", ++count);
    
            // Print the items.
            foreach (char c in g)
            {
                // Print the item.
                Console.Write(c + ", ");
            }
    
            // Finish the line.
            Console.WriteLine();
        }
    }
    

    Output:

    Group: 1 - a, g, e,
    Group: 2 - w, p, s,
    Group: 3 - q, f, x,
    Group: 4 - y, i, m,
    Group: 5 - c,
    

    An important caveat: this will not work if you don't drain each child sequence completely, or if you break at any point in the parent sequence. If your use case is that you will consume every element of the sequence of sequences, then this will work for you.

    Additionally, it will do strange things if you play with the order, just as Sam's did at one point.
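
    To make the caveat concrete, here is a minimal sketch. The names LazyChunkSketch, LazyChunk, DrainAll and FirstOfEach are hypothetical, and the iterator is a condensed copy of ChunkInternal and ChunkSequence from above. It compares draining every chunk with reading only the first item of each chunk:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyChunkSketch
{
    // Condensed copy of ChunkInternal above (renamed for this sketch).
    public static IEnumerable<IEnumerable<T>> LazyChunk<T>(
        this IEnumerable<T> source, int chunkSize)
    {
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                yield return ChunkSequence(enumerator, chunkSize);
            }
        }
    }

    // Same inner iterator as above: it shares the outer enumerator.
    private static IEnumerable<T> ChunkSequence<T>(
        IEnumerator<T> enumerator, int chunkSize)
    {
        int count = 0;
        do
        {
            yield return enumerator.Current;
        } while (++count < chunkSize && enumerator.MoveNext());
    }

    // Drains every chunk completely, which is the supported usage.
    public static List<string> DrainAll(string s, int chunkSize)
    {
        var result = new List<string>();
        foreach (IEnumerable<char> chunk in s.LazyChunk(chunkSize))
        {
            result.Add(new string(chunk.ToArray()));
        }
        return result;
    }

    // Reads only the first item of each chunk and abandons the rest.
    public static List<char> FirstOfEach(string s, int chunkSize)
    {
        var result = new List<char>();
        foreach (IEnumerable<char> chunk in s.LazyChunk(chunkSize))
        {
            result.Add(chunk.First());
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", DrainAll("abcdef", 2)));   // prints "ab cd ef"
        Console.WriteLine(string.Join("", FirstOfEach("abcdef", 2))); // prints "abcdef"
    }
}
```

    Because the abandoned chunks never advance the shared enumerator, the second call sees six one-item "chunks" instead of the first items of three chunks (a, c, e).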

  • 2020-11-21 06:43

    This is an old question but this is what I ended up with; it enumerates the enumerable only once, but does create lists for each of the partitions. It doesn't suffer from unexpected behavior when ToArray() is called as some of the implementations do:

        public static IEnumerable<IEnumerable<T>> Partition<T>(IEnumerable<T> source, int chunkSize)
        {
            if (source == null)
            {
                throw new ArgumentNullException("source");
            }
    
            if (chunkSize < 1)
            {
                throw new ArgumentException("Invalid chunkSize: " + chunkSize);
            }
    
            using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
            {
                IList<T> currentChunk = new List<T>();
                while (sourceEnumerator.MoveNext())
                {
                    currentChunk.Add(sourceEnumerator.Current);
                    if (currentChunk.Count == chunkSize)
                    {
                        yield return currentChunk;
                        currentChunk = new List<T>();
                    }
                }
    
                if (currentChunk.Any())
                {
                    yield return currentChunk;
                }
            }
        }
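
    A small sketch of the ToArray() point, assuming a hypothetical wrapper class PartitionSketch and helper Describe (the Partition body is the one above, repeated so the example compiles on its own). The outer sequence is materialized first and the chunks are read afterwards:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionSketch
{
    // Same method as above, repeated so this example is self-contained.
    public static IEnumerable<IEnumerable<T>> Partition<T>(
        IEnumerable<T> source, int chunkSize)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }

        if (chunkSize < 1)
        {
            throw new ArgumentException("Invalid chunkSize: " + chunkSize);
        }

        using (IEnumerator<T> sourceEnumerator = source.GetEnumerator())
        {
            IList<T> currentChunk = new List<T>();
            while (sourceEnumerator.MoveNext())
            {
                currentChunk.Add(sourceEnumerator.Current);
                if (currentChunk.Count == chunkSize)
                {
                    yield return currentChunk;
                    currentChunk = new List<T>();
                }
            }

            if (currentChunk.Any())
            {
                yield return currentChunk;
            }
        }
    }

    // Hypothetical helper: materializes all chunks, then formats each one.
    public static string[] Describe(IEnumerable<int> numbers, int chunkSize)
    {
        // Each chunk is its own list, so reading them later is safe.
        IEnumerable<int>[] chunks = Partition(numbers, chunkSize).ToArray();
        return chunks.Select(chunk => string.Join(",", chunk)).ToArray();
    }

    public static void Main()
    {
        foreach (string line in Describe(Enumerable.Range(1, 5), 2))
        {
            Console.WriteLine(line); // prints "1,2", then "3,4", then "5"
        }
    }
}
```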
    
  • 2020-11-21 06:46

    This question is a bit old, but I just wrote this, and I think it's a little more elegant than the other proposed solutions:

    /// <summary>
    /// Break a list of items into chunks of a specific size
    /// </summary>
    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
    {
        while (source.Any())
        {
            yield return source.Take(chunksize);
            source = source.Skip(chunksize);
        }
    }
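
    One thing to keep in mind with this approach is that every Any(), Take() and Skip() call starts a fresh enumeration of the source. The sketch below is only an illustration under assumptions: SkipTakeSketch, ChunkBySkipTake and CountProducedItems are hypothetical names (the extension is renamed here because .NET 6 and later already ship an Enumerable.Chunk), and the helper simply counts how many items a lazy source produces while all chunks are drained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeSketch
{
    // Copy of the extension method above, renamed for this sketch.
    public static IEnumerable<IEnumerable<T>> ChunkBySkipTake<T>(
        this IEnumerable<T> source, int chunksize)
    {
        while (source.Any())
        {
            yield return source.Take(chunksize);
            source = source.Skip(chunksize);
        }
    }

    // Hypothetical helper: counts how many items the underlying source
    // produces in total while every chunk is fully enumerated.
    public static int CountProducedItems(int itemCount, int chunkSize)
    {
        int produced = 0;

        // A lazy source that records every item it hands out.
        IEnumerable<int> Source()
        {
            for (int i = 0; i < itemCount; i++)
            {
                produced++;
                yield return i;
            }
        }

        foreach (IEnumerable<int> chunk in Source().ChunkBySkipTake(chunkSize))
        {
            foreach (int item in chunk)
            {
                // Drain the chunk; the items themselves are not needed.
            }
        }

        return produced;
    }

    public static void Main()
    {
        // Six items in chunks of two: the source is asked for more than
        // six items because earlier items are skipped over repeatedly.
        Console.WriteLine(CountProducedItems(6, 2));
    }
}
```

    For a small in-memory list this hardly matters, but for a long or expensive sequence the repeated skipping adds up.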
    
  • 2020-11-21 06:46

    System.Interactive provides Buffer() for this purpose. Some quick testing shows performance is similar to Sam's solution.

  • 2020-11-21 06:46

    It's an old question, but I took a different approach. I use Skip to move to the desired offset and Take to extract the desired number of elements:

    public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, 
                                                       int chunkSize)
    {
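        if (source == null)
            throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize),
                $"{nameof(chunkSize)} should be > 0");
    
        // Note: Count() and each Skip() may enumerate the source again.
        var nbChunks = (int)Math.Ceiling((double)source.Count() / chunkSize);
    
        return Enumerable.Range(0, nbChunks)
                         .Select(chunkNb => source.Skip(chunkNb * chunkSize)
                                                  .Take(chunkSize));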
    }
    
  • 2020-11-21 06:47

    We found that David B's solution worked best, but we adapted it into a more general solution:

    list.GroupBy(item => item.SomeProperty) 
       .Select(group => new List<T>(group)) 
       .ToArray();
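
    Since the question asks about using the item index as the delimiter, here is a minimal sketch of how the same GroupBy pattern could key on the index instead of a property. GroupByIndexSketch and SplitByIndex are hypothetical names, and this is not necessarily the exact code from the answer referred to above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByIndexSketch
{
    // Hypothetical helper: items whose index divided by chunkSize gives
    // the same quotient end up in the same sublist.
    public static List<List<T>> SplitByIndex<T>(IEnumerable<T> source, int chunkSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        return source
            .Select((item, index) => new { item, index })
            .GroupBy(pair => pair.index / chunkSize)
            .Select(group => group.Select(pair => pair.item).ToList())
            .ToList();
    }

    public static void Main()
    {
        foreach (List<int> chunk in SplitByIndex(Enumerable.Range(1, 7), 3))
        {
            Console.WriteLine(string.Join(", ", chunk)); // "1, 2, 3", "4, 5, 6", "7"
        }
    }
}
```

    Note that GroupBy buffers the whole source before producing the first group, so this variant trades laziness for brevity.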
    