nest yields to return IEnumerable> with lazy evaluation

前端 未结 4 2073
忘掉有多难
忘掉有多难 2021-02-13 16:40

I wrote a LINQ extension method SplitBetween analogous to String.Split.

> new List(){3,4,2,21,3,2,17,16,1}
> .SplitBet         


        
相关标签:
4条回答
  • 2021-02-13 17:00

    Here is a solution that I suppose does what you ask for.

    The problem was that you only had one method with yield and you were manually creating the internal collection while the outer IEnumerable was enumerated. The second problem was that your way of "testing" fails even on mine code below. However, as David B pointed out in his comment, you must go through all the elements to define number of elements of outer IEnumerable. But you can defer creation and population of inner IEnumerables.

    public static IEnumerable<IEnumerable<T>> SplitBetween<T>(this IEnumerable<T> source,Func<T,bool> separatorSelector, bool includeSeparators=false)
    {
        IList<T> sourceList = source.ToList();
        var indexStart = 0;
        var indexOfLastElement = sourceList.Count - 1;
        for(int i = 0; i <= indexOfLastElement; i++)
            if (separatorSelector(sourceList[i]))
            {
                if(includeSeparators)
                    yield return SplitBetweenInner(sourceList, indexStart, i);
                else
                    yield return SplitBetweenInner(sourceList, indexStart, i - 1);
    
                indexStart = i + 1;
            }
            else if(i == indexOfLastElement)
                yield return SplitBetweenInner(sourceList, indexStart, i);        
    }
    
    private static IEnumerable<T> SplitBetweenInner<T>(IList<T> source, int startIndex, int endIndex)
    {
        //throw new Exception("BOOM");
        for(int i = startIndex; i <= endIndex; i++)
            yield return source[i];
    }
    

    Note that it behaves slightly different as your code (it doesn't create another empty List when the last element satisfies the separator condition - it's up to definition what is correct here, but I find that better as the behavior is the same as if the element appears at the beginning of source list)

    If you test the code, you will see that the inner IEnumerable execution is deferred.

    If the throw exception line is uncommented:

    (new List<int>(){3,4,2,21,3,2,17,16,1}).SplitBetween(x=>x>=10, true).Count();
    

    returns 4

    (new List<int>(){3,4,2,21,3,2,17,16,1}).SplitBetween(x=>x>=10, true).First().Count();
    

    throws BOOM

    0 讨论(0)
  • 2021-02-13 17:05

    Edit: There is nothing wrong with your approach, except that a throwing enumerable will really "boom" when you enumerate it. Thats what's its meant for. It doesn't have a proper GetEnumerator defined on it. So your code exhibits no real problem. In the first case by doing First, you're only enumerating till the first result set (just { 1, 2, 3 } ) is obtained and not enumerating the throwing enumerable (which means Concat is not being executed). But in the second example, you're asking for element at 2 after the split, which means it will enumerate the throwing enumerable too and will go "boom". The key here is to understand ElementAt enumerates the collection till the index asked to and is not inherently lazy (it cant be).

    I'm not sure if fully lazy is the way to go here. The problem is that the whole process of splitting lazily into outer and inner sequences runs on one enumerator which can yield different results depending on enumerator state. For instance you enumerate only the outer sequence, the inner sequences no longer will be what you expect. Or if you enumerate only half the outer sequence and one inner sequence, what will be the state of other inner sequences? Your approach is the best.

    The below approach is lazy (still will boom since that's warranted) in that it uses no intermediate concrete implementations, but can be slower than your original approach because it traverses the list more than once:

    public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source, 
                                                         Func<T, bool> separatorPredicate, 
                                                         bool includeEmptyEntries = false,
                                                         bool includeSeparators = false)
    {
        int prevIndex = 0;
        int lastIndex = 0;
        var query = source.Select((t, index) => { lastIndex = index; return new { t, index }; })
                          .Where(a => separatorPredicate(a.t));
        foreach (var item in query)
        {
            if (item.index == prevIndex && !includeEmptyEntries)
            {
                prevIndex++;
                continue;
            }
    
            yield return source.Skip(prevIndex)
                               .Take(item.index - prevIndex + (!includeSeparators ? 0 : 1));
            prevIndex = item.index + 1;
        }
    
        if (prevIndex <= lastIndex)
            yield return source.Skip(prevIndex);
    }
    

    Over all your original approach is the best. If you need something fully lazy, then my below answer fits. Mind you its only meant for things like:

    foreach (var inners in outer)
        foreach (var item in inners)
        { 
        }
    

    and not

    var outer = sequence.Split;
    var inner1 = outer.First;
    var inner2 = outer.ElementAt; //etc
    

    In other words, not fit for multiple iterations on the same inner sequence. If you are fully aware of this dangerous construct:


    Original answer:

    This uses no intermediate concrete collections, no ToList on source enumerable, and is fully lazy/iterator-ish:

    public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source,
                                                         Func<T, bool> separatorPredicate,
                                                         bool includeEmptyEntries = false,
                                                         bool includeSeparator = false)
    {
        using (var x = source.GetEnumerator())
            while (x.MoveNext())
                if (!separatorPredicate(x.Current))
                    yield return x.YieldTill(separatorPredicate, includeSeparator);
                else if (includeEmptyEntries)
                {
                    if (includeSeparator)
                        yield return Enumerable.Repeat(x.Current, 1);
                    else
                        yield return Enumerable.Empty<T>();
                }
    }
    
    static IEnumerable<T> YieldTill<T>(this IEnumerator<T> x, 
                                       Func<T, bool> separatorPredicate,
                                       bool includeSeparator)
    {
        yield return x.Current;
    
        while (x.MoveNext())
            if (!separatorPredicate(x.Current))
                yield return x.Current;
            else
            {
                if (includeSeparator)
                    yield return x.Current;
                yield break;
            }
    }
    

    Short, sweet and simple. I have added an additional flag to denote if you want to return empty sets (by default it ignores). Without that flag, the code is even more concise.

    Thanks for this question, this will be there in my extension methods library! :)

    0 讨论(0)
  • 2021-02-13 17:15

    This one won't use List<>, and won't go BOOM.

    public static IEnumerable<IEnumerable<T>> SplitBetween<T>(this IEnumerable<T> source,
                                                              Func<T,bool> separatorSelector, 
                                                              bool includeSeparators=false) 
    {
        if (source == null)
            throw new ArgumentNullException("source");
    
        return SplitBetweenImpl(source, separatorSelector, includeSeparators);
    }
    
    private static IEnumerable<T> SplitBetweenInner<T>(IEnumerator<T> e,
                                                       Func<T,bool> separatorSelector)
    {
        var first = true;
    
        while(first || e.MoveNext())
        {
            if (separatorSelector((T)e.Current))
                yield break;    
    
            first = false;
            yield return e.Current;
        }
    }
    
    private static IEnumerable<IEnumerable<T>> SplitBetweenImpl<T>(this IEnumerable<T> source,
                                                                   Func<T,bool> separatorSelector, 
                                                                   bool includeSeparators) 
    {
        using (var e = source.GetEnumerator())
            while(e.MoveNext())
            {
                if (separatorSelector((T)e.Current) && includeSeparators)
                    yield return new T[] {(T)e.Current};
                else
                    {
                    yield return SplitBetweenInner(e, separatorSelector);
                    if (separatorSelector((T)e.Current) && includeSeparators)
                        yield return new T[] {(T)e.Current};
                    }
            }
    }
    

    Test:

    void Main()
    {
        var list = new List<int>(){1, 2, 3, 10, 1};
        foreach(var col in list.Concat(Ext.ThrowingEnumerable<int>())
                               .SplitBetween<int>(x=>x>=10).Take(1))
        {
            Console.WriteLine("------");
            foreach(var i in col)
                Console.WriteLine(i);
        }
    }
    

    Output:

    ------
    1
    2
    3
    

    Test2

    var list = new List<int>(){1, 2, 3, 10, 1}
    foreach(var col in list.Concat(Ext.ThrowingEnumerable<int>())
                           .SplitBetween<int>(x=>x>=10).Take(2))
    

    Output:

    ------
    1
    2
    3
    ------
    1
    *Exception*
    

    Here, the exception is caused because the first element of the ThrowingEnumerable-enumeration would go into the same group as the 1.


    Test3:

    var list = new List<int>(){1, 2, 3, 10, 1, 17};
    foreach(var col in list.Concat(Ext.ThrowingEnumerable<int>())
                           .SplitBetween<int>(x=>x>=10, true).Take(4))
    

    Output:

    ------
    1
    2
    3
    ------
    10
    ------
    1
    ------
    17
    

    No problem here, because the Exception element would go into it's own group, and thus is not iterated over due to Take(4):

    0 讨论(0)
  • 2021-02-13 17:17
      public static IEnumerable<IEnumerable<T>> SplitBetween<T>(this IEnumerable<T> source, Func<T, bool> separatorSelector, bool includeSeparators = false)
      {
        var state = new SharedState<T>(source, separatorSelector, includeSeparators);
        state.LastList = state.NewList  = new InnerList<T>(state, 0);
        for (; ; )
        {
          if (state.NewList != null)
          {
            var newList = state.NewList;
            state.NewList = null;
            yield return newList.Items();            
          }
          else if (state.IsEnd)
            break;
          else
            state.CheckNext();
        }
      }
      class SharedState<T>
      {
        public SharedState(IEnumerable<T> source, Func<T, bool> separatorSelector, bool includeSeparators)
        {
          this.source = source;
          this.separatorSelector = separatorSelector;
          this.includeSeparators = includeSeparators;
    
          this.iterator = source.GetEnumerator();
    
          this.data = source as IList<T>;
          if (data == null)
          {
            cache = new List<T>();
            data = cache;
          }
        }
        public readonly IEnumerable<T> source;
        readonly IEnumerator<T> iterator; 
        public readonly IList<T> data;
        readonly List<T> cache;
        public readonly Func<T, bool> separatorSelector;
        public readonly bool includeSeparators;
        public int WaveIndex = -1;
        public bool IsEnd = false;
        public InnerList<T> NewList;
        public InnerList<T> LastList;
    
        public void CheckNext()
        {
          WaveIndex++;
          if (!iterator.MoveNext())
          {
            if (LastList.LastIndex == null)
              LastList.LastIndex = WaveIndex;
            IsEnd = true;
          }
          else
          {
            var item = iterator.Current;
            if (cache != null)
              cache.Add(item);
            if (separatorSelector(item))
            {
              LastList.LastIndex = includeSeparators ? WaveIndex + 1 : WaveIndex;
              LastList = NewList = new InnerList<T>(this, WaveIndex + 1);
            }
          }
        }
      }
    
      class InnerList<T>
      {
        public InnerList(SharedState<T> state, int startIndex)
        {
          this.state = state;
          this.StartIndex = startIndex;
        }
        readonly SharedState<T> state;
        public readonly int StartIndex;
        public int? LastIndex;
    
        public IEnumerable<T> Items()
        {
          for (var i = StartIndex; ; ++i)
          {
            if (LastIndex != null && i >= LastIndex)
              break;
            if (i >= state.WaveIndex)
              state.CheckNext();
            if (LastIndex == null || i < LastIndex)
              yield return state.data[i];
          }
        }
      }
    
    0 讨论(0)
提交回复
热议问题