Nest yields to return IEnumerable<IEnumerable<T>> with lazy evaluation

忘掉有多难 2021-02-13 16:40

I wrote a LINQ extension method SplitBetween analogous to String.Split.

> new List<int>(){3,4,2,21,3,2,17,16,1}
> .SplitBet         


        
4 answers
  •  南旧 (OP)
    2021-02-13 17:05

    Edit: There is nothing wrong with your approach, except that a throwing enumerable will really "boom" when you enumerate it. That's what it's meant for. It doesn't have a proper GetEnumerator defined on it. So your code exhibits no real problem. In the first case, by calling First, you only enumerate until the first result set (just { 1, 2, 3 }) is obtained, so the throwing enumerable is never enumerated (the Concat never reaches it). In the second example, you ask for the element at index 2 after the split, which forces enumeration into the throwing enumerable, so it goes "boom". The key is that ElementAt is an immediate-execution operator: it has to enumerate the collection up to the requested index, so it cannot be lazy.
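    The First versus ElementAt behaviour can be reproduced without any splitting. The following C# sketch assumes a throwing sequence written as an iterator; Boom and BoomDemo are hypothetical names, since the question's own throwing enumerable is not shown on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BoomDemo
{
    // Hypothetical throwing sequence: because this is an iterator, the exception
    // is raised on the first MoveNext, not when Boom() is called.
    public static IEnumerable<int> Boom()
    {
        throw new InvalidOperationException("boom");
        yield break; // unreachable, but makes the method an iterator (compiler warns CS0162)
    }

    // Three safe elements followed by the throwing part.
    public static IEnumerable<int> Source() => new[] { 1, 2, 3 }.Concat(Boom());
}

static class Program
{
    static void Main()
    {
        // Stops after the first element; Boom() is never enumerated.
        Console.WriteLine(BoomDemo.Source().First());       // 1

        // Walks indexes 0..2 only, so it never leaves the array.
        Console.WriteLine(BoomDemo.Source().ElementAt(2));  // 3

        // Index 3 lies in the throwing part, so enumeration reaches Boom().
        try
        {
            BoomDemo.Source().ElementAt(3);
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);                   // boom
        }
    }
}
```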

    I'm not sure a fully lazy approach is the way to go here. The problem is that the whole process of lazily splitting into outer and inner sequences runs on one enumerator, which yields different results depending on its current state. For instance, if you enumerate only the outer sequence, the inner sequences will no longer be what you expect. Or, if you enumerate only half the outer sequence and one inner sequence, what state will the other inner sequences be in? Your approach is the best.
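    The question's own implementation is not reproduced on this page, so, as an assumption, here is a minimal C# sketch of a non-fully-lazy design: the outer sequence stays lazy, but each group is collected into its own List<T> before it is yielded (BufferedSplit and SplitBuffered are hypothetical names). Because the groups share no enumerator state, they can be read in any order and more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BufferedSplit
{
    // Hypothetical sketch: lazy outer sequence, buffered inner groups.
    // Empty groups (leading, trailing or between consecutive separators) are dropped.
    public static IEnumerable<List<T>> SplitBuffered<T>(this IEnumerable<T> source, Func<T, bool> isSeparator)
    {
        var group = new List<T>();
        foreach (var item in source)
        {
            if (!isSeparator(item))
            {
                group.Add(item);
            }
            else if (group.Count > 0)
            {
                yield return group;      // hand out the finished list; it is never touched again
                group = new List<T>();   // start a fresh buffer for the next group
            }
        }
        if (group.Count > 0)
            yield return group;          // elements after the last separator
    }
}

static class Program
{
    static void Main()
    {
        var groups = new[] { 1, 2, 0, 3, 4, 0, 0, 5 }.SplitBuffered(x => x == 0);

        // The groups are independent lists, so they can be revisited freely.
        Console.WriteLine(groups.Count());                        // 3
        Console.WriteLine(string.Join(",", groups.ElementAt(1))); // 3,4
    }
}
```

    The cost of this design is that each group is held in memory in full; that buffering is exactly what makes revisiting a group safe.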

    The approach below is lazy (it will still go "boom", which is warranted) in that it uses no intermediate concrete collections, but it can be slower than your original approach because it traverses the source more than once:

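    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class SplitExtensions // container class name is illustrative
    {
        public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source,
                                                             Func<T, bool> separatorPredicate,
                                                             bool includeEmptyEntries = false,
                                                             bool includeSeparators = false)
        {
            int prevIndex = 0;
            int lastIndex = -1; // stays -1 for an empty source, so no group is yielded
            // Select sees every element, so lastIndex ends at the final index;
            // Where keeps only the separators.
            var query = source.Select((t, index) => { lastIndex = index; return new { t, index }; })
                              .Where(a => separatorPredicate(a.t));
            foreach (var item in query)
            {
                // Separator at the start or right after the previous one: skip the empty group unless requested.
                if (item.index == prevIndex && !includeEmptyEntries)
                {
                    prevIndex++;
                    continue;
                }

                // Each group re-reads the source from the start: this is the extra traversal.
                yield return source.Skip(prevIndex)
                                   .Take(item.index - prevIndex + (includeSeparators ? 1 : 0));
                prevIndex = item.index + 1;
            }

            // Elements after the last separator form the final group.
            if (prevIndex <= lastIndex)
                yield return source.Skip(prevIndex);
        }
    }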
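    To observe the repeated traversals, the following C# sketch wraps the source in a hypothetical CountingSequence<T> that counts calls to GetEnumerator, and uses SplitBySkipTake, a hypothetical condensed restatement of the Skip/Take idea above without the option flags (empty groups are always dropped).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that counts how many times it is enumerated.
sealed class CountingSequence<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _inner;
    public CountingSequence(IEnumerable<T> inner) { _inner = inner; }
    public int Enumerations { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        Enumerations++;
        return _inner.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class SkipTakeDemo
{
    // Condensed Skip/Take split, empty groups dropped: one pass finds the
    // separator indexes, and every yielded group re-reads the source.
    public static IEnumerable<IEnumerable<T>> SplitBySkipTake<T>(
        this IEnumerable<T> source, Func<T, bool> isSeparator)
    {
        int prev = 0, last = -1;
        var separatorIndexes = source
            .Select((item, index) => { last = index; return (Item: item, Index: index); })
            .Where(p => isSeparator(p.Item))
            .Select(p => p.Index);

        foreach (int index in separatorIndexes)
        {
            if (index > prev)
                yield return source.Skip(prev).Take(index - prev);
            prev = index + 1;
        }

        if (prev <= last)
            yield return source.Skip(prev);
    }
}

static class Program
{
    static void Main()
    {
        var source = new CountingSequence<int>(new[] { 1, 2, 0, 3, 4, 0, 5 });

        // Materialize each group as it is produced: [1,2] [3,4] [5]
        var groups = source.SplitBySkipTake(x => x == 0).Select(g => g.ToList()).ToList();

        foreach (var g in groups)
            Console.WriteLine(string.Join(",", g));
        Console.WriteLine($"source enumerated {source.Enumerations} times");
    }
}
```

    The scan for separators costs one traversal, and each yielded group starts another one from the beginning of the source, which is why this version can be slower than a single-pass split.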
    

    Overall, your original approach is the best. If you need something fully lazy, the original answer below fits. Mind you, it's only meant for usage like:

    foreach (var inners in outer)
        foreach (var item in inners)
        { 
        }
    

    and not

    var outer = sequence.SplitBy(x => x == 0); // hypothetical separator predicate
    var inner1 = outer.First();
    var inner2 = outer.ElementAt(1); // etc.
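    The difference between the two patterns can be checked with a flagless restatement of the shared-enumerator technique from the original answer below (LazySplit, SplitLazy and TakeGroup are hypothetical names in this C# sketch). Reading each group before the outer sequence advances produces the expected groups; advancing the outer sequence on its own starts a new group at every non-separator element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazySplit
{
    // Flagless version of the single-enumerator split: every inner sequence
    // reads from the same enumerator that drives the outer sequence.
    public static IEnumerable<IEnumerable<T>> SplitLazy<T>(this IEnumerable<T> source, Func<T, bool> isSeparator)
    {
        using (var e = source.GetEnumerator())
            while (e.MoveNext())
                if (!isSeparator(e.Current))
                    yield return TakeGroup(e, isSeparator);
    }

    // Yields the current element, then advances the shared enumerator until a separator.
    static IEnumerable<T> TakeGroup<T>(IEnumerator<T> e, Func<T, bool> isSeparator)
    {
        yield return e.Current;
        while (e.MoveNext() && !isSeparator(e.Current))
            yield return e.Current;
    }
}

static class Program
{
    static void Main()
    {
        int[] data = { 1, 2, 0, 3, 4 };

        // Safe: each group is fully read before the outer sequence advances.
        foreach (var group in data.SplitLazy(x => x == 0))
            Console.WriteLine(string.Join(",", group));          // 1,2 then 3,4

        // Unsafe: counting the outer sequence never reads the groups, so the
        // outer loop itself steps over 1, 2, 3 and 4 and yields a group for each.
        Console.WriteLine(data.SplitLazy(x => x == 0).Count());  // 4
    }
}
```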
    

    In other words, it is not fit for random access into the outer sequence or for iterating the same inner sequence more than once. If you are fully aware of how dangerous this construct is, read on:


    Original answer:

    This uses no intermediate concrete collections and no ToList on the source enumerable, and it is fully lazy (iterator-based):

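    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class LazySplitExtensions // container class name is illustrative
    {
        public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source,
                                                             Func<T, bool> separatorPredicate,
                                                             bool includeEmptyEntries = false,
                                                             bool includeSeparator = false)
        {
            // One enumerator drives the outer sequence and every inner sequence.
            using (var x = source.GetEnumerator())
                while (x.MoveNext())
                    if (!separatorPredicate(x.Current))
                        // A group starts here; YieldTill advances the shared enumerator lazily.
                        yield return x.YieldTill(separatorPredicate, includeSeparator);
                    else if (includeEmptyEntries)
                    {
                        // A separator at the start or right after another separator: an empty group.
                        if (includeSeparator)
                            yield return Enumerable.Repeat(x.Current, 1);
                        else
                            yield return Enumerable.Empty<T>();
                    }
        }

        static IEnumerable<T> YieldTill<T>(this IEnumerator<T> x,
                                           Func<T, bool> separatorPredicate,
                                           bool includeSeparator)
        {
            yield return x.Current; // the element that opened this group

            while (x.MoveNext())
                if (!separatorPredicate(x.Current))
                    yield return x.Current;
                else
                {
                    if (includeSeparator)
                        yield return x.Current;
                    yield break; // stop at the separator; the outer loop resumes after it
                }
        }
    }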
    

    Short, sweet and simple. I have added a flag, includeEmptyEntries, that controls whether empty groups are returned (by default they are skipped). Without that flag, the code is even more concise.
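    To see what the two flags do, the following C# sketch runs SplitBy on an input with two consecutive separators. It repeats SplitBy and YieldTill from above so that it is self-contained, wraps them in an illustrative LazySplitExtensions class, and adds a hypothetical Materialize helper that reads each group before the outer sequence advances, which is the consumption pattern this construct requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazySplitExtensions
{
    // SplitBy and YieldTill as in the answer, with generic arguments written out.
    public static IEnumerable<IEnumerable<T>> SplitBy<T>(this IEnumerable<T> source,
                                                         Func<T, bool> separatorPredicate,
                                                         bool includeEmptyEntries = false,
                                                         bool includeSeparator = false)
    {
        using (var x = source.GetEnumerator())
            while (x.MoveNext())
                if (!separatorPredicate(x.Current))
                    yield return x.YieldTill(separatorPredicate, includeSeparator);
                else if (includeEmptyEntries)
                {
                    if (includeSeparator)
                        yield return Enumerable.Repeat(x.Current, 1);
                    else
                        yield return Enumerable.Empty<T>();
                }
    }

    static IEnumerable<T> YieldTill<T>(this IEnumerator<T> x,
                                       Func<T, bool> separatorPredicate,
                                       bool includeSeparator)
    {
        yield return x.Current;
        while (x.MoveNext())
            if (!separatorPredicate(x.Current))
                yield return x.Current;
            else
            {
                if (includeSeparator)
                    yield return x.Current;
                yield break;
            }
    }

    // Hypothetical helper: copy each group before the outer sequence advances.
    public static List<List<T>> Materialize<T>(this IEnumerable<IEnumerable<T>> groups)
        => groups.Select(g => g.ToList()).ToList();
}

static class Program
{
    static string Show(List<List<int>> groups)
        => string.Join(" ", groups.Select(g => "[" + string.Join(",", g) + "]"));

    static void Main()
    {
        int[] data = { 1, 0, 0, 2 };
        Func<int, bool> isZero = v => v == 0;

        Console.WriteLine(Show(data.SplitBy(isZero).Materialize()));                           // [1] [2]
        Console.WriteLine(Show(data.SplitBy(isZero, includeEmptyEntries: true).Materialize())); // [1] [] [2]
        Console.WriteLine(Show(data.SplitBy(isZero, includeEmptyEntries: true,
                                            includeSeparator: true).Materialize()));           // [1,0] [0] [2]
    }
}
```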

    Thanks for this question; this one is going into my extension methods library! :)
