Except has similar effect to Distinct?

后端 未结 3 543
说谎
说谎 2020-11-27 07:26

I just discovered that Except() will remove all elements in the second list from the first, but it also has the effect that it makes all elements in the returne

相关标签:
3条回答
  • 2020-11-27 07:59

    You wrote:

    Simple way around I am using is Where(v => !secondList.Contains(v))

    When you do this, there is still Distict done with secondList.

    For example:

    var firstStrings = new [] { "1", null, null, null, "3", "3" };
    var secondStrings = new [] { "1", "1", "1", null, null, "4" };
    var resultStrings = firstStrings.Where(v => !secondStrings.Contains(v)); // 3, 3  
    

    I created an extension method to have no distinct at all. Examle of usage:

    var result2Strings = firstStrings.ExceptAll(secondStrings).ToList(); // null, 3, 3
    

    This is what it does:

    enter image description here

    This is the source:

    public static IEnumerable<TSource> ExceptAll<TSource>(
        this IEnumerable<TSource> first,
        IEnumerable<TSource> second)
    {
        // Do not call reuse the overload method because that is a slower imlementation
        if (first == null) { throw new ArgumentNullException("first"); }
        if (second == null) { throw new ArgumentNullException("second"); }
    
        var secondList = second.ToList();
        return first.Where(s => !secondList.Remove(s));
    }
    
    public static IEnumerable<TSource> ExceptAll<TSource>(
        this IEnumerable<TSource> first,
        IEnumerable<TSource> second,
        IEqualityComparer<TSource> comparer)
    {
        if (first == null) { throw new ArgumentNullException("first"); }
        if (second == null) { throw new ArgumentNullException("second"); }
        var comparerUsed = comparer ?? EqualityComparer<TSource>.Default;
    
        var secondList = second.ToList();
        foreach (var item in first)
        {
            if (secondList.Contains(item, comparerUsed))
            {
                secondList.Remove(item);
            }
            else
            {
                yield return item;
            }
        }
    }
    

    Edit: A faster implemetation, based on the comment of DigEmAll

    public static IEnumerable<TSource> ExceptAll<TSource>(
            this IEnumerable<TSource> first,
            IEnumerable<TSource> second)
    {
        return ExceptAll(first, second, null);
    }
    
    public static IEnumerable<TSource> ExceptAll<TSource>(
        this IEnumerable<TSource> first,
        IEnumerable<TSource> second,
        IEqualityComparer<TSource> comparer)
    {
        if (first == null) { throw new ArgumentNullException("first"); }
        if (second == null) { throw new ArgumentNullException("second"); }
    
    
        var secondCounts = new Dictionary<TSource, int>(comparer ?? EqualityComparer<TSource>.Default);
        int count;
        int nullCount = 0;
    
        // Count the values from second
        foreach (var item in second)
        {
            if (item == null)
            {
                nullCount++;
            }
            else
            {
                if (secondCounts.TryGetValue(item, out count))
                {
                    secondCounts[item] = count + 1;
                }
                else
                {
                    secondCounts.Add(item, 1);
                } 
            }
        }
    
        // Yield the values from first
        foreach (var item in first)
        {
            if (item == null)
            {
                nullCount--;
                if (nullCount < 0)
                {
                    yield return item;
                } 
            }
            else
            {
                if (secondCounts.TryGetValue(item, out count))
                {
                    if (count == 0)
                    {
                        secondCounts.Remove(item);
                        yield return item;
                    }
                    else
                    {
                        secondCounts[item] = count - 1;
                    }
                }
                else
                {
                    yield return item;
                }
            }
        }
    }
    

    More info on my blog (also variant for Intersect and Union)

    0 讨论(0)
  • 2020-11-27 08:11

    The documentation for the Except function states:

    Produces the set difference of two sequences by using the default equality comparer to compare values.

    The set difference of two sets is defined as the members of the first set that do not appear in the second set.

    The important word here is set, which is defined as:

    ...an abstract data structure that can store certain values, without any particular order, and no repeated values...

    Because Except is documented as a set-based operation, it also has the effect of making the resulting values distinct.

    0 讨论(0)
  • 2020-11-27 08:14

    Given A = [1, 2, 2, 3, 3, 3] and B = [3].

    • A.Except(B); returns [1, 2] as Greg Beech explained in his response
    • A.ExceptAll(B); from Alex Siepman response, returns [1, 2, 2, 3, 3] (and I find the name ambiguous).
    • A.Where(v => !B.Contains(v)) from OP work around returns [1, 2, 2]

    I suppose that OP work around is the desired behavior, and this one as not be treated.

    The main issue with OP work around is that List<T>.Contains(T) is O(n) and Where is also O(n) making the solution O(n²) in time (for A and B of equivalent sizes) and O(1) in memory.

    We can make it O(n) in time and O(n) in memory by using hash set:

    // I accept any better name for this method
    public static IEnumerable<TSource> ExceptFrom<TSource>(
        IEnumerable<TSource> first,
        IEnumerable<TSource> second,
        IEqualityComparer<TSource> comparer)
    {
        if (first == null)
            throw new ArgumentNullException(nameof(first));
    
        if (second == null)
            throw new ArgumentNullException(nameof(second));
    
        var secondSet = second as HashSet<TSource> ?? // this trick ignore the comparer
                        second.ToHashSet(comparer ?? EqualityComparer<TSource>.Default);
    
        // Contains is O(1) for HashSet.
        return first.Where(v => !secondSet.Contains(v));
    }
    
    0 讨论(0)
提交回复
热议问题