LINQ - Full Outer Join

既然无缘 2020-11-21 22:45

I have a list of people's ID and their first name, and a list of people's ID and their surname. Some people don't have a first name and some don't have a surname; I'd l

16 Answers
  • 2020-11-21 22:57

    Full outer join for two or more tables: First extract the column that you want to join on.

    var DatesA = from A in db.T1 select A.Date; 
    var DatesB = from B in db.T2 select B.Date; 
    var DatesC = from C in db.T3 select C.Date;            
    
    var Dates = DatesA.Union(DatesB).Union(DatesC); 
    

    Then use a left outer join between the extracted column and each of the main tables.

    var Full_Outer_Join =
    
    (from A in Dates
    join B in db.T1
    on A equals B.Date into AB 
    
    from ab in AB.DefaultIfEmpty()
    join C in db.T2
    on A equals C.Date into ABC 
    
    from abc in ABC.DefaultIfEmpty()
    join D in db.T3
    on A equals D.Date into ABCD
    
    from abcd in ABCD.DefaultIfEmpty() 
    select new { A, ab, abc, abcd })
    .AsEnumerable();
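
The same two steps can be tried out against in-memory data. The following is only a sketch: the arrays `t1` and `t2` are hypothetical stand-ins for `db.T1` and `db.T2`, the `Value` property is invented for illustration, and it runs as LINQ to Objects, so the `?.` operator is available.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class KeyUnionJoinDemo
{
    public static List<string> Run()
    {
        // Hypothetical in-memory stand-ins for db.T1 and db.T2.
        var t1 = new[]
        {
            new { Date = new DateTime(2020, 1, 1), Value = "a1" },
            new { Date = new DateTime(2020, 1, 2), Value = "a2" },
        };
        var t2 = new[]
        {
            new { Date = new DateTime(2020, 1, 2), Value = "b2" },
            new { Date = new DateTime(2020, 1, 3), Value = "b3" },
        };

        // Step 1: union of the join keys from both sources (Union removes repeats).
        var dates = t1.Select(r => r.Date).Union(t2.Select(r => r.Date));

        // Step 2: left outer join each source against the key set.
        var full =
            from d in dates
            join a in t1 on d equals a.Date into ga
            from a in ga.DefaultIfEmpty()
            join b in t2 on d equals b.Date into gb
            from b in gb.DefaultIfEmpty()
            orderby d
            select new { Date = d, Left = a?.Value, Right = b?.Value };

        // Format each row as text; a missing side is shown as "(none)".
        return full
            .Select(r => r.Date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                         + ": " + (r.Left ?? "(none)") + " / " + (r.Right ?? "(none)"))
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```

With this sample data, each date appears once, and the side that has no row for that date comes back as null.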
    
  • 2020-11-21 22:58

    I don't think the LINQ join clause is the right tool for this problem, because it isn't designed to accumulate data in the way this task requires. The code needed to merge the separately created collections becomes too complicated; that may be fine for learning purposes, but not for real applications. One way to solve the problem is shown in the code below:

    using System.Collections.Generic;
    
    class Program
    {
        static void Main(string[] args)
        {
            List<FirstName> firstNames = new List<FirstName>();
            firstNames.Add(new FirstName { ID = 1, Name = "John" });
            firstNames.Add(new FirstName { ID = 2, Name = "Sue" });
    
            List<LastName> lastNames = new List<LastName>();
            lastNames.Add(new LastName { ID = 1, Name = "Doe" });
            lastNames.Add(new LastName { ID = 3, Name = "Smith" });
    
            HashSet<int> ids = new HashSet<int>();
            foreach (var name in firstNames)
            {
                ids.Add(name.ID);
            }
            foreach (var name in lastNames)
            {
                ids.Add(name.ID);
            }
            List<FullName> fullNames = new List<FullName>();
            foreach (int id in ids)
            {
                FullName fullName = new FullName();
                fullName.ID = id;
                FirstName firstName = firstNames.Find(f => f.ID == id);
                fullName.FirstName = firstName != null ? firstName.Name : string.Empty;
                LastName lastName = lastNames.Find(l => l.ID == id);
                fullName.LastName = lastName != null ? lastName.Name : string.Empty;
                fullNames.Add(fullName);
            }
        }
    }
    public class FirstName
    {
        public int ID;
    
        public string Name;
    }
    
    public class LastName
    {
        public int ID;
    
        public string Name;
    }
    class FullName
    {
        public int ID;
    
        public string FirstName;
    
        public string LastName;
    }
    

    If the real collections are large, the HashSet can be built with LINQ instead of the foreach loops, as in the code below:

    List<int> firstIds = firstNames.Select(f => f.ID).ToList();
    List<int> lastIds = lastNames.Select(l => l.ID).ToList();
    HashSet<int> ids = new HashSet<int>(firstIds.Union(lastIds)); // only unique IDs end up in the HashSet
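
For large inputs, the `Find` calls inside the loop are the more expensive part, since each one scans a whole list per ID. A possible variation, sketched here with value tuples instead of the `FirstName`/`LastName` classes and under the assumption that an ID occurs at most once per list, indexes both lists by ID first. The names `DictionaryMergeDemo` and `Merge` are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryMergeDemo
{
    // Merges two (ID, Name) lists; a missing side becomes string.Empty,
    // mirroring the behaviour of the loop-based code above.
    public static List<(int ID, string FirstName, string LastName)> Merge(
        IEnumerable<(int ID, string Name)> firstNames,
        IEnumerable<(int ID, string Name)> lastNames)
    {
        // Assumption: IDs are unique within each list; ToDictionary throws on a duplicate key.
        var firstById = firstNames.ToDictionary(f => f.ID, f => f.Name);
        var lastById = lastNames.ToDictionary(l => l.ID, l => l.Name);

        return firstById.Keys.Union(lastById.Keys)
            .OrderBy(id => id)
            .Select(id => (
                ID: id,
                FirstName: firstById.TryGetValue(id, out var first) ? first : string.Empty,
                LastName: lastById.TryGetValue(id, out var last) ? last : string.Empty))
            .ToList();
    }

    public static void Main()
    {
        var firstNames = new[] { (ID: 1, Name: "John"), (ID: 2, Name: "Sue") };
        var lastNames = new[] { (ID: 1, Name: "Doe"), (ID: 3, Name: "Smith") };

        foreach (var p in Merge(firstNames, lastNames))
            Console.WriteLine(p.ID + ": '" + p.FirstName + "' '" + p.LastName + "'");
    }
}
```

Each lookup is then a dictionary access rather than a list scan, at the cost of building the two dictionaries up front.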
    
  • 2020-11-21 22:59

    I'm guessing @sehe's approach is stronger, but until I understand it better, I find myself leap-frogging off of @MichaelSander's extension. I modified it to match the syntax and return type of the built-in Enumerable.Join() method described here. I appended the "Distinct" suffix with respect to @cadrell0's comment under @JeffMercado's solution.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    
    public static class MyExtensions {
    
        public static IEnumerable<TResult> FullJoinDistinct<TLeft, TRight, TKey, TResult> (
            this IEnumerable<TLeft> leftItems, 
            IEnumerable<TRight> rightItems, 
            Func<TLeft, TKey> leftKeySelector, 
            Func<TRight, TKey> rightKeySelector,
            Func<TLeft, TRight, TResult> resultSelector
        ) {
    
            var leftJoin = 
                from left in leftItems
                join right in rightItems 
                  on leftKeySelector(left) equals rightKeySelector(right) into temp
                from right in temp.DefaultIfEmpty()
                select resultSelector(left, right);
    
            var rightJoin = 
                from right in rightItems
                join left in leftItems 
                  on rightKeySelector(right) equals leftKeySelector(left) into temp
                from left in temp.DefaultIfEmpty()
                select resultSelector(left, right);
    
            return leftJoin.Union(rightJoin);
        }
    
    }
    

    In the example, you would use it like this:

    var test = 
        firstNames
        .FullJoinDistinct(
            lastNames,
            f=> f.ID,
            j=> j.ID,
            (f,j)=> new {
                ID = f == null ? j.ID : f.ID, 
                leftName = f == null ? null : f.Name,
                rightName = j == null ? null : j.Name
            }
        );
    

    In the future, as I learn more, I have a feeling I'll be migrating to @sehe's logic given its popularity. But even then I'll have to be careful, because I feel it is important to have at least one overload that matches the syntax of the existing ".Join()" method if feasible, for two reasons:

    1. Consistency in methods helps save time, avoid errors, and avoid unintended behavior.
    2. If there ever is an out-of-the-box ".FullJoin()" method in the future, I would imagine it will try to keep to the syntax of the currently existing ".Join()" method if it can. If it does, then if you want to migrate to it, you can simply rename your functions without changing the parameters or worrying about different return types breaking your code.

    I'm still new with generics, extensions, Func statements, and other features, so feedback is certainly welcome.

    EDIT: It didn't take me long to realize there was a problem with my code. I was doing a .Dump() in LINQPad and looking at the return type. It was just IEnumerable, so I tried to match it. But when I actually did a .Where() or .Select() on my extension I got an error: "'System.Collections.IEnumerable' does not contain a definition for 'Select' and ...". So in the end I was able to match the input syntax of .Join(), but not the return behavior.

    EDIT: Added "TResult" to the return type for the function. Missed that when reading the Microsoft article, and of course it makes sense. With this fix, it now seems the return behavior is in line with my goals after all.
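
One thing the `Union` call implies is that result rows are compared with the default equality of `TResult`. A different way to keep the `.Join()`-style parameter list without relying on that is to index both inputs with `ToLookup`. The sketch below is an illustration only, not a reproduction of any other answer's code; `FullJoinByLookup` and the class names are hypothetical. A missing side is passed to the result selector as `default`, which is null for reference types but a zero value for structs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupJoinExtensions
{
    // Hypothetical name; parameter order follows Enumerable.Join().
    public static IEnumerable<TResult> FullJoinByLookup<TLeft, TRight, TKey, TResult>(
        this IEnumerable<TLeft> leftItems,
        IEnumerable<TRight> rightItems,
        Func<TLeft, TKey> leftKeySelector,
        Func<TRight, TKey> rightKeySelector,
        Func<TLeft, TRight, TResult> resultSelector)
    {
        // Index both sides by key once.
        var leftLookup = leftItems.ToLookup(leftKeySelector);
        var rightLookup = rightItems.ToLookup(rightKeySelector);

        // Every key that occurs on either side, without repeats.
        var keys = new HashSet<TKey>(leftLookup.Select(g => g.Key));
        keys.UnionWith(rightLookup.Select(g => g.Key));

        foreach (var key in keys)
        {
            // An absent side contributes a single default(T) entry.
            foreach (var left in leftLookup[key].DefaultIfEmpty())
                foreach (var right in rightLookup[key].DefaultIfEmpty())
                    yield return resultSelector(left, right);
        }
    }
}

public static class LookupJoinDemo
{
    public static void Main()
    {
        var firstNames = new[] { new { ID = 1, Name = "John" }, new { ID = 2, Name = "Sue" } };
        var lastNames = new[] { new { ID = 1, Name = "Doe" }, new { ID = 3, Name = "Smith" } };

        var rows = firstNames.FullJoinByLookup(
            lastNames,
            f => f.ID,
            l => l.ID,
            (f, l) => new { ID = f?.ID ?? l.ID, First = f?.Name, Last = l?.Name });

        foreach (var r in rows.OrderBy(r => r.ID))
            Console.WriteLine(r.ID + ": " + r.First + " " + r.Last);
    }
}
```

Because nothing is deduplicated afterwards, identical source rows stay separate in the output, so this behaves differently from the "Distinct" variant above when the inputs contain repeats.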

  • 2020-11-21 22:59

    Thank you, everybody, for the interesting posts!

    I modified the code because in my case I needed

    • a custom join predicate
    • a custom comparer for the distinct union

    For those interested, this is my modified code (in VB, sorry):

        Imports System.Runtime.CompilerServices
    
        Module MyExtensions
            <Extension()>
            Friend Function FullOuterJoin(Of TA, TB, TResult)(ByVal a As IEnumerable(Of TA), ByVal b As IEnumerable(Of TB), ByVal joinPredicate As Func(Of TA, TB, Boolean), ByVal projection As Func(Of TA, TB, TResult), ByVal comparer As IEqualityComparer(Of TResult)) As IEnumerable(Of TResult)
                Dim joinL =
                    From xa In a
                    From xb In b.Where(Function(x) joinPredicate(xa, x)).DefaultIfEmpty()
                    Select projection(xa, xb)
                Dim joinR =
                    From xb In b
                    From xa In a.Where(Function(x) joinPredicate(x, xb)).DefaultIfEmpty()
                    Select projection(xa, xb)
                Return joinL.Union(joinR, comparer)
            End Function
        End Module
    
        Dim fullOuterJoin = lefts.FullOuterJoin(
            rights,
            Function(left, right) left.Code = right.Code And (left.Amount [...] Or left.Description.Contains [...]),
            Function(left, right) New CompareResult(left, right),
            New MyEqualityComparer
        )
    
        Public Class MyEqualityComparer
            Implements IEqualityComparer(Of CompareResult)
    
            Private Function GetMsg(obj As CompareResult) As String
                Dim msg As String = ""
                msg &= obj.Code & "_"
                [...]
                Return msg
            End Function
    
            Public Overloads Function Equals(x As CompareResult, y As CompareResult) As Boolean Implements IEqualityComparer(Of CompareResult).Equals
                Return Me.GetMsg(x) = Me.GetMsg(y)
            End Function
    
            Public Overloads Function GetHashCode(obj As CompareResult) As Integer Implements IEqualityComparer(Of CompareResult).GetHashCode
                Return Me.GetMsg(obj).GetHashCode
            End Function
        End Class
    
  • 2020-11-21 23:00

    Here is an extension method doing that:

    public static IEnumerable<KeyValuePair<TLeft, TRight>> FullOuterJoin<TLeft, TRight>(this IEnumerable<TLeft> leftItems, Func<TLeft, object> leftIdSelector, IEnumerable<TRight> rightItems, Func<TRight, object> rightIdSelector)
    {
        var leftOuterJoin = from left in leftItems
            join right in rightItems on leftIdSelector(left) equals rightIdSelector(right) into temp
            from right in temp.DefaultIfEmpty()
            select new { left, right };
    
        var rightOuterJoin = from right in rightItems
            join left in leftItems on rightIdSelector(right) equals leftIdSelector(left) into temp
            from left in temp.DefaultIfEmpty()
            select new { left, right };
    
        var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);
    
        return fullOuterJoin.Select(x => new KeyValuePair<TLeft, TRight>(x.left, x.right));
    }
    
  • 2020-11-21 23:04

    I don't know if this covers all cases, but logically it seems correct. The idea is to take a left outer join and a right outer join, then take the union of the results.

    var firstNames = new[]
    {
        new { ID = 1, Name = "John" },
        new { ID = 2, Name = "Sue" },
    };
    var lastNames = new[]
    {
        new { ID = 1, Name = "Doe" },
        new { ID = 3, Name = "Smith" },
    };
    var leftOuterJoin =
        from first in firstNames
        join last in lastNames on first.ID equals last.ID into temp
        from last in temp.DefaultIfEmpty()
        select new
        {
            first.ID,
            FirstName = first.Name,
            LastName = last?.Name,
        };
    var rightOuterJoin =
        from last in lastNames
        join first in firstNames on last.ID equals first.ID into temp
        from first in temp.DefaultIfEmpty()
        select new
        {
            last.ID,
            FirstName = first?.Name,
            LastName = last.Name,
        };
    var fullOuterJoin = leftOuterJoin.Union(rightOuterJoin);
    

    This works as written because it is LINQ to Objects. With LINQ to SQL or another provider, the query processor might not support safe navigation or other operations, so you'd have to use the conditional operator to get the values conditionally.

    For example:

    var leftOuterJoin =
        from first in firstNames
        join last in lastNames on first.ID equals last.ID into temp
        from last in temp.DefaultIfEmpty()
        select new
        {
            first.ID,
            FirstName = first.Name,
            LastName = last != null ? last.Name : default,
        };
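
One property of the `Union` step is that it removes duplicate result rows, so source rows that are legitimately identical collapse into one. If those should be kept, a possible variation (a sketch, assuming LINQ to Objects; `ConcatJoinDemo` and `rightOnly` are names invented here) is to take the left outer join and append only the right-hand rows that have no partner:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatJoinDemo
{
    public static List<string> Run()
    {
        // "Sue" appears twice on purpose to show that duplicates survive.
        var firstNames = new[]
        {
            new { ID = 1, Name = "John" },
            new { ID = 2, Name = "Sue" },
            new { ID = 2, Name = "Sue" },
        };
        var lastNames = new[]
        {
            new { ID = 1, Name = "Doe" },
            new { ID = 3, Name = "Smith" },
        };

        // All left rows, matched or not.
        var leftOuterJoin =
            from first in firstNames
            join last in lastNames on first.ID equals last.ID into temp
            from last in temp.DefaultIfEmpty()
            select new { first.ID, FirstName = first.Name, LastName = last?.Name };

        // Only the right rows whose ID never occurs on the left.
        var leftIds = new HashSet<int>(firstNames.Select(f => f.ID));
        var rightOnly =
            from last in lastNames
            where !leftIds.Contains(last.ID)
            select new { last.ID, FirstName = (string)null, LastName = last.Name };

        // Concat appends without deduplicating, unlike Union.
        return leftOuterJoin.Concat(rightOnly)
            .Select(r => r.ID + "|" + r.FirstName + "|" + r.LastName)
            .ToList();
    }

    public static void Main()
    {
        foreach (var line in Run())
            Console.WriteLine(line);
    }
}
```

With the sample data this yields four rows, including both "Sue" rows, whereas the `Union` version would return the "Sue" row once.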
    