Why do multiple-table joins produce duplicate rows?


Let's say I have three tables A, B, and C. Each has two columns: a primary key and some other piece of data. They each have the same number of rows. If I JOIN

4 Answers
  • 2020-12-24 11:54

    This might sound like a really basic "DUH" answer, but make sure that the column you're looking up against in the file being merged actually contains only unique values!

    I noticed earlier today that Power Query won't throw an error (as Power Pivot does) and will happily let you run a many-to-many merge. This produces multiple rows for each record that matches a non-unique value.
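
    The same uniqueness check can be done before any merge or join. The sketch below is only an illustration: it uses Python's built-in sqlite3 module rather than Power Query, and the `lookup` table, its `key_col` column, and the sample rows are hypothetical. It lists every key value that occurs more than once, since those are the values that will multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup table; 'b' appears twice on purpose.
    CREATE TABLE lookup (key_col TEXT, label TEXT);
    INSERT INTO lookup VALUES ('a', 'Alpha'), ('b', 'Beta'), ('b', 'Beta again');
""")

# Any key value returned here is non-unique and will produce
# extra rows when another table is joined on key_col.
duplicate_keys = conn.execute("""
    SELECT key_col, COUNT(*) AS occurrences
    FROM lookup
    GROUP BY key_col
    HAVING COUNT(*) > 1
""").fetchall()

print(duplicate_keys)  # [('b', 2)]
```

    An empty result means the column is safe to use as a one-to-one or many-to-one lookup key.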

  • 2020-12-24 11:57

    If any of the tables M, S, D, or H has more than one row for a given Id (that is, if the Id column alone is not the primary key), then the query will return "duplicate" rows. If a table has more than one row per Id, then the other columns that uniquely identify a row must also be included in the JOIN condition(s).
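
    A minimal sketch of that effect, using Python's built-in sqlite3 module. The schema is an assumption made for illustration: the original tables are not shown in the thread, so the `Seq` column and the sample rows are hypothetical. Here (Id, Seq) identifies a row, and joining on Id alone pairs every S row with every D row that shares the Id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: (Id, Seq) together identify a row; Id alone does not.
    CREATE TABLE S (Id INTEGER, Seq INTEGER, s_val TEXT, PRIMARY KEY (Id, Seq));
    CREATE TABLE D (Id INTEGER, Seq INTEGER, d_val TEXT, PRIMARY KEY (Id, Seq));
    INSERT INTO S VALUES (1, 1, 's1'), (1, 2, 's2');
    INSERT INTO D VALUES (1, 1, 'd1'), (1, 2, 'd2');
""")

# Joining on Id only: each of the 2 rows in S matches both rows in D.
id_only = conn.execute(
    "SELECT COUNT(*) FROM S INNER JOIN D ON S.Id = D.Id"
).fetchone()[0]

# Joining on the full key: each row in S matches exactly one row in D.
full_key = conn.execute(
    "SELECT COUNT(*) FROM S INNER JOIN D ON S.Id = D.Id AND S.Seq = D.Seq"
).fetchone()[0]

print(id_only, full_key)  # 4 2
```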


  • 2020-12-24 12:02

    OK, in this example you are getting duplicates because you are joining both D and S onto M. I assume you should be joining D.Id onto S.Id, as below:

    SELECT *
    FROM M
    INNER JOIN S
        ON M.Id = S.Id
    INNER JOIN D
        ON S.Id = D.Id
    INNER JOIN H
        ON D.Id = H.Id
    
  • 2020-12-24 12:13

    When you have related tables, you often have one-to-many or many-to-many relationships. So when you join to TableB, each record in TableA may have multiple matching records in TableB. This is normal and expected.

    At times you only need certain columns, and those columns hold the same values in all the matching records. In that case you need some sort of GROUP BY or DISTINCT to remove the duplicates. Let's look at an example:

    TableA
    Id Field1
    1  test
    2  another test
    
    TableB
    Id Field2 Field3
    1  Test1  something
    1  Test1  More something
    2  Test2  Anything
    

    So when you join them and select all the fields, you get:

    select * 
    from tableA a 
    join tableb b on a.id = b.id
    
    a.Id a.Field1        b.id   b.field2  b.field3
    1    test            1      Test1     something
    1    test            1      Test1     More something
    2    another test    2      Test2     Anything
    

    These are not duplicates, because the values of Field3 are different even though the earlier fields repeat. When you select only certain columns, the same number of records are still joined together, but since the columns holding the differing information are not displayed, the rows look like duplicates.

    select a.Id, a.Field1,  b.field2
    from tableA a 
    join tableb b on a.id = b.id
    
    a.Id a.Field1       b.field2  
    1    test           Test1     
    1    test           Test1 
    2    another test   Test2
    

    These rows appear to be duplicates, but they are not: each one comes from a different record in TableB.

    You normally fix this in one of three ways: with aggregates and GROUP BY, with DISTINCT, or by filtering in the WHERE clause so that only one matching record remains. Which one you choose depends on your business rule, how your database is designed, and what kind of data is in it.
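
    As a rough sketch of the first two options, the example below rebuilds TableA and TableB from above in an in-memory SQLite database (via Python's built-in sqlite3 module, chosen here only so the queries can be run as-is) and compares the plain join with a DISTINCT version and a GROUP BY version. The `b_rows` alias is a name introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Id INTEGER, Field1 TEXT);
    CREATE TABLE TableB (Id INTEGER, Field2 TEXT, Field3 TEXT);
    INSERT INTO TableA VALUES (1, 'test'), (2, 'another test');
    INSERT INTO TableB VALUES
        (1, 'Test1', 'something'),
        (1, 'Test1', 'More something'),
        (2, 'Test2', 'Anything');
""")

# Plain join: one output row per matching TableB record.
plain = conn.execute("""
    SELECT a.Id, a.Field1, b.Field2
    FROM TableA a JOIN TableB b ON a.Id = b.Id
""").fetchall()

# DISTINCT collapses rows whose selected columns are identical.
distinct_rows = conn.execute("""
    SELECT DISTINCT a.Id, a.Field1, b.Field2
    FROM TableA a JOIN TableB b ON a.Id = b.Id
    ORDER BY a.Id
""").fetchall()

# GROUP BY does the same, and can also report how many
# TableB records were folded into each output row.
grouped = conn.execute("""
    SELECT a.Id, a.Field1, b.Field2, COUNT(*) AS b_rows
    FROM TableA a JOIN TableB b ON a.Id = b.Id
    GROUP BY a.Id, a.Field1, b.Field2
    ORDER BY a.Id
""").fetchall()

print(len(plain))     # 3
print(distinct_rows)  # [(1, 'test', 'Test1'), (2, 'another test', 'Test2')]
print(grouped)        # [(1, 'test', 'Test1', 2), (2, 'another test', 'Test2', 1)]
```

    DISTINCT is the shorter fix when you only want the repeated rows gone; GROUP BY is the better fit when you also need a count or another aggregate over the TableB records.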
