SQL: find duplicates across several fields (no unique ID), workaround

Happy的楠姐 2021-01-24 04:12

I am trying to find duplicate vendors in a database using several fields from the vendor table and the vendor_address table. The thing is the more inner join I mak

2 answers
  •  时光取名叫无心
    2021-01-24 05:14

    Your joins are a problem for more than one reason. First, they are inner joins, so they eliminate every vendor except those that show all signs of duplication at once, which is not what you want. Second, you use the same alias, oc, on every derived table. Each derived table in a query needs its own alias, so that will not work.
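
    To make those two points concrete, here is a minimal sketch of what the join-based version might look like once repaired. The original query is not shown in full, so the shape of the derived tables and the aliases sn and un are assumptions for illustration; only the table and column names come from the thread.

    ```sql
    -- Sketch only: LEFT JOINs keep a vendor that matches ANY duplication sign,
    -- and each derived table gets its own alias (sn, un are hypothetical names).
    SELECT
        o.vendor_id
        ,o.VENDOR_NAME_SHORT
        ,o.vndr_name_shrt_user
        ,sn.SAME_SHORT_NAME
        ,un.SAME_USER_NUM
    FROM VENDOR o
    LEFT JOIN
    (
        SELECT VENDOR_NAME_SHORT, COUNT(*) AS SAME_SHORT_NAME
        FROM VENDOR
        WHERE COUNTRY = 'CANADA'
        AND VENDOR_STATUS = 'A'
        GROUP BY VENDOR_NAME_SHORT
        HAVING COUNT(*) > 1
    ) sn ON sn.VENDOR_NAME_SHORT = o.VENDOR_NAME_SHORT
    LEFT JOIN
    (
        SELECT vndr_name_shrt_user, COUNT(*) AS SAME_USER_NUM
        FROM VENDOR
        WHERE COUNTRY = 'CANADA'
        AND VENDOR_STATUS = 'A'
        GROUP BY vndr_name_shrt_user
        HAVING COUNT(*) > 1
    ) un ON un.vndr_name_shrt_user = o.vndr_name_shrt_user
    WHERE o.COUNTRY = 'CANADA'
    -- keep a row if at least one sign matched; with INNER JOINs all would have to match
    AND (sn.SAME_SHORT_NAME IS NOT NULL OR un.SAME_USER_NUM IS NOT NULL)
    ```

    With inner joins in place of the left joins, a vendor that shares only its short name with another vendor would drop out, because it has no row in un.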

    Instead of doing it this way, I'd suggest repeating your basic query once for each duplication sign, as follows (I removed the same_address_nb and same_postal_nb fields, and you'll see why):

    select 
        o.vendor_id
        ,o.vndr_name_shrt_user
        ,o.COUNTRY 
        ,o.VENDOR_NAME_SHORT 
        ,B.POSTAL
        ,B.ADDRESS1
        -- OC.SAME_SHORT_NAME and oc.SAME_USER_NUM are dropped: this query has no derived table aliased OC
    from VENDOR o
    JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
    WHERE o.COUNTRY = 'CANADA'
    AND B.COUNTY = 'CANADA'
    AND ...
    

    For each of these duplication signs, replace the ellipsis above with a nested query. Here is an example for duplicates in vndr_name_shrt_user:

    select 
        o.vendor_id
        ,o.vndr_name_shrt_user
        ,o.COUNTRY 
        ,o.VENDOR_NAME_SHORT 
        ,B.POSTAL
        ,B.ADDRESS1
        -- OC.* columns dropped: no derived table aliased OC exists here; the flag below replaces them
        ,'SAME_USER_NUM' as duplicateFlag
    from VENDOR o
    JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
    WHERE o.COUNTRY = 'CANADA'
    AND B.COUNTY = 'CANADA'
    AND o.vndr_name_shrt_user in 
    (
        -- values shared by more than one active vendor in the same country
        SELECT 
            vndr_name_shrt_user
        FROM VENDOR 
        WHERE COUNTRY = o.country
        AND VENDOR_STATUS = 'A'
        GROUP BY vndr_name_shrt_user
        HAVING COUNT(*) > 1
    ) 
    

    You can UNION ALL these queries together and then see all of your duplicates.

    As a side note, the last three derived tables each checked country = 'canada' twice.

    UPDATE: showing more than one duplicate flag

    -- Branch 1: vendors sharing vndr_name_shrt_user
    -- (OC.* columns dropped in both branches: no derived table aliased OC exists here)
    select 
        o.vendor_id
        ,o.vndr_name_shrt_user
        ,o.COUNTRY 
        ,o.VENDOR_NAME_SHORT 
        ,B.POSTAL
        ,B.ADDRESS1
        ,'SAME_USER_NUM' as duplicateFlag
    from VENDOR o
    JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
    WHERE o.COUNTRY = 'CANADA'
    AND B.COUNTY = 'CANADA'
    AND o.vndr_name_shrt_user in 
    (
        SELECT 
            vndr_name_shrt_user
        FROM VENDOR 
        WHERE COUNTRY = o.country
        AND VENDOR_STATUS = 'A'
        GROUP BY vndr_name_shrt_user
        HAVING COUNT(*) > 1
    ) 
    
    UNION ALL
    
    -- Branch 2: vendors sharing VENDOR_NAME_SHORT
    select 
        o.vendor_id
        ,o.vndr_name_shrt_user
        ,o.COUNTRY 
        ,o.VENDOR_NAME_SHORT 
        ,B.POSTAL
        ,B.ADDRESS1
        ,'VENDOR_NAME_SHORT' as duplicateFlag
    from VENDOR o
    JOIN vendor_addr B ON o.VENDOR_ID = B.VENDOR_ID
    WHERE o.COUNTRY = 'CANADA'
    AND B.COUNTY = 'CANADA'
    AND o.VENDOR_NAME_SHORT in 
    (
        SELECT 
            VENDOR_NAME_SHORT
        FROM VENDOR 
        WHERE COUNTRY = o.country
        AND VENDOR_STATUS = 'A'
        GROUP BY VENDOR_NAME_SHORT
        HAVING COUNT(*) > 1
    ) 
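
    With UNION ALL, a vendor that trips several signs comes back once per flag (and once per address row). If one row per vendor is easier to read, the flagged rows could be wrapped and grouped, roughly as below. This is a sketch rather than part of the answer above: the alias d and the output columns dup_user_num and dup_short_name are made-up names, and the vendor_addr join is left out to keep it short.

    ```sql
    -- Sketch only: one row per vendor, one Y/N column per duplication sign.
    SELECT
        d.vendor_id
        -- 'Y' sorts after 'N', so MAX yields 'Y' if any row for the vendor carries the flag
        ,MAX(CASE WHEN d.duplicateFlag = 'SAME_USER_NUM' THEN 'Y' ELSE 'N' END) AS dup_user_num
        ,MAX(CASE WHEN d.duplicateFlag = 'VENDOR_NAME_SHORT' THEN 'Y' ELSE 'N' END) AS dup_short_name
    FROM
    (
        SELECT o.vendor_id, 'SAME_USER_NUM' AS duplicateFlag
        FROM VENDOR o
        WHERE o.COUNTRY = 'CANADA'
        AND o.vndr_name_shrt_user IN
        (
            SELECT vndr_name_shrt_user
            FROM VENDOR
            WHERE COUNTRY = o.COUNTRY
            AND VENDOR_STATUS = 'A'
            GROUP BY vndr_name_shrt_user
            HAVING COUNT(*) > 1
        )

        UNION ALL

        SELECT o.vendor_id, 'VENDOR_NAME_SHORT' AS duplicateFlag
        FROM VENDOR o
        WHERE o.COUNTRY = 'CANADA'
        AND o.VENDOR_NAME_SHORT IN
        (
            SELECT VENDOR_NAME_SHORT
            FROM VENDOR
            WHERE COUNTRY = o.COUNTRY
            AND VENDOR_STATUS = 'A'
            GROUP BY VENDOR_NAME_SHORT
            HAVING COUNT(*) > 1
        )
    ) d
    GROUP BY d.vendor_id
    ```

    Each further duplication sign would then need one more UNION ALL branch and one more MAX(CASE ...) column.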
    
