SQL Remove almost duplicate rows

伪装坚强ぢ 2020-12-30 16:39

I have a table that unfortunately contains bad data, and I'm trying to filter some of it out. I am sure that the LName, FName combination is unique since the data set is small en

4 answers
  • 2020-12-30 17:02

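    MIN ignores NULLs, so for each (lname, fname) pair this returns the non-null e-mail if one exists and NULL otherwise. That drops the null rows wherever a non-null value is available.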

    SELECT  lname
            , fname
            , MIN(email) AS email
    FROM    YourTable
    GROUP BY
            lname
            , fname
    

    Test script

    DECLARE @Test TABLE (
      LName VARCHAR(32)
      , FName VARCHAR(32)
      , Email VARCHAR(32)
    )
    
    INSERT INTO @Test
      SELECT 'Smith', 'Bob', 'bsmith@example.com'
      UNION ALL SELECT 'Smith', 'Bob', NULL   -- a real NULL, not the string 'NULL'
      UNION ALL SELECT 'Doe', 'Jane', NULL
      UNION ALL SELECT 'White', 'Don', 'dwhite@example.com'
    
    SELECT  lname
            , fname
            , MIN(Email) AS Email
    FROM    @Test
    GROUP BY
            lname
            , fname
    
  • 2020-12-30 17:05

    Here is a relatively simple query that uses standard SQL and does just this:

    SELECT * FROM Person P
    WHERE Email IS NOT NULL        -- Take all people with non-null e-mails
       OR (Email IS NULL AND       -- and all people with null e-mails, as long as
            NOT EXISTS             -- there is no duplicate record of the same person
              (SELECT *            -- with a non-null e-mail
               FROM Person P2
               WHERE P2.LName=P.LName AND P2.FName=P.FName AND P2.Email IS NOT NULL))
    
  • 2020-12-30 17:08

    You can use the ROW_NUMBER() analytic function:

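    SELECT *
    FROM (
        -- Number the rows for each person. ORDER BY Email DESC puts the
        -- non-null e-mail first in SQL Server, where NULLs sort lowest.
        -- On Oracle or PostgreSQL, add NULLS LAST to get the same order.
        SELECT a.*, ROW_NUMBER() OVER (PARTITION BY LName, FName ORDER BY Email DESC) AS rnk
        FROM <YOUR_TABLE> a
    ) a
    WHERE rnk = 1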
    
  • 2020-12-30 17:09

    Since there are plenty of SQL solutions posted already, you may want to create a data fix to remove the bad data, then add the necessary constraints to prevent bad data from ever being inserted. Bad data in a database is a side effect of poor design.
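
    As a rough illustration of that two-step approach, here is a minimal T-SQL sketch. The table name YourTable and the constraint name UQ_YourTable_LName_FName are placeholders, not names from the question, and the sketch assumes the only bad rows are NULL-email rows that duplicate a row with a real e-mail.

    ```sql
    -- Step 1 (data fix): delete NULL-email rows that have a sibling row
    -- for the same person with a non-null e-mail.
    -- YourTable is a hypothetical table name.
    DELETE t
    FROM   YourTable AS t
    WHERE  t.Email IS NULL
      AND  EXISTS (SELECT 1
                   FROM   YourTable AS t2
                   WHERE  t2.LName = t.LName
                     AND  t2.FName = t.FName
                     AND  t2.Email IS NOT NULL);

    -- Step 2 (prevention): enforce one row per person from now on.
    -- The constraint name is hypothetical. This statement fails if any
    -- (LName, FName) duplicates are still present after step 1.
    ALTER TABLE YourTable
      ADD CONSTRAINT UQ_YourTable_LName_FName UNIQUE (LName, FName);
    ```

    Running the delete inside a transaction and checking the affected row count before committing is a sensible precaution. If step 2 fails, some people still have more than one row (for example two different non-null e-mails), and those need a manual decision.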
