Question
I have a table that unfortunately contains bad data, and I'm trying to filter some of it out. I am sure that the LName, FName combination is unique, since the data set is small enough to verify.
LName  FName  Email
-----  -----  ------------------
Smith  Bob    bsmith@example.com
Smith  Bob    NULL
Doe    Jane   NULL
White  Don    dwhite@example.com
I would like the query to return the "duplicate" record that has a non-NULL email, while still returning the row with a NULL Email when there is no duplicate.
E.g.
Smith  Bob    bsmith@example.com
Doe    Jane   NULL
White  Don    dwhite@example.com
I think the solution is similar to the question "Sql, remove duplicate rows by value", but I don't really understand whether that asker's requirements are the same as mine.
Any suggestions?
Thanks
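Answer 1:
MIN ignores NULLs, so this drops the NULL rows for any name that has at least one non-NULL email, and still returns NULL when every email for that name is NULL.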
SELECT lname
    , fname
    , MIN(email)
FROM YourTable
GROUP BY
    lname
    , fname
Test script
DECLARE @Test TABLE (
    LName VARCHAR(32)
    , FName VARCHAR(32)
    , Email VARCHAR(32)
)

-- NULL is left unquoted so that real NULLs, not the string 'NULL', are inserted
INSERT INTO @Test
SELECT 'Smith', 'Bob', 'bsmith@example.com'
UNION ALL SELECT 'Smith', 'Bob', NULL
UNION ALL SELECT 'Doe', 'Jane', NULL
UNION ALL SELECT 'White', 'Don', 'dwhite@example.com'

SELECT lname
    , fname
    , MIN(Email)
FROM @Test
GROUP BY
    lname
    , fname
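The MIN approach relies on the emails being real NULLs. A quoted 'NULL' is an ordinary four-character string that MIN treats like any other value. Here is a minimal sketch of the difference, assuming an in-memory SQLite table named Person driven from Python's sqlite3 module as a stand-in for the real SQL Server table (the table name and the harness are illustrative, not part of the original answer):

```python
import sqlite3

# In-memory SQLite table standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (LName TEXT, FName TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("Smith", "Bob", "bsmith@example.com"),
        ("Smith", "Bob", None),  # Python None is stored as a real SQL NULL
        ("Doe", "Jane", None),
        ("White", "Don", "dwhite@example.com"),
    ],
)

query = """
    SELECT LName, FName, MIN(Email)
    FROM Person
    GROUP BY LName, FName
    ORDER BY LName, FName
"""

# With real NULLs, MIN skips them and yields NULL only for all-NULL groups.
with_real_nulls = conn.execute(query).fetchall()

# Overwrite the NULLs with the literal string 'NULL' and run the same query.
conn.execute("UPDATE Person SET Email = 'NULL' WHERE Email IS NULL")
with_string_nulls = conn.execute(query).fetchall()

print(with_real_nulls)
print(with_string_nulls)
```

With real NULLs, Jane Doe comes back with a NULL email. With the string version she comes back with the text 'NULL', and which value wins for Bob Smith then depends on the collation in use.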
Answer 2:
You can use the ROW_NUMBER() analytic function:
SELECT *
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY LName, FName ORDER BY Email DESC) AS rnk
    FROM <YOUR_TABLE> a
) a
WHERE rnk = 1
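ORDER BY Email DESC works here because SQL Server sorts NULLs as the lowest value, so the non-NULL address ranks first. Oracle and PostgreSQL sort NULLs as the largest value by default, so the same ordering would rank the NULL row first there. One way to avoid depending on that default is to sort on an explicit "is NULL" flag. The following is a sketch of that variant, assuming an in-memory SQLite table named Person run through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions; the table name and harness are illustrative):

```python
import sqlite3

# In-memory SQLite table standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (LName TEXT, FName TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("Smith", "Bob", "bsmith@example.com"),
        ("Smith", "Bob", None),  # None is stored as SQL NULL
        ("Doe", "Jane", None),
        ("White", "Don", "dwhite@example.com"),
    ],
)

# The CASE expression sorts non-NULL emails (0) ahead of NULL ones (1),
# so the ranking does not depend on the engine's default NULL ordering.
ranked_rows = conn.execute(
    """
    SELECT LName, FName, Email
    FROM (
        SELECT LName, FName, Email,
               ROW_NUMBER() OVER (
                   PARTITION BY LName, FName
                   ORDER BY CASE WHEN Email IS NULL THEN 1 ELSE 0 END, Email
               ) AS rn
        FROM Person
    ) AS ranked
    WHERE rn = 1
    ORDER BY LName, FName
    """
).fetchall()

for row in ranked_rows:
    print(row)
```

Unlike MIN, this form also lets you carry along other columns from the row that was picked.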
Answer 3:
Here is a relatively simple query that uses standard SQL and does just this:
SELECT *
FROM Person P
WHERE Email IS NOT NULL          -- Take all people with non-null e-mails
   OR (Email IS NULL             -- and all people with null e-mails, as long as
       AND NOT EXISTS            -- there is no duplicate record of the same person
           (SELECT *             -- with a non-null e-mail
            FROM Person P2
            WHERE P2.LName = P.LName
              AND P2.FName = P.FName
              AND P2.Email IS NOT NULL))
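The NOT EXISTS form differs from the MIN and ROW_NUMBER answers in one respect: it filters rows rather than collapsing them, so a person with several non-NULL emails keeps all of them. The sketch below shows this on an in-memory SQLite table through Python's sqlite3 module. The second address for Bob Smith is a hypothetical row added only to show the difference, and the shorter predicate (dropping the redundant Email IS NULL test) is an equivalent rewrite, not the answer's original text:

```python
import sqlite3

# In-memory SQLite table standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (LName TEXT, FName TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("Smith", "Bob", "bsmith@example.com"),
        ("Smith", "Bob", None),  # None is stored as SQL NULL
        ("Smith", "Bob", "bob.smith@example.com"),  # hypothetical extra row
        ("Doe", "Jane", None),
        ("White", "Don", "dwhite@example.com"),
    ],
)

# Keep a row if it has an email, or if nobody with the same name has one.
kept_rows = conn.execute(
    """
    SELECT LName, FName, Email
    FROM Person AS P
    WHERE P.Email IS NOT NULL
       OR NOT EXISTS (
            SELECT 1
            FROM Person AS P2
            WHERE P2.LName = P.LName
              AND P2.FName = P.FName
              AND P2.Email IS NOT NULL
          )
    ORDER BY LName, FName, Email
    """
).fetchall()

for row in kept_rows:
    print(row)
```

Whether keeping both addresses is desirable depends on the data. If exactly one row per person is required, the grouping or ranking approaches are the safer fit.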
Answer 4:
Since there are plenty of SQL solutions posted already, you may want to create a data fix to remove the bad data, then add the necessary constraints to prevent bad data from ever being inserted. Bad data in a database is a side effect of poor design.
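As a rough sketch of what such a data fix could look like: delete each NULL-email row that has a non-NULL counterpart, then add a uniqueness constraint on the name pair. This assumes that (LName, FName) really does identify a person, which the asker says holds for this data set but which will not hold in general. The table name Person, the index name ux_person_name, and the SQLite/Python harness are all illustrative choices:

```python
import sqlite3

# In-memory SQLite table standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (LName TEXT, FName TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?)",
    [
        ("Smith", "Bob", "bsmith@example.com"),
        ("Smith", "Bob", None),  # None is stored as SQL NULL
        ("Doe", "Jane", None),
        ("White", "Don", "dwhite@example.com"),
    ],
)

# Data fix: remove a NULL-email row only when the same person
# also has a row with a non-NULL email.
conn.execute(
    """
    DELETE FROM Person
    WHERE Email IS NULL
      AND EXISTS (
            SELECT 1
            FROM Person AS P2
            WHERE P2.LName = Person.LName
              AND P2.FName = Person.FName
              AND P2.Email IS NOT NULL
          )
    """
)

# Constraint: allow at most one row per (LName, FName) from now on.
conn.execute("CREATE UNIQUE INDEX ux_person_name ON Person (LName, FName)")

remaining = conn.execute(
    "SELECT LName, FName, Email FROM Person ORDER BY LName, FName"
).fetchall()

# A second Bob Smith row should now be refused by the unique index.
try:
    conn.execute("INSERT INTO Person VALUES ('Smith', 'Bob', NULL)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(remaining, duplicate_rejected)
```

On a real database it would be prudent to run the DELETE inside a transaction and check the affected rows with a SELECT using the same WHERE clause before committing.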
Source: https://stackoverflow.com/questions/4566591/sql-remove-almost-duplicate-rows