问题
I have a job setup that currently selects records from a table that does not contain a unique index. I realize this could be solved by just putting an index on the table and the relevant columns but, in this scenario for testing purposes, I need to remove the index and then do a select which will also remove duplicates based on 2 columns:
SELECT DISTINCT [author], [pubDate], [dateadded]
FROM [Feeds].[dbo].[socialPosts]
WHERE CAST(FLOOR(CAST(dateadded AS float)) AS datetime) >
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE() - 2), 0)
AND CAST(FLOOR(CAST(dateadded AS float)) AS datetime) <
DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0)
This selects all records from the day before and I want to dedupe the records based on author and pubdate. This could be a post select or done prior but the idea is to find out if it can be done within a select.
回答1:
You can use a GROUP BY
and any aggregate function on the dateadded
column to get unique author, pubdate
results.
SELECT [author]
,[pubDate]
,MAX([dateadded])
FROM [Feeds].[dbo].[socialPosts]
WHERE CAST(FLOOR(CAST(dateadded AS float)) AS datetime) > dateadd(day,datediff(day, 0, getdate()-2), 0)
AND CAST(FLOOR(CAST(dateadded AS float)) AS datetime) < dateadd(day,datediff(day, 0, getDate()), 0)
GROUP BY
[author]
, [pubdate]
来源:https://stackoverflow.com/questions/11402025/sql-select-3-columns-and-dedupe-on-two-columns