Question
I have a table like this.
|-DT--------- |-ID------|
|5/30 12:00pm |10 |
|5/30 01:00pm |30 |
|5/30 02:30pm |30 |
|5/30 03:00pm |50 |
|5/30 04:30pm |10 |
|5/30 05:00pm |10 |
|5/30 06:30pm |10 |
|5/30 07:30pm |10 |
|5/30 08:00pm |50 |
|5/30 09:30pm |10 |
I want to remove a row only when the row that follows it (ordered by DT) has the same ID. In other words, from each run of consecutive rows with the same ID, I want to keep only the row with the latest datetime. For example, the table above would end up looking like this:
|-DT--------- |-ID------|
|5/30 12:00pm |10 |
|5/30 02:30pm |30 |
|5/30 03:00pm |50 |
|5/30 07:30pm |10 |
|5/30 08:00pm |50 |
|5/30 09:30pm |10 |
Can I get any tips on how this can be done?
Answer 1:
with C as
(
    select ID,
           row_number() over(order by DT) as rn
    from YourTable
)
delete C1
from C as C1
    inner join C as C2
        on C1.rn = C2.rn-1 and
           C1.ID = C2.ID
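If you want to try the idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the query above verbatim: SQLite cannot delete through a CTE, so the sketch deletes by rowid instead. It also assumes a SQLite build with window functions (3.25 or later), and the table name tbl and the ISO-style datetime strings are my own choices.

```python
import sqlite3

# Sample data from the question; the ISO-style strings are an assumption.
ROWS = [
    ("2012-05-30 12:00", 10),
    ("2012-05-30 13:00", 30),
    ("2012-05-30 14:30", 30),
    ("2012-05-30 15:00", 50),
    ("2012-05-30 16:30", 10),
    ("2012-05-30 17:00", 10),
    ("2012-05-30 18:30", 10),
    ("2012-05-30 19:30", 10),
    ("2012-05-30 20:00", 50),
    ("2012-05-30 21:30", 10),
]


def delete_rows_followed_by_same_id(conn):
    """Delete every row whose next row (ordered by dt) has the same id."""
    conn.execute(
        """
        WITH c AS (
            SELECT rowid AS rid, id,
                   row_number() OVER (ORDER BY dt) AS rn
            FROM tbl
        )
        DELETE FROM tbl
        WHERE rowid IN (
            SELECT c1.rid
            FROM c AS c1
            JOIN c AS c2 ON c1.rn = c2.rn - 1 AND c1.id = c2.id
        )
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (dt TEXT, id INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", ROWS)
delete_rows_followed_by_same_id(conn)
remaining = conn.execute("SELECT dt, id FROM tbl ORDER BY dt").fetchall()
for row in remaining:
    print(row)
```

The rows left in tbl should be the six rows the question asks for, because each row is compared with the one directly after it and removed when the IDs match.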
Answer 2:
Do these 3 steps: http://www.sqlfiddle.com/#!3/b58b9/19
First make the rows sequential:
with a as
(
    select dt, id, row_number() over(order by dt) as rn
    from tbl
)
select * from a;
Output:
|                         DT | ID | RN |
----------------------------------------
| May, 30 2012 12:00:00-0700 | 10 |  1 |
| May, 30 2012 13:00:00-0700 | 30 |  2 |
| May, 30 2012 14:30:00-0700 | 30 |  3 |
| May, 30 2012 15:00:00-0700 | 50 |  4 |
| May, 30 2012 16:30:00-0700 | 10 |  5 |
| May, 30 2012 17:00:00-0700 | 10 |  6 |
| May, 30 2012 18:30:00-0700 | 10 |  7 |
| May, 30 2012 19:30:00-0700 | 10 |  8 |
| May, 30 2012 20:00:00-0700 | 50 |  9 |
| May, 30 2012 21:30:00-0700 | 10 | 10 |
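Second, using the sequential numbers, we compare each row with the row directly above it. A row gets is_at_bottom = 1 when the row above has a different ID (or when there is no row above), and 0 when it repeats the ID of the row above: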
with a as
(
    select dt, id, row_number() over(order by dt) as rn
    from tbl
)
select below.*,
    case when above.id <> below.id or above.id is null then
        1
    else
        0
    end as is_at_bottom
from a below
left join a above on above.rn + 1 = below.rn;
Output:
|                         DT | ID | RN | IS_AT_BOTTOM |
-------------------------------------------------------
| May, 30 2012 12:00:00-0700 | 10 |  1 |            1 |
| May, 30 2012 13:00:00-0700 | 30 |  2 |            1 |
| May, 30 2012 14:30:00-0700 | 30 |  3 |            0 |
| May, 30 2012 15:00:00-0700 | 50 |  4 |            1 |
| May, 30 2012 16:30:00-0700 | 10 |  5 |            1 |
| May, 30 2012 17:00:00-0700 | 10 |  6 |            0 |
| May, 30 2012 18:30:00-0700 | 10 |  7 |            0 |
| May, 30 2012 19:30:00-0700 | 10 |  8 |            0 |
| May, 30 2012 20:00:00-0700 | 50 |  9 |            1 |
| May, 30 2012 21:30:00-0700 | 10 | 10 |            1 |
Third, delete all rows not at the bottom:
with a as
(
    select dt, id, row_number() over(order by dt) as rn
    from tbl
)
,b as
(
    select below.*,
        case when above.id <> below.id or above.id is null then
            1
        else
            0
        end as is_at_bottom
    from a below
    left join a above on above.rn + 1 = below.rn
)
delete a
from a
inner join b on b.rn = a.rn
where b.is_at_bottom = 0;
To verify:
select * from tbl order by dt;
Output:
|                         DT | ID |
-----------------------------------
| May, 30 2012 12:00:00-0700 | 10 |
| May, 30 2012 13:00:00-0700 | 30 |
| May, 30 2012 15:00:00-0700 | 50 |
| May, 30 2012 16:30:00-0700 | 10 |
| May, 30 2012 20:00:00-0700 | 50 |
| May, 30 2012 21:30:00-0700 | 10 |
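Note that this result keeps 13:00 and 16:30, while the table requested in the question keeps 14:30 and 19:30. The small Python sketch below (not part of the original answer; the function names are made up for illustration) shows where the difference comes from: comparing a row with the row above it keeps the first row of each run, while comparing it with the row below keeps the last.

```python
def keep_first_of_each_run(ids):
    """Indices kept when each row is compared with the row above it."""
    return [i for i, v in enumerate(ids) if i == 0 or ids[i - 1] != v]


def keep_last_of_each_run(ids):
    """Indices kept when each row is compared with the row below it."""
    final = len(ids) - 1
    return [i for i, v in enumerate(ids) if i == final or ids[i + 1] != v]


# IDs from the question, already ordered by DT (index 0 is the 12:00 row).
ids = [10, 30, 30, 50, 10, 10, 10, 10, 50, 10]

first_kept = keep_first_of_each_run(ids)
last_kept = keep_last_of_each_run(ids)
print(first_kept)  # [0, 1, 3, 4, 8, 9]
print(last_kept)   # [0, 2, 3, 7, 8, 9]
```

Indices 0, 1, 3, 4, 8, 9 correspond to the output above; indices 0, 2, 3, 7, 8, 9 correspond to the table in the question.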
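You can also simplify the deletion to the query below. Unlike the three-step version, it deletes the above row of each matching pair, so the latest row of each run is the one that survives, which is what the question asks for: http://www.sqlfiddle.com/#!3/b58b9/20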
with a as
(
    select dt, id, row_number() over(order by dt, id) as rn
    from tbl
)
delete above
from a below
left join a above on above.rn + 1 = below.rn
where case when above.id <> below.id or above.id is null then 1 else 0 end = 0;
Mikael Eriksson's answer is the best, though: if I simplify my simplified query once more, it looks like his answer ツ For that, I +1'd it. I will just make his query a bit more readable by swapping the join order and using descriptive aliases.
with a as
(
    select *, row_number() over(order by dt, id) as rn
    from tbl
)
delete above
from a below
join a above on above.rn + 1 = below.rn and above.id = below.id;
Live test: http://www.sqlfiddle.com/#!3/b58b9/24
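As a variation that is not in any of the answers: on engines that have the LEAD window function (SQL Server 2012 and later, SQLite 3.25 and later), the same comparison can be written without the self-join. Below is a minimal sketch in Python with the built-in sqlite3 module; it deletes by rowid because SQLite cannot delete through a CTE, and the function name and the shortened sample data are made up for illustration.

```python
import sqlite3


def dedupe_runs_with_lead(rows):
    """Return the rows left after deleting each row whose successor has the same id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (dt TEXT, id INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    conn.execute(
        """
        DELETE FROM tbl
        WHERE rowid IN (
            SELECT rid
            FROM (
                SELECT rowid AS rid, id,
                       lead(id) OVER (ORDER BY dt, id) AS next_id
                FROM tbl
            )
            WHERE id = next_id  -- next_id is NULL for the last row, so it is kept
        )
        """
    )
    result = conn.execute("SELECT dt, id FROM tbl ORDER BY dt").fetchall()
    conn.close()
    return result


# Shortened sample: the two consecutive 30s collapse to the later one.
sample = [("12:00", 10), ("13:00", 30), ("14:30", 30), ("15:00", 50)]
kept = dedupe_runs_with_lead(sample)
print(kept)  # [('12:00', 10), ('14:30', 30), ('15:00', 50)]
```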
Answer 3:
Here you go; simply replace [Table] with the name of your table. Note that this query selects the rows to keep rather than deleting anything.
SELECT *
FROM [dbo].[Table]
WHERE [Ident] NOT IN
(
    -- Ident values of rows that are immediately followed by a row with the same ID
    SELECT Extent.[Ident]
    FROM
    (
        SELECT TOP 100 PERCENT T1.[DT],
            T1.[ID],
            T1.[Ident],
            (
                -- despite the alias, this picks the ID of the row that follows T1
                SELECT TOP 1 Previous.ID
                FROM [dbo].[Table] AS Previous
                WHERE Previous.[Ident] > T1.Ident -- this is where the identity seed is important
                ORDER BY [Ident] ASC
            ) AS 'PreviousId'
        FROM [dbo].[Table] AS T1
        ORDER BY T1.[Ident] DESC
    ) AS Extent
    WHERE [Id] = [PreviousId]
)
Note: You will need an identity column on the table whose values increase in the same order as DT. If you can't change the structure of the table, use a CTE to generate one.
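The same "look at the next row by identity value" idea can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions, not a translation of the query above: it uses SQLite syntax (LIMIT 1 instead of TOP 1, and the null-safe IS NOT instead of NOT IN), and the table name t, the column name ident and the shortened sample data are made up for illustration.

```python
import sqlite3

# Shortened sample: consecutive 30s and consecutive 10s at the end.
ROWS = [
    ("12:00", 10),
    ("13:00", 30),
    ("14:30", 30),
    ("15:00", 50),
    ("16:30", 10),
    ("17:00", 10),
]

conn = sqlite3.connect(":memory:")
# "ident" plays the role of the identity column; SQLite fills it with 1, 2, 3, ...
conn.execute("CREATE TABLE t (ident INTEGER PRIMARY KEY, dt TEXT, id INTEGER)")
conn.executemany("INSERT INTO t (dt, id) VALUES (?, ?)", ROWS)

kept = conn.execute(
    """
    SELECT cur.dt, cur.id
    FROM t AS cur
    WHERE cur.id IS NOT (
        -- id of the row that immediately follows cur; NULL for the last row
        SELECT nxt.id
        FROM t AS nxt
        WHERE nxt.ident > cur.ident
        ORDER BY nxt.ident
        LIMIT 1
    )
    ORDER BY cur.ident
    """
).fetchall()
print(kept)  # [('12:00', 10), ('14:30', 30), ('15:00', 50), ('17:00', 10)]
```

No window functions are needed here, which is the main reason to consider this shape on older engines. The price is a correlated subquery per row.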
Answer 4:
You can try the following query:
select * from
(
    select *, RANK() OVER (ORDER BY dt,id) AS Rank from test
) as a
where 0 = (
    select count(id) from (
        select id, RANK() OVER (ORDER BY dt,id) AS Rank from test
    ) as b where b.id = a.id and b.Rank = a.Rank + 1
) order by dt
Thanks, Mahesh
Source: https://stackoverflow.com/questions/11589499/finding-next-row-in-sql-query-and-deleting-it-only-if-previous-row-matches