PostgreSQL Removing duplicates

不思量自难忘° 2021-01-18 12:33

I am working on a PostgreSQL query to remove duplicates from a table. The following table is dynamically generated and I want to write a select query which will remove the recor

4 answers
  • 2021-01-18 12:39
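                 -- keep only the values of first that occur exactly once;
                 -- second is not in the GROUP BY, so it has to be aggregated
                 select count(first) as cnt, first, min(second) as second
                 from df1
                 group by first
                 having count(first) = 1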
    

    If you want to keep one row per value of first instead (sorry, I initially missed that you might want that):

                 select first, min(second) 
                 from df1 
                 group by first
    

    Where the table's name is df1 and the columns are named first and second.

    You can actually leave off the count(first) as cnt if you want.
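
To make the difference between the two queries concrete, here is a small sketch on made-up data. The sample rows are an assumption for illustration only; just the table name df1 and the column names first and second come from the answer. The table is created as a temporary table so the sketch does not touch a real df1.

```sql
-- Hypothetical sample data (not from the original question).
create temp table df1 (first text, second integer);

insert into df1 (first, second) values
    ('a', 1),
    ('a', 2),
    ('b', 3);

-- Variant 1: only values of first that occur exactly once.
select first, min(second) as second
from df1
group by first
having count(first) = 1;
-- returns one row: ('b', 3)

-- Variant 2: one row per value of first, keeping the smallest second.
select first, min(second) as second
from df1
group by first
order by first;
-- returns two rows: ('a', 1) and ('b', 3)
```

Variant 1 drops every duplicated value entirely, while variant 2 collapses each group to a single row, so which one is right depends on whether a duplicated key should survive at all.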

    At the risk of stating the obvious, once you know how to select the rows you want (or don't want), deleting the records is simple, and there are a dozen ways to do it.

    If you want to replace the table or make a new table, you can use CREATE TABLE AS to do the deletion:

                 -- tmp holds only the rows to keep; cnt is left out so that
                 -- tmp has the same columns as df1
                 create table tmp as 
                 select first, min(second) as second 
                 from df1 
                 group by first
                 having count(first) = 1;
    
                 drop table df1;
    
                 create table df1 as select * from tmp;
    

    Or, instead of dropping and recreating df1, delete from it in place using tmp:

    DELETE FROM df1 WHERE first NOT IN (SELECT first FROM tmp);
    

    You could also use select into, etc, etc.

  • 2021-01-18 12:40

    There is no need for an intermediate table:

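    -- keep the row with the lowest ctid for each first_column value, delete the rest;
    -- there is no HAVING filter: every group must contribute its min(ctid),
    -- otherwise rows whose first_column is already unique would be deleted too
    delete from df1
    where ctid not in (select min(ctid)
                       from df1
                       group by first_column);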
    

    If you are deleting many rows from a large table, the approach with an intermediate table is probably faster.
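
As a rough sketch of what that intermediate-table approach could look like: the name df1_dedup is hypothetical, and using second_column to decide which row of each group survives is an assumption, not something the question specifies. Note that TRUNCATE takes an exclusive lock on df1 for the rest of the transaction.

```sql
begin;

-- Copy one row per first_column value into a temporary table.
-- The ORDER BY decides which row survives: here, the smallest second_column.
create temp table df1_dedup on commit drop as
select distinct on (first_column) *
from df1
order by first_column, second_column;

-- Empty the original table and refill it from the deduplicated copy.
truncate df1;

insert into df1
select * from df1_dedup;

commit;
```

Because everything runs in one transaction, other sessions see either the old contents or the deduplicated contents of df1, never the empty table in between.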


    If you just want one row per distinct value of a column, you can use:

    select distinct on (first_column) *
    from the_table
    order by first_column, second_column;  -- keeps the row with the smallest second_column per first_column
    

    Or simply

    select first_column, min(second_column)
    from the_table
    group by first_column;
    
  • 2021-01-18 12:54

    So basically I did this

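     -- t1: the smallest second for each value of first
     create temp table t1 as 
     select first, min(second) as second
     from df1 
     group by first;
    
     -- join back to df1 to fetch the matching full rows
     select * from df1 
     inner join t1 on t1.first = df1.first and t1.second = df1.second;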
    

    It's a satisfactory answer. Thanks for your help, @Hack-R.

  • 2021-01-18 12:55
    • if you want to SELECT unique rows:

    SELECT * FROM ztable u
    WHERE NOT EXISTS (      -- There is no other record
        SELECT * FROM ztable x
        WHERE x.id = u.id   -- with the same id
        AND x.ctid < u.ctid -- , but with a different(lower) "internal" rowid
        );                  -- so u.* must be unique
    

    • if you want to SELECT the other rows, which were suppressed in the previous query:

    SELECT * FROM ztable nu
    WHERE EXISTS (           -- another record exists
        SELECT * FROM ztable x
        WHERE x.id = nu.id   -- with the same id
        AND x.ctid < nu.ctid -- , but with a different(lower) "internal" rowid
        );
    

    • if you want to DELETE records, making the table unique (but keeping one record per id):

    DELETE FROM ztable d
    WHERE EXISTS (          -- another record exists
        SELECT * FROM ztable x
        WHERE x.id = d.id   -- with the same id
        AND x.ctid < d.ctid -- , but with a different(lower) "internal" rowid
        );
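
One cautious way to run the DELETE above is inside a transaction, with a count before and a check after, so it can be rolled back if the numbers look wrong. This is only a sketch using the same ztable and id names as the queries above; the preview and verification queries are additions, not part of the original answer.

```sql
begin;

-- Preview: how many rows would the DELETE remove?
select count(*) as rows_to_delete
from ztable nu
where exists (
    select 1 from ztable x
    where x.id = nu.id
    and x.ctid < nu.ctid
    );

-- Remove every row that has a sibling with the same id and a lower ctid.
delete from ztable d
where exists (
    select 1 from ztable x
    where x.id = d.id
    and x.ctid < d.ctid
    );

-- Verify: this should return zero rows if each id is now unique.
select id, count(*)
from ztable
group by id
having count(*) > 1;

commit;  -- or rollback; if the result is not what you expected
```

The ctid is only a physical row locator and can change when a row is updated, so it is used here purely as a tiebreaker within a single statement, not as a stable identifier.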
    