Find duplicate records in MySQL

别跟我提以往 2020-11-21 23:12

I want to pull out duplicate records in a MySQL Database. This can be done with:

SELECT address, count(id) as cnt FROM list
GROUP BY address HAVING cnt >         


        
23 Answers
  • 2020-11-21 23:45

    Personally this query has solved my problem:

    SELECT `SUB_ID`, COUNT(SRV_KW_ID) as subscriptions FROM `SUB_SUBSCR` group by SUB_ID, SRV_KW_ID HAVING subscriptions > 1;
    

    This query shows each subscriber ID that appears more than once in the table with the same SRV_KW_ID, along with the number of times that combination occurs.

    These are the table columns:

    | Field         | Type        | Null | Key | Default | Extra          |
    | SUB_SUBSCR_ID | int(11)     | NO   | PRI | NULL    | auto_increment |
    | MSI_ALIAS     | varchar(64) | YES  | UNI | NULL    |                |
    | SUB_ID        | int(11)     | NO   | MUL | NULL    |                |
    | SRV_KW_ID     | int(11)     | NO   | MUL | NULL    |                |
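
    If you also want to see the rows behind each duplicate rather than only the counts, one option is to join the grouped result back to the table. This is a minimal sketch that assumes the column names listed above; the aliases `s` and `dup` are only illustrative.

    ```sql
    -- Sketch: list the full rows for every duplicated (SUB_ID, SRV_KW_ID) pair.
    SELECT s.SUB_SUBSCR_ID, s.MSI_ALIAS, s.SUB_ID, s.SRV_KW_ID
    FROM SUB_SUBSCR AS s
    INNER JOIN (
        -- Pairs that occur more than once
        SELECT SUB_ID, SRV_KW_ID
        FROM SUB_SUBSCR
        GROUP BY SUB_ID, SRV_KW_ID
        HAVING COUNT(*) > 1
    ) AS dup
        ON s.SUB_ID = dup.SUB_ID
       AND s.SRV_KW_ID = dup.SRV_KW_ID
    ORDER BY s.SUB_ID, s.SRV_KW_ID, s.SUB_SUBSCR_ID;
    ```

    Joining on both columns matters here: joining on SUB_ID alone would also pull in rows for the same subscriber that have a different SRV_KW_ID.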
    

    Hope it helps you too!

  • 2020-11-21 23:46

    Finding duplicate addresses is much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case...

    I work at SmartyStreets, where we do address validation and de-duplication and other stuff, and I've seen a lot of diverse challenges with similar problems.

    There are several third-party services that will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US addresses) has guidelines for standardizing them, but only a handful of vendors are certified to perform such operations.

    So I would recommend exporting the table to a CSV file, for instance, and submitting it to a capable list processor. One such processor is LiveAddress, which will do it for you automatically in a few seconds to a few minutes. It flags duplicate rows with a new field called "Duplicate" that holds the value Y.
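
    To illustrate why a plain query falls short, here is a rough sketch of what in-database normalization might look like. It assumes the `list` table and `address` column from the question; `normalized_address` is a made-up alias. It only strips periods and commas, trims surrounding whitespace, and lowercases the value, so it is nowhere near USPS-style standardization.

    ```sql
    -- Sketch: group on a crudely normalized address instead of the raw text.
    -- Only catches differences in case, periods, commas, and outer whitespace.
    SELECT
        LOWER(TRIM(REPLACE(REPLACE(address, '.', ''), ',', ''))) AS normalized_address,
        COUNT(*) AS cnt
    FROM list
    GROUP BY normalized_address
    HAVING cnt > 1;
    ```

    With this, "12 Main St." and "12 main st" would be grouped together, but "12 Main Street" would still count as a different address, which is the kind of gap a dedicated address-validation service is meant to close.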

  • 2020-11-21 23:49

    Powerlord's answer is indeed the best, and I would recommend one more change: use LIMIT to make sure the database does not get overloaded:

    SELECT firstname, lastname, list.address FROM list
    INNER JOIN (SELECT address FROM list
    GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address
    LIMIT 10
    

    It is a good habit to use LIMIT when there is no WHERE clause and when making joins. Start with a small value, check how heavy the query is, and then increase the limit.
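
    One way to check how heavy the query is before raising the limit is to ask MySQL for its execution plan with EXPLAIN. The sketch below reuses the query above unchanged; the index name `idx_list_address` is hypothetical, and whether such an index helps depends on your data and on `address` being an indexable type such as VARCHAR.

    ```sql
    -- Sketch: inspect the plan for the duplicate-lookup query.
    EXPLAIN
    SELECT firstname, lastname, list.address
    FROM list
    INNER JOIN (
        SELECT address
        FROM list
        GROUP BY address
        HAVING COUNT(id) > 1
    ) AS dup ON list.address = dup.address
    LIMIT 10;

    -- Assumption: an index on the grouped/joined column may reduce the work.
    -- CREATE INDEX idx_list_address ON list (address);
    ```

    Keep in mind that LIMIT caps the number of rows returned, but the grouped subquery may still have to read the whole table to find the duplicated addresses.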

  • 2020-11-21 23:50
    Find duplicate records:

    Suppose we have a table: Student

    student_id    int
    student_name  varchar

    Records:

    +------------+---------------------+
    | student_id | student_name        |
    +------------+---------------------+
    |        101 | usman               |
    |        101 | usman               |
    |        101 | usman               |
    |        102 | usmanyaqoob         |
    |        103 | muhammadusmanyaqoob |
    |        103 | muhammadusmanyaqoob |
    +------------+---------------------+

    Now we want to see the duplicate records. Use this query:

    select student_name, student_id, count(*) c from student group by student_id, student_name having c > 1;

    +---------------------+------------+---+
    | student_name        | student_id | c |
    +---------------------+------------+---+
    | usman               |        101 | 3 |
    | muhammadusmanyaqoob |        103 | 2 |
    +---------------------+------------+---+
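
    If you want to try this yourself, the following sketch recreates the example table and data. The VARCHAR length of 50 is an assumption, since the original only says `varchar`.

    ```sql
    -- Sketch: rebuild the example table (VARCHAR(50) is an assumed length).
    CREATE TABLE student (
        student_id   INT,
        student_name VARCHAR(50)
    );

    INSERT INTO student (student_id, student_name) VALUES
        (101, 'usman'),
        (101, 'usman'),
        (101, 'usman'),
        (102, 'usmanyaqoob'),
        (103, 'muhammadusmanyaqoob'),
        (103, 'muhammadusmanyaqoob');

    -- Same duplicate check as above: one row per repeated (id, name) pair.
    SELECT student_name, student_id, COUNT(*) AS c
    FROM student
    GROUP BY student_id, student_name
    HAVING c > 1;
    ```

    The row for 102 is left out because its pair occurs only once. The order of the result rows is not guaranteed unless you add an ORDER BY.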
    
  • 2020-11-21 23:50

    select address
    from list
    where address = any (
        select address
        from (
            select address, count(id) cnt
            from list
            group by address
            having cnt > 1
        ) as t1
    )
    order by address

    The inner subquery returns the addresses that have duplicates, together with their counts. The outer subquery then selects only the address column from that result. The outer subquery must return only one column because it is used as the operand of the '= any' operator.
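
    For comparison, here is a sketch of the same lookup written with IN, which MySQL treats as an alias for `= ANY` when used with a subquery. It assumes the same `list` table; for a plain SELECT the extra derived table `t1` should not be needed, since that wrapper mainly matters when the same table is being modified by an UPDATE or DELETE.

    ```sql
    -- Sketch: equivalent form using IN instead of = ANY.
    SELECT address
    FROM list
    WHERE address IN (
        SELECT address
        FROM list
        GROUP BY address
        HAVING COUNT(id) > 1
    )
    ORDER BY address;
    ```

    Like the original, this returns one row per record whose address is duplicated, so an address that appears three times is listed three times.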

  • 2020-11-21 23:53
    select * from table_name t1 inner join (select distinct <attribute list> from table_name as temp)t2 where t1.attribute_name = t2.attribute_name
    

    For your table it would be something like

    select * from list l1 inner join (select distinct address from list as list2)l2 where l1.address=l2.address
    

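    This query joins each row to the distinct address entries in your list table. Note that, as written, it returns every row with a non-NULL address rather than only the duplicated ones. I am not sure how this will work if you have any primary key values for name, etc.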
