Finding duplicate values in MySQL


I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?


25 answers

深忆病人

2020-11-22 04:35

SELECT column_name, COUNT(*) AS temp FROM table_name GROUP BY column_name HAVING temp > 1;


借酒劲吻你

2020-11-22 04:36
Building off of levik's answer: to get the IDs of the duplicate rows, you can use GROUP_CONCAT if your server supports it. This returns a comma-separated list of ids for each duplicated value.
```
SELECT GROUP_CONCAT(id), name, COUNT(*) c FROM documents GROUP BY name HAVING c > 1;
```
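If you want to try this without a MySQL server, here is a minimal sketch that runs the same statement from Python. It assumes SQLite's `GROUP_CONCAT` and `HAVING` behave like MySQL's for this query; the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO documents (id, name) VALUES (?, ?)",
    [(1, "a.txt"), (2, "b.txt"), (3, "a.txt"), (4, "c.txt"), (5, "a.txt")],
)

# Same query as in the answer above.
rows = conn.execute(
    "SELECT GROUP_CONCAT(id) AS ids, name, COUNT(*) AS c "
    "FROM documents GROUP BY name HAVING c > 1"
).fetchall()

# Split the comma-separated id list back into integers, keyed by the
# duplicated name. The ids are sorted because the concatenation order
# is not guaranteed.
duplicates = {name: sorted(int(i) for i in ids.split(",")) for ids, name, c in rows}
print(duplicates)
```

Only `a.txt` appears more than once in the sample data, so it is the only name reported.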
遇见更好的自我

2020-11-22 04:37
Assuming your table is named TableABC, the column you want to check is Col, and the primary key of TableABC is Key:
```
SELECT a.Key, b.Key, a.Col 
FROM TableABC a, TableABC b
WHERE a.Col = b.Col 
AND a.Key <> b.Key
```
The advantage of this approach over the answer above is that it returns the Key of each duplicate row.
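One thing to be aware of with the self-join is that every duplicate pair comes back twice, once in each order. A rough sketch of that, again using SQLite from Python as a stand-in for MySQL, with made-up rows. The second query, which uses `<` instead of `<>`, is my own variation and not part of the answer; `"Key"` is quoted here only to be safe with keyword handling.

```python
import sqlite3

# Hypothetical sample data; SQLite is used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE TableABC ("Key" INTEGER PRIMARY KEY, Col TEXT)')
conn.executemany(
    'INSERT INTO TableABC ("Key", Col) VALUES (?, ?)',
    [(1, "x"), (2, "y"), (3, "x"), (4, "z")],
)

# The query from the answer: each duplicate pair appears in both orders.
both_orders = conn.execute(
    'SELECT a."Key", b."Key", a.Col FROM TableABC a, TableABC b '
    'WHERE a.Col = b.Col AND a."Key" <> b."Key" '
    'ORDER BY a."Key", b."Key"'
).fetchall()

# Variation (not from the answer): "<" keeps a single row per pair.
one_per_pair = conn.execute(
    'SELECT a."Key", b."Key", a.Col FROM TableABC a JOIN TableABC b '
    'ON a.Col = b.Col AND a."Key" < b."Key" '
    'ORDER BY a."Key", b."Key"'
).fetchall()

print(both_orders)
print(one_per_pair)
```

With the sample rows, keys 1 and 3 share the value `x`, so the first query lists that pair twice and the second lists it once.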

逝去的感伤

2020-11-22 04:37

SELECT DISTINCT a.email FROM `users` a INNER JOIN `users` b ON a.email = b.email WHERE a.id != b.id;


日久生厌

2020-11-22 04:37
To remove rows that are duplicated across multiple fields, first concatenate those fields into a new key column, then use GROUP BY on that key so that only one row per distinct key is kept:
```
Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
Create index x_tmp_cfs on tmp(cfs);
Create table unduptable select f1,f2,... from tmp group by cfs;
```
离开以前

2020-11-22 04:38
```
SELECT  *
FROM    mytable mto
WHERE   EXISTS
        (
        SELECT  1
        FROM    mytable mti
        WHERE   mti.varchar_column = mto.varchar_column
        LIMIT 1, 1
        )
```
This query returns complete records, not just the distinct values of varchar_column.

This query doesn't use COUNT(*). If there are lots of duplicates, COUNT(*) is expensive, and you don't need the full count; you only need to know whether there are two rows with the same value.

This is achieved by the LIMIT 1, 1 at the bottom of the correlated subquery (essentially meaning "return the second row"). EXISTS returns true only if that second row exists (i.e. there are at least two rows with the same value of varchar_column).

Having an index on varchar_column will, of course, speed up this query greatly.
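For anyone who wants to see which rows come back, here is a small sketch that runs the query against made-up data. It assumes SQLite (used here from Python in place of MySQL) treats `LIMIT 1, 1` as "offset 1, count 1" inside the EXISTS subquery the same way MySQL does; the `id` column and the index name are hypothetical.

```python
import sqlite3

# Hypothetical sample data; SQLite is used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, varchar_column TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, varchar_column) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "b"), (6, "a")],
)
# Index on the compared column, as the answer recommends.
conn.execute("CREATE INDEX idx_varchar_column ON mytable (varchar_column)")

# The subquery asks for the second matching row only; EXISTS is true
# when that second row is there, i.e. when the value occurs at least twice.
rows = conn.execute(
    """
    SELECT mto.id, mto.varchar_column
    FROM mytable mto
    WHERE EXISTS (
        SELECT 1
        FROM mytable mti
        WHERE mti.varchar_column = mto.varchar_column
        LIMIT 1, 1
    )
    ORDER BY mto.id
    """
).fetchall()

print(rows)
```

The row with value `c` occurs only once in the sample data, so it is the only one left out; every row holding `a` or `b` is returned in full.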