Find possible duplicates in two columns ignoring case and special characters

北海茫月 2021-02-09 05:22

Query

```
SELECT COUNT(*), name, number
FROM   tbl
GROUP  BY name, number
HAVING COUNT(*) > 1
```

It sometimes fails to find duplicates between lowe

3 Answers

灰色年华 (OP)

2021-02-09 06:22
lower() / upper()

Use one of these to fold characters to either lower or upper case. Special characters are not affected:
```
SELECT count(*), lower(name), number
FROM   tbl
GROUP  BY lower(name), number
HAVING count(*) > 1;
```
unaccent()

If you actually want to ignore diacritic signs, like your comments imply, install the additional module unaccent, which provides a text search dictionary that removes accents and also the general purpose function unaccent():
```
CREATE EXTENSION unaccent;
```
That makes it very simple:
```
SELECT lower(unaccent('Büßercafé')) AS norm
```
Result:
```
bussercafe
```
This doesn't strip non-letters. Add regexp_replace() like @Craig mentioned for that:
```
SELECT lower(unaccent(regexp_replace('$s^o&f!t Büßercafé', '\W', '', 'g'))) AS norm
```
Result:
```
softbussercafe
```
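Putting the pieces together, the duplicate check from the question could look roughly like the sketch below. It assumes the `unaccent` extension is installed and reuses the table and column names from the question (`tbl`, `name`, `number`); the aliases `ct` and `norm_name` are made up for illustration:
```
-- Sketch: group on the normalized name instead of the raw value.
-- Assumes CREATE EXTENSION unaccent has already been run.
SELECT count(*) AS ct
     , lower(unaccent(regexp_replace(name, '\W', '', 'g'))) AS norm_name
     , number
FROM   tbl
GROUP  BY 2, 3          -- ordinal references to norm_name and number
HAVING count(*) > 1;
```
Grouping by ordinal position avoids repeating the long expression. If `number` is a text column that needs the same treatment, the same expression can be applied to it.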
You can even build a functional index on top of that:
- Does PostgreSQL support "accent insensitive" collations?
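One catch for the index: `unaccent()` is declared `STABLE`, not `IMMUTABLE`, so it cannot be used in an index expression directly. A common workaround is an immutable wrapper function. The following is only a sketch under assumptions: the wrapper name `f_unaccent` and the index name are hypothetical, and the extension is assumed to be installed in the `public` schema:
```
-- Hypothetical wrapper: pins the dictionary by its schema-qualified name
-- so the result does not depend on the current search_path.
CREATE OR REPLACE FUNCTION f_unaccent(text)
  RETURNS text
  LANGUAGE sql IMMUTABLE STRICT AS
$func$
SELECT public.unaccent('public.unaccent', $1)
$func$;

-- Expression index on the normalized name (index name is illustrative).
CREATE INDEX tbl_name_norm_idx ON tbl (lower(f_unaccent(name)));
```
Queries can only use such an index if they contain the same expression, `lower(f_unaccent(name))`, rather than calling `unaccent()` directly.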