Question
I have a problem with LIKE searches in the database, and with Elasticsearch, when Turkish upper- and lower-case letters are involved.
For example, I have a posts table that contains a post titled 'DENEME YAZI'.
If I run this query:
select * from posts where title like '%deneme%';
or:
select * from posts where title like '%YAZI%';
I get the correct result, but if I run:
select * from posts where title like '%yazı%';
it doesn't return any records. My database encoding is tr_TR.UTF-8.
How can I get correct results without entering the exact word?
Answer 1:
You must use ILIKE for case-insensitive matches:
select * from posts where title ilike '%yazı%';
However, there is the additional complication of peculiar rules in the Turkish locale. The upper case of 'ı' is 'I', but not the other way round: the lower case of 'I' is 'i':
db=# SELECT lower(upper('ı'));
lower
-------
i
You could solve that by applying upper() on both sides of the LIKE expression:
select upper('DENEME YAZI') like ('%' || upper('yazı') || '%');
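Applied to the posts table from the question, that idea might look like the sketch below. Whether it matches depends on how upper() maps 'ı' and 'i' under the database's locale, so treat it as something to verify on your own server rather than a guaranteed fix.

```sql
-- Uppercase both the column and the search term before comparing.
-- The result depends on the locale-specific behavior of upper().
SELECT *
FROM posts
WHERE upper(title) LIKE '%' || upper('yazı') || '%';
```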
Answer 2:
Applying a single UPPER (or LOWER) to both sides of the expression is not a solution. You have to handle the problematic Turkish characters (ı/I and i/İ) yourself.
- İ and i are the same letter in the Turkish alphabet.
- I and ı are the same letter in the Turkish alphabet.
But even with UTF-8, Latin5, or Windows-1254 encoding and collation settings in PostgreSQL:
- UPPER('İ') returns 'İ' (OK)
- UPPER('i') returns 'I' (not OK)
- UPPER('I') returns 'I' (OK)
- UPPER('ı') returns 'İ' (not OK)
so
- SELECT ... FROM ... WHERE ... UPPER('İZMİR') like UPPER('izmir') returns false
- SELECT ... FROM ... WHERE ... UPPER('ISPARTA') like UPPER('ısparta') returns false.
Here is a more precise solution, though not a perfect one because of performance issues:
SELECT ... FROM ... WHERE ...
UPPER(REPLACE(REPLACE(COLUMNX, 'i', 'İ'), 'ı', 'I')) = UPPER(REPLACE(REPLACE(myvalue, 'i', 'İ'), 'ı', 'I'))
or
SELECT ... FROM ... WHERE ...
UPPER(TRANSLATE(COLUMNX,'ıi','Iİ')) = UPPER(TRANSLATE(myvalue,'ıi','Iİ'))
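To avoid repeating the TRANSLATE call everywhere, the expression above could be wrapped in a small helper. This is only a sketch: the function name tr_upper is made up for illustration, and it assumes the posts table and title column from the question.

```sql
-- Hypothetical helper (the name tr_upper is an assumption, not a built-in).
-- It maps 'ı' -> 'I' and 'i' -> 'İ' first, so upper() never has to
-- uppercase the two letters it may handle incorrectly.
CREATE OR REPLACE FUNCTION tr_upper(txt text) RETURNS text AS $$
    SELECT upper(translate(txt, 'ıi', 'Iİ'));
$$ LANGUAGE sql IMMUTABLE;

-- Substring search from the question, normalized on both sides.
SELECT *
FROM posts
WHERE tr_upper(title) LIKE '%' || tr_upper('yazı') || '%';
```

Because the helper is declared IMMUTABLE, an expression index on tr_upper(title) may help the equality form of the comparison. A LIKE pattern with a leading '%' generally cannot use a plain B-tree index, so the performance concern still applies to substring searches.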
Source: https://stackoverflow.com/questions/24295566/search-with-turkish-characters