问题
I noticed a strange collation issue with postgresql-9.5 as it was giving different output to a Python script. As I understand it, normally characters are compared one at a time from left to right when sorting:
select 'ab' < 'ac';
t
select 'abX' < 'ac';
t
So it's irrelevant if you add the 'X' to the left hand string above.
So I was surprised when this does not hold for comparison between a space and a dash:
select 'a ' < 'a-';
t
select 'a X' < 'a-';
f
Is it a bug or is there any way around this; I want the last statement above to also return true.
[Edit after accepting @laurenz-albe's answer]
show lc_collate;
en_US.UTF-8
If I do sorting by unicode in Python I get similar results so it's not a bug in postgresql, but a bug in the unicode collation spec in my opinion!:
>>> import locale; locale.setlocale(locale.LC_ALL, "")
'en_US.UTF-8'
>>> l = ['ac', 'ab']; sorted(l) == sorted(l, cmp=locale.strcoll)
True
>>> l = ['ac', 'abX']; sorted(l) == sorted(l, cmp=locale.strcoll)
True
>>> l = ['ac', 'abX']; sorted(l) == sorted(l, cmp=locale.strcoll)
True
>>> l = ['a-', 'a X']; sorted(l) == sorted(l, cmp=locale.strcoll)
False
>>> sorted(l), sorted(l, cmp=locale.strcoll)
(['a X', 'a-'], ['a-', 'a X'])
回答1:
Characters are compared using a collation. You can use the SQL command
SHOW lc_collate;
to see what your default collation is.
PostgreSQL uses the operating system's collations, so the result of a comparison will often depend on the operating system.
To force byte-wise ASCII comparison, use the C collation
test=> SELECT 'a X' COLLATE "C" < 'a-';
?column?
----------
t
(1 row)
or the byte-wise comparison operators
test=> SELECT 'a X' ~<~ 'a-';
?column?
----------
t
(1 row)
来源:https://stackoverflow.com/questions/37836582/strange-collation-with-postgresql