Incorrect sort/collation/order with spaces in PostgreSQL 9.4

北海茫月 2020-12-11 19:05

I'm using PostgreSQL 9.4.5. When I go to psql and run \l I get:

Encoding is UTF8
Collate is en_US.UTF-8
Ctype is en_US.UTF-8
1 answer
  • 2020-12-11 19:52

    On Unix/Linux SE, a friendly expert explained that what you see is the correct way to sort Unicode text. Basically, the standard wants the list on the left to sort as the list on the right:

    di Silva Fred                  di Silva Fred
    di Silva John                  diSilva Fred
    diSilva Fred                   disílva Fred
    diSilva John         ->        di Silva John
    disílva Fred                   diSilva John
    disílva John                   disílva John
    

    Now, if spaces were as important as letters, the sort could not group the spelling variants of Fred apart from those of John. So it first sorts while ignoring spaces. Then, in a second pass, strings that are identical apart from whitespace are ordered among themselves. (This is a simplification; the real algorithm is fairly complex and assigns whitespace, accents and non-printable characters different levels of precedence.)
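
    The multi-pass idea can be sketched in a few lines of Python. This is only a rough illustration, not the real Unicode Collation Algorithm or what PostgreSQL and glibc actually run. The helper names `strip_accents` and `simplified_collation_key` are made up for this example, and the three comparison levels are an assumption chosen to reproduce the table above.

```python
import unicodedata

def strip_accents(s):
    # Decompose characters, then drop combining marks (e.g. "í" -> "i").
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def simplified_collation_key(s):
    # Hypothetical three-level key; the real algorithm uses weight tables.
    no_space = "".join(s.split())
    # Level 1: letters only; whitespace, accents and case are ignored.
    primary = strip_accents(no_space).casefold()
    # Level 2: accents break ties between strings with the same letters.
    secondary = no_space.casefold()
    # Level 3: the raw string breaks remaining ties (case and whitespace).
    return (primary, secondary, s)

names = [
    "di Silva Fred", "di Silva John",
    "diSilva Fred", "diSilva John",
    "disílva Fred", "disílva John",
]
result = sorted(names, key=simplified_collation_key)
for name in result:
    print(name)
```

    With this key, all the Fred variants come before all the John variants, as in the right-hand column of the table.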

    You can bypass the Unicode collation by setting:

    export LC_ALL=C
    

    Or in Postgres by casting to byte array for sorting:

    order by name::bytea
    

    Or (from Kiln's answer) by specifying the C collation:

    order by name collate "C"
    

    Or by altering the default collation for the column:

    alter table products alter column name type text collate "C";
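
    To see what the byte-wise options above produce, here is a small Python sketch that sorts by the UTF-8 encoded bytes. It assumes a UTF8 database and only imitates the comparison that `name::bytea` or `collate "C"` performs. It is not PostgreSQL code, and `byte_order_key` is a name invented for this example.

```python
names = [
    "disílva John", "diSilva Fred", "di Silva John",
    "disílva Fred", "di Silva Fred", "diSilva John",
]

def byte_order_key(s):
    # Compare strings by their encoded bytes, ignoring locale rules.
    return s.encode("utf-8")

bytewise = sorted(names, key=byte_order_key)
for name in bytewise:
    print(name)
```

    Here the space (0x20) sorts before "S" (0x53), which sorts before "s" (0x73), so the result is the left-hand column of the table.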
    