I have a PostgreSQL 8.4 database that is created with the da_DK.utf8 locale.
dbname=> show lc_collate;
lc_collate
------------
da_DK.utf8
(1 row)
When I select something from a table where I order on a character varying column I get a strange behaviour IMO. When ordering the result PostgreSQL ignores dashes that prefixes the value, e.g.:
select name from mytable order by name asc;
May return something like
name
----------------
Ad...
Ae...
Ag...
- Ak....
At....
The dash prefix seems to be ignored.
I can fix this issue by converting the column to latin1 when ordering:
select name from mytable order by convert_to(name, 'latin1') asc;
The I get the expected result as:
name
----------------
- Ak....
Ad...
Ae...
Ag...
At....
Why does the dash prefix get ignored by default? Can that behavior be changed?
This is because da_DK.utf8
locale defines it this way. Linux locale aware utilities, for example sort
will also work like this.
Your convert_to(name, 'latin1')
will break if it finds a character which is not in Latin 1 character set, for example €
, so it isn't a good workaround.
You can use order by convert_to(name, 'SQL_ASCII')
, which will ignore locale defined sort and simply use byte values.
Ugly hack edit:
order by
(
ascii(name) between ascii('a') and ascii('z')
or ascii(name) between ascii('A') and ascii('Z')
or ascii(name)>127
),
name;
This will sort first anything which starts with ASCII non-letter. This is very ugly, because sorting further in string would behave strange, but it can be good enough for you.
A workaround that will work in my specific case is to replace dashes with exclamation points. I happen to know that I will never get exclamation points and it will be sorted before any letters or digits.
select name from mytable order by translate(name, '-', '!') asc
It will certainly affect performance so I may look into creating a special column for sorting but I really don't like that either...
I don't know how seems ordering rules for Dutch, but for Polish special characters like space, dashes etc are not "counted" in sorting in most dictionaries. Some good sort routines do the same and ignores such special characters. Probably in Dutch there is similar rule, and this rule is implemented by Ubuntu locale aware sort function.
来源:https://stackoverflow.com/questions/4955386/postgresql-ignores-dashes-when-ordering