Debugging postgresql for where 'A' < 'a'

In a simple comparison test on PostgreSQL 9.1 and 8.4, I get the following weird results:

postgres=# select 1 one where 'A' < 'a';
 one 
-----
(0 rows)    // ..... I would have expected 1 row

postgres=# select 1 one where 'A' < 'b';
 one 
-----
   1
(1 row)    // ...... this looks OK

postgres=# select 1 one where 'A' = 'a';
 one 
-----
(0 rows)   // ...... This also looks OK

postgres=# select 1 one where 'A' > 'a';
 one 
-----
   1
(1 row)    // ...... This is inconsistent with the above results

The ASCII value of 'A' is 0x41 and that of 'a' is 0x61, so a straight comparison of ASCII values should mean that 'A' is smaller than 'a', or, if some case-insensitive magic is involved, then at least A>b and A […] locale problem. But then again, my locale is set to a standard en_US.utf8 setting on standard CentOS 5 and Fedora 16 installations, with the same results.

By attaching a debugger to the postgres process, I have tracked the problem down to strcoll():

strcoll("A", "a") returns 6, whereas strcoll("A", "b") returns -1.

However, this can only be reproduced from inside the postgres process (for example, with gdb attached); an external program like the one below gives perfectly reasonable results.

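#include <locale.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *a = "a";
    const char *b = "b";
    const char *A = "A";

    /* Use the LC_COLLATE constant: the literal 2 is LC_TIME in glibc, which
       does not affect strcoll(). "us_ENG.utf8" is also not a standard glibc
       locale name, so setlocale() most likely returned NULL and the program
       kept collating in the default "C" locale. */
    const char *loc = setlocale(LC_COLLATE, "en_US.UTF-8");
    printf("%s\n", loc != NULL ? loc : "setlocale failed");

    printf("%d\n", strcoll(A, a));
    printf("%d\n", strcoll(A, b));
    printf("%d\n", strcoll(a, a));
    printf("%d\n", strcoll(b, b));

    printf("%d\n", strcoll(a, A));
    printf("%d\n", strcoll(b, A));
    printf("%d\n", strcoll(b, a));
    printf("%d\n", strcoll(A, A));
    return 0;
}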

My question is: does anybody have an idea what would cause strcoll() to return bad values, and any suggestion on how to fix it so that my sample SQL works correctly?

Update: I tried recreating the database cluster with initdb --locale=C, and 'A' < 'a' gives the expected result there. However, that does not explain why it fails in a database created with a UTF-8 locale.

Answer: Ordering depends on your database locale, not the system locale. (Though it should be noted that PostgreSQL relies on the OS to provide the specifics. More in the Postgres Wiki.)
The ASCII (byte) values only determine the order under the non-locale "C".

Take a look at your current settings:

SELECT * FROM pg_settings WHERE name ~~ 'lc%';

The relevant one is LC_COLLATE. You can also query it directly:

SHOW lc_collate;

Since PostgreSQL 9.1 you can change the applicable collation per expression. Try:

SELECT 1 AS one WHERE 'A' < 'a' COLLATE "C";

In older versions you are (mostly) stuck with the value for LC_COLLATE that you chose when creating your database cluster.

Source: https://stackoverflow.com/questions/9424033/debugging-postgresql-for-where-a-a

Tags: Linux, debugging, postgresql, libc