Postgres upper function on Turkish characters does not return the expected result

醉话见心 2021-02-04 15:08

It looks like the Postgres upper/lower functions do not handle certain characters in the Turkish character set.

select upper('Aaı'), lower('Aaİ') from


        
3 answers
  •  一向 (original poster)
    2021-02-04 15:42

    Your problem is 100% Windows. (Or rather Microsoft Visual Studio, which PostgreSQL was built with, to be more precise.)

    For the record, SQL UPPER ends up calling Windows' LCMapStringW (via towupper, via str_toupper) with almost all the right parameters (locale 1055, Turkish, for a UTF-8-encoded, Turkish_Turkey database), but the Visual Studio runtime (towupper) does not set the LCMAP_LINGUISTIC_CASING bit in LCMapStringW's dwMapFlags. (I can confirm that setting it does the trick.) Microsoft does not consider this a bug; it is by design and will probably never be "fixed" (oh, the joys of legacy).

    You have three ways out of this:

    • implement @Sorrow's wrapper solution (or write your own native replacement function as a DLL).
    • run your PostgreSQL instance on e.g. Ubuntu, which exhibits the right behaviour for Turkic locales (@Sorrow confirmed that it works for him); this is probably the simplest and cleanest way out.
    • drop a patched 32-bit MSVCR100.DLL into your PostgreSQL bin directory (although UPPER and LOWER would then work, other things such as collation may continue to fail, again at the Windows level. YMMV.)
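To make the first option more concrete, here is a minimal sketch of the mapping such a wrapper has to apply. It is written in Python purely for illustration and is not @Sorrow's actual wrapper; the names `tr_upper` and `tr_lower` are hypothetical. The idea is to handle the dotted/dotless I pairs explicitly and leave every other character to the default case mapping.

```python
# Hypothetical helpers: a sketch of the mapping a wrapper needs, not @Sorrow's code.

# Handle the dotted/dotless I pairs first; everything else can use the default mapping.
_TR_UPPER = str.maketrans({"i": "İ", "ı": "I"})
_TR_LOWER = str.maketrans({"İ": "i", "I": "ı"})


def tr_upper(text: str) -> str:
    """Upper-case text using Turkish rules for i and ı."""
    return text.translate(_TR_UPPER).upper()


def tr_lower(text: str) -> str:
    """Lower-case text using Turkish rules for İ and I."""
    return text.translate(_TR_LOWER).lower()


if __name__ == "__main__":
    # The two expressions from the question.
    print(tr_upper("Aaı"), tr_lower("Aaİ"))
```

The same two-step shape (replace the four special characters, then call the built-in function) should carry over to a SQL or PL/pgSQL wrapper, though that version is not shown here.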

    For completeness (and nostalgic fun) ONLY, here is the procedure to patch a Windows system. Remember that unless you will be managing this PostgreSQL instance from cradle to grave, you may cause your successor(s) a lot of grief: whenever a new test or backup system is deployed from scratch, you or they would have to remember to apply the patch again. And if you one day upgrade to, say, PostgreSQL 10, which might use MSVCR120.DLL instead of MSVCR100.DLL, you will have to try your luck patching the new DLL too. On a test system:

    • use HxD to open C:\WINDOWS\SYSTEM32\MSVCR100.DLL
    • save the DLL right away under the same name in your PostgreSQL bin directory (do not copy the file using Explorer or the command line; they might copy the 64-bit version)
    • with the file still open in HxD, go to Search > Replace, pick Datatype: Hexvalues, then
      • search for...... 4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 00
      • replace with... 4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 01
      • ...then once more...
      • search for...... FC 51 6A 01 8D 4D 08 51 68 00 02 00 00 50 E8 E2
      • replace with... FC 51 6A 01 8D 4D 08 51 68 00 02 00 01 50 E8 E2
    • ...and re-save under the PostgreSQL bin directory, then restart PostgreSQL and re-run your query.
      • if your query still does not work (make sure your database is UTF-8 encoded with Turkish_Turkey for both LC_CTYPE and LC_COLLATE), open postgres.exe in 32-bit Dependency Walker and check that it loads MSVCR100.DLL from the PostgreSQL bin directory.
      • if everything works, copy the patched DLL to the production PostgreSQL bin directory and restart.
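One way to read the two hex edits above, assuming the changed byte is the top byte of a 32-bit little-endian operand (opcode B8 being `mov eax, imm32` and 68 being `push imm32`): each patch ORs LCMAP_LINGUISTIC_CASING (0x01000000) into LCMAP_LOWERCASE (0x00000100) or LCMAP_UPPERCASE (0x00000200). The sketch below checks that reading and also shows a cautious way to apply the patches to a byte string; the helper names are hypothetical, and the uniqueness check is an extra safeguard that the manual procedure does not include.

```python
import struct

# Documented Win32 LCMapString flag values.
LCMAP_LOWERCASE = 0x00000100
LCMAP_UPPERCASE = 0x00000200
LCMAP_LINGUISTIC_CASING = 0x01000000

# (search, replace) byte patterns copied from the steps above.
PATCHES = [
    (bytes.fromhex("4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 00"),
     bytes.fromhex("4E 14 33 DB 3B CB 0F 84 41 12 00 00 B8 00 01 00 01")),
    (bytes.fromhex("FC 51 6A 01 8D 4D 08 51 68 00 02 00 00 50 E8 E2"),
     bytes.fromhex("FC 51 6A 01 8D 4D 08 51 68 00 02 00 01 50 E8 E2")),
]


def changed_immediate(old: bytes, new: bytes) -> tuple[int, int]:
    """Return the 32-bit little-endian value before and after a patch.

    Assumes the single differing byte is the most significant byte of an
    imm32 operand, so the operand is the four bytes ending at that position.
    """
    diffs = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    if len(old) != len(new) or len(diffs) != 1:
        raise ValueError("expected equal-length patterns differing in one byte")
    i = diffs[0]
    (before,) = struct.unpack("<I", old[i - 3:i + 1])
    (after,) = struct.unpack("<I", new[i - 3:i + 1])
    return before, after


def apply_patches(data: bytes) -> bytes:
    """Apply every patch, refusing to proceed if a pattern is not unique."""
    for old, new in PATCHES:
        if data.count(old) != 1:
            raise ValueError("pattern not found exactly once; different DLL build?")
        data = data.replace(old, new)
    return data


if __name__ == "__main__":
    for old, new in PATCHES:
        before, after = changed_immediate(old, new)
        print(f"{before:#010x} -> {after:#010x}")
```

Because the search patterns include surrounding instruction bytes and jump offsets, they are tied to one specific build of MSVCR100.DLL; a script like this would raise rather than silently patch a different build.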

    BUT REMEMBER: the moment you move the data off the Ubuntu system or the patched Windows system onto an unpatched Windows system, you will have the problem again, and you may be unable to import the data back on Ubuntu if the Windows instance introduced duplicates in a citext field or in an UPPER/LOWER-based function index.
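To illustrate how such duplicates can arise, the sketch below groups values by their folded form under two different lower-casing rules. It is an illustration only: Python's `str.lower` stands in for a locale-unaware mapping (an assumption, not a reproduction of what unpatched Windows does), and `turkish_lower` and `collisions` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def turkish_lower(text: str) -> str:
    """Lower-case with Turkish rules (hypothetical helper): I -> ı, İ -> i."""
    return text.replace("İ", "i").replace("I", "ı").lower()


def collisions(values: Iterable[str],
               fold: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group values by folded form; keep only groups with more than one member."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[fold(value)].append(value)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    names = ["KIZ", "kız", "Ankara"]
    print(collisions(names, str.lower))      # locale-unaware folding
    print(collisions(names, turkish_lower))  # Turkish folding
```

Here "KIZ" and "kız" stay distinct under the locale-unaware rule but fold to the same key under the Turkish rule, which is the kind of pair that could be accepted by a unique LOWER-based index on one system and rejected on import into the other.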
