Java cannot retrieve Unicode (Lithuanian) letters from Access via JDBC-ODBC

长情又很酷 2021-01-22 19:15

I have a database in which some names are written with Lithuanian letters, but when I try to retrieve them using Java, the Lithuanian letters are ignored.

    DbConnection();
    zadan         


        
  •  轻奢々 (original poster)
    2021-01-22 19:59

    Now that the JDBC-ODBC Bridge has been removed from Java 8, this particular question will increasingly become an item of historical interest, but for the record:

    The JDBC-ODBC Bridge has never worked correctly with the Access ODBC Drivers ("Jet" and "ACE") for Unicode characters above code point U+00FF. That is because Access stores such characters as Unicode but it does not use UTF-8 encoding. Instead, it uses a "compressed" variation of UTF-16LE where characters with code points U+00FF and below are stored as a single byte, while characters above U+00FF are stored as a null byte followed by their UTF-16LE byte pair(s).
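    A minimal Java sketch of the storage scheme exactly as the paragraph above describes it. This is a simplified model under that assumption; the actual Jet/ACE on-disk format may differ in details this sketch does not try to reproduce. The class name CompressedUnicodeModel is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the simplified "compressed UTF-16LE" model described
// above; not the drivers' real code and not the complete Jet/ACE format.
class CompressedUnicodeModel {

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s.codePoints().forEach(cp -> {
            if (cp <= 0xFF) {
                // U+0000..U+00FF: one byte equal to the code point
                out.write(cp);
            } else {
                // Above U+00FF: a null byte, then the UTF-16LE byte pair(s)
                out.write(0x00);
                byte[] utf16le = new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_16LE);
                out.write(utf16le, 0, utf16le.length);
            }
        });
        return out.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex, e.g. "49 6D"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Imon\u0117" is 'Imonė', written with an escape to avoid source-encoding issues
        System.out.println(hex(encode("Imon\u0117"))); // 49 6D 6F 6E 00 17 01
    }
}
```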

    If the string 'Imonė' is stored within the Access database so that it appears properly in Access itself

    [Screenshot: accessEncoded.png, the string displayed correctly in Access]

    then it is stored as

    I  m  o  n  ė
    -- -- -- -- --------
    49 6D 6F 6E 00 17 01
    

    ('ė' is U+0117).
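    Going the other way, a decoder for the same simplified model recovers 'Imonė' from those seven stored bytes. The class name is hypothetical, and the rule "each null byte introduces one UTF-16LE code unit (two for a surrogate pair)" is an assumption taken from the description above, not the drivers' actual code.

```java
// Hypothetical decoder for the simplified storage model described in this answer.
class CompressedUnicodeDecoder {

    static String decode(byte[] stored) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < stored.length) {
            int b = stored[i] & 0xFF;
            if (b != 0x00) {
                // Single byte: code point U+0001..U+00FF
                sb.append((char) b);
                i += 1;
            } else {
                // Null byte: the next two bytes are a UTF-16LE code unit
                char unit = (char) ((stored[i + 1] & 0xFF) | ((stored[i + 2] & 0xFF) << 8));
                sb.append(unit);
                i += 3;
                if (Character.isHighSurrogate(unit)) {
                    // Supplementary character: the low surrogate follows directly
                    char low = (char) ((stored[i] & 0xFF) | ((stored[i + 1] & 0xFF) << 8));
                    sb.append(low);
                    i += 2;
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] stored = {0x49, 0x6D, 0x6F, 0x6E, 0x00, 0x17, 0x01};
        // Compare against 'Imonė' (U+0117) rather than printing, to avoid console-encoding issues
        System.out.println(decode(stored).equals("Imon\u0117")); // true
    }
}
```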

    The JDBC-ODBC Bridge does not understand what it receives from the Access ODBC driver for that final character, so it just returns

    Imon?
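    We cannot see inside the bridge, so the following only illustrates the general effect: '?' is the replacement that Java's encoders substitute when a character has no mapping in a single-byte character set. Forcing 'Imonė' through ISO-8859-1 (chosen purely as an example; this is not the bridge's real code path) yields the same result. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: a character above U+00FF becomes '?' when a string is
// forced through a single-byte character set. Not the bridge's actual code.
class QuestionMarkDemo {

    static String narrowToLatin1(String s) {
        // getBytes(Charset) replaces unmappable characters with the charset's
        // default replacement byte, which is '?' for ISO-8859-1
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(narrowToLatin1("Imon\u0117")); // Imon?
    }
}
```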
    

    On the other hand, if we try to store the string in the Access database with UTF-8 encoding, as would happen if the JDBC-ODBC Bridge attempted to insert the string itself

    // con: an open java.sql.Connection obtained through the JDBC-ODBC Bridge
    Statement s = con.createStatement();
    s.executeUpdate("UPDATE vocabulary SET word='Imonė' WHERE ID=5");
    

    the string would be UTF-8 encoded as

    I  m  o  n  ė
    -- -- -- -- -----
    49 6D 6F 6E C4 97
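    The UTF-8 byte sequence above can be checked directly in Java. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Prints the UTF-8 bytes of the example string; U+0117 becomes the two bytes C4 97.
class Utf8BytesDemo {

    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("Imon\u0117")); // 49 6D 6F 6E C4 97
    }
}
```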
    

    and then the Access ODBC Driver will store it in the database as

    I  m  o  n  Ä  —
    -- -- -- -- -- ---------
    49 6D 6F 6E C4 00 14 20
    
    • C4 is 'Ä' in Windows-1252, which is U+00C4, so it is stored as just C4
    • 97 is the em dash in Windows-1252, which is U+2014, so it is stored as 00 14 20
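    The two bullets can be reproduced in a short sketch: decode the UTF-8 bytes as Windows-1252 characters, then store those characters using the simplified compressed model described earlier. The class and method names are hypothetical, the storage step follows the answer's model rather than the driver's real code, and the sketch assumes the JRE provides the windows-1252 charset (standard OpenJDK builds do).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction of the mangling: UTF-8 bytes read back as
// Windows-1252 characters, then stored with the simplified compressed model.
class MangledStorageDemo {

    static final Charset CP1252 = Charset.forName("windows-1252");

    static String misreadAsCp1252(String original) {
        // C4 97 -> 'Ä' (U+00C4) and the em dash (U+2014)
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    static byte[] storeCompressed(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        s.codePoints().forEach(cp -> {
            if (cp <= 0xFF) {
                out.write(cp); // single byte, e.g. C4 for 'Ä'
            } else {
                out.write(0x00); // null byte, then UTF-16LE, e.g. 00 14 20 for U+2014
                byte[] pair = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_16LE);
                out.write(pair, 0, pair.length);
            }
        });
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String misread = misreadAsCp1252("Imon\u0117");
        System.out.println(misread.equals("Imon\u00C4\u2014")); // true
        StringBuilder hex = new StringBuilder();
        for (byte b : storeCompressed(misread)) hex.append(String.format("%02X ", b & 0xFF));
        System.out.println(hex.toString().trim()); // 49 6D 6F 6E C4 00 14 20
    }
}
```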

    Now the JDBC-ODBC Bridge can retrieve it correctly (since the Access ODBC Driver "un-mangles" the character back to C4 97 on the way out), but if we open the database in Access we see

    ImonÄ—
    

    [Screenshot: utf8Encoded.png, the mangled string as displayed in Access]
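    The two views of the same stored value can be sketched as follows, assuming (as the answer describes) that the driver converts the stored characters back to the Windows-1252 bytes C4 97 and the bridge then interprets those bytes as UTF-8. The class and method names are hypothetical, and the windows-1252 charset is assumed to be available in the JRE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: Access shows the stored characters as-is, while the
// bridge's round trip turns them back into the original UTF-8 string.
class TwoViewsDemo {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // Stored characters -> Windows-1252 bytes (C4 97) -> read as UTF-8
    static String bridgeView(String storedCharacters) {
        return new String(storedCharacters.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What Access displays: "ImonÄ" followed by an em dash
        String inAccess = "Imon\u00C4\u2014";
        System.out.println(bridgeView(inAccess).equals("Imon\u0117")); // true
    }
}
```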

    The JDBC-ODBC Bridge has never been able to provide full native Unicode support for Access databases, and it never will. Adding various properties to the JDBC connection will not solve the problem.

    For full Unicode character support with Access databases, consider using UCanAccess instead, a pure-Java JDBC driver that does not use ODBC at all. (More details are available in another question here.)
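    A minimal sketch of reading the example value through UCanAccess with plain JDBC. It assumes the UCanAccess jar and its dependencies are on the classpath, the database path is passed as the first argument, and the vocabulary table from the example above exists; the class and helper names are hypothetical. If your UCanAccess version does not register its driver automatically, call Class.forName("net.ucanaccess.jdbc.UcanaccessDriver") before connecting.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical sketch: read a Unicode value from Access via UCanAccess (no ODBC involved).
class UcanaccessReadExample {

    // UCanAccess connection URLs take the form jdbc:ucanaccess://<path to .accdb/.mdb>
    static String ucanaccessUrl(String dbPath) {
        return "jdbc:ucanaccess://" + dbPath;
    }

    // Reads the "word" column for one row of the example table, or null if absent
    static String readWord(Connection con, int id) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement("SELECT word FROM vocabulary WHERE ID=?")) {
            ps.setInt(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.out.println("usage: java UcanaccessReadExample <path-to-database>");
            return;
        }
        // Needs the UCanAccess driver on the classpath at run time
        try (Connection con = DriverManager.getConnection(ucanaccessUrl(args[0]))) {
            System.out.println(readWord(con, 5)); // should print the stored word if row 5 exists
        }
    }
}
```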
