Getting question marks when inserting Hebrew characters into a MySQL table

☆樱花仙子☆ 提交于 2019-12-04 11:17:42
BalusC

You need to tell the JDBC driver to use UTF-8 encoding while decoding the characters representing the SQL query to bytes. You can do that by adding useUnicode=yes and characterEncoding=UTF-8 query parameters to the JDBC connection URL.

jdbc:mysql://localhost:3306/db_name?useUnicode=yes&characterEncoding=UTF-8

It will otherwise use the operating system platform default charset. The MySQL JDBC driver is itself well aware about the encoding used in both the client side (where the JDBC code runs) and the server side (where the DB table is). Any character which is not covered by the charset used by the DB table will be replaced by a question mark.

See also:

You're including your values directly into the SQL. That's always a bad idea. Use a PreparedStatement, parameterized SQL, and set the values as parameters. It may not fix the problem - but it's definitely the first thing to attempt, as you should be using parameterized SQL anyway. (Parameterized SQL avoids SQL injection attacks, separates code from data, and avoids unnecessary conversions.)

Next, you should work out exactly where the problem is really occurring:

  • Make sure that the value you're trying to insert is correct.
  • Check that the value you retrieve is correct.
  • Check what's in your web response using Wireshark - check the declared encoding and what's in the actual data

When checking the values, you should iterate over each character in the string and print out the value as a UTF-16 code unit (either use toCharArray() or use charAt() in a loop). Just printing the value to the console leaves too much chance of other problems.

EDIT: For a little context of why I wrote this as an answer:

  • In my experience, including string values as parameters rather than directly into SQL can sometimes avoid such issues (and is of course better for security reasons etc).
  • In my experience, diagnosing whether the problem is at the database side or the web side is also important. This diagnosis is best done via logging the exact UTF-16 code units being used, not just strings (as otherwise further encoding issues during logging or console output can occur).
  • In my experience, problems like this can easily occur at either insert or read code paths.

All of this is important as a way of moving the OP forward, not just in a comment-like request for more information.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!