Question
I'm using docx4j to convert .docx files into HTML, then saving that data into a MySQL database. Unfortunately, we've hit a snag. When we convert a doc that includes any characters that need utf8mb4 (4-byte UTF-8), and then try to submit that data to our MySQL server, we're hit with a Generic JDBC Exception saying that it can't handle those characters:
ERROR pool-3-thread-20 org.hibernate.util.JDBCExceptionReporter - Incorrect string value: '\xEF\xBF\xBD???...' for column 'u_content' at row 1
I don't have the 'clearance' to move our MySQL server up to 5.5, so that fix is out.
In Java, can I somehow convert utf8mb4 back to utf8, replacing every 4-byte character with � or something?
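Answer 1:
Remove the characters the column can't store before you persist the content. MySQL's utf8 charset holds at most 3 bytes per character, so it only covers the Basic Multilingual Plane; anything above U+FFFF needs 4 bytes, and Java strings represent those characters as surrogate pairs. Filtering out the surrogates leaves only characters that fit: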
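public static String removeBadChars(String s) {
    if (s == null) return null;
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        // Drop both halves of a surrogate pair; skipping only the high
        // surrogate would leave an orphaned low surrogate behind.
        if (Character.isSurrogate(c)) continue;
        sb.append(c);
    }
    return sb.toString();
}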
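The question also asked about substituting � instead of deleting the characters. A minimal sketch of that variant follows, assuming the goal is one U+FFFD per 4-byte character. The class name Utf8mb3Sanitizer and the method name replaceSupplementary are hypothetical, chosen only for illustration. It walks the string by code point so that a surrogate pair yields a single replacement character, and it also replaces unpaired surrogates, which cannot be encoded as valid UTF-8.

```java
// Hypothetical helper (names are illustrative, not from docx4j or JDBC).
final class Utf8mb3Sanitizer {

    private Utf8mb3Sanitizer() {}

    /**
     * Returns a copy of s in which every code point above U+FFFF, and every
     * unpaired surrogate, is replaced by U+FFFD (the replacement character).
     */
    static String replaceSupplementary(String s) {
        if (s == null) return null;
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // advances 2 chars for a valid pair, else 1

            boolean needsFourBytes = Character.isSupplementaryCodePoint(cp);
            boolean loneSurrogate =
                    cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE;

            if (needsFourBytes || loneSurrogate) {
                sb.append('\uFFFD');
            } else {
                sb.appendCodePoint(cp);
            }
        }
        return sb.toString();
    }
}
```

Calling Utf8mb3Sanitizer.replaceSupplementary(html) just before handing the string to Hibernate keeps the text length visibly marked where a character was lost, which can make the dropped content easier to spot later than silent deletion.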
Source: https://stackoverflow.com/questions/28488939/how-would-i-convert-utf-8mb4-to-utf-8