How would I convert UTF-8mb4 to UTF-8?

Submitted by 依然范特西╮ on 2020-01-15 09:09:31

Question


I'm using docx4j to convert .docx files into HTML and then saving the result in a MySQL database. Unfortunately, we've hit a snag. When we convert a document that contains characters requiring utf8mb4 (4-byte UTF-8) and then try to submit that data to our MySQL server, we get a generic JDBC exception complaining that it can't handle the utf8mb4 characters.

ERROR pool-3-thread-20 org.hibernate.util.JDBCExceptionReporter - Incorrect string value: '\xEF\xBF\xBD???...' for column 'u_content' at row 1
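The characters that trigger this are the ones outside the Basic Multilingual Plane (code points above U+FFFF, such as most emoji). UTF-8 needs 4 bytes for them, and MySQL's older 3-byte "utf8" charset cannot store them. A minimal sketch for listing them before the insert, assuming the docx4j HTML is already in a Java String; the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8mb4Check {
    // Hypothetical helper: returns every code point in s that needs 4 bytes in UTF-8,
    // i.e. the characters MySQL's 3-byte "utf8" charset cannot store.
    static List<String> fourByteChars(String s) {
        List<String> found = new ArrayList<>();
        s.codePoints()
         .filter(cp -> cp > 0xFFFF) // supplementary code points, U+10000..U+10FFFF
         .forEach(cp -> found.add(String.format("U+%X", cp)));
        return found;
    }

    public static void main(String[] args) {
        // "café" plus U+1F600 (grinning face), written as a surrogate pair
        String html = "<p>caf\u00e9 \uD83D\uDE00</p>";
        System.out.println(fourByteChars(html)); // [U+1F600]
        // The emoji on its own encodes to 4 bytes of UTF-8
        System.out.println("\uD83D\uDE00".getBytes(StandardCharsets.UTF_8).length); // 4
    }
}
```

An empty list means the string should fit in a 3-byte utf8 column as far as character width is concerned.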

I don't have the 'clearance' to move our MySQL server up to 5.5, so that fix is out.

In Java, can I somehow convert utf8mb4 back to utf8, turning every character that needs utf8mb4 into � or something similar?
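Replacing rather than dropping is straightforward if you walk the string by code point. A sketch, assuming the goal is exactly what the question describes: keep every character that fits in 1 to 3 UTF-8 bytes and substitute U+FFFD for the rest. The helper name is hypothetical, and unpaired surrogates are also replaced here because they cannot be encoded as valid UTF-8 either:

```java
public class Utf8mb4ToUtf8 {
    // Hypothetical helper: keeps BMP characters (1-3 bytes in UTF-8) and replaces each
    // supplementary character (4 bytes in UTF-8) with U+FFFD REPLACEMENT CHARACTER.
    static String replaceFourByteChars(String s) {
        if (s == null) return null;
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            // codePoints() yields a pair as one value above 0xFFFF; an unpaired
            // surrogate comes through as a value in 0xD800..0xDFFF.
            boolean bad = cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF);
            sb.appendCodePoint(bad ? 0xFFFD : cp);
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "ok \uD83D\uDE00!";
        System.out.println(replaceFourByteChars(in));
    }
}
```

U+FFFD itself is a 3-byte UTF-8 character, so the result remains storable in a utf8 column.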


Answer 1:


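You should remove the bad characters first and then persist your content to the database. In a Java String, every character that needs utf8mb4 (code point above U+FFFF) is stored as a surrogate pair, so dropping the surrogates removes exactly those characters. This will help you: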

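public static String removeBadChars(String s) {
  if (s == null) return null;
  StringBuilder sb = new StringBuilder(s.length());
  for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    // Characters above U+FFFF (the ones that need utf8mb4) are stored in a Java
    // String as a high surrogate followed by a low surrogate. Skip both halves:
    // skipping only the high one would leave a lone low surrogate behind,
    // which cannot be encoded as valid UTF-8.
    if (Character.isSurrogate(c)) continue;
    sb.append(c);
  }
  return sb.toString();
}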
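The same filtering can also be written as a regular expression, since java.util.regex matches by code point and treats a surrogate pair as a single character. A sketch with a hypothetical class and method name; unlike the loop above, it leaves unpaired surrogates in place, so it assumes the input is well-formed:

```java
import java.util.regex.Pattern;

public class StripSupplementary {
    // Matches every supplementary code point, U+10000..U+10FFFF (the utf8mb4-only range).
    // \x{...} hex escapes in Pattern are available since Java 7.
    private static final Pattern SUPPLEMENTARY = Pattern.compile("[\\x{10000}-\\x{10FFFF}]");

    // Hypothetical helper: deletes each supplementary character from s.
    static String removeSupplementary(String s) {
        return s == null ? null : SUPPLEMENTARY.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeSupplementary("a\uD83D\uDE00b")); // ab
    }
}
```

Compiling the Pattern once in a static field avoids recompiling it for every row that gets saved.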


Source: https://stackoverflow.com/questions/28488939/how-would-i-convert-utf-8mb4-to-utf-8
