How to replace/remove 4(+)-byte characters from a UTF-8 string in Java?

无人及你 2021-02-05 08:02

Because MySQL 5.1 does not support 4-byte UTF-8 sequences, I need to replace or drop the 4-byte sequences in these strings.

I'm looking for a clean way to replace these charac

3 answers
  •  天涯浪人
    2021-02-05 08:13

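    Another simple solution is to use the regular expression [^\u0000-\uFFFF], which matches every code point outside the Basic Multilingual Plane (exactly the characters that need 4 bytes in UTF-8). For example, in Java: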

    // Strings are immutable, so the result has to be assigned
    text = text.replaceAll("[^\\u0000-\\uFFFF]", "\uFFFD");
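
    For repeated use, the same idea could be wrapped in a small helper with a precompiled pattern. This is only a sketch: the class name SupplementaryCharFilter and its two methods are made up for illustration and are not part of any library. It relies on Java's regex engine matching by code point, so a valid surrogate pair is treated as one character above U+FFFF.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
final class SupplementaryCharFilter {
    // Matches any code point above the Basic Multilingual Plane.
    // Java regexes work on code points, so a surrogate pair is one match.
    private static final Pattern NON_BMP = Pattern.compile("[^\\u0000-\\uFFFF]");

    // Replaces each supplementary character with the Unicode replacement character.
    static String replaceNonBmp(String text) {
        return NON_BMP.matcher(text).replaceAll("\uFFFD");
    }

    // Drops supplementary characters entirely.
    static String removeNonBmp(String text) {
        return NON_BMP.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The surrogate pair below encodes one emoji, which needs 4 bytes in UTF-8.
        String input = "ok \uD83D\uDE00 done";
        System.out.println(replaceNonBmp(input));
        System.out.println(removeNonBmp(input));
    }
}
```

    Characters inside the BMP (accented letters, CJK, the euro sign, and so on) take at most 3 bytes in UTF-8 and pass through unchanged. Unpaired surrogates are also left alone by this pattern, so they would need separate handling if they can occur in the input.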
    
