How to remove invalid Unicode characters from strings in Java

感情败类 2021-02-15 17:18

I am using the CoreNLP Neural Network Dependency Parser to parse some social media content. Unfortunately, the file contains characters which are, according to fileformat.info,

4 answers
  •  陌清茗 (original poster)
    2021-02-15 17:34

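    Suppose you have a String such as:

    String xml = "....";
    xml = xml.replaceAll("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]", "");

    This removes every character other than tab, line feed, carriage return, and the ranges U+0020 to U+D7FF and U+E000 to U+FFFD. The backslashes are doubled so that the regex engine, not the Java compiler, interprets the escapes: a bare \u000a or \u000d inside a string literal is turned into a real line break before compilation, and the code then fails to compile.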
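
One caveat with a negated character class limited to BMP ranges: Java's regex engine matches by code point, so it also deletes supplementary characters such as emoji, which are common in social media text. A minimal sketch, assuming you want to keep them, adds the range U+10000 to U+10FFFF that XML 1.0 also allows. The class name InvalidCharStripper and the method strip are made up for illustration, and whether this character set is what CoreNLP needs is an assumption, not something confirmed here.

```java
import java.util.regex.Pattern;

class InvalidCharStripper {
    // Allowed set (the XML 1.0 "Char" production): tab, LF, CR,
    // U+0020..U+D7FF, U+E000..U+FFFD, and U+10000..U+10FFFF.
    // Escapes are double-backslashed so the regex engine parses them.
    private static final Pattern INVALID = Pattern.compile(
        "[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Hypothetical helper: returns the input with every disallowed code point removed.
    static String strip(String input) {
        return INVALID.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // NUL and U+FFFE are dropped; the emoji (a surrogate pair) is kept.
        String dirty = "ok\u0000\uFFFEtext \uD83D\uDE00";
        System.out.println(strip(dirty));
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which is what String.replaceAll does internally; that matters when filtering a large file line by line.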
