I am using the CoreNLP Neural Network Dependency Parser to parse some social media content. Unfortunately, the file contains characters which are, according to fileformat.info,
Suppose you have the content in a String, like this:
String xml = "....";
xml = xml.replaceAll("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]", "");
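This should solve your problem: replaceAll removes every character outside the listed ranges (tab, line feed, carriage return, U+0020 to U+D7FF and U+E000 to U+FFFD).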
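
For reference, here is a minimal, self-contained sketch of the same idea. The class name `XmlCharFilter` and the method name `stripInvalidChars` are made up for illustration and are not part of CoreNLP. The backslashes in the pattern are doubled on purpose: with a single backslash, the Java compiler translates `\u000a` and `\u000d` into real line breaks inside the string literal, and the code does not compile.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
final class XmlCharFilter {
    // Matches any code point outside tab, LF, CR, U+0020..U+D7FF and U+E000..U+FFFD.
    // The doubled backslashes hand the escapes to the regex engine instead of
    // letting the Java compiler expand them inside the string literal.
    private static final Pattern INVALID =
            Pattern.compile("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]");

    // Returns a copy of text with every character matched by INVALID removed.
    static String stripInvalidChars(String text) {
        return INVALID.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample input containing a NUL and a backspace control character.
        String raw = "ok\u0000 text\u0008\there";
        System.out.println(stripInvalidChars(raw));
    }
}
```

Note that this character class also drops everything above U+FFFF, so emoji and other supplementary characters in social media text are removed as well. If you want to keep them, adding `\\x{10000}-\\x{10FFFF}` inside the brackets should do it, but I have not tested how the parser handles those characters.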