The XML spec defines a subset of Unicode characters that are allowed in XML documents: http://www.w3.org/TR/REC-xml/#charsets.
How do I filter out these characters from
It's not trivial to work out every character that is invalid in XML. You need to either call XMLChar.isInvalid() from Xerces or reimplement it:
http://kickjava.com/src/org/apache/xerces/util/XMLChar.java.htm
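If you'd rather not depend on Xerces, here is a minimal sketch of the reimplementation route. It assumes XML 1.0 and that the data is already in a Java String. Each code point is checked against the Char production from the spec (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]), and anything outside those ranges is dropped. The class and method names (XmlCharFilter, isValidXmlChar, stripInvalidXmlChars) are hypothetical and are not part of Xerces.

```java
// Hypothetical helper, not a Xerces API: reimplements the XML 1.0 "Char" production.
final class XmlCharFilter {

    private XmlCharFilter() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns a copy of s with every code point that XML 1.0 disallows removed. */
    static String stripInvalidXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // codePoints() joins valid surrogate pairs into one supplementary code point;
        // an unpaired surrogate comes through as its own value (0xD800-0xDFFF),
        // which falls outside the allowed ranges and is therefore dropped.
        s.codePoints()
         .filter(XmlCharFilter::isValidXmlChar)
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "before\u0000\u0001after\uFFFE";
        System.out.println(stripInvalidXmlChars(dirty));
    }
}
```

Working on code points rather than individual chars keeps characters outside the BMP (e.g. emoji, which are stored as surrogate pairs) intact. Note that this only enforces the XML 1.0 Char production. XML 1.1 uses different rules, and characters the spec merely "discourages" are kept.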