I have an XML document that's being generated from content that people are copying and pasting from all sorts of places (mostly Word documents, though).
It looks like
Preprocess the original data yourself, encoding any Unicode characters that the XML document does not support. For example, use HTML character encodings:
When the data is read back in, you'll have to post-process it to convert the HTML encodings back to the correct Unicode characters.
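The round trip described above could look roughly like the Python sketch below. It assumes that "not supported" means anything outside ASCII and that decimal numeric references (such as `&#8217;`) are an acceptable encoding. The function names `encode_non_ascii` and `decode_numeric_refs` are made up for illustration, and the sample string only stands in for pasted Word content.

```python
import re

def encode_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a decimal reference like &#8217;."""
    # "xmlcharrefreplace" is a standard Python codec error handler.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Matches decimal references only; named entities such as &amp; are left alone.
_NUMERIC_REF = re.compile(r"&#(\d+);")

def decode_numeric_refs(text: str) -> str:
    """Turn decimal references back into the characters they stand for."""
    return _NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)

# Hypothetical sample: curly quotes and an en dash, as Word tends to produce.
pasted = "It\u2019s a \u201csmart\u201d quote \u2013 from Word"

encoded = encode_non_ascii(pasted)       # pre-process before writing the XML
restored = decode_numeric_refs(encoded)  # post-process after reading it back

print(encoded)
print(restored == pasted)
```

One caveat with this approach: if the original text already contains a literal sequence such as `&#233;`, the decoding step will convert that too, so the scheme is only safe when such sequences cannot appear in the input.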