I have a sample file (below) that parses fine with C# code, but Xerces-C fails to parse it and reports invalid characters. It parses correctly only after I remove the ② character.
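One way to narrow this down is to check the raw bytes of the file independently of either parser. The sketch below assumes the file is meant to be UTF-8, which the question does not confirm. In that case ② (U+2461) should appear as the three bytes `E2 91 A1`. The helper name `firstInvalidUtf8` is hypothetical and not part of Xerces-C. It only reports where the bytes stop being well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C): returns the offset of the first
// sequence that is not well-formed UTF-8, or std::string::npos if the whole
// buffer is well-formed. Assumes the data is supposed to be UTF-8.
std::size_t firstInvalidUtf8(const std::string& data) {
    // Smallest code point allowed for each sequence length (rejects overlong forms).
    static const unsigned long minCp[] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(data[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return i;                 // stray continuation byte or invalid lead byte
        if (i + len > n) return i;     // sequence truncated at end of buffer
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(data[i + k]);
            if ((b & 0xC0) != 0x80) return i;  // expected a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates, and values above U+10FFFF.
        if (cp < minCp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return i;
        i += len;
    }
    return std::string::npos;
}
```

To use it, read the file in binary mode (for example with `std::ifstream` opened with `std::ios::binary`) into a `std::string` and pass that in. If an offset is returned at the position of ②, the file's bytes may not match the encoding the parser is using. If the file declares a different encoding in its XML declaration, this check does not apply as written.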