I have a Windows desktop app written in C# that loops through a bunch of XML files stored on disk, created by a third-party program. Most of the files load and process fine, but a few throw an exception about an invalid character when loaded with XDocument.Load.
The referenced file contains a character that is valid for a filename, but invalid in an XML attribute. You have a few options.
Because XmlDocument loads the entire document in one go, it aborts the whole process as soon as it runs into an unencoded character. If you want to process what you can and skip/log the duff bits, look at XmlTextReader. An XmlTextReader created over a FileStream reads one node at a time, so it also uses a lot less memory. You could even get clever and split the work up and parallelise the processing.
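As a rough sketch (the file name and the element handling are placeholders), reading node by node and logging where a bad file falls over might look something like this:

using System;
using System.IO;
using System.Xml;

string inFileName = "input.xml"; // placeholder

using (FileStream fs = File.OpenRead(inFileName))
using (XmlTextReader reader = new XmlTextReader(fs))
{
    try
    {
        // Read() advances one node at a time, so the whole document
        // is never held in memory at once.
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                Console.WriteLine("Element: {0}", reader.Name);
            }
        }
    }
    catch (XmlException ex)
    {
        // Log the offending position and move on to the next file;
        // everything read before the bad character has already been processed.
        Console.WriteLine("Skipping {0}: line {1}, position {2}: {3}",
            inFileName, ex.LineNumber, ex.LinePosition, ex.Message);
    }
}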
When I've hit this, it's been things like accented characters in there: graves, acutes, umlauts, and such.
I don't have any automated processes, so usually I just load the file in Visual Studio and edit the bad characters out until there are no squigglies left. The theory is sound, though.
In order to control the encoding (once you know what it is), you can load the files using the Load method override that accepts a TextReader. Then you can create a new StreamReader against your file, specifying the appropriate Encoding in its constructor.
For example, to open the file using Western European encoding, replace the following line of code in the question:
XDocument xmlDoc = XDocument.Load(inFileName);
with this code:
XDocument xmlDoc = null;
// Read the file as ISO-8859-1 (Latin-1) so accented characters are
// decoded correctly before the XML parser sees them.
using (StreamReader oReader = new StreamReader(inFileName, Encoding.GetEncoding("ISO-8859-1"))) {
    xmlDoc = XDocument.Load(oReader);
}
The list of supported encodings can be found in the MSDN documentation.
Not sure if this is your case, but this can be related to invalid byte sequences for a given encoding. Example: http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences.
Try filtering invalid sequences from the file while loading.
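One way to do that (a minimal sketch, assuming the files are meant to be UTF-8; inFileName is the variable from the question) is to decode with a replacement fallback, so undecodable byte sequences become '?' instead of blowing up the parse:

using System.IO;
using System.Text;
using System.Xml.Linq;

// Substitute '?' for invalid byte sequences instead of throwing.
Encoding lenientUtf8 = Encoding.GetEncoding(
    "utf-8",
    new EncoderReplacementFallback("?"),
    new DecoderReplacementFallback("?"));

XDocument xmlDoc;
using (StreamReader reader = new StreamReader(inFileName, lenientUtf8))
{
    xmlDoc = XDocument.Load(reader);
}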