invalid byte 2 of 2-byte UTF-8 sequence

萌比男神i

I am trying to parse an XML file, but I ran into the error message "invalid byte 2 of 2-byte UTF-8 sequence".

7 Answers

时光说笑

2020-12-14 16:47

Either the parser is set to UTF-8 even though the file is encoded differently, or the file is declared as UTF-8 but is not actually encoded that way.
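One way to tell the two cases apart is to check whether the file's bytes are well-formed UTF-8 at all. The following is a minimal sketch, not a definitive tool; the class and method names (`Utf8Check`, `isValidUtf8`) are made up for illustration, and the file path is taken from the command line.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a byte array is well-formed UTF-8.
final class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage (assumed): java Utf8Check.java path/to/file.xml
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```

If the check fails, the file is most likely in some other encoding, and either the file or the parser configuration needs to change.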

旧巷少年郎

2020-12-14 16:47

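Switching the encoding passed for the input might help in this case:

```
// The second argument names the encoding used to read the input stream.
// Try the encoding the file is actually written in.
XMLEventReader eventReader =
        inputFactory.createXMLEventReader(in,
                "utf-8"
                //"windows-1251"
        );
```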
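For context, here is a self-contained sketch of the same call. The class name `StaxEncodingDemo` and the helper `readText` are hypothetical, and the example assumes the document carries no XML declaration that contradicts the encoding passed in.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class: reads an XML stream with an explicitly chosen encoding.
final class StaxEncodingDemo {

    // Concatenates all character data found in the document.
    static String readText(InputStream in, String encoding) throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        XMLEventReader eventReader = inputFactory.createXMLEventReader(in, encoding);
        StringBuilder text = new StringBuilder();
        while (eventReader.hasNext()) {
            XMLEvent event = eventReader.nextEvent();
            if (event.isCharacters()) {
                text.append(event.asCharacters().getData());
            }
        }
        eventReader.close();
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage (assumed): java StaxEncodingDemo.java file.xml windows-1251
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(readText(in, args[1]));
        }
    }
}
```

Passing the encoding explicitly only helps if it matches the bytes in the file, so it is worth confirming the real encoding first.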

一个人的身影

2020-12-14 16:49

Most commonly this happens when ISO-8859-x input (Latin-x, such as Latin-1) is fed to a parser that assumes it is getting UTF-8. Certain sequences of Latin-1 characters (for example, two consecutive characters with accents or umlauts) form byte sequences that are invalid as UTF-8: judging by the first byte, the second byte has unexpected high-order bits.

This can easily occur when some process writes XML as Latin-1 but either omits the XML declaration (in which case the XML parser must default to UTF-8, per the XML specification) or claims the content is UTF-8 when it isn't.
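To make the bit-level point concrete: in UTF-8, a byte of the form `110xxxxx` starts a 2-byte sequence and must be followed by a byte of the form `10xxxxxx`. In Latin-1, `Å` is the single byte `0xC5` (`11000101`), so a following plain letter such as `k` (`0x6B`) breaks the rule. The sketch below only looks for that one 2-byte case and is not a full UTF-8 validator; all names in it are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: finds the first broken 2-byte UTF-8 sequence in a byte array.
final class Utf8Bits {

    // Lead byte of a 2-byte sequence: bit pattern 110xxxxx.
    static boolean isTwoByteLead(byte b) {
        return (b & 0xE0) == 0xC0;
    }

    // Continuation byte: bit pattern 10xxxxxx.
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    // Returns the index of the first 2-byte lead that is not followed by a
    // continuation byte, or -1 if there is none. Longer sequences are ignored.
    static int firstBrokenTwoByteSequence(byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) {
            if (isTwoByteLead(bytes[i])) {
                if (i + 1 >= bytes.length || !isContinuation(bytes[i + 1])) {
                    return i;
                }
                i++; // skip the continuation byte that was just checked
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] latin1 = "\u00C5ke".getBytes(StandardCharsets.ISO_8859_1); // C5 6B 65
        byte[] utf8 = "\u00C5ke".getBytes(StandardCharsets.UTF_8);        // C3 85 6B 65
        System.out.println(firstBrokenTwoByteSequence(latin1));
        System.out.println(firstBrokenTwoByteSequence(utf8));
    }
}
```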

轻奢々

2020-12-14 17:02

I had the same problem when trying to import my .xml file into my Java tool, and I found a solution that worked:

1. Open the .xml file with Notepad++ and save it as an .rtf file. Then open that file in WordPad.
2. Save the .rtf file as a .txt file, open it with Notepad, and save it as an .xml file again. When saving in Notepad, make sure to choose the option "Encoding: UTF-8" near the bottom of the save dialog.

It worked for my file; I hope it is useful for yours too.
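The same re-encoding can be done without a round trip through editors. Below is a rough sketch that reads a file in a known source encoding and writes it back out as UTF-8; the class name `ReencodeToUtf8` and its arguments are hypothetical, and it assumes you already know the file's real encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: converts a text file from a known encoding to UTF-8.
final class ReencodeToUtf8 {

    static void reencode(Path source, Charset sourceCharset, Path target) throws IOException {
        // Decode the bytes using the encoding the file is really in ...
        String text = new String(Files.readAllBytes(source), sourceCharset);
        // ... then encode the same characters as UTF-8.
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed): java ReencodeToUtf8.java in.xml ISO-8859-1 out.xml
        reencode(Paths.get(args[0]), Charset.forName(args[1]), Paths.get(args[2]));
    }
}
```

If the file has an XML declaration naming the old encoding, that declaration still needs to be updated by hand, since this sketch copies the text unchanged.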

佛祖请我去吃肉

2020-12-14 17:05
For those who still get this error: since the parser treats the input as UTF-8, check your XML document for accented or other non-ASCII Latin letters. I had the same problem, and the cause was this line:
```
<n:name>Åke Jógvan Øyvind</n:name>
```
Hope this helps
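A small experiment along these lines shows how the declaration and the actual bytes interact. It is only a sketch: `DeclaredEncodingDemo` and `parseRootText` are made-up names, the element is simplified to `<name>` without the namespace prefix, and the behaviour described is that of the JDK's built-in parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo: parses the same name from bytes with and without a declaration.
final class DeclaredEncodingDemo {

    // Parses the bytes as XML and returns the text of the root element.
    static String parseRootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String body = "<name>\u00C5ke J\u00F3gvan \u00D8yvind</name>";

        // Latin-1 bytes with a matching declaration: should parse.
        String declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body;
        System.out.println(parseRootText(declared.getBytes(StandardCharsets.ISO_8859_1)));

        // Latin-1 bytes with no declaration: the parser assumes UTF-8 and should fail.
        try {
            parseRootText(body.getBytes(StandardCharsets.ISO_8859_1));
        } catch (Exception e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

So the letters themselves are not the problem; the mismatch between how they were written to disk and what the parser assumes is.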
太阳男子

2020-12-14 17:10

You could try changing the default character encoding used by String.getBytes() to UTF-8 by passing the VM option -Dfile.encoding=utf-8.
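An alternative that does not depend on how the JVM is launched is to name the charset explicitly wherever strings are turned into bytes. The sketch below is illustrative only; `ExplicitCharsetDemo` and `toUtf8` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: contrasts default-charset encoding with explicit UTF-8.
final class ExplicitCharsetDemo {

    // Always encodes as UTF-8, regardless of the JVM's default charset.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u00C5ke";
        // getBytes() with no argument uses the JVM default, which can differ
        // between machines and launch options.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default-encoded length: " + s.getBytes().length);
        System.out.println("UTF-8-encoded length: " + toUtf8(s).length);
    }
}
```

On JDK 18 and later the default charset is UTF-8 unless overridden, so the VM option mainly matters on older runtimes.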
