org.xml.sax.SAXParseException: The reference to entity “T” must end with the ';' delimiter


I am trying to parse an XML file which contains some special characters like "&" using a DOM parser. I am getting the SAXParseException "the reference to entity must e…


9 answers

别跟我提以往

2020-12-29 07:59
I'm not sure I understand the question. As far as I'm aware, unless you're inside a CDATA section, a bare & that isn't part of a reference closed by ; is invalid.

If the & in your XML file isn't inside a CDATA section, the file is invalid, and you'll need to either find another way of parsing it or fix it before SAX gets hold of it.

If I'm misunderstanding something here, you should probably post a sample of the actual XML so we can help further.

Update:

It looks like:
```
Figure ActualText="&T "
```
is the offending line. Is this section within a CDATA or not? If not, this is not valid XML and you should not expect SAX to be able to handle it.

You'll need to either:
- change the application that created it; or
- fix it before it's loaded by SAX (if you can't change that application) to something like "Figure ActualText="&amp;T ""; or
- find a non-SAX method for parsing.
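
To make the difference between the raw and the escaped attribute concrete, here is a minimal Java sketch using the JDK's built-in DOM parser. The class name `AmpersandDemo`, the helper `actualText`, and the self-closing `<Figure .../>` element wrapped around the attribute are assumptions made for illustration; the question does not show the real file's structure.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class AmpersandDemo {

    /**
     * Parses the given XML with the default DOM parser and returns the root
     * element's ActualText attribute, or the parser's message if the input
     * is not well-formed. (Hypothetical helper, for illustration only.)
     */
    static String actualText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("ActualText");
        } catch (SAXParseException e) {
            // Well-formedness errors such as a bare & end up here.
            return "parse error: " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems; not expected in this sketch.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Bare ampersand: the parser reads "&T" as the start of an entity reference.
        System.out.println(actualText("<Figure ActualText=\"&T \"/>"));
        // Escaped ampersand: parses, and the attribute value contains a literal &.
        System.out.println(actualText("<Figure ActualText=\"&amp;T \"/>"));
    }
}
```

With the JDK's default parser, the first call should report a parse error and the second should return the attribute value with the ampersand decoded. The exact wording of the error message can vary between parser implementations.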

半阙折子戏

2020-12-29 07:59
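It will work if you run the command below before publishing.

Replace *.xml with the name of your XML file. Note that this escapes every &, including the ones in references that are already valid (for example, &lt; becomes &amp;lt;), so only use it on files that contain no valid references.
```
sed -i 's/&/\&amp;/g' *.xml
```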

暖寄归人

2020-12-29 08:01
As others have stated, your XML is definitely invalid. However, if you can't change the generating application and can add a cleaning step then the following should clean up the XML:
```
String clean = xml.replaceAll( "&([^;]+(?!(?:\\w|;)))", "&amp;$1" );
```
What that regex is doing is looking for any badly formed entity references and escaping the ampersand.

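Specifically, (?!(?:\\w|;)) is a negative look-ahead that only lets the match end at a position where the next character is neither a word character (a-z, A-Z, 0-9, _) nor a semi-colon. So the regex grabs the & plus a run of characters that are not ;, and that run may not stop just before a word character or a semi-colon. As a result, a reference such as &amp; or &#38; does not match, while a bare & does.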

It puts everything except the ampersand in the first capture group so that it can be referred to in the replace string. That's the $1.

Note that this won't fix references that look like they are valid but aren't. For example, if you had &T; that would throw a different kind of error altogether unless the XML actually defines the entity.
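
As a rough illustration of that cleaning step, the sketch below applies the regex from this answer, unchanged, to two sample strings. The class name `EntityCleaner`, the `clean` helper, and the sample inputs are hypothetical and only there to show the behaviour.

```java
public class EntityCleaner {

    /**
     * Escapes ampersands that do not start a terminated reference, using the
     * regex from the answer as-is. (Hypothetical helper, for illustration only.)
     */
    static String clean(String xml) {
        return xml.replaceAll("&([^;]+(?!(?:\\w|;)))", "&amp;$1");
    }

    public static void main(String[] args) {
        // The bare ampersand from the question is escaped.
        System.out.println(clean("<Figure ActualText=\"&T \"/>"));
        // -> <Figure ActualText="&amp;T "/>

        // Existing references are left alone; only the bare & is escaped.
        System.out.println(clean("<a>&lt;b&gt; &T</a>"));
        // -> <a>&lt;b&gt; &amp;T</a>
    }
}
```

The regex is a heuristic, so it is worth running it against real samples of your data before relying on it. For instance, an input such as `a & b; c` comes back unchanged, because the look-ahead rejects every candidate match between the & and the following semi-colon.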
