How to make SAXParser ignore escape codes

没有蜡笔的小新 2021-01-22 22:06

I am writing a Java program to read an XML file, actually an iTunes library, which is in XML plist format. I have managed to get round most obstacles that this format throws up exc

4 Answers
  •  隐瞒了意图╮
    2021-01-22 22:18

    Do you have an excerpt for us? Is the file generated by iTunes? If so, it sounds like a bug in iTunes to me: it forgot to encode the ampersand correctly. I would not be surprised; they clearly didn't get XML in the first place, and their schema of [key][value] must make the XML inventors puke.

    You might want to use a different, more robust parser. SAX is great as long as the file is well-formed. I do not know, however, how robust dom4j and JDOM are; just give them a try. For Python, I know that I would recommend ElementTree or BeautifulSoup, which are very robust.
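
    To illustrate why a single unescaped ampersand stops SAX, here is a minimal sketch using only the JDK's built-in SAXParserFactory. The class name WellFormedCheck and its method are hypothetical, and the sample strings are made up rather than taken from a real iTunes library. The escaped &amp; parses normally, while the bare & is a well-formedness error that the parser reports as a SAXParseException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether SAX accepts a string as well-formed XML.
public class WellFormedCheck {

    // Returns true if the whole string parses, false if SAX reports a
    // well-formedness error (for example a bare '&' that starts no entity).
    public static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content; its fatalError() rethrows the exception.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            System.out.println("Line " + e.getLineNumber() + ": " + e.getMessage());
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // An escaped ampersand is fine...
        System.out.println(isWellFormed("<string>Simon &amp; Garfunkel</string>"));
        // ...but a bare one makes the document not well-formed.
        System.out.println(isWellFormed("<string>Simon & Garfunkel</string>"));
    }
}
```

    The first call prints true; the second prints the parser's error message followed by false. SAX has no "ignore this" switch for such errors, so the input has to be fixed (or a more lenient parser used) before SAX can read it.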

    Also have a look at http://code.google.com/p/xmlwise/ which I found mentioned here on Stack Overflow (did you use the search?).

    Update (as per the updated question): You need to understand the role of entities in XML, and thus in SAX. By default they are separate nodes, just like text nodes; in SAX terms, the text of one element may arrive in several characters() calls, split around each entity. So you will likely need to join them with the adjacent text to get the full value. Do you use a DTD in your parser? Using a proper DTD, with entity definitions, can help parsing a lot, as it can contain mappings from entities such as &amp; to the characters they represent (&), and the parser may be able to do the merging for you. (At least the Python XML pull parser I like to use for large files does so when materializing subtrees.)
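
    The joining step can be sketched as a SAX handler that appends every characters() chunk to a buffer and only reads the buffer in endElement(). This is a minimal sketch, assuming plist-style <string> elements with no mixed content; the class PlistStringCollector and its collectStrings method are hypothetical names, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <string> element.
public class PlistStringCollector extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private final List<String> strings = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Start a fresh buffer for every element.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text value (e.g. before and
        // after an entity such as &amp;), so append instead of assigning.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Only here is the complete value available, with &amp; already turned into &.
        // qName is used because the default factory is not namespace-aware.
        if (qName.equals("string")) {
            strings.add(text.toString());
        }
        text.setLength(0);
    }

    public static List<String> collectStrings(String xml) {
        PlistStringCollector handler = new PlistStringCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.strings;
    }

    public static void main(String[] args) {
        String xml = "<dict><key>Artist</key><string>Simon &amp; Garfunkel</string></dict>";
        System.out.println(collectStrings(xml)); // [Simon & Garfunkel]
    }
}
```

    Resetting the buffer in both startElement() and endElement() keeps whitespace between elements out of the collected values; a document with mixed content would need a stack of buffers instead.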
