I want to parse some data from an XML file using a SAX parser. My XML is as follows:
<categories>
<cat>Pies &amp; past</cat>
<cat>Fruits</cat>
</categories>
In order to parse this data I extend DefaultHandler.
The output after parsing is:
cat 1 = Pies
cat 2 = &
cat 3 = past
cat 4 = Fruits
Why does this happen? I expected to get:
cat 1 = Pies & past
cat 2 = Fruits
My guess is that you are treating each call to characters() as delivering the complete text of a cat element. Code your handler so that successive calls to characters() accumulate the text, and capture it only on the endElement event:
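    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    public class CatHandler extends DefaultHandler {
        // Buffer for the text of the <cat> element currently being read
        private final StringBuilder chars = new StringBuilder();

        @Override
        public void startElement(String uri, String lName, String qName, Attributes a) {
            // SAX passes an empty string (not null) when qualified names are unavailable
            final String name = (qName == null || qName.isEmpty()) ? lName : qName;
            if ("cat".equals(name)) {
                chars.setLength(0); // start a fresh buffer for this element
            } // else . . .
        }

        @Override
        public void endElement(String uri, String lName, String qName) {
            final String name = (qName == null || qName.isEmpty()) ? lName : qName;
            if ("cat".equals(name)) {
                String catName = chars.toString();
                // do something with cat name
            } // else . . .
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element, so append rather than assign
            chars.append(ch, start, length);
        }
    }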
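To try the accumulate-then-capture approach end to end, here is a minimal, self-contained sketch. The names CatDemo, CollectingHandler and parseCats are made up for illustration, and the XML is inlined as a string (with the ampersand escaped as &amp;) rather than read from a file. It relies on the default SAXParserFactory, which is not namespace-aware, so qName carries the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class CatDemo {
    /** Hypothetical handler: buffers text per <cat> and stores it on endElement. */
    static class CollectingHandler extends DefaultHandler {
        final List<String> cats = new ArrayList<>();
        private final StringBuilder chars = new StringBuilder();

        @Override
        public void startElement(String uri, String lName, String qName, Attributes a) {
            if ("cat".equals(qName)) {
                chars.setLength(0); // discard anything buffered before this element
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            chars.append(ch, start, length); // accumulate every chunk
        }

        @Override
        public void endElement(String uri, String lName, String qName) {
            if ("cat".equals(qName)) {
                cats.add(chars.toString()); // the text is complete only here
            }
        }
    }

    /** Parses the given XML string and returns the text of each <cat> element. */
    static List<String> parseCats(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        CollectingHandler handler = new CollectingHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.cats;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<categories><cat>Pies &amp; past</cat><cat>Fruits</cat></categories>";
        List<String> cats = parseCats(xml);
        for (int i = 0; i < cats.size(); i++) {
            System.out.println("cat " + (i + 1) + " = " + cats.get(i));
        }
        // prints:
        // cat 1 = Pies & past
        // cat 2 = Fruits
    }
}
```

Because the list is filled only in endElement, the result is the same however the parser splits the text.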
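The characters() method is not required to deliver an element's complete text in a single call. Parsers commonly start a new chunk at an entity reference, which would explain why the text breaks around the &. Instead, collect the text from each characters() call and use the concatenated result in the corresponding endElement() call.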
From the doc:
The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks
(my emphasis)
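To see the chunking for yourself, a small diagnostic handler can record each characters() call separately. This is only a sketch: ChunkDemo, ChunkLogger and chunksOf are hypothetical names, and the number of chunks you see depends on the parser implementation, so no particular count is guaranteed. What is guaranteed is that the chunks, joined in order, give the full text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ChunkDemo {
    /** Hypothetical handler: stores every characters() call as its own string. */
    static class ChunkLogger extends DefaultHandler {
        final List<String> chunks = new ArrayList<>();

        @Override
        public void characters(char[] ch, int start, int length) {
            chunks.add(new String(ch, start, length));
        }
    }

    /** Returns the character chunks in the order the parser reported them. */
    static List<String> chunksOf(String xml) throws Exception {
        ChunkLogger logger = new ChunkLogger();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), logger);
        return logger.chunks;
    }

    public static void main(String[] args) throws Exception {
        List<String> chunks = chunksOf("<cat>Pies &amp; past</cat>");
        // The chunk count is parser-dependent; the joined text is not.
        for (String chunk : chunks) {
            System.out.println("chunk: [" + chunk + "]");
        }
        System.out.println("joined: [" + String.join("", chunks) + "]");
    }
}
```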
Source: https://stackoverflow.com/questions/13336140/sax-parsing-and-special-characters