Question
While reading an XML file using StAX and XMLStreamReader, I encountered a weird problem. I'm not sure if it's a bug or if I am doing something wrong. I'm still learning StAX.
The problem is:
- In the XMLStreamConstants.CHARACTERS event, I collect the node text with the XMLStreamReader.getText() method.
- If the node text contains &, <, > or even something hidden, it returns only the first part of the text string.
For example, ABC & XYZ returns only ABC.
Simplified Java Source:
// Start StAX reader
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
try {
    XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(inStream);
    int event = xmlStreamReader.getEventType();
    while (true) {
        switch (event) {
            case XMLStreamConstants.START_ELEMENT:
                switch (xmlStreamReader.getLocalName()) {
                    case "group":
                        // Do something
                        break;
                    case "source":
                        isSource = true;
                        break;
                    case "target":
                        // Same flag that the CHARACTERS case checks below
                        isTrans = true;
                        break;
                    default:
                        isSource = false;
                        isTrans = false;
                        break;
                }
                break;
            case XMLStreamConstants.CHARACTERS:
                if (srcData != null) {
                    // Returns only the text of the current CHARACTERS event
                    String srcTrns = xmlStreamReader.getText();
                    if (srcTrns != null) {
                        if (isSource) {
                            // Set source text
                            isSource = false;
                        } else if (isTrans) {
                            // Set target text
                            isTrans = false;
                        }
                    }
                }
                break;
            case XMLStreamConstants.END_ELEMENT:
                if (xmlStreamReader.getLocalName().equals("group")) {
                    // Add to return list
                }
                break;
        }
        if (!xmlStreamReader.hasNext()) {
            break;
        }
        event = xmlStreamReader.next();
    }
} catch (XMLStreamException ex) {
    LOG.log(Level.WARNING, ex.getMessage(), MessageFormat.format("{0} {1}", ex.getCause(), ex.getLocation()));
}
I am not quite sure what exactly I am doing wrong or how to collect the complete text of the node.
Any suggestions or tips would be a great help as I keep learning StAX. :-)
Answer 1:
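I solved the problem after some struggling and research.
The problem was in reading text that contains escaped entity references. You need to set the IS_COALESCING property to true on your XMLInputFactory instance, before creating the reader:
xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, true);
Without coalescing, the parser may report the text of a single element as several consecutive CHARACTERS events, typically split at each entity reference, so one getText() call returns only the first piece. With IS_COALESCING enabled, the parser merges adjacent character data into a single CHARACTERS event, and getText() returns the complete text.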
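To see the effect of the property, here is a minimal sketch that collects the text of every CHARACTERS event with and without coalescing. The class name CoalescingDemo, the helper readTextChunks, and the sample XML are made up for illustration and are not part of the original code. How the text is split when coalescing is off depends on the parser implementation, so the sketch does not assume a particular split.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not taken from the question's code.
class CoalescingDemo {

    // Returns the text of every CHARACTERS event, in document order.
    static List<String> readTextChunks(String xml, boolean coalescing) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The property must be set on the factory before the reader is created.
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> chunks = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    chunks.add(reader.getText());
                }
            }
        } finally {
            reader.close();
        }
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Assumed sample input: the ampersand is escaped, as XML requires.
        String xml = "<source>ABC &amp; XYZ</source>";
        // Without coalescing the parser is free to deliver the text in several chunks.
        System.out.println("not coalescing: " + readTextChunks(xml, false));
        // With coalescing the whole element text arrives as one chunk.
        System.out.println("coalescing:     " + readTextChunks(xml, true));
    }
}
```

If you cannot change the factory settings, another option is to append each getText() result to a StringBuilder and only use the accumulated text when the matching END_ELEMENT arrives.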
Source: https://stackoverflow.com/questions/22781292/problems-getting-xml-node-text-in-stax-xmlstreamconstants-characters-event