Question
Right now I'm using SAXParser with my own handler. It can parse all node values except for the one that has type="html".
My characters function looks like this:
public void characters(char ch[], int start, int length) throws SAXException {
    if(content){
        String tmp = new String(ch, start, length);
        System.out.println("Content : " + tmp);
        content = false;
    }
}
That particular node has the following format, and for it my output is always just a bunch of \n characters and nothing else.
<content type="html">
<img alt="" src="http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png" />
<p>Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (<i>Lost</i>, <i>Fringe</i>, <i>Star Trek: Into Darkness</i>, <i>Alias</i>,&nbsp;etc.), has released a&nbsp;<a href="http://youtu.be/FWaAZCaQXdo" target="_blank">mysterious new trailer</a> titled "Stranger." The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. "Men are erased and reborn," intones a narrator that sounds a little like Leonard Nimoy.</p>
<p></p>
</content>
Answer 1:
You might be wrongly assuming that the characters callback occurs only once between the startElement and endElement callbacks. It can actually be called multiple times for the text of a single element.
Since you use the content boolean member to decide whether to print anything, and you also set that same member to false inside the characters callback, your condition is fulfilled only once, for the first chunk of text, until you reset content (it is not clear where you do that).
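To see the chunking for yourself, here is a minimal sketch that records every chunk passed to characters. The class name CharactersChunkDemo, the method collectChunks and the sample XML are made up for illustration. How many chunks you get depends on the parser implementation, so the sketch prints the count rather than assuming one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original question or answer.
public class CharactersChunkDemo {

    /** Parses the XML and returns every chunk handed to characters(), in order. */
    public static List<String> collectChunks(String xml) {
        List<String> chunks = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                // Each call delivers only a piece of the element's text.
                chunks.add(new String(ch, start, length));
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Made-up sample: escaped HTML inside a single element.
        List<String> chunks = collectChunks(
                "<content type=\"html\">\n&lt;p&gt;Hi&lt;/p&gt;\n</content>");
        System.out.println("characters() calls: " + chunks.size());
        System.out.println("joined text: " + String.join("", chunks));
    }
}
```

If the first chunk happens to be just the newline after the opening tag, a handler that stops after one call prints only that newline, which matches the symptom in the question.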
Here's an example that works with your XML just fine (assumes non-mixed content and Java programming language):
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class TestSaxParser {

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        // The HTML payload is entity-escaped here (assumed, as is usual for
        // type="html" content in feeds). A bare &nbsp; is not a declared XML
        // entity, so unescaped markup would make the parser fail.
        String xml =
            "<content type=\"html\">\n" +
            "\n" +
            "  &lt;img alt=\"\" src=\"http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png\" /&gt;\n" +
            "\n" +
            "\n" +
            "  &lt;p&gt;Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (&lt;i&gt;Lost&lt;/i&gt;, &lt;i&gt;Fringe&lt;/i&gt;, &lt;i&gt;Star Trek: Into Darkness&lt;/i&gt;, &lt;i&gt;Alias&lt;/i&gt;,&amp;nbsp;etc.), has released a&amp;nbsp;&lt;a href=\"http://youtu.be/FWaAZCaQXdo\" target=\"_blank\"&gt;mysterious new trailer&lt;/a&gt; titled \"Stranger.\" The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. \"Men are erased and reborn,\" intones a narrator that sounds a little like Leonard Nimoy.&lt;/p&gt;\n" +
            "  &lt;p&gt;&lt;/p&gt;\n" +
            "\n" +
            "\n" +
            "\n" +
            "  </content>";

        MySaxHandler handler = new MySaxHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        InputSource source = new InputSource(new StringReader(xml));
        parser.parse(source, handler);
    }

    private static class MySaxHandler extends DefaultHandler {

        // Accumulates the text of the current element across characters() calls.
        private StringBuilder content = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            // A new element starts: discard any previously collected text.
            content.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            // May be called several times per element, so append instead of printing.
            content.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            // The element is complete: the buffer now holds its full text.
            System.out.println(content.toString());
        }
    }
}
Output:
<img alt="" src="http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png" />
<p>Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (<i>Lost</i>, <i>Fringe</i>, <i>Star Trek: Into Darkness</i>, <i>Alias</i>,&nbsp;etc.), has released a&nbsp;<a href="http://youtu.be/FWaAZCaQXdo" target="_blank">mysterious new trailer</a> titled "Stranger." The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. "Men are erased and reborn," intones a narrator that sounds a little like Leonard Nimoy.</p>
<p></p>
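In a real feed the content element sits next to other elements, so resetting the buffer on every startElement may not be what you want. Below is a rough sketch of one way to scope the accumulation to content only, with the flag set in startElement and cleared in endElement rather than inside characters. The class name ContentCollector, the method extract, the flag inContent and the sample XML are all hypothetical, and the sketch assumes a parser that is not namespace-aware, so that qName is the plain element name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <content> element.
public class ContentCollector extends DefaultHandler {

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> contents = new ArrayList<>();
    private boolean inContent = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("content".equals(qName)) {
            inContent = true;      // start collecting
            buffer.setLength(0);   // drop text left over from an earlier element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Do not clear the flag here: more chunks may follow for the same element.
        if (inContent) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("content".equals(qName)) {
            inContent = false;     // stop collecting
            contents.add(buffer.toString());
        }
    }

    /** Parses the XML and returns the text of each <content> element. */
    public static List<String> extract(String xml) {
        ContentCollector handler = new ContentCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return handler.contents;
    }

    public static void main(String[] args) {
        // Made-up sample entry with escaped HTML in <content>.
        String xml = "<entry><title>Ignored</title><content type=\"html\">"
                + "&lt;p&gt;Hello&amp;nbsp;world&lt;/p&gt;</content></entry>";
        for (String html : extract(xml)) {
            System.out.println(html); // prints <p>Hello&nbsp;world</p>
        }
    }
}
```

The key difference from the handler in the question is where the flag is cleared: in endElement, once all chunks have arrived, rather than after the first characters call.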
Answer 2:
You should use a StringBuffer to accumulate the content, as described in these topics:
SAX parsing and special characters
Unable to read special characters from xml using java
Source: https://stackoverflow.com/questions/18323497/java-parse-xml-file-when-node-inner-text-is-html