Question
Hey everyone, I am trying to parse this part of an XML file. The problem is that the text contains a lot of self-closing tags. I can't remove those tags because they provide indexing details that I need. How can I access the text without all the "Node" tags?
<TextWithNodes>
<Node id="0"/>A TEENAGER <Node
id="11"/>yesterday<Node id="20"/> accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2<Node id="146"/>.<Node
id="147"/>
</TextWithNodes>
Answer 1:
Here is some sample Java code that applies the XPath idea from the answer at https://stackoverflow.com/a/49926918/2735286 (credits to @kjhughes):
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class TextWithNodesDemo {
    public static void main(String[] args) throws Exception {
        String text = "<TextWithNodes>\n" +
                " <Node id=\"0\"/>A TEENAGER <Node\n" +
                "id=\"11\"/>yesterday<Node id=\"20\"/> accused his parents of cruelty\n" +
                "by feeding him a daily diet of chips which sent his weight\n" +
                "ballooning to 22st at the age of l2<Node id=\"146\"/>.<Node\n" +
                "id=\"147\"/>\n" +
                "</TextWithNodes>";
        // Parse the string into a DOM document.
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        Document xmlDocument = builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Evaluating as STRING returns the element's string-value: its text without the Node tags.
        String expression = "//TextWithNodes";
        System.out.println(xPath.compile(expression).evaluate(xmlDocument, XPathConstants.STRING));
    }
}
This prints out:
A TEENAGER yesterday accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2.
Answer 2:
Although odd, this XML is actually well-formed and can be parsed with normal XML tools. The TextWithNodes element simply has mixed content.
The string-value of the TextWithNodes element can be obtained via a simple XPath,
string(/TextWithNodes)
yielding the text you want, without the other markup (self-closing or otherwise):
A TEENAGER yesterday accused his parents of cruelty
by feeding him a daily diet of chips which sent his weight
ballooning to 22st at the age of l2.
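Since the question says the Node tags carry indexing detail, it may help to keep track of where each one sits in the extracted text. The following is only a minimal sketch of that idea, not part of the original answers: it walks the mixed content of the root element with the standard DOM API, concatenates the text nodes, and records the text offset at which each Node element appears. The class name MixedContentWalk, the Result holder, and the walk method are hypothetical names chosen for illustration, and the input is a shortened single-line version of the sample.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper for illustration; not taken from the original answers.
class MixedContentWalk {

    /** Concatenated text plus, for each Node id, its offset into that text. */
    static final class Result {
        final String text;
        final Map<String, Integer> offsets;

        Result(String text, Map<String, Integer> offsets) {
            this.text = text;
            this.offsets = offsets;
        }
    }

    static Result walk(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            StringBuilder text = new StringBuilder();
            Map<String, Integer> offsets = new LinkedHashMap<>();
            // Iterate over the direct children of the root: text nodes and <Node/> elements.
            for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
                short type = child.getNodeType();
                if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                    text.append(child.getNodeValue());
                } else if (type == Node.ELEMENT_NODE && "Node".equals(child.getNodeName())) {
                    // Remember how much text preceded this marker.
                    offsets.put(((Element) child).getAttribute("id"), text.length());
                }
            }
            return new Result(text.toString(), offsets);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, single-line variant of the sample from the question.
        String xml = "<TextWithNodes><Node id=\"0\"/>A TEENAGER <Node id=\"11\"/>"
                + "yesterday<Node id=\"20\"/> accused</TextWithNodes>";
        Result r = walk(xml);
        System.out.println(r.text);
        System.out.println(r.offsets);
    }
}
```

For this compact input the recorded offsets happen to equal the id values (0, 11, 20). Whether the ids in the real file are meant as character offsets is an assumption, and the extra line breaks and leading whitespace in the sample as posted would shift the offsets.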
Answer 3:
Use a parser library such as jsoup (https://jsoup.org/). jsoup is primarily an HTML parser, but it can also parse XML.
A how-to is provided in the answer to this question: How to parse XML with jsoup
Source: https://stackoverflow.com/questions/49926799/parse-xml-self-closing-tags-with-text