问题
I want to get the whole text of an Element to parse some xhtml:
<div id='asd'>
<pre>skdsk</pre>
</div>
begin E = div element on the above example, I want to get
<pre>skdsk</pre>
How?
回答1:
Strictly speaking:
from xml.dom.minidom import parse, parseString
tree = parseString("<div id='asd'><pre>skdsk</pre></div>")
root = tree.firstChild
node = root.childNodes[0]
print node.toxml()
In practice, though, I'd recommend looking at the http://www.crummy.com/software/BeautifulSoup/ library. Finding the right childNode in an xhtml document, and skipping "whitespace nodes" is a pain. BeautifulSoup is a robust html/xhtml parser with fantastic tree-search capacilities.
Edit: The example above compresses the HTML into one string. If you use the HTML as in the question, the line breaks and so-forth will generate "whitespace" nodes, so the node you want won't be at childNodes[0].
来源:https://stackoverflow.com/questions/666724/how-to-get-whole-text-of-an-element-in-xml-minidom