How to get whole text of an Element in xml.minidom?

问题

I want to get the whole text of an Element to parse some xhtml:

<div id='asd'>
  <pre>skdsk</pre>
</div>

begin E = div element on the above example, I want to get

<pre>skdsk</pre>

How?

回答1:

Strictly speaking:

from xml.dom.minidom import parse, parseString
tree = parseString("<div id='asd'><pre>skdsk</pre></div>")
root = tree.firstChild
node = root.childNodes[0]
print node.toxml()

In practice, though, I'd recommend looking at the http://www.crummy.com/software/BeautifulSoup/ library. Finding the right childNode in an xhtml document, and skipping "whitespace nodes" is a pain. BeautifulSoup is a robust html/xhtml parser with fantastic tree-search capacilities.

Edit: The example above compresses the HTML into one string. If you use the HTML as in the question, the line breaks and so-forth will generate "whitespace" nodes, so the node you want won't be at childNodes[0].

来源：https://stackoverflow.com/questions/666724/how-to-get-whole-text-of-an-element-in-xml-minidom

标签

python

minidom

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!