How to remove empty XML tags, containing whitespace only, in XML?

Question

I need to remove cases like this:

<text> </text>

I have code that works when the element contains no whitespace, but how do I handle elements that contain only whitespace?

Code:

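from lxml import etree

doc = etree.XML("""<root><a>1</a><b><c></c></b><d></d></root>""")

def remove_empty_elements(doc):
  # Select every element that has no child nodes at all (no elements, no text).
  for element in doc.xpath('//*[not(node())]'):
    element.getparent().remove(element)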

I also need to do it with lxml and not BeautifulSoup.

Answer 1:

This XPath,

//*[not(*)][not(normalize-space())]

will select all leaf elements (elements with no child elements) that are either empty or contain only whitespace.

For your example specifically,

<root><a>1</a><b><c></c></b><d></d></root>

these elements will be selected: c and d.

For an example that also includes whitespace-only elements,

<root>
  <a>1</a>
  <b>
    <c></c>
  </b>
  <d/>
  <e>     </e>
  <f>
  </f>
</root>

these elements will be selected: c, d, e, and f.
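
To tie this back to the lxml code in the question, here is a minimal sketch that applies the XPath above to the second example. The function name remove_blank_leaves and the variable names are made up for illustration, and the guard for a parentless root is an added assumption rather than part of the original answer.

```python
from lxml import etree

XML = """<root>
  <a>1</a>
  <b>
    <c></c>
  </b>
  <d/>
  <e>     </e>
  <f>
  </f>
</root>"""

def remove_blank_leaves(root):
    """Remove leaf elements that are empty or whitespace-only.

    Returns the tag names of the removed elements, in document order.
    (Hypothetical helper name, used only for this illustration.)
    """
    removed = []
    # The node list is computed once, before anything is removed.
    for element in root.xpath('//*[not(*)][not(normalize-space())]'):
        parent = element.getparent()
        if parent is None:
            # The root element has no parent and cannot be removed this way.
            continue
        removed.append(element.tag)
        # Note: lxml's remove() also discards the element's tail text.
        parent.remove(element)
    return removed

doc = etree.XML(XML)
removed = remove_blank_leaves(doc)
remaining = [child.tag for child in doc]
print(removed)
print(remaining)
```

Because the XPath is evaluated once, b is kept: it still had the child c when the selection was made. If parents that become empty should go too, one option is to call the function repeatedly until it returns an empty list. Also keep in mind that remove() drops any text that follows the removed element (its tail), which is harmless here because the tails are whitespace only.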

Source: https://stackoverflow.com/questions/61987189/how-to-remove-empty-xml-tags-containing-whitespace-only-in-xml

Tags

python

python-3.x

xml

lxml

elementtree
