Question
I need to remove cases like this:
<text> </text>
I have code that works when the element contains no whitespace, but how do I handle elements that contain only whitespace?
Code:
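from lxml import etree

doc = etree.XML("""<root><a>1</a><b><c></c></b><d></d></root>""")

def remove_empty_elements(doc):
    # Matches only elements with no child nodes at all (not even whitespace text)
    for element in doc.xpath('//*[not(node())]'):
        element.getparent().remove(element)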
I also need to do it with lxml and not BeautifulSoup.
Answer 1:
This XPath,
//*[not(*)][not(normalize-space())]
will select all leaf elements that are empty or contain only whitespace.
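To see why the original predicate misses the whitespace case, here is a minimal sketch comparing the two XPath expressions. The sample document and the variable names `strict` and `lenient` are made up for illustration; it assumes lxml's default parser, which keeps whitespace-only text nodes.

```python
from lxml import etree

# Hypothetical sample: one whitespace-only element, one truly empty element.
doc = etree.XML("<root><text> </text><empty/></root>")

# not(node()) matches only elements with no child nodes at all,
# so the whitespace text node inside <text> disqualifies it.
strict = [el.tag for el in doc.xpath('//*[not(node())]')]

# not(*) requires no child elements; not(normalize-space()) requires the
# string value to be empty once whitespace is trimmed and collapsed.
lenient = [el.tag for el in doc.xpath('//*[not(*)][not(normalize-space())]')]

print(strict)   # ['empty']
print(lenient)  # ['text', 'empty']
```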
For your example specifically,
<root><a>1</a><b><c></c></b><d></d></root>
these elements will be selected: c and d.
For an example that also includes whitespace-only elements,
<root>
<a>1</a>
<b>
<c></c>
</b>
<d/>
<e> </e>
<f>
</f>
</root>
these elements will be selected: c, d, e, and f.
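Plugging this XPath into the removal loop from the question might look like the sketch below. The function name `remove_blank_leaves` and the returned list of removed tags are illustrative choices, not part of the original answer, and the root-element guard is an added precaution.

```python
from lxml import etree

XML = """<root>
<a>1</a>
<b>
<c></c>
</b>
<d/>
<e> </e>
<f>
</f>
</root>"""

def remove_blank_leaves(doc):
    """Remove leaf elements that are empty or whitespace-only (single pass)."""
    removed = []
    # xpath() returns a list, so the tree can be mutated safely while looping.
    for element in doc.xpath('//*[not(*)][not(normalize-space())]'):
        parent = element.getparent()
        if parent is None:  # never try to remove the root element itself
            continue
        removed.append(element.tag)
        # Note: in lxml, remove() also discards the element's tail text.
        parent.remove(element)
    return removed

doc = etree.XML(XML)
removed_tags = remove_blank_leaves(doc)
remaining_tags = [el.tag for el in doc.iter()]

print(removed_tags)    # ['c', 'd', 'e', 'f']
print(remaining_tags)  # ['root', 'a', 'b']
```

This is a single pass: b is kept because it still had a child element when the XPath was evaluated. If elements that become empty after removal should go too, one option is to call the function repeatedly until it returns an empty list.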
Source: https://stackoverflow.com/questions/61987189/how-to-remove-empty-xml-tags-containing-whitespace-only-in-xml