Question
Suppose I have the following XML:
<node1>
<text title='book'>
<div chapter='0'>
<div id='theNode'>
<p xml:id="40">
A House that has:
<p xml:id="45">- a window;</p>
<p xml:id="46">- a door</p>
<p xml:id="46">- a door</p>
its a beuatiful house
</p>
</div>
</div>
</text>
</node1>
I would like to locate the text element with title="book" and get all the text from the first p tag that appears inside it.
So far I have:
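from lxml import etree
parser = etree.XMLParser()  # default parser; any XMLParser configuration works here
XML_tree = etree.fromstring(XML_content, parser=parser)  # XML_content holds the XML above
text = XML_tree.xpath('//text[@title="book"]/div/div/p/text()')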
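This only gets "A House that has:" and "its a beuatiful house" (plus some whitespace-only strings), because text() selects only the text nodes that are direct children of the outer p.
But I would also like all the text of all the possible children and grandchildren of the first <p> appearing under <text title='book'>.
Basically: look for <text title='book'>, then look for the first <p>, and give me all the text under that p tag, whatever the nesting.
In pseudocode: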
text = XML_tree.xpath('//text[@title="book"]/... any number of nodes.../p/ ....all text under p')
Thanks.
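A minimal sketch of why the attempt above misses the nested text, assuming the sample below (shaped like the question's XML, with the xml:id attributes left out for brevity). Replacing the final /text() with //text() makes the step descend to any depth; lxml's itertext() element method walks the same subtree. The helper clean() is hypothetical, added only to hide whitespace-only text nodes.

```python
from lxml import etree

# Sample shaped like the question's XML (xml:id attributes left out for brevity).
XML_content = """<node1>
<text title='book'>
<div chapter='0'>
<div id='theNode'>
<p>
A House that has:
<p>- a window;</p>
<p>- a door</p>
its a beuatiful house
</p>
</div>
</div>
</text>
</node1>"""

root = etree.fromstring(XML_content)

def clean(nodes):
    """Hypothetical helper: drop whitespace-only text nodes, trim the rest."""
    return [s.strip() for s in nodes if s.strip()]

# /text(): only text nodes that are direct children of the outer <p>.
direct = root.xpath('//text[@title="book"]/div/div/p/text()')
print(clean(direct))  # ['A House that has:', 'its a beuatiful house']

# //text(): text nodes at any depth below the outer <p>, in document order.
nested = root.xpath('//text[@title="book"]/div/div/p//text()')
print(clean(nested))  # ['A House that has:', '- a window;', '- a door', 'its a beuatiful house']

# Element-API equivalent: itertext() yields every text piece in the subtree.
p_el = root.xpath('//text[@title="book"]/div/div/p')[0]
print(" ".join("".join(p_el.itertext()).split()))
# A House that has: - a window; - a door its a beuatiful house
```

This keeps the fixed /div/div path from the question; only the last step changes.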
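Answer 1:
Try using either string() or normalize-space() on the p element. string() returns the string value of the first node in the selected node-set, which is the concatenation of all its descendant text nodes in document order, however deeply they are nested. normalize-space() does the same but also trims leading and trailing whitespace and collapses internal runs of whitespace into single spaces.
Note that the xml:id values in the sample below were changed to x40, x45, x46 and x47: xml:id values must be unique NCNames, so "40" (starts with a digit) and the duplicated "46" are not valid.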
from lxml import etree
XML_content = """
<node1>
<text title='book'>
<div chapter='0'>
<div id='theNode'>
<p xml:id="x40">
A House that has:
<p xml:id="x45">- a window;</p>
<p xml:id="x46">- a door</p>
<p xml:id="x47">- a door</p>
its a beuatiful house
</p>
</div>
</div>
</text>
</node1>
"""
XML_tree = etree.fromstring(XML_content)
text = XML_tree.xpath('string(//text[@title="book"]/div/div/p)')
# text = XML_tree.xpath('normalize-space(//text[@title="book"]/div/div/p)')
print(text)
Output using string()
...
A House that has:
- a window;
- a door
- a door
its a beuatiful house
Output using normalize-space()
...
A House that has: - a window; - a door - a door its a beuatiful house
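In the answer's path (/div/div/p) only the outer p matches, so string() picks it. If the depth varies and you switch to //p, a trailing [1] binds to each parent rather than to the whole result, so it no longer means "the first p". A minimal sketch, assuming a trimmed-down version of the sample; XML_ID is just a local name for the xml:id attribute in lxml's {namespace}name notation.

```python
from lxml import etree

XML_content = """<node1>
<text title='book'>
<div id='theNode'>
<p xml:id="x40">
A House that has:
<p xml:id="x45">- a window;</p>
<p xml:id="x46">- a door</p>
its a beuatiful house
</p>
</div>
</text>
</node1>"""

root = etree.fromstring(XML_content)
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # xml:id in {namespace}name form

# p[1] means "first <p> child of each parent": matches both x40 and x45.
per_parent = root.xpath('//text[@title="book"]//p[1]')
print([p.get(XML_ID) for p in per_parent])  # ['x40', 'x45']

# Parentheses apply [1] to the whole node-set: only x40 matches.
first_overall = root.xpath('(//text[@title="book"]//p)[1]')
print([p.get(XML_ID) for p in first_overall])  # ['x40']

# normalize-space() of that single node covers all of its nested text.
print(root.xpath('normalize-space((//text[@title="book"]//p)[1])'))
# A House that has: - a window; - a door its a beuatiful house
```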
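Answer 2:
Another option: select every non-blank text node that has a <text title="book"> ancestor, strip the surrounding whitespace from each, and join them. Note that this collects the text of everything inside that element, not only the first p; in this sample the two happen to be the same.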
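from lxml import etree
XML_tree = etree.fromstring(XML_content)  # XML_content as defined in the previous answer
# keep only non-blank text nodes below <text title="book">, stripped of whitespace
text = [el.strip() for el in XML_tree.xpath('//text()[ancestor::text[@title="book"]][normalize-space()]')]
print(" ".join(text))   # everything on one line
print("\n".join(text))  # one text node per line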
Output:
A House that has: - a window; - a door - a door its a beuatiful house
A House that has:
- a window;
- a door
- a door
its a beuatiful house
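If the book element holds more than one paragraph, the expression above also picks up text outside the first p. A minimal sketch of a restricted variant, assuming a sample with an extra, hypothetical div appended after the first paragraph: select the first p over the whole node-set, then take its non-blank descendant text nodes.

```python
from lxml import etree

XML_content = """<node1>
<text title='book'>
<div id='theNode'>
<p>
A House that has:
<p>- a window;</p>
<p>- a door</p>
its a beuatiful house
</p>
</div>
<div><p>Another paragraph</p></div>
</text>
</node1>"""

root = etree.fromstring(XML_content)

# Answer 2's expression: every non-blank text node anywhere under <text title="book">.
everything = [t.strip() for t in root.xpath(
    '//text()[ancestor::text[@title="book"]][normalize-space()]')]

# Restricted variant: only non-blank text nodes under the first <p> in document order.
first_p_only = [t.strip() for t in root.xpath(
    '(//text[@title="book"]//p)[1]//text()[normalize-space()]')]

print(everything[-1])    # Another paragraph
print(first_p_only[-1])  # its a beuatiful house
```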
Source: https://stackoverflow.com/questions/62472162/lxml-xpath-expression-for-selecting-all-text-under-a-given-child-node-including