I found an object with a specific class:
THREAD = TREE.find_class('thread')[0]
Now I want to get all elements that are direct children of it.
You can try PARENT.getchildren()
>>> root = etree.fromstring(xml)
>>> root.xpath("//div[@class='thread']")[0].getchildren()
[<Element p at 0x10b3110e0>, <Element p at 0x10b311ea8>]
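`getchildren()` still works in lxml, but it is deprecated; `list(element)` or plain iteration is the recommended equivalent. A minimal sketch, assuming lxml is installed; `SRC` is hypothetical markup standing in for the asker's page:

```python
import lxml.html

# Hypothetical markup standing in for the asker's page.
SRC = """<html><body>
<div class="thread"><p>first</p><p>second</p><span>note</span></div>
</body></html>"""

TREE = lxml.html.document_fromstring(SRC)
THREAD = TREE.find_class("thread")[0]

# list(element) returns the direct children in document order;
# it is the recommended replacement for getchildren().
children = list(THREAD)
print([c.tag for c in children])  # ['p', 'p', 'span']

# iterchildren() can filter the direct children by tag name.
paragraphs = list(THREAD.iterchildren("p"))
print([p.text for p in paragraphs])  # ['first', 'second']
```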
I'm not sure, but it seems that your problem is in the HTML itself: there are a couple of tag omission rules that apply to p elements (a <p> start tag implicitly closes any open <p>), so the closing tags of the paragraphs in
<div class='thread'>
<p>first
<p>second</p>
</p>
</div>
are simply ignored by the parser, and both nodes are treated as siblings rather than as parent and child, i.e. the markup is read as:
<div class='thread'>
<p>first
<p>second
</div>
So the XPath //div[@class="thread"]/p will return both paragraphs.
You can simply replace the p tags with div tags and you'll see different behaviour:
<div class='thread'>
<div>first
<div>second</div>
</div>
</div>
Here //div[@class="thread"]/div will return only the first (outer) div.
Please correct me if my assumption is incorrect.
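The assumption about p versus div can be checked by parsing both variants and listing the direct children of the thread. A sketch, assuming lxml is installed; `child_tags` is a hypothetical helper name:

```python
import lxml.html

def child_tags(markup):
    """Parse markup with lxml's HTML parser and return the tags of the
    direct children of the first element with class="thread"."""
    tree = lxml.html.document_fromstring(markup)
    thread = tree.find_class("thread")[0]
    return [child.tag for child in thread]

# A <p> start tag implicitly closes an open <p>, so the parser
# turns the "nested" paragraphs into siblings.
p_version = child_tags('<div class="thread"><p>first<p>second</p></div>')
print(p_version)  # ['p', 'p']

# <div> has no such rule, so the nesting survives and the thread
# has only one direct child.
div_version = child_tags(
    '<div class="thread"><div>first<div>second</div></div></div>'
)
print(div_version)  # ['div']
```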
Try this XPath expression:
//p[parent::div[@class='thread']]
Or in a complete Python expression:
THREAD.xpath("//p[parent::div[@class='thread']]")
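Note that an expression starting with // is absolute: it is evaluated from the document root even when it is called on THREAD, so on a page with several threads it matches paragraphs in all of them. A sketch, assuming lxml is installed and using hypothetical two-thread markup:

```python
import lxml.html

# Hypothetical page with two threads, to show the scoping difference.
SRC = """<html><body>
<div class="thread"><p>a1</p><p>a2</p></div>
<div class="thread"><p>b1</p></div>
</body></html>"""

TREE = lxml.html.document_fromstring(SRC)
THREAD = TREE.find_class("thread")[0]

# Absolute expression: the search starts at the document root,
# so calling it on THREAD still matches both threads.
everywhere = THREAD.xpath("//p[parent::div[@class='thread']]")
print([p.text for p in everywhere])  # ['a1', 'a2', 'b1']

# Relative expression: THREAD is the context node, so only its
# own direct <p> children are returned.
scoped = THREAD.xpath("./p")
print([p.text for p in scoped])  # ['a1', 'a2']
```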
The other (inverse) approach is this XPath expression:
//div[@class='thread']/child::p
which uses the child:: axis and therefore selects only direct child nodes.
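Both expressions select the same nodes; they just approach them from opposite ends. A sketch comparing them, assuming lxml is installed; the `other` div is hypothetical and only there to show that non-thread paragraphs are excluded:

```python
import lxml.html

SRC = """<html><body>
<div class="thread"><p>first</p><p>second</p></div>
<div class="other"><p>elsewhere</p></div>
</body></html>"""

TREE = lxml.html.document_fromstring(SRC)

# Start at the paragraphs and filter on their parent ...
by_parent = TREE.xpath("//p[parent::div[@class='thread']]")
# ... or start at the thread div and step down the child axis.
by_child = TREE.xpath("//div[@class='thread']/child::p")

print([p.text for p in by_child])  # ['first', 'second']
print([p.text for p in by_parent])  # ['first', 'second']
```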
Summary:
Which of the two expressions is faster depends on the XPath engine. child::
is the default axis and is used when no other axis is given.
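Because child:: is the default axis, /p and /child::p are the same step, while descendant:: also reaches deeper levels. A sketch, assuming lxml is installed and using hypothetical markup with one extra level of nesting:

```python
import lxml.html

SRC = """<html><body>
<div class="thread"><p>first</p><div><p>nested</p></div></div>
</body></html>"""

TREE = lxml.html.document_fromstring(SRC)

# With no axis written, a step defaults to child::, so these match.
abbreviated = TREE.xpath("//div[@class='thread']/p")
explicit = TREE.xpath("//div[@class='thread']/child::p")
print([p.text for p in abbreviated])  # ['first']
print([p.text for p in explicit])  # ['first']

# descendant:: also reaches the <p> inside the nested <div>.
descendants = TREE.xpath("//div[@class='thread']/descendant::p")
print([p.text for p in descendants])  # ['first', 'nested']
```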
FYI: XPath counting starts at 1 and not 0.
So for your XML example, the expression
count(//div[@class='thread'][1]/child::p)
returns 2: one for the first <p> plus one for the second <p>.
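The count and the 1-based indexing can be checked from Python as well. A sketch, assuming lxml is installed and reusing the "nested" paragraphs from the question as hypothetical input; lxml returns XPath numbers as Python floats:

```python
import lxml.html

SRC = """<html><body>
<div class="thread"><p>first<p>second</p></div>
</body></html>"""

TREE = lxml.html.document_fromstring(SRC)

# count() yields an XPath number, which lxml returns as a float.
n = TREE.xpath("count(//div[@class='thread'][1]/child::p)")
print(n)  # 2.0

# Positions are 1-based: [1] selects the first <p>, not the second.
first = TREE.xpath("//div[@class='thread']/p[1]")[0]
print(first.text)  # first
```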