How to find direct children of element in lxml


走了就别回头了

I found an object with specific class:

THREAD = TREE.find_class('thread')[0]

Now I want to get all

elements that a

3 Answers

梦毁少年i

2021-01-14 09:46
Try this XPath expression:
```
//p[parent::div[@class='thread']]
```
Or in a complete Python expression:
```
THREAD.xpath("//p[parent::div[@class='thread']]")
```
The other (inverse) approach is this XPath expression:
```
div[@class='thread']/child::p
```
which uses the child:: axis and therefore selects only direct child nodes.
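As a minimal sketch of the difference between the child and descendant axes, the snippet below runs relative XPath expressions from the element itself. The markup in `SOURCE` is a made-up sample (the original poster's HTML is not shown in full), and the names `TREE` and `THREAD` simply mirror the question.

```python
from lxml import html

# Hypothetical markup, assumed for illustration only.
SOURCE = """
<html><body>
  <div class="thread">
    <p>first</p>
    <div class="quote"><p>nested</p></div>
    <p>second</p>
  </div>
  <div class="other"><p>elsewhere</p></div>
</body></html>
"""

TREE = html.fromstring(SOURCE)
THREAD = TREE.find_class("thread")[0]

# Relative path: only <p> elements that are direct children of THREAD.
direct_texts = [p.text for p in THREAD.xpath("./p")]

# Same selection written with the explicit child:: axis.
direct_axis_texts = [p.text for p in THREAD.xpath("child::p")]

# Descendant search for comparison: this also picks up the nested <p>.
descendant_texts = [p.text for p in THREAD.xpath(".//p")]

print(direct_texts)      # ['first', 'second']
print(descendant_texts)  # ['first', 'nested', 'second']
```

Starting the path with `./` keeps the search anchored at `THREAD`, whereas a path starting with `//` is evaluated against the whole document no matter which element `xpath()` is called on.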

Summary:
Which of the two expressions is faster depends on the XPath implementation. child:: is the default axis and is used whenever no other axis is given, so div[@class='thread']/p selects the same nodes.

FYI: XPath positions start at 1, not 0.
So, for your XML example, the following expression
```
count(//div[@class='thread'][1]/child::p)
```
evaluates to 2, the result of counting <p> and <p></p>.
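A small sketch of 1-based positions and `count()` in lxml follows. The XML string is an assumed sample, not the original poster's document, and the variable names are chosen only for illustration.

```python
from lxml import etree

# Assumed sample document; the original poster's markup is not shown in full.
xml = """
<root>
  <div class="thread"><p>first</p><p>second</p></div>
  <div class="thread"><p>third</p></div>
</root>
"""
root = etree.fromstring(xml)

# count() returns an XPath number, which lxml hands back as a float.
paragraph_count = root.xpath("count(//div[@class='thread'][1]/child::p)")

# Position 1 is the first matching child of each thread div.
first_in_each = [p.text for p in root.xpath("//div[@class='thread']/p[1]")]

# Position 0 never matches, because positions start at 1.
position_zero = root.xpath("//div[@class='thread']/p[0]")

print(paragraph_count)  # 2.0
print(first_in_each)    # ['first', 'third']
print(position_zero)    # []
```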

面向向阳花

2021-01-14 09:54

You can try PARENT.getchildren():
```
>>> root = etree.fromstring(xml)
>>> root.xpath("//div[@class='thread']")[0].getchildren()
[<Element p at 0x10b3110e0>, <Element p at 0x10b311ea8>]
```
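The lxml API documentation marks `getchildren()` as deprecated and suggests `list(element)` or plain iteration instead. A minimal sketch of those alternatives is below; the XML string is an assumed sample, and the `<span>` is only there to show filtering by tag.

```python
from lxml import etree

# Assumed sample document, used only for illustration.
xml = (
    "<root><div class='thread'>"
    "<p>first</p><span>note</span><p>second</p>"
    "</div></root>"
)
root = etree.fromstring(xml)
thread = root.xpath("//div[@class='thread']")[0]

# Iterating an element yields its direct children in document order.
child_tags = [child.tag for child in thread]

# iterchildren() accepts a tag name to keep only matching direct children.
p_texts = [child.text for child in thread.iterchildren("p")]

# findall() with a bare tag name also looks at direct children only.
p_texts_findall = [child.text for child in thread.findall("p")]

print(child_tags)  # ['p', 'span', 'p']
print(p_texts)     # ['first', 'second']
```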


隐瞒了意图╮

2021-01-14 10:01
I'm not sure, but it seems that the problem is in the HTML itself: several tag omission rules apply to p elements, so the closing tags of the paragraphs in this markup
```
<div class='thread'>
    <p>first
        <p>second</p>
    </p>
</div>
```
are simply ignored by the parser, and both nodes are treated as siblings rather than as parent and child, e.g.
```
<div class='thread'>
    <p>first
    <p>second
</div>
```
So the XPath //div[@class="thread"]/p will return both paragraphs.

If you replace the p tags with div tags, you'll see different behaviour:
```
<div class='thread'>
    <div>first
        <div>second</div>
    </div>
</div>
```
Here //div[@class="thread"]/div will return only the first node.
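The comparison can be checked with a short sketch using lxml's HTML parser. The two markup strings are the examples from this answer collapsed onto one line, and the variable names are hypothetical. The exact treatment of the stray `</p>` may vary between parser versions, so the sketch only checks that the second paragraph is not nested in the first.

```python
from lxml import html

# The answer's two examples, collapsed onto one line each.
P_MARKUP = "<div class='thread'><p>first<p>second</p></p></div>"
DIV_MARKUP = "<div class='thread'><div>first<div>second</div></div></div>"

p_doc = html.document_fromstring(P_MARKUP)
div_doc = html.document_fromstring(DIV_MARKUP)

# A <p> start tag closes any open <p>, so the paragraphs become siblings.
p_children = p_doc.xpath("//div[@class='thread']/p")
nested_in_first = p_children[0].xpath("./p")

# <div> elements may nest, so only the outer one is a direct child.
div_children = div_doc.xpath("//div[@class='thread']/div")

print([p.text for p in p_children[:2]])  # ['first', 'second']
print(len(nested_in_first))              # 0
print(len(div_children))                 # 1
```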

Please correct me if my assumption is wrong.