How to find direct children of an element in lxml

情话喂你 2021-01-14 09:32

I found an object with a specific class:

THREAD = TREE.find_class('thread')[0]

Now I want to get all <p> elements that are its direct children.

3 Answers
  • 2021-01-14 09:47

    You can try PARENT.getchildren()

    >>> root = etree.fromstring(xml)
    >>> root.xpath("//div[@class='thread']")[0].getchildren()
    [<Element p at 0x10b3110e0>, <Element p at 0x10b311ea8>]
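
Note that lxml marks getchildren() as deprecated in favour of iterating the element directly. A minimal sketch of the alternatives, assuming a small well-formed sample (the markup and variable names are made up for illustration, not taken from the question):

```python
from lxml import etree

# Hypothetical sample: two <p> directly under the div, one nested in a <span>.
root = etree.fromstring(
    "<div class='thread'>"
    "<p>first</p>"
    "<span><p>nested</p></span>"
    "<p>second</p>"
    "</div>"
)

# Iterating an element yields only its direct children, whatever their tag.
all_children = list(root)

# iterchildren() takes an optional tag filter; still direct children only.
direct_p = list(root.iterchildren("p"))

# A relative path without a leading // is also limited to direct children.
also_direct_p = root.findall("p")

print([el.tag for el in all_children])    # ['p', 'span', 'p']
print([el.text for el in direct_p])       # ['first', 'second']
print([el.text for el in also_direct_p])  # ['first', 'second']
```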
    
  • 2021-01-14 09:48

    I'm not sure, but it seems that your problem is in the HTML itself: there are a couple of tag omission cases that apply to p nodes, so the closing tags of the paragraphs in

    <div class='thread'>
        <p>first
            <p>second</p>
        </p>
    </div>
    

    are simply ignored by the parser, and both nodes are identified as siblings rather than parent and child, e.g.

    <div class='thread'>
        <p>first
        <p>second
    </div>
    

    So the XPath //div[@class="thread"]/p will return both paragraphs.

    You can simply replace the p tags with div tags and you'll see different behaviour:

    <div class='thread'>
        <div>first
            <div>second</div>
        </div>
    </div>
    

    Here //div[@class="thread"]/div will return the first node only.

    Please correct me if my assumption is incorrect
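
One way to check this assumption is to parse both variants with lxml.html and look at the direct children of the thread div. This is only a sketch: the helper name direct_children is made up for illustration, and the exact tree can depend on the parser version bundled with lxml.

```python
from lxml import html

# The two markup variants from above, without the whitespace.
nested_p = "<div class='thread'><p>first<p>second</p></p></div>"
nested_div = "<div class='thread'><div>first<div>second</div></div></div>"


def direct_children(markup, tag):
    """Parse an HTML fragment and return the thread div's direct children named `tag`."""
    thread = html.fromstring(markup).find_class("thread")[0]
    return thread.findall(tag)


p_children = direct_children(nested_p, "p")
div_children = direct_children(nested_div, "div")

# If the second <p> implicitly closed the first, both texts show up here.
print([p.text for p in p_children])

# <div> has no such omission rule, so only the outer div is a direct child.
print(len(div_children))  # 1
```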

  • 2021-01-14 09:55

    Try this XPath expression:

    //p[parent::div[@class='thread']]
    

    Or in a complete Python expression:

    THREAD.xpath("//p[parent::div[@class='thread']]")
    

    The other (inverse) approach is this XPath expression:

    div[@class='thread']/child::p
    

    which uses the direct child:: axis and only selects the direct child nodes.

    Summary:
    Which of the two expressions is faster depends on the XPath engine. child:: is the default axis and is used if no other axis is given.
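
One thing worth keeping in mind when calling xpath() on an element: an expression that starts with // is evaluated from the document root, not from that element. A small sketch of the difference, assuming a made-up document with two thread divs:

```python
from lxml import etree

# Hypothetical document with two thread divs.
root = etree.fromstring(
    "<body>"
    "<div class='thread'><p>a</p><span><p>a-nested</p></span></div>"
    "<div class='thread'><p>b</p></div>"
    "</body>"
)
thread = root.xpath("//div[@class='thread']")[0]

# A leading // searches the whole document, even when called on `thread`.
absolute = thread.xpath("//p[parent::div[@class='thread']]")

# Relative paths start at `thread`; child:: is the default axis,
# so "./p" and "child::p" select the same nodes.
relative = thread.xpath("./p")
explicit = thread.xpath("child::p")

print([p.text for p in absolute])  # ['a', 'b']
print([p.text for p in relative])  # ['a']
print([p.text for p in explicit])  # ['a']
```

So to get only the direct <p> children of an element you already hold, the relative form is usually the one you want.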


    FYI: XPath counting starts at 1 and not 0.
    So concerning your XML example, the following expression

    count(//div[@class='thread'][1]/child::p)
    

    evaluates to 2, the result of counting <p> <!-- 1 --> + <p><!-- 2 --></p>.
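
A short sketch of count() and 1-based positions in lxml, assuming a simplified sample with two sibling paragraphs (not the exact markup from the question):

```python
from lxml import etree

# Hypothetical sample with two direct <p> children.
root = etree.fromstring("<div class='thread'><p>first</p><p>second</p></div>")

# count() returns an XPath number, which lxml hands back as a Python float.
n = root.xpath("count(//div[@class='thread'][1]/child::p)")
print(n)  # 2.0

# Positional predicates are 1-based: p[1] is the first child <p>.
first = root.xpath("./p[1]")
print(first[0].text)  # first

# No node has position 0, so p[0] matches nothing.
zero = root.xpath("./p[0]")
print(zero)  # []
```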
