XPath to select between two HTML comments?

轮回少年 2021-02-06 12:51

I have a big HTML page. But I want to select certain nodes using Xpath:


 ........

 
some text
&l
1 Answer
  • 2021-02-06 13:08

    I would look for elements that are preceded by the first comment and followed by the second comment:

    doc.xpath("//*[preceding::comment()[. = ' begin content ']]
                  [following::comment()[. = ' end content ']]")
    #=> <div>some text</div>
    #=> <div>
    #=>   <p>Some more elements</p>
    #=> </div>
    #=> <p>Some more elements</p>
    

    Note that the above returns every element between the comments, at any depth. If you iterate over the returned nodes, nested content shows up more than once, e.g. the "Some more elements" paragraph appears both inside its parent <div> and as a separate node.

    I think you might actually just want the top-level nodes in between, i.e. the siblings of the comments. This can be done using the preceding-sibling and following-sibling axes instead:

    doc.xpath("//*[preceding-sibling::comment()[. = ' begin content ']]
                  [following-sibling::comment()[. = ' end content ']]")
    #=> <div>some text</div>
    #=> <div>
    #=>   <p>Some more elements</p>
    #=> </div>
    

    Update - Including comments

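    Using //* only returns element nodes, so comments and text nodes are excluded. You could change * to node() to return everything. The string comparison is exact, so the literal must match the comment text including any leading or trailing spaces; the examples below assume comments written without surrounding spaces.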

    puts doc.xpath("//node()[preceding-sibling::comment()[. = 'begin content']]
                            [following-sibling::comment()[. = 'end content']]")
    #=> 
    #=> <!--keywords1: first_keyword-->
    #=> 
    #=> <div>html</div>
    #=> 
    

    If you just want element nodes and comments (i.e. not text nodes), you can use the self axis:

    doc.xpath("//node()[self::* or self::comment()]
                       [preceding-sibling::comment()[. = 'begin content']]
                       [following-sibling::comment()[. = 'end content']]")
    #=> <!--keywords1: first_keyword-->
    #=> <div>html</div>
    