XPath to select between two HTML comments?

轮回少年 2021-02-06 12:51

I have a large HTML page, and I want to select only certain nodes from it using XPath:


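[Sample HTML lost in extraction. It showed a page in which the wanted nodes, such as an element containing "some text", sit between a "begin content" comment and an "end content" comment. The remainder of the question is truncated.]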
1 answer
  •  遇见更好的自我
    2021-02-06 13:08

    I would look for elements that are preceded by the first comment and followed by the second comment:

    doc.xpath("//*[preceding::comment()[. = ' begin content ']]
                  [following::comment()[. = ' end content ']]")
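    # Returns three elements (their markup did not survive extraction):
    #=> the element containing "some text"
    #=> the element wrapping "Some more elements"
    #=> the nested element containing "Some more elements"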

    Note that the above returns every element in between, at any depth. If you iterate over the returned nodes, nested content shows up more than once: "Some more elements" appears both inside its parent and as a result of its own.

    I think you probably want only the top-level nodes in between, i.e. the siblings of the two comments. You can get those by using the preceding-sibling and following-sibling axes instead:

    doc.xpath("//*[preceding-sibling::comment()[. = ' begin content ']]
                  [following-sibling::comment()[. = ' end content ']]")
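    # Returns two elements (their markup did not survive extraction):
    #=> the element containing "some text"
    #=> the element wrapping "Some more elements"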

    Update - Including comments

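    //* matches only element nodes, so comments, text nodes, and other non-element nodes are left out. Change * to node() to return every node type. The comparison is an exact string match, so the literal must reproduce the comment text including any spaces; the queries below match comments written without padding.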

    puts doc.xpath("//node()[preceding-sibling::comment()[. = 'begin content']]
                            [following-sibling::comment()[. = 'end content']]")
    # (Output markup lost in extraction; only the text "html" survives.)

    If you want only element nodes and comments (i.e. not every node type), you can filter with the self axis:

    doc.xpath("//node()[self::* or self::comment()]
                       [preceding-sibling::comment()[. = 'begin content']]
                       [following-sibling::comment()[. = 'end content']]")
    # (Output markup lost in extraction; only the text "html" survives.)
