I have a big HTML page, and I want to select certain nodes from it using XPath:
........
some text
I would look for elements that are preceded by the first comment and followed by the second comment:
doc.xpath("//*[preceding::comment()[. = ' begin content ']]
[following::comment()[. = ' end content ']]")
#=> some text
#=>
#=> Some more elements
#=>
#=> Some more elements
Note that the above gives you every element in between, at any depth. This means that if you iterate through the returned nodes, you will get some nested nodes twice, e.g. the "Some more elements".
I think you might actually want just the top-level nodes in between, i.e. the siblings of the comments. This can be done using the preceding-sibling and following-sibling axes instead.
doc.xpath("//*[preceding-sibling::comment()[. = ' begin content ']]
[following-sibling::comment()[. = ' end content ']]")
#=> some text
#=>
#=> Some more elements
#=>
Update - Including comments
Using //* only returns element nodes, which excludes comments (and other node types, such as text nodes). You could change * to node() to return everything.
puts doc.xpath("//node()[preceding-sibling::comment()[. = ' begin content ']]
[following-sibling::comment()[. = ' end content ']]")
#=>
#=>
#=>
#=> html
#=>
If you just want element nodes and comments (i.e. not everything), you can use the self axis:
doc.xpath("//node()[self::* or self::comment()]
[preceding-sibling::comment()[. = ' begin content ']]
[following-sibling::comment()[. = ' end content ']]")
#=>
#=> html