Using XPath with HTML or XML fragment?

若如初见, submitted on 2020-01-01 19:30:51

Question


I am new to Nokogiri and XPath, and I am trying to access all comments in an HTML or XML fragment. The XPaths .//comment() and //comment() work when I parse the string as a full document, but with the fragment function //comment() finds nothing and .//comment() misses the top-level comment. With a tag instead of a comment, the first XPath (.//a) does find everything.

By trial and error, I found that in this case comment() finds only the top-level comments, while .//comment() and some other expressions find only the nested ones. Am I doing something wrong? What am I missing? Can anyone explain what is happening?

What XPath should I use to get all comments in an HTML fragment parsed by Nokogiri?

The following example illustrates the problem:

require 'nokogiri'

str = "<!-- one --><p><!-- two --></p>"

# Parsed as a full document, both XPaths find both comments:
Nokogiri::HTML(str).xpath("//comment()")
# => [#<Nokogiri::XML::Comment:0x3f8535d71d5c " one ">, #<Nokogiri::XML::Comment:0x3f8535d71cf8 " two ">]
Nokogiri::HTML(str).xpath(".//comment()")
# => [#<Nokogiri::XML::Comment:0x3f8535cc7974 " one ">, #<Nokogiri::XML::Comment:0x3f8535cc7884 " two ">]

# Parsed as a fragment, none of these finds both comments:
Nokogiri::HTML.fragment(str).xpath("//comment()")
# => []
Nokogiri::HTML.fragment(str).xpath("comment()")
# => [#<Nokogiri::XML::Comment:0x3f8535d681a8 " one ">]
Nokogiri::HTML.fragment(str).xpath(".//comment()")
# => [#<Nokogiri::XML::Comment:0x3f8535d624d8 " two ">]
Nokogiri::HTML.fragment(str).xpath("*//comment()")
# => [#<Nokogiri::XML::Comment:0x3f8535d5cb8c " two ">]
Nokogiri::HTML.fragment(str).xpath("*/comment()")
# => [#<Nokogiri::XML::Comment:0x3f8535d4e104 " two ">]

# However, with elements instead of comments, .//a also finds the top-level one:
str = "<a desc='one'/> <p><a>two</a><a desc='three'/></p>"
Nokogiri::HTML.fragment(str).xpath(".//a")
# => [#<Nokogiri::XML::Element:0x3f8535cb44c8 name="a" attributes=[#<Nokogiri::XML::Attr:0x3f8535cb4194 name="desc" value="one">]>, #<Nokogiri::XML::Element:0x3f8535cb4220 name="a" children=[#<Nokogiri::XML::Text:0x3f8535cb3ba4 "two">]>, #<Nokogiri::XML::Element:0x3f8535cb3a3c name="a" attributes=[#<Nokogiri::XML::Attr:0x3f8535cb3960 name="desc" value="three">]>]

PS: Without fragment it does what I want, but it also adds extra markup such as a DOCTYPE, and I really only have a fragment of an HTML file that I am editing (removing some tags, replacing others).


Answer 1:


//comment() is short for /descendant-or-self::node()/child::comment().

Using this XPath with a fragment ignores the top-level comments: they are themselves selected by descendant-or-self::node(), but child::comment() then looks at their children, and a comment has no children.

If you use HTML(str), you create a document node as the root of all other nodes. Therefore /descendant-or-self::node()/child::comment() does not ignore the top-level comments, because they are children of the document node (which is itself selected by /descendant-or-self::node()).

I am not sure why descendant::comment() works in every case; I would have expected it to need descendant-or-self::comment(), but never mind.

Hope that helps.




Answer 2:


"descendant::comment()" and "descendant::sometag" works fine in every case, but I still don't understand these differences.



Source: https://stackoverflow.com/questions/3817843/using-xpath-with-html-or-xml-fragment
