Can Nokogiri search for “?xml-stylesheet” tags?

隐瞒了意图╮ 2021-01-19 04:08

I need to find the <?xml-stylesheet ...?> declaration in an XML document with Nokogiri.

2 Answers
  •  心在旅途
    2021-01-19 04:23

    This is not an XML element; this is an XML "Processing Instruction". That is why you could not find it with your query. To find it you want:

    # Find the first xml-stylesheet PI
    xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
    
    # Find every xml-stylesheet PI
    xsss = doc.xpath('//processing-instruction("xml-stylesheet")')
    

    Seen in action:

    require 'nokogiri'
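    # Sample document (the root element name is illustrative)
    xml = <<~ENDXML
      <?xml-stylesheet type="text/xsl" href="/templates/disclaimer_en.xsl"?>
      <root>Hi Mom!</root>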
    ENDXML
    doc = Nokogiri.XML(xml)
    xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
    puts xss.name     #=> xml-stylesheet
    puts xss.content  #=> type="text/xsl" href="/templates/disclaimer_en.xsl"
    

    Since a Processing Instruction is not an Element, it has no attributes: you cannot, for example, ask for xss['type'] or xss['href']. To get at those values you need to parse the content as if it were an element. One way to do this is:

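    class Nokogiri::XML::ProcessingInstruction
      # Re-parse the PI as a standalone element so that its
      # pseudo-attributes become real attributes.
      def to_element
        Nokogiri::XML("<#{name} #{content}/>").root
      end
    end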
    
    p xss.to_element['href'] #=> "/templates/disclaimer_en.xsl"
    

    Note that there is a bug in Nokogiri or libxml2 that causes the XML declaration to appear in the document as a Processing Instruction if at least one character (even a space) precedes <?xml. This is why the examples above search specifically for processing instructions named xml-stylesheet.

    Edit: The XPath expression processing-instruction()[name()="foo"] is equivalent to the expression processing-instruction("foo"). As described in the XPath 1.0 spec:

    The processing-instruction() test may have an argument that is Literal; in this case, it is true for any processing instruction that has a name equal to the value of the Literal.

    I've edited the answer above to use the shorter format.
