Can Nokogiri search for “?xml-stylesheet” tags?

隐瞒了意图╮ 2021-01-19 04:08

I need to find the XML stylesheet reference (the <?xml-stylesheet ...?> line) in a document, but my query does not match it:




        
2 Answers
  • 2021-01-19 04:23

    This is not an XML element; it is an XML "Processing Instruction", which is why your query could not find it. To find it, use:

    # Find the first xml-stylesheet PI
    xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
    
    # Find every xml-stylesheet PI
    xsss = doc.xpath('//processing-instruction("xml-stylesheet")')
    

    Seen in action:

    require 'nokogiri'
    xml = <<ENDXML
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="/templates/disclaimer_en.xsl"?>
      <root>Hi Mom!</root>
    ENDXML
    doc = Nokogiri.XML(xml)
    xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
    puts xss.name     #=> xml-stylesheet
    puts xss.content  #=> type="text/xsl" href="/templates/disclaimer_en.xsl"
    

    Because a Processing Instruction is not an Element, it has no attributes: you cannot, for example, ask for xss['type'] or xss['href']. To read those values you need to parse the content as if it were an element. One way to do this is:

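    # Reopen Nokogiri's ProcessingInstruction class to add a helper.
    class Nokogiri::XML::ProcessingInstruction
      # Re-parse the PI as an empty element so its pseudo-attributes become
      # real attributes; parse returns a NodeSet, so take its first node.
      def to_element
        document.parse("<#{name} #{content}/>").first
      end
    end
    
    p xss.to_element['href'] #=> "/templates/disclaimer_en.xsl"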
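
    If you would rather not reopen a Nokogiri class or re-parse anything, a regular expression over the content string is another option. This is only a rough sketch: pseudo_attributes and PSEUDO_ATTRIBUTE are hypothetical names, not part of Nokogiri, and the pattern assumes simple name="value" or name='value' pairs and does not decode entity references.

    ```ruby
    # Hypothetical helper (not part of Nokogiri): split a processing
    # instruction's content into name/value pairs.
    PSEUDO_ATTRIBUTE = /([A-Za-z_:][\w:.-]*)\s*=\s*(["'])(.*?)\2/m

    def pseudo_attributes(content)
      # scan yields [name, quote, value] triples; the quote character is dropped.
      content.scan(PSEUDO_ATTRIBUTE).map { |name, _quote, value| [name, value] }.to_h
    end

    attrs = pseudo_attributes('type="text/xsl" href="/templates/disclaimer_en.xsl"')
    puts attrs['type']  #=> text/xsl
    puts attrs['href']  #=> /templates/disclaimer_en.xsl
    ```

    With a parsed document you would pass xss.content as the argument.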
    

    Note that a bug in Nokogiri or libxml2 causes the XML Declaration to appear in the document as a Processing Instruction if at least one character (even a single space) precedes <?xml. This is why the queries above search specifically for processing instructions named xml-stylesheet.
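
    If you control the input, one way to sidestep the issue is to make sure nothing precedes the XML Declaration. The following is a minimal sketch, assuming the stray characters are plain leading whitespace; it then lists every processing instruction the parser produced.

    ```ruby
    require 'nokogiri'

    # The squiggly heredoc (<<~) removes the common indentation, and lstrip
    # drops any whitespace that might still precede the XML Declaration.
    xml = <<~ENDXML
      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="/templates/disclaimer_en.xsl"?>
      <root>Hi Mom!</root>
    ENDXML

    doc = Nokogiri.XML(xml.lstrip)

    # Collect the name of every processing instruction in the document.
    pi_names = doc.xpath('//processing-instruction()').map(&:name)
    p pi_names  #=> ["xml-stylesheet"]
    ```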

    Edit: The XPath expression processing-instruction()[name()="foo"] is equivalent to the expression processing-instruction("foo"). As described in the XPath 1.0 spec:

    The processing-instruction() test may have an argument that is Literal; in this case, it is true for any processing instruction that has a name equal to the value of the Literal.

    I've edited the answer above to use the shorter format.
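
    A quick way to convince yourself of the equivalence is to run both forms against the same document. The XML below is made up for illustration (the a.xsl href and the "other" instruction are assumptions, not taken from the question).

    ```ruby
    require 'nokogiri'

    # Two processing instructions, only one of which is named xml-stylesheet.
    doc = Nokogiri::XML('<?xml-stylesheet type="text/xsl" href="a.xsl"?><?other data?><root/>')

    long_form  = doc.xpath('//processing-instruction()[name()="xml-stylesheet"]')
    short_form = doc.xpath('//processing-instruction("xml-stylesheet")')

    puts long_form.length   #=> 1
    puts short_form.length  #=> 1
    puts long_form.first.content == short_form.first.content  #=> true
    ```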

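  • Element and CSS selectors in Nokogiri do not match XML processing instructions, because they are not elements. If the processing instruction is the first node in the document, you can also reach it by position: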

    doc.children[0]
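
    Positional access only works when the processing instruction happens to be the first node. As a hedged sketch (the document below is invented, with a comment placed first on purpose), you can filter the top-level children by node class instead:

    ```ruby
    require 'nokogiri'

    # Made-up input: a comment comes before the stylesheet instruction.
    doc = Nokogiri::XML('<!-- made-up comment --><?xml-stylesheet type="text/xsl" href="a.xsl"?><root/>')

    # doc.children[0] is the comment here, so look for the PI by class and name.
    stylesheet = doc.children.find do |node|
      node.is_a?(Nokogiri::XML::ProcessingInstruction) && node.name == 'xml-stylesheet'
    end

    puts doc.children[0].class  #=> Nokogiri::XML::Comment
    puts stylesheet.content     #=> type="text/xsl" href="a.xsl"
    ```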
    