I need to parse for an XML style sheet:
This is not an XML element; this is an XML "Processing Instruction". That is why you could not find it with your query. To find it you want:
# Find the first xml-stylesheet PI
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
# Find every xml-stylesheet PI
xsss = doc.xpath('//processing-instruction("xml-stylesheet")')
Seen in action:
require 'nokogiri'
xml = <<ENDXML
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/templates/disclaimer_en.xsl"?>
<root>Hi Mom!</root>
ENDXML
doc = Nokogiri.XML(xml)
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
puts xss.name #=> xml-stylesheet
puts xss.content #=> type="text/xsl" href="/templates/disclaimer_en.xsl"
Since a Processing Instruction is not an Element, it does not have attributes; you cannot, for example, ask for xss['type']
or xss['href']
; you will need to parse the content as an element if you wish this. One way to do this is:
class Nokogiri::XML::ProcessingInstruction
def to_element
document.parse("<#{name} #{content}/>")
end
end
p xss.to_element['href'] #=> "/templates/disclaimer_en.xsl"
Note that there exists a bug in Nokogiri or libxml2 which will cause the XML Declaration to appear in the document as a Processing Instruction if there is at least one character (can be a space) before <?xml
. This is why in the above we search specifically for processing instructions with the name xml-stylesheet
.
Edit: The XPath expression processing-instruction()[name()="foo"]
is equivalent to the expression processing-instruction("foo")
. As described in the XPath 1.0 spec:
The
processing-instruction()
test may have an argument that is Literal; in this case, it is true for any processing instruction that has a name equal to the value of the Literal.
I've edited the answer above to use the shorter format.
Nokogiri cannot search for tags that are XML processing instructions. You may access them like this:
doc.children[0]