I need to parse for an XML style sheet:
This is not an XML element; this is an XML "Processing Instruction". That is why you could not find it with your query. To find it you want:
# Find the first xml-stylesheet PI
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
# Find every xml-stylesheet PI
xsss = doc.xpath('//processing-instruction("xml-stylesheet")')
Seen in action:
require 'nokogiri'
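# Sample document; the root element name here is only illustrative
xml = <<ENDXML
<?xml-stylesheet type="text/xsl" href="/templates/disclaimer_en.xsl"?>
<root>Hi Mom!</root>
ENDXML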
doc = Nokogiri.XML(xml)
xss = doc.at_xpath('//processing-instruction("xml-stylesheet")')
puts xss.name #=> xml-stylesheet
puts xss.content #=> type="text/xsl" href="/templates/disclaimer_en.xsl"
Since a Processing Instruction is not an Element, it does not have attributes; you cannot, for example, ask for xss['type'] or xss['href']. To read those values you need to parse the content as if it were an element. One way to do this is:
class Nokogiri::XML::ProcessingInstruction
  def to_element
    document.parse("<#{name} #{content}/>")
  end
end
p xss.to_element['href'] #=> "/templates/disclaimer_en.xsl"
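If you would rather not reopen a Nokogiri class, a rough alternative is to pull the pseudo-attributes out of the content string with plain Ruby. The sketch below assumes the content is a simple run of name="value" pairs like the one printed above; pseudo_attributes is a hypothetical helper name, not part of Nokogiri, and it does not decode entities such as &amp;.

```ruby
# Hypothetical helper (not a Nokogiri API): split PI content into a Hash
# of pseudo-attribute names to values. Handles single or double quotes.
def pseudo_attributes(content)
  pairs = content.scan(/([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/)
  pairs.each_with_object({}) do |(name, double_quoted, single_quoted), attrs|
    attrs[name] = double_quoted || single_quoted
  end
end

# Same string that xss.content returns in the example above
content = 'type="text/xsl" href="/templates/disclaimer_en.xsl"'
attrs = pseudo_attributes(content)

puts attrs['type']
puts attrs['href']
```

With a real node you would call pseudo_attributes(xss.content). This avoids a second XML parse, at the cost of being stricter about what the content may look like.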
Note that there exists a bug in Nokogiri or libxml2 which will cause the XML Declaration to appear in the document as a Processing Instruction if there is at least one character (can be a space) before <?xml. This is why in the above we search specifically for processing instructions with the name xml-stylesheet.
Edit: The XPath expression processing-instruction()[name()="foo"] is equivalent to the expression processing-instruction("foo"). As described in the XPath 1.0 spec:
"The processing-instruction() test may have an argument that is Literal; in this case, it is true for any processing instruction that has a name equal to the value of the Literal."
I've edited the answer above to use the shorter format.