How do I validate XHTML with nokogiri?

 ̄綄美尐妖づ submitted on 2019-12-28 15:02:57

Question


I've found a few posts alluding to the fact that you can validate XHTML against its DTD using the nokogiri gem. Whilst I've managed to use it to parse XHTML successfully (looking for 'a' tags etc.), I'm struggling to validate documents.

For me, this:

require 'nokogiri'
require 'net/http'
doc = Nokogiri::XML(Net::HTTP.get(URI.parse("http://www.w3.org")))
puts doc.validate

results in a whole heap of:

[
#<Nokogiri::XML::SyntaxError: No declaration for element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute xmlns of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute lang of element html>,  
#<Nokogiri::XML::SyntaxError: No declaration for attribute lang of element html>,
#<Nokogiri::XML::SyntaxError: No declaration for element head>,
#<Nokogiri::XML::SyntaxError: No declaration for attribute profile of element head>
[repeat for every tag in the document.]
]

So I'm assuming that's not the right approach. I can't seem to locate any good examples -- can anyone suggest what I'm doing wrong?

I'm running Ruby 1.8.6 on Mac OS X 10.5.8. Nokogiri tells me:

nokogiri: 1.3.3
warnings: []

libxml: 
  compiled: 2.6.23
  loaded: 2.6.23
  binding: extension

Answer 1:


It's not just you. What you're doing is supposed to be the right way to do it, but I've never had any luck with it. As far as I can tell, there's a disconnect somewhere between Nokogiri and libxml that keeps it from loading SYSTEM DTDs or recognizing PUBLIC DTDs. It will work if you define the DTD within the XML file, but good luck doing that with the XHTML DTDs.

The best thing I can recommend is to use the schemas for XHTML instead:

require 'nokogiri'
require 'open-uri'

# URI.open needs Ruby 2.5+; on older Rubies such as 1.8.6, use plain open(...)
doc = Nokogiri::XML(URI.open('http://www.w3.org'))
xsd = Nokogiri::XML::Schema(URI.open('http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd'))

# This is a true/false validation
xsd.valid?(doc)    # => true

# This gives a listing of errors
xsd.validate(doc)  # => []



Answer 2:


It works OK if the DTD is embedded in the XML. So if restructuring the data into a single file is acceptable, either as a general practice or just for temporary use, that would solve your problem.

I filed an issue with the Nokogiri project at:

https://github.com/sparklemotion/nokogiri/issues/440

Yoko Harada, primary author of JRuby Nokogiri, said:

"Just FYI. Pure Java Nokogiri on master branch (not yet released) doesn't have this problem."

The issue I filed contains links to minimal example files and irb calls to illustrate the issue.

  • Keith


Source: https://stackoverflow.com/questions/1287952/how-do-i-validate-xhtml-with-nokogiri
