Question
The W3C recommended list of doctype declarations indicates the following doctype for XHTML 1.1:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
This is the same system ID recommended by A List Apart and the Wiley Dummies site, among many others. It was one of the standard system IDs for the modular XHTML 1.1 DTD.
Unfortunately this modular DTD refers to other XML entities, some of which the W3C has removed from its site, completely breaking parsing.
You can test this in Java 11. Start with the following XHTML 1.1 file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML 1.1 Skeleton</title>
</head>
<body>
</body>
</html>
Try to parse it using a standard, built-in Java parser:
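import java.io.BufferedInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
// The statements below run inside an instance method; getClass() locates the test resource.
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
final Document document;
try (InputStream inputStream = new BufferedInputStream(getClass().getResourceAsStream("xhtml-1.1-test.xhtml"))) {
    document = documentBuilder.parse(inputStream);
}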
Parsing will fail, throwing a java.io.FileNotFoundException for http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod. Apparently the W3C has removed this entity from its web site altogether.
If instead http://www.w3.org/MarkUp/DTD/xhtml11.dtd is used (which appears as a comment in the XHTML 1.1 specification DTD), parsing completes normally (albeit after about 10 minutes).
Why does the W3C publish an incomplete set of entities in the http://www.w3.org/TR/xhtml11/DTD/ collection, breaking XHTML 1.1 parsing with a standard system ID? Why aren't all the modules that are available at http://www.w3.org/MarkUp/DTD/ available there too? Who at the W3C should I contact to get this fixed? (And why does HTTP access take so long for these entities?)
Answer 1:
The URL you mentioned as an alternative, http://www.w3.org/MarkUp/DTD/xhtml11.dtd, seems to be used consistently in the XHTML 1.1 specs/DTDs/modules and appears to be the one endorsed by the W3C, rather than http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd. My guess is that access to these declaration sets is deliberately throttled, as the W3C doesn't want to serve them to the general public; you're supposed to store them locally and use an SGML/XML catalog file mapping identifiers to your local entity/declaration sets.
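To illustrate the local-catalog idea on the Java side, here is a minimal sketch, assuming Java 9 or later (for the javax.xml.catalog API) and an XML-syntax catalog file that already sits next to the local .dtd/.mod/.ent files. The class name LocalCatalogParsing and the method parseWithCatalog are made up for this example; treat it as one possible approach, not a definitive setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a document while resolving its DTD and
// modules through a local XML catalog instead of fetching them over HTTP.
class LocalCatalogParsing {

    static Document parseWithCatalog(URI catalogUri, InputStream xml)
            throws ParserConfigurationException, SAXException, IOException {
        // CatalogResolver implements org.xml.sax.EntityResolver, so it can be
        // handed to a DocumentBuilder directly.
        CatalogResolver resolver =
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(resolver);
        return builder.parse(xml);
    }
}
```

A call might look like parseWithCatalog(Paths.get("catalog.xml").toUri(), inputStream). As far as I know, the default catalog features resolve strictly, so an identifier that has no catalog entry should raise a CatalogException rather than quietly fall back to the network; that makes missing modules show up immediately.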
I had success in validating an XHTML 1.1 file using libxml2's xmllint command-line tool by invoking
SGML_CATALOG_FILES=./catalog xmllint --catalogs --dtdvalid xhtml11.dtd testdoc.xhtml
with a catalog file having the following content (and the referenced .dtd, .mod and .ent files in place in that directory, of course):
OVERRIDE YES
SGMLDECL "xml1.dcl"
PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd"
PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN" "xhtml11-model-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Common Attributes 1.0//EN" "xhtml-attribs-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod" "xhtml-attribs-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Base Element 1.0//EN" "xhtml-base-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod" "xhtml-base-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML BDO Element 1.0//EN" "xhtml-bdo-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod" "xhtml-bdo-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN" "xhtml-blkphras-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod" "xhtml-blkphras-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Presentation 1.0//EN" "xhtml-blkpres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod" "xhtml-blkpres-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Block Structural 1.0//EN" "xhtml-blkstruct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod" "xhtml-blkstruct-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Character Entities 1.0//EN" "xhtml-charent-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod" "xhtml-charent-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN" "xhtml-csismap-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod" "xhtml-csismap-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Datatypes 1.0//EN" "xhtml-datatypes-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-datatypes-1.mod" "xhtml-datatypes-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Editing Markup 1.0//EN" "xhtml-edit-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod" "xhtml-edit-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN" "xhtml-events-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod" "xhtml-events-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Forms 1.0//EN" "xhtml-form-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod" "xhtml-form-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Modular Framework 1.0//EN" "xhtml-framework-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod" "xhtml-framework-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Hypertext 1.0//EN" "xhtml-hypertext-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod" "xhtml-hypertext-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Images 1.0//EN" "xhtml-image-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod" "xhtml-image-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN" "xhtml-inlphras-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod" "xhtml-inlphras-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN" "xhtml-inlpres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod" "xhtml-inlpres-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Inline Structural 1.0//EN" "xhtml-inlstruct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod" "xhtml-inlstruct-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Inline Style 1.0//EN" "xhtml-inlstyle-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod" "xhtml-inlstyle-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Legacy Markup 1.0//EN" "xhtml-legacy-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-legacy-1.mod" "xhtml-legacy-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Link Element 1.0//EN" "xhtml-link-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod" "xhtml-link-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Lists 1.0//EN" "xhtml-list-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod" "xhtml-list-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Metainformation 1.0//EN" "xhtml-meta-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod" "xhtml-meta-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Embedded Object 1.0//EN" "xhtml-object-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod" "xhtml-object-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Param Element 1.0//EN" "xhtml-param-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod" "xhtml-param-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Presentation 1.0//EN" "xhtml-pres-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod" "xhtml-pres-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML Qualified Names 1.0//EN" "xhtml-qname-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod" "xhtml-qname-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Ruby 1.0//EN" "xhtml-ruby-1.mod"
SYSTEM "http://www.w3.org/TR/ruby/xhtml-ruby-1.mod" "xhtml-ruby-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Scripting 1.0//EN" "xhtml-script-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod" "xhtml-script-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN" "xhtml-ssismap-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod" "xhtml-ssismap-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Document Structure 1.0//EN" "xhtml-struct-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod" "xhtml-struct-1.mod"
PUBLIC "-//W3C//DTD XHTML Style Sheets 1.0//EN" "xhtml-style-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod" "xhtml-style-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Tables 1.0//EN" "xhtml-table-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod" "xhtml-table-1.mod"
PUBLIC "-//W3C//ELEMENTS XHTML Text 1.0//EN" "xhtml-text-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod" "xhtml-text-1.mod"
PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN" "xhtml11-model-1.mod"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod" "xhtml11-model-1.mod"
PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent" "xhtml-lat1.ent"
PUBLIC "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-special.ent" "xhtml-special.ent"
PUBLIC "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent"
SYSTEM "http://www.w3.org/MarkUp/DTD/xhtml-symbol.ent" "xhtml-symbol.ent"
Note this is SGML/traditional/plain catalog syntax. If you want to use it with Java/JAXP, you'll have to convert it into a catalog file in XML syntax.
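The conversion itself is mostly mechanical. The following is a rough sketch, not a full TR9401 parser: it assumes each entry sits on one line with double-quoted values, as in the catalog above. It turns PUBLIC and SYSTEM lines into OASIS XML catalog `<public>` and `<system>` elements and skips everything else. The class name SgmlToXmlCatalog is invented for illustration. OVERRIDE YES is approximated by prefer="public" on the root element, and SGMLDECL is dropped because, to my knowledge, XML catalogs have no counterpart for it.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical converter from plain (SGML-style) catalog lines to XML catalog syntax.
class SgmlToXmlCatalog {

    // Matches lines such as: PUBLIC "-//W3C//DTD XHTML 1.1//EN" "xhtml11.dtd"
    private static final Pattern ENTRY =
            Pattern.compile("^\\s*(PUBLIC|SYSTEM)\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"\\s*$");

    static String convert(List<String> sgmlLines) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\" prefer=\"public\">\n");
        for (String line : sgmlLines) {
            Matcher m = ENTRY.matcher(line);
            if (!m.matches()) {
                continue; // OVERRIDE, SGMLDECL, blank lines and anything unrecognized
            }
            boolean isPublic = m.group(1).equals("PUBLIC");
            xml.append(isPublic ? "  <public publicId=\"" : "  <system systemId=\"")
               .append(escape(m.group(2)))
               .append("\" uri=\"")
               .append(escape(m.group(3)))
               .append("\"/>\n");
        }
        xml.append("</catalog>\n");
        return xml.toString();
    }

    // Escapes the characters that are unsafe inside a double-quoted XML attribute.
    private static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```

Feeding it the lines of the catalog file (for example via Files.readAllLines) and writing the result as catalog.xml in the same directory keeps the relative file names valid, since relative uri values in an XML catalog are resolved against the catalog file's own location.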
Source: https://stackoverflow.com/questions/60655704/w3c-breaks-xhtml-1-1-parsing-by-removing-modules-from-web-site