I want to replace all the
with
in my html file to support XML parser.
But I don\'t want to replace them directly, I\'d like t
What you have is a valid way to include an entity declaration in an internal subset. The document is not otherwise valid, though, as you can check with the W3C Markup Validator: the required xmlns
attribute on the html
element is missing, and so is the required title
attribute.
When served as text/html, the document is processed how browsers use to process HTML document, which means among other thing that internal subsets are not recognized; in fact, document type definitions are not read at all – instead, doctype declarations are just taken as magic strings so that some strings trigger “quirks mode”, some don’t. The doctype declaration is parsed in a simplistic manner, which makes the first “>” terminate it, so whatever comes after it is taken as character data.
The morale is that entity declarations just don’t work with “HTML”, internally or externally, when “HTML” means sending something to a browser and telling (in HTTP headers) it to be text/html – and that’s what servers normally tell when they send .html files.
Served as application/xhtml+xml and fixed to conform to XHTML syntax, your approach works on conforming browsers (online demo: http://www.cs.tut.fi/~jkorpela/test/nbsp.xhtml):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
[<!ENTITY nbsp " ">]>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Entity demo</title></head>
<body>
<div>Hello World!</div>
</body>
</html>
However, IE 8 and earlier don’t process HTML when served as application/xhtml+xml (the browser just launches a “Save As” dialog).
The conclusions depend on what you are doing and why (and in which sense) you need to “support XML parser”. It’s not really about parsing but about entity declarations. XHTML user agents are not required to understand predefined entities as in HTML (except for those defined in XML), but has this possibility realized somehow? And in general, it is better to convert
to actual no-break space characters than to character references.
here
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"[<!ENTITY nbsp " ">