Question
Yes, I've googled it, and surprisingly got confusing answers.
One page says that < > & " are the only reserved characters in (X)HTML. No doubt, this makes sense.
This page says < > & " ' are the reserved characters in (X)HTML. A little confusing, but okay, this makes sense too.
And then comes this page which says < > & " © ° £ and non-breaking space (&nbsp;) are all reserved characters in (X)HTML. This makes no sense at all, and pretty much adds to my confusion.
Can someone knowledgeable, who actually knows this stuff, clarify which characters are actually reserved in (X)HTML?
EDIT: Also, should all the reserved characters in code be escaped when it is wrapped in a <pre> tag? Or is it just these three: < > & ?
Answer 1:
Only < and & need to be escaped. Inside attributes, " or ' (depending on which quote style you use for the attribute's value) needs to be escaped, too.
<a href="#" onclick="here you can use ' safely"></a>
<a href="#" onclick='here you can use " safely'></a>
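The quote-style rule above can be sketched in Python with the standard library's xml.sax.saxutils.quoteattr, which picks a quote character the value does not contain and only falls back to &quot; when both kinds appear. The render_link helper is a hypothetical name used for illustration only.

```python
from xml.sax.saxutils import quoteattr


def render_link(onclick: str) -> str:
    """Build an <a> tag; quoteattr chooses the quote style and escapes & and <."""
    return f'<a href="#" onclick={quoteattr(onclick)}></a>'


# Value contains only ': it is wrapped in double quotes, nothing to escape.
single = render_link("here you can use ' safely")

# Value contains only ": it is wrapped in single quotes, nothing to escape.
double = render_link('here you can use " safely')

# Value contains both: double quotes are used and the inner " becomes &quot;.
both = render_link("mixing ' and \" forces escaping")

for tag in (single, double, both):
    print(tag)
```

The first two results reproduce the two hand-written examples above; only the third needs an entity.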
Answer 2:
The XHTML 1.0 specification states at http://www.w3.org/TR/2002/REC-xhtml1-20020801/#xhtml:
XHTML 1.0 [...] is a reformulation of the three HTML 4 document types as applications of XML 1.0 [XML].
The XML 1.0 specification states at http://www.w3.org/TR/2008/REC-xml-20081126/#syntax:
Character Data and Markup: Text consists of intermingled character data and markup. [...] The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
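This means that when writing the text parts of an XHTML document you must escape & and <. Escaping > is strictly required only in the sequence ]]>, but escaping it everywhere is always allowed and is the simplest rule to follow.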
You can escape a lot more, e.g. &uuml; for ü (u with umlaut). You can just as well declare that the document is encoded in, for example, UTF-8 and write the byte sequence 0xc3bc instead to get the same ü.
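As a quick check of the equivalence described above, the following Python sketch compares the literal character, its UTF-8 bytes, and the named and numeric references. It uses html.unescape from the standard library, which knows the HTML named entities; the variable names are illustrative.

```python
import html

u_umlaut = "\u00fc"  # the character ü

# UTF-8 encodes ü as the two bytes 0xC3 0xBC.
utf8_bytes = u_umlaut.encode("utf-8")

# The named entity and the numeric character references decode to the same character.
from_named = html.unescape("&uuml;")
from_hex = html.unescape("&#xFC;")
from_decimal = html.unescape("&#252;")

print(utf8_bytes.hex(), from_named == from_hex == from_decimal == u_umlaut)
```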
When writing the element parts (colloquially, "tags") of the document, there are different rules. You have to take care of ", ' and a lot of rules concerning comments, CDATA and so on. There are also rules about which characters can be used in element and attribute names. You can look it up in the XML specification, but in the end it comes down to: for element and attribute names, use letters, digits and "-"; do not use "_". For attribute values, you must escape &, < and (depending on the quote style) either ' or ".
If you use one of the many libraries for writing XML / XHTML documents, somebody else has already taken care of this and you just have to tell the library to write text or elements. All the escaping is done in the background.
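To illustrate the point about libraries, here is a minimal Python sketch that escapes a code snippet destined for a <pre> element twice: once by hand with xml.sax.saxutils.escape, and once by handing the raw text to xml.etree.ElementTree and letting it serialize. The sample string and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Raw source code that should be shown verbatim inside <pre>.
raw = 'if (a < b && b > c) { print("ü"); }'

# Manual approach: escape() replaces &, < and > in text content.
manual = "<pre>" + escape(raw) + "</pre>"

# Library approach: assign the raw text and let the serializer escape it.
pre = ET.Element("pre")
pre.text = raw
serialized = ET.tostring(pre, encoding="unicode")

# Parsing the serialized markup gives back the original text unchanged.
round_trip = ET.fromstring(serialized).text

print(serialized)
```

Quotes are left alone in both cases, because they only need escaping inside attribute values, not in element text.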
Answer 3:
By writing "(X)HTML", you are asking (at least) two different questions.
By the HTML rules, with "HTML" meaning any HTML version up to and including HTML 4.01, only "<" and "&" are reserved. The rules are somewhat complex. These characters should not appear literally except in their syntactic use in tags, entity references, and character references. But by the formal rules, they may appear literally e.g. in the context "A & B" or "A < B" (but A&B would be formally wrong, and so would A<B).
The XHTML rules, based on XML, are somewhat stricter and simpler: "<" and "&" are unconditionally reserved.
The ASCII quotation mark " and the ASCII apostrophe ' are not reserved, except in the very specific sense that a quoted attribute value must not literally contain the character used as quote, i.e. in "foo" the string foo must not contain " as such and in 'foo' the string foo must not contain ' as such.
Answer 4:
The characters < > & " are reserved by the XML format.
This means that you can use the < and > characters only to define tags (<mytag></mytag>). Double quotes (") are used to delimit attribute values (<mytag attribute="value" />). The ampersand (&) is used to write entities (write &amp; when you actually want an ampersand, NOT a bare &). Also, when you write a URL in your XML document, you should use &amp;, not just &:
www.aaa.com?a=1&b=2 - is wrong;
www.aaa.com?a=1&amp;b=2 - is good!
XHTML is based on XML, so what I have written applies to XHTML.
&copy; &deg; &pound; - These are not reserved chars. These are entities defined specifically for XHTML, not for XML.
In XML you can simply write ©. In XHTML you can also simply write ©, or use the entity &copy;, or the numeric character reference &#xA9; (decimal: &#169;).
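The claims in this answer can be checked with a plain XML parser. The sketch below uses Python's xml.etree.ElementTree with no DTD loaded, so it behaves as generic XML rather than XHTML; the parses helper and the http:// prefix on the sample URL are additions made for the example.

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if doc is accepted by the XML parser."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A bare & in a URL is rejected; &amp; is accepted and decodes back to &.
bad_url_ok = parses('<a href="http://www.aaa.com?a=1&b=2">link</a>')
good_doc = '<a href="http://www.aaa.com?a=1&amp;b=2">link</a>'
href = ET.fromstring(good_doc).get("href")

# The literal character and both numeric references yield the same text.
copies = [ET.fromstring(f"<p>{ref}</p>").text for ref in ("©", "&#xA9;", "&#169;")]

# &copy; is not predefined in XML, so without the XHTML DTD it is an error.
named_ok = parses("<p>&copy;</p>")

print(bad_url_ok, href, copies, named_ok)
```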
Answer 5:
In addition to the other answers it might help to know that there are also forbidden characters: all control characters in ASCII and ISO-8859-1 except TAB, LF, and CR.
https://www.w3.org/MarkUp/html3/specialchars.html
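A rough way to look for such characters in a string is sketched below in Python. It assumes that "control characters in ASCII and ISO-8859-1" means the ranges 0x00-0x1F, 0x7F and 0x80-0x9F; that reading, and the function names, are assumptions of this example rather than something stated in the answer.

```python
# Controls that remain allowed: TAB, LF and CR.
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def is_control(ch: str) -> bool:
    """True for C0 controls, DEL, and C1 controls (assumed ranges)."""
    code = ord(ch)
    return code < 0x20 or 0x7F <= code <= 0x9F


def forbidden_controls(text: str) -> list:
    """Return (index, code point) pairs for every disallowed control character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_control(ch) and ch not in ALLOWED_CONTROLS
    ]


clean = forbidden_controls("ok\tline\nend\r")
dirty = forbidden_controls("a\x00b\x85")

print(clean, dirty)
```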
Source: https://stackoverflow.com/questions/10371493/what-are-the-reserved-characters-in-xhtml