Which are the HTML, and XML, special characters?

Submitted by 随声附和 on 2019-11-27 07:36:21

First, you're comparing an HTML 4.01 specification with an HTML 5 one. HTML 5 ties in more closely with XML than HTML 4.01 ever did (that's why we have XHTML), so this answer will stick to HTML 5 and XML.

Your quoted references are all consistent on the following points:

  • < should always be represented with &lt; when it is not part of markup (such as a tag or processing instruction)
  • > should always be represented with &gt; when it is not part of markup (such as a tag or processing instruction)
  • & should always be represented with &amp;
  • the exception is content within <![CDATA[ ]]> (which only applies to XML)

I agree 100% with this. You never want the parser to mistake literals for instructions, so it's a solid idea to always encode these three characters (the space is a separate matter; see below). Good parsers know that anything contained within <![CDATA[ ]]> is not an instruction, so the encoding is not necessary there.
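As a rough illustration of the rules above, here is a minimal Python sketch using only the standard library. The sample string and the <tag> element name are made up for the example; html.escape and xml.sax.saxutils.escape are just one convenient way to do the encoding, not the only one.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample text containing all three special characters.
text = "if a < b && b > c"

# Encode &, < and > for use as XML text content.
xml_text = escape(text)

# Same idea for HTML text content; quote=False leaves ' and " alone.
html_text = html.escape(text, quote=False)

# Inside a CDATA section the characters can stay literal (XML only).
cdata_doc = "<tag><![CDATA[" + text + "]]></tag>"
escaped_doc = "<tag>" + xml_text + "</tag>"

# Both documents parse back to the same original text.
from_cdata = ET.fromstring(cdata_doc).text
from_escaped = ET.fromstring(escaped_doc).text
```

Both forms round-trip to the same string, which is the point: the encoding is only there to keep the parser from reading the literals as markup.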

In practice, I never encode ' or " unless

  • it appears within the value of an attribute (XML or HTML)
  • it appears within the text of XML tags. (<tag>&quot;Yoinks!&quot;, he said.</tag>)

Both specifications also agree with this.
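For the attribute case, a small sketch along the same lines, again in Python with the standard library only. The attribute name "note" and the sample value are hypothetical; quoteattr is used here because it picks the surrounding quote character and escapes as needed.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical attribute value containing a double quote and an ampersand.
value = 'He said "Yoinks!" & left'

# html.escape with its default quote=True also encodes " and '.
html_attr = html.escape(value)

# quoteattr escapes the value and wraps it in suitable quotes for XML.
xml_attr = quoteattr(value)

# Build an element with the quoted attribute and parse it back.
element = "<tag note=" + xml_attr + "/>"
round_trip = ET.fromstring(element).get("note")
```

If the quotes in the value were left unencoded inside a double-quoted attribute, the parser would treat the first one as the end of the attribute, which is why this is the one place where encoding them matters.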

So, the only point of contention is the space character. Both specifications mention it only in the context of serialization; in every other case, you should use a literal space. Unless you are writing your own parser, I don't see the need to be doing any kind of serialization, so this is beside the point.
