I'm using an '&' symbol with HTML5 and UTF-8 in my site's
. Google shows the ampersand fine on its SERPs, as do all the browse
I think this has turned into more of a question of "why follow the spec when browsers don't care?" Here is my generalized answer:
Standards are not a "present" thing. They are a "future" thing. If we, as developers, follow web standards, then browser vendors are more likely to correctly implement those standards, and we move closer to a completely interoperable web, where CSS hacks, feature detection, and browser detection are not necessary. Where we don't have to figure out why our layouts break in a particular browser, or how to work around that.
Specifically, if HTML5 does not require using &amp; in your specific situation, and you're using an HTML5 doctype (and also expecting your users to be using HTML5-compliant browsers), then there is no reason to do it.
The link has a fairly good example of when and why you may need to escape & to &amp;
https://jsfiddle.net/vh2h7usk/1/
Interestingly, I had to escape the character in order to represent it properly in my answer here. If I use the built-in code sample option (from the answer panel), I can just type in &amp; and it appears as it should. But if I manually use the <code></code> element, then I have to escape it in order to represent it correctly :)
In HTML a & marks the beginning of a reference, either a character reference or an entity reference. From that point on the parser expects either a # denoting a character reference, or an entity name denoting an entity reference, both followed by a ;. That's the normal behavior.
But if the reference name, or just the reference-opening &, is followed by white space or another delimiter such as ", ', <, >, or &, the ending ; and even the reference used to represent a plain & can be omitted:
<p title="&amp;">foo &amp; bar</p>
<p title="&amp">foo &amp bar</p>
<p title="&">foo & bar</p>
Only in these cases can the ending ; or even the reference itself be omitted (at least in HTML 4). I think HTML 5 requires the ending ;.
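To see that leniency in practice, here is a minimal sketch using Python's standard html.unescape, which decodes character references following the HTML5 rules for text content (it does not model the special handling inside attribute values). The sample strings are made up for illustration.

```python
import html

# Three spellings of the same text: a full reference, a reference
# without the closing semicolon, and a bare ampersand.
samples = ["foo &amp; bar", "foo &amp bar", "foo & bar"]

# html.unescape applies HTML5-style decoding for text content.
decoded = [html.unescape(s) for s in samples]

for raw, out in zip(samples, decoded):
    print(f"{raw!r} -> {out!r}")
```

All three inputs decode to the same string, which is why the shortcut appears to work. It only does so because the character after the ampersand is a space.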
But the specification recommends always using a reference, such as the character reference &#38; or the entity reference &amp;, to avoid confusion:
Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.
Yes, you should try to serve valid code if possible.
Most browsers will silently correct this error, but there is a problem with relying on the error handling in the browsers. There is no standard for how to handle incorrect code, so it's up to each browser vendor to try to figure out what to do with each error, and the results may vary.
Some examples where browsers are likely to react differently are elements placed inside a table but outside the table cells, and links nested inside each other.
For your specific example it's not likely to cause any problems, but error correction in the browser might for example cause the browser to change from standards compliant mode into quirks mode, which could make your layout break down completely.
So, you should correct errors like this in the code, if for nothing else than to keep the error list in the validator short, so that you can spot more serious problems.
Yes. Just as the error said, in HTML, attribute values are parsed for character references (in HTML 4 terms they are CDATA rather than #PCDATA). This means you can use character entities in the attributes. Using & by itself is wrong and, if not for lenient browsers and the fact that this is HTML and not XHTML, would break the parsing. Just escape it as &amp; and everything will be fine.
HTML5 allows you to leave it unescaped, but only when the data that follows does not look like a valid character reference. However, it's better just to escape all instances of this symbol than worry about which ones should be and which ones don't need to be.
Keep this point in mind: if you're not escaping & to &amp;, that's bad enough for data that you create (where the code could very well be invalid), but you might also not be escaping tag delimiters. That is a huge problem for user-submitted data, and it could very well lead to HTML and script injection, cookie stealing and other exploits.
Please just escape your code. It will save you a lot of trouble in the future.
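As a rough illustration of escaping user-submitted data, the sketch below uses Python's standard html.escape, which replaces &, <, > and (by default) both quote characters. The render_comment helper and the sample input are hypothetical, and a real application would more likely rely on its template engine's auto-escaping.

```python
import html


def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a paragraph, escaping &, <, > and quotes."""
    return f"<p>{html.escape(user_text)}</p>"


# Hypothetical hostile input containing a tag and a bare ampersand.
malicious = '<script>alert("x")</script> & more'
safe = render_comment(malicious)
print(safe)
```

The same call handles the ampersand and the tag delimiters, so there is no need to decide case by case which characters need escaping.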
HTML5 rules are different from HTML4. Escaping is not required in HTML5, unless the ampersand looks like it starts a parameter name. "&copy=2" is still a problem, for example, since &copy; is the copyright symbol.
However it seems to me that it's harder work to decide to encode or not to encode depending on the following text. So the easiest path is probably to encode all the time.
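The ambiguity can be reproduced with Python's html.unescape, which follows the HTML5 decoding rules for text content. This is only a sketch: the query string is a made-up example, and browsers apply a slightly different rule inside attribute values, so an href may behave differently from text in the page body.

```python
import html

# Hypothetical query string with a parameter that happens to be named "copy".
url = "page.php?id=1&copy=2"

# Left unescaped, "&copy" is read as the legacy copyright reference.
as_written = html.unescape(url)

# Escaped first, the ampersand survives decoding unchanged.
round_trip = html.unescape(html.escape(url))

print(as_written)
print(round_trip)
```

Escaping every ampersand makes the second result the only possible one, whatever the parameter is called.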