Why aren't browsers strict about HTML? [closed]

问题

It's a well known fact that browsers will accept invalid HTML and do their best trying to make sense out of it. If you create a web page containing only the following code:

<html>
    <head>
        <title>This is bad HTML</title>
    <body>
        <h1>Bad HTML</h2>
        <p>This is a paragraph
    </body>

then you will get a webpage parsed in a way that will show an acceptable view. Whether it is what you meant or not, depends on each browser's understanding of your mistakes.

This, to me, is the same as if Javascript could be written like this:

if (some_var == 1) {
    say_something("some text');
else {
    do_something_else();
// END OF CODE

which, a Javascript compiler written with the same effort to make sense out of invalid code could proably parse as you meant - or make its own sense but run it after all.

I've seen several articles and questions regarding the question "Is it even worth it writting valid HTML?", which present several opinions on the pros and cons of writting valid HTML. However, what this really makes me wonder is:

Why are browsers accepting invalid HTML in the first place?

NOTE: The following questions are not more questions, but a way to give context to the only question I'm asking here:

Why aren't browsers strict?
Why don't they reject with errors invalid code, just like any other programming language? (not that I'm calling HTML a programming language, but you get the point)
Wouldn't that force all developers to write HTML code that will be interpreted exactly the same in any browser?
If browsers refused to parse invalid markup, wouldn't that effectively result in valid markup everywhere and from anyone wanting to publish content in the web?
If this comes from historical reasons and backward compatibility, isn't it time already to change when we already see sites like adsense.google.com refusing compatibility with IE < v10?

EDIT: Those voting to close this question, please reconsider. This is not a broad question neither is a opinion based one. It's a very specific question on a very specific subject, completely related to the programming world and that can definitely be answered with a real answer by those who actually know it. Thanks.

回答1:

I don't know why they allowed it from the start, but here is why they cant switch now: Legacy Support. If a browser forced strict html, huge parts of the internet would just break, and yes some people would update their code, but some pages would just be lost. There is no incentive for browsers to do this because it would seem to the consumer that browser just doesn't work on some pages and would switch to another that still supports less optimal html.

Basically because it was allowed from the beginning, now it has to be allowed now.

回答2:

"Why are browsers accepting invalid HTML in the first place?"

For compatibility reasons, and in the case of newer browsers, because HTML5 dictates an algorithm for parsing even invalid documents.

Earlier HTML specifications were ambiguous on many situations, such as what happens when the wrong tag is seen, or inconsistent nesting of tags, such as <b><i></b></i>. Even so, many documents "just work" because some earlier browsers ignore unexpected tags or even "correct" incorrect nesting.

But now the HTML5 specification includes a much less ambiguous algorithm for parsing HTML documents. Note that the algorithm includes points where "parse errors" can occur. But these parse errors usually don't stop a modern browser from displaying an HTML document, although the browser is free to display parse errors in its developer tools if it chooses to:

[U]ser agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification. [Emphasis added.]

But again, no modern browser, to my knowledge, aborts parsing a document this early because of parse errors (barring extraordinary situations, such as running out of memory).

On the adsense.google.com situation: This probably has nothing to do with invalid HTML, but rather, perhaps, because IE9 and earlier's DOM support is not sufficient for adsense.google.com's needs.

回答3:

To avoid opinion-based answers, this type of question requires an answer based on an authorative reference with credible and/or official sources.

The following excerpts are quotes from W3C Validator Help & FAQ that addresses Why are browsers accepting invalid HTML in the first place? and some other demonstrated concerns related to that.

About Markup

Most pages on the World Wide Web are written in computer languages (such as HTML) that allow Web authors to structure text, add multimedia content, and specify what appearance, or style, the result should have.

As for every language, these have their own grammar, vocabulary and syntax, and every document written with these computer languages are supposed to follow these rules. The (X)HTML languages, for all versions up to XHTML 1.1, are using machine-readable grammars called DTDs, a mechanism inherited from SGML.

However, Just as texts in a natural language can include spelling or grammar errors, documents using Markup languages may (for various reasons) not be following these rules.

[...]

Concepts

One of the important maxims of computer programming is: "Be conservative in what you produce; be liberal in what you accept."

Browsers follow the second half of this maxim by accepting Web pages and trying to display them even if they're not legal HTML. Usually this means that the browser will try to make educated guesses about what you probably meant. The problem is that different browsers (or even different versions of the same browser) will make different guesses about the same illegal construct; worse, if your HTML is really pathological, the browser could get hopelessly confused and produce a mangled mess, or even crash.

That's why you want to follow the first half of the maxim by making sure your pages are legal HTML.

[...]

Validity might not mean quality, and invalidity might not mean poor quality

A valid Web page is not necessarily a good web page, but an invalid Web page has little chance of being a good web page.

For that reason, the fact that the W3C Markup Validator says that one page passes validation does not mean that W3C assesses that it is a good page. It only means that a tool (not necessarily without flaws) has found the page to comply with a specific set of rules. No more, no less. This is also why the "valid ..." icons should never be considered as a "W3C seal of quality".

Unexpected browser behavior might mean that they actually don't accept invalid markup

While contemporary Web browsers do an increasingly good job of parsing even the worst HTML “tag soup”, some errors are not always caught gracefully. Very often, different software on different platforms will not handle errors in a similar fashion, making it extremely difficult to apply style or layout consistently.

Using standard, interoperable markup and stylesheets, on the other hand, offers a much greater chance of having one's page handled consistently across platforms and user-agents.

[...]

Compatibility problems

Checking that a page “displays fine” in several contemporary browsers may be a reasonable insurance that the page will “work” today, but it does not guarantee that it will work tomorrow.

In the past, many authors who relied on the quirks of Netscape 1.1 suddenly found their pages appeared totally blank in Netscape 2.0. Whilst Internet Explorer initially set out to be bug-compatible with Netscape, it too has moved towards standards compliance in later releases.

[...]

Relying too much on 3rd party tools

The answer to this one is that markup languages are no more than data formats. So a website doesn't look like anything at all! It only takes on a visual appearance when it is presented by your browser.

In practice, different browsers can and do display the same page very differently. This is deliberate, and doesn't imply any kind of browser bug. A term sometimes used for this is WYSINWOG - What You See Is Not What Others Get (unless by coincidence). It is indeed one of the principal strengths of the web, that (for example) a visually impaired user can select very large print or text-to-speech without a publisher having to go to the trouble and expense of preparing a separate edition.

来源：https://stackoverflow.com/questions/25559999/why-arent-browsers-strict-about-html

标签

html

html5

dom

xhtml

xhtml-1.0-strict