Is there a machine-readable version of HTML5 specs?

日久生厌 2020-12-28 18:51

I'm looking for a machine-readable version of the HTML5 specs, akin to a DTD, although any format would do as long as it's parsable.

The HTML5 specs don't seem to

5 Answers
  • 2020-12-28 19:26

    Trawling through the W3C's site, I can only see two things of interest on this:

    • "As HTML5 is no longer formally based upon SGML, the DOCTYPE no longer serves this purpose, and thus no longer needs to refer to a DTD." from the HTML5 working draft. It doesn't say there isn't one, just that clients don't need one.
    • HTML5 is still a working draft, not a specification, which implies a DTD may be published later.

    I've looked as hard as you probably have with nothing concrete. I think validator.nu's approach is the best as the working draft is likely to change several times before a specification is ever agreed upon. If someone did publish an unofficial DTD it would need constant maintenance.

    +1 great question, I wish I could find a concrete answer. I hope someone else can!
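    If you follow the validator.nu route, you can call the checker programmatically instead of relying on a DTD. What follows is a minimal sketch, assuming the service accepts a POSTed document with `Content-Type: text/html` and returns JSON when called with `?out=json`. Please confirm this against the Nu HTML Checker's web-service documentation. The helper names `build_request` and `summarize`, and the assumed response fields (`messages`, `type`, `message`, `lastLine`), are illustrative assumptions:

```python
import json
import urllib.request

# Assumed endpoint: the Nu HTML Checker web service with JSON output.
CHECKER_URL = "https://validator.nu/?out=json"


def build_request(html: str, url: str = CHECKER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the document body."""
    return urllib.request.Request(
        url,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=utf-8"},
        method="POST",
    )


def summarize(response_json: str) -> list:
    """Turn an (assumed) checker JSON reply into 'line N: message' strings.

    Assumes a top-level "messages" list whose entries carry "type",
    "message" and, optionally, "lastLine". Only "error" entries are kept.
    """
    data = json.loads(response_json)
    out = []
    for msg in data.get("messages", []):
        if msg.get("type") == "error":
            out.append(f"line {msg.get('lastLine', '?')}: {msg.get('message', '')}")
    return out


if __name__ == "__main__":
    req = build_request("<!DOCTYPE html><title>t</title><p>Hello")
    # Real network call; uncomment to query the service.
    # with urllib.request.urlopen(req) as resp:
    #     print(summarize(resp.read().decode("utf-8")))
    sample = '{"messages": [{"type": "error", "lastLine": 1, "message": "Stray end tag div."}]}'
    print(summarize(sample))
```

    Keeping request building and response parsing in separate functions lets you test the parsing offline against saved replies.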

  • 2020-12-28 19:30

    NEW as of April 2019: the WHATWG HTML5 spec as JSON. It is very incomplete and a work in progress.

    It uses Python to parse the multipage version of the standard.

    Full disclosure: I made this.

    See also

    HTML5 RELAX NG schemas
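    The spec-to-JSON idea in this answer (scrape the standard's HTML with Python, emit JSON) can be sketched with only the standard library. This is not the linked project's code. `SPEC_SNIPPET` is a hypothetical, simplified stand-in for the spec's per-element definition blocks, in which an element name sits in a `<dfn>` and each content attribute is a `<code>` inside a `<dd>`. The real WHATWG markup differs and would need its own selectors:

```python
import json
from html.parser import HTMLParser

# Hypothetical, simplified stand-in for element definitions in the spec.
SPEC_SNIPPET = """
<h4 id="the-a-element"><dfn>a</dfn> element</h4>
<dl class="element">
  <dt>Content attributes:</dt>
  <dd><code>href</code></dd>
  <dd><code>target</code></dd>
</dl>
<h4 id="the-img-element"><dfn>img</dfn> element</h4>
<dl class="element">
  <dt>Content attributes:</dt>
  <dd><code>alt</code></dd>
  <dd><code>src</code></dd>
</dl>
"""


class ElementAttrExtractor(HTMLParser):
    """Collect {element_name: [attribute, ...]} from the simplified markup."""

    def __init__(self):
        super().__init__()
        self.elements = {}
        self.current = None       # element whose section we are inside
        self.in_dfn = False       # text inside <dfn> is an element name
        self.in_dd = False
        self.in_dd_code = False   # text inside <dd><code> is an attribute name

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            self.in_dfn = True
        elif tag == "dd":
            self.in_dd = True
        elif tag == "code" and self.in_dd:
            self.in_dd_code = True

    def handle_endtag(self, tag):
        if tag == "dfn":
            self.in_dfn = False
        elif tag == "dd":
            self.in_dd = False
        elif tag == "code":
            self.in_dd_code = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_dfn:
            self.current = text
            self.elements.setdefault(text, [])
        elif self.in_dd_code and self.current is not None:
            self.elements[self.current].append(text)


def spec_to_json(html: str) -> str:
    """Parse the (simplified) spec markup and serialize it as JSON."""
    parser = ElementAttrExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.elements, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(spec_to_json(SPEC_SNIPPET))
```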

  • 2020-12-28 19:31

    I've read this question and its answers and decided to start a new project: WHATWG HTML5 Standard Parser. Currently, it parses the single-page version of the HTML standard and provides the elements together with their allowed attributes.

    Hope to get something started... Pull requests are welcome!!!
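    An element-to-allowed-attributes table like the one this project produces can drive a simple checker that flags unknown elements and attributes. Here is a minimal sketch. The `ALLOWED` table and the small `GLOBAL_ATTRS` set are tiny hypothetical excerpts, not data from the linked project. Real HTML has many more global attributes, including event handlers:

```python
from html.parser import HTMLParser

# Hypothetical excerpts; real data would come from parsing the standard.
GLOBAL_ATTRS = {"class", "id", "lang", "title"}
ALLOWED = {
    "a": {"href", "target", "rel"},
    "img": {"alt", "src", "width", "height"},
    "p": set(),
}


class AttributeChecker(HTMLParser):
    """Report elements or attributes that are missing from the ALLOWED table."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag not in ALLOWED:
            self.problems.append(f"unknown element <{tag}>")
            return
        for name, _value in attrs:
            allowed = name in ALLOWED[tag] or name in GLOBAL_ATTRS
            if not allowed and not name.startswith("data-"):
                self.problems.append(f"attribute '{name}' not allowed on <{tag}>")


def check(html: str) -> list:
    checker = AttributeChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    print(check('<p class="x"><a hreff="/">x</a><img src="a.png" alt=""><blink>hi</blink></p>'))
```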

  • 2020-12-28 19:32

    UPDATE

    Since 2014-10-28, HTML5 has been a W3C Recommendation (!)... But this question is not obsolete (validators are now more complex than a simple DTD).

    ANSWER

    There is no simple parser, as @ruediste's clues show... Today, perhaps the best parser is at https://validator.nu/ ... so:

    1. You show the first part of the answer: validation requires a complex parser, and validator.nu is a good one.
    2. The W3C Recommendation of 2014-10-28 confirms that there is no simple parser (like a DTD or a list of elements) that can say "this is valid HTML5".
    3. ... this other question shows that, perhaps, only context (use/community) can validate the list of tags and attributes.
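    Point 2 can be illustrated with a toy example. A flat list of known elements cannot express content-model rules, such as the rule that an `<a>` must not contain another `<a>`. This sketch uses a hypothetical whitelist plus a single hand-written nesting rule. It is not how validator.nu works internally. It only shows that a whitelist check passes markup that a structural rule rejects:

```python
from html.parser import HTMLParser

KNOWN_ELEMENTS = {"p", "a", "span", "em", "br"}  # hypothetical whitelist
VOID = {"br", "img", "hr", "input", "meta", "link"}  # never pushed on the stack


class NestingChecker(HTMLParser):
    """Whitelist check plus one content-model rule: no <a> inside <a>."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.unknown = []
        self.nested_links = 0

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_ELEMENTS:
            self.unknown.append(tag)
        if tag == "a" and "a" in self.stack:
            self.nested_links += 1
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Naive recovery: pop back to the matching open tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def analyze(html: str) -> dict:
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return {"unknown": checker.unknown, "nested_links": checker.nested_links}


if __name__ == "__main__":
    # Every element is on the whitelist, yet the nesting rule is violated.
    print(analyze('<p><a href="/x">outer <a href="/y">inner</a></a></p>'))
```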
  • 2020-12-28 19:34

    There isn't a BNF/CFG for HTML5 because HTML5 is partially about progressive enhancement and fixing errors silently. If a page features broken markup, it's the browser's duty to display the page as well as it can and not complain to the user.

    More about this history can be read at Dive Into HTML5 / How Did We Get Here?:

    As you might expect, the fact that “broken” HTML markup still worked in web browsers led authors to create broken HTML pages. A lot of broken pages. By some estimates, over 99% of HTML pages on the web today have at least one error in them. But because these errors don’t cause browsers to display visible error messages, nobody ever fixes them.

    I guess this isn't particularly helpful, so my apologies. You could try looking at the XHTML 1.1 DTD or SGML DTD as starting points. Or, if you want a heuristic-based best-attempt approach, check out an HTML parser such as Beautiful Soup.
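    Beautiful Soup is a third-party package. The same "accept whatever is there" behaviour can be sketched with the standard library's `html.parser`, which Beautiful Soup can also use as a backend. The parser below does not raise on broken markup. It reports what it saw, which open tags were closed implicitly, and which end tags were stray. This is a simple heuristic and does not reproduce the HTML5 tree-construction algorithm that browsers use:

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TagCollector(HTMLParser):
    """Record tags from possibly broken markup without ever failing."""

    def __init__(self):
        super().__init__()
        self.seen = []
        self.open_tags = []
        self.implicitly_closed = []
        self.stray_end_tags = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> and similar as start tags only (no matching end tag).
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            self.stray_end_tags.append(tag)
            return
        # Close everything opened after the matching start tag.
        while True:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.implicitly_closed.append(top)


def survey(html: str) -> dict:
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return {
        "seen": collector.seen,
        "implicitly_closed": collector.implicitly_closed,
        "stray": collector.stray_end_tags,
        "unclosed": collector.open_tags,
    }


if __name__ == "__main__":
    print(survey("<div><p>Unclosed paragraph<b>bold</i></div>"))
```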
