What are the longest and shortest HTML character entity names? [closed]

会有一股神秘感。 Submitted on 2019-12-24 09:06:05

Question


There are a million cheat sheets all around the tubes that enumerate, with varying degrees of completeness, the character entities specified by the various versions of HTML. I don't want to trust any particular one of them, so I figured I'd toss the question out here and see if anyone posts a more authoritative answer.

So, let's assume that I want to match any and all character references and entities using a regular expression. I'd start with /&(?:#(?:x[0-9a-f]+|[0-9]+)|[a-z]{???,???});/i. But what would go in place of the ???s? I can think of entities that are two characters long, like lt and gt, but are there any one-letter entities in any HTML specification? Likewise, what is the longest entity? Finally, those are the only three syntaxes for expressing literal characters in HTML aside from just typing them directly, are they not?

Cheers!


Answer 1:


Longest in HTML5 is &CounterClockwiseContourIntegral;, and there are no one-letter names.

But note that named entity references don't work the way you think. Some named character references are recognized without a trailing semicolon, so a regex of that shape won't cut the mustard.
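As a rough cross-check of the claims above, here is a small sketch that assumes Python's standard `html.entities.html5` table mirrors the HTML5 list of named character references. It looks up the longest and shortest names and the legacy names that are listed without a trailing semicolon. The variable names are only for illustration.

```python
import html
from html.entities import html5  # keys look like "amp;" and, for legacy names, "amp"

# Strip the optional trailing semicolon to get the bare names.
names = {key.rstrip(";") for key in html5}

longest = max(names, key=len)
shortest_len = min(len(name) for name in names)

# Legacy names that the table also lists without a trailing semicolon.
no_semicolon = sorted(key for key in html5 if not key.endswith(";"))

print(longest, len(longest))
print(shortest_len, sorted(n for n in names if len(n) == shortest_len)[:5])
print(len(no_semicolon), no_semicolon[:5])

# html.unescape follows the HTML5 rules, so "&amp" decodes even without ";".
print(html.unescape("AT&amp;T and AT&ampT"))
```

If the table matches the spec, the longest name should have 31 letters and the shortest names two.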




Answer 2:


The HTML5 spec now explicitly describes what browsers have done as error correction since the mid-90s: show the text verbatim if it doesn't match a known character reference. Therefore, if you want your regex to work like a browser, you have to copy the browsers' behaviour.

That means you have to test against a complete list of known references, like the one mentioned by Jukka. You can shorten the pattern with clever use of character classes and grouping,

[aeiou]uml

but you need to bake the same knowledge that the browser has into the regex in order to get the same result.

Edit: By the way, named entities might also have digits in them, e.g., &emsp13;.
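The idea of testing against the complete list can be sketched in Python, again assuming that the standard `html.entities.html5` table is an acceptable stand-in for the spec's list. The name `CHAR_REF` and the sample string are made up for illustration, and the sketch ignores the spec's extra rules for references inside attribute values.

```python
import re
from html.entities import html5  # keys such as "amp;", "amp", "frac12;"

# Try longer names first, so "notin;" is tested before the legacy "not".
names = sorted(html5, key=len, reverse=True)

# Numeric references (hex or decimal) or any name from the known list.
CHAR_REF = re.compile(
    r"&(?:#[xX][0-9a-fA-F]+;|#[0-9]+;|(?:%s))" % "|".join(map(re.escape, names))
)

sample = "a &amp; b &foo; c &copy d &#x41; e &frac12; f &notin;"
print(CHAR_REF.findall(sample))
```

The unknown `&foo;` is skipped, while the semicolon-less `&copy` and the digit-bearing `&frac12;` are matched. A hand-abbreviated pattern would be shorter, but generating the alternation from the table keeps it in step with the list.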




Answer 3:


Entity names used to have 2 to 8 characters (the longest being thetasym), following SGML tradition, and this is still the case in the HTML 4.01 specification (and XHTML specifications). But HTML5 drafts add a large number of entities, called named character references there, and some of them are fairly long, like EmptyVerySmallSquare. So it would be better to avoid any fixed upper limit, or a lower limit larger than 1.
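For the HTML 4 side, a quick sketch, assuming Python's `html.entities.name2codepoint` reflects the HTML 4.01 entity tables, reports the range of name lengths and the names that contain digits:

```python
from html.entities import name2codepoint  # HTML 4 names, without "&" or ";"

lengths = [len(name) for name in name2codepoint]

# Names that reach the maximum length, and names containing a digit.
longest = sorted(n for n in name2codepoint if len(n) == max(lengths))
with_digits = sorted(n for n in name2codepoint if any(c.isdigit() for c in n))

print(min(lengths), max(lengths), longest)
print(with_digits)
```

Names such as frac12 and there4 show up in the second list, so a letters-only character class such as [a-z] would miss some entities even in HTML 4.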



Source: https://stackoverflow.com/questions/12566098/what-are-the-longest-and-shortest-html-character-entity-names
