What are all the HTML escaping contexts?

予麋鹿 2021-02-09 03:45

When outputting HTML, there are several different places where text can be interpreted as control characters rather than as text literals. For example, in "regular" text (tha

5 answers
  •  终归单人心
    2021-02-09 04:13

    The above contexts clearly have different rules about what needs to be escaped.

    I'm not sure that the different elements have different encoding rules like you say. All the examples you list require the HTML encoding.

    E.g.

    Fish &amp; Chips

    Awesome picture of Meat Pie &amp; Chips Fish &amp; Chips

    The last example also includes some URL encoding for the ampersand (&), and it's at this point that things get hairy: the ampersand is being sent as data, which is why it must be encoded.

    So my first question is, are there any other contexts in HTML in which characters can be interpreted as markup/control characters?

    Anywhere within the HTML document, if the control characters are not being used as control characters, you should encode them (as a good rule of thumb). Most of the time, it's HTML encoding: &amp;, &gt;, etc. Other times, when passing these characters via a URL, use URL encoding: %20, %26, etc.
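
    To make the two encodings concrete, here is a minimal sketch in Python (my choice of language; the question doesn't name one). It uses the standard library's html.escape and urllib.parse.quote. The "search?q=...&page=2" URL and the variable names are made up for illustration.

```python
import html
from urllib.parse import quote

label = "Fish & Chips"

# HTML encoding: for text that sits in element content or attribute values.
html_text = html.escape(label)

# URL encoding: for data carried inside a URL, here a query-string value.
url_value = quote(label, safe="")

# Hypothetical URL: the raw "&" separates parameters, while "%26" is data.
url = "search?q=" + url_value + "&page=2"

# The finished URL is itself attribute text, so it is HTML-encoded as well.
link = '<a href="' + html.escape(url) + '">' + html_text + "</a>"
print(link)
```

    Note the layering: the ampersand that is data becomes %26 inside the URL, and the ampersand that separates parameters becomes &amp; once the URL is placed in the href attribute.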

    The second question is, what are the canonical, globally-safe lists of characters (for each context) that need to be escaped to ensure that any embedded text is treated as non-markup?

    I'd say that the Wikipedia article has a few good comments on it and might be worth a read; the W3Schools article is, I guess, a good starting point too. Most languages have built-in functions to prepare text as safe HTML, so it may be worth checking your language of choice (if you are indeed using a scripting language and not hand-coding the HTML).

    Specifically, Wikipedia says: "Characters <, >, " and & are used to delimit tags, attribute values, and character references. Character entity references &lt;, &gt;, &quot; and &amp;, which are predefined in HTML, XML, and SGML, can be used instead for literal representations of the characters."
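
    As a rough illustration of that list, here is a hand-rolled Python sketch that replaces only those four characters. The names ENTITY_FOR and escape_minimal are my own, not from any library; in real code the built-in html.escape is the safer choice (it also escapes the single quote by default).

```python
import html

# The four characters from the Wikipedia quote and their predefined entities.
ENTITY_FOR = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

def escape_minimal(text: str) -> str:
    # One pass over the input, so the "&" in an entity that was just
    # emitted is never escaped a second time.
    return "".join(ENTITY_FOR.get(ch, ch) for ch in text)

sample = '<a href="x">Tom & Jerry</a>'
escaped = escape_minimal(sample)
print(escaped)

# For input without single quotes, the built-in gives the same result.
print(escaped == html.escape(sample))
```

    If you do this with chained string replacements instead of a single pass, the ampersand has to be replaced first, otherwise the & in &lt; and friends gets encoded again.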

    For URL Encoding, this article seems a good starting point.

    Closing thoughts, as I've already rambled a bit: this all excludes XML / XHTML, which brings a whole other ballgame to the court with its requirement that pretty much the world and its dog be encoded. If you are using a scripting language and writing out a variable via that, I'm pretty sure it'll be easier to find the built-in function, or download a library that'll do this for you. :) I hope this answer was scoped OK and didn't miss the point of the question or come across in the wrong tone. :)
