For HTTP responses with Content-Types suggesting character data, which charset should be assumed by the client if none is specified?

说谎 2021-02-07 23:08

If no charset parameter is specified in the Content-Type header, RFC 2616 section 3.7.1 seems to imply that ISO-8859-1 should be assumed for media subtypes of the "text" type.

6 Answers
  •  失恋的感觉
    2021-02-07 23:31

    In the absence of the charset parameter, the character encoding can be specified in the content itself. Here are the approaches taken by several content types:

    HTML - Via the meta tag:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    

    HTML5 variant:

    <meta charset="utf-8">
    

    XML (XHTML, KML) - Via the XML declaration:

    <?xml version="1.0" encoding="UTF-8"?>
    

    Text - Via the byte order mark (BOM). For example, a UTF-8 BOM is the first three bytes of the file; in hexadecimal:

    EF BB BF
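The in-content declarations listed above (meta tag, XML declaration, byte order mark) can be sniffed with a few byte-level checks. This is a minimal sketch, not a conforming parser: the function name `sniff_declared_charset` and the regular expressions are hypothetical simplifications, and the 1024-byte window only loosely mirrors the HTML Standard's meta prescan. Only the three BOMs the WHATWG Encoding Standard recognizes (UTF-8, UTF-16BE, UTF-16LE) are checked. For HTML, a browser would also consult the HTTP Content-Type header after the BOM and before the meta tag; this sketch looks at the body only.

```python
import re
from typing import Optional

# The three byte order marks recognized by the WHATWG Encoding Standard.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# <?xml version="1.0" encoding="UTF-8"?> is only valid at the very start of the document.
XML_DECL = re.compile(rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

# Matches both <meta charset="utf-8"> and
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def sniff_declared_charset(body: bytes, prescan_limit: int = 1024) -> Optional[str]:
    """Return the charset declared inside the content, or None if none is found."""
    # 1. A byte order mark takes precedence over any in-document declaration.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. XML declaration: re.match anchors it at offset 0.
    m = XML_DECL.match(body)
    if m:
        return m.group(1).decode("ascii").lower()
    # 3. HTML meta tag, searched only within the first prescan_limit bytes.
    m = META_CHARSET.search(body[:prescan_limit])
    if m:
        return m.group(1).decode("ascii").lower()
    return None
```

The returned label can be passed straight to `bytes.decode` for common charsets such as `utf-8` or `windows-1252`; a `None` result means the content declares nothing and the client must fall back on the header or its own heuristics.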
    

    Independently of the character set associated with the document, note also that non-ASCII characters can be written as sequences of ASCII characters, in several ways:

    HTML - Via character references:

    &#nnnn;
    &#xhhhh;
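To illustrate the HTML forms in Python: encoding with the standard `xmlcharrefreplace` error handler turns every character outside ASCII into a decimal `&#nnnn;` reference, and `html.unescape` resolves decimal, hexadecimal (`&#xhhhh;`) and named references back into characters. The helper names `to_ascii_html` and `from_html` are hypothetical.

```python
import html


def to_ascii_html(text: str) -> str:
    # Hypothetical helper: every non-ASCII character becomes a decimal &#nnnn; reference.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_html(text: str) -> str:
    # Hypothetical helper: resolves &#nnnn;, &#xhhhh; and named references such as &eacute;.
    return html.unescape(text)


print(to_ascii_html("café €5"))          # caf&#233; &#8364;5
print(from_html("caf&#233; &#x20AC;5"))  # café €5
```

The escaped form is pure ASCII, so it survives even if the client guesses the document's charset wrongly (as long as the guess is ASCII-compatible).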
    

    XML - Via character references:

    &amp;
    &defined-entity;
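A quick check with Python's standard `xml.etree.ElementTree`: the document below is pure ASCII bytes, yet the parsed text contains non-ASCII characters, because the parser resolves the numeric references, the predefined `&amp;` entity and an entity declared in the document's own internal DTD subset. The entity name `eacute` is only an illustration; XML predefines just `&lt;`, `&gt;`, `&amp;`, `&apos;` and `&quot;`, so any other name has to be declared.

```python
import xml.etree.ElementTree as ET

# Pure-ASCII bytes, so the declaration can truthfully say US-ASCII.
# "eacute" is an illustrative entity declared in the internal DTD subset.
doc = b"""<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE note [<!ENTITY eacute "&#233;">]>
<note>caf&#233; &#xE9; &amp; caf&eacute;</note>"""

root = ET.fromstring(doc)
print(root.text)  # café é & café
```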
    

    JSON - Via the escaping mechanism:

    \u005C
    \uD834\uDD1E
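In Python's standard `json` module this escaping is the default (`ensure_ascii=True`): non-ASCII characters are written as `\uXXXX`, and a character outside the Basic Multilingual Plane, such as U+1D11E (musical symbol G clef), becomes the UTF-16 surrogate pair shown above. Decoding accepts either hex case and recombines the pair into one code point.

```python
import json

text = "G clef: \U0001D11E, backslash: \\"

# ensure_ascii=True (the default) escapes non-ASCII as \uXXXX;
# a non-BMP character becomes a UTF-16 surrogate pair.
encoded = json.dumps(text)
print(encoded)  # "G clef: \ud834\udd1e, backslash: \\"

# \u005C is the backslash; \uD834\uDD1E is recombined into U+1D11E.
decoded = json.loads(r'"\u005C \uD834\uDD1E"')
assert decoded == "\\ \U0001D11E"
assert json.loads(encoded) == text
```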
    

    Now, with respect to the HTTP/1.1 protocol, RFC 2616 says this about charset:

    The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems.

    So, my interpretation of the above is that one cannot assume a default character set except for media subtypes of the type "text." Of course, we live in the real world and implementers do not always follow the rules. As described in the accepted answer, the various web browser vendors have implemented their own strategies for determining the document character set when it is not explicitly specified. One can assume that vendors of other clients (e.g., Google Earth) also implement their own strategies.
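The rule quoted from RFC 2616 can be sketched as follows, assuming a hypothetical helper named `charset_from_content_type`. It uses the standard library's `email.message.Message` only to parse the header parameters (HTTP borrows the MIME `type/subtype; param=value` syntax), and it returns `None` instead of guessing when the rule supplies no default, leaving the caller to inspect the content or apply its own heuristics. Note that RFC 2616 has since been obsoleted: RFC 7231 (Appendix B) removed the ISO-8859-1 default for "text" types, so a client written against the current specification would not hard-code it.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Charset a client should assume under RFC 2616 section 3.7.1, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Explicit parameter wins; get_content_charset() unquotes and lower-cases it.
    explicit = msg.get_content_charset()
    if explicit:
        return explicit
    # RFC 2616 default, applied only to subtypes of the "text" type.
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    # No default for other types (e.g. application/xml): look inside the content.
    return None
```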
