Sending non-ASCII text in HTTP POST header

Backend · Unresolved · 3 answers · 1573 views
星月不相逢 2020-12-02 00:00

I am sending a file to a server as an octet-stream, and I need to specify the filename in the header:

String filename = "«úü¡»¿.doc";
URL url = new URL("ht

3 Answers
  • 2020-12-02 00:45

    Actually, you can use non-ASCII characters in headers. The RFC 2616 grammar only excludes control characters from TEXT:

       message-header = field-name ":" [ field-value ]
       field-name     = token
       field-value    = *( field-content | LWS )
       field-content  = <the OCTETs making up the field-value
                        and consisting of either *TEXT or combinations
                        of token, separators, and quoted-string>
    
       TEXT           = <any OCTET except CTLs,
                        but including LWS>
    
       CTL            = <any US-ASCII control character
                        (octets 0 - 31) and DEL (127)>
    
       LWS            = [CRLF] 1*( SP | HT )
    
       CRLF           = CR LF
    
       CR             = <US-ASCII CR, carriage return (13)>
    
       LF             = <US-ASCII LF, linefeed (10)>
    
       SP             = <US-ASCII SP, space (32)>
    
       HT             = <US-ASCII HT, horizontal-tab (9)>
    
  • You cannot use non-ASCII characters in HTTP headers; see RFC 2616. URIs are themselves standardized by RFC 2396, which does not permit non-ASCII characters either. The RFC says:

    The URI syntax was designed with global transcribability as one of its main concerns. A URI is a sequence of characters from a very limited set, i.e. the letters of the basic Latin alphabet, digits, and a few special characters.

    To use non-ASCII characters in a URI, you need to escape them using the %hexcode syntax (see section 2 of RFC 2396).

    In Java you can do this using the java.net.URLEncoder class.
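    A minimal sketch of that escaping, assuming the server decodes the value as UTF-8 (the charset is an assumption; both sides must agree on it). Note that URLEncoder implements application/x-www-form-urlencoded, so it writes a space as + rather than %20; the replace call patches that for use outside a form body. The class and method names are hypothetical.

    ```java
    import java.net.URLDecoder;
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    class PercentEncodeFilename {

        // Percent-encodes a string as UTF-8. URLEncoder targets form encoding,
        // which writes a space as '+' (a literal '+' becomes "%2B"), so every
        // remaining '+' came from a space and can be swapped for "%20".
        static String percentEncode(String s) {
            return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
        }

        public static void main(String[] args) {
            // The question's filename, written with Unicode escapes.
            String filename = "\u00AB\u00FA\u00FC\u00A1\u00BB\u00BF.doc";
            String encoded = percentEncode(filename);
            System.out.println(encoded); // %C2%AB%C3%BA%C3%BC%C2%A1%C2%BB%C2%BF.doc

            // The receiver reverses it with the same charset.
            String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
            System.out.println(decoded.equals(filename)); // true
        }
    }
    ```

    The Charset overloads of encode/decode need Java 10 or later; on older versions use the String-charset overloads ("UTF-8"), which throw UnsupportedEncodingException.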

    2020 edit: RFC 2616 has since been obsoleted by RFC 7230, and the relevant section on header syntax is now https://tools.ietf.org/html/rfc7230#section-3.2

     header-field   = field-name ":" OWS field-value OWS
    
     field-name     = token
     field-value    = *( field-content / obs-fold )
     field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
     field-vchar    = VCHAR / obs-text
    
     obs-fold       = CRLF 1*( SP / HTAB )
                    ; obsolete line folding
                    ; see Section 3.2.4
    

    where VCHAR is defined in https://tools.ietf.org/html/rfc7230#section-1.2 as "any visible [USASCII] character", with the [USASCII] reference being:

    [USASCII]     American National Standards Institute, "Coded Character
                  Set -- 7-bit American Standard Code for Information
                  Interchange", ANSI X3.4, 1986.
    

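    The standards are still very clear: HTTP header values are effectively US-ASCII only. The only other octets the grammar admits are obs-text (%x80-FF), which RFC 7230 keeps solely for backward compatibility; newly defined header fields SHOULD limit their values to US-ASCII, and recipients SHOULD treat obs-text as opaque data.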
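    To make the distinction concrete, here is a simplified Java sketch that classifies a value's octets against the field-vchar rule above, where obs-text is the %x80-FF range from RFC 7230 section 3.2.6. It is a hypothetical helper: it ignores obs-fold and the rule that a value cannot start or end with whitespace, and it assumes a raw filename would be sent as UTF-8 bytes. Under those assumptions, a raw non-ASCII filename only passes as obsolete obs-text, while its percent-encoded form is plain US-ASCII.

    ```java
    import java.nio.charset.StandardCharsets;

    class Rfc7230FieldValue {

        enum Kind { US_ASCII, OBS_TEXT, INVALID }

        // Classifies octets against RFC 7230, section 3.2:
        //   VCHAR    = %x21-7E  (visible US-ASCII)
        //   SP / HTAB           (whitespace between visible characters)
        //   obs-text = %x80-FF  (obsolete; recipients treat it as opaque data)
        // Any other octet (control characters, DEL) makes the value invalid.
        static Kind classify(byte[] octets) {
            boolean sawObsText = false;
            for (byte b : octets) {
                int o = b & 0xFF; // unsigned octet value
                if ((o >= 0x21 && o <= 0x7E) || o == 0x20 || o == 0x09) {
                    continue;
                }
                if (o >= 0x80) {
                    sawObsText = true;
                    continue;
                }
                return Kind.INVALID;
            }
            return sawObsText ? Kind.OBS_TEXT : Kind.US_ASCII;
        }

        public static void main(String[] args) {
            // The question's filename, written with Unicode escapes.
            String raw = "\u00AB\u00FA\u00FC\u00A1\u00BB\u00BF.doc";
            String escaped = "%C2%AB%C3%BA%C3%BC%C2%A1%C2%BB%C2%BF.doc";
            System.out.println(classify(raw.getBytes(StandardCharsets.UTF_8)));         // OBS_TEXT
            System.out.println(classify(escaped.getBytes(StandardCharsets.US_ASCII)));  // US_ASCII
            System.out.println(classify("a\nb".getBytes(StandardCharsets.US_ASCII)));   // INVALID
        }
    }
    ```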

  • 2020-12-02 00:53

    This might help: HTTP headers encoding/decoding in Java
