Several of our users have asked us to include data relative to their account in the HTTP headers of requests we send them, or even responses they get from our API.
The format for HTTP headers is defined in the HTTP specification. I'm going to talk about HTTP 1.1, for which the specification is RFC 2616. In section 4.2, 'Message Headers', the general structure of a header is defined:
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
field-content =
This definition rests on two main pillars, token and TEXT. Both are defined in section 2.2, 'Basic Rules'. Token is:
token = 1*
In turn resting on CHAR, CTL and separators:
CHAR =
CTL =
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
TEXT is:
TEXT =
Where LWS is linear white space, whose definition i won't reproduce, and OCTET is:
OCTET =
There is a note accompanying the definition:
The TEXT rule is only used for descriptive field contents and values
that are not intended to be interpreted by the message parser. Words
of *TEXT MAY contain characters from character sets other than ISO-
8859-1 [22] only when encoded according to the rules of RFC 2047
[14].
So, two conclusions. Firstly, it's clear that the header name must be composed from a subset of ASCII characters - alphanumerics, some punctuation, not a lot else. Secondly, there is nothing in the definition of a header value that restricts it to ASCII or excludes 8-bit characters: it's explicitly composed of octets, with only control characters barred (note that CR and LF are considered controls). Furthermore, the comment on the TEXT production implies that the octets are to be interpreted as being in ISO-8859-1, and that there is an encoding mechanism (which is horrible, incidentally) for representing characters outside that encoding.
So, to respond to @BalusC in particular, it's quite clear that according to the specification, header values are in ISO-8859-1. I've sent high-8859-1 characters (specifically, some accented vowels as used in French) in a header out of Tomcat, and had them interpreted correctly by Firefox, so to some extent, this works in practice as well as in theory (although this was a Location header, which contains a URL, and these characters are not legal in URLs, so this was actually illegal, but under a different rule!).
That said, i wouldn't rely on ISO-8859-1 working across all servers, proxies, and clients, so i would stick to ASCII as a matter of defensive programming.