Calling ServletRequest#setCharacterEncoding() will still fail in some cases.
If your container follows the servlet spec carefully (as Tomcat does), it will interpret POST parameters as ISO-8859-1 by default. This can garble UTF-8 characters (such as Japanese, in the case I recently worked through) before they ever reach your code, especially if you have a servlet filter that inspects the request parameters with getParameter() or getParameterValues(). Those methods force decoding of the parameters, and decoding is only ever done once.
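To make the "decoded only once" point concrete, here is a minimal sketch in plain Java. LazyRequest is a made-up stand-in for a container's request object, not the servlet API (the real setCharacterEncoding() takes a String, for one), and it only models a single parameter value that is decoded lazily and then cached.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a container request; NOT the servlet API.
class LazyRequest {
    private final byte[] rawValue;                           // bytes as sent by the browser
    private Charset encoding = StandardCharsets.ISO_8859_1;  // default when nothing is set
    private String decoded;                                  // cached after the first read

    LazyRequest(byte[] rawValue) {
        this.rawValue = rawValue;
    }

    void setCharacterEncoding(Charset encoding) {
        this.encoding = encoding;  // has no effect once 'decoded' is populated
    }

    String getParameter() {
        if (decoded == null) {
            decoded = new String(rawValue, encoding);  // decoding happens exactly once
        }
        return decoded;
    }
}

class DecodeOnceDemo {
    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E";  // the three characters of "Japanese (language)"
        byte[] sent = text.getBytes(StandardCharsets.UTF_8);

        // A filter peeks at the parameter before the encoding is set.
        LazyRequest tooLate = new LazyRequest(sent);
        tooLate.getParameter();
        tooLate.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(text.equals(tooLate.getParameter()));  // false

        // The encoding is set before anything reads the parameter.
        LazyRequest inTime = new LazyRequest(sent);
        inTime.setCharacterEncoding(StandardCharsets.UTF_8);
        System.out.println(text.equals(inTime.getParameter()));   // true
    }
}
```

In the first case the value has already been cached as ISO-8859-1 mojibake, so the later setCharacterEncoding() call changes nothing. That is the situation an early filter creates.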
Here's a link for how to get around this in Tomcat if you have filters that look at the request parameters. Folks will want to check the docs for their particular container.
http://wiki.apache.org/tomcat/FAQ/CharacterEncoding#Q1
The key bit from that is:
Add
useBodyEncodingForURI="true" URIEncoding="UTF-8"
to the Connector element in Tomcat's server.xml and add
<filter>
    <filter-name>Character Encoding Filter</filter-name>
    <filter-class>org.apache.catalina.filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>Character Encoding Filter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
to web.xml, with its filter-mapping placed before that of any filter that calls getParameter() or getParameterValues(). I found that although the link above makes the two attributes seem like alternatives, useBodyEncodingForURI is absolutely necessary: without it, Tomcat won't apply the request's encoding to the query string. From Request.java in Tomcat 7.0.42:
boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
if (enc != null) {
    parameters.setEncoding(enc);
    if (useBodyEncodingForURI) {
        parameters.setQueryStringEncoding(enc);
    }
} else {
    parameters.setEncoding
        (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
    if (useBodyEncodingForURI) {
        parameters.setQueryStringEncoding
            (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
    }
}
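
The effect of that branch is easier to see in a small stand-alone model. This is not Tomcat code: EncodingModel, resolve() and the priorQueryStringEncoding parameter (standing in for whatever query-string encoding was in place before this code runs) are made up for illustration, and ISO-8859-1 is, as far as I know, what DEFAULT_CHARACTER_ENCODING resolves to in Tomcat 7.

```java
// Simplified model of the excerpt above; NOT Tomcat's actual classes.
class EncodingModel {
    // Assumed value of org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING.
    static final String DEFAULT_CHARACTER_ENCODING = "ISO-8859-1";

    final String bodyEncoding;
    final String queryStringEncoding;

    EncodingModel(String bodyEncoding, String queryStringEncoding) {
        this.bodyEncoding = bodyEncoding;
        this.queryStringEncoding = queryStringEncoding;
    }

    // enc is the request's character encoding, or null if none was set.
    static EncodingModel resolve(String enc,
                                 boolean useBodyEncodingForURI,
                                 String priorQueryStringEncoding) {
        String body = (enc != null) ? enc : DEFAULT_CHARACTER_ENCODING;
        // The query string only follows the body encoding when the flag is on.
        String query = useBodyEncodingForURI ? body : priorQueryStringEncoding;
        return new EncodingModel(body, query);
    }

    public static void main(String[] args) {
        EncodingModel flagOff = resolve("UTF-8", false, "ISO-8859-1");
        System.out.println(flagOff.bodyEncoding + " / " + flagOff.queryStringEncoding);
        // UTF-8 / ISO-8859-1

        EncodingModel flagOn = resolve("UTF-8", true, "ISO-8859-1");
        System.out.println(flagOn.bodyEncoding + " / " + flagOn.queryStringEncoding);
        // UTF-8 / UTF-8

        EncodingModel noEncoding = resolve(null, true, "UTF-8");
        System.out.println(noEncoding.bodyEncoding + " / " + noEncoding.queryStringEncoding);
        // ISO-8859-1 / ISO-8859-1
    }
}
```

The last case suggests why the filter matters too: with useBodyEncodingForURI on but no request encoding set, this branch pushes the default onto the query string as well.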