URL decode ü -> Ã¼

Question

I'm having trouble decoding a URL on the server side. The request URL contains %C3%BC for the letter 'ü'. The server should decode this to ü, but it produces Ã¼ instead.

The decoding is done like this:

decoded = URLDecoder.decode(value, "UTF-8");

Here value contains '%C3%BC' and decoded should contain 'ü', but it doesn't. What's going wrong? I use this method in several other applications, and it works fine in all of them.

Answer 1:

I don't have enough reputation yet to comment, so I'll have to make this as close to an answer as possible.

If you're using a servlet, and "value" is something that you got from calling getParameter() on the servlet, then it has already been decoded (rightly or wrongly) by the servlet container. (Tomcat?)

Likewise if it's part of the path. Your servlet container probably decoded it assuming that the percent-encoded bytes were ISO-8859-1, which was the default in Tomcat 7 and earlier (Tomcat 8 and later default to UTF-8). See the documentation for the URIEncoding attribute of the Connector element in Tomcat's server.xml file, if that's the app server you're using. If you set it to UTF-8, Tomcat will assume that percent-encoded bytes represent UTF-8 text.
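To illustrate the effect described above, here is a minimal sketch that reproduces the garbled result outside of any servlet container. The class and method names (MojibakeDemo, decodeAs, repairLatin1Mojibake) are made up for this example. It assumes the container decoded the bytes as ISO-8859-1; if that is what happened, a second URLDecoder.decode call changes nothing, because the string no longer contains any percent escapes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question.
class MojibakeDemo {

    // Percent-decode `raw`, interpreting the decoded bytes in the named charset.
    static String decodeAs(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Undo a wrong ISO-8859-1 decode: recover the original bytes,
    // then read them as UTF-8. Only valid if the assumption above holds.
    static String repairLatin1Mojibake(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "%C3%BC";
        String correct = decodeAs(raw, "UTF-8");       // one char: U+00FC
        String garbled = decodeAs(raw, "ISO-8859-1");  // two chars: U+00C3 U+00BC

        System.out.println(correct.length() + " " + garbled.length());  // prints "1 2"
        // Decoding the already-decoded value again is a no-op:
        System.out.println(decodeAs(garbled, "UTF-8").equals(garbled)); // prints "true"
        System.out.println(repairLatin1Mojibake(garbled).equals(correct)); // prints "true"
    }
}
```

The repair method is a workaround rather than a fix. Configuring the container to decode as UTF-8 in the first place is the cleaner route.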

Answer 2:

You are probably outputting the value incorrectly. First, decoded.length() (presumably 1) gives a fair indication; you could also dump the characters with Arrays.toString(decoded.toCharArray()).

In an IDE console under Windows, a correctly decoded string can still show up as that kind of mess, because the console may use a single-byte Windows ANSI encoding.
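The check suggested above can be sketched as follows. Printing code points as ASCII hex avoids depending on the console encoding at all. The class name DumpChars and the method dump are hypothetical helpers, not something from the original post.

```java
// Hypothetical helper for inspecting a string independently of console encoding.
class DumpChars {

    // Render each UTF-16 code unit as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        System.out.println(dump("\u00FC"));        // prints "U+00FC" (a correctly decoded ü)
        System.out.println(dump("\u00C3\u00BC"));  // prints "U+00C3 U+00BC" (decoded with the wrong charset)
    }
}
```

If the dump shows a single U+00FC, the string is fine and only the output path is at fault. Two code units, U+00C3 followed by U+00BC, mean the value was already decoded with the wrong charset.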

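Beyond that, always specify the charset explicitly when converting between strings and bytes, instead of relying on the platform default: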

String s;
byte[] b;

s.getBytes()   ->   s.getBytes(StandardCharsets.UTF_8)
                    s.getBytes("Cp1252")   // Windows Latin-1

new String(b)  ->   new String(b, StandardCharsets.UTF_8)
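As a rough illustration of why the explicit charset matters, the sketch below encodes 'ü' under two charsets and then deliberately reads UTF-8 bytes back as windows-1252 (the canonical name for Cp1252). CharsetBytesDemo and its methods are names invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the original answer.
class CharsetBytesDemo {

    // "Cp1252" is an alias for this charset in the JDK.
    static final Charset CP1252 = Charset.forName("windows-1252");

    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] cp1252Bytes(String s) {
        return s.getBytes(CP1252);
    }

    // Simulate the bug: encode as UTF-8, then decode as windows-1252.
    static String misread(String s) {
        return new String(utf8Bytes(s), CP1252);
    }

    public static void main(String[] args) {
        String u = "\u00FC"; // ü
        System.out.println(Arrays.toString(utf8Bytes(u)));   // prints "[-61, -68]" (0xC3 0xBC)
        System.out.println(Arrays.toString(cp1252Bytes(u))); // prints "[-4]" (0xFC)
        System.out.println(misread(u).length());             // prints "2"
    }
}
```

The two-character result of misread is the same Ã¼ pair from the question, which is why a mismatch between the encoding and decoding charset is the first thing to rule out.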

Source: https://stackoverflow.com/questions/24991491/url-decode-%c3%a4-%c3%a31%e2%81%844

Tags

java

url

utf-8

decoding