URL decode ü -> Ã¼

好久不见. Submitted on 2020-01-17 05:51:20

Question


Decoding a URL is causing me some major problems. The request URL contains %C3%BC for the letter 'ü'. The server should decode it to ü, but instead it produces this: Ã¼

Decoding is done like this:

decoded = URLDecoder.decode(value, "UTF-8");

Here value contains '%C3%BC', and decoded should then contain 'ü', but that's where the problem is. What's going wrong here? I use this method in more than one application, and it works fine in all the other cases...
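To make the symptom concrete, here is a minimal sketch (assuming Java 10+ for the URLDecoder.decode(String, Charset) overload; DecodeDemo and its method names are made up for illustration). It decodes the same %C3%BC once reading the bytes as UTF-8 and once as ISO-8859-1; only the second produces the two characters Ã¼.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Percent-decodes value and reads the resulting bytes as UTF-8,
    // equivalent to the question's URLDecoder.decode(value, "UTF-8").
    static String decodeUtf8(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    // Same bytes read as ISO-8859-1: every byte becomes its own character.
    static String decodeLatin1(String value) {
        return URLDecoder.decode(value, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String value = "%C3%BC";
        System.out.println(decodeUtf8(value).length());   // 1 (just 'ü')
        System.out.println(decodeLatin1(value).length()); // 2 ('Ã' followed by '¼')
    }
}
```

So the question's own call is correct; the Ã¼ has to come from some step that reads the bytes with a single-byte charset.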


Answer 1:


I don't have enough reputation yet to comment, so I'll have to make this as close to an answer as possible.

If you're using a servlet, and "value" is something that you got from calling getParameter() on the servlet, then it has already been decoded (rightly or wrongly) by the servlet container. (Tomcat?)

Likewise if it's part of the path. Your servlet container probably decoded it assuming that the percent-encoded bytes were ISO-8859-1, which is the default setting for Tomcat. See the documentation for the URIEncoding attribute of the Connector element in Tomcat's server.xml file, if that's the app server you're using. If you set it to UTF-8, Tomcat will assume that percent-encoded bytes represent UTF-8 text.
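A minimal sketch of what this means in code, assuming the container really did decode with ISO-8859-1 and Java 10+ for URLDecoder.decode(String, Charset). The class, the method names and the hard-coded getParameter() result are hypothetical. Decoding the value a second time changes nothing, while re-reading its bytes as UTF-8 recovers ü.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class ContainerDecodeDemo {
    // Calling URLDecoder again on a value the container already decoded.
    static String decodeAgain(String alreadyDecoded) {
        return URLDecoder.decode(alreadyDecoded, StandardCharsets.UTF_8);
    }

    // Workaround sketch: undo a decode that read UTF-8 bytes as ISO-8859-1.
    // ISO-8859-1 maps every byte 0x00-0xFF to exactly one char, so getBytes()
    // gives back the original bytes, which are then re-read as UTF-8.
    static String repairLatin1AsUtf8(String misDecoded) {
        byte[] original = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical getParameter() result: %C3%BC already decoded as ISO-8859-1.
        String fromContainer = "\u00c3\u00bc"; // "Ã¼"
        // Nothing is left to percent-decode, so a second decode changes nothing.
        System.out.println(decodeAgain(fromContainer).equals(fromContainer)); // true
        // Re-reading the original bytes as UTF-8 recovers the intended character.
        System.out.println(repairLatin1AsUtf8(fromContainer).equals("\u00fc")); // true
    }
}
```

Treat the repair as a stopgap: it only works if the bytes passed through ISO-8859-1 unchanged, whereas setting URIEncoding="UTF-8" on the Connector fixes the cause. A second decode is not always harmless either, since it turns any literal + in the already-decoded value into a space.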




Answer 2:


You are probably outputting the value wrong. First, decoded.length() (presumably 1) gives a fair indication; you could also dump the characters with Arrays.toString(decoded.toCharArray()).
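A minimal sketch of that check (Java 10+ assumed; InspectDemo and charCodes are illustrative names). Besides length() and Arrays.toString(), it lists every char as a U+XXXX code, which stays readable even when the console cannot display ü.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class InspectDemo {
    // Formats each char of s as U+XXXX, independent of the console's encoding.
    static String charCodes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = URLDecoder.decode("%C3%BC", StandardCharsets.UTF_8);
        System.out.println(decoded.length());                       // 1
        System.out.println(Arrays.toString(decoded.toCharArray())); // [ü], if the console can show it
        System.out.println(charCodes(decoded));                     // U+00FC
    }
}
```

If the last line prints U+00FC while the console still shows Ã¼, the string itself is fine and only the output step is wrong.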

If the IDE console on Windows uses a single-byte ANSI encoding, printing the value can produce something like that mess.
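The following sketch shows where that mess comes from (assuming Java 10+ for the PrintStream(OutputStream, boolean, Charset) constructor; ConsoleDemo and printedBytes are hypothetical names). A stream writing UTF-8 emits two bytes for ü, and a console that interprets them as windows-1252 displays two characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleDemo {
    // Prints s through a PrintStream with an explicit charset and returns the bytes written.
    static byte[] printedBytes(String s, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buf, true, cs);
        out.print(s);
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] written = printedBytes("\u00fc", StandardCharsets.UTF_8); // 0xC3 0xBC
        // What a console that assumes windows-1252 makes of those bytes.
        String shown = new String(written, Charset.forName("windows-1252"));
        System.out.println(shown.length()); // 2, i.e. "Ã¼"
    }
}
```

The remedy is to make both sides agree: configure the console for UTF-8, or print through a stream whose charset matches the console.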

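Beyond that, avoid the conversions that silently use the platform default charset, and name the charset explicitly (left: default charset, right: explicit):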

String s;
byte[] b;

s.getBytes()   ->   s.getBytes(StandardCharsets.UTF_8)
                    s.getBytes("Cp1252")   // Windows Latin-1

new String(b)  ->   new String(b, StandardCharsets.UTF_8)
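Applied to ü, the explicit variants from that list look like this (a sketch; CharsetDemo and its methods are illustrative names). The two encodings produce different byte counts, and decoding with the charset that produced the bytes round-trips cleanly.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Explicit charset instead of s.getBytes(), which uses the platform default.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Explicit charset instead of new String(b), which also uses the platform default.
    static String fromUtf8(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u00fc"; // ü
        byte[] utf8 = utf8Bytes(s);
        byte[] cp1252 = s.getBytes("Cp1252"); // the String overload declares a checked exception
        System.out.println(utf8.length + " " + cp1252.length); // 2 1
        System.out.println(fromUtf8(utf8).equals(s));          // true
    }
}
```

Prefer the StandardCharsets constants where possible: they cannot be misspelled and do not force you to handle UnsupportedEncodingException.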


Source: https://stackoverflow.com/questions/24991491/url-decode-%c3%a4-%c3%a31%e2%81%844
