URL decoding Japanese characters etc. in Java

Submitted by 微笑、不失礼 on 2019-12-13 15:00:56

Question


I have a servlet that receives some POST data. Because this data is x-www-form-urlencoded, a string such as サボテン would be encoded to &#12469;&#12508;&#12486;&#12531;.

How would I unencode this string back to the correct characters? I have tried using URLDecoder.decode("encoded string", "UTF-8"); but it doesn't make a difference.

The reason I would like to unencode them is that, before I display this data on a webpage, I escape & to &amp;, and at the moment this also escapes the &s in the encoded string, so the characters do not show up properly.


Answer 1:


Those are not URL encodings. A URL-encoded value would have looked like %E3%82%B5%E3%83%9C%E3%83%86%E3%83%B3. Those are decimal HTML/XML entities, which is why URLDecoder leaves them unchanged. To unescape HTML/XML entities, use StringEscapeUtils from Apache Commons Lang.
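Where adding the library is not an option, the decimal case can be sketched with the JDK alone. This is only an illustration under stated assumptions, not how StringEscapeUtils works: the class name NumericEntityDecoder and its decode method are hypothetical, and the sketch handles decimal references such as &#12469; only, not hexadecimal or named entities.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any library.
class NumericEntityDecoder {
    // At most 7 digits, so Integer.parseInt below cannot overflow.
    private static final Pattern DECIMAL_ENTITY = Pattern.compile("&#(\\d{1,7});");

    // Replaces each decimal reference (e.g. &#12469;) with the character it names.
    static String decode(String input) {
        Matcher m = DECIMAL_ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // leave out-of-range references untouched
            // quoteReplacement keeps '$' and '\' in the result from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Passing the question's input, &#12469;&#12508;&#12486;&#12531;, to decode returns サボテン. Text without decimal references, including an existing &amp;, is returned unchanged.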


Update as per the comments: you will get question marks when the response encoding is not UTF-8. If you're using JSP, just add the following line to the top of the page:

<%@ page pageEncoding="UTF-8" %>

For more detail, see the solutions about halfway through this article. I would prefer using UTF-8 all the way over fiddling with regexps, since regexps don't prepare you for world domination.




Answer 2:


This is a feature/bug of browsers. If a web page is in a limited charset, say ASCII, and users type characters outside that charset into a form field, browsers will send those characters in the form &#xxxx;

This can be a problem because if users actually type &#xxxx; it is sent as is, so the server has no way to distinguish the two cases.

The best way is to use a charset that covers all characters, like UTF-8, so browsers won't do this trick.
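The ambiguity described above can be reproduced in a small simulation. This is a sketch of the described behaviour, not actual browser code: the class FormFallbackSimulation and the method toSubmittedText are hypothetical names, and the page charset is passed in as a parameter.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical simulation for illustration; real browsers implement this internally.
class FormFallbackSimulation {
    // Characters the page charset cannot represent are replaced by decimal
    // numeric character references; everything else passes through as typed.
    static String toSubmittedText(String typed, Charset pageCharset) {
        CharsetEncoder encoder = pageCharset.newEncoder();
        StringBuilder sb = new StringBuilder();
        typed.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                sb.append(ch);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }
}
```

With StandardCharsets.US_ASCII as the page charset, the character サ and the literally typed text &#12469; both come out as &#12469;, which is the case the server cannot tell apart. With StandardCharsets.UTF_8, サ passes through unchanged.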




Answer 3:


Just a wild guess, but are you using Tomcat?

If so, make sure you have set up the Connector in Tomcat with a URIEncoding of UTF-8. Search the web for that and you will find a ton of hits, such as:

How to get UTF-8 working in Java webapps?
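To see why the decoding charset matters, the effect can be reproduced outside a container with the JDK's URLDecoder. This is a sketch, not Tomcat's own code: the class UriEncodingDemo is a hypothetical name, and ISO-8859-1 is assumed as the mismatched charset because older Tomcat versions are commonly reported to default to it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class for illustration; not part of Tomcat.
class UriEncodingDemo {
    // Decodes a percent-encoded value using the named charset.
    static String decode(String percentEncoded, String charsetName) {
        try {
            return URLDecoder.decode(percentEncoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }
}
```

Decoding %E3%82%B5%E3%83%9C%E3%83%86%E3%83%B3 as UTF-8 yields サボテン. Decoding the same value as ISO-8859-1 turns each of the 12 bytes into a separate character, which shows up as garbage on the page.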




Answer 4:


How about a regular expression?

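import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Escape only a bare '&'; skip one that already starts "&amp;" or a decimal entity such as "&#12469;".
Pattern pattern = Pattern.compile("&(?!amp;|#\\d+;)");
Matcher matcher = pattern.matcher(inputStr);
String output = matcher.replaceAll("&amp;");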


Source: https://stackoverflow.com/questions/4652365/url-decoding-japanese-characters-etc-in-java
