问题
I have a servlet that receives some POST data. Because this data is x-www-form-urlencoded, a string such as サボテン would be encoded to サボテン.
How would I unencode this string back to the correct characters? I have tried using URLDecoder.decode("encoded string", "UTF-8");
but it doesn't make a difference.
The reason I would like to unencode them, is because, before I display this data on a webpage, I escape & to & and at the moment, it is escaping the &s in the encoded string so the characters are not showing up properly.
回答1:
Those are not URL encodings. It would have looked like %E3%82%B5%E3%83%9C%E3%83%86%E3%83%B3
. Those are decimal HTML/XML entities. To unescape HTML/XML entities, use Apache Commons Lang StringEscapeUtils.
Update as per the comments: you will get question marks when the response encoding is not UTF-8. If you're using JSP, just add the following line to top of the page:
<%@ page pageEncoding="UTF-8" %>
See for more detail the solutions about halfway this article. I would prefer using-UTF8-all-the-way above fiddling with regexps since regexps doesn't prepare you for world domination.
回答2:
This is a feature/bug of browsers. If a web page is in a limited charset, say ASCII, and users type in some chars outside the charset in a form field, browsers will send these chars in the form of $#xxxx;
It can be a problem because if users actually type $#xxxx;
they'll be sent as is. So the server has no way to distinguish the two cases.
The best way is to use a charset that covers all characters, like UTF-8, so browsers won't do this trick.
回答3:
Just a wild guess, but are you using Tomcat?
If so, make sure you have set up the Connector in Tomcat with a URIEncoding of UTF-8. Google that on the web and you will find a ton of hits such as
How to get UTF-8 working in Java webapps?
回答4:
How about a regular expression?
Pattern pattern = Pattern.compile("&([^a][^m][^p][^;])?");
Matcher matcher = pattern.matcher(inputStr);
String output = matcher.replaceAll("&$1");
来源:https://stackoverflow.com/questions/4652365/url-decoding-japanese-characters-etc-in-java