How to process encoded unicode text in servlet?

Submitted by 穿精又带淫゛_ on 2019-12-08 02:15:59

Question


I am hitting my servlet URL from an external source. One of the parameters contains Hindi text, which the external source URL-encodes. The encoded value is:

%E0%A4%AA%E0%A4%BE%E0%A4%A0%E0%A5%8D%E0%A4%AF%20%E0%A4%AD%E0%A4%BE%E0%A4%97
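For reference, this value is the UTF-8 encoding of the Hindi text पाठ्य भाग, with each byte written as %XX. The sketch below (the class name EncodeDemo is hypothetical, not from the question) produces the same bytes with the JDK's URLEncoder. URLEncoder follows the HTML form rules and writes the space as + rather than the %20 seen above; otherwise the output is identical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodeDemo {
    // The Hindi text that the percent-encoded value in the question decodes to.
    static final String TEXT = "\u092A\u093E\u0920\u094D\u092F \u092D\u093E\u0917";

    // Percent-encodes the UTF-8 bytes of the text using the
    // application/x-www-form-urlencoded rules, which write a space as '+'.
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Prints %E0%A4%AA%E0%A4%BE%E0%A4%A0%E0%A5%8D%E0%A4%AF+%E0%A4%AD%E0%A4%BE%E0%A4%97
        System.out.println(encode(TEXT));
    }
}
```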

I can see this value in a TCP dump in Wireshark, but I do not get the same string in my servlet application. I am trying to read it with the getParameter() method, but it returns some random characters.

Since I am not getting the correct value, I tried to decode it in my servlet class with

URLDecoder.decode(myString, "UTF-8");

but that also returns random characters, like this:

विषय वस�त�

Please suggest how I can read this encoded text in the servlet and decode it back to its original value.


Answer 1:


I am trying to get it via getParameter() method.

getParameter and the handling of input encodings in Servlet are broken in general. You get ISO-8859-1 whether you want it or not (and you generally don't).
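A minimal sketch of what that means for the value in the question. The class MisdecodeDemo is hypothetical, and URLDecoder stands in for whatever the container does internally: the same percent-escapes give 25 characters of mojibake when the bytes are read as ISO-8859-1 (one character per byte), and the 9 Hindi characters when they are read as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class MisdecodeDemo {
    static final String ENCODED =
        "%E0%A4%AA%E0%A4%BE%E0%A4%A0%E0%A5%8D%E0%A4%AF%20%E0%A4%AD%E0%A4%BE%E0%A4%97";

    // Percent-decodes the value, turning the resulting bytes into
    // characters with the given charset.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String latin1 = decode(ENCODED, "ISO-8859-1"); // one char per byte
        String utf8 = decode(ENCODED, "UTF-8");        // the Hindi text
        System.out.println(latin1.length() + " vs " + utf8.length()); // 25 vs 9
    }
}
```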

You can work around this and get UTF-8 for query string parameters by:

  1. Container-specific configuration options (e.g. Tomcat's URIEncoding connector attribute).

  2. Grabbing the raw request.getQueryString() and passing its pieces into URLDecoder.decode(..., "utf-8") manually instead of relying on getParameter. Only if you are taking this route do you need to worry about URLDecoder yourself.

  3. Fixing up the mis-decoding of the getParameter output by encoding the bad value back to the original bytes it came from (using ISO-8859-1) and then decoding those bytes as UTF-8, e.g. new String(request.getParameter("param").getBytes("iso-8859-1"), "utf-8").
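A self-contained sketch of option 3, assuming the container decoded the parameter as ISO-8859-1. The class Latin1Repair is hypothetical; StandardCharsets replaces the "iso-8859-1"/"utf-8" strings so that no checked exception has to be handled.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Repair {
    // Re-encodes a string that was wrongly decoded as ISO-8859-1 back to
    // its original bytes, then decodes those bytes as UTF-8. ISO-8859-1
    // maps every byte 0x00-0xFF to exactly one char, so nothing is lost.
    static String repair(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hindi = "\u092A\u093E\u0920\u094D\u092F \u092D\u093E\u0917";
        // Simulate what getParameter() returns from an ISO-8859-1 container.
        String misdecoded = new String(hindi.getBytes(StandardCharsets.UTF_8),
                                       StandardCharsets.ISO_8859_1);
        System.out.println(repair(misdecoded).equals(hindi)); // true
    }
}
```

The repair is only safe while the container really decodes as ISO-8859-1. If it is already configured for UTF-8, Devanagari characters cannot be represented in ISO-8859-1, getBytes replaces each of them with ?, and the value is destroyed.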

See this question for background.
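Option 2 from the list above could look roughly like this. QueryStringParser and its parse method are hypothetical names; the input is assumed to be the raw string from request.getQueryString(), which may be null. Splitting on & and = happens before decoding, so an encoded %26 inside a value survives as a literal &.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class QueryStringParser {
    // Splits a raw query string into name/value pairs and percent-decodes
    // each piece as UTF-8. Repeated names collect all of their values.
    static Map<String, List<String>> parse(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("argument=%E0%A4%AA%E0%A4%BE%E0%A4%A0%E0%A5%8D%E0%A4%AF%20%E0%A4%AD%E0%A4%BE%E0%A4%97&x=a%26b"));
    }
}
```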




Answer 2:


I've tried this:

try {
    System.out.println(URLDecoder.decode("%E0%A4%AA%E0%A4%BE%E0%A4%A0%E0%A5%8D%E0%A4%AF%20%E0%A4%AD%E0%A4%BE%E0%A4%97", "UTF-8"));
} 
catch (Exception e) {
    e.printStackTrace();
}

... and it works for me: the Hindi characters are printed and no exception is thrown.

Make sure your console outputs UTF-8; it is probably using a different encoding.
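An alternative sketch, not part of the original answer, is to make the Java side always write UTF-8 by wrapping standard output; the console still has to interpret those bytes as UTF-8 to display them correctly. Utf8Console is a hypothetical name.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class Utf8Console {
    // Wraps any byte sink in an autoflushing PrintStream that encodes
    // characters as UTF-8, independent of the platform default encoding.
    static PrintStream utf8(OutputStream sink) {
        try {
            return new PrintStream(sink, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Replace System.out with a UTF-8 stream over the process's stdout.
        System.setOut(utf8(new FileOutputStream(FileDescriptor.out)));
        System.out.println("\u092A\u093E\u0920\u094D\u092F \u092D\u093E\u0917");
    }
}
```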

Edit

In Eclipse: Run > Run Configurations... > "Common" tab > Encoding > select UTF-8.

Edit II

Example code for the processRequest method of your HttpServlet class:

response.setContentType("text/html;charset=UTF-8");
String argument = request.getParameter("argument");
String decoded;
if (argument != null) {
    // Note: getParameter() has already percent-decoded the value,
    // so this is a second decode.
    decoded = URLDecoder.decode(argument, "UTF-8");
}
else {
    decoded = "null";
}
// try-with-resources closes the writer when the page is written
try (PrintWriter out = response.getWriter()) {
    out.println("<!DOCTYPE html>");
    out.println("<html>");
    out.println("<head>");
    out.println("<title>Servlet TestServlet</title>");
    out.println("</head>");
    out.println("<body>");
    out.println("<h1>The argument's value is: " + decoded + "</h1>");
    out.println("</body>");
    out.println("</html>");
}
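Because getParameter() returns an already percent-decoded value, the extra URLDecoder.decode call above decodes it a second time. That is a no-op for the Hindi text, but the hypothetical DoubleDecodeDemo class below sketches where it goes wrong: a + turns into a space, and a literal % is rejected.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DoubleDecodeDemo {
    // Decodes a value that has ALREADY been percent-decoded once (as the
    // string returned by getParameter() has), mimicking the second
    // URLDecoder.decode call in the servlet code.
    static String decodeAgain(String alreadyDecoded) {
        try {
            return URLDecoder.decode(alreadyDecoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeAgain("1+1=2")); // prints "1 1=2"
        try {
            decodeAgain("100%"); // incomplete escape at the end
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```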


Source: https://stackoverflow.com/questions/17212353/how-to-process-encoded-unicode-text-in-servlet
