DataInputStream and UTF-8

Submitted by 陌路散爱 on 2019-11-28 09:31:20

Question


I'm kind of a new programmer, and I'm having a couple of problems with the code I'm handling.

Basically what the code does is receive a form from another JSP, read the bytes, parse the data, and submit the results to SalesForce, using DataInputStream.

   //Getting the parameters from request
 String contentType = request.getContentType();
 DataInputStream in = new DataInputStream(request.getInputStream());
 int formDataLength = request.getContentLength();

 //System.out.println(formDataLength);
 byte dataBytes[] = new byte[formDataLength];
 int byteRead = 0;
 int totalBytesRead = 0;
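 while (totalBytesRead < formDataLength) 
 {
  // Request only the bytes still missing, and stop if the stream ends early
  byteRead = in.read(dataBytes, totalBytesRead, formDataLength - totalBytesRead);
  if (byteRead == -1) break;
  totalBytesRead += byteRead;
 }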

It works fine, but only when the input contains plain ASCII characters. Whenever it has to handle special characters (such as the French characters àâäæçéèêëîïôùûü), I get the following gibberish as a result:

à âäæçéèêëîïôùûü

I understand it could be an issue with DataInputStream and the fact that it doesn't return UTF-8 encoded text. Do you have any suggestions on how to tackle this issue?

All the .jsp files include <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%> and Tomcat's settings are fine (URI = UTF-8, etc). I tried adding:

request.setCharacterEncoding("UTF-8");

and

response.setCharacterEncoding("UTF-8");

to no avail.

Here's an example of how it parses the data:

    //Getting the notes for the Case 
 String notes = new String(dataBytes);
 System.out.println(notes);
 String savenotes = casetype.substring(notes.indexOf("notes"));
 //savenotes = savenotes.substring(savenotes.indexOf("\n"), savenotes.indexOf("---"));
 savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
 savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
 savenotes = savenotes.substring(0,savenotes.indexOf("name=\"datafile"));
 savenotes = savenotes.substring(0,savenotes.lastIndexOf("\n------"));
 savenotes = savenotes.trim();

Thanks in advance.


Answer 1:


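The problem is not in the input streams, since they don't handle characters, only bytes. Your problem is at the point where you convert those bytes to characters. Calling new String(dataBytes) without a charset decodes the bytes using the platform default charset, which is evidently not UTF-8 on your server. In this particular case, you need to specify the proper encoding in the String constructor.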

String notes = new String(dataBytes, "UTF-8");
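To make the difference visible, here is a minimal standalone sketch (not part of the original code; the class name DecodeDemo and the helper decodeUtf8 are made up for illustration). It encodes a few accented characters as UTF-8 and then decodes the same bytes twice: once with a single-byte charset, which stands in for a non-UTF-8 platform default, and once as UTF-8. The StandardCharsets constants assume Java 7 or later; they also avoid the checked UnsupportedEncodingException that the "UTF-8" string overload declares.

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Hypothetical helper: decode raw bytes as UTF-8, independent of the platform default.
    static String decodeUtf8(byte[] dataBytes) {
        return new String(dataBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "àéü" written with Unicode escapes so the source file's own encoding does not matter.
        String original = "\u00e0\u00e9\u00fc";
        byte[] dataBytes = original.getBytes(StandardCharsets.UTF_8);

        // Wrong: each UTF-8 byte is turned into its own character, producing mojibake.
        String garbled = new String(dataBytes, StandardCharsets.ISO_8859_1);
        // Right: every two-byte UTF-8 sequence becomes one character again.
        String correct = decodeUtf8(dataBytes);

        System.out.println("bytes:   " + dataBytes.length);
        System.out.println("garbled: " + garbled.length() + " chars");
        System.out.println("correct: " + correct.length() + " chars, equal to original: "
                + correct.equals(original));
    }
}
```

Each of these three characters takes two bytes in UTF-8, so the wrongly decoded string has twice as many characters as the original, which matches the doubled-up gibberish in the question.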

See also:

  • Unicode - How to get characters right?

By the way, the DataInputStream adds nothing in this particular code snippet. You can just use a plain InputStream.
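As a rough sketch of what that could look like, the class below reads a body of known length from a plain InputStream and decodes it as UTF-8 afterwards. BodyReader and readBody are hypothetical names, and the ByteArrayInputStream in main only stands in for request.getInputStream() so the example runs outside a servlet container. It assumes the length is known; in a servlet, request.getContentLength() returns -1 when it is not, and that case would need separate handling.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BodyReader {
    // Hypothetical helper: read exactly `length` bytes from a plain InputStream.
    static byte[] readBody(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        int total = 0;
        while (total < length) {
            // Only ask for the bytes that are still missing.
            int n = in.read(data, total, length - total);
            if (n < 0) {
                throw new EOFException("Stream ended after " + total + " of " + length + " bytes");
            }
            total += n;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the request body: "notes=é" encoded as UTF-8.
        byte[] body = "notes=\u00e9".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(body);

        byte[] dataBytes = readBody(in, body.length);
        // Decode with an explicit charset, as in the fix above.
        String notes = new String(dataBytes, StandardCharsets.UTF_8);
        System.out.println(notes.length() + " characters read from " + dataBytes.length + " bytes");
    }
}
```

The stream itself never interprets the bytes; the charset only comes into play in the String constructor at the end.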



Source: https://stackoverflow.com/questions/4505057/datainputstream-and-utf-8
