Question
I receive data from an external Microsoft SQL Server 2008 database (I run the queries with MyBatis). In theory, the data I receive is encoded as "Windows-1252".
I tried to decode the data with this code:
String textoFormado = ...value from MyBatis... ;
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");
Almost all of the string is decoded correctly, but some accented letters are not.
For example:
- I receive this string from the database: "�vila"
- After running the code above, I get this string: "�?vila"
- I expected this string: "Ávila"
Answer 1:
Obviously, textoFormado is a variable of type String. This means that the bytes have already been decoded; Java stores strings internally as UTF-16 code units. What you did was encode your string as Windows-1252 and then read the resulting bytes as UTF-8. That does not work, because the two encodings represent accented letters with different byte sequences.
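To make the failure concrete, here is a minimal sketch that repeats the conversion from the question on a string that is already correct. The class name WrongRoundTripDemo and the method encodeThenMisread are hypothetical names chosen for illustration; only the two charset calls come from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
class WrongRoundTripDemo {
    // Windows-1252 is not among the StandardCharsets constants, so it is looked up by name.
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Repeats the conversion from the question on an already-decoded String.
    static String encodeThenMisread(String text) {
        // In Windows-1252, "Á" becomes the single byte 0xC1.
        byte[] cp1252Bytes = text.getBytes(WINDOWS_1252);
        // 0xC1 never occurs in valid UTF-8, so the decoder substitutes U+FFFD.
        return new String(cp1252Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00C1vila"; // "Ávila"
        String damaged = encodeThenMisread(original);
        System.out.println(damaged.equals(original));    // false: the accented letter was lost
        System.out.println(damaged.contains("\uFFFD"));  // true: a replacement character appeared
    }
}
```

Plain ASCII letters survive this round trip because both encodings represent them with the same bytes, which is why only the accented letters look wrong.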
What you need is the correct encoding when reading the bytes:
byte[] sourceBytes = getRawBytes();
String data = new String(sourceBytes , "Windows-1252");
To use this string inside your program, you do not need to do anything else; simply use it. If, however, you want to write the data back out, for example to a file, you need to encode it again:
byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
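Putting the two steps together, a self-contained sketch of the decode-then-encode path could look like the following. The class TranscodeDemo and the method windows1252ToUtf8 are hypothetical, and the sample bytes stand in for whatever raw data your source actually delivers.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class illustrating the approach described in this answer.
class TranscodeDemo {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decodes raw Windows-1252 bytes into a String, then encodes that String as UTF-8.
    static byte[] windows1252ToUtf8(byte[] sourceBytes) {
        String data = new String(sourceBytes, WINDOWS_1252);
        return data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Ávila" as raw Windows-1252 bytes: 0xC1 is "Á" (sample data, assumed for illustration).
        byte[] sourceBytes = {(byte) 0xC1, 'v', 'i', 'l', 'a'};
        byte[] utf8 = windows1252ToUtf8(sourceBytes);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals("\u00C1vila")); // true
        System.out.println(utf8.length); // 6, because "Á" takes two bytes (0xC3 0x81) in UTF-8
    }
}
```

The important point is that the charset is applied at the byte boundary in each direction: once when bytes become a String, and once when the String becomes bytes again.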
Answer 2:
I solved it, thanks to everyone.
My project has the following structure:
- MyBatisQueries: contains a query with a "select" that returns the String
- A Pojo that stores the String (this is where the String arrived with conversion problems)
- The class that uses the query and the Pojo object holding the data (this is where I saw the badly decoded text)
At first I had this (MyBatis and Spring inject the dependencies and parameters):
public class Pojo {
    private String params;

    // Receives text that has already been decoded into a String
    public void setParams(String params) {
        this.params = params;
    }
}
The solution:
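import java.io.UnsupportedEncodingException;

public class Pojo {
    private String params;

    // Receives the raw bytes and decodes them explicitly
    public void setParams(byte[] params) {
        try {
            this.params = new String(params, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Thrown only if the named charset is not supported
            this.params = null;
        }
    }
}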
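As a possible variation on the solution above, the charset can be passed as a constant from java.nio.charset.StandardCharsets (available since Java 7), which removes the checked exception. This is only a sketch: the class name PojoSketch, the null check, and the getParams accessor are assumptions added for illustration, and it keeps the answer's assumption that the bytes are UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical variation of the Pojo from this answer.
class PojoSketch {
    private String params;

    // Decodes the raw bytes explicitly; no try/catch is needed with a Charset constant.
    public void setParams(byte[] params) {
        this.params = (params == null) ? null : new String(params, StandardCharsets.UTF_8);
    }

    // Accessor added for illustration; the original class does not show one.
    public String getParams() {
        return params;
    }
}
```

If the column really holds Windows-1252 bytes, Charset.forName("windows-1252") would be the charset to pass instead.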
Source: https://stackoverflow.com/questions/23082522/java-convert-windows-1252-to-utf-8-some-letters-are-wrong