How do I properly set the default character encoding used by the JVM (1.5.x) programmatically?
I have read that -Dfile.encoding=whatever
used to be the
I think a better approach than setting the platform's default character set, especially as you seem to have restrictions on affecting the application deployment, let alone the platform, is to call the much safer String.getBytes("charsetName")
. That way your application is not dependent on things beyond its control.
I personally feel that String.getBytes()
should be deprecated, as it has caused serious problems in a number of cases I have seen, where the developer did not account for the default charset possibly changing.
We were having the same issues. We methodically tried several suggestions from this article (and others) to no avail. We also tried adding the -Dfile.encoding=UTF8
and nothing seemed to be working.
For people that are having this issue, the following article finally helped us track down describes how the locale setting can break unicode/UTF-8
in Java/Tomcat
http://www.jvmhost.com/articles/locale-breaks-unicode-utf-8-java-tomcat
Setting the locale correctly in the ~/.bashrc
file worked for us.
I can't answer your original question but I would like to offer you some advice -- don't depend on the JVM's default encoding. It's always best to explicitly specify the desired encoding (i.e. "UTF-8") in your code. That way, you know it will work even across different systems and JVM configurations.
I have a hacky way that definitely works!!
System.setProperty("file.encoding","UTF-8");
Field charset = Charset.class.getDeclaredField("defaultCharset");
charset.setAccessible(true);
charset.set(null,null);
This way you are going to trick JVM which would think that charset is not set and make it to set it again to UTF-8, on runtime!
I have tried a lot of things, but the sample code here works perfect. Link
The crux of the code is:
String s = "एक गाव में एक किसान";
String out = new String(s.getBytes("UTF-8"), "ISO-8859-1");
Not clear on what you do and don't have control over at this point. If you can interpose a different OutputStream class on the destination file, you could use a subtype of OutputStream which converts Strings to bytes under a charset you define, say UTF-8 by default. If modified UTF-8 is suffcient for your needs, you can use DataOutputStream.writeUTF(String)
:
byte inbytes[] = new byte[1024];
FileInputStream fis = new FileInputStream("response.txt");
fis.read(inbytes);
String in = new String(inbytes, "UTF8");
DataOutputStream out = new DataOutputStream(new FileOutputStream("response-2.txt"));
out.writeUTF(in); // no getBytes() here
If this approach is not feasible, it may help if you clarify here exactly what you can and can't control in terms of data flow and execution environment (though I know that's sometimes easier said than determined). Good luck.