Question
I have a JVM whose character set is set to UTF-8 with -Dfile.encoding=UTF-8. I want to set it to a non-Unicode character set instead.
Is there an example value for a non-Unicode character set that I can pass to -Dfile.encoding=?
Answer 1:
[ TL;DR => Application encoding is a confusing issue, but this document from Oracle should help. ]
First, a few important general points about specifying the encoding by setting the System Property file.encoding at run time:
Its use is not formally supported, and never has been. From a Java Bug Report in 1998:
The "file.encoding" property is not required by the J2SE platform specification; it's an internal detail of Sun's implementations and should not be examined or modified by user code. It's also intended to be read-only; it's technically impossible to support the setting of this property to arbitrary values on the command line or at any other time during program execution.
There is a draft JEP (JDK Enhancement Proposal), JDK-8187041 Use UTF-8 as default Charset, which proposes:
Use UTF-8 as the Java virtual machine's default charset so that APIs that depend on the default charset behave consistently across all platforms.
It doesn't necessarily make sense to claim that "This application uses encoding {x}" since there may be multiple encodings associated with an application, which can be addressed in different ways, including:
- The file encoding for console output.
- The file encoding of the application's source files.
- The file encoding(s) for file I/O.
- The file encoding of file paths.
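Because several encodings can be in play at once, one way to avoid depending on file.encoding at all is to name the charset explicitly wherever bytes become text. The following is only a minimal sketch of that idea: the class name ExplicitCharsetDemo, its helper methods, and the sample text are my own illustration, and the non-ASCII character is written as a Unicode escape so that the encoding of the source file does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo
{
    /** Encodes text with the charset passed in, ignoring the JVM default. */
    static byte[] encode(final String text, final Charset charset)
    {
        return text.getBytes(charset);
    }

    /** Decodes bytes with the charset passed in, ignoring the JVM default. */
    static String decode(final byte[] bytes, final Charset charset)
    {
        return new String(bytes, charset);
    }

    public static void main(final String[] arguments)
    {
        // "cafe" with an acute accent on the e, written as an escape.
        final String text = "caf\u00e9";

        // ISO-8859-1 is a non-Unicode charset that every Java platform must support.
        final byte[] latin1 = encode(text, StandardCharsets.ISO_8859_1);
        final byte[] utf8 = encode(text, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes: " + latin1.length); // 4
        System.out.println("UTF-8 bytes: " + utf8.length);        // 5

        // Decoding with the same charset that wrote the bytes restores the text.
        System.out.println(text.equals(decode(latin1, StandardCharsets.ISO_8859_1)));

        // Decoding with a different charset does not.
        System.out.println(text.equals(decode(latin1, StandardCharsets.UTF_8)));
    }
}
```

The same Charset object can be passed to constructors and methods such as new InputStreamReader(InputStream, Charset) or Files.newBufferedReader(Path, Charset), so file I/O can be pinned to one encoding while the rest of the JVM keeps its default.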
All that said, Oracle specify all encodings supported by Java SE 8. I can't find a corresponding document for more recent JDK versions. Note that:
- Encodings can be environment specific, based on locale, operating system, Java version, etc.
- Almost every encoding has at least one alias. For example, the encoding name for simplified Chinese is GBK, but you could also use CP936 or windows-936.
- Most encodings are non-Unicode. The Unicode encodings are the ones whose names contain the string "UTF".
- An encoding name can vary depending on how the application is processing files (java.nio APIs vs. java.io/java.lang APIs). For example, if performing some I/O on Turkish files on Windows:
  - If the java.nio.* classes are used, specify -Dfile.encoding=windows-1254 at runtime.
  - If the java.lang.* & java.io.* classes are used, specify -Dfile.encoding=Cp1254 at runtime.
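The points above about aliases and the two naming schemes can be checked on your own JVM. This is a rough sketch, not part of the Oracle document: the class CharsetNamesDemo and its helpers are hypothetical names of mine, and the "UTF" filter is only the naming heuristic described above. It lists the installed charsets whose canonical names do not contain "UTF", then prints the java.nio name, the aliases, and the historical java.io name for one charset.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class CharsetNamesDemo
{
    /** Canonical names of installed charsets that do not contain "UTF". */
    static List<String> nonUtfCharsetNames()
    {
        return Charset.availableCharsets().keySet().stream()
            .filter(name -> !name.toUpperCase(Locale.ROOT).contains("UTF"))
            .collect(Collectors.toList());
    }

    /** Name that the java.io classes report for the given charset. */
    static String javaIoName(final Charset charset)
    {
        final InputStreamReader reader =
            new InputStreamReader(new ByteArrayInputStream(new byte[0]), charset);
        return reader.getEncoding();
    }

    public static void main(final String[] arguments)
    {
        System.out.println("Non-UTF charsets: " + nonUtfCharsetNames());

        // Charset name taken from the command line, defaulting to the Turkish example.
        final String requested = arguments.length > 0 ? arguments[0] : "windows-1254";
        if (!Charset.isSupported(requested))
        {
            System.out.println(requested + " is not available on this JVM");
            return;
        }
        final Charset charset = Charset.forName(requested);
        System.out.println("java.nio name: " + charset.name());
        System.out.println("Aliases: " + charset.aliases());
        System.out.println("java.io name: " + javaIoName(charset));
    }
}
```

The list depends on the JVM and its installed modules, so the output will differ between machines.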
This DZone article provides a useful piece of code to show how setting -Dfile.encoding at runtime can impact various settings:
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.Locale;
import static java.lang.System.out;

/**
 * Demonstrate default Charset-related details.
 */
public class CharsetDemo
{
    /**
     * Supplies the default encoding without using Charset.defaultCharset()
     * and without accessing System.getProperty("file.encoding").
     *
     * @return Default encoding (default charset).
     */
    public static String getEncoding()
    {
        final byte [] bytes = {'D'};
        final InputStream inputStream = new ByteArrayInputStream(bytes);
        // No charset argument, so the reader falls back to the default charset.
        final InputStreamReader reader = new InputStreamReader(inputStream);
        final String encoding = reader.getEncoding();
        return encoding;
    }

    public static void main(final String[] arguments)
    {
        out.println("Default Locale: " + Locale.getDefault());
        out.println("Default Charset: " + Charset.defaultCharset());
        out.println("file.encoding: " + System.getProperty("file.encoding"));
        out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));
        out.println("Default Encoding: " + getEncoding());
    }
}
Here's some sample output when specifying -Dfile.encoding=860 (an alias for MS-DOS Portuguese) using Java 12 on Windows 10:
run:
Default Locale: en_US
Default Charset: IBM860
file.encoding: 860
sun.jnu.encoding: Cp1252
Default Encoding: Cp860
BUILD SUCCESSFUL (total time: 0 seconds)
Test the encoding you plan to specify at run time on all target platforms. You may get unexpected results. For example, when I run the code above on Windows 10 with -Dfile.encoding=IBM864 (PC Arabic) it works, but fails with -Dfile.encoding=IBM420 (IBM Arabic).
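As a first step in that testing, you can ask the JVM whether it recognizes a charset name at all. This is just a sketch under my own assumptions: the class CharsetSupportCheck and the method isUsable are hypothetical names, and the default names are simply the ones mentioned in this answer. A successful lookup only shows that the running JVM knows the charset. It does not prove that -Dfile.encoding will behave as expected with it, so CharsetDemo should still be run with the flag set on each target platform.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetSupportCheck
{
    /** Returns true if this JVM can look up the named charset. */
    static boolean isUsable(final String name)
    {
        try
        {
            return Charset.isSupported(name);
        }
        catch (IllegalCharsetNameException e)
        {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    public static void main(final String[] arguments)
    {
        // Names to check: command-line arguments, or the ones from this answer.
        final String[] names = arguments.length > 0
            ? arguments
            : new String[] {"IBM864", "IBM420", "Cp1254", "windows-1254"};

        for (final String name : names)
        {
            if (isUsable(name))
            {
                final Charset charset = Charset.forName(name);
                System.out.println(name + " -> " + charset.name()
                    + " (can encode: " + charset.canEncode() + ")");
            }
            else
            {
                System.out.println(name + " is not available on this JVM");
            }
        }
    }
}
```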
Source: https://stackoverflow.com/questions/56028048/what-is-an-example-for-non-unicode-character-set-for-dfile-encoding