Default character encoding for Java console output

星月不相逢 2020-11-29 11:29

How does Java determine the encoding used for System.out?

Given the following class:

import java.io.File;
import java.io.PrintWriter;

p         


        
1 answer
  • 2020-11-29 12:05

    I'm assuming that your console still runs under cmd.exe. I doubt your console really expects UTF-8; more likely it uses an OEM DOS code page (e.g. 850 or 437).

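    Java encodes the characters written to System.out into bytes using the JVM's default encoding, which is fixed during JVM initialization. On Windows this is normally the system ANSI code page (e.g. windows-1252) unless it is overridden with -Dfile.encoding.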
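To check which encoding a particular JVM has settled on, a small probe along these lines may help. It is only a sketch: the class name `EncodingProbe` is made up for illustration, and the `stdout.encoding` property is, as far as I know, only set on newer JDKs (19 or later), so it prints `null` on older ones.

```java
import java.nio.charset.Charset;

public class EncodingProbe {
    // Builds a short report of the encoding-related settings of this JVM.
    static String report() {
        String nl = System.lineSeparator();
        return "default charset = " + Charset.defaultCharset().name() + nl
             + "file.encoding   = " + System.getProperty("file.encoding") + nl
             + "stdout.encoding = " + System.getProperty("stdout.encoding");
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Running it once as `java EncodingProbe` and once as `java -Dfile.encoding=UTF-8 EncodingProbe` should show whether the override took effect.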

    Reproducing on my PC:

    java Foo
    

    Java encodes as windows-1252; console decodes as IBM850. Result: Mojibake

    java -Dfile.encoding=UTF-8 Foo
    

    Java encodes as UTF-8; console decodes as IBM850. Result: Mojibake
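The two mismatches above can be reproduced without a console. The following sketch encodes a string with one charset and decodes the bytes with another. The class and method names are hypothetical, and it assumes the runtime ships the `windows-1252` and `IBM850` charsets. Results are printed as code points so that the console's own encoding cannot distort them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text with one charset and decodes the bytes with another,
    // mimicking a JVM and a console that disagree about the encoding.
    static String misdecode(String text, Charset javaSide, Charset consoleSide) {
        byte[] bytes = text.getBytes(javaSide);
        return new String(bytes, consoleSide);
    }

    // Prints a string as a list of Unicode code points.
    static void printCodePoints(String label, String s) {
        StringBuilder sb = new StringBuilder(label).append(':');
        s.codePoints().forEach(cp -> sb.append(String.format(" U+%04X", cp)));
        System.out.println(sb);
    }

    public static void main(String[] args) {
        // Assumption: this runtime provides both of these charsets.
        Charset ansi = Charset.forName("windows-1252");
        Charset oem = Charset.forName("IBM850");
        String text = "xx\u00e4\u00f1xx"; // "xxäñxx"

        printCodePoints("windows-1252 -> IBM850", misdecode(text, ansi, oem));
        printCodePoints("UTF-8 -> IBM850", misdecode(text, StandardCharsets.UTF_8, oem));
        printCodePoints("IBM850 -> IBM850", misdecode(text, oem, oem));
    }
}
```

Only the last line, where both sides agree on the encoding, should reproduce the original code points.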

    cat test.txt
    

    cat decodes file as UTF-8; cat encodes as IBM850; console decodes as IBM850.

    java Foo | cat
    

    Java encodes as windows-1252; cat decodes as windows-1252; cat encodes as IBM850; console decodes as IBM850

    java -Dfile.encoding=UTF-8 Foo | cat
    

    Java encodes as UTF-8; cat decodes as UTF-8; cat encodes as IBM850; console decodes as IBM850

    This implementation of cat must use heuristics to determine whether the character data is UTF-8, and then transcode the data from either UTF-8 or ANSI (e.g. windows-1252) to the console encoding (e.g. IBM850).

    This can be confirmed with the following commands:

    $ java HexDump utf8.txt
    78 78 c3 a4 c3 b1 78 78
    
    $ cat utf8.txt
    xxäñxx
    
    $ java HexDump ansi.txt
    78 78 e4 f1 78 78
    
    $ cat ansi.txt
    xxäñxx
    

    The cat command can make this determination because e4 f1 is not a valid UTF-8 sequence.
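A heuristic of this kind can be sketched in Java with a strict UTF-8 decoder. This is not the actual logic of that cat implementation, which is unknown to me. It only illustrates the idea, and the names `Utf8Sniffer`, `tryUtf8` and `decodeGuessing` are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Sniffer {
    // Strictly decodes the bytes as UTF-8; returns null if they are malformed.
    static String tryUtf8(byte[] data) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // Decodes as UTF-8 when valid, otherwise falls back to the given ANSI charset.
    static String decodeGuessing(byte[] data, Charset ansiFallback) {
        String utf8 = tryUtf8(data);
        return utf8 != null ? utf8 : new String(data, ansiFallback);
    }

    public static void main(String[] args) {
        // The same byte sequences as in the hex dumps of utf8.txt and ansi.txt.
        byte[] utf8Bytes = {0x78, 0x78, (byte) 0xc3, (byte) 0xa4, (byte) 0xc3, (byte) 0xb1, 0x78, 0x78};
        byte[] ansiBytes = {0x78, 0x78, (byte) 0xe4, (byte) 0xf1, 0x78, 0x78};
        Charset ansi = Charset.forName("windows-1252");

        System.out.println(tryUtf8(utf8Bytes) != null); // true
        System.out.println(tryUtf8(ansiBytes) != null); // false
        // Both inputs end up as the same text once the right decoder is chosen.
        System.out.println(decodeGuessing(utf8Bytes, ansi).equals(decodeGuessing(ansiBytes, ansi))); // true
    }
}
```

Such a check can guess wrong: pure ASCII is valid in both encodings, and some ANSI byte sequences happen to form valid UTF-8.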

    You can correct the Java output by:

    • Setting the console encoding to the system ANSI value (e.g. with chcp 1252)
    • Using the java.io.Console type (System.console())
    • Using some shim layer as you are doing with cat
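The Console option might look roughly like the sketch below. The class name `ConsoleOut` and its `writer()` helper are hypothetical. `System.console()` can return null when no console is attached (for example when output is piped), so the sketch falls back to System.out in that case.

```java
import java.io.Console;
import java.io.PrintWriter;

public class ConsoleOut {
    // Prefers the Console writer, which is meant to match the attached terminal;
    // falls back to System.out when no console is available (e.g. output is piped).
    static PrintWriter writer() {
        Console console = System.console();
        return console != null ? console.writer() : new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = writer();
        out.println("xx\u00e4\u00f1xx"); // "xxäñxx"
        out.flush();
    }
}
```

Note that with this approach `java ConsoleOut | cat` still goes through System.out, so the piped case behaves as described earlier.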

    HexDump is a trivial Java application:

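    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    // Prints every byte of the file named by the first argument as two hex digits.
    class HexDump {
      public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
          int r;
          // read() returns the next byte as a value in 0-255, or -1 at end of file
          while ((r = in.read()) != -1) {
            System.out.format("%02x ", r);
          }
          System.out.println();
        }
      }
    }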
    