UTF-8 CJK characters not displaying in Java

天涯浪人 2020-12-10 13:27

I've been reading up on Unicode and UTF-8 encoding for a while and I think I understand it, so hopefully this won't be a stupid question:

I have a file which conta

4 Answers
  • 2020-12-10 13:41

    The following program prints CJK characters to TextPad's tool output window. To see the Korean Hangul and Japanese Hiragana, I had to tell Java to change the print stream's encoding to EUC_KR and set these properties of TextPad's tool output window:

    • font is Arial Unicode MS
    • script is Hangul

    import java.io.PrintStream;
    import java.io.UnsupportedEncodingException;

    class Hangul {

        public static void main(String[] args) throws UnsupportedEncodingException {

            // Wrap System.out in a PrintStream that encodes output as EUC_KR;
            // this must match the encoding the console window expects
            PrintStream out = new PrintStream(System.out, true, "EUC_KR");
            System.setOut(out);

            // Print a sample containing Hangul and Hiragana to the console
            String go_hello = "가다 こんにちは";
            System.out.println(go_hello);
        }
    }
    

    Tool Output is:

    가다 こんにちは

  • 2020-12-10 13:42
    System.out.println(sb);
    

    The problem is the above line. This will encode character data using the default system encoding and emit the data to STDOUT. On many systems, this is a lossy process.

    If you change the defaults, the encoding used by System.out and the encoding used by the console must match.

    The only supported mechanism to change the default system encoding is via the operating system. (Some will advise using the file.encoding system property, but this is not supported and may have unintended side-effects.) You can, however, use System.setOut to install your own custom PrintStream with an explicit encoding:

    PrintStream stdout = new PrintStream(System.out, autoFlush, encoding);
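
    To make the "encodings must match" point concrete, here is a minimal sketch rather than a definitive implementation. The class and method names (ConsoleEncodingSketch, encodeViaPrintStream, consoleShows) are made up for illustration, and the "console" is only simulated by decoding the bytes with a chosen charset. The sample string is written with Unicode escapes so the result does not depend on the source file's own encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question or answers.
class ConsoleEncodingSketch {

    // Encodes text the way a PrintStream configured with charsetName would,
    // capturing the bytes instead of sending them to a real console.
    static byte[] encodeViaPrintStream(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    // Simulates a console that interprets incoming bytes using consoleCharset.
    static String consoleShows(byte[] bytes, Charset consoleCharset) {
        return new String(bytes, consoleCharset);
    }

    public static void main(String[] args) {
        String sample = "\uAC00\uB2E4"; // the Hangul syllables from the sample program
        byte[] bytes = encodeViaPrintStream(sample, "UTF-8");

        // Stream and console agree on UTF-8: the text survives.
        System.out.println(consoleShows(bytes, StandardCharsets.UTF_8).equals(sample));      // true

        // Console assumes ISO-8859-1 while the stream wrote UTF-8: the text is garbled.
        System.out.println(consoleShows(bytes, StandardCharsets.ISO_8859_1).equals(sample)); // false
    }
}
```

    In a real program the same PrintStream constructor would wrap System.out and be installed with System.setOut; the charset name would then need to be whatever the actual console is configured to use.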
    

    You can change the Eclipse console encoding via the Run configuration.

    You can find a number of posts about the subject on my blog - via my profile.

  • 2020-12-10 13:44

    Yeah, you need to change the encoding of the Eclipse console, as explained in the article "how-to-display-chinese-character-in-eclipse-console".

  • 2020-12-10 13:55

    Depending on your platform, it is highly likely that your console (or Windows CMD) does not support or use the UTF-8 character set, and therefore converts all unmappable characters to a question mark.

    On Windows, for example, CMD almost always uses Windows-1252 or a similar single-byte character set.
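
    The question-mark substitution can be reproduced without a console. The following is a rough sketch under stated assumptions: the class and method names (UnmappableSketch, roundTrip, canDisplay) are invented for illustration, and windows-1252 merely stands in for whatever single-byte character set a given console uses. It relies on String.getBytes(Charset), which replaces unmappable characters with the charset's default replacement byte, and on CharsetEncoder.canEncode to check a string up front.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question or answers.
class UnmappableSketch {

    // Encodes and decodes text with the same charset, mimicking what a
    // console limited to that charset would end up displaying.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Reports whether every character of text can be represented in charset.
    static boolean canDisplay(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // The Hiragana greeting from the sample program, as Unicode escapes.
        String sample = "\u3053\u3093\u306B\u3061\u306F";
        Charset cp1252 = Charset.forName("windows-1252"); // assumed console charset

        System.out.println(canDisplay(sample, cp1252));   // false
        System.out.println(roundTrip(sample, cp1252));    // ?????
        System.out.println(roundTrip(sample, StandardCharsets.UTF_8).equals(sample)); // true
    }
}
```

    A check like canDisplay could be used to warn before printing, instead of silently emitting question marks.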
