How to read Unicode characters in Java

你的背包 2021-01-13 02:24

I am trying to read Unicode characters from a text file saved as UTF-8, using Java. My text file is as follows:

अ, अदेबानि ,अन, अनसुला, अनसुलि, अनफावरि, अनजालु, अ

3 Answers
  • 2021-01-13 02:54

    You're reading it correctly - the problem is almost certainly just that your console can't handle the text. The simplest way to verify this is to print out each char within the string. For example:

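    // Prints every char (UTF-16 code unit) of the string next to its hex value.
    public static void dumpString(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // The glyph may show as '?' on a limited console; the hex value is what to check.
            System.out.printf("%c - %04x\n", c, (int) c);
        }
    }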
    

    You can then verify that each character is correct using the Unicode code charts.

    Once you've verified that you're reading the file correctly, you can then work on the output side of things - but it's important to try to focus on one side of it at a time. Trying to diagnose potential failures in both input and output encodings at the same time is very hard.
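    To check the input side in isolation, one option is to decode the file explicitly as UTF-8 and print only the code point numbers, which are plain ASCII and so survive any console. The following is a minimal sketch, not the asker's code: the class name `Utf8ReadCheck` and the helpers `codePoints` and `readUtf8` are made up for illustration, and the temp-file fallback exists only so the demo runs without arguments.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Utf8ReadCheck {

    // Formats each code point as "U+XXXX". The result is pure ASCII, so it
    // does not depend on what the console is able to display.
    static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    // Reads the whole file and decodes it explicitly as UTF-8 instead of
    // relying on the platform default charset.
    static String readUtf8(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file;
        if (args.length > 0) {
            file = Paths.get(args[0]);
        } else {
            // No path given: write a two-letter UTF-8 sample to a temp file (demo only).
            file = Files.createTempFile("sample", ".txt");
            Files.write(file, "\u0905\u0928".getBytes(StandardCharsets.UTF_8));
        }
        for (String cp : codePoints(readUtf8(file))) {
            System.out.println(cp);
        }
    }
}
```

    If the values match the code charts (the first letter of the sample text, अ, should appear as U+0905), the file is being read correctly and any remaining problem is on the output side. A leading U+FEFF would indicate a byte order mark written by the editor that saved the file.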

  • 2021-01-13 02:56

    If you are reading the text properly using UTF-8 encoding, then make sure that your console also supports UTF-8. If you are using Eclipse, you can enable UTF-8 encoding for your console via:

    Run Configuration -> Common -> Encoding -> Select UTF-8
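    The console setting only helps if the program also sends UTF-8 bytes. A rough sketch of one way to do that from code is to wrap `System.out` in a `PrintStream` with an explicit encoding. The class name `Utf8Console` and the helper `utf8` are hypothetical, and whether the text then displays correctly still depends on the console's own encoding and font.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {

    // Wraps any byte stream in an auto-flushing PrintStream that encodes text as UTF-8.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Two Devanagari letters written as escapes, so the source file stays ASCII.
        out.println("\u0905\u0928");
    }
}
```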
    


  • 2021-01-13 03:09

    You are (most likely) reading the text correctly, but when you write it out, you also need to enable UTF-8. Otherwise every character that cannot be printed in your default encoding will be turned into question marks.

    Try writing it to a file instead of System.out (and specify the proper encoding):

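    // Needs java.io.* and java.nio.charset.StandardCharsets; close the writer when done.
    Writer w = new OutputStreamWriter(
       new FileOutputStream("x.txt"), StandardCharsets.UTF_8);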
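    A fuller version of the same idea, as a sketch under assumptions: write the text with an explicit UTF-8 writer, read it back with an explicit UTF-8 reader, and compare the two strings. The class name `Utf8FileRoundTrip` and its helper methods are illustrative only, and the demo uses a temp file rather than `x.txt`.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileRoundTrip {

    // Writes the text as UTF-8; try-with-resources flushes and closes the writer.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            w.write(text);
        }
    }

    // Reads the whole file back, decoding it as UTF-8.
    static String readUtf8(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("roundtrip", ".txt");
        String original = "\u0905, \u0905\u0928"; // sample Devanagari text as escapes
        writeUtf8(file, original);
        // Prints true when the text survives the write/read cycle unchanged.
        System.out.println(original.equals(readUtf8(file)));
    }
}
```

    Opening the written file in an editor that understands UTF-8 then shows whether the characters are intact, independently of the console.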
    