问题
I have a file which has characters like: " Joh 1:1 ஆதியிலே வார்த்தை இருந்தது, அந்த வார்த்தை தேவனிடத்திலிருந்தது, அந்த வார்த்தை தேவனாயிருந்தது. "
www.unicode.org/charts/PDF/U0B80.pdf
When I use the following code:
bufferedWriter = new BufferedWriter (new OutputStreamWriter(System.out, "UTF8"));
The output is boxes and other weird characters like this:
"�P�^����O֛���;�<�aYՠ؛"
Can anyone help?
these are the complete codes:
File f=new File("E:\\bible.docx");
Reader decoded=new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8);
bufferedWriter = new BufferedWriter (new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
char[] buffer = new char[1024];
int n;
StringBuilder build=new StringBuilder();
while(true){
n=decoded.read(buffer);
if(n<0){break;}
build.append(buffer,0,n);
bufferedWriter.write(buffer);
}
The StringBuilder value shows the UTF characters but when displaying it in the window it shows as boxes..
Found the Answer to the problem!!! The Encoding is Correct (i.e UTF-8) Java reads the file as UTF-8 and the String characters are UTF-8, The problem is that there is no font to display it in netbeans' output panel. After changing the font for the output panel (Netbeans->tools->options->misc->output tab) I got the expected result. The same applies when it is displayed in JTextArea(font needs to be changed). But we can't change font the windows' cmd prompt.
回答1:
Because your output is encoded in UTF-8, but still contains the replacement character (U+FFFD
, �), I believe the problem occurs when you read the data.
Make sure that you know what encoding your input stream uses, and set the encoding for the InputStreamReader
according. If that's Tamil, I would guess it's probably in UTF-8. I don't know if Java supports TACE-16. It would look something like this…
StringBuilder buffer = new StringBuilder();
try (InputStream encoded = ...) {
Reader decoded = new InputStreamReader(encoded, StandardCharsets.UTF_8);
char[] buffer = new char[1024];
while (true) {
int n = decoded.read(buffer);
if (n < 0)
break;
buffer.append(buffer, 0, n);
}
}
String verse = buffer.toString();
回答2:
System.out
is too near to the operating system, to be versatile enough. In your case, the NetBeans console probably is using the operating system encoding, and IDE picked font.
Write to a file first. If you make it HTML, you can even double click it, and specify internally the right encoding. Mind using "UTF-8" then, as "UTF8" is Java specific ("UTF-8" can be used in Java too). Maybe with JDesktop.getDesktop().open("... .html");
.
A small JFrame with a JTextPane would do too.
回答3:
It turns out that Tamil is encoded in 16 bits, so just use UTF-16 instead of UTF-8. By doing that I was able to print Tamil text in the Eclipse console.
来源:https://stackoverflow.com/questions/17985026/read-file-and-write-file-which-has-characters-in-utf-8-different-language