I'm a bit new to Java. When I assign a Unicode string to
String str = "\u0142o\u017Cy\u0142";
System.out.println(str);
final StringBuilder stri
I think it's just "UTF8", not "UTF-8".
Here I saw it: Source
Your code should be correct, but I guess that the file "a.txt" does not contain the Unicode characters encoded with UTF-8, but the escaped string "\u0142o\u017Cy\u0142".
Please check that the text file is correct, using a UTF-8-aware editor such as recent versions of Notepad or Notepad++ on Windows. Or open it with your favorite hex editor: it should not contain backslashes.
I tried it with "€" as UTF-8-encoded content of the file and it gets printed correctly. Note that not all Unicode characters can be printed, depending on your terminal encoding (really a hassle on Windows) and font.
I posted Java code to unescape (“descape”?) such things and many others in this answer.
Java interprets Unicode escapes such as your \u0142 that are in the source code as if you had actually typed that character (Latin small letter L with stroke) into the source.
Java does not interpret unicode escapes that it reads from a file.
If you take your String str = "\u0142o\u017Cy\u0142"; and write it to a file a.txt from your Java program, then open the file in an editor, you'll see the characters themselves in the file, not the \uNNNN sequence.
If you then take your original posted program and read that a.txt file, you should see what you expected.
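To make that round trip concrete, here is a minimal sketch. The class and method names (RoundTripDemo, roundTrip) are made up for illustration, and a temporary file stands in for a.txt; it assumes you want UTF-8 on both the writing and the reading side.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RoundTripDemo {

    // Hypothetical helper: writes the text to a temporary file as UTF-8,
    // reads it back as UTF-8, deletes the file and returns what was read.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("a", ".txt"); // stand-in for a.txt
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The compiler turns these escapes into real characters before the
        // program runs, so the file never contains a backslash.
        String str = "\u0142o\u017Cy\u0142";
        String readBack = roundTrip(str);
        System.out.println("same text read back: " + readBack.equals(str));
        System.out.println("contains a backslash: " + readBack.contains("\\"));
    }
}
```

Whether the characters then display correctly on the console is a separate matter that depends on the terminal encoding and font.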
So, you want to unescape Unicode code points? There is no public API available for this. java.util.Properties has a loadConvert() method which does exactly this, but it's private. Check the Java source in case you'd like to reuse it. It does the conversion by simple parsing. I wouldn't use regex for this, since that is too error-prone in very specific circumstances.
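If you do want to parse it by hand, something along these lines might be a starting point. This is only a sketch, not the code from java.util.Properties: the class and method names (UnicodeUnescaper, unescape) are invented here, and it assumes the only escape you care about is a backslash, a single 'u' and exactly four hex digits. Anything else, including an escaped backslash, is copied through unchanged.

```java
final class UnicodeUnescaper {

    // Hypothetical helper: replaces every backslash-u escape (backslash, 'u',
    // four hex digits) with the character it denotes. Malformed or truncated
    // escapes are left in the output as they were.
    static String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean isEscape = c == '\\'
                    && i + 6 <= input.length()
                    && input.charAt(i + 1) == 'u'
                    && isHex(input, i + 2, i + 6);
            if (isEscape) {
                // Parse the four hex digits into a single UTF-16 code unit.
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if every character in [from, to) is a hexadecimal digit.
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The doubled backslashes make this the literal escaped text,
        // as it would look when read from a file.
        String escaped = "\\u0142o\\u017Cy\\u0142";
        System.out.println(unescape(escaped));
    }
}
```

Surrogate pairs written as two consecutive escapes come out right as well, because each escape is appended as one UTF-16 code unit.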
Or you should probably after all be using java.util.Properties or its i18n counterpart java.util.ResourceBundle with a .properties file instead of a plain .txt file.
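For what it's worth, here is a small sketch of the Properties route. The class name, the loadValue helper and the key "word" are made up for the example, and the properties text is fed from a string rather than a real .properties file to keep it self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesUnescapeDemo {

    // Hypothetical helper: parses properties-format text and returns the
    // value stored under the given key. Properties.load decodes backslash-u
    // escapes in keys and values while parsing.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes as literal text, the way
        // they would appear inside a .properties file.
        String fileContent = "word=\\u0142o\\u017Cy\\u0142";
        String value = loadValue(fileContent, "word");
        System.out.println(value.equals("\u0142o\u017Cy\u0142"));
    }
}
```

With a real file you would pass a Reader or InputStream for that file to load() instead of the StringReader.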
You have used FileInputStream, which is a byte stream, not a character reader. Try using FileReader instead, something like:
BufferedReader inputStream = new BufferedReader(new FileReader("C:/a.txt"));
Then you can use the line-oriented BufferedReader to read each line. FileInputStream is low-level byte I/O that you should avoid here. You're writing characters to your file, not bytes, so the best approach is to use character streams for writing and reading, unless you need to write bytes or binary data.
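One caveat: FileReader constructed from just a file name decodes with the default charset, which is not necessarily UTF-8 on every setup. A variant that names the charset explicitly could look like the sketch below. The class and method names are invented for illustration, and it assumes the file really is UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Utf8LineReader {

    // Hypothetical helper: reads the file line by line through a character
    // stream that decodes the bytes as UTF-8.
    static List<String> readLines(Path file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for C:/a.txt so the demo runs anywhere.
        Path file = Files.createTempFile("a", ".txt");
        try {
            Files.write(file, "\u0142o\u017Cy\u0142\nsecond line\n".getBytes(StandardCharsets.UTF_8));
            for (String line : readLines(file)) {
                System.out.println(line);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```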