Question
Hello and thank you for reading my post.
My problem is the following: I want to compile a Java source file with "javac", and the file is UTF-8 encoded with a BOM (the OS is WinXP).
Below is what I do:
1) Create a file with "Notepad" and choose the UTF-8 encoding
dos> notepad Test.java
"File -> Save as..."
File name : Test.java
Save as type: All Files
Encoding : UTF-8
Save
2) Create a Java class in that file and save the file as in 1)
public class Test
{
    public static void main(String[] args)
    {
        System.out.println("This is a test.");
    }
}
3) View a hex dump of the file (first line)
dos> xxd Test.java | head -1
0000000: efbb bf70 7562 6c69 6320 636c 6173 7320 ...public class
Note: ef bb bf is the UTF-8 encoded BOM (the UTF-16 encoded BOM being FE FF in big-endian order and FF FE in little-endian order).
4) Try to compile this code with "javac"
dos> javac -encoding utf8 Test.java
Test.java:1: illegal character: \65279
?public class Test
^
1 error
Note: 65279 is the decimal value of U+FEFF, the code point of the BOM.
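To double-check the two notes above, here is a small sketch that prints the decimal value of U+FEFF and its UTF-8 bytes. The class name BomValues and the helper utf8Hex are made up for this illustration, and StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
final class BomValues {
    // U+FEFF is the code point used as the byte order mark.
    static final char BOM = '\uFEFF';

    // Returns the UTF-8 bytes of the BOM as a lowercase hex string.
    static String utf8Hex() {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(BOM).getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println((int) BOM); // prints 65279
        System.out.println(utf8Hex()); // prints efbbbf
    }
}
```

The second value matches the first three bytes in the xxd output, and the first matches the number in the javac error message.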
My question is the following: how can I make this compile while:
- keeping it UTF-8 encoded
- and keeping the BOM?
Thank you for helping and best regards.
Léa
Answer 1:
Trim the BOM and then use javac -encoding utf8 x.java
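One possible way to trim the BOM without a special editor is a few lines of Java. This is only a sketch: the class name StripBom and the method stripBom are made up here, it assumes Java 7 or later for java.nio.file, and it overwrites the file it is given.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical utility, only for illustration.
final class StripBom {
    // Returns the input without a leading EF BB BF, or the input unchanged.
    static byte[] stripBom(byte[] data) {
        boolean hasBom = data.length >= 3
                && (data[0] & 0xff) == 0xEF
                && (data[1] & 0xff) == 0xBB
                && (data[2] & 0xff) == 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StripBom <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        // Rewrites the file in place, minus the BOM if one was present.
        Files.write(file, stripBom(Files.readAllBytes(file)));
    }
}
```

If the original file has to keep its BOM, run this on a copy (for example in a build directory) and compile the copy with javac -encoding utf8.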
Answer 2:
This isn't a problem with your text editor, it's a problem with javac! The Unicode spec says the BOM is optional in UTF-8; it doesn't say it's forbidden! If a BOM can be there, then javac HAS to handle it, but it doesn't. Actually, using the BOM in UTF-8 files IS useful to distinguish an ANSI-coded file from a Unicode-coded file.
The proposed solution of removing the BOM is only a workaround and not the proper solution.
This bug report indicates that this "problem" will never be fixed: http://bugs.java.com/view_bug.do?bug_id=4508058
Since this thread is in the top two Google results for the "javac BOM" search, I'm leaving this here for future readers.
Answer 3:
https://stackoverflow.com/a/28043356/7050261
"Actually, using the BOM in UTF-8 files IS useful to distinguish an ANSI-coded file from a Unicode-coded file."
Actually, the BOM is not about distinguishing ANSI from Unicode. Do not use a feature for a purpose it was not designed for.
UTF-8 was intentionally designed to be backward-compatible with ASCII, so a lot of code written to process formatted text (XML, JSON, etc.) that relies only on bytes 0..127 should work correctly with UTF-8 encoded text without any modification.
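A rough way to see the compatibility argument in code: text made only of characters 0..127 produces the same bytes in US-ASCII and in UTF-8, while the same text with a leading BOM does not. The class AsciiCompat and the method sameBytesInAsciiAndUtf8 are names invented for this sketch, and it assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, only for illustration.
final class AsciiCompat {
    // True when the string encodes to identical bytes in US-ASCII and UTF-8.
    static boolean sameBytesInAsciiAndUtf8(String s) {
        return Arrays.equals(
                s.getBytes(StandardCharsets.US_ASCII),
                s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String json = "{\"a\":1}";
        System.out.println(sameBytesInAsciiAndUtf8(json));            // prints true
        // With a BOM in front, UTF-8 emits three extra bytes above 127.
        System.out.println(sameBytesInAsciiAndUtf8("\uFEFF" + json)); // prints false
    }
}
```

So a byte-oriented tool that only understands bytes 0..127 can read BOM-less UTF-8 of this kind unchanged, but it will meet three unexpected bytes first if a BOM is present.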
Source: https://stackoverflow.com/questions/9811382/compiling-javac-a-utf8-encoded-java-source-code-with-a-bom