Different behaviour between javac 1.6 and javac 1.7 when handling special characters

Submitted by 百般思念 on 2019-12-12 10:51:36

Question


First of all, thank you. I have been banging my head against this issue for several days and have searched similar threads for a solution without success.

Our application is responsible for generating Java classes, and some of them may contain special characters in the class name (and thus the file name), such as ZoneRéservée435.java, which forces the encoding to be UTF-8.

Up to Java 1.6, the Ant task:

<javac source="1.5" target="1.5" srcdir="${src.dir}" destdir="${classes.dir}" deprecation="on" debug="on" classpathref="classpath" fork="false" memoryMaximumSize="512m" encoding="UTF-8">

worked fine.

After moving to Java 1.7, the file name was no longer saved using UTF-8, resulting in a file name like: ZoneRe?serve?e435.java

Looking around, I learned that I needed to set the environment variable LC_CTYPE to UTF-8. That solved the file name issue, but I still get a compilation error:

error: class ZoneRéservée435 is public, should be declared in a file named ZoneRéservée435.java

Although the two names look identical, they seem to be encoded in two different ways. The interesting part is that this encoding difference already existed with Java 1.6, yet the code compiled fine.

Does anyone have any suggestion or ideas?

As far as I understand, the encoding issue comes from the fact that the class is generated with the following:

 Writer out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), Charset.forName("UTF-8")));
  • The code inside the file uses U+00E9 for the special character;
  • The file name uses e followed by U+0301.

Any suggestion on how to deal with this?


Answer 1:


It seems that your file system uses the decomposed form of the letter é (the letter e followed by a combining acute accent, that is, \u0065 and \u0301), while your code generator uses the composed form of é (which is \u00e9). This is a typical problem on Apple's HFS+ file system, which always uses the decomposed form.
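To make the difference between the two forms visible, here is a minimal sketch that prints the code points of each spelling and shows that they only compare equal after normalization. The class name NormalizationDemo and the helper codePoints are made up for illustration; only java.text.Normalizer comes from the JDK.

```java
import java.text.Normalizer;

public class NormalizationDemo {

    // Illustrative helper: lists the code points of a string as "U+XXXX" tokens.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String composed = "\u00e9";     // single code point U+00E9
        String decomposed = "e\u0301";  // e followed by combining acute accent

        System.out.println(codePoints(composed));           // U+00E9
        System.out.println(codePoints(decomposed));         // U+0065 U+0301
        System.out.println(composed.equals(decomposed));    // false

        // After normalizing both to the same form, the strings match.
        String nfd = Normalizer.normalize(composed, Normalizer.Form.NFD);
        System.out.println(nfd.equals(decomposed));         // true
    }
}
```

Both strings render as é, which is why the names in the error message look the same even though a plain string comparison treats them as different.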

What you can do to solve this problem is modify your application to decompose the class name that appears in the generated source file with java.text.Normalizer:

Normalizer.normalize(classname, Normalizer.Form.NFD)
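As a rough sketch of how this could fit into a generator, the helper below derives both the file name and the class declaration from the same NFD-normalized string, so the two cannot drift apart. The class ClassNameNormalizer and its methods are hypothetical and not part of the original application; the sketch also assumes the file system stores names in decomposed form, as HFS+ does.

```java
import java.text.Normalizer;

public class ClassNameNormalizer {

    // Hypothetical helper: decomposes the name (NFD) to match what the
    // file system is assumed to store.
    public static String toDecomposed(String className) {
        return Normalizer.normalize(className, Normalizer.Form.NFD);
    }

    // File name built from the normalized class name.
    public static String fileNameFor(String className) {
        return toDecomposed(className) + ".java";
    }

    // Class declaration built from the same normalized class name.
    public static String declarationFor(String className) {
        return "public class " + toDecomposed(className) + " {\n}\n";
    }

    public static void main(String[] args) {
        // Composed input, as the generator in the question produces it.
        String name = "ZoneR\u00e9serv\u00e9e435";

        String fileName = fileNameFor(name);
        String source = declarationFor(name);

        // The declared name and the file name now share the same code points.
        String declared = toDecomposed(name);
        System.out.println(fileName.equals(declared + ".java")); // true
        System.out.println(source.contains(declared));           // true

        // The combining accent U+0301 is a legal identifier part.
        System.out.println(Character.isJavaIdentifierPart('\u0301')); // true
    }
}
```

The generated text would still be written through the UTF-8 OutputStreamWriter shown in the question; only the string passed in as the class name changes.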

See also: http://en.wikipedia.org/wiki/Unicode_equivalence



Source: https://stackoverflow.com/questions/13588940/different-behaviour-between-javac-1-6-and-javac-1-7-when-handling-special-charac
