I am converting a project from Ant to Maven and I'm having problems with a specific unit test that deals with UTF-8 characters. The problem is about the following String:
I have found a "solution" myself:
I had to pass the encoding to the maven-surefire-plugin, but the usual
<encoding>${project.build.sourceEncoding}</encoding>
did not work. I still have no idea why, but when I pass the encoding as a JVM argument through the plugin's argLine, the tests work as they should:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.15</version>
    <configuration>
        <argLine>-Dfile.encoding=UTF-8</argLine>
    </configuration>
</plugin>
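Surefire runs the tests in a separate (forked) JVM, so one way to confirm that -Dfile.encoding=UTF-8 actually reached that JVM is to print its default charset from inside it. A minimal sketch, assuming a plain main method for self-containment; the class name DefaultCharsetCheck is made up, and in a real project the same lines would sit in a JUnit test. Note that on JDK 18 and later the default charset is UTF-8 unless explicitly overridden (JEP 400).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    // True if the running JVM's default charset (derived from file.encoding)
    // is the expected one.
    static boolean isDefaultCharset(Charset expected) {
        return Charset.defaultCharset().equals(expected);
    }

    public static void main(String[] args) {
        // Show what the JVM actually picked up, e.g. from argLine or MAVEN_OPTS.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        if (!isDefaultCharset(StandardCharsets.UTF_8)) {
            System.out.println("WARNING: default charset is not UTF-8; check -Dfile.encoding");
        }
    }
}
```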
Thanks for all your responses and additional comments!
This works for me (note that it sets ISO-8859-1; use UTF-8 instead if your source files are saved as UTF-8):
...
<properties>
    <project.build.sourceEncoding>ISO-8859-1</project.build.sourceEncoding>
    <project.reporting.outputEncoding>ISO-8859-1</project.reporting.outputEncoding>
</properties>
...
<build>
    <finalName>Project</finalName>
    <sourceDirectory>src</sourceDirectory>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <source>1.6</source>
                <target>1.6</target>
                <encoding>${project.build.sourceEncoding}</encoding>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-war-plugin</artifactId>
            <version>2.2</version>
            <configuration>
                <warSourceDirectory>WebContent</warSourceDirectory>
            </configuration>
        </plugin>
    </plugins>
</build>
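The sourceEncoding value has to match how the files are actually saved on disk. To illustrate what goes wrong otherwise, the sketch below encodes ä as UTF-8 (what the editor wrote) and decodes those bytes as ISO-8859-1 (what the compiler was told). The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SourceEncodingMismatch {
    // Simulates a compiler reading a file that was saved with one charset
    // using a different one.
    static String misread(String text, Charset savedAs, Charset readAs) {
        byte[] bytesOnDisk = text.getBytes(savedAs);
        return new String(bytesOnDisk, readAs);
    }

    public static void main(String[] args) {
        String original = "\u00e4"; // a-umlaut
        String garbled = misread(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        // Print code points instead of characters so the console encoding cannot interfere.
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp)); // prints U+00C3 and U+00A4
    }
}
```

One character silently becomes two, which is why a mismatch between the file and sourceEncoding usually shows up as a failing string comparison rather than a compile error.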
I had a really stubborn problem of this kind, and setting the environment variable
MAVEN_OPTS=-Dfile.encoding=UTF-8
fixed the issue for me.
When debugging Unicode problems, make sure you convert everything to ASCII so you can read and understand what is inside a String without guesswork. This means you should use, for example, StringEscapeUtils.escapeJava from commons-lang3 to turn ä into \u00e4. That way, you can be sure whether a ? you see is really in the data or just a character the console can't print. You can also distinguish " " (\u0020) from " " (\u00a0).
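In the test case, check the escaped version of the inputs as early as possible to make sure the data is actually what you expect.
So the code above should be (note the doubled backslashes: a single \u010d inside a Java string literal is turned back into č by the compiler, and the hex digit case must match what your escape function produces):
assertEquals("\\u010d\\u00e4\\u....", escape(l_string));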
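If commons-lang3 is not on the classpath, a small hand-written escape function does the same job for debugging. The helper below is hypothetical and simplified (unlike escapeJava it uses lowercase hex and escapes control characters numerically too); it keeps printable ASCII, doubles backslashes, and turns everything else into a \uXXXX sequence.

```java
public class UnicodeEscape {
    // Hypothetical, simplified stand-in for StringEscapeUtils.escapeJava:
    // printable ASCII stays as-is, a backslash is doubled, and every other
    // UTF-16 char becomes a backslash-u escape with four lowercase hex digits.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\') {
                sb.append("\\\\");
            } else if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A normal space and a no-break space look identical on screen,
        // but are easy to tell apart once escaped.
        String text = "a b\u00a0c\u00e4";
        System.out.println(escape(text));
    }
}
```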
Make sure you use the correct encoding for file I/O. Never rely on Java's default encoding; always use InputStreamReader/OutputStreamWriter and specify the encoding to use.
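A minimal sketch of that advice: both directions pass StandardCharsets.UTF_8 explicitly, so the result no longer depends on file.encoding. The class and method names are illustrative only.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingIo {
    // Writes text through an OutputStreamWriter with an explicit charset
    // instead of relying on the JVM default. Closes the stream afterwards.
    static void writeText(OutputStream out, String text, Charset cs) {
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads all text through an InputStreamReader with an explicit charset.
    static String readText(InputStream in, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(in, cs)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("encoding-demo", ".txt");
        tmp.deleteOnExit();
        writeText(new FileOutputStream(tmp), "\u010d\u00e4", StandardCharsets.UTF_8);
        String back = readText(new FileInputStream(tmp), StandardCharsets.UTF_8);
        System.out.println(back.equals("\u010d\u00e4")); // prints true
    }
}
```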
The POM looks correct. Run mvn with -X to make sure it picks up the correct options and runs the Java compiler with them. mvn help:effective-pom might also help.
Disassemble the class file (for example with javap -v) to check the strings. Java will use ? to denote that it couldn't read something.
If the ? comes from System.out.println(">>> " + l_string);, this can mean that the code wasn't compiled with UTF-8 or that the source file was saved with another Unicode encoding (UTF-16 or similar).
Another source of problems could be the properties file. Make sure it was saved with ISO-8859-1 and that it wasn't modified by the compilation process.
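The ISO-8859-1 requirement comes from Properties.load(InputStream), which always decodes the stream as ISO-8859-1; characters outside Latin-1 then have to be written as \uXXXX escapes. Properties.load(Reader) lets you choose the charset instead. A minimal sketch that loads the same bytes both ways (the class name and the greeting key are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncoding {
    // load(InputStream) always decodes as ISO-8859-1, so non-Latin-1
    // characters must appear in the file as backslash-u escapes.
    static Properties loadLatin1(byte[] fileBytes) {
        Properties p = new Properties();
        try (InputStream in = new ByteArrayInputStream(fileBytes)) {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // load(Reader) decodes with whatever charset the Reader uses,
    // here UTF-8 for a file saved as UTF-8.
    static Properties loadUtf8(byte[] fileBytes) {
        Properties p = new Properties();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8)) {
            p.load(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    public static void main(String[] args) {
        byte[] utf8File = "greeting=\u010d\u00e4\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadUtf8(utf8File).getProperty("greeting").equals("\u010d\u00e4"));   // true
        System.out.println(loadLatin1(utf8File).getProperty("greeting").equals("\u010d\u00e4")); // false: UTF-8 bytes read as Latin-1
    }
}
```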
Make sure Maven actually compiles your file. Run mvn clean before building to force a full recompile.
Your problem is not the encoding of the source file (and therefore of the String inside your class file) but the encoding of System.out's implicit PrintStream. It uses file.encoding, which represents the system encoding, and on Windows this is the ANSI code page.
You would have to set up a PrintWriter with the OEM code page (or use the class that is intended for this: Console).
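A minimal sketch of the PrintWriter approach. The code page is an assumption: 852 (Central European) is used here only because the test string contains č; substitute whatever chcp reports on your machine. IBM852 is an optional JDK charset, so the sketch falls back to UTF-8 if it is not available.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleOutput {
    // Wraps a byte stream in a PrintWriter that encodes with an explicit charset
    // instead of the JVM default (file.encoding) that System.out uses.
    static PrintWriter writerFor(OutputStream out, Charset cs) {
        // autoFlush = true: println() pushes the bytes out immediately
        return new PrintWriter(new OutputStreamWriter(out, cs), true);
    }

    public static void main(String[] args) {
        // Assumption: the console uses OEM code page 852; adjust to your system.
        Charset oem = Charset.isSupported("IBM852") ? Charset.forName("IBM852") : StandardCharsets.UTF_8;
        // System.out passes raw bytes from write(byte[]) through unchanged,
        // so only the OutputStreamWriter's charset decides the encoding.
        PrintWriter console = writerFor(System.out, oem);
        console.println(">>> " + "\u010d\u00e4");
    }
}
```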
See also various bugs around this in: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4153167