I have a very simple bit of Scala code
var str = "≤"
for( ch <- str ) { printf("%d, %x", ch.toInt, ch.toInt) ; println }
println
str = "\u2264"
To answer my own questions:
Does the Scala compiler work with UTF-8 encoded files?
Yes, but only if it knows they are UTF-8 encoded. In the absence of any other evidence, it uses Java's file.encoding property. (Thanks to @AndreasNeumann for this part of the answer.)
Why did my program not behave as I expected?
Because my file.encoding property was set to MacRoman. Even though I had told Eclipse that the file is UTF-8, this information was not communicated to the Scala compiler. Thus the compiler interpreted the 3-byte sequence E2 89 A4 as three separate characters according to the MacRoman encoding: a single low quotation mark (which looks a lot like a comma), an "a" with a circumflex, and a section sign. The Unicode code points for this three-character sequence are U+201A U+00E2 U+00A7, which explains the output of my program.
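The misreading described above can be reproduced with a small sketch. The names DecodeDemo and codeUnits are made up for illustration, and the charset name x-MacRoman is the one the JDK uses for its extended charsets; it may be missing on some JVMs, so the code checks for it first.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object DecodeDemo {
  // The UTF-8 encoding of U+2264 (≤) is the three bytes E2 89 A4.
  val bytes: Array[Byte] = Array(0xE2, 0x89, 0xA4).map(_.toByte)

  // Decode the bytes with the given charset and format each resulting Char as U+XXXX.
  def codeUnits(cs: Charset): Seq[String] =
    new String(bytes, cs).map(ch => f"U+${ch.toInt}%04X")

  def main(args: Array[String]): Unit = {
    println("UTF-8:    " + codeUnits(StandardCharsets.UTF_8).mkString(" "))
    // Assumption: x-MacRoman is only present if the JVM ships the extended charsets.
    if (Charset.isSupported("x-MacRoman"))
      println("MacRoman: " + codeUnits(Charset.forName("x-MacRoman")).mkString(" "))
    else
      println("x-MacRoman is not available on this JVM")
  }
}
```

Decoded as UTF-8, the bytes give the single code point U+2264. Decoded as MacRoman, they should give the three code points listed above.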
How do you fix the problem?
On the scalac command line, use the option -encoding UTF-8. In Eclipse you can use the preferences (options) for the Scala plugin to add this option. (Thanks to @Jesper for this part of the answer.) You can also use the -D option, either on the scalac command line or via the JAVA_OPTS environment variable, to set the file.encoding property. (See the answer of @AndreasNeumann for details.)
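If the project happens to be built with sbt (an assumption, since sbt is not mentioned anywhere in this thread), the same compiler option can be passed from the build definition. A minimal sketch of a build.sbt fragment:

```scala
// build.sbt fragment (assumes an sbt build, which this thread does not mention).
// Passes "-encoding UTF-8" to scalac so sources are read as UTF-8
// regardless of the JVM's file.encoding property.
scalacOptions ++= Seq("-encoding", "UTF-8")
```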
If you use the Scala IDE for Eclipse, there are at least three things you can do.
In the Properties dialog, under the Resource preferences, select UTF-8 as the Text file encoding.
Add -encoding UTF-8 under additional command line parameters, under Compiler >> Scala in the preferences (or options). You can set this as a global preference (or option) or as a project-specific property setting.
Yes, Scala fully supports UTF-8.
I can't reproduce your results. MacOS X, Java 7, Scala 2.10.4.
Check the file encoding of your system:
scala> System.getProperty("file.encoding")
res0: String = UTF-8
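The same check can be run outside the REPL, for example from the environment the IDE or build tool uses to launch the JVM. This is only a sketch; the object name ShowEncoding and the helper defaultCharsetName are made up for illustration.

```scala
import java.nio.charset.Charset

object ShowEncoding {
  // Name of the JVM's default charset.
  def defaultCharsetName: String = Charset.defaultCharset.name

  def main(args: Array[String]): Unit = {
    // The system property named in this thread.
    println("file.encoding   = " + System.getProperty("file.encoding"))
    println("default charset = " + defaultCharsetName)
  }
}
```

If this prints something other than UTF-8 (MacRoman, for instance), source files compiled without an explicit -encoding option may be decoded incorrectly.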
Add this line to your .bashrc. This might fix the problem in some *nix environments.
export JAVA_OPTS='-Dfile.encoding=UTF-8'
Sometimes the IDE is set to the wrong file encoding, so check that as well.
The Scala plugin respects the encoding settings of Eclipse. You can set the workspace default in Preferences. If that doesn't trickle down to your sources, check if there is an overriding encoding at the project or source folder level.
For example, here is the property page of a source folder: