Does the Scala compiler work with UTF-8 encoded source files?

星月不相逢 2020-12-19 20:45

I have a very simple bit of Scala code:

 var str = "≤"
 for( ch <- str ) { printf("%d, %x", ch.toInt, ch.toInt) ; println  }
 println
 str = "\u2264


        
3 answers
  •  时光说笑
    2020-12-19 21:18

    To answer my own questions:

    Does the Scala compiler work with UTF-8 encoded files?

    Yes, but only if it knows they are UTF-8 encoded. In the absence of any other evidence, it uses Java's file.encoding property. (Thanks to @AndreasNeumann for this part of the answer.)
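To check which encoding the JVM falls back to, a small sketch like the following can help. It only reports the JVM-wide default. That scalac uses this value when no -encoding option is given is taken from the answer above, not verified here. The object name `ShowDefaultEncoding` is made up for illustration.

```scala
import java.nio.charset.Charset

// Hypothetical helper: prints the encoding the JVM uses when none is specified.
object ShowDefaultEncoding {
  // Name of the JVM-wide default charset.
  def defaultCharsetName: String = Charset.defaultCharset().name()

  def main(args: Array[String]): Unit = {
    // The raw system property the answer refers to.
    val fileEncoding = System.getProperty("file.encoding")
    println("file.encoding property: " + fileEncoding)
    println("default charset:        " + defaultCharsetName)
  }
}
```

If this prints something other than UTF-8 (MacRoman in the case described here), source files saved as UTF-8 are likely to be misread unless the encoding is passed explicitly.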

    Why did my program not behave as I expected?

    Because my file.encoding property was set to MacRoman. Even though I had told Eclipse that the file was UTF-8, this information was not passed on to the Scala compiler. The compiler therefore interpreted the three-byte sequence E2 89 A4 as three separate characters according to the MacRoman encoding: a single low quotation mark (which looks a lot like a comma), an "a" with a circumflex, and a section sign. The Unicode code points for this three-character sequence are U+201A U+00E2 U+00A7, which explains the output of my program.
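The misreading can be reproduced outside the compiler. The sketch below decodes the same three bytes once as UTF-8 and once as MacRoman. It assumes the JDK in use ships the `x-MacRoman` charset (part of the extended charsets in common JDK builds, but not guaranteed everywhere). The names `MisdecodeDemo` and `codePoints` are made up for illustration.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo: the same bytes decoded under two different encodings.
object MisdecodeDemo {
  // UTF-8 encoding of U+2264 (the less-than-or-equal sign).
  val bytes: Array[Byte] = Array(0xE2, 0x89, 0xA4).map(_.toByte)

  // Format every char of the string as a U+XXXX code point.
  def codePoints(s: String): Seq[String] =
    s.map(ch => "U+%04X".format(ch.toInt))

  def main(args: Array[String]): Unit = {
    val asUtf8 = new String(bytes, StandardCharsets.UTF_8)
    // Assumption: "x-MacRoman" is available in this JDK.
    val asMacRoman = new String(bytes, Charset.forName("x-MacRoman"))

    println(codePoints(asUtf8).mkString(" "))     // U+2264
    println(codePoints(asMacRoman).mkString(" ")) // U+201A U+00E2 U+00A7
  }
}
```

Decoded as UTF-8, the three bytes form one character. Decoded as MacRoman, each byte becomes its own character, which matches the three values the program printed.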

    How do you fix the problem?

    On the scalac command line, use the option -encoding UTF-8. In Eclipse, you can add this option in the preferences (options) for the Scala plugin. (Thanks to @Jesper for this part of the answer.) You can also set the file.encoding property with the -D option, either on the scalac command line or via the JAVA_OPTS environment variable. (See the answer of @AndreasNeumann for details.)
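For projects built with sbt, the same compiler flag can be set in the build definition. sbt is not mentioned in the original answer, so this is an assumption about the build setup; it is only a sketch of where the -encoding option would go.

```scala
// build.sbt (assumes an sbt-based project, which the original answer does not mention)
// Pass the source-file encoding to scalac on every compile.
scalacOptions ++= Seq("-encoding", "UTF-8")
```

The flag ends up on the scalac command line, so it should have the same effect as typing -encoding UTF-8 by hand.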

    If you use the Scala IDE for Eclipse, there are at least three things you can do.

    • One is to set the default encoding for all your workspaces under General >> Workspace in Eclipse's global preferences (or options), as shown in Iulian Dragos's answer.
    • Another is to select UTF-8 as the Text file encoding in the project properties (right-click on the project in the Package Explorer and select Properties), under the Resource preferences.
    • Finally, you can add -encoding UTF-8 under additional command line parameters under Compiler >> Scala in the preferences (or options). You can set this as a global preference (or option) or as a project specific property setting.
