Does the Scala compiler work with UTF-8 encoded source files?

星月不相逢 2020-12-19 20:45

I have a very simple bit of Scala code

 var str = "≤"
 for( ch <- str ) { printf("%d, %x", ch.toInt, ch.toInt) ; println()  }
 println()
 str = "\u2264"


        
3 Answers
  • 2020-12-19 21:18

    To answer my own questions:

    Does the Scala compiler work with UTF-8 encoded files?

    Yes, but only if it knows they are UTF-8 encoded. In the absence of any other evidence, it uses Java's file.encoding property. (Thanks to @AndreasNeumann for this part of the answer.)

    Why did my program not behave as I expected?

    Because my file.encoding property was set to MacRoman. Even though I had told Eclipse that the file was UTF-8, this information was not passed on to the Scala compiler. The compiler therefore interpreted the 3-byte sequence E2 89 A4 as three separate characters in the MacRoman encoding: a single low-9 quotation mark (which looks a lot like a comma), an "a" with a circumflex, and a section sign. The Unicode code points for these three characters are U+201A, U+00E2, and U+00A7, which explains the output of my program.
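The misreading described above can be reproduced without the compiler by decoding the same three bytes under both charsets. This is only a minimal sketch: the object and method names (EncodingDemo, decode, describe) are made up for illustration, and it assumes the JVM ships the "x-MacRoman" charset, which is why that step is guarded.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object EncodingDemo {
  // The UTF-8 encoding of U+2264 (≤), as quoted in the answer above.
  val leBytes: Array[Byte] = Array(0xE2, 0x89, 0xA4).map(_.toByte)

  // Decode the same bytes under a given charset.
  def decode(bytes: Array[Byte], charset: Charset): String =
    new String(bytes, charset)

  // Format each character as "decimal, hex", like the printf in the question.
  def describe(s: String): Seq[String] =
    s.map(ch => f"${ch.toInt}%d, ${ch.toInt}%x")

  def main(args: Array[String]): Unit = {
    // Read as UTF-8, the three bytes form a single character.
    describe(decode(leBytes, StandardCharsets.UTF_8)).foreach(println)

    // Assumption: "x-MacRoman" is an optional charset, so check before using it.
    if (Charset.isSupported("x-MacRoman"))
      describe(decode(leBytes, Charset.forName("x-MacRoman"))).foreach(println)
  }
}
```

Decoded as UTF-8, the bytes yield one line, `8804, 2264`. Where the MacRoman charset is available, the second decode should yield three lines, one per character, matching the three code points named in the answer.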

    How do you fix the problem?

    On the scalac command line, use the option -encoding UTF-8. In Eclipse, you can add this option in the preferences (options) for the Scala plugin. (Thanks to @Jesper for this part of the answer.) You can also set the file.encoding property with the -D option, either on the scalac command line or via the JAVA_OPTS environment variable. (See the answer of @AndreasNeumann for details.)
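If the project happens to be built with sbt rather than from Eclipse or a bare scalac call (an assumption; sbt is not mentioned in this thread), the same compiler option can be passed through the build definition. A minimal sketch of a build.sbt fragment:

```scala
// build.sbt (hypothetical project): pass -encoding UTF-8 to scalac,
// so source files are read as UTF-8 regardless of the file.encoding property.
scalacOptions ++= Seq("-encoding", "UTF-8")
```

This sets the flag per project, so the result does not depend on each developer's JVM defaults.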

    If you use the Scala IDE for Eclipse, there are at least three things you can do.

    • One is to set the default encoding for all your workspaces under General >> Workspace in Eclipse's global preferences (or options), as shown in Iulian Dragos's answer.
    • In the project properties (right-click on the project in the Package Explorer and select Properties), under the Resource preferences, select UTF-8 as the Text file encoding.
    • Finally, you can add -encoding UTF-8 under additional command line parameters under Compiler >> Scala in the preferences (or options). You can set this as a global preference (or option) or as a project-specific property setting.
  • 2020-12-19 21:24

    Yes, Scala fully supports UTF-8.

    I can't reproduce your results on Mac OS X with Java 7 and Scala 2.10.4.

    Check the file encoding of your system:

    scala> System.getProperty("file.encoding")
    res0: String = UTF-8
    

    Add this line to your .bashrc. This might fix the problem in some *nix environments.

    export JAVA_OPTS='-Dfile.encoding=UTF-8'
    

    Sometimes the IDE is set to the wrong file encoding, so check that as well.

  • 2020-12-19 21:24

    The Scala plugin respects the encoding settings of Eclipse. You can set the workspace default in Preferences. If that doesn't trickle down to your sources, check if there is an overriding encoding at the project or source folder level.

    [Image: Workspace Preferences]

    For example, here is the property page of a source folder:

    [Image: property page of a source folder]
