Does the Scala compiler work with UTF-8 encoded source files?

星月不相逢 2020-12-19 20:45

I have a very simple bit of Scala code:

    var str = "≤"
    for (ch <- str) { printf("%d, %x", ch.toInt, ch.toInt); println() }
    println()
    str = "\u2264"

时光说笑 (original poster) 2020-12-19 21:18

To answer my own questions:


  Does the Scala compiler work with UTF-8 encoded files?


Yes, but only if it knows they are UTF-8 encoded. In the absence of any other evidence, it uses Java's file.encoding property. (Thanks to @AndreasNeumann for this part of the answer.)
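To see which default a given JVM would fall back on, a small sketch like the following prints the file.encoding system property and the JVM's default charset. It only shows what the JVM reports; that a particular scalac version consults exactly this value is an assumption taken from the answer above. The object and method names are made up for illustration.

```scala
import java.nio.charset.Charset

// Hypothetical helper: reports the JVM's default-encoding settings.
// It does not ask scalac anything; it only shows what the JVM was started with.
object ShowDefaultEncoding {
  // The system property the answer refers to (set by the JVM or via -Dfile.encoding=...)
  def fileEncodingProperty: String = System.getProperty("file.encoding")

  // The charset Java uses when no explicit encoding is given
  def defaultCharsetName: String = Charset.defaultCharset().name()

  def main(args: Array[String]): Unit = {
    println(s"file.encoding   = $fileEncodingProperty")
    println(s"default charset = $defaultCharsetName")
  }
}
```

On JDK 18 and later (JEP 400) both values default to UTF-8 unless overridden, so a MacRoman default like the one in this question is mainly a concern on older JDKs.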


  Why did my program not behave as I expected?


Because my file.encoding property was set to MacRoman. Even though I had told Eclipse that the file is UTF-8, this information was not passed on to the Scala compiler. The compiler therefore interpreted the 3-byte sequence E2 89 A4 (the UTF-8 encoding of ≤, U+2264) as three characters in the MacRoman encoding: a single low quotation mark (which looks a lot like a comma), an "a" with a circumflex (â), and a section sign (§). The Unicode code points of these three characters are U+201A, U+00E2 and U+00A7, which explains the output of my program.
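The misreading can be reproduced outside the compiler by decoding the same three bytes twice, once as UTF-8 and once as MacRoman, and printing the resulting code points. This is a sketch that assumes the JDK ships the "x-MacRoman" charset (it lives in the jdk.charsets module of standard OpenJDK builds); the object name is hypothetical.

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical demo: the UTF-8 bytes of '≤' read with two different charsets.
// Assumes the JDK provides "x-MacRoman" (extended charsets module).
object MisdecodeDemo {
  // UTF-8 encoding of U+2264 (LESS-THAN OR EQUAL TO): E2 89 A4
  val bytes: Array[Byte] = Array(0xE2, 0x89, 0xA4).map(_.toByte)

  // Format each UTF-16 char of a string as "U+XXXX"
  def codePoints(s: String): List[String] =
    s.toList.map(ch => f"U+${ch.toInt}%04X")

  // Correct reading: one character
  val asUtf8: String = new String(bytes, StandardCharsets.UTF_8)
  // What a MacRoman default produces: three characters
  val asMacRoman: String = new String(bytes, Charset.forName("x-MacRoman"))

  def main(args: Array[String]): Unit = {
    println(codePoints(asUtf8))     // List(U+2264)
    println(codePoints(asMacRoman)) // List(U+201A, U+00E2, U+00A7)
  }
}
```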


  How do you fix the problem?


On the scalac command line, use the option -encoding UTF-8. In Eclipse, you can add this option in the preferences (options) for the Scala plugin. (Thanks to @Jesper for this part of the answer.) Alternatively, set the file.encoding property with the -D option (for example -Dfile.encoding=UTF-8), either on the scalac command line or via the JAVA_OPTS environment variable. (See @AndreasNeumann's answer for details.)
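One way to notice this kind of mismatch early is a small canary: compare a non-ASCII literal with the same character written as a \u escape. The escape is plain ASCII, so any ASCII-compatible decoding (MacRoman included) reads it the same way, while the literal only matches if the file, assumed here to be saved as UTF-8, was compiled as UTF-8 (for example with -encoding UTF-8). The object and method names are hypothetical.

```scala
// Hypothetical canary for a source-encoding mismatch.
// Assumes this file is saved as UTF-8.
object EncodingCanary {
  // Depends on how the compiler decoded the source bytes
  val literal: String = "≤"
  // Pure ASCII in the source, so it is read the same under any ASCII-compatible encoding
  val escaped: String = "\u2264"

  // True only if the compiler decoded this file as UTF-8
  def sourceDecodedCorrectly: Boolean = literal == escaped

  def main(args: Array[String]): Unit =
    if (sourceDecodedCorrectly) println("source read as UTF-8: OK")
    else println(s"mismatch: literal has ${literal.length} chars; recompile with -encoding UTF-8")
}
```

Compiled with a MacRoman default, the literal would come out as the three characters described above, so the else branch would report a length of 3.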

If you use the Scala IDE for Eclipse, there are at least three things you can do.


One is to set the default encoding for all your workspaces under General >> Workspace in Eclipse's global preferences (or options), as shown in Iulian Dragos's answer.
Another is to select UTF-8 as the Text file encoding under Resource in the project properties (right-click the project in the Package Explorer and select Properties).
Finally, you can add -encoding UTF-8 to the additional command line parameters under Compiler >> Scala in the preferences (or options). You can set this as a global preference (or option) or as a project-specific property setting.