How to display and input Chinese (and other non-ASCII) characters in the R console?

难免孤独 2021-02-06 00:45

My system: Windows 7 Ultimate 64-bit (English version) with R 3.1 (64-bit).
Here is my sessionInfo():

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw3         


        
1 answer
  •  忘了有多久
    2021-02-06 00:58

    It is probably not very well documented, but you need Sys.setlocale in order to use Chinese in the console, and the same method applies to many other languages as well. The solution is not obvious because the official documentation of Sys.setlocale does not specifically mention it as a way to solve display issues.

    > print('ÊÔÊÔ') # 试试, Chinese for "let's give it a try"
    [1] "ÊÔÊÔ" # not displayed correctly
    > Sys.getlocale()
    [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
    > Sys.setlocale(category = "LC_ALL", locale = "chs") #cht for traditional Chinese, etc.
    [1] "LC_COLLATE=Chinese_People's Republic of China.936;LC_CTYPE=Chinese_People's Republic of China.936;LC_MONETARY=Chinese_People's Republic of China.936;LC_NUMERIC=C;LC_TIME=Chinese_People's Republic of China.936"
    > print('试试')
    [1] "试试"
    > read.table("c:/CHS.txt",sep=" ") #Chinese: the 1st record/observation
      V1   V2  V3 V4  V5   V6
    1 122 第一 122 条 122 记录 
    

    If you just want to change the display encoding, without changing other aspects of locales, use LC_CTYPE instead of LC_ALL:

    > Sys.setlocale(category = "LC_CTYPE", locale = "chs")
    [1] "Chinese_People's Republic of China.936"
    > print('试试')
    [1] "试试"
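
If the locale change should only last for one operation, the previous LC_CTYPE value can be saved and restored afterwards. The following is a minimal sketch, not part of the original answer; the helper name with_ctype is hypothetical, and it assumes that Sys.setlocale returns an empty string when the requested locale is unavailable (as it does on systems where "chs" is not a known locale name).

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to
# `locale`, then restore whatever LC_CTYPE was in effect before the call.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore the previous setting when the function exits, even on error.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)

  # Sys.setlocale() returns "" (with a warning) if the locale cannot be set.
  res <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(res, "")) {
    warning("locale '", locale, "' is not available on this system")
  }

  # `expr` is a lazily evaluated argument, so it runs only here,
  # after the locale has been switched.
  force(expr)
}
```

On the Windows setup described in the question, a call such as with_ctype("chs", print('试试')) would then leave the session's locale unchanged once it returns.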
    

    Of course, this only applies to the official R console. If you use another IDE, such as the very popular RStudio, you do not need to do this at all to type and display Chinese, even if the Chinese locale is not loaded.

    Some useful points migrated from the comments below:

    If the data still fails to show up correctly, we should also look into the file encoding. If the file is UTF-8 encoded, either data <- read.table("your_file", sep=',', fileEncoding="UTF-8-BOM", header=TRUE) or fileEncoding="UTF-8" will do, depending on which encoding the file really has.

    But you may want to stay away from UTF-8 with BOM, as it is not recommended. See: What's different between UTF-8 and UTF-8 without BOM?
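
To find out which of the two encodings a file really has, one option is to look at its first three bytes, since a UTF-8 byte order mark is the byte sequence EF BB BF. The sketch below is illustrative only: the helper names guess_utf8_encoding and read_utf8_table and the demo files are hypothetical, and the check assumes the file is UTF-8 to begin with (it does not detect other encodings such as GBK).

```r
# Hypothetical helper: inspect the first three bytes of a file and return the
# fileEncoding value to pass to read.table(). Assumes the file is UTF-8.
guess_utf8_encoding <- function(path) {
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))            # UTF-8 byte order mark
  head_bytes <- readBin(path, what = "raw", n = 3L)
  if (identical(head_bytes, bom)) "UTF-8-BOM" else "UTF-8"
}

# Hypothetical wrapper around read.table() that uses the guess above.
read_utf8_table <- function(path, sep = ",", header = TRUE, ...) {
  read.table(path, sep = sep, header = header,
             fileEncoding = guess_utf8_encoding(path), ...)
}

# Demo files: the same UTF-8 content written with and without a BOM.
content <- charToRaw("id,word\n1,\u8bd5\u8bd5\n")
with_bom <- tempfile(fileext = ".csv")
without_bom <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), content), with_bom)
writeBin(content, without_bom)

guess_utf8_encoding(with_bom)
guess_utf8_encoding(without_bom)
```

With these helpers, read_utf8_table("your_file") would pick the fileEncoding automatically. Whether the Chinese text then displays correctly still depends on the console locale, as described above.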
