Manipulating files with non-English names in R

感情败类 2020-12-17 00:51

When using R functions that manipulate files on Windows, e.g. dir(), the names of files containing non-English characters, such as Cyrillic, are shown as a sequence of "?".

2 Answers
  • 2020-12-17 01:08

    Try this: iconv("привет.txt", "UTF-8", "CP1251")

    Convert Character Vector between Encodings:
    https://stat.ethz.ch/R-manual/R-devel/library/base/html/iconv.html

    The iconv library:
    http://www.delorie.com/gnu/docs/recode/recode_30.html
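    A minimal sketch of what the iconv() call above does, assuming the name starts out as a UTF-8 string (written here with \u escapes so the example does not depend on how the script file is saved). It converts the name to Windows-1251 (CP1251) bytes and back. Note that iconv() only changes how the string is encoded in memory; it does not rename anything on disk.

```r
# The file name from the answer, written with \u escapes so R marks it as UTF-8
fname <- "\u043f\u0440\u0438\u0432\u0435\u0442.txt"   # "привет.txt"

# UTF-8 needs two bytes per Cyrillic letter: 6 * 2 + 4 = 16 bytes
utf8_bytes <- charToRaw(enc2utf8(fname))

# CP1251 needs one byte per letter: 6 + 4 = 10 bytes
cp1251_bytes <- iconv(fname, from = "UTF-8", to = "CP1251", toRaw = TRUE)[[1]]

# iconv() also accepts a list of raw vectors, so the bytes can be decoded back
round_trip <- iconv(list(cp1251_bytes), from = "CP1251", to = "UTF-8")

print(length(utf8_bytes))                              # 16
print(cp1251_bytes)                                    # ef f0 e8 e2 e5 f2 2e 74 78 74
print(identical(charToRaw(round_trip), utf8_bytes))    # TRUE
```

    If a character has no CP1251 equivalent, iconv() returns NA for that element (or substitutes, if you pass sub =), which is a quick way to check whether a name can survive a legacy codepage at all.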

  • 2020-12-17 01:31

    One easy solution, if you only need to run the script once or twice and know the target language, is to change the locale:

    Sys.setlocale(category = "LC_ALL", locale = "Russian")
    x1 <- read.table("C:\\привет.txt", header = TRUE)  # works fine with R 3.1.2
    Sys.setlocale(category = "LC_ALL", locale = "English")
    x2 <- read.table("C:\\привет.txt", header = TRUE)  # raises an error
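    The snippet above switches the locale and back by hand. A minimal sketch of the same idea that always restores the previous setting, even if the read fails. with_ctype_locale() is a hypothetical helper, and it switches only LC_CTYPE (the category that governs character encoding) instead of LC_ALL; that narrowing is an assumption on my part. Locale names are platform-specific: "Russian" is the Windows name used above, while on Linux the name is usually something like "ru_RU.UTF-8" and only works if that locale is installed.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to
# `locale`, then restore the previous LC_CTYPE even if `expr` throws an error.
with_ctype_locale <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (plus a warning) if the OS rejects the locale
  ok <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(ok, "")) {
    message("Locale '", locale, "' is not available; keeping '", old, "'")
  }
  expr  # arguments are evaluated lazily, so `expr` runs here, after the switch
}

before <- Sys.getlocale("LC_CTYPE")
# Assumed locale names: "Russian" on Windows, "ru_RU.UTF-8" elsewhere
target <- if (.Platform$OS.type == "windows") "Russian" else "ru_RU.UTF-8"

# Stand-in for read.table("C:\\привет.txt", header = TRUE) from the answer
n <- with_ctype_locale(target, nchar("\u043f\u0440\u0438\u0432\u0435\u0442"))

print(n)                                              # 6
print(identical(Sys.getlocale("LC_CTYPE"), before))   # TRUE
```

    Using on.exit() rather than a second Sys.setlocale() call at the end means the original locale comes back even when read.table() stops with an error.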
    

    If you want to read from a server, I strongly recommend using Python or another scripting language to handle Unicode paths. If you insist on R, I would try the following (cf. "Set locale to system default UTF-8"):

    Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
    x3 <- read.table("C:\\привет.txt", header = TRUE)  # may or may not warn, but reads the table into x3
    

    However, you will still need to process the table's contents with a package such as stringi, and remember to restore the previous locale after the read if necessary.
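    As an alternative to switching locales (a different technique from the one in this answer), here is a minimal sketch that keeps the path as a UTF-8 string and declares the file's content encoding explicitly. It writes a small table to a file named привет.txt in tempdir() and reads it back with read.table(). It assumes the current locale is UTF-8, which is the default on most Linux and macOS setups and on recent Windows versions from R 4.2 onwards; fileEncoding = "UTF-8" is likewise an assumption about how the file was saved.

```r
fname <- "\u043f\u0440\u0438\u0432\u0435\u0442.txt"   # "привет.txt"
path  <- file.path(tempdir(), fname)
original <- data.frame(word = c("\u0434\u0430", "\u043d\u0435\u0442"),  # "да", "нет"
                       n = c(1L, 2L), stringsAsFactors = FALSE)

# Assumption: a UTF-8 locale. In other locales the name may not be
# representable in the native encoding and opening the file can fail.
if (isTRUE(l10n_info()[["UTF-8"]])) {
  # Write the table with its contents explicitly encoded as UTF-8
  write.table(original, path, row.names = FALSE, fileEncoding = "UTF-8")

  # Read it back, declaring the content encoding instead of relying on the locale
  x3 <- read.table(path, header = TRUE, fileEncoding = "UTF-8",
                   stringsAsFactors = FALSE)

  print(fname %in% list.files(tempdir()))   # TRUE: the name survives intact
  print(all(x3$word == original$word))      # TRUE
  unlink(path)                              # remove the temporary file
}
```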

    ==Update==

    (cf. https://stat.ethz.ch/pipermail/r-help/2011-May/278206.html) According to the R FAQ, this may also be a display problem rather than a problem with the file names themselves:

    3.6 I don't see characters with accents at the R console, for example in text.

    You need to specify a font in Rconsole (see Q5.2) that supports the encoding in use. This used to be a problem in earlier versions of Windows, but now it is hard to find a font which does not.

    Support for these characters within Rterm depends on the environment (the terminal window and shell, including locale and codepage settings) within which it is run as well as the font used by the terminal window. Those are usually on legacy DOS settings and need to be altered.
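    One way to tell a console display problem from a real encoding problem is to inspect what R actually stores. A minimal sketch with a hypothetical helper, describe_names(), that reports each name's declared encoding (via Encoding()) and its Unicode code points; if the code points are correct, the names are intact and only the console font or codepage is failing to show them. l10n_info() reports whether the current locale is UTF-8.

```r
# Hypothetical helper: show what R stores for each name, independent of
# whether the console font or codepage can display it.
describe_names <- function(x) {
  x <- enc2utf8(x)  # convert to UTF-8 so utf8ToInt() reads the right bytes
  data.frame(
    encoding   = Encoding(x),  # "UTF-8", "latin1" or "unknown" (e.g. plain ASCII)
    codepoints = vapply(x, function(s) paste(sprintf("U+%04X", utf8ToInt(s)), collapse = " "),
                        character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

print(l10n_info()[["UTF-8"]])  # is the current locale UTF-8?
d <- describe_names(c("\u043f\u0440\u0438\u0432\u0435\u0442.txt", "data.csv"))
print(d)
# In practice you would call describe_names(dir()) in the folder with the Cyrillic names
```

    For the first name, the code points should start with U+043F U+0440 (п, р); a run of U+003F would instead mean the names really were replaced by "?" before R saw them.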

    With this in mind, please let me know whether you can type Russian file names in the R console when using 'read'. Thanks.
