Writing data isn't preserving encoding

予麋鹿 2020-12-30 07:10

I have a string like the following:

str <- "ていただけるなら"
Encoding(str) # returns "UTF-8"

I write it to disk:

write.table         


        
2 answers
  • 2020-12-30 07:49

    Have you tried using the fileEncoding argument?

    write.table(str, file = "chartest", quote = FALSE, col.names = FALSE, row.names = FALSE, fileEncoding = "UTF-8")
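
    If fileEncoding alone does not help, one commonly used workaround is to convert the text to UTF-8 and write the bytes untouched. The following is only a sketch: write_utf8 is a hypothetical helper name (not a base R function), a temporary file stands in for the original path, and the string is spelled with \u escapes so the example does not depend on how the script file itself is encoded.

```r
# Hypothetical helper (not part of base R): write a character vector as
# UTF-8 bytes without letting R re-encode it for the current locale.
write_utf8 <- function(x, path) {
  con <- file(path, open = "wb")   # binary mode: no newline translation
  on.exit(close(con))
  # enc2utf8() converts to UTF-8; useBytes = TRUE writes those bytes as-is.
  writeLines(enc2utf8(x), con, useBytes = TRUE)
}

# The same text as in the question, written with Unicode escapes.
str <- "\u3066\u3044\u305f\u3060\u3051\u308b\u306a\u3089"

out <- tempfile(fileext = ".txt")   # assumption: any writable path will do
write_utf8(str, out)

# Read the file back, declaring its encoding, and compare with the input.
back <- readLines(out, encoding = "UTF-8")
print(identical(back, str))
```

    Reading the file back with an explicit encoding is a quick way to confirm the round trip without relying on how a text editor renders the file.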
    
  • 2020-12-30 08:07

    This is an annoying "feature" of R on Windows. The only solution I have found so far is to temporarily and programmatically switch your locale to the one required to decode the script of the text in question. So, in the above case, you would use the Japanese locale.

    ## This won't work on Windows
    str <- "ていただけるなら"
    Encoding(str) #returns "UTF-8"
    write.table(str, file = "c:/chartest.txt", quote = FALSE, col.names = FALSE, row.names = FALSE)
    ## The following should work on Windows - first grab and save your existing locale
    print(Sys.getlocale(category = "LC_CTYPE"))
    original_ctype <- Sys.getlocale(category = "LC_CTYPE")
    ## Switch to the appropriate locale for the script
    Sys.setlocale("LC_CTYPE","japanese")
    ## Now you can write your text out and have it look as you would expect
    write.table(str, "c:/chartest2.txt", quote = FALSE, col.names = FALSE, 
                row.names = FALSE, sep = "\t", fileEncoding = "UTF-8")
    ## ...and don't forget to switch back
    Sys.setlocale("LC_CTYPE", original_ctype)
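
    The save, switch, restore pattern above can be wrapped in a function so the original locale comes back even if the write fails. This is a sketch under assumptions: with_ctype is a hypothetical helper name, not a base R function, and the locale name "japanese" is taken from the code above and is only expected to resolve on Windows.

```r
# Hypothetical helper: evaluate `code` under a temporary LC_CTYPE locale.
with_ctype <- function(locale, code) {
  original <- Sys.getlocale("LC_CTYPE")
  # Restore the caller's locale on exit, including when `code` errors.
  on.exit(Sys.setlocale("LC_CTYPE", original), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request is not honoured.
  result <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(result, "")) {
    stop("locale not available: ", locale)
  }
  force(code)  # lazy evaluation: `code` runs only now, under the new locale
}

# Portable check: the "C" locale exists on every platform.
before <- Sys.getlocale("LC_CTYPE")
inside <- with_ctype("C", Sys.getlocale("LC_CTYPE"))
after  <- Sys.getlocale("LC_CTYPE")

# Windows-only usage mirroring the example above (temporary file assumed).
if (.Platform$OS.type == "windows") {
  str <- "\u3066\u3044\u305f\u3060\u3051\u308b\u306a\u3089"
  with_ctype("japanese",
             write.table(str, tempfile(fileext = ".txt"), quote = FALSE,
                         col.names = FALSE, row.names = FALSE,
                         fileEncoding = "UTF-8"))
}
```

    Registering the restore with on.exit() before switching means there is no separate "switch back" step to forget.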
    

    The above produces the two files you can see in this screenshot. The first file shows the Unicode code points, which is not what you want, while the second shows the glyphs you would normally expect.

    [Screenshot: Japanese text]

    So far nobody has been able to explain to me why this happens in R. It is not an unavoidable feature of Windows because Perl, as I mention in this post, gets round the issue somehow.
