Export UTF-8 BOM to .csv in R

醉酒成梦 asked 2020-12-05 16:32

I am reading a table through RJDBC from a MySQL database and it correctly displays all letters in R (e.g., נווה שאנן). However, even when exporting it using write.csv with fileEncoding = "UTF-8", the file does not display correctly in Excel.

2 Answers
  • 2020-12-05 16:52

    The accepted answer did not help me in a similar setting (R 3.1 on Windows, trying to open the file in Excel). However, based on this part of the file connection documentation:

    If a BOM is required (it is not recommended) when writing it should be written explicitly, e.g. by writeChar("\ufeff", con, eos = NULL) or writeBin(as.raw(c(0xef, 0xbb, 0xbf)), binary_con)

    I came up with the following workaround:

    write.csv.utf8.BOM <- function(df, filename) {
        con <- file(filename, "w")
        tryCatch({
            # Re-encode every column to UTF-8 so the content matches the BOM
            for (i in seq_len(ncol(df)))
                df[, i] <- iconv(df[, i], to = "UTF-8")
            # Write the byte-order mark explicitly, then the CSV body
            writeChar("\ufeff", con, eos = NULL)
            write.csv(df, file = con)
        }, finally = {
            close(con)
        })
    }
    

    Here df is the data frame to export and filename is the path of the CSV file to create.
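
    As a quick sanity check (a sketch: bom_test.csv is a made-up filename, and the connection encoding is pinned to UTF-8 for reproducibility), the same workaround can be inlined and the first three bytes of the resulting file inspected; they should be the UTF-8 BOM, EF BB BF:

    ```r
    # Inline the BOM workaround for a one-column data frame,
    # then confirm the file starts with the UTF-8 byte-order mark.
    df <- data.frame(v = "נווה שאנן", stringsAsFactors = FALSE)

    con <- file("bom_test.csv", open = "w", encoding = "UTF-8")
    writeChar("\ufeff", con, eos = NULL)  # BOM first
    write.csv(df, file = con)             # then the CSV body
    close(con)

    readBin("bom_test.csv", what = "raw", n = 3)
    # ef bb bf
    ```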

  • 2020-12-05 17:02

    The help page for Encoding (help("Encoding")) documents a special encoding, "bytes".

    Using it, I was able to generate the CSV file as follows:

    v <- "נווה שאנן"
    X <- data.frame(v1 = rep(v, 3), v2 = LETTERS[1:3], v3 = 0,
                    stringsAsFactors = FALSE)

    # Mark the UTF-8 column as raw bytes so write.csv passes it through untouched
    Encoding(X$v1) <- "bytes"
    write.csv(X, "test.csv", row.names = FALSE)
    

    Mind the difference between factor and character columns. The following handles both:

    # Character columns: mark UTF-8 strings as raw bytes
    id_characters <- which(sapply(X,
        function(x) is.character(x) && any(Encoding(x) == "UTF-8")))
    for (i in id_characters) Encoding(X[[i]]) <- "bytes"

    # Factor columns: the encoding lives on the levels, not the values
    id_factors <- which(sapply(X,
        function(x) is.factor(x) && any(Encoding(levels(x)) == "UTF-8")))
    for (i in id_factors) Encoding(levels(X[[i]])) <- "bytes"

    write.csv(X, "test.csv", row.names = FALSE)
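
    To see what the "bytes" marking changes (a minimal sketch; the first printed value assumes the script is parsed as UTF-8):

    ```r
    v <- "נווה שאנן"
    print(Encoding(v))   # "UTF-8" when the source is parsed as UTF-8

    # Marking the string as raw bytes stops R from re-encoding it on output:
    # write.csv will emit the stored bytes verbatim.
    Encoding(v) <- "bytes"
    print(Encoding(v))   # "bytes"
    ```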
    