How can I specify the encoding in fwrite() when exporting a CSV file in R?

花落未央 2021-01-13 18:19

Since fwrite() does not accept an encoding argument, how can I export a CSV file with a specific encoding as fast as fwrite()?

4 Answers
  • 2021-01-13 18:31

    As of writing this, fwrite does not support forcing encoding. There is a workaround that I use, but it's a bit more obtuse than I'd like. For your example:

    readr::write_excel_csv(DT[0, ], "DT.csv")
    data.table::fwrite(DT, file = "DT.csv", append = TRUE)
    

    The first line saves only the headers of your data table to the CSV, defaulting to UTF-8 with the byte order mark (BOM) required to let Excel know that the file is encoded as UTF-8. The fwrite() call then uses the append option to add the data rows to the original CSV. This retains the encoding from write_excel_csv() while keeping fwrite()'s write speed.
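    A minimal end-to-end sketch of the two-step write (the sample table and file name are made up for illustration), which also verifies that the BOM landed at the start of the file:

    ```r
    library(data.table)

    # Hypothetical sample table; substitute your own DT
    DT <- data.table(text = c("á", "ü"), n = 1:2)

    readr::write_excel_csv(DT[0, ], "DT.csv")   # header row only, UTF-8 with BOM
    fwrite(DT, file = "DT.csv", append = TRUE)  # fast append of the data rows

    readBin("DT.csv", "raw", n = 3)             # the UTF-8 BOM: ef bb bf
    ```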

  • 2021-01-13 18:47

    If you are working within R, try this approach:

    # You have DT
    # DT is a data.table / data.frame
    # DT$text contains text data not encoded as UTF-8

    library(data.table)
    DT$text <- enc2utf8(DT$text)     # force the underlying data into UTF-8
    fwrite(DT, "DT.csv", bom = TRUE) # then save the file with 'bom = TRUE'
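    As a quick sanity check (using a hypothetical latin1-marked string built with iconv()), Encoding() shows what enc2utf8() changes:

    ```r
    x <- iconv("á", from = "UTF-8", to = "latin1")
    Encoding(x)           # "latin1"
    x <- enc2utf8(x)
    Encoding(x)           # "UTF-8"
    ```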
    

    Hope that helps.

  • 2021-01-13 18:52

    I know some people have already answered but I wanted to contribute a more holistic solution using the answer from user2554330.

    # Encode data in UTF-8
    names(DT) <- enc2utf8(names(DT))         # column names need to be encoded too
    for (col in colnames(DT)) {
        DT[[col]] <- as.character(DT[[col]]) # allows enc2utf8() and Encoding()
        DT[[col]] <- enc2utf8(DT[[col]])     # same as user2554330's answer
        Encoding(DT[[col]]) <- "unknown"
    }

    fwrite(DT, "DT.csv", bom = TRUE)
    
    # When re-importing your data be sure to use encoding = "UTF-8"
    DT2 <- fread("DT.csv", encoding = "UTF-8") 
    # DT2 should be identical to the original DT
    

    This should work for any UTF-8 characters anywhere in a data.table.

  • 2021-01-13 18:55

    You should post a reproducible example, but I would guess you could do this by making sure the data in DT is in UTF-8 within R, then setting the encoding of each column to "unknown". R will then assume the data is encoded in the native encoding when you write it out.

    For example,

    DF <- data.frame(text = "á", stringsAsFactors = FALSE)
    DF$text <- enc2utf8(DF$text) # Only necessary if Encoding(DF$text) isn't "UTF-8"
    Encoding(DF$text) <- "unknown"
    data.table::fwrite(DF, "DF.csv", bom = TRUE)
    

    If the columns of DF are factors, you'll need to convert them to character vectors before this will work.
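    For example, a sketch (with a made-up one-column data frame) that converts every factor column to character first:

    ```r
    # Hypothetical data frame with a factor column
    DF <- data.frame(text = factor(c("á", "é")))

    is_fac <- vapply(DF, is.factor, logical(1))
    DF[is_fac] <- lapply(DF[is_fac], as.character)

    is.character(DF$text)  # TRUE
    ```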
