Since fwrite() does not accept an encoding argument, how can I export a CSV file in a specific encoding while keeping fwrite()'s speed?
As of writing this, fwrite does not support forcing encoding. There is a workaround that I use, but it's a bit more obtuse than I'd like. For your example:
readr::write_excel_csv(DT[0, ], "DT.csv")                 # writes only the header row, UTF-8 with BOM
data.table::fwrite(DT, file = "DT.csv", append = TRUE)    # appends the data rows to that file
The first line saves only the headers of your data table to the CSV, defaulting to UTF-8 with the byte order mark (BOM) that lets Excel know the file is encoded as UTF-8. The fwrite() call then uses the append option to add the data rows to that CSV. This retains the encoding from write_excel_csv() while keeping fwrite()'s write speed.
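If you want to double-check the result, the first three bytes of the file should be the UTF-8 byte order mark (EF BB BF); something like this should confirm it:
readBin("DT.csv", what = "raw", n = 3)   # should print ef bb bf, the UTF-8 BOM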
If you are working within R, try this approach:
# DT is a data.table / data.frame
# DT$text contains text data that is not encoded as UTF-8
library(data.table)
DT$text <- enc2utf8(DT$text)       # forces the underlying data to be UTF-8 encoded
fwrite(DT, "DT.csv", bom = TRUE)   # then write the file with bom = TRUE
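As a minimal, self-contained sketch (the sample data here is made up for illustration), you can also check the declared encoding before writing:
library(data.table)
# Hypothetical sample data; in practice DT$text would come from your own source
DT <- data.table(id = 1:2, text = c("café", "naïve"))
DT$text <- enc2utf8(DT$text)       # converts from the native encoding to UTF-8 (no-op if already UTF-8)
Encoding(DT$text)                  # should now report "UTF-8" for the non-ASCII strings
fwrite(DT, "DT.csv", bom = TRUE)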
Hope that helps.
I know some people have already answered, but I wanted to contribute a more complete solution building on user2554330's answer.
# Encode the data in UTF-8
names(DT) <- enc2utf8(names(DT))         # column names need to be encoded too
for (col in colnames(DT)) {
  DT[[col]] <- as.character(DT[[col]])   # allows enc2utf8() and Encoding() to be applied
  DT[[col]] <- enc2utf8(DT[[col]])       # same as user2554330's answer
  Encoding(DT[[col]]) <- "unknown"       # R then assumes the native encoding when writing
}
fwrite(DT, "DT.csv", bom = TRUE)
# When re-importing your data, be sure to use encoding = "UTF-8"
DT2 <- fread("DT.csv", encoding = "UTF-8")
# DT2 should be identical to the original DT
This should work for any and all UTF-8 characters anywhere in a data.table.
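If you need this repeatedly, the same steps can be wrapped in a small helper; write_csv_utf8 is just a made-up name for this sketch, not an existing function:
library(data.table)
write_csv_utf8 <- function(DT, path) {
  DT <- copy(DT)                        # work on a copy so the caller's table is untouched
  setnames(DT, enc2utf8(names(DT)))     # column names need to be encoded too
  for (col in names(DT)) {
    DT[[col]] <- enc2utf8(as.character(DT[[col]]))
    Encoding(DT[[col]]) <- "unknown"
  }
  fwrite(DT, path, bom = TRUE)
}
write_csv_utf8(DT, "DT.csv")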
You should post a reproducible example, but I would guess you could do this by making sure the data in DT
is in UTF-8 within R, then setting the encoding of each column to "unknown". R will then assume the data is encoded in the native encoding when you write it out.
For example,
DF <- data.frame(text = "á", stringsAsFactors = FALSE)
DF$text <- enc2utf8(DF$text) # Only necessary if Encoding(DF$text) isn't "UTF-8"
Encoding(DF$text) <- "unknown"
data.table::fwrite(DF, "DF.csv", bom = TRUE)
If the columns of DF
are factors, you'll need to convert them to character vectors before this will work.
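For instance, a minimal sketch of the factor case (with hypothetical data):
DF <- data.frame(text = factor("á"))           # factor column
DF$text <- enc2utf8(as.character(DF$text))     # convert to character first, then to UTF-8
Encoding(DF$text) <- "unknown"
data.table::fwrite(DF, "DF.csv", bom = TRUE)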