Question
I'm trying to import a dataset into RStudio, but the Chinese characters come out garbled. Here is the code:
library(tidyverse)
df <- read_csv("中文,英文\n英文,德文")
df
# A tibble: 1 x 2
`\xd6\xd0\xce\xc4` `Ӣ\xce\xc4`
<chr> <chr>
1 "<U+04E2>\xce\xc4" "<U+00B5>\xc2\xce\xc4"
When I use the base function read.csv, it works fine. I suspect I'm doing something wrong with the encoding, but I can't find an encoding option in read_csv. How can I fix this?
Answer 1:
This happens because the characters are marked as UTF-8, whereas the actual encoding is the system default (which you can get with stringi::stri_enc_get()).
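To make the mismatch concrete, here is a minimal base-R sketch. It assumes the system default encoding in the question was GBK (my assumption, based on the bytes \xd6\xd0\xce\xc4 in the garbled column name, which are the GBK bytes for 中文) and that your platform's iconv recognises the name "GBK".

```r
# The four bytes from the garbled column name above: "中文" encoded as GBK.
gbk_bytes <- as.raw(c(0xd6, 0xd0, 0xce, 0xc4))
x <- rawToChar(gbk_bytes)

# Declaring the bytes to be UTF-8 does not convert them;
# they are simply not a valid UTF-8 sequence.
wrong <- x
Encoding(wrong) <- "UTF-8"
validUTF8(wrong)  # FALSE

# Converting from the real encoding (assumed to be GBK here)
# produces a proper UTF-8 string.
right <- iconv(x, from = "GBK", to = "UTF-8")
validUTF8(right)
utf8ToInt(right)  # code points of the two characters
```

The same bytes are either garbage or readable text depending only on which encoding they are interpreted in, which is why telling the reader the real encoding fixes the problem.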
So you can do either of the following:
1) Read data with the correct encoding:
df <- read_csv("中文,英文\n英文,德文", locale = locale(encoding = stringi::stri_enc_get()))
2) Read the data with the incorrect encoding and mark the strings with the correct encoding afterwards (note that this does not always work):
df <- read_csv("中文,英文\n英文,德文")
df <- dplyr::mutate_all(df, `Encoding<-`, value = "unknown")
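If the data lives in a file whose encoding you already know, option 1 can also be written with an explicit encoding name instead of the system default. The sketch below is only an illustration: the temporary file and the choice of "GBK" are my assumptions, not part of the original question, and it relies on your iconv supporting that name.

```r
library(readr)

# Build a small GBK-encoded CSV in a temporary file (hypothetical test data:
# the same two rows as in the question, written with \u escapes).
csv_utf8 <- "\u4e2d\u6587,\u82f1\u6587\n\u82f1\u6587,\u5fb7\u6587\n"
path <- tempfile(fileext = ".csv")
writeBin(iconv(csv_utf8, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]], path)

# Tell readr which encoding the file is really in; it converts to UTF-8 on read.
df <- read_csv(path, locale = locale(encoding = "GBK"))
names(df)
```

If you do not know the encoding, readr::guess_encoding(path) can suggest candidates, but treat its answer as a guess rather than a guarantee.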
Source: https://stackoverflow.com/questions/46996501/readrread-csv-issue-chinese-character-becomes-messy-codes