readr::read_csv issue: Chinese Character becomes messy codes

问题

I'm trying to import a dataset to RStudio, however I am stuck with Chinese characters, as they become messy codes. Here is the code:

library(tidyverse)
df <- read_csv("中文,英文\n英文,德文")
df
# A tibble: 1 x 2
  `\xd6\xd0\xce\xc4`            `Ӣ\xce\xc4`
               <chr>                  <chr>
1 "<U+04E2>\xce\xc4" "<U+00B5>\xc2\xce\xc4"

When I use the base function read.csv, it works well. I guess I must do something wrong with encoding. But there are no encoding option in read_csv, how can I do this?

回答1:

This is because that the characters are marked as UTF-8 whereas the actual encoding is the system default (you can get by stringi::stri_enc_get()).

So, you can do either:

1) Read data with the correct encoding:

df <- read_csv("中文,英文\n英文,德文", locale = locale(encoding = stringi::stri_enc_get()))

2) Read data with the incorrect encoding and mark them with the correct encoding later (note that this does not always work):

df <- read_csv("中文,英文\n英文,德文")
df <- dplyr::mutate_all(df, `Encoding<-`, value = "unknown")

来源：https://stackoverflow.com/questions/46996501/readrread-csv-issue-chinese-character-becomes-messy-codes

标签

dplyr

tidyverse

readr

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!