readr::read_csv issue: Chinese characters become garbled

Submitted by 僤鯓⒐⒋嵵緔 on 2019-12-19 09:25:23

Question


I'm trying to import a dataset into RStudio, but I'm stuck on the Chinese characters: they come out garbled. Here is the code:

library(tidyverse)
df <- read_csv("中文,英文\n英文,德文")
df
# A tibble: 1 x 2
  `\xd6\xd0\xce\xc4`            `Ӣ\xce\xc4`
               <chr>                  <chr>
1 "<U+04E2>\xce\xc4" "<U+00B5>\xc2\xce\xc4"

When I use the base function read.csv, it works fine, so I guess I am doing something wrong with the encoding. But read_csv has no encoding argument. How can I specify it?


Answer 1:


This happens because read_csv marks the characters as UTF-8, whereas their actual encoding is the system default (which you can look up with stringi::stri_enc_get()).
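To make the mismatch concrete, here is a minimal base R sketch. It reuses the four bytes shown in the question's first column name (\xd6\xd0\xce\xc4) and assumes, which the question does not confirm, that the asker's system encoding is GBK and that the local iconv build supports it.

```r
# The four bytes from the question's first column name: \xd6\xd0\xce\xc4.
raw_bytes <- as.raw(c(0xd6, 0xd0, 0xce, 0xc4))

# Build a string from those bytes and label it as UTF-8,
# which is the wrong label if the bytes are really GBK (assumption).
mislabeled <- rawToChar(raw_bytes)
Encoding(mislabeled) <- "UTF-8"

# The bytes are not a valid UTF-8 sequence, hence the garbled display.
is_valid <- validUTF8(mislabeled)

# Converting from the assumed real encoding gives readable text.
fixed <- iconv(rawToChar(raw_bytes), from = "GBK", to = "UTF-8")

print(is_valid)
print(fixed)
```

If iconv returns NA here, the local iconv most likely does not know the name "GBK"; iconvlist() shows which encoding names are available.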

So, you can do either:

1) Read data with the correct encoding:

df <- read_csv("中文,英文\n英文,德文", locale = locale(encoding = stringi::stri_enc_get()))

2) Read the data with the incorrect encoding and re-mark the strings with the correct encoding afterwards (note that this does not always work):

df <- read_csv("中文,英文\n英文,德文")
df <- dplyr::mutate_all(df, `Encoding<-`, value = "unknown")
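Option 1 applies in the same way to a file on disk. The following is a sketch only: it writes a small temporary file in GBK (an assumed encoding, standing in for whatever stringi::stri_enc_get() reports on your system) and reads it back by passing that encoding to locale(). The file path and sample data are made up for illustration.

```r
library(readr)

# Sample data written with \u escapes so the script does not depend on
# the encoding of the source file itself.
csv_utf8 <- "\u4e2d\u6587,\u82f1\u6587\n\u82f1\u6587,\u5fb7\u6587\n"

# Convert to GBK bytes (assumed encoding) and write them to a temporary file.
gbk_bytes <- iconv(csv_utf8, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]]
path <- tempfile(fileext = ".csv")
writeBin(gbk_bytes, path)

# Declare the file's real encoding; readr converts the text to UTF-8.
df <- read_csv(path, locale = locale(encoding = "GBK"))

print(names(df))
print(df)
```

When the encoding of a file is unknown, readr::guess_encoding(path) can suggest candidates, although it only returns guesses with confidence scores.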


Source: https://stackoverflow.com/questions/46996501/readrread-csv-issue-chinese-character-becomes-messy-codes
