I try to open a UTF-8 encoded .csv file that contains (traditional) Chinese characters in R. For some reason, R displays the information sometimes as Chinese characters, som
Not a bug, more a misunderstanding of the underlying type system conversions (the character
type and the factor
type) when constructing a data.frame
.
You could start first with data <-read.csv("mydata.csv", encoding="UTF-8", stringsAsFactors=FALSE)
which will make your Chinese characters to be of the character
type and so by printing them out you should see waht you are expecting.
@nograpes: similarly x=c('中華民族');x; y <- data.frame(x, stringsAsFactors=FALSE)
and everything should be ok.
In my case, the utf-8 encoding does not work in my r. But the Gb* encoding works.The utf8 wroks in ubuntu. First you need to figure out the default encoding in your OS. And encode it as it is. Excel can not encode it as utf8 properly even it claims that it save as utf8.
(1) Download 'Open Sheet' software.
(2) Open it properly. You can scroll the encoding method until you see the Chinese character displayed in the preview windows.
(3) Save it as utf-8(if you want utf-8). (UTF-8 is not solution to every problem, you HAVE TO know the default encoding in your system first)