Question
R shows me UTF-8 encoded text as if its bytes were Latin-1 characters. Examples:
Latin-1 character bytes ----- UTF-8 bytes
Ã¤Ã¤nnÃ¶k ----- äännök
Ã<U+0084>Ã<U+0084>NÃ<U+0096>S ----- äänös
My session info:
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1
locale:
[1] C/UTF-8/C/C/C/C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
So what kind of settings do I need in R to handle umlauts correctly (not to return UTF-8 bytes as Latin-1 character bytes)?
Related?
Turn Unicode into Umlaut in R on Mac (Facebook Data)
https://stackoverflow.com/a/22945233/164148
Apparently by this, I need to
If you call Sys.setlocale with "LC_CTYPE" or "LC_ALL" to change the system locale while RStudio is running, you may run into some minor issues as RStudio assumes the system encoding doesn't change. If you are on Windows, we recommend you only call Sys.setlocale in .Rprofile. If you are on Mac or Linux and want to change the system locale, please visit the support forum and let us know your scenario.
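For reference, a minimal sketch of what such an .Rprofile entry might look like. The locale name "en_US.UTF-8" is an assumption, not something taken from the quoted advice; it has to exist on the machine, and Windows uses different locale names. Since the session info above already reports a UTF-8 LC_CTYPE, this may well not be the cause of the problem.

```r
# Sketch of a ~/.Rprofile fragment. "en_US.UTF-8" is an assumed locale name;
# it must be installed on the system (check with `locale -a` on Mac or Linux).
if (!isTRUE(l10n_info()[["UTF-8"]])) {
  # Sys.setlocale() warns and returns "" when the request cannot be honoured.
  result <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_US.UTF-8"))
  if (!nzchar(result)) {
    message("Could not switch LC_CTYPE to a UTF-8 locale")
  }
}
```

Running `l10n_info()` afterwards shows whether R considers the session to be in a UTF-8 locale.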
- Is there a simple tool to convert the Latin-1 character bytes back to UTF-8 character bytes?
P.S. I have now tested this in R on both Linux and OS X, and I get the same problem: the UTF-8 character bytes are interpreted as Latin-1 character bytes.
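On the conversion question, here is a minimal sketch using base R's `iconv()` and `Encoding()`. It assumes the text went through exactly one round of "UTF-8 bytes read as Latin-1", which is what the examples above suggest. The helper name `fix_mojibake` is made up for illustration.

```r
# Hypothetical helper: undo one round of "UTF-8 bytes read as Latin-1".
fix_mojibake <- function(x) {
  # Map every character back to the single Latin-1 byte it came from.
  # iconv() returns NA for strings containing characters outside Latin-1.
  bytes <- iconv(x, from = "UTF-8", to = "latin1")
  # Those bytes are really UTF-8, so re-declare them without converting.
  Encoding(bytes) <- "UTF-8"
  bytes
}

# "äännök" written with \u escapes so the script is encoding-neutral.
original <- "\u00e4\u00e4nn\u00f6k"

# Reproduce the problem: treat the UTF-8 bytes as Latin-1 and re-encode.
garbled <- iconv(original, from = "latin1", to = "UTF-8")

fixed <- fix_mojibake(garbled)

charToRaw(original)  # bytes: c3 a4 c3 a4 6e 6e c3 b6 6b
charToRaw(garbled)   # bytes: c3 83 c2 a4 c3 83 c2 a4 6e 6e c3 83 c2 b6 6b
identical(charToRaw(fixed), charToRaw(original))
```

If the strings were never double-encoded and are only mislabelled, `Encoding(x) <- "UTF-8"` on its own may be enough. Comparing `charToRaw()` output before and after is a quick way to tell the two cases apart.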
Source: https://stackoverflow.com/questions/41873359/r-utf-8-character-bytes-as-latin-1-characters-bytes