R: UTF-8 character bytes as Latin-1 character bytes

Submitted by 删除回忆录丶 on 2019-12-18 07:24:56

Question


R shows me UTF-8 character bytes as Latin-1 characters. For example:

Latin-1 character bytes          ----- UTF-8 bytes
äännök                           ----- äännök
Ã<U+0084>Ã<U+0084>NÃ<U+0096>S    ----- äänös

My session info:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

What settings do I need in R to handle umlauts correctly, so that UTF-8 bytes are not returned as Latin-1 characters?
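Before changing any setting, it can help to check which encoding R assumes for the session. The following is a minimal sketch, not a confirmed fix for the session shown above: it inspects the locale with `Sys.getlocale()` and `l10n_info()`, then tries to switch `LC_CTYPE` to a UTF-8 locale. The locale name `en_US.UTF-8` is an assumption; it may not be installed on every system, in which case `Sys.setlocale()` returns an empty string.

```r
# Inspect the character-type locale of the current session.
ctype_before <- Sys.getlocale("LC_CTYPE")
print(ctype_before)

# l10n_info() reports whether R treats the session as UTF-8 or Latin-1.
info <- l10n_info()
print(info[["UTF-8"]])
print(info[["Latin-1"]])

# Try to switch to a UTF-8 locale. "en_US.UTF-8" is an assumed locale name.
ctype_after <- suppressWarnings(Sys.setlocale("LC_CTYPE", "en_US.UTF-8"))
if (identical(ctype_after, "")) {
  message("Locale en_US.UTF-8 is not available; LC_CTYPE left unchanged.")
} else {
  message("LC_CTYPE is now: ", ctype_after)
}
```

Note that changing the locale inside a running RStudio session may cause minor issues, so a call like this probably belongs in `.Rprofile` rather than in an interactive session.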

Related?

  1. Turn Unicode into Umlaut in R on Mac (Facebook Data)

  2. https://stackoverflow.com/a/22945233/164148

  3. Apparently by this, I need to

     "If you call Sys.setlocale with "LC_CTYPE" or "LC_ALL" to change the system locale while RStudio is running, you may run into some minor issues as RStudio assumes the system encoding doesn't change. If you are on Windows, we recommend you only call Sys.setlocale in .Rprofile. If you are on Mac or Linux and want to change the system locale, please visit the support forum and let us know your scenario."

  4. Is there a simple tool to convert the Latin-1 character bytes to UTF-8 character bytes?

P.S. I have now tested this in R on both Linux and OS X, and I get the same problem in both: the UTF-8 character bytes are interpreted as Latin-1 character bytes.
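On the conversion question, base R may already be enough. The sketch below is one possible approach under stated assumptions, not a confirmed answer: it covers two common causes of this symptom using `Encoding()` and `iconv()`. The variable names are hypothetical, and the test string is built from raw bytes so that the example does not depend on the encoding of the script file. Which case applies to the data above is not known from the question.

```r
# UTF-8 bytes of "äännök", built from raw bytes (assumed sample data).
utf8_bytes <- as.raw(c(0xC3, 0xA4, 0xC3, 0xA4, 0x6E, 0x6E, 0xC3, 0xB6, 0x6B))

# Case 1: the bytes are valid UTF-8 but the string is declared as Latin-1.
mislabelled <- rawToChar(utf8_bytes)
Encoding(mislabelled) <- "latin1"

fixed1 <- mislabelled
Encoding(fixed1) <- "UTF-8"  # relabel only; the bytes are not changed

# Case 2: the text was converted once too often ("double encoding"):
# every UTF-8 byte was treated as a Latin-1 character and re-encoded.
double_encoded <- iconv(mislabelled, from = "latin1", to = "UTF-8")

# Undo one conversion step, then declare the resulting bytes as UTF-8.
fixed2 <- iconv(double_encoded, from = "UTF-8", to = "latin1")
Encoding(fixed2) <- "UTF-8"

# Both repaired strings carry the original UTF-8 bytes again.
print(charToRaw(fixed1))
print(charToRaw(fixed2))
```

Comparing `charToRaw()` output before and after is a locale-independent way to tell the two cases apart: in case 1 the bytes never change, while in case 2 the damaged string is longer than the original.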

Source: https://stackoverflow.com/questions/41873359/r-utf-8-character-bytes-as-latin-1-characters-bytes
