UTF-8 support in R on Windows

Posted by 眉间皱痕 on 2020-12-06 06:04:52

Question


Since Windows 10 added the option 'Beta: Use Unicode UTF-8 for worldwide language support', I thought it would be possible for R to switch its locale to UTF-8. However, when I try to change the locale to UTF-8 with

Sys.setlocale(locale = "Japanese_Japan.65001") 

or

Sys.setlocale(locale = "Japanese_Japan.UTF-8") 

I get

In Sys.setlocale("Japanese_Japan.65001") :
OS reports request to set locale to "Japanese_Japan.65001" cannot be honored

Does Windows currently allow R to use UTF-8?

(I am not very familiar with locale issues, so please comment if more information is needed.)

Information

> Sys.getlocale()
[1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"
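Besides Sys.getlocale(), base R's l10n_info() reports whether the current session runs in a UTF-8 locale and, on Windows, which code page is in use. The following is a minimal sketch; the helper name session_encoding is hypothetical and not part of the original question.

```r
# Hypothetical helper (not from the question): summarise the
# encoding-related state of the current R session.
session_encoding <- function() {
  info <- l10n_info()
  list(
    ctype    = Sys.getlocale("LC_CTYPE"),  # e.g. "Japanese_Japan.932"
    utf8     = isTRUE(info[["UTF-8"]]),    # TRUE only in a UTF-8 locale
    codepage = info[["codepage"]]          # reported on Windows; NULL elsewhere
  )
}

str(session_encoding())
```

With the locale shown above (code page 932), the utf8 element would be expected to be FALSE.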

Answer 1:


It appears that experimental builds of R that fully support UTF-8 on Windows 10 exist. However, as of 2020-07-30 the project was still marked as "experimental", and the blog post's conclusion was:

Based also on this experience, I believe that switching to UCRT is already possible and I expect that building a complete toolchain should take a small number of months. It is I think the only realistic way to support Unicode characters (not representable in native encoding) reliably in R on Windows.

This suggests that full UTF-8 support in R on Windows is still planned for a somewhat more distant future.

Source: https://developer.r-project.org/Blog/public/2020/07/30/windows/utf-8-build-of-r-and-cran-packages/index.html
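To illustrate what "not representable in native encoding" means in the quote, here is a small sketch that converts two strings to a single-byte Windows code page with iconv(). The sample strings and the choice of CP1250 are assumptions made for illustration, and the exact result for unconvertible text can depend on the platform's iconv implementation.

```r
# Illustration only: the sample strings and the target code page are
# assumptions, not taken from the blog post.
czech    <- "\u010d"        # c with caron, which exists in Windows-1250
japanese <- "\u65e5\u672c"  # two kanji, which do not exist in Windows-1250

# Convert UTF-8 text to the CP1250 code page.
to_cp1250 <- function(x) iconv(x, from = "UTF-8", to = "CP1250")

czech_native    <- to_cp1250(czech)
japanese_native <- to_cp1250(japanese)

# The Czech letter fits into a single CP1250 byte.
print(charToRaw(czech_native))

# The Japanese text cannot be converted; iconv() typically returns NA here.
print(japanese_native)
```

A session whose native encoding is a code page like this one cannot hold the second string losslessly, which is the limitation a UTF-8 build is meant to remove.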




Answer 2:


Sys.setlocale(locale = foo) defaults to category = "LC_ALL". You could instead try setting each aspect of the locale for the R process individually, for example:

locales <- c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME")
for (x in locales) {
  Sys.setlocale(category = x, locale = "Japanese_Japan.65001")
}

Note all the warnings produced by the snippet above, as well as the following notes from the R help page "locales: Query or Set Aspects of the Locale":

  • Attempts to change the character set (by Sys.setlocale("LC_CTYPE", ), if that implies a different character set) during a session may not work and are likely to lead to some confusion.
  • Setting "LC_NUMERIC" to any value other than "C" may cause R to function anomalously, so gives a warning.
  • Almost all the output routines used by R itself under Windows ignore the setting of "LC_NUMERIC" since they make use of the Trio library which is not internationalized.

For instance, my locale is Czech, so I tried the following snippet (the loop above written out, so that the results and warnings appear in sequence):

Sys.getlocale(category = "LC_ALL")
Sys.setlocale(category = "LC_COLLATE" , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_CTYPE"   , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_MONETARY", locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_NUMERIC" , locale="Czech_Czechia.65001")
Sys.setlocale(category = "LC_TIME"    , locale="Czech_Czechia.65001")
Sys.getlocale(category = "LC_ALL")

Output (pasted into the RStudio console):

> Sys.getlocale()
[1] "LC_COLLATE=Czech_Czechia.1250;LC_CTYPE=Czech_Czechia.1250;LC_MONETARY=Czech_Czechia.1250;LC_NUMERIC=C;LC_TIME=Czech_Czechia.1250"
> Sys.setlocale(category = "LC_COLLATE" , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.setlocale(category = "LC_CTYPE"   , locale="Czech_Czechia.65001")
[1] ""
Warning message:
In Sys.setlocale(category = "LC_CTYPE", locale = "Czech_Czechia.65001") :
  OS reports request to set locale to "Czech_Czechia.65001" cannot be honored
> Sys.setlocale(category = "LC_MONETARY", locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.setlocale(category = "LC_NUMERIC" , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
Warning message:
In Sys.setlocale(category = "LC_NUMERIC", locale = "Czech_Czechia.65001") :
  setting 'LC_NUMERIC' may cause R to function strangely
> Sys.setlocale(category = "LC_TIME"    , locale="Czech_Czechia.65001")
[1] "Czech_Czechia.65001"
> Sys.getlocale(category = "LC_ALL")
[1] "LC_COLLATE=Czech_Czechia.65001;LC_CTYPE=Czech_Czechia.1250;LC_MONETARY=Czech_Czechia.65001;LC_NUMERIC=Czech_Czechia.65001;LC_TIME=Czech_Czechia.65001"
> 
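The manual sequence above can be condensed into a small helper that records, per category, the value returned by Sys.setlocale() and any warning it raised. This is only a sketch: the function name try_locale is hypothetical, and, like the code above, it really does change the locale of the running session for every category that the OS accepts (including LC_NUMERIC, which the help page warns about).

```r
# Hypothetical helper (not from the answer): try one locale name for
# several categories and collect the outcome of each attempt.
try_locale <- function(locale,
                       categories = c("LC_COLLATE", "LC_CTYPE", "LC_MONETARY",
                                      "LC_NUMERIC", "LC_TIME")) {
  rows <- lapply(categories, function(category) {
    warning_text <- NA_character_
    # Sys.setlocale() returns "" when the OS refuses the request.
    result <- withCallingHandlers(
      Sys.setlocale(category = category, locale = locale),
      warning = function(w) {
        warning_text <<- conditionMessage(w)  # remember the warning text
        invokeRestart("muffleWarning")        # and keep the console quiet
      }
    )
    data.frame(category = category, result = result, warning = warning_text,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Side effect: accepted categories stay changed in this session.
report <- try_locale("Japanese_Japan.65001")
print(report)
```

A row with an empty result marks a category the OS refused, which in the transcript above was LC_CTYPE.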



Answer 3:


As of August 22nd, 2020, the best way to use R on Windows is to install WSL 2 (Windows Subsystem for Linux) and connect to RStudio Server through a web browser.

Instructions:

  • Install WSL 2: https://docs.microsoft.com/en-us/windows/wsl/install-win10 (requires Windows 10, version 1903 or higher). If you want a GUI for WSL 2, see these instructions: https://most-useful.com/ubuntu-20-04-desktop-gui-on-wsl-2-on-surface-pro-4/ (but it uses almost all of my RAM and is very laggy).
  • Install R and RStudio Server: https://rstudio.com/products/rstudio/download-server/
  • Start RStudio Server: sudo rstudio-server start
  • Open a web browser (I recommend Chrome), go to http://localhost:8787 and log in with your Linux account. RStudio Server will open and run smoothly. I use it in full-screen mode and have even created a desktop shortcut for that address which opens it in full-screen mode by default.


Source: https://stackoverflow.com/questions/62726261/utf-8-support-in-r-on-windows
