Printing UTF-8 characters in R, Rmd, knitr, bookdown

Posted by 跟風遠走 on 2019-12-20 03:12:59

Question


UPDATE (April 2018):
The problem still persists across different settings and computers. I believe it is related to Unicode (UTF-8) characters in general.

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

PROBLEM:

My Rmd/R file is saved with UTF-8 encoding. Other sessionInfo() details:

Platform: x86_64-w64-mingw32/x64 (64-bit)
LC_CTYPE=English_Canada.1252

other attached packages:
[1] knitr_1.17

Here is a simple data frame that I need to print as a table in an HTML document, e.g. with kable(dt) or in any other way.

dt <- data.frame(
  name = c("Борис Немцов", "Martin Luter King"),
  year = c("2015", "1968")
)

Neither of the following works:

Way 1

If I keep Sys.setlocale() as is (i.e. "English_Canada.1252"), then I get this:

> dt;
name year
1 <U+0411><U+043E><U+0440><U+0438><U+0441> <U+041D><U+0435><U+043C><U+0446><U+043E><U+0432> 2015
2 Martin Luter King 1968
> kable(dt)
|name                                                                                      |year |
|:-----------------------------------------------------------------------------------------|:----|
|<U+0411><U+043E><U+0440><U+0438><U+0441> <U+041D><U+0435><U+043C><U+0446><U+043E><U+0432> |2015 |
|Martin Luter King                                                                         |1968 |

Note that <U+....> escapes are printed instead of the characters.
Converting with dt$name <- enc2utf8(as.character(dt$name)) did not help.
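
A quick diagnostic can help separate "the string is damaged" from "the session cannot display it". The sketch below is not from the original post; it assumes the name is rebuilt from Unicode escapes (so the example does not depend on how the file is saved) and uses only base R functions. R on Windows prints <U+....> for characters that the native encoding (here code page 1252) cannot represent, so an intact UTF-8 string combined with a non-UTF-8 locale would be consistent with the output above.

```r
# Hypothetical check: rebuild "Борис Немцов" from Unicode escapes so the
# result does not depend on the encoding of this script file.
name <- "\u0411\u043e\u0440\u0438\u0441 \u041d\u0435\u043c\u0446\u043e\u0432"

Encoding(name)           # declared encoding mark of the string
nchar(name, "chars")     # number of characters
nchar(name, "bytes")     # number of bytes (Cyrillic letters take 2 bytes in UTF-8)
charToRaw(name)          # the underlying bytes

# Properties of the current session locale, including whether it is UTF-8.
l10n_info()
Sys.getlocale("LC_CTYPE")
```

If the encoding mark is "UTF-8" and the byte count is roughly twice the character count, the data itself is fine and the issue lies in how the session's locale renders it.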

Way 2

If I call Sys.setlocale("LC_CTYPE", "russian") (i.e. "Russian_Russia.1251"), then I get this:

> dt; 
name year
1      Áîðèñ Íåìöîâ 2015
2 Martin Luter King 1968

> kable(dt)
|name              |year |
|:-----------------|:----|
|Áîðèñ Íåìöîâ      |2015 |
|Martin Luter King |1968 |

Note that the characters have become gibberish.
Calling print(dt, encoding="windows-1251") or print(dt, encoding="UTF-8") had no effect.
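
The gibberish has a recognizable shape: it is what Windows-1251 bytes look like when they are read as Latin-1 / Windows-1252. The sketch below is an illustration added here, not part of the original post, and it assumes the platform's iconv knows the encoding name "CP1251". It reproduces the first word of the garbled output on any system.

```r
# "Борис", built from Unicode escapes.
name <- "\u0411\u043e\u0440\u0438\u0441"

# Encode as Windows-1251: each Cyrillic letter becomes a single byte.
bytes_1251 <- iconv(name, from = "UTF-8", to = "CP1251", toRaw = TRUE)[[1]]
bytes_1251

# Misread those same bytes as Latin-1 (which matches Windows-1252 for these values).
misread <- rawToChar(bytes_1251)
Encoding(misread) <- "latin1"
shown <- enc2utf8(misread)
shown
```

The value of shown matches the "Áîðèñ" seen in the output above, which suggests the text is being converted to code page 1251 at some point and then displayed as if it were code page 1252.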

Any advice?

The closest discussions of this problem that I could find are at the following links, but they did not help: http://blog.rolffredheim.com/2013/01/r-and-foreign-characters.html, https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows, https://www.smashingmagazine.com/2012/06/all-about-unicode-utf8-character-sets

I also tried saving my file with 1251 encoding (instead of the current UTF-8 encoding) and tried several character conversion/processing packages. Nothing has helped so far.

UPDATE:

I opened a related question: How to change Sys.setlocale, when you get Error "request to set locale … cannot be honored"


Answer 1:


The only solution that worked was the one suggested by Yihui Xie (the knitr developer):
create a file named .Rprofile that contains the single line Sys.setlocale("LC_CTYPE", "russian"), and place it in your home or working directory.

Note, however, that this works only with kable(), i.e. with the help of the knitr package.
If you print with print(dt$name[1]), you still get Áîðèñ Íåìöîâ.
But if you use kable(dt$name[1]), you get what you need: Борис Немцов!
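
For reference, a minimal sketch of what such a .Rprofile could look like. Only the Sys.setlocale() call comes from the suggestion above; the Windows guard and the comments are assumptions added here, on the grounds that "russian" is a Windows locale name and the file might be shared with other systems. R reads .Rprofile at startup from the current working directory, or from the user's home directory if none is found there.

```r
# .Rprofile (sketch)
# Assumption: the guard below is not part of the original suggestion; it only
# keeps the call from producing a warning on non-Windows systems.
if (.Platform$OS.type == "windows") {
  # Switch character handling to the Russian locale
  # (resolves to "Russian_Russia.1251" on Windows).
  Sys.setlocale("LC_CTYPE", "russian")
}
```

After restarting R, Sys.getlocale("LC_CTYPE") can be used to confirm that the setting took effect before knitting the document.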



Source: https://stackoverflow.com/questions/48307007/printing-utf-8-characters-in-r-rmd-knitr-bookdown
