Are there packages for Cyrillic text transliteration to Latin in R? I need to convert data frames to Latin to use factors. It is somewhat messy to use Cyrillic factors in R.
This can be done with the stringi package, as in the other answer, but with a different transform identifier for Serbian Latin:
`stri_trans_general("жшчћђ", "Serbian-Latin/BGN")`
All characters should be transformed correctly to Serbian Latin.
If one afterwards uses base R to filter the data on the Cyrillic values, one gets all NAs, but with dplyr everything works fine.
I have found the package at last.
> library(stringi)
> stri_trans_general("женщина", "cyrillic-latin")
[1] "ženŝina"
> stri_trans_general("женщина", "russian-latin/bgn")
[1] "zhenshchina"
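Since the question is about data frames, here is a minimal sketch of applying the same call to every character column and then building a factor. The data frame `df` and its column names are made up for illustration. It assumes the columns are character vectors, not already factors.

```r
library(stringi)

# Hypothetical example data; column names are invented for this sketch.
df <- data.frame(
  word  = c("женщина", "женщина"),
  count = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Find the character columns and transliterate only those.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], stri_trans_general, id = "russian-latin/bgn")

# The transliterated column can now be turned into a factor with Latin levels.
df$word <- factor(df$word)
levels(df$word)
```

Columns that are already factors would need their `levels()` transliterated instead, which this sketch does not cover.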
After that, the only issue remaining is the "ё" letter.
> stri_trans_general("Ёж", "russian-latin/bgn")
[1] "Yëzh"
I worked around it by dropping every non-ASCII character (here the transliterated "ë") from the result:
> iconv(stri_trans_general("ёж", "russian-latin/bgn"), from = "UTF-8", to = "ASCII", sub = "")
[1] "yzh"
Or one can just replace the 'Ё' and 'ё' letters before
> gsub('ё','e',gsub('Ё','E','Ёжики на ёлке'))
[1] "Eжики на eлке"
or after transliteration.
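A possible alternative, not what I used above: ICU also has a "Latin-ASCII" transform, which can be chained after the BGN one so that accented letters are mapped to plain ASCII letters instead of being dropped. This is only a sketch. The helper name `to_ascii_latin` is made up, and I have not checked how every Cyrillic letter comes out.

```r
library(stringi)

# Hypothetical helper: transliterate Russian text, then fold any remaining
# accented Latin letters (such as "ë") to their plain ASCII base letters.
to_ascii_latin <- function(x) {
  latin <- stri_trans_general(x, "russian-latin/bgn")
  stri_trans_general(latin, "latin-ascii")
}

out <- to_ascii_latin("Ёжики на ёлке")

# Check that nothing non-ASCII is left in the result.
stri_enc_isascii(out)
```

Unlike the `iconv(..., sub = "")` approach, this keeps a letter in place of each "ё" instead of deleting it.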