Convert accented characters into ASCII characters

Submitted by 爷,独闯天下 on 2019-11-27 21:16:47

Question


What is the optimal way to remove German (or French) accents from a vector of 16 million strings?

e.g., 'Sjögren's syndrome' into 'Sjogren's syndrome'

Converting each accented character into a single unaccented character is preferred over a transliteration such as

ä => ae ö => oe ü => ue.

Nested regular-expression replacements are one option, but is there something better, such as an R package for this?

gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))
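As a sketch of the single-character approach without nested gsub calls, base R's chartr can translate several characters in one pass. The character lists below are illustrative assumptions rather than a complete set of German and French accents, and the code assumes the script and the data share the same encoding (for example UTF-8).

```r
# Illustrative, incomplete mapping: each character in `accented` is replaced
# by the character at the same position in `plain`.
accented <- "äöüÄÖÜéèêëàâîïôùûç"
plain    <- "aouAOUeeeeaaiiouuc"

# chartr needs the two strings to line up one-to-one.
stopifnot(nchar(accented) == nchar(plain))

x <- c("Sjögren's syndrome ( über) ", "déjà vu")

# chartr is vectorised over x, so one call handles the whole vector.
result <- chartr(accented, plain, x)
print(result)
```

Because chartr only maps one character to one character, it cannot produce expansions such as ß to ss; characters missing from the mapping pass through unchanged.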

Stack Overflow has solutions for non-R platforms, but no good one for R.


Answer 1:


Use iconv to convert to ASCII with transliteration (if supported):

iconv(c("über","Sjögren's"),to="ASCII//TRANSLIT")
[1] "uber"      "Sjogren's"
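Since //TRANSLIT support depends on the platform's iconv implementation, it may be worth checking the result before trusting it on a large vector. The following is a minimal sketch, assuming the input can be represented as UTF-8; the variable names failed and suspicious are only illustrative.

```r
x <- c("über", "Sjögren's")

# Normalise the declared encoding first, then ask for ASCII with transliteration.
ascii <- iconv(enc2utf8(x), from = "UTF-8", to = "ASCII//TRANSLIT")

# iconv returns NA for elements it could not convert (unless `sub` is supplied).
failed <- is.na(ascii) & !is.na(x)

# Some implementations emit extra characters (for example a quote or "?") in
# place of the accent, so a change in length is a hint that the result is not
# a plain one-to-one replacement.
suspicious <- !failed & nchar(ascii) != nchar(x)

print(data.frame(x, ascii, failed, suspicious))
```

If either flag is TRUE for any element, a platform-independent approach may be the safer choice.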



Answer 2:


One of the linked answers suggests:

library(stringi)
stri_trans_general("Zażółć gęślą jaźń", "Latin-ASCII")

[1] "Zazolc gesla jazn"
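For a vector of 16 million strings that likely contains many repeats, one possible optimisation is to transliterate only the distinct values and map the results back. This is a sketch under that assumption; to_ascii is a hypothetical helper name, not part of stringi.

```r
library(stringi)

# Hypothetical helper: transliterate each distinct string once, then expand
# the results back to the original positions with match().
to_ascii <- function(x) {
  ux  <- unique(x)
  out <- stri_trans_general(ux, "Latin-ASCII")
  out[match(x, ux)]
}

x <- c("Sjögren's syndrome", "über", "Sjögren's syndrome", "Zażółć gęślą jaźń")
res <- to_ascii(x)
print(res)

# Sanity check: every element should now be pure ASCII.
stopifnot(all(stri_enc_isascii(res)))
```

When most values are unique, the deduplication gains little, and calling stri_trans_general on the full vector directly is simpler.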


Source: https://stackoverflow.com/questions/13610319/convert-accented-characters-into-ascii-character
