Question
What is the optimal way to remove German (or French) accents from a vector of 16 million strings?
e.g., 'Sjögren's syndrome' into 'Sjogren's syndrome'
Conversion of a single character into a single character is preferred over transliteration such as
ä => ae ö => oe ü => ue.
Using regular expressions would be one option, but is there something better (an R package for this)?
gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))
There are Stack Overflow solutions for non-R platforms, but no good one for R.
Answer 1:
Use iconv to convert to ASCII with transliteration (if supported):
iconv(c("über","Sjögren's"),to="ASCII//TRANSLIT")
[1] "uber" "Sjogren's"
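Since //TRANSLIT support depends on the platform's iconv implementation, the result may differ between systems. As a fallback, here is a minimal base R sketch of the one-to-one mapping the question asks for, using chartr. The helper name strip_accents and the short character table are illustrative assumptions, not a complete list of accented letters.

```r
# Hypothetical helper: map each accented character to exactly one ASCII character.
# The table covers only a few German and French letters (a/o/u umlauts in both
# cases, e-acute, e-grave, e-circumflex, a-grave, c-cedilla); extend as needed.
# \u escapes keep the script independent of the source file's encoding.
strip_accents <- function(x) {
  old <- "\u00e4\u00f6\u00fc\u00c4\u00d6\u00dc\u00e9\u00e8\u00ea\u00e0\u00e7"
  new <- "aouAOUeeeac"
  chartr(old, new, x)  # old and new must have the same number of characters
}

strip_accents(c("Sj\u00f6gren's syndrome", "\u00fcber"))
```

chartr is vectorised and makes a single pass per string, so it avoids the nested gsub calls from the question. Any character missing from the table is left unchanged.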
Answer 2:
One of the linked answers suggests using stringi:
library(stringi)
stri_trans_general("Zażółć gęślą jaźń", "Latin-ASCII")
[1] "Zazolc gesla jazn"
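With 16 million strings, many values may well be repeated (an assumption about the data, e.g. recurring disease names). In that case it could pay to transliterate only the distinct values and map the results back. The sketch below does this; translit_unique is a hypothetical helper name, and its default relies on the stringi package being installed.

```r
# Hypothetical helper: apply a transliteration function once per distinct
# value, then expand the results back to the original positions.
translit_unique <- function(x,
                            fun = function(s) stringi::stri_trans_general(s, "Latin-ASCII")) {
  u <- unique(x)        # distinct strings only
  fun(u)[match(x, u)]   # look up each original element's converted value
}
```

Calling translit_unique(c("über", "Sjögren's", "über")) would transliterate two distinct strings rather than three. The saving depends entirely on how many duplicates the vector holds.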
Source: https://stackoverflow.com/questions/13610319/convert-accented-characters-into-ascii-character