I'm trying to write a formula for Google Sheets which will convert Unicode characters with diacritics to their plain ASCII equivalents.
I see that Google uses RE2 in its "REGEXREPLACE" function. And I see that RE2 offers Unicode character classes.
I tried to write a formula (similar to this one):
REGEXREPLACE("público","(\pL)\pM*","$1")
But Sheets produces the following error:
Function REGEXREPLACE parameter 2 value "\pL" is not a valid regular expression.
I suppose I could write a formula consisting of a long set of nested SUBSTITUTE functions (Like this one), but that seems pretty awful.
Can any offer a suggestion for a better way to normalize Unicode letters with diacritical/accent marks in a Google Sheets formula?
[[:^alpha:]]
(negated ASCII character class) works fine for REGEXEXTRACT
formula.
But =REGEXREPLACE("público","([[:alpha:]])[[:^alpha:]]","$1")
gives "pblic" as a result. So, I guess, formula doesn't know what exact ASCII character must replace "ú".
Workaround
Let's take the word públicē
; we need to replace two symbols in it. Put this word in cell A1, and this formula in cell B1:
=JOIN("",ArrayFormula(IFERROR(VLOOKUP(SPLIT(REGEXREPLACE(A1,"(.)","$1-"),"-"),D:E,2,0),SPLIT(REGEXREPLACE(A1,"(.)","$1-"),"-"))))
And then make directory of replacements in range D:E:
D E
1 ú u
2 ē e
3 ... ...
This formula is still ugly, but more useful because you can control your directory by adding more characters to the table.
Or use Java Script
Also found a good solution, which works in google sheets.
This did it for me in Google Sheets, Google Apps Scripts, GAS
function normalizetext(text) {
var weird = 'öüóőúéáàűíÖÜÓŐÚÉÁÀŰÍçÇ!@£$%^&*()_+?/*."';
var normalized = 'ouooueaauiOUOOUEAAUIcC ';
var idoff = -1,new_text = '';
var lentext = text.toString().length -1
for (i = 0; i <= lentext; i++) {
idoff = weird.search(text.charAt(i));
if (idoff == -1) {
new_text = new_text + text.charAt(i);
} else {
new_text = new_text + normalized.charAt(idoff);
}
}
return new_text;
}
来源:https://stackoverflow.com/questions/35640493/how-can-i-normalize-asciify-unicode-characters-in-google-sheets