Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars

故里飘歌 2020-11-22 11:42

I am looking for an algorithm that can map characters with diacritics (tilde, circumflex, caret, umlaut, caron) to their "simple" base character.

For example, the letters ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ɲ ƞ ᶇ ɳ ȵ from the title should all map to n.

12 answers

遇见更好的自我 (OP)

2020-11-22 12:11
There is a draft report on character folding on the Unicode website that has a lot of relevant material. See specifically Section 4.1, "Folding algorithm".
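The core idea can be sketched in Python with only the standard `unicodedata` module: decompose each character canonically (NFD), drop the nonspacing combining marks, and recompose. This is a much simpler approach than the folding algorithm in the Unicode report, not an implementation of it; the function name `strip_diacritics` is hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove nonspacing marks after canonical decomposition (NFD)."""
    # NFD splits precomposed letters such as "ñ" into "n" + U+0303 COMBINING TILDE.
    decomposed = unicodedata.normalize("NFD", text)
    # General category "Mn" (Mark, nonspacing) covers tilde, caron, acute,
    # cedilla, dot below, etc.; drop those code points and keep everything else.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose what remains so the result is in NFC again.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(strip_diacritics("ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ"))  # n n n n n n n n n
    print(strip_diacritics("Crème Brûlée"))        # Creme Brulee
```

Letters such as ɲ, ƞ, ᶇ, ɳ and ȵ have no canonical decomposition, so this sketch returns them unchanged. Stripping every nonspacing mark is also lossy for scripts where such marks are not optional decoration (e.g. Hebrew vowel points).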

Here's a discussion and implementation of diacritic-mark removal in Perl.

These existing SO questions are related:
- How to convert UTF-8 to US ASCII
- How to change diacritic characters to non-diacritic ones
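For the "convert to US ASCII" case, canonical decomposition alone is not enough: ɲ, ƞ, ᶇ, ɳ and ȵ from the title have no decomposition, so they need an explicit mapping before the marks are stripped. A minimal Python sketch, assuming a hand-maintained table (`NO_DECOMPOSITION_FOLDS` is hypothetical and covers only the n-variants from the title) and assuming that replacing any leftover non-ASCII character with `?` is acceptable:

```python
import unicodedata

# Hypothetical, hand-maintained table for letters with NO canonical
# decomposition, so NFD cannot split them into "n" + a combining mark.
# Covers only the n-variants from the question title; extend as needed.
NO_DECOMPOSITION_FOLDS = str.maketrans({
    "\u0272": "n",  # ɲ LATIN SMALL LETTER N WITH LEFT HOOK
    "\u019e": "n",  # ƞ LATIN SMALL LETTER N WITH LONG RIGHT LEG
    "\u1d87": "n",  # ᶇ LATIN SMALL LETTER N WITH PALATAL HOOK
    "\u0273": "n",  # ɳ LATIN SMALL LETTER N WITH RETROFLEX HOOK
    "\u0235": "n",  # ȵ LATIN SMALL LETTER N WITH CURL
})


def fold_to_base(text: str) -> str:
    """Explicit table first, then NFD + removal of nonspacing marks."""
    # Step 1: map letters that NFD would leave untouched.
    text = text.translate(NO_DECOMPOSITION_FOLDS)
    # Step 2: decompose precomposed letters (e.g. "ñ" -> "n" + U+0303).
    decomposed = unicodedata.normalize("NFD", text)
    # Step 3: drop nonspacing marks (category "Mn"), then recompose to NFC.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def to_ascii(text: str) -> str:
    """Fold, then replace anything still outside ASCII with '?'."""
    return fold_to_base(text).encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("\u0272 \u019e \u1d87 \u0273 \u0235 \u00f1"))  # n n n n n n
```

For example, `to_ascii("Straße")` gives `"Stra?e"`, because ß has neither a decomposition nor a table entry; whether it should become `ss` is a language-specific choice the table has to make.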