How can I remove diacritics (umlauts) from a String?

Submitted by 谁说我不能喝 on 2019-12-23 15:11:14

Question


How can I convert a string such as 'Příliš žluťoučký kůň úpěl ďábelské ódy.' into 'Prilis zlutoucky kun upel dabelske ody.'?

The source string is Unicode, so in principle it should be possible to use normalization (decomposition) to separate the diacritics from their base letters.

Unfortunately, I couldn't find any library in Pharo (maybe hidden somewhere in Zinc?) that supports either stripping diacritics or decomposition.


Answer 1:


You can try the Diacriticals package.

Installation

Metacello new
    smalltalkhubUser: 'Pharo' project: 'MetaRepoForPharo50';
    configuration: 'Diacritics';
    version: #development;
    load.

Test

'Příliš žluťoučký kůň úpěl ďábelské ódy' asNonDiacritical.
 "'Prilis zlutoucky kun upel dabelske ody'"



Answer 2:


There isn't one, as far as I'm aware, and the algorithms that can do this are quite costly, so you probably won't want to use a Smalltalk implementation of them. At the company where I work, we created a VM plugin that makes the calls to libicu. That way we don't have to maintain a separate implementation, and we benefit from native speed. See ICU for reference.
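
If the input alphabet is known and small, a hand-written lookup table can stand in for real normalization. The following is only a rough sketch under that assumption: it is neither the plugin described above nor Unicode decomposition, the names accented, plain, table and stripDiacritics are made up for illustration, and the table covers only Czech accented letters.

```smalltalk
"Sketch only: a hand-written lookup table, not Unicode normalization.
 The table, the variable names, and the block are illustrative assumptions;
 the table covers only Czech accented letters."
| accented plain table stripDiacritics |
accented := 'áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ'.
plain := 'acdeeinorstuuyzACDEEINORSTUUYZ'.

"Pair each accented character with its plain counterpart.
 with:do: requires both strings to have the same length."
table := Dictionary new.
accented with: plain do: [ :from :to | table at: from put: to ].

"Replace every character found in the table; leave all others unchanged."
stripDiacritics := [ :aString |
    aString collect: [ :each | table at: each ifAbsent: [ each ] ] ].

stripDiacritics value: 'Příliš žluťoučký kůň úpěl ďábelské ódy'.
```

This is meant to reproduce the conversion from the question for that one sentence. Any character missing from the table passes through unchanged, so for arbitrary Unicode input a real normalization library such as ICU is still the safer route.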



Source: https://stackoverflow.com/questions/38724281/how-can-i-remove-diacritics-umlauts-from-a-string
