Is there any free open-source PHP transliteration library? [closed]

Posted by 被刻印的时光 ゝ on 2019-12-04 06:58:48

Google has an AJAX transliteration API which does a good job on many major scripts.

Edit: Damn, it appears on further inspection that this only allows conversions from the Latin alphabet. It's kind of silly that Google hasn't made the reverse functionality available, since they're already using it in Google Translate to provide romanisations for Cyrillic, Chinese, Thai, Hindi, and others, though notably not for abjads such as Hebrew and Arabic.

Further Edit: I thought of a possible workaround: detect the language and use an AJAX query to run the text through Google Translate with the same source and destination language, e.g. Chinese-to-Chinese. Firebug reveals that the transliteration is output in a div whose ID is translit. Transliterations typically carry a lot of diacritics, so you'll need to strip those if you want plain ASCII. This is by no means something to rely on (though Google typically doesn't make frequent structural changes to their HTML), but it is certainly an interesting possibility.
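For the accent-stripping step, here is a minimal sketch. It assumes the intl extension is loaded (for the Normalizer class); stripDiacritics is a name made up for this example, not part of any library. The idea is to decompose each accented letter into its base letter plus combining marks, then drop the marks.

```php
<?php
// Sketch only: assumes the intl extension provides the Normalizer class.
// stripDiacritics() is a hypothetical helper name.
function stripDiacritics(string $text): string
{
    // NFD splits e.g. "ě" into "e" followed by a combining caron (U+030C).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        return $text; // normalization failed (e.g. invalid UTF-8): return input unchanged
    }
    // \p{Mn} matches nonspacing combining marks; the /u flag enables UTF-8 mode.
    return preg_replace('/\p{Mn}+/u', '', $decomposed) ?? $text;
}

if (extension_loaded('intl')) {
    echo stripDiacritics('Běijīng'), PHP_EOL; // Beijing
}
```

Letters that have no canonical decomposition (ø, đ, ß and so on) pass through unchanged, so this covers accents only, not every non-ASCII Latin letter.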

I am not a linguist, far from it, but I submit to you the possibility that what you are trying to do is impossible, or extremely complex to implement.

After all, transliterating names is more than just "converting alphabets." It is comparatively easy for Russian because nearly every Cyrillic character has a conventional Latin counterpart (the two alphabets are related; both descend from Greek).

I don't know about Arabic, but for Chinese you will need a romanization system like Pinyin to get anywhere. It's more complex than simply replacing characters.

Here's a full list of ISO romanizations. If I understand correctly, a solution that works for you would have to implement those rules.

So the task would be:

  • Analyze a text containing numerous different character ranges

  • Identify which character range (script) each word belongs to (อักษรไทย is Thai; Москва is Cyrillic; and so on)

  • Apply the correct method of romanization to every word.
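The first two steps can be sketched with PCRE's Unicode script properties, which PHP's preg functions support in UTF-8 mode. This is an illustration under simplifying assumptions, not a complete solution: detectScript and tagWords are hypothetical names, the script list is a small sample, and words are split on whitespace, which does not hold for running Chinese or Thai text.

```php
<?php
// Hypothetical helper: returns the first script from a fixed list that occurs
// anywhere in the word, or "Unknown" if none of them match.
function detectScript(string $word): string
{
    $scripts = ['Latin', 'Cyrillic', 'Greek', 'Arabic', 'Hebrew', 'Thai', 'Han'];
    foreach ($scripts as $script) {
        // \p{Cyrillic}, \p{Thai}, ... are PCRE script properties; /u enables UTF-8 mode.
        if (preg_match('/\p{' . $script . '}/u', $word) === 1) {
            return $script;
        }
    }
    return 'Unknown';
}

// Hypothetical helper: splits on whitespace and pairs each word with its script.
function tagWords(string $text): array
{
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $tagged = [];
    foreach ($words as $word) {
        $tagged[] = [$word, detectScript($word)];
    }
    return $tagged;
}

foreach (tagWords('อักษรไทย Москва Athens 北京') as [$word, $script]) {
    echo $word, ' => ', $script, PHP_EOL;
}
```

The third step is the hard part: each script name would have to be dispatched to its own romanization routine, and for scripts such as Han that means dictionary lookups rather than character replacement.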

Now I'm very interested to hear about any libraries that can do this in PHP, but it is quite possible that there are none.

Will iconv do?

With this module, you can turn a string represented by a local character set into the one represented by another character set, which may be the Unicode character set.

From PHP manual:

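<?php
$text = "This is the Euro symbol '€'.";

echo 'Original : ', $text, PHP_EOL;
// //TRANSLIT approximates unrepresentable characters; the exact result depends on the iconv implementation.
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;
// //IGNORE silently drops characters that cannot be represented in the target charset.
echo 'IGNORE   : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;
// With no suffix, iconv() raises a notice and returns false at the first illegal character.
echo 'Plain    : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;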

If that won't do, check out these

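As an alternative, define the character map in an array and use str_replace or strtr to do the conversion. (mb_substitute_character does not map characters; it only sets the substitute that the mbstring functions emit for characters that cannot be encoded.)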
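A minimal sketch of the array-map approach follows. The map is deliberately partial and the romanization choices (zh, shch, ya) are just one common convention picked for illustration; transliterateWithMap is a made-up name.

```php
<?php
// Partial, illustrative map. A real table would need the whole alphabet in
// both cases, and the Latin spellings depend on the romanization standard chosen.
$map = [
    'М' => 'M', 'о' => 'o', 'с' => 's', 'к' => 'k', 'в' => 'v', 'а' => 'a',
    'ж' => 'zh', 'щ' => 'shch', 'я' => 'ya',
];

// Hypothetical helper name.
function transliterateWithMap(string $text, array $map): string
{
    // With an array argument, strtr() tries the longest keys first and never
    // rescans text it has already replaced. Characters not in the map pass through.
    return strtr($text, $map);
}

echo transliterateWithMap('Москва', $map), PHP_EOL; // Moskva
```

strtr is used here rather than str_replace because str_replace applies its pairs one after another, so the output of an earlier replacement can be altered by a later one; strtr makes a single pass.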

Since PHP 5.4, the intl extension has provided a Transliterator class, which is a wrapper around ICU's transliteration engine. ICU ships romanization rules for many scripts:

http://www.php.net/manual/en/transliterator.transliterate.php
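A minimal sketch of using it, assuming the intl extension is loaded. romanize is a hypothetical wrapper name; the compound ID "Any-Latin; Latin-ASCII" chains two standard ICU transforms, and the exact output depends on the ICU version PHP was built against.

```php
<?php
// Sketch only: requires the intl extension. romanize() is a hypothetical wrapper.
function romanize(string $text): ?string
{
    if (!class_exists('Transliterator')) {
        return null; // intl not available
    }
    // "Any-Latin" romanizes whatever scripts ICU has rules for;
    // "Latin-ASCII" then folds accented Latin letters down to ASCII.
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        return null; // this ICU build does not know the requested ID
    }
    $result = $transliterator->transliterate($text);
    return $result === false ? null : $result;
}

foreach (['Москва', 'Αθήνα', '北京'] as $sample) {
    var_dump(romanize($sample));
}
```

Note that Any-Latin treats Han characters as Chinese and romanizes them as Pinyin, so Japanese text written in kanji will come out wrong; language detection is still up to you.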

I ended up writing a PHP library based on URLify.js from the Django project, since I found iconv() to be too incomplete. You can find it here:

https://github.com/jbroadway/urlify

It handles Latin characters as well as Greek, Turkish, Russian, Ukrainian, Czech, Polish, and Latvian.
