PHP Transliteration

给你一囗甜甜゛ submitted on 2019-11-26 11:52:14

You can use iconv, which has a special transliteration encoding.

When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.

-- http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

See here for a complete example that matches your use case.

If you are using iconv, make sure your locale is set correctly before you try the transliteration; otherwise some characters will not be transliterated correctly:

setlocale(LC_CTYPE, 'en_US.UTF8');
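A minimal sketch of the iconv approach wrapped in a helper. The name `to_ascii` and the `en_US.UTF-8` default are assumptions for illustration. Which replacements you get, and whether //TRANSLIT works at all, depends on the iconv implementation PHP was built against (glibc, GNU libiconv, musl), so the output is not shown.

```php
<?php
// Hypothetical helper: transliterate UTF-8 text to ASCII with iconv's //TRANSLIT.
// Assumes the iconv extension is loaded and the given locale is installed.
function to_ascii($text, $locale = 'en_US.UTF-8')
{
    // Transliteration tables are locale-dependent, so switch LC_CTYPE first
    // and remember the previous value ("0" only queries the current setting).
    $previous = setlocale(LC_CTYPE, 0);
    setlocale(LC_CTYPE, $locale);

    // //TRANSLIT approximates characters with no ASCII equivalent;
    // '@' suppresses the notice iconv emits if conversion fails outright.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    // Restore the caller's locale so the change does not leak.
    setlocale(LC_CTYPE, $previous);

    return $result; // false on failure
}

echo to_ascii('Zürich, façade'), PHP_EOL;
```

Restoring the locale afterwards is a design choice: `setlocale()` is process-wide, so leaving it changed could affect unrelated code.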

This converts as many foreign characters as possible (including Cyrillic, Chinese, Arabic, etc.) to their ASCII (A-Z, a-z) equivalents:

$AzString = transliterator_transliterate('Any-Latin;Latin-ASCII;', $foreignString);

You may need to install the PHP intl extension first.
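If you transliterate many strings, a sketch like the following creates the ICU transliterator once and reuses it instead of parsing the rule string on every call. It assumes the intl extension is loaded (the `Transliterator` class exists from PHP 5.4); the helper name `make_ascii_transliterator` is hypothetical, and the exact Latin spellings come from ICU's rules, so they may differ from what a native speaker would write.

```php
<?php
// Hypothetical helper: build one ICU transliterator and reuse it.
// Returns null when the intl extension is not available.
function make_ascii_transliterator()
{
    if (!class_exists('Transliterator')) {
        return null;
    }
    // Any-Latin converts other scripts to Latin; Latin-ASCII then strips
    // diacritics and maps the remaining Latin letters to plain ASCII.
    return Transliterator::create('Any-Latin; Latin-ASCII');
}

$tr = make_ascii_transliterator();
if ($tr !== null) {
    foreach (array('Привет', 'Ελληνικά', 'Ağrı', '北京') as $word) {
        echo $word, ' => ', $tr->transliterate($word), PHP_EOL;
    }
}
```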

Kemal Dağ

If you are stuck with a development and release environment that doesn't support PHP 5.4 or newer, you should use either iconv or a custom transliteration library.

As for iconv, I find it extremely unhelpful, especially for the Arabic or Cyrillic alphabets. I would go for the Transliterator class built into PHP 5.4 (intl extension) or a custom transliteration class.

One of the other solutions posted here mentions a custom library, which I have not tested.

When I was using Drupal, I loved its transliteration module, which I have recently ported so that it can be used without Drupal.

You can download it here and use it as follows:

<?php

include "JTransliteration.php";

$mombojombotext = "誓曰:『時日害喪?予及女偕亡。』民欲與之偕亡,雖有";
$nonmombojombotex = JTransliteration::transliterate($mombojombotext);

echo $nonmombojombotex;

?>

Note: I'm reposting this from another similar question in the hope that it's helpful to others.

I ended up writing a PHP library based on URLify.js from the Django project, since I found iconv() to be too incomplete. You can find it here:

https://github.com/jbroadway/urlify

It handles Latin characters as well as Greek, Turkish, Russian, Ukrainian, Czech, Polish, and Latvian.

<?php
/**
 * @author bulforce[]gmail.com # 2011
 * Simple class to attempt transliteration of Bulgarian Latin text into Bulgarian Cyrillic text
 */

// Usage:
// $text = "yagoda i yundola";
// $tl = new Transliterate();
// echo $tl->lat_to_cyr($text); // ягода и юндола

class Transliterate {

    // Single Latin letters and their Cyrillic equivalents ("v" and "w" both map to "в").
    private $cyr_identical = array("а", "б", "в", "в", "г", "д", "е", "ж", "з", "и", "к", "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ъ", "я");
    private $lat_identical = array("a", "b", "v", "w", "g", "d", "e", "j", "z", "i", "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f", "h", "c", "y", "q");
    // Multi-letter Latin sequences and their Cyrillic equivalents.
    private $cyr_fricative = array("ж", "ч", "ш", "щ", "ц", "я", "ю", "я", "ю");
    private $lat_fricative = array("zh", "ch", "sh", "sht", "ts", "ia", "iu", "ya", "yu");

    // Latin => Cyrillic lookup table, built once in the constructor.
    private $map = array();

    public function __construct() {
        $this->add_pairs($this->lat_identical, $this->cyr_identical);
        $this->add_pairs($this->lat_fricative, $this->cyr_fricative);
    }

    public function lat_to_cyr($str) {
        // strtr() with an array tries the longest keys first, so "sht" is matched
        // before "sh" and "s", and "ya" before "y", regardless of array order.
        return strtr($str, $this->map);
    }

    // Register each pair in lower case ("sht"), upper case ("SHT")
    // and capitalized ("Sht") form; this also covers the 3-letter sounds.
    private function add_pairs(array $lat, array $cyr) {
        foreach ($lat as $k => $l) {
            $c = $cyr[$k];
            $upper = mb_strtoupper($c, 'UTF-8');
            $this->map[$l] = $c;
            $this->map[strtoupper($l)] = $upper;
            $this->map[ucfirst($l)] = $upper;
        }
    }

}
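Why the order of multi-letter sequences matters: applying `str_replace()` pair by pair is order-sensitive, because once "sh" has been replaced, "sht" can never match. `strtr()` with an array avoids this by trying the longest key at each position. The table below is a hypothetical subset of the Bulgarian mapping above, used only to illustrate the difference.

```php
<?php
// Illustration only: a hypothetical subset of the Latin -> Bulgarian Cyrillic table.
// 'sh' is listed before 'sht', as in the class above.
$pairs = array(
    'sh' => 'ш', 'sht' => 'щ',
    'a' => 'а', 's' => 'с', 't' => 'т', 'i' => 'и', 'e' => 'е',
);

// Sequential str_replace(): pairs are applied one after another in array order,
// so 'sh' consumes the start of 'sht' before 'sht' is ever tried.
$sequential = 'shtastie';
foreach ($pairs as $lat => $cyr) {
    $sequential = str_replace($lat, $cyr, $sequential);
}

// strtr() with an array: at each position the longest matching key wins,
// and text that has already been replaced is not scanned again.
$longestFirst = strtr('shtastie', $pairs);

echo $sequential, PHP_EOL;   // штастие (wrong)
echo $longestFirst, PHP_EOL; // щастие ("happiness")
```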

For Composer users, there is Slugify:

https://github.com/cocur/slugify

<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Cocur\Slugify\Slugify;

$slugify = new Slugify();
echo $slugify->slugify('Hello World!'); // hello-world

// You can also change the separator used by Slugify:
echo $slugify->slugify('Hello World!', '_'); // hello_world

// The library also contains Cocur\Slugify\SlugifyInterface. Use this interface whenever you need to type hint an instance of Slugify.
// To add additional transliteration rules, use the addRule() method.
$slugify->addRule('i', 'ey');
echo $slugify->slugify('Hi'); // hey
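If you would rather avoid a dependency, a rough slug helper can be built from the intl extension and a regular expression. This is a hypothetical sketch, not how Slugify itself is implemented: `simple_slug` is an illustrative name, and without intl it falls back to `strtolower()`, which turns non-ASCII letters into separators instead of transliterating them.

```php
<?php
// Hypothetical dependency-free slug helper (not Slugify's implementation).
function simple_slug($text, $separator = '-')
{
    $ascii = false;
    if (function_exists('transliterator_transliterate')) {
        // One ICU pass: any script -> Latin -> ASCII -> lower case.
        $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII; Lower()', $text);
    }
    if ($ascii === false) {
        // Fallback without intl: ASCII lower-casing only; other bytes become separators below.
        $ascii = strtolower($text);
    }
    // Collapse every run of characters outside [a-z0-9] into a single separator.
    $slug = preg_replace('/[^a-z0-9]+/', $separator, $ascii);
    // Drop separators left at either end.
    return trim($slug, $separator);
}

echo simple_slug('Hello World!'), PHP_EOL;      // hello-world
echo simple_slug('Hello World!', '_'), PHP_EOL; // hello_world
```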

The problem with your request is that this is very hard to do. Most languages have glyphs with no a-z equivalent. Every glyph has a phonetic equivalent, but those are words, not letters. If you are only dealing with Latin-based languages, things are a little easier, but you still run into issues such as i-mutation.

Your best option would be to come up with a crude mapping from phonetic sounds to a-z equivalents. It won't be perfect, but without more information on your exact requirements it is hard to develop a better solution.
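A crude version of such a table can be sketched with `strtr()`, which applies the whole table in a single pass so replacement text is never re-processed. The table below is a tiny hypothetical sample (German umlauts and ß, a few Cyrillic letters), not a complete or authoritative mapping, and marking unmapped characters with `?` is likewise an arbitrary choice.

```php
<?php
// Hypothetical, deliberately tiny "sound -> a-z" table; extend it for real use.
$soundMap = array(
    'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
    'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
    'щ' => 'sht', 'ш' => 'sh', 'ж' => 'zh', 'ч' => 'ch',
);

function crude_ascii($text, array $map)
{
    // Replace every mapped character in one pass.
    $text = strtr($text, $map);
    // Anything still outside ASCII has no entry in the table: mark it with '?'.
    // The /u modifier makes the class match whole UTF-8 code points, not bytes.
    return preg_replace('/[^\x00-\x7F]/u', '?', $text);
}

echo crude_ascii('Grüße', $soundMap), PHP_EOL; // Gruesse
echo crude_ascii('Åland', $soundMap), PHP_EOL; // ?land
```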

Nice libraries can be found at:

1) https://github.com/ashtokalo/php-translit (many languages, though some are missing)

2) https://github.com/fre5h/transliteration (only for Russian and Ukrainian)
