Are there any solutions that will convert all foreign characters to A-z equivalents? I have searched extensively on Google and could not find a solution or even a list of characters and their equivalents. The reason is that I want to display URLs containing only A-z characters, and I keep running into plenty of other trip-ups when dealing with these characters.
You can use iconv, which has a special transliteration encoding.
When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.
-- http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html
See here for a complete example that matches your use case.
If you are using iconv, make sure your locale is set correctly before you try the transliteration; otherwise some characters will not be transliterated correctly:
setlocale(LC_CTYPE, 'en_US.UTF8');
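To put the two pieces together, here is a minimal sketch of an iconv-based conversion. The helper name iconv_to_ascii is made up for illustration, the locale name is an assumption about the host, and support for the //TRANSLIT and //IGNORE suffixes differs between iconv implementations, so treat the exact result as platform-dependent.

```php
<?php
// Hypothetical helper (not from the answers above): squeeze UTF-8 text into ASCII with iconv.
function iconv_to_ascii(string $text): string
{
    // glibc consults LC_CTYPE when transliterating. The locale name is an
    // assumption and must exist on the host (setlocale() returns false if not).
    setlocale(LC_CTYPE, 'en_US.UTF8');

    // //TRANSLIT approximates characters, //IGNORE drops what is left over.
    // Warnings are silenced because some implementations reject these suffixes.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);

    // iconv() returns false on failure; fall back to stripping non-ASCII bytes.
    if ($ascii === false) {
        $ascii = (string) preg_replace('/[^\x00-\x7F]/', '', $text);
    }
    return $ascii;
}

// The exact output depends on the iconv implementation and the locale.
echo iconv_to_ascii('Crème brûlée'), PHP_EOL;
```

The fallback only guarantees ASCII output; it does not transliterate, so characters the platform cannot approximate are simply lost.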
This will convert as many foreign characters as possible (including Cyrillic, Chinese, Arabic, etc.) to their A-z equivalents:
$AzString = transliterator_transliterate('Any-Latin;Latin-ASCII;', $foreignString);
You might need to install the PHP Intl extension first.
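For the URL use case in the question, the transliteration step can be wrapped in a small slug helper. This is only a sketch: ascii_slug is a hypothetical name, and the function skips transliteration when the Intl extension is not loaded, in which case non-ASCII characters just turn into hyphens.

```php
<?php
// Hypothetical helper: build an A-z URL slug, using the Intl extension when it is loaded.
function ascii_slug(string $text): string
{
    if (function_exists('transliterator_transliterate')) {
        // Same rule chain as above: any script to Latin, then Latin to ASCII.
        $latin = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
        if ($latin !== false) {
            $text = $latin;
        }
    }
    // Lowercase, then collapse every run of other characters into one hyphen.
    $slug = preg_replace('/[^a-z0-9]+/', '-', strtolower($text));
    return trim((string) $slug, '-');
}

echo ascii_slug('Hello World!'), PHP_EOL; // hello-world
```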
If you are stuck with a development or release environment that doesn't support PHP 5.4 or newer, you should use either iconv or a custom transliteration library.
As for iconv, I find it extremely unhelpful, especially on Arabic or Cyrillic alphabets. I would go for the Transliterator class built into PHP 5.4 and newer (via the Intl extension) or a custom transliteration class.
One of the solutions posted mentioned a custom library which I did not test.
When I was using Drupal, I loved their transliteration module, so I recently ported it to make it usable without Drupal.
You can download it here and use it as follows:
<?php
include "JTransliteration.php";
$mombojombotext = "誓曰:『時日害喪?予及女偕亡。』民欲與之偕亡,雖有";
$nonmombojombotex = JTransliteration::transliterate($mombojombotext);
echo $nonmombojombotex;
?>
Note: I'm reposting this from another similar question in the hope that it's helpful to others.
I ended up writing a PHP library based on URLify.js from the Django project, since I found iconv() to be too incomplete. You can find it here:
https://github.com/jbroadway/urlify
Handles Latin characters as well as Greek, Turkish, Russian, Ukrainian, Czech, Polish, and Latvian.
<?php
/**
 * @author bulforce[]gmail.com # 2011
 * Simple class that attempts to transliterate Bulgarian text written in Latin letters into Bulgarian Cyrillic
 */
// Usage:
// $text = "yagoda i yundola";
// $tl = new Transliterate();
// echo $tl->lat_to_cyr($text); //ягода и юндола
class Transliterate {
    // One-to-one letter mappings (index i in one array pairs with index i in the other)
    private $cyr_identical = array("а", "б", "в", "в", "г", "д", "е", "ж", "з", "и", "к", "л", "м", "н", "о", "п", "р", "с", "т", "у", "ф", "х", "ц", "ъ", "я");
    private $lat_identical = array("a", "b", "v", "w", "g", "d", "e", "j", "z", "i", "k", "l", "m", "n", "o", "p", "r", "s", "t", "u", "f", "h", "c", "y", "q");
    // Multi-letter sequences; "sht" is listed before "sh" so the longer sequence is replaced first
    private $cyr_fricative = array("ж", "ч", "щ", "ш", "ц", "я", "ю", "я", "ю");
    private $lat_fricative = array("zh", "ch", "sht", "sh", "ts", "ia", "iu", "ya", "yu");

    public function __construct() {
        $this->identical_to_upper();
        $this->fricative_to_variants();
    }

    public function lat_to_cyr($str) {
        // Replace multi-letter sequences first, then the remaining single letters
        for ($i = 0; $i < count($this->cyr_fricative); $i++) {
            $c_cyr = $this->cyr_fricative[$i];
            $c_lat = $this->lat_fricative[$i];
            $str = str_replace($c_lat, $c_cyr, $str);
        }
        for ($i = 0; $i < count($this->cyr_identical); $i++) {
            $c_cyr = $this->cyr_identical[$i];
            $c_lat = $this->lat_identical[$i];
            $str = str_replace($c_lat, $c_cyr, $str);
        }
        return $str;
    }

    // Append upper-case versions of the single-letter mappings
    private function identical_to_upper() {
        foreach ($this->cyr_identical as $k => $v) {
            $this->cyr_identical[] = mb_strtoupper($v, 'UTF-8');
        }
        foreach ($this->lat_identical as $k => $v) {
            $this->lat_identical[] = mb_strtoupper($v, 'UTF-8');
        }
    }

    private function fricative_to_variants() {
        foreach ($this->lat_fricative as $k => $v) {
            // This handles all chars to Upper
            $this->lat_fricative[] = mb_strtoupper($v, 'UTF-8');
            $this->cyr_fricative[] = mb_strtoupper($this->cyr_fricative[$k], 'UTF-8');
            // This handles variants with a single upper-case letter (e.g. "Zh", "zH").
            // strlen() replaces the original count($v), which does not work on strings
            // and missed the third letter of "sht".
            for ($i = 0; $i < strlen($v); $i++) {
                $v[$i] = mb_strtoupper($v[$i], 'UTF-8');
                $this->lat_fricative[] = $v;
                if ($i == 0) {
                    $this->cyr_fricative[] = mb_strtoupper($this->cyr_fricative[$k], 'UTF-8');
                } else {
                    $this->cyr_fricative[] = $this->cyr_fricative[$k];
                }
                $v[$i] = mb_strtolower($v[$i], 'UTF-8');
            }
        }
    }
}
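Chained str_replace() calls are order-sensitive: a shorter sequence such as "sh" can consume part of a longer one such as "sht" if it happens to run first. As a possible alternative, strtr() with an array always tries the longest key at each position and never rescans text it has already replaced. The sketch below uses a hypothetical standalone function and an abbreviated map (lower-case only) taken from the arrays in the class above; it is not part of that class.

```php
<?php
// Hypothetical standalone function, not part of the Transliterate class.
function lat_to_cyr_strtr(string $text): string
{
    // Abbreviated, lower-case-only map for illustration.
    $map = [
        'sht' => 'щ', 'zh' => 'ж', 'ch' => 'ч', 'sh' => 'ш', 'ts' => 'ц',
        'ya' => 'я', 'yu' => 'ю',
        'a' => 'а', 'd' => 'д', 'e' => 'е', 'g' => 'г', 'i' => 'и',
        'l' => 'л', 'n' => 'н', 'o' => 'о', 's' => 'с', 't' => 'т',
    ];
    // strtr() picks the longest matching key first, so "sht" wins over "sh",
    // and replaced text is never scanned again.
    return strtr($text, $map);
}

echo lat_to_cyr_strtr('yagoda i yundola'), PHP_EOL; // ягода и юндола
echo lat_to_cyr_strtr('shtastie'), PHP_EOL;         // щастие
```

With this approach the order of the map entries no longer matters, which makes the table easier to extend.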
For Composer users, there is Slugify:
https://github.com/cocur/slugify
use Cocur\Slugify\Slugify;
$slugify = new Slugify();
echo $slugify->slugify('Hello World!'); // hello-world
// You can also change the separator used by Slugify:
echo $slugify->slugify('Hello World!', '_'); // hello_world
// The library also contains Cocur\Slugify\SlugifyInterface. Use this interface whenever you need to type hint an instance of Slugify.
// To add additional transliteration rules you can use the addRule() method.
$slugify->addRule('i', 'ey');
echo $slugify->slugify('Hi'); // hey
The problem with your query is that it is a very hard thing to do. Not all glyphs in most languages have a-z equivalents. All glyphs have phonetic equivalents, but these are words, not letters. If you are dealing only with Latin-based languages, things are a little easier, but you still have issues with things like I-mutation.
Your best solution would be to come up with a crude list of phonetic sounds -> a-z equivalents. It won't be perfect, but without more information on your exact requirements it is hard to develop a solution.
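Such a crude list could start out as a plain lookup table. The sketch below is an illustration only: the function name crude_to_az is made up, the table is deliberately tiny, and choices such as the German-style "ä" -> "ae" are assumptions that will not suit every language.

```php
<?php
// Hypothetical, deliberately incomplete table; extend it for the languages you need.
function crude_to_az(string $text): string
{
    $table = [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
        'é' => 'e', 'è' => 'e', 'ñ' => 'n', 'ø' => 'o', 'å' => 'a',
        'ж' => 'zh', 'ш' => 'sh', 'щ' => 'sht', 'я' => 'ya',
    ];
    $text = strtr($text, $table);
    // Whatever the table does not cover becomes "?" so the gaps stay visible.
    return (string) preg_replace('/[^\x00-\x7F]/u', '?', $text);
}

echo crude_to_az('Straße'), PHP_EOL; // Strasse
echo crude_to_az('Müller'), PHP_EOL; // Mueller
```

Marking unknown characters with "?" instead of dropping them makes it easy to spot which entries the table still lacks.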
Some nice libraries can be found at:
1) https://github.com/ashtokalo/php-translit (supports many languages, but some are missing)
2) https://github.com/fre5h/transliteration (Russian and Ukrainian only)
Source: https://stackoverflow.com/questions/1284535/php-transliteration