transliteration

Is there any free opensource PHP translit lib? [closed]

被刻印的时光 ゝ submitted on 2019-12-04 06:58:48
I have lots of users posting articles with names in different languages. I need a library to transliterate those article names into English letters, for example to turn Russian 'р' into English 'r', and so on for all European languages, Russian, and Asian languages. Where can I get such a library?

45 seconds of Googling gave me this: "This extension allows you to transliterate text in non-latin characters (such as Chinese, Cyrillic, Greek etc) to latin characters." It seems to be what I really needed. Has anyone tried this in real life?

Google has an AJAX transliteration API which does a good job on many major scripts

Removing accent marks (diacritics) from Latin characters for comparison [duplicate]

余生长醉 submitted on 2019-12-03 14:22:54
This question already has an answer here: Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars (12 answers)

I need to compare the names of European places that are written in the Latin alphabet with accent marks (diacritics) on some characters. Many Central and Eastern European names are written with accented characters such as ž and ü, but some people write them using plain Latin characters without accent marks, such as z and u. I need a way to have my system recognize, for example, mšk žilina as being the same as msk zilina , and
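One possible way to do this kind of comparison, sketched in Python with only the standard library: decompose each name with Unicode normalization, drop the combining marks, and compare the case-folded results. The helper names `comparison_key` and `same_place` are made up for illustration, and the approach only covers letters that decompose into a base letter plus a mark (letters such as ł or ø do not, and would need an explicit mapping).

```python
import unicodedata


def comparison_key(name):
    """Return a case-folded copy of name with combining marks removed."""
    # NFD splits e.g. "ž" into "z" followed by a combining caron.
    decomposed = unicodedata.normalize("NFD", name)
    # unicodedata.combining() is non-zero for combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def same_place(a, b):
    """Compare two place names while ignoring accents and letter case."""
    return comparison_key(a) == comparison_key(b)


print(same_place("mšk žilina", "msk zilina"))  # True
```

Storing the key next to the original name keeps the accented spelling available for display while lookups use the stripped form.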

use string.translate in Python to transliterate Cyrillic?

我只是一个虾纸丫 submitted on 2019-12-03 02:55:00
I'm getting this exception when trying to use string.maketrans in Python:

    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-51: ordinal not in range(128)

I'm rather discouraged by this kind of error in the following code (gist):

    # -*- coding: utf-8 -*-
    import string

    def translit1(string):
        """ This function works just fine """
        capital_letters = {
            u'А': u'A', u'Б': u'B', u'В': u'V', u'Г': u'G', u'Д': u'D',
            u'Е': u'E', u'Ё': u'E', u'Ж': u'Zh', u'З': u'Z', u'И': u'I',
            u'Й': u'Y', u'К': u'K', u'Л': u'L', u'М': u'M', u'Н': u'N',
            u'О': u'O', u'П': u'P', u'Р': u'R', u'С': u'S', u'Т': u'T',
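In Python 2, string.maketrans expects two byte strings of equal length, so passing it Unicode text triggers an implicit ASCII encode, which is a likely source of the error above. A minimal sketch of an alternative, assuming Python 3: str.maketrans accepts a dict whose keys are single characters and whose values may be strings of any length, so one-to-many mappings such as 'Ж' to 'Zh' work directly. The table below is a partial, illustrative one, not the asker's full mapping.

```python
# Partial, illustrative table (an assumption); extend it to the full alphabet.
CYRILLIC_TO_LATIN = {
    "А": "A", "Б": "B", "В": "V", "Г": "G", "Д": "D", "Е": "E",
    "Ё": "E", "Ж": "Zh", "З": "Z", "И": "I", "Й": "Y", "К": "K",
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "e", "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k",
    "о": "o", "у": "u",
}

# Build the translation table once; str.translate reuses it on every call.
TRANSLIT_TABLE = str.maketrans(CYRILLIC_TO_LATIN)


def translit(text):
    """Replace every mapped character; unmapped characters pass through."""
    return text.translate(TRANSLIT_TABLE)


print(translit("водка"))  # vodka
print(translit("Жук"))    # Zhuk
```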

Transliterate/transpose the characters in the NSString

梦想的初衷 submitted on 2019-12-02 21:04:44
I want to transliterate a Cyrillic string to its closest Latin equivalent, e.g. "матрешка" => "matreshka", "водка" => "vodka". Ideally I want a ready-to-use method on NSString or somewhere else that already knows everything about the alphabets and can do the conversion. If such functionality doesn't exist in the iOS APIs, I will be totally happy with something like Ruby's tr method, which just replaces the characters in a string using a simple mapping specified as a parameter:

    "баба".tr('абвгд', 'abvgd')

ksh
Either try the CFStringTransform function of CFMutableString with

Cyrillic transliteration in R

拜拜、爱过 submitted on 2019-12-02 04:02:25
Are there packages for Cyrillic text transliteration to Latin in R? I need to convert data frames to Latin to use factors. It is somewhat messy to use Cyrillic factors in R.

I have found the package at last.

    > library(stringi)
    > stri_trans_general("женщина", "cyrillic-latin")
    [1] "ženŝina"
    > stri_trans_general("женщина", "russian-latin/bgn")
    [1] "zhenshchina"

After that, the only issue remaining is the "ё" letter.

    > stri_trans_general("Ёж", "russian-latin/bgn")
    [1] "Yëzh"

I had to remove all the "ё" letters

    > iconv(stri_trans_general("ёж", "russian-latin/bgn"),from="UTF8",to="ASCII",sub="")
    [1

Does a .NET transliteration library exist? [closed]

落花浮王杯 submitted on 2019-12-01 16:10:36
Does a .NET transliteration library exist? Note that this is not translation, but something like this Perl lib: http://www.lingua-systems.com/transliteration/Lingua-Translit-Perl-module/ I only found: http://transliterator.codeplex.com/

Dima Stefantsov
Check my UnidecodeSharpFork. It's based on the great Python Unidecode transliteration tables and supports many languages. Example usage:

    Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
    Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());
    Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());

Simple, fast and powerful. And

Convert accented characters into ascii character

做~自己de王妃 submitted on 2019-11-28 23:14:24
What is the optimal way to remove German (or French) accents from a vector of 16 million string variables, e.g. to turn 'Sjögren's syndrome' into 'Sjogren's syndrome'? Conversion of a single character into a single character is better than transliteration such as ä => ae, ö => oe, ü => ue. Using regular expressions would be one option, but is there something better (an R package for this)?

    gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))

There are SO solutions for non-R platforms but not a good one for R.

Use iconv to convert to ASCII with transliteration (if supported):

    iconv(c("über",
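The difference between the two conversions mentioned in the question can be sketched in Python (not R) with two translation tables: a one-to-one table that maps each umlaut to its base letter, and a digraph table that expands it German-style. The function names are hypothetical, only German umlauts are covered, and the tables assume precomposed (NFC) input characters.

```python
# One-to-one table: each umlaut maps to a single base letter.
STRIP_TABLE = str.maketrans("äöüÄÖÜ", "aouAOU")

# Digraph table: the conventional German expansion (one character to two).
DIGRAPH_TABLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def strip_umlauts(text):
    """Single character to single character, as the question prefers."""
    return text.translate(STRIP_TABLE)


def expand_umlauts(text):
    """Transliteration into digraphs, which the question wants to avoid."""
    return text.translate(DIGRAPH_TABLE)


print(strip_umlauts("Sjögren's syndrome"))   # Sjogren's syndrome
print(expand_umlauts("Sjögren's syndrome"))  # Sjoegren's syndrome
```

A single table lookup per character also avoids the nested gsub calls, where every extra letter adds another pass over the whole vector.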

Transliterate any convertible utf8 char into ascii equivalent

痞子三分冷 submitted on 2019-11-28 20:30:37
Is there any good solution out there that does this transliteration well? I've tried using iconv(), but it is very annoying and does not behave as one might expect.

Using //TRANSLIT will try to replace what it can, leaving everything nonconvertible as "?"
Using //IGNORE will not leave "?" in the text, but will also not transliterate, and will also raise E_NOTICE when a nonconvertible char is found, so you have to use iconv with the @ error suppressor
Using //IGNORE//TRANSLIT (as some people suggested in PHP forum) is actually the same as //IGNORE (tried it myself on php versions 5.3.2 and 5.3.13
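For comparison, the two outcomes described above (leaving "?" versus silently dropping characters) can be reproduced in Python, not PHP, by choosing the error handler passed to str.encode. This is only a sketch: `to_ascii` is a hypothetical helper, and it transliterates only characters that Unicode can decompose into an ASCII base letter plus combining marks. Cyrillic, CJK, and similar scripts need a real mapping table.

```python
import unicodedata


def to_ascii(text, unmappable="ignore"):
    """Strip accents, then encode to ASCII.

    unmappable is the str.encode error handler: "ignore" drops characters
    that still have no ASCII form, "replace" turns each of them into "?".
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # Remove combining marks first so "replace" does not turn them into "?".
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.encode("ascii", errors=unmappable).decode("ascii")


print(to_ascii("Žluťoučký kůň"))                 # Zlutoucky kun
print(to_ascii("über ж", unmappable="replace"))  # uber ?
```

Unlike the PHP behaviour described above, neither handler emits a warning, so nothing needs to be suppressed.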