transliteration

Slugify and Character Transliteration in C#

落爺英雄遲暮 posted on 2019-11-28 06:33:36
I'm trying to translate the following slugify method from PHP to C#: http://snipplr.com/view/22741/slugify-a-string-in-php/

Edit: For the sake of convenience, here is the code from the link above:

/**
 * Modifies a string to remove all non-ASCII characters and spaces.
 */
static public function slugify($text)
{
    // replace non letter or digits by -
    $text = preg_replace('~[^\\pL\d]+~u', '-', $text);

    // trim
    $text = trim($text, '-');

    // transliterate
    if (function_exists('iconv')) {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }

    // lowercase
    $text = strtolower($text);

    // remove unwanted characters
    $text
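The visible steps of the PHP method (replace non-alphanumerics, trim, transliterate, lowercase) can be sketched in Python, used here only as a neutral illustration language rather than as the C# port the question asks for. The function name `slugify` is taken from the question. The regular expression and the NFKD-based transliteration are assumptions that only approximate `[^\pL\d]` and `iconv //TRANSLIT`, and the final "remove unwanted characters" step is not reproduced because the source is cut off there.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    # Replace each run of characters that are not letters or digits with "-".
    # [\W_] approximates PHP's [^\pL\d]; it is not an exact equivalent.
    text = re.sub(r"[\W_]+", "-", text)
    # Trim leading and trailing hyphens.
    text = text.strip("-")
    # Approximate transliteration: decompose accented letters, then drop
    # everything outside ASCII. Unlike iconv //TRANSLIT, characters with no
    # decomposition (for example Cyrillic letters) are silently removed.
    ascii_bytes = unicodedata.normalize("NFKD", text).encode("ascii", "ignore")
    # Lowercase the ASCII result.
    return ascii_bytes.decode("ascii").lower()


if __name__ == "__main__":
    print(slugify("H\u00e9llo, W\u00f6rld!"))
```

Because non-decomposable characters are dropped rather than mapped, this sketch is only suitable for Latin-script input with accents.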

Transliteration from Cyrillic to Latin ICU4j java [duplicate]

风格不统一 posted on 2019-11-28 00:54:24
This question already has an answer here: icu4j cyrillic to latin (3 answers)

I need to do something rather simple, but without hard-coding a hash map. I have a String s in Cyrillic, and I need an example of how to turn it into Latin characters using some sort of custom filter. (To give a purely Latin example so as not to confuse anyone: if String s = sniff; I want it to look up s-n-i-f-f and change the letters into something else; there might also be combinations.) I can see that ICU4j can do this sort of thing, but I have no idea how to achieve it, as I can't find any working examples (or I'm

Convert accented characters into ascii character

爷,独闯天下 posted on 2019-11-27 21:16:47
Question: What is the optimal way to remove German (or French) accents from a vector of 16 million string variables? E.g., 'Sjögren's syndrome' into 'Sjogren's syndrome'. Conversion of a single character into a single character is better than transliteration such as ä => ae, ö => oe, ü => ue. Using regular expressions would be one option, but is there something better (an R package for this)?

gsub('ü','u',gsub('ö','o',"Sjögren's syndrome ( über) "))

There are SO solutions for non-R platforms but not a
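The single-character conversion described above (ä => a rather than ä => ae) can be illustrated with Unicode decomposition. This is a minimal sketch in Python, not the R solution the question asks for: it decomposes each character with NFD and drops the combining marks. The helper name `strip_accents` is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    # NFD splits an accented letter into a base letter plus combining marks,
    # e.g. "ö" becomes "o" followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("Sj\u00f6gren's syndrome ( \u00fcber) "))
```

Letters that have no canonical decomposition, such as "ß" or "ø", pass through unchanged, so a small explicit mapping would still be needed for those.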

Cyrillic transliteration in PHP

☆樱花仙子☆ posted on 2019-11-27 19:43:18
How to transliterate Cyrillic characters into Latin letters? E.g. Главная страница -> Glavnaja stranica

This Transliteration PHP Extension would do this very well, but I can't install it on my server. It would be best to have the same implementation but in PHP.

Tural Ali

Try the following code:

$textcyr="Тествам с кирилица";
$textlat="I pone dotuk raboti!";
$cyr = [
    'а','б','в','г','д','е','ё','ж','з','и','й','к','л','м','н','о','п',
    'р','с','т','у','ф','х','ц','ч','ш','щ','ъ','ы','ь','э','ю','я',
    'А','Б','В','Г','Д','Е','Ё','Ж','З','И','Й','К','Л','М','Н','О','П',
    'Р','С','Т','У','Ф','Х','Ц','Ч',
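The table-driven approach in the answer can be sketched as follows, in Python rather than PHP. The mapping below is a deliberately tiny, hypothetical subset, just enough to cover the example from the question. It is not the answer's full table, which is cut off in the source, and the Latin values follow the "Glavnaja stranica" spelling given in the question.

```python
# Hypothetical partial table: only the letters needed for the example below.
CYR_TO_LAT = {
    "а": "a", "в": "v", "г": "g", "и": "i", "л": "l", "н": "n",
    "р": "r", "с": "s", "т": "t", "ц": "c", "я": "ja",
    "Г": "G",
}

# str.maketrans accepts a dict whose keys are single characters and whose
# values may be strings of any length, so "я" can map to "ja".
TABLE = str.maketrans(CYR_TO_LAT)


def transliterate(text: str) -> str:
    # Characters missing from the table (spaces, Latin letters, unmapped
    # Cyrillic letters) are left unchanged.
    return text.translate(TABLE)


if __name__ == "__main__":
    print(transliterate("Главная страница"))  # prints "Glavnaja stranica"
```

A complete table would need an entry for every upper- and lower-case letter, and the choice of Latin spellings depends on which romanization scheme is wanted.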

How to transliterate Cyrillic to Latin text

僤鯓⒐⒋嵵緔 posted on 2019-11-27 19:26:28
I have a method which turns any Latin text (e.g. English, French, German, Polish) into its slug form, e.g. Alpha Bravo Charlie => alpha-bravo-charlie. But it doesn't work for Cyrillic text (e.g. Russian), so what I want to do is transliterate the Cyrillic text to Latin characters, then slugify that. Does anyone have a way to do such transliteration, whether by actual source or a library? I'm coding in C#, so a .NET library will work. Alternatively, if you have non-C# code, I'm sure I could convert it.

You can use the open-source .NET DLL library UnidecodeSharpFork to transliterate Cyrillic and

Transliteration in ruby

独自空忆成欢 posted on 2019-11-27 15:07:46
What is the simplest way to transliterate non-English characters in Ruby? That is, a conversion such as:

translit "Gévry" #=> "Gevry"

Ruby has an Iconv library in its stdlib which converts encodings in a very similar way to the usual iconv command.

Use the UnicodeUtils gem. This works in 1.9 and 2.0. Iconv has been deprecated in these releases.

gem install unicode_utils

Then try this in IRB:

2.0.0p0 :001 > require 'unicode_utils' #=> true
2.0.0p0 :002 > r = "Résumé" #=> "Résumé"
2.0.0p0 :003 > r.encoding #=> #<Encoding:UTF-8>
2.0.0p0 :004 > UnicodeUtils.nfkd(r).gsub(/(\p{Letter})\p{Mark}+/,'

How to convert (transliterate) a string from utf8 to ASCII (single byte) in c#?

亡梦爱人 posted on 2019-11-27 12:30:42
I have a string object "with multiple characters and even special characters". I am trying to use

UTF8Encoding utf8 = new UTF8Encoding();
ASCIIEncoding ascii = new ASCIIEncoding();

objects in order to convert that string to ASCII. Could someone shed some light on this simple task, which is haunting my afternoon?

EDIT 1: What we are trying to accomplish is getting rid of special characters, like some of the special Windows apostrophes. The code that I posted below as an answer will not take care of that. Basically O'Brian will become O?Brian, where ' is one of the special apostrophes.

This

How do you map-replace characters in Javascript similar to the 'tr' function in Perl?

混江龙づ霸主 posted on 2019-11-27 11:40:43
I've been trying to figure out how to map a set of characters in a string to another set, similar to the tr function in Perl. I found this site that shows equivalent functions in JS and Perl, but sadly no tr equivalent. The tr (transliteration) function in Perl maps characters one to one, so

data =~ tr|\-_|+/|;

would map - => + and _ => /. How can this be done efficiently in JavaScript?

There isn't a built-in equivalent, but you can get close to one with replace:

data = data.replace(/[\-_]/g, function (m) {
    return { '-': '+', '_': '/' }[m];
});

neniu

I can't vouch for 'efficient' but this uses
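For comparison, the same one-to-one mapping can be written with a precomputed translation table. This sketch is in Python, not JavaScript, and is only meant to illustrate the lookup-table idea behind Perl's tr. The function name `tr` is a hypothetical helper, not a built-in.

```python
# Build the table once: "-" maps to "+", and "_" maps to "/".
# The two strings passed to str.maketrans must have the same length.
TABLE = str.maketrans("-_", "+/")


def tr(data: str) -> str:
    # Each character is looked up in the table; characters without an
    # entry are copied through unchanged.
    return data.translate(TABLE)


if __name__ == "__main__":
    print(tr("a-b_c"))  # prints "a+b/c"
```

The table is built once and reused, so each call is a single pass over the string.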

Python and character normalization

拈花ヽ惹草 posted on 2019-11-27 07:53:02
Hello, I retrieve text-based UTF-8 data from a foreign source which contains special characters such as u"ıöüç", and I want to normalize them to English, such as "ıöüç" -> "iouc". What would be the best way to achieve this?

I recommend using the Unidecode module:

>>> from unidecode import unidecode
>>> unidecode(u'ıöüç')
'iouc'

Note how you feed it a unicode string and it outputs a byte string. The output is guaranteed to be ASCII.

It all depends on how far you want to go in transliterating the result. If you want to convert everything all the way to ASCII ( αβγ to abg ) then unidecode is the way to

icu4j cyrillic to latin

六月ゝ 毕业季﹏ posted on 2019-11-27 04:28:34
Question: I'm trying to get Cyrillic words into Latin so I can use them in URLs. I use the icu4j transliterator, but it still gives weird characters like this: Vilʹândimaa. It should be more like viljandimaa. When I copy that URL, these letters turn into useless %.. escape sequences. Does anybody know how to get Cyrillic to a-z with icu4j?

UPDATE: I can't answer my own question yet, but I found this question that was very helpful: Converting Symbols, Accent Letters to English Alphabet

Answer 1: Modify your identifier to do