Slugify and Character Transliteration in C#

落爺英雄遲暮 submitted on 2019-11-28 06:33:36

I would also like to add that //TRANSLIT removes the apostrophes, and that @jxac's solution doesn't address that. I'm not sure why, but if you first encode the string to Cyrillic and then decode those bytes as ASCII, you get behavior similar to //TRANSLIT.

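var str = "éåäöíØ";
// Encode to the "Cyrillic" code page, then decode those bytes as ASCII (needs using System.Text).
var noApostrophes = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(str));
// => "eaaoiO"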
ikutsin

There is a .NET library for transliteration on CodePlex: unidecode. It generally does the trick, using the Unidecode tables ported from Python.

user262976

Conversion to an ASCII string:

// Re-encode the UTF-16 bytes as ASCII; characters outside ASCII become '?'.
byte[] unicodeBytes = Encoding.Unicode.GetBytes(str);
byte[] asciiBytes = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, unicodeBytes);
string asciiString = Encoding.ASCII.GetString(asciiBytes);

Conversion to ASCII bytes:

byte[] ascii = Encoding.ASCII.GetBytes(str);
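
To see why a plain ASCII conversion is not a transliteration, here is a minimal sketch of what happens to an accented character. The class and method names (AsciiFallbackDemo, ToAsciiLossy) are made up for illustration, and it assumes the default replacement fallback of Encoding.ASCII is left unchanged:

```csharp
using System.Text;

static class AsciiFallbackDemo
{
    // Round-trips a string through ASCII. With the default encoder
    // fallback, every character outside the ASCII range is replaced
    // by '?', so the accent is lost rather than mapped to a base letter.
    public static string ToAsciiLossy(string input)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(input);
        return Encoding.ASCII.GetString(ascii);
    }
}
```

With this, ToAsciiLossy("caf\u00E9") gives "caf?" instead of "cafe", which is why the diacritics have to be dealt with before the ASCII step.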

@Thomas Levesque is right, it will get encoded by the output stream...

To remove the diacritics (accent marks), you can use the String.Normalize function, as detailed here:

http://www.siao2.com/2007/05/14/2629747.aspx
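
A minimal sketch of the String.Normalize idea, not necessarily the exact code from the linked post: decompose the string so each accent becomes a separate combining mark, then drop those marks. The names DiacriticsDemo and RemoveDiacritics are made up for illustration:

```csharp
using System.Globalization;
using System.Text;

static class DiacriticsDemo
{
    public static string RemoveDiacritics(string input)
    {
        // FormD (canonical decomposition) splits "é" into "e" + U+0301.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep everything except the combining accent marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }

        // Recompose whatever is left into the usual precomposed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Letters that are not a base character plus an accent are left untouched: "éåäöí" comes out as "eaaoi", but "Ø" stays "Ø" because it has no canonical decomposition.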

Normalization should take care of most cases (where the glyph is really a character plus an accent mark). For even more aggressive character matching (to handle cases like the Scandinavian slashed o [Ø], digraphs, and other exotic glyphs), there's the table approach:

http://www.codeproject.com/KB/cs/UnicodeNormalization.aspx

This includes around 1,000 symbol mappings in addition to the normalization.
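
A rough sketch of how normalization and a lookup table can be combined. The table here is a tiny hypothetical stand-in with a handful of entries, not the roughly 1,000 mappings from the linked article, and the names TransliterationDemo and ToAscii are made up. Dropping anything that is still non-ASCII at the end is also just an assumption of this sketch:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

static class TransliterationDemo
{
    // Hypothetical mini table for letters that normalization cannot decompose.
    static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        ['\u00D8'] = "O",  ['\u00F8'] = "o",   // Ø ø
        ['\u00C6'] = "AE", ['\u00E6'] = "ae",  // Æ æ
        ['\u00DF'] = "ss",                     // ß
        ['\u0141'] = "L",  ['\u0142'] = "l",   // Ł ł
    };

    public static string ToAscii(string input)
    {
        var sb = new StringBuilder(input.Length);

        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            // Step 1: drop the combining marks produced by decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Step 2: use the table for letters with no decomposition.
            if (Map.TryGetValue(c, out string mapped))
                sb.Append(mapped);
            else if (c < 128)
                sb.Append(c);   // already ASCII, keep as-is
            // Step 3: anything else is silently dropped in this sketch.
        }

        return sb.ToString();
    }
}
```

With this table, ToAscii("éåäöíØ") gives "eaaoiO" and ToAscii("Straße") gives "Strasse". A real table would need far more entries, and you may prefer a placeholder over dropping unmapped characters.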

(Note: all punctuation is removed by the regex replace in your example.)
