How can I change extended Latin characters to their unaccented ASCII equivalents?

萌比男神i 2021-01-18 09:50

I need a generic transliteration or substitution regex that will map extended Latin characters to similar-looking ASCII characters, and all other extended characters to ''.

5 answers
  •  抹茶落季
    2021-01-18 10:35

    Text::Unaccent, or alternatively Text::Unaccent::PurePerl, sounds like what you're asking for, at least for the first half of it.

    $unaccented = unac_string($charset, $string);
    

    Removing all non-ASCII characters is relatively simple:

    s/[^\000-\177]+//g;
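
If installing Text::Unaccent is not an option, the two steps can be approximated with the core Unicode::Normalize module. This is only a sketch under a few assumptions: it swaps Text::Unaccent for NFKD decomposition, the helper name `to_ascii` is made up for illustration, and the input is assumed to be an already decoded character string. Letters that have no Unicode decomposition (for example ø, ß, or æ) are not mapped to a look-alike and are simply removed by the final substitution.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: decompose, drop accents, then drop everything non-ASCII.
sub to_ascii {
    my ($text) = @_;

    # Split accented letters into a base letter plus combining marks.
    my $decomposed = NFKD($text);

    # Remove the combining marks (the accents themselves).
    $decomposed =~ s/\p{Mn}+//g;

    # Remove whatever is still outside the ASCII range.
    $decomposed =~ s/[^\x00-\x7F]+//g;

    return $decomposed;
}

# "Crème brûlée", written with escapes so the example does not depend on file encoding.
print to_ascii("Cr\x{E8}me br\x{FB}l\x{E9}e"), "\n";    # prints "Creme brulee"
```

Running the accent-stripping step before the ASCII filter matters: applied to the raw string, the filter alone would delete the accented letters entirely instead of keeping their base letters.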
    
