use string.translate in Python to transliterate Cyrillic?

星月不相逢 2020-12-29 07:44

I'm getting UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-51: ordinal not in range(128) exception trying to use string.maket
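
The message can be reproduced in isolation, which may make it easier to read. This is a minimal sketch, assuming the cause is an implicit ASCII encoding step (in Python 2, string.maketrans works on byte strings, so Unicode arguments are encoded to ASCII first). The sample text and variable names are illustrative only, not the asker's data.

```python
# Python 3 sketch: explicitly encoding Cyrillic text as ASCII
# raises the same kind of error as in the question.
text = "абв"  # illustrative sample

try:
    text.encode("ascii")
    message = ""
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
```

The position range in the message covers the run of characters that ASCII cannot represent, here all three.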

3 answers
  • 2020-12-29 08:12

    translate behaves differently when used with Unicode strings. Instead of a maketrans table, you have to provide a dictionary that maps code points, ord(search) -> ord(replace):

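    # Parallel strings: each Cyrillic character maps to the Latin
    # character at the same index.
    symbols = (u"абвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ",
               u"abvgdeejzijklmnoprstufhzcss_y_euaABVGDEEJZIJKLMNOPRSTUFHZCSS_Y_EUA")
    
    # Build the {code point: code point} table that translate() expects.
    tr = {ord(a): ord(b) for a, b in zip(*symbols)}
    
    # Python 2.6 and earlier have no dict comprehensions:
    # tr = dict([(ord(a), ord(b)) for (a, b) in zip(*symbols)])
    
    text = u'Добрый Ден'
    print(text.translate(tr))  # Dobryj Den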
    

    That said, I'd second the suggestion not to reinvent the wheel and to use an established library: http://pypi.python.org/pypi/Unidecode
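
    The one-to-one table above has to collapse letters such as ш and щ onto the same Latin character. In Python 3, str.translate also accepts strings of any length as replacement values, and str.maketrans can build the table from a dict. A minimal sketch, assuming Python 3.5+; the mapping below and the name romanize are illustrative choices, not an official romanization standard:

```python
# Letters that map to exactly one Latin character.
SINGLE_SRC = "абвгдезийклмнопрстуфыэ"
SINGLE_DST = "abvgdezijklmnoprstufye"

# Letters that need several Latin characters, or none at all.
# (Illustrative choices only.)
MULTI = {
    "ё": "yo", "ж": "zh", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ю": "yu", "я": "ya",
    "ъ": "", "ь": "",
}

lower = dict(zip(SINGLE_SRC, SINGLE_DST))
lower.update(MULTI)

# Derive the uppercase entries so that "Щ" becomes "Shch".
upper = {cyr.upper(): lat.capitalize() for cyr, lat in lower.items()}

# str.maketrans turns single-character keys into code points.
TABLE = str.maketrans({**lower, **upper})


def romanize(text):
    """Transliterate Russian Cyrillic text with the table above."""
    return text.translate(TABLE)


print(romanize("Добрый день"))  # Dobryj den
print(romanize("Щука"))         # Shchuka
```

    Characters that are missing from the table, such as spaces or Latin letters, pass through unchanged.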

  • 2020-12-29 08:14

    Check out the CyrTranslit package. It is made specifically to transliterate text to and from Cyrillic script, and it currently supports Serbian, Montenegrin, Macedonian, and Russian.

    Example usage:

    >>> import cyrtranslit
    >>> cyrtranslit.supported()
    ['me', 'sr', 'mk', 'ru']
    
    >>> cyrtranslit.to_latin('Моё судно на воздушной подушке полно угрей', 'ru')
    'Moyo sudno na vozdushnoj podushke polno ugrej'
    
    >>> cyrtranslit.to_cyrillic('Moyo sudno na vozdushnoj podushke polno ugrej', 'ru')
    'Моё судно на воздушной подушке полно угрей'
    
  • 2020-12-29 08:18

    You can use the transliterate package (https://pypi.python.org/pypi/transliterate).

    Example #1:

    from transliterate import translit
    print(translit("Lorem ipsum dolor sit amet", "ru"))
    # Лорем ипсум долор сит амет
    

    Example #2:

    print(translit(u"Лорем ипсум долор сит амет", "ru", reversed=True))
    # Lorem ipsum dolor sit amet
    