Python 2 maketrans() function doesn't work with Unicode: “the arguments are different lengths” when they actually are

半阙折子戏 2020-12-30 11:07

[Python 2] SUB = string.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")

This code produces the error:

ValueError: maketrans arguments must have same len         


        
1 answer
  •  隐瞒了意图╮
    2020-12-30 11:36

    No, the arguments are not the same length:

    >>> len("0123456789")
    10
    >>> len("₀₁₂₃₄₅₆₇₈₉")
    30
    

    You are passing in encoded byte strings, not text; I used UTF-8 here, where each subscript digit is encoded as 3 bytes.
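    The difference between the two lengths can be reproduced with a short sketch. It assumes UTF-8 as the encoding (as in the session above) and writes the subscript digits as escapes so that it does not depend on the source file's encoding; the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Sketch: compare the number of code points with the number of UTF-8 bytes.
# Assumption: the byte string in the question was UTF-8 encoded.

# The ten subscript digits U+2080 .. U+2089, written as escapes.
subscripts = u"\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"

# Encoding yields the kind of byte string a plain "..." literal holds in Python 2.
encoded = subscripts.encode("utf-8")

code_point_count = len(subscripts)  # counts characters
byte_count = len(encoded)           # counts bytes

# Bytes needed for each individual subscript digit.
bytes_per_digit = [len(ch.encode("utf-8")) for ch in subscripts]
```

    Here code_point_count is 10 while byte_count is 30, which is why maketrans() sees arguments of different lengths.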

    You cannot use str.translate() to map ASCII bytes to UTF-8 byte sequences, because its table maps each single byte to exactly one other byte. Decode your string to unicode and use the slightly different unicode.translate() method, which takes a dictionary instead:

    nummap = {ord(c): ord(t) for c, t in zip(u"0123456789", u"₀₁₂₃₄₅₆₇₈₉")}
    

    This creates a dictionary mapping Unicode codepoints (integers), which you can then use on a Unicode string:

    >>> nummap = {ord(c): ord(t) for c, t in zip(u"0123456789", u"₀₁₂₃₄₅₆₇₈₉")}
    >>> u'99 bottles of beer on the wall'.translate(nummap)
    u'\u2089\u2089 bottles of beer on the wall'
    >>> print u'99 bottles of beer on the wall'.translate(nummap)
    ₉₉ bottles of beer on the wall
    

    You can then encode the output to UTF-8 again if you so wish.
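    If the data arrives and leaves as UTF-8 bytes, the whole round trip might look like the following sketch. The helper name to_subscript_utf8 and its encoding parameter are hypothetical, introduced here only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch of a decode -> translate -> encode round trip.
# to_subscript_utf8 is a hypothetical helper, not part of any library.

DIGITS = u"0123456789"
SUBSCRIPTS = u"\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"

# Map each digit's code point to the matching subscript's code point.
NUMMAP = {ord(c): ord(t) for c, t in zip(DIGITS, SUBSCRIPTS)}


def to_subscript_utf8(data, encoding="utf-8"):
    """Decode bytes to text, translate the digits, and encode back to bytes."""
    text = data.decode(encoding)       # bytes -> Unicode text
    translated = text.translate(NUMMAP)
    return translated.encode(encoding)  # Unicode text -> bytes


result = to_subscript_utf8(b"99 bottles")
```

    Keeping the translation on the Unicode side means the table never has to deal with multi-byte sequences.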

    From the method documentation:

    For Unicode objects, the translate() method does not accept the optional deletechars argument. Instead, it returns a copy of the s where all characters have been mapped through the given translation table which must be a mapping of Unicode ordinals to Unicode ordinals, Unicode strings or None. Unmapped characters are left untouched. Characters mapped to None are deleted.
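    The three kinds of table values named in that passage (ordinals, Unicode strings and None) can be combined in one table. The following is a minimal sketch with made-up sample input:

```python
# -*- coding: utf-8 -*-
# Sketch: a translation table using all three kinds of values.
table = {
    ord(u"1"): ord(u"\u2081"),  # ordinal -> ordinal
    ord(u"0"): u"\u2080",       # ordinal -> one-character Unicode string
    ord(u"&"): u"and",          # ordinal -> longer Unicode string
    ord(u" "): None,            # ordinal -> None deletes the character
}

# "x" is not in the table, so it is left untouched.
result = u"1 & 0 x".translate(table)
```

    The result is the subscript one, the word "and", the subscript zero and the unmapped "x", with all spaces removed.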
