How to fix: “UnicodeDecodeError: 'ascii' codec can't decode byte”

前端 未结 19 1581
谎友^
谎友^ 2020-11-22 01:21
as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
File \"/usr/local/bin/wok\", line 4, in
         


        
19条回答
  •  名媛妹妹
    2020-11-22 01:54

    I find the best is to always convert to unicode - but this is difficult to achieve because in practice you'd have to check and convert every argument to every function and method you ever write that includes some form of string processing.

    So I came up with the following approach to either guarantee unicodes or byte strings, from either input. In short, include and use the following lambdas:

    # guarantee unicode string
    _u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
    _uu = lambda *tt: tuple(_u(t) for t in tt) 
    # guarantee byte string in UTF8 encoding
    _u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
    _uu8 = lambda *tt: tuple(_u8(t) for t in tt)
    

    Examples:

    text='Some string with codes > 127, like Zürich'
    utext=u'Some string with codes > 127, like Zürich'
    print "==> with _u, _uu"
    print _u(text), type(_u(text))
    print _u(utext), type(_u(utext))
    print _uu(text, utext), type(_uu(text, utext))
    print "==> with u8, uu8"
    print _u8(text), type(_u8(text))
    print _u8(utext), type(_u8(utext))
    print _uu8(text, utext), type(_uu8(text, utext))
    # with % formatting, always use _u() and _uu()
    print "Some unknown input %s" % _u(text)
    print "Multiple inputs %s, %s" % _uu(text, text)
    # but with string.format be sure to always work with unicode strings
    print u"Also works with formats: {}".format(_u(text))
    print u"Also works with formats: {},{}".format(*_uu(text, text))
    # ... or use _u8 and _uu8, because string.format expects byte strings
    print "Also works with formats: {}".format(_u8(text))
    print "Also works with formats: {},{}".format(*_uu8(text, text))
    

    Here's some more reasoning about this.

提交回复
热议问题