How to fix: “UnicodeDecodeError: 'ascii' codec can't decode byte”

前端 未结 19 1539
谎友^
谎友^ 2020-11-22 01:21
as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
File \"/usr/local/bin/wok\", line 4, in
         


        
19条回答
  •  北恋
    北恋 (楼主)
    2020-11-22 01:54

    This is the classic "unicode issue". I believe that explaining this is beyond the scope of a StackOverflow answer to completely explain what is happening.

    It is well explained here.

    In very brief summary, you have passed something that is being interpreted as a string of bytes to something that needs to decode it into Unicode characters, but the default codec (ascii) is failing.

    The presentation I pointed you to provides advice for avoiding this. Make your code a "unicode sandwich". In Python 2, the use of from __future__ import unicode_literals helps.

    Update: how can the code be fixed:

    OK - in your variable "source" you have some bytes. It is not clear from your question how they got in there - maybe you read them from a web form? In any case, they are not encoded with ascii, but python is trying to convert them to unicode assuming that they are. You need to explicitly tell it what the encoding is. This means that you need to know what the encoding is! That is not always easy, and it depends entirely on where this string came from. You could experiment with some common encodings - for example UTF-8. You tell unicode() the encoding as a second parameter:

    source = unicode(source, 'utf-8')
    

提交回复
热议问题