Python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>

故里飘歌 2021-01-28 06:05

I'm trying to write a script that gets data out of an sqlite3 database, but I have run into a problem.

The field in the database is of type text and contains a ht

4 answers
  •  小蘑菇 (original poster)
    2021-01-28 06:46

    While Python 3 deals in Unicode, the Windows console or POSIX tty that you're running inside does not. So, whenever you print, or otherwise send Unicode strings to stdout, and it's attached to a console/tty, Python has to encode it.
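The encoding step can be reproduced without a real console. The sketch below is only an illustration: it assumes an in-memory byte buffer wrapped in a text layer as a stand-in for the console, and the names `try_write` and `fake_console` are made up for this example.

```python
import io

def try_write(text, encoding):
    """Write text through a text layer that uses `encoding`.

    Returns the bytes that reached the underlying buffer,
    or None if the text could not be encoded.
    """
    raw = io.BytesIO()
    fake_console = io.TextIOWrapper(raw, encoding=encoding)
    try:
        fake_console.write(text)
        fake_console.flush()
    except UnicodeEncodeError:
        return None
    return raw.getvalue()

# The en dash (U+2013) has no byte in cp850, but UTF-8 can represent it.
print(try_write('2010\u20132020', 'cp850'))
print(try_write('2010\u20132020', 'utf-8'))
```

The same thing happens when print sends text to a real console: the text layer encodes first, and only bytes reach the terminal.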

    The error message indirectly tells you what character set Python was trying to use:

      File "C:\Python32\lib\encodings\cp850.py", line 19, in encode
    

    This means the charset is cp850.

    You can test for yourself that this charset doesn't have the appropriate character just by evaluating '\u2013'.encode('cp850'). Or you can look up cp850 online (e.g., at Wikipedia).
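That check is easy to wrap in a small helper. This is a minimal sketch; the name `can_encode` is hypothetical, and the list of code pages is just a sample to compare against.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Compare a few encodings for the en dash, U+2013.
for name in ('cp850', 'cp1252', 'utf-8'):
    print(name, can_encode('\u2013', name))
```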

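    It's possible that Python is guessing wrong, and your console is really set for, say, UTF-8. (In that case, note that sys.stdout.encoding is read-only and cannot simply be assigned; set the PYTHONIOENCODING environment variable to utf-8 before starting Python, or, on Python 3.7 and later, call sys.stdout.reconfigure(encoding='utf-8').) It's also possible that you intended your console to be set for UTF-8 but did something wrong. (In that case, you probably want to follow up at superuser.com.)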
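Before deciding whether Python guessed wrong, it helps to see what it actually picked. The sketch below just collects the relevant settings; `report_encodings` is a hypothetical helper name, and the values it prints depend entirely on your system.

```python
import locale
import sys

def report_encodings():
    """Collect the encodings Python is currently using for output."""
    return {
        # getattr guards against stdout having been replaced by another object
        'stdout': getattr(sys.stdout, 'encoding', None),
        'stdout_errors': getattr(sys.stdout, 'errors', None),
        'preferred': locale.getpreferredencoding(False),
        'filesystem': sys.getfilesystemencoding(),
    }

for key, value in report_encodings().items():
    print(key, '=', value)
```

On Windows you can compare the 'stdout' value with the code page reported by the chcp command in the same console window.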

    But if nothing is wrong, you just can't print that character. You will have to manually encode it with one of the non-strict error-handlers. For example:

    >>> '\u2013'.encode('cp850')
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 0: character maps to <undefined>
    >>> '\u2013'.encode('cp850', errors='replace')
    b'?'
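'replace' is only one of the non-strict handlers. As a rough comparison, the sketch below encodes the same text with several of the standard handlers; the helper name `encode_with_handlers` is made up for illustration.

```python
HANDLERS = ('replace', 'ignore', 'xmlcharrefreplace', 'backslashreplace')

def encode_with_handlers(text, encoding):
    """Encode text once per error handler and return the results by name."""
    return {name: text.encode(encoding, errors=name) for name in HANDLERS}

results = encode_with_handlers('a\u2013b', 'cp850')
for name in HANDLERS:
    print(name, results[name])
```

'replace' and 'ignore' lose the character, while 'xmlcharrefreplace' and 'backslashreplace' keep enough information to recover it later, which may matter if the text is HTML.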
    

    So, how do you print a string that won't print on your console?

    You can replace every print call with something like this:

    >>> print(r['body'].encode('cp850', errors='replace').decode('cp850'))
    ?
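If you do go the encode-then-decode route, a wrapper keeps the repetition down. This is a sketch under assumptions: `console_safe` and `safe_print` are hypothetical names, and falling back to ASCII when the stream reports no encoding is a choice made here, not something Python does for you.

```python
import sys

def console_safe(text, encoding, errors='replace'):
    """Round-trip text through encoding so only representable characters remain."""
    return text.encode(encoding, errors=errors).decode(encoding)

def safe_print(*values, **kwargs):
    """Like print, but replaces characters the target stream cannot encode."""
    stream = kwargs.get('file') or sys.stdout
    # Assumption: fall back to ASCII if the stream does not report an encoding.
    encoding = getattr(stream, 'encoding', None) or 'ascii'
    print(*(console_safe(str(v), encoding) for v in values), **kwargs)

safe_print('pages 10\u201312')
```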
    

    … but that's going to get pretty tedious pretty fast.

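    A simpler approach is to change the error handler on sys.stdout once. The errors attribute is read-only, so rewrap the underlying byte stream instead (on Python 3.7 and later, sys.stdout.reconfigure(errors='replace') does the same job):

    >>> import io, sys
    >>> sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding=sys.stdout.encoding, errors='replace', line_buffering=True)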
    >>> print(r['body'])
    ?
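As far as I know, the errors attribute of a text stream cannot be assigned directly, so a version-tolerant way to change the handler might look like the sketch below. The helper name `with_error_handler` is hypothetical, and the demonstration uses an in-memory stand-in for a cp850 console rather than the real stdout.

```python
import io

def with_error_handler(stream, errors='replace'):
    """Return a text stream writing to the same place with a new error handler."""
    if hasattr(stream, 'reconfigure'):      # available on Python 3.7 and later
        stream.reconfigure(errors=errors)
        return stream
    stream.flush()
    # Older versions: build a new text layer over the same byte buffer.
    return io.TextIOWrapper(stream.buffer, encoding=stream.encoding,
                            errors=errors, line_buffering=True)

# Stand-in for a cp850 console; keep a reference to the original wrapper
# so the shared byte buffer is not closed when it is garbage-collected.
raw = io.BytesIO()
original = io.TextIOWrapper(raw, encoding='cp850')
fake_console = with_error_handler(original)
fake_console.write('10\u201312')
fake_console.flush()
print(raw.getvalue())
```

For the real console, the equivalent call would be sys.stdout = with_error_handler(sys.stdout) after importing sys.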
    

    For writing to a file, things are pretty much the same, except that you can choose the error handler when you open the file. Instead of this:

    with open('path', 'w', encoding='cp850') as f:
        print(r['body'], file=f)
    

    Do this:

    with open('path', 'w', encoding='cp850', errors='replace') as f:
        print(r['body'], file=f)
    

    … Or, of course, if you can use UTF-8 files, just do that, as Mark Ransom's answer shows:

    with open('path', 'w', encoding='utf-8') as f:
        print(r['body'], file=f)
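To see the difference between the two file options, the sketch below writes the same text both ways and reads it back. It is only an illustration: `round_trip` is a made-up helper, and it uses a temporary file instead of the 'path' placeholder above.

```python
import os
import tempfile

text = 'pages 10\u201312'

def round_trip(text, encoding, errors='strict'):
    """Write text to a temporary file, then read it back with the same encoding."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w', encoding=encoding, errors=errors) as f:
            f.write(text)
        with open(path, 'r', encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)

# ascii() keeps the output printable on any console.
print(ascii(round_trip(text, 'cp850', errors='replace')))
print(ascii(round_trip(text, 'utf-8')))
```

With cp850 and 'replace' the en dash is permanently turned into a question mark in the file; with UTF-8 the text survives unchanged.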
    
