UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

后端未结

关注

 29  2938

余生分开走 2020-11-21 04:43

I\'m having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that

29条回答

野性不改 (楼主)

2020-11-21 05:02

I had this issue trying to output Unicode characters to stdout, but with sys.stdout.write, rather than print (so that I could support output to a different file as well).

From BeautifulSoup's own documentation, I solved this with the codecs library:

import sys
import codecs

def main(fIn, fOut):
    soup = BeautifulSoup(fIn)
    # Do processing, with data including non-ASCII characters
    fOut.write(unicode(soup))

if __name__ == '__main__':
    with (sys.stdin) as fIn: # Don't think we need codecs.getreader here
        with codecs.getwriter('utf-8')(sys.stdout) as fOut:
            main(fIn, fOut)

0 讨论(0)

查看其它29个回答