UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

余生分开走 2020-11-21 04:43

I'm having problems dealing with Unicode characters in text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that

29 answers
  •  野性不改
    2020-11-21 05:02

    I had this issue trying to output Unicode characters to stdout, but with sys.stdout.write, rather than print (so that I could support output to a different file as well).

    Following BeautifulSoup's own documentation, I solved this with the codecs module:

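    # Python 2 code: unicode() does not exist in Python 3.
    import sys
    import codecs

    # Assumption: BeautifulSoup 4; with BeautifulSoup 3 the import would be
    # `from BeautifulSoup import BeautifulSoup`.
    from bs4 import BeautifulSoup

    def main(fIn, fOut):
        soup = BeautifulSoup(fIn)
        # Do processing, with data including non-ASCII characters
        fOut.write(unicode(soup))

    if __name__ == '__main__':
        with sys.stdin as fIn: # Don't think we need codecs.getreader here
            # Wrap stdout so unicode strings are encoded as UTF-8 on write
            with codecs.getwriter('utf-8')(sys.stdout) as fOut:
                main(fIn, fOut)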
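
    The same idea can be sketched without BeautifulSoup. This is a minimal Python 3 illustration, not part of the original answer: the helper names ascii_encode_fails and write_utf8 and the sample string are hypothetical, and io.BytesIO stands in for a byte-oriented output such as sys.stdout.buffer. It shows that encoding U+00A0 as ASCII fails, while writing through codecs.getwriter('utf-8') succeeds.

```python
import codecs
import io


def ascii_encode_fails(text):
    """Return True if encoding `text` as ASCII raises UnicodeEncodeError."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


def write_utf8(byte_stream, text):
    """Wrap a byte stream in a UTF-8 StreamWriter and write `text` to it."""
    writer = codecs.getwriter('utf-8')(byte_stream)
    writer.write(text)


# Hypothetical sample text containing U+00A0 (non-breaking space),
# the character named in the error message.
sample = 'price:\xa0100'

# io.BytesIO stands in for a byte-oriented output such as sys.stdout.buffer.
buffer = io.BytesIO()
write_utf8(buffer, sample)
encoded = buffer.getvalue()
```

    Wrapping the stream once, at the point where output leaves the program, keeps the rest of the code working with text and avoids scattering encode calls.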
    
