UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

后端 未结 29 3003
余生分开走
余生分开走 2020-11-21 04:43

I\'m having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that

29条回答
  •  说谎
    说谎 (楼主)
    2020-11-21 04:49

    This is a classic python unicode pain point! Consider the following:

    a = u'bats\u00E0'
    print a
     => batsà
    

    All good so far, but if we call str(a), let's see what happens:

    str(a)
    Traceback (most recent call last):
      File "", line 1, in 
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
    

    Oh dip, that's not gonna do anyone any good! To fix the error, encode the bytes explicitly with .encode and tell python what codec to use:

    a.encode('utf-8')
     => 'bats\xc3\xa0'
    print a.encode('utf-8')
     => batsà
    

    Voil\u00E0!

    The issue is that when you call str(), python uses the default character encoding to try and encode the bytes you gave it, which in your case are sometimes representations of unicode characters. To fix the problem, you have to tell python how to deal with the string you give it by using .encode('whatever_unicode'). Most of the time, you should be fine using utf-8.

    For an excellent exposition on this topic, see Ned Batchelder's PyCon talk here: http://nedbatchelder.com/text/unipain.html

提交回复
热议问题