UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

余生分开走 2020-11-21 04:43

I'm having problems dealing with Unicode characters in text fetched from web pages on different sites. I am using BeautifulSoup.

The problem is that

29 answers
  • 2020-11-21 05:05

    You need to read the Python Unicode HOWTO. This error is the very first example.

    Basically, stop using str() to convert from unicode to encoded text / bytes.

    Instead, properly use .encode() to encode the string:

    p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()
    

    or work entirely in unicode.
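
As a rough illustration of "work in text, encode at the boundary", here is a minimal Python 3 sketch. The sample values for agent_contact and agent_telno are invented for the example and are not from the question.

```python
# Python 3 sketch; the two sample values below are made up for illustration.
agent_contact = "Jane Doe\xa0"   # text containing a non-breaking space (U+00A0)
agent_telno = "555-0100"

# Keep the value as text (str) while processing it...
agent_info = " ".join((agent_contact, agent_telno)).strip()

# ...and encode explicitly only where bytes are required (file, socket, etc.).
encoded = agent_info.encode("utf-8")

# Encoding the same text as ASCII raises the error from the title.
try:
    agent_info.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The point of the sketch is that the failure comes from the implicit or explicit ASCII codec, not from the text itself; choosing UTF-8 at the boundary avoids it.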

  • 2020-11-21 05:05

    I found an elegant workaround that removes the offending symbols while keeping the value a string:

    yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')
    

    It's important to note that the ignore option is dangerous: it silently drops every non-ASCII character, and with it any Unicode (and internationalization) support, from the code that uses it, as seen here:

    >>> u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
    'City: Malm'
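
For comparison, a small Python 3 sketch of some less lossy options. The error handlers used here are standard codec error handlers; the NFKD step is a swapped-in technique for stripping accents and is not something the answer itself proposes.

```python
import unicodedata

text = "City: Malmö"

# 'ignore' silently drops the character.
dropped = text.encode("ascii", "ignore").decode("ascii")

# 'replace' leaves a visible placeholder instead.
marked = text.encode("ascii", "replace").decode("ascii")

# 'backslashreplace' keeps the code point recoverable as an escape sequence.
escaped = text.encode("ascii", "backslashreplace").decode("ascii")

# Decompose accented letters (NFKD), then drop the combining marks.
decomposed = unicodedata.normalize("NFKD", text)
transliterated = decomposed.encode("ascii", "ignore").decode("ascii")

print(dropped, marked, escaped, transliterated, sep="\n")
```

Which handler fits depends on whether the output is meant for people, for logs, or for a system that genuinely accepts only ASCII.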
    
  • 2020-11-21 05:06

    Open a terminal and run the following command:

    export LC_ALL="en_US.UTF-8"
    
  • 2020-11-21 05:08

    For me, what worked was:

    BeautifulSoup(html_text, from_encoding="utf-8")
    

    Hope this helps someone.

  • 2020-11-21 05:08

    Simple helper functions found here.

    def safe_unicode(obj, *args):
        """ return the unicode representation of obj """
        try:
            return unicode(obj, *args)
        except UnicodeDecodeError:
            # obj is byte string
            ascii_text = str(obj).encode('string_escape')
            return unicode(ascii_text)
    
    def safe_str(obj):
        """ return the byte string representation of obj """
        try:
            return str(obj)
        except UnicodeEncodeError:
            # obj is unicode
            return unicode(obj).encode('unicode_escape')
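
The helpers above are Python 2 only (unicode and the string_escape codec do not exist in Python 3). A possible Python 3 analogue is sketched below; the names safe_text and safe_ascii_bytes are hypothetical, and the use of the backslashreplace error handler is an assumption about what behaviour is wanted, not part of the original helpers.

```python
def safe_text(obj, encoding="utf-8"):
    """Return a str for obj, escaping bytes that cannot be decoded."""
    if isinstance(obj, bytes):
        # backslashreplace keeps undecodable bytes visible as \xNN escapes
        # instead of raising UnicodeDecodeError (Python 3.5+).
        return obj.decode(encoding, errors="backslashreplace")
    return str(obj)


def safe_ascii_bytes(obj):
    """Return ASCII-only bytes for obj, escaping non-ASCII characters."""
    return safe_text(obj).encode("ascii", errors="backslashreplace")


print(safe_text(b"caf\xe9"))        # invalid UTF-8 input is escaped, not rejected
print(safe_ascii_bytes("Malm\xf6")) # non-ASCII text is escaped, not dropped
```

Like the originals, these trade fidelity for never raising, so they are better suited to logging and debugging than to data that must round-trip.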
    
  • 2020-11-21 05:08

    Update for Python 3.0 and later. Run the following in a terminal (these are shell commands, not Python):

    locale-gen en_US.UTF-8
    export LANG=en_US.UTF-8 LANGUAGE=en_US:en
    export LC_ALL=en_US.UTF-8
    

    This sets the system's default locale encoding to UTF-8.

    More details are in PEP 538 -- Coercing the legacy C locale to a UTF-8 based locale.
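
To check from inside Python whether a locale change actually took effect, something like the following sketch may help. It only reads settings and changes nothing; the values printed depend entirely on the environment it runs in.

```python
import locale
import sys

# Encoding Python derives from the current locale settings (LANG, LC_ALL, ...).
preferred_encoding = locale.getpreferredencoding(False)

# Encoding attached to standard output; may be None if stdout was replaced.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("locale encoding:", preferred_encoding)
print("stdout encoding:", stdout_encoding)
```

If either value still reports an ASCII-based encoding after the exports, the variables were probably not set in the shell that launched the interpreter.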
