I'm having problems dealing with unicode characters in text fetched from different web pages (on different sites). I am using BeautifulSoup.
The problem is that I keep getting an error like UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 20: ordinal not in range(128).
You need to read the Python Unicode HOWTO. This error is the very first example.
Basically, stop using str to convert from unicode to encoded text / bytes. Instead, properly encode the string with .encode():
p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()
or work entirely in unicode.
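To make the difference concrete, here is a minimal sketch (Python 2; the values are invented, with a non-breaking space playing the troublesome character):

agent_contact = u'John'
agent_telno = u'\xa0555-1234'  # invented value; u'\xa0' is a non-breaking space

joined = u' '.join((agent_contact, agent_telno))

# str() implicitly encodes with the ascii codec and blows up on u'\xa0':
try:
    str(joined)
except UnicodeEncodeError as e:
    print(e)  # 'ascii' codec can't encode character u'\xa0' ...

# .encode() explicitly picks a codec that can represent the character:
print(joined.encode('utf-8').strip())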
I found an elegant workaround that removes the problem symbols while keeping the string a string:
yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')
It's important to note that using the ignore option is dangerous, because it silently drops any unicode (and internationalization) support from the code that uses it, as this conversion shows:
>>> u'City: Malmö'.encode('ascii', 'ignore').decode('ascii')
'City: Malm'
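If you do have to reduce to ASCII but want the loss to be visible instead of silent, the 'replace' error handler (just an illustration, not part of the original workaround) at least leaves a marker behind:

>>> u'City: Malmö'.encode('ascii', 'replace').decode('ascii')
'City: Malm?'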
Open a terminal and run the command below:
export LC_ALL="en_US.UTF-8"
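To confirm that Python actually picks the setting up, you can check the preferred encoding (a quick sanity check; works on Python 2 and 3):

>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'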
For me, what worked was:
BeautifulSoup(html_text, from_encoding="utf-8")
Hope this helps someone.
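For context, a minimal self-contained sketch of the same call (bs4 assumed; the HTML bytes are made up):

from bs4 import BeautifulSoup

# raw bytes as fetched from a page; 'Malmö' encoded as UTF-8
html_text = b'<html><body><p>City: Malm\xc3\xb6</p></body></html>'

# from_encoding tells BeautifulSoup how to decode the bytes instead of guessing
soup = BeautifulSoup(html_text, 'html.parser', from_encoding='utf-8')
print(soup.p.get_text())  # City: Malmö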
Simple helper functions (Python 2), found here:
def safe_unicode(obj, *args):
    """ return the unicode representation of obj """
    try:
        return unicode(obj, *args)
    except UnicodeDecodeError:
        # obj is byte string
        ascii_text = str(obj).encode('string_escape')
        return unicode(ascii_text)

def safe_str(obj):
    """ return the byte string representation of obj """
    try:
        return str(obj)
    except UnicodeEncodeError:
        # obj is unicode
        return unicode(obj).encode('unicode_escape')
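Usage, for illustration (Python 2; the sample values are made up):

>>> safe_unicode('Malm\xc3\xb6')   # byte string: escaped instead of crashing
u'Malm\\xc3\\xb6'
>>> safe_str(u'Malm\xf6')          # unicode: escaped to a byte string
'Malm\\xf6'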
Update for Python 3.0 and later. Run the following in a terminal:
locale-gen en_US.UTF-8
export LANG=en_US.UTF-8 LANGUAGE=en_US.en
export LC_ALL=en_US.UTF-8
This sets the system's default locale encoding to UTF-8.
More can be read at PEP 538 -- Coercing the legacy C locale to a UTF-8 based locale.