UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I'm having problems dealing with Unicode characters in text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that

29 answers

深忆病人

2020-11-21 04:51

A subtle problem that can make even `print` fail is having your environment variables set wrong, e.g. `LC_ALL` set to `C` as here. Debian discourages setting `LC_ALL` (see the Debian wiki on Locale). The session below uses Python 2:

```
$ echo $LANG
en_US.utf8
$ echo $LC_ALL 
C
$ python -c "print (u'voil\u00e0')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)
$ export LC_ALL='en_US.utf8'
$ python -c "print (u'voil\u00e0')"
voilà
$ unset LC_ALL
$ python -c "print (u'voil\u00e0')"
voilà
```
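A minimal Python 3 sketch for checking which encoding your output will use and whether a given string survives it. It assumes output goes to `sys.stdout`; `can_encode` is a hypothetical helper name, not a standard API. How Python treats the `C` locale has changed across Python 3 releases, so the exact transcript above may not reproduce on every version.

```python
import locale
import sys


def can_encode(text, encoding=None):
    """Return True if `text` can be encoded with `encoding`.

    Hypothetical helper. Falls back to the stdout encoding, then to the
    locale's preferred encoding (which depends on LANG / LC_ALL).
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or locale.getpreferredencoding(False)
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print("stdout encoding:", getattr(sys.stdout, "encoding", None))
print("locale preferred encoding:", locale.getpreferredencoding(False))
print(can_encode("voil\u00e0", "ascii"))  # False: U+00E0 is outside ASCII
print(can_encode("voil\u00e0", "utf-8"))  # True
```

If the reported encoding is ASCII (for example `ANSI_X3.4-1968`), fixing the locale as shown above, or setting `PYTHONIOENCODING`, is usually the cleaner fix compared with changing the code.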

暗喜

2020-11-21 04:54
I always put the following code in the first two lines of my Python files:
```
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
```
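A short sketch of what those two lines do and don't do, assuming Python 3, where `unicode_literals` is accepted but has no effect because string literals are already Unicode. The coding header tells the interpreter how to decode the source file; it does not choose an encoding for output, so text still has to be encoded explicitly (or by the stream it is written to).

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals  # no-op on Python 3

s = "voilà"  # the coding line lets the source file contain "à"
assert isinstance(s, type(u""))  # unicode on Python 2 (with the import), str on Python 3

# Neither header line picks the output encoding; do that explicitly.
encoded = s.encode("utf-8")
print(repr(encoded))
```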
别跟我提以往

2020-11-21 04:54
The recommended solution did not work for me, and I could live with dropping all non-ASCII characters, so I used:
```
s = s.encode('ascii', errors='ignore')  # drops every non-ASCII character (returns bytes on Python 3)
```
This left me with a stripped string that doesn't raise errors.
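A Python 3 sketch comparing `errors='ignore'` with two gentler options. On Python 3, `str.encode` returns `bytes`, so each result is decoded back to `str` here. The `unicodedata.normalize('NFKD', ...)` step is a swapped-in technique, not part of the answer above: it splits accented letters into a base letter plus a combining mark and maps the non-breaking space to a plain space, so less is lost when non-ASCII characters are dropped. The sample string is made up.

```python
import unicodedata

s = "caf\u00e9\u00a0menu"  # "café" + non-breaking space + "menu"

# The answer's approach: silently drop anything outside ASCII.
ignored = s.encode("ascii", errors="ignore").decode("ascii")

# Keep a visible "?" marker for each dropped character instead.
replaced = s.encode("ascii", errors="replace").decode("ascii")

# Decompose first (é -> e + combining acute, U+00A0 -> space), then drop.
folded = (
    unicodedata.normalize("NFKD", s)
    .encode("ascii", errors="ignore")
    .decode("ascii")
)

print(ignored)   # cafmenu
print(replaced)  # caf??menu
print(folded)    # cafe menu
```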
你的背包

2020-11-21 04:55

Many answers here (@agf and @Andbdrew, for example) have already addressed the most immediate aspects of the OP's question.

However, I think one subtle but important aspect has been largely ignored, and it matters a great deal to everyone who, like me, ended up here trying to make sense of encodings in Python: Python 2 and Python 3 handle character representation in wildly different ways. I suspect much of the confusion out there comes from people reading about encodings in Python without being aware of which version the material targets.

I suggest that anyone interested in the root cause of the OP's problem begin with Spolsky's introduction to character representations and Unicode, then move on to Batchelder on Unicode in Python 2 and Python 3.
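To make the Python 3 side concrete, a small sketch: text is `str`, encoded data is `bytes`, and converting between them always names a codec. The `u'\xa0'` in the question title is U+00A0 NO-BREAK SPACE, which is what `&nbsp;` in a web page decodes to; the sample string here is made up.

```python
import html

text = "price:" + html.unescape("&nbsp;") + "100"  # "&nbsp;" -> U+00A0
assert isinstance(text, str)  # Python 3: str is always Unicode text

raw = text.encode("utf-8")  # str -> bytes needs an explicit codec
print(raw)  # b'price:\xc2\xa0100'
assert raw.decode("utf-8") == text  # bytes -> str round-trips

try:
    text.encode("ascii")  # ASCII has no code for U+00A0
except UnicodeEncodeError as exc:
    print(exc.reason, "at position", exc.start)  # ordinal not in range(128) at position 6
```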

醉话见心

2020-11-21 04:56
Just call `.encode('utf-8')` on the variable:
```
agent_contact = agent_contact.encode('utf-8')  # encode() returns a new value; keep it
```
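A sketch, assuming Python 3, of where `.encode('utf-8')` usually belongs: at the point where text is written to a byte-oriented destination. The sample value for `agent_contact` is hypothetical. The second variant lets an `io.TextIOWrapper` with `encoding='utf-8'` do the encoding, which is what `open(path, 'w', encoding='utf-8')` does for files.

```python
import io

agent_contact = "Jos\u00e9\u00a0Garc\u00eda"  # hypothetical sample value

# Variant 1: encode explicitly and write bytes.
raw_buf = io.BytesIO()
raw_buf.write(agent_contact.encode("utf-8"))

# Variant 2: wrap the byte stream and let the wrapper encode.
wrapped_buf = io.BytesIO()
writer = io.TextIOWrapper(wrapped_buf, encoding="utf-8")
writer.write(agent_contact)
writer.flush()
writer.detach()  # release wrapped_buf so it is not closed along with the wrapper

assert raw_buf.getvalue() == wrapped_buf.getvalue()
print(raw_buf.getvalue())
```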
你的背包

2020-11-21 04:58
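Try to avoid converting a variable with `str(variable)`. In Python 2, calling `str()` on a `unicode` object that contains non-ASCII characters encodes it with the ASCII codec, which raises exactly this error.

A simple way to avoid it:
```
try:
    data = str(data)
except UnicodeEncodeError:
    pass  # keep data as unicode instead of converting it to str
```
This sidesteps the encode error as well.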
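On Python 3, `str()` no longer encodes through ASCII, but it has its own pitfall: `str(b"...")` returns the repr `"b'...'"` rather than decoded text. A small helper can convert to text without either problem. `to_text` is a hypothetical name, and decoding `bytes` as UTF-8 is an assumption about where the data came from.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as str without an ASCII round trip."""
    if isinstance(value, str):
        return value  # already text; nothing to convert
    if isinstance(value, bytes):
        # str(b"...") would give the repr "b'...'", so decode instead.
        return value.decode(encoding)
    return str(value)  # numbers and other objects


assert to_text("voil\u00e0") == "voil\u00e0"
assert to_text(b"voil\xc3\xa0") == "voil\u00e0"
assert to_text(42) == "42"
```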