UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I'm having problems dealing with Unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that

29 answers

一生所求

2020-11-21 05:08

This problem often occurs when a Django project is deployed under Apache, because Apache sets the environment variable LANG=C in /etc/sysconfig/httpd. Open that file and comment out the setting (or change it to the locale you prefer). Alternatively, use the lang option of the WSGIDaemonProcess directive, which lets you set a different LANG environment variable for each virtual host.
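
To confirm whether the server process really sees an ASCII-only locale, a small diagnostic can be run inside the application. This is only a sketch; the helper name `describe_text_environment` is made up for illustration.

```python
import locale
import os
import sys


def describe_text_environment():
    """Collect the settings that decide which codec Python uses for I/O."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_text_environment().items()):
        print("%s = %r" % (name, value))
```

On older Python versions running with LANG=C, the preferred encoding is typically reported under an ASCII codec name such as ANSI_X3.4-1968.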

醉梦人生

2020-11-21 05:09
Well, I tried everything but nothing helped. After googling around I found the following, and it worked. Python 2.7 is in use.
```
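# encoding=utf8
# Python 2 only: site.py removes sys.setdefaultencoding at startup,
# and reload(sys) brings it back.
import sys
reload(sys)
sys.setdefaultencoding('utf8')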
```
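Changing the default this way affects every library in the process, so an alternative worth considering is to encode explicitly at the point where text leaves the program. A minimal sketch, with a hypothetical helper `write_text` and an in-memory stream standing in for a real file or socket:

```python
import io


def write_text(binary_stream, text, encoding="utf-8"):
    """Encode text explicitly before writing it to a byte-oriented stream."""
    binary_stream.write(text.encode(encoding))


buffer = io.BytesIO()  # stands in for a file opened in binary mode
# \xa0 is the no-break space named in the error message
write_text(buffer, u"price:\xa0100\n")
```

With this approach the codec is chosen at each output boundary instead of once for the whole interpreter.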
不知归路

2020-11-21 05:09
The problem is that you're trying to print a Unicode character, but your terminal doesn't support it.

You can try installing the language-pack-en package to fix that:
```
sudo apt-get install language-pack-en
```
which provides English translation data updates for all supported packages (including Python). Install a different language package if necessary (depending on which characters you're trying to print).

On some Linux distributions this is required to make sure that the default English locales are set up properly (so that Unicode characters can be handled by the shell/terminal). Sometimes it's easier to install the package than to configure the locale manually.

Then, when writing the code, make sure you specify the right encoding.

For example:
```
open(foo, encoding='utf-8')
```
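Note that the built-in open() accepts the `encoding` argument only in Python 3; on Python 2.7 the same signature is available as io.open(). A sketch that should behave the same on both, using a temporary file created just for the example:

```python
import io
import os
import tempfile

text = u"trade mark: \u2122"

# Create a scratch file path for the example.
handle, path = tempfile.mkstemp(suffix=".txt")
os.close(handle)

try:
    # Text is encoded to UTF-8 on write and decoded on read,
    # regardless of the locale the process runs under.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    with io.open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()
finally:
    os.remove(path)
```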
If you still have a problem, double-check your system configuration, such as:
- Your locale file (/etc/default/locale), which should have e.g.
```
LANG="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
```
  or:
```
LC_ALL=C.UTF-8
LANG=C.UTF-8
```
- The value of LANG/LC_CTYPE in your shell.
- The UTF-8 locales your system supports, which you can list with:
```
locale -a | grep "UTF-8"
```
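When several of these variables are set, the C library consults them in a fixed order for character handling: LC_ALL first, then LC_CTYPE, then LANG. A sketch of that lookup, with a hypothetical helper name `effective_ctype`:

```python
import os


def effective_ctype(environ=None):
    """Return the locale name that governs character encoding, or None."""
    environ = os.environ if environ is None else environ
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:  # an empty string counts as unset
            return value
    return None  # nothing set: the process falls back to the C/POSIX locale


if __name__ == "__main__":
    print(effective_ctype())
```

This explains why setting LANG alone has no visible effect when LC_ALL is already exported with a different value.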
Demonstrating the problem and the solution in a fresh VM:
1. Initialize and provision the VM (e.g. using vagrant):
```
vagrant init ubuntu/trusty64; vagrant up; vagrant ssh
```
  (See: available Ubuntu boxes.)
2. Print a Unicode character (such as the trademark sign, ™):
```
$ python -c 'print(u"\u2122");'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 0: ordinal not in range(128)
```
3. Now install language-pack-en:
```
$ sudo apt-get -y install language-pack-en
The following extra packages will be installed:
  language-pack-en-base
Generating locales...
  en_GB.UTF-8... /usr/sbin/locale-gen: done
Generation complete.
```
4. Now the problem should be solved:
```
$ python -c 'print(u"\u2122");'
™
```
5. Otherwise, try the following command:
```
$ LC_ALL=C.UTF-8 python -c 'print(u"\u2122");'
™
```
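The same failure can be reproduced without involving the terminal, since the traceback comes from the ASCII codec itself. A minimal sketch:

```python
trademark = u"\u2122"

try:
    trademark.encode("ascii")  # ASCII only covers code points 0-127
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    failing_codec = exc.encoding  # name of the codec that rejected the text

# UTF-8 can represent the character, so this succeeds.
utf8_bytes = trademark.encode("utf-8")
```

Fixing the locale works because it changes which codec Python picks for standard output, from the first case to the second.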
無奈伤痛

2020-11-21 05:09
Add the line below at the beginning of your script (or as the second line):
```
# -*- coding: utf-8 -*-
```
That line declares the encoding of the Python source file. More info in PEP 263.
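The declaration controls how the interpreter decodes the source file, which matters as soon as a literal contains non-ASCII characters. It does not change how output is encoded. A minimal sketch (in Python 3 the source encoding already defaults to UTF-8, so the first line is only required on Python 2):

```python
# -*- coding: utf-8 -*-
trademark = u"™"  # non-ASCII literal, decoded according to the declaration above

code_point = ord(trademark)  # the literal is a single character, U+2122
```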
孤城傲影

2020-11-21 05:09
Alas this works in Python 3 at least...

Python 3

Sometimes the error comes from the environment variables and the encoding, so:
```
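import os
import locale
# PYTHONIOENCODING is read at interpreter startup, so setting it here
# only affects child processes started from this script.
os.environ["PYTHONIOENCODING"] = "utf-8"
# Raises locale.Error if en_GB.UTF-8 is not installed on the system.
myLocale = locale.setlocale(category=locale.LC_ALL, locale="en_GB.UTF-8")
... 
# encode() returns bytes in Python 3, so this prints a bytes literal.
print(myText.encode('utf-8', errors='ignore'))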
```
Here errors='ignore' means that characters which cannot be encoded are silently dropped.
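`ignore` is one of several standard error handlers, and the choice only matters when the target codec cannot represent a character. A sketch comparing them against the ASCII codec:

```python
text = u"caf\xe9 \u2122"  # contains two characters outside ASCII

ignored = text.encode("ascii", errors="ignore")              # drop them
replaced = text.encode("ascii", errors="replace")            # substitute "?"
escaped = text.encode("ascii", errors="backslashreplace")    # Python escapes
xml_refs = text.encode("ascii", errors="xmlcharrefreplace")  # XML references
```

Since UTF-8 can encode every Unicode character except lone surrogates, the handler rarely comes into play when the target encoding is utf-8.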