Question
Here is a small example that reproduces the issue, using Python 2.7 and Django 1.10.8:
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, unicode_literals, print_function
import time
from django import setup
setup()
from django.contrib.auth.models import Group
group = Group(name='schön')
print(type(repr(group)))
print(type(str(group)))
print(type(unicode(group)))
print(group)
print(repr(group))
print(str(group))
print(unicode(group))
time.sleep(1.0)
print('%s' % group)
print('%r' % group) # fails
print('%s' % [group]) # fails
print('%r' % [group]) # fails
It exits with the following output and traceback:
$ python .PyCharmCE2017.2/config/scratches/scratch.py
<type 'str'>
<type 'str'>
<type 'unicode'>
schön
<Group: schön>
schön
schön
schön
Traceback (most recent call last):
  File "/home/srkunze/.PyCharmCE2017.2/config/scratches/scratch.py", line 22, in <module>
    print('%r' % group) # fails
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 11: ordinal not in range(128)
Does anybody have an idea what's going on here?
Answer 1:
The issue here is that you are interpolating UTF-8 bytestrings into a Unicode string. Your '%r' string is a Unicode string because you used from __future__ import unicode_literals, but repr(group) (used by the %r placeholder) returns a bytestring. For Django models, repr() can include Unicode data in the representation, encoded to a bytestring using UTF-8. Such representations are not ASCII safe.
For your specific example, repr() on your Group instance produces the bytestring '<Group: sch\xc3\xb6n>'. Interpolating that into a Unicode string triggers an implicit decode using the default ASCII codec:
>>> u'%s' % '<Group: sch\xc3\xb6n>'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 11: ordinal not in range(128)
Note that I did not use from __future__ import unicode_literals in my Python session, so the '<Group: sch\xc3\xb6n>' string is not a unicode object, it is a str bytestring object!
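To make the implicit step visible, here is a minimal sketch that performs the same ASCII decode explicitly. The variable names are illustrative only; the bytestring is the one from the example above. It is written so that it behaves the same way on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# The UTF-8 encoded repr from the example above, as an explicit bytestring.
raw = b'<Group: sch\xc3\xb6n>'

try:
    # This is what Python 2 does behind the scenes when a bytestring
    # is interpolated into a Unicode format string.
    raw.decode('ascii')
    outcome = 'decoded'
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that could not be decoded.
    outcome = 'failed at position %d' % exc.start

print(outcome)
```

The reported position is the offset of the 0xc3 byte, the first byte of the UTF-8 encoding of "ö".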
In Python 2, you should avoid mixing Unicode and byte strings. Always explicitly normalise your data (encoding Unicode to bytes or decoding bytes to Unicode).
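As a rough sketch of what explicit normalisation could look like, the helper below decodes bytestrings before they are interpolated into a Unicode string. The name to_text and the choice of UTF-8 as the default encoding are assumptions made for this illustration, not part of Django or the standard library.

```python
# -*- coding: utf-8 -*-
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytestrings, pass text through unchanged."""
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value

raw_repr = b'<Group: sch\xc3\xb6n>'

# Decode first, then interpolate: no implicit ASCII decode takes place.
message = u'%s' % to_text(raw_repr)
print(message == u'<Group: sch\xf6n>')
```

In the original script this would correspond to something like '%s' % to_text(repr(group)) instead of '%r' % group.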
If you must use from __future__ import unicode_literals, you can still create bytestrings by using a b prefix:
>>> from __future__ import unicode_literals
>>> type('') # empty unicode string
<type 'unicode'>
>>> type(b'') # empty bytestring, note the b prefix
<type 'str'>
>>> b'%s' % b'<Group: sch\xc3\xb6n>' # two bytestrings
'<Group: sch\xc3\xb6n>'
Answer 2:
I had a hard time finding a general solution to your problem. As I understand it, __repr__() is supposed to return a str, and any effort to change that seems to cause new problems.
Although the __repr__() method is defined outside your project, you can still override it. For example:
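from django.contrib.auth.models import Group

def new_repr(self):
    # Replacement representation built from the model's name field.
    return 'My representation of self {}'.format(self.name)

Group.add_to_class("__repr__", new_repr)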
The only solution I can find that works is to explicitly tell the interpreter how to handle the strings:
from __future__ import unicode_literals
from django.contrib.auth.models import Group
group = Group(name='schön')
print(type(repr(group)))
print(type(str(group)))
print(type(unicode(group)))
print(group)
print(repr(group))
print(str(group))
print(unicode(group))
print('%s' % group)
print('%r' % repr(group))
print('%s' % [str(group)])
print('%r' % [repr(group)])
# added
print('{}'.format([repr(group).decode("utf-8")]))
print('{}'.format([repr(group)]))
print('{}'.format(group))
Working with strings in Python 2.x is a mess. I hope this sheds some light on how to work around the problem, which is the only approach I can find.
Answer 3:
I think the real issue is in the Django code.
It was reported six years ago:
https://code.djangoproject.com/ticket/18063
I think a patch to Django would solve it:
def __repr__(self):
    return self.....encode('ascii', 'replace')
I think the repr() method should return 7-bit ASCII.
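To illustrate what an ASCII-only representation would look like, here is a small sketch that runs without Django. The helper name ascii_repr is hypothetical, and the 'backslashreplace' variant is an alternative added for comparison, not something the ticket or this answer proposes.

```python
# -*- coding: utf-8 -*-
def ascii_repr(text, errors='replace'):
    """Hypothetical helper: force text down to 7-bit ASCII."""
    # Encode with the given error handler, then decode back so the
    # result is text that contains only ASCII characters.
    return text.encode('ascii', errors).decode('ascii')

label = u'<Group: sch\xf6n>'

lossy = ascii_repr(label)                        # non-ASCII becomes '?'
escaped = ascii_repr(label, 'backslashreplace')  # non-ASCII becomes an escape

print(lossy)
print(escaped)
```

With 'replace' the original character is lost, while 'backslashreplace' keeps it recoverable at the cost of a less readable representation.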
Answer 4:
If that is the case, then we need to override the __unicode__ method with our own customised method. Try the code below. I have tested it and it works.
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
from django.contrib.auth.models import Group
def custom_unicode(self):
    return u"%s" % (self.name.encode('utf-8', 'ignore'))
Group.__unicode__ = custom_unicode
group = Group(name='schön')
# Tests
print(type(repr(group)))
print(type(str(group)))
print(type(unicode(group)))
print(group)
print(repr(group))
print(str(group))
print(unicode(group))
print('%s' % group)
print('%r' % group)
print('%s' % [group])
print('%r' % [group])
# output:
<type 'str'>
<type 'str'>
<type 'unicode'>
schön
<Group: schön>
schön
schön
schön
<Group: schön>
[<Group: schön>]
[<Group: schön>]
Reference: https://docs.python.org/2/howto/unicode.html
Answer 5:
I am not familiar with Django. Your issue seems to be that text which is actually Unicode is being represented as ASCII. Try the unidecode module for Python.
from unidecode import unidecode
# print(string) is replaced with
print(unidecode(string))
See: Unidecode
Source: https://stackoverflow.com/questions/46726926/unicodedecodeerror-using-django-and-format-strings
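If installing a third-party package is not an option, a rough approximation of the same idea can be sketched with the standard library alone. This is not how unidecode works internally, and the function name strip_accents is made up for this example: it only removes combining accents and silently drops any character that has no ASCII decomposition.

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_accents(text):
    """Hypothetical helper: crude ASCII approximation of Unicode text."""
    # NFKD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Dropping everything outside ASCII removes the combining marks.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

plain = strip_accents(u'sch\xf6n')
print(plain)
```

Like unidecode, this changes the data rather than fixing the mixing of bytestrings and Unicode strings, so it is only suitable where a lossy ASCII rendering is acceptable.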