Question

I have a model class that looks like the following:

class Address(models.Model):
    # taking length of address/city fields from existing UserProfile model
    address_1 = models.CharField(max_length=128,
                                 blank=False,
                                 null=False)

    address_2 = models.CharField(max_length=128,
                                 blank=True,
                                 null=True)

    address_3 = models.CharField(max_length=128,
                                 blank=True,
                                 null=True)

    unit = models.CharField(max_length=10,
                            blank=True,
                            null=True)

    city = models.CharField(max_length=128,
                            blank=False,
                            null=False)

    state_or_province = models.ForeignKey(StateOrProvince)

    postal_code = models.CharField(max_length=20,
                                   blank=False,
                                   null=False)

    phone = models.CharField(max_length=20,
                             blank=True,
                             null=True)

    is_deleted = models.BooleanField(default=False,
                                     null=False)

    def __unicode__(self):
        return u"{}, {} {}, {}".format(
            self.city, self.state_or_province.postal_abbrev, self.postal_code, self.address_1)

The key part is the __unicode__ method. I have a customer model with a foreign key field to this table, and I am doing the following logging:

log.debug(u'Generated customer [{}]'.format(vars(customer)))

This works fine, but if an address_1 field value contains a non-ASCII character, say

57562 Vån Ness Hwy

the system throws the following exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 345: ordinal not in range(128)

I tracked this down to a strange method in django/db/models/base.py:

def __repr__(self):
    try:
        u = six.text_type(self)
    except (UnicodeEncodeError, UnicodeDecodeError):
        u = '[Bad Unicode data]'
    return force_str('<%s: %s>' % (self.__class__.__name__, u))

As you can see, this method passes its result through force_str(), and that result doesn't get handled correctly. Is this a bug? If unicode() is being called on my object, shouldn't everything be unicode?

Answer 1:

According to the docs, when a Python object is passed as an argument to '{}'.format(obj),

A general convention is that an empty format string ("") [within the "{}"] produces the same result as if you had called str() on the value.

This means you're effectively calling str(vars(customer)), and vars(customer) returns a dict.

Calling str() on a dict will call repr() on its keys and values, because otherwise you'd get ambiguous output (e.g. str(1) == str('1') == '1', but repr(1) == '1' while repr('1') == "'1'"; see Difference between __str__ and __repr__ in Python).
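A minimal Python 3 sketch of this behaviour, using a plain class as a hypothetical stand-in for the Django model (no database involved; the class body and the customer_vars dict are illustrative assumptions). Formatting the dict goes through repr() of each value, while formatting the object directly goes through __str__:

```python
class Address:
    """Hypothetical stand-in for the Django model."""

    def __str__(self):
        return "57562 Van Ness Hwy"

    def __repr__(self):
        return "<Address: repr() was used>"


# Hypothetical stand-in for vars(customer)
customer_vars = {"name": "Jane", "address": Address()}

# Formatting the dict calls str(dict), which uses repr() for keys and values
print("{}".format(customer_vars))
# {'name': 'Jane', 'address': <Address: repr() was used>}

# Formatting the object itself uses __str__
print("{}".format(customer_vars["address"]))
# 57562 Van Ness Hwy

# The ambiguity that repr() avoids
print(str(1) == str("1"))  # True
print(repr(1), repr("1"))  # 1 '1'
```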

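Therefore repr() is still being called on your Address. Django's Model.__repr__ passes its result through force_str(), so on Python 2 it returns a UTF-8 encoded byte string; when that byte string ends up inside your u'...' format string, Python 2 implicitly decodes it as ASCII, which fails on the first non-ASCII byte (0xc3, the first byte of å in UTF-8).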
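Python 3 no longer performs that implicit ASCII decode, so the sketch below makes the step explicit to reproduce the same kind of error. The text value is a hypothetical repr() result, and encoding it to UTF-8 only approximates what force_str() produced on Python 2:

```python
# Hypothetical repr() text for an Address with a non-ASCII street name
text = "<Address: 57562 V\u00e5n Ness Hwy>"

# Roughly what force_str() produced on Python 2: UTF-8 encoded bytes
repr_bytes = text.encode("utf-8")

error = None
try:
    # Python 2 did this ASCII decode implicitly when a byte string was
    # formatted into a unicode string
    repr_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; "exc" is cleared after the except block

print(error)
# 'ascii' codec can't decode byte 0xc3 in position 17: ordinal not in range(128)

# Decoding with the codec that produced the bytes works
print(repr_bytes.decode("utf-8"))
```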

Returning unicode from repr() is not allowed in Python 2 (see https://stackoverflow.com/a/3627835/648176), so you'll need to either override __str__() in your model so that it handles the conversion to ASCII itself (Django docs), or do something like:

string_dict = {str(k): str(v) for (k, v) in vars(customer).items()}
log.debug(u'Generated customer [{}]'.format(string_dict))

Answer 2:

Try decoding the field as UTF-8 to handle the non-ASCII characters:

def __unicode__(self):
    return u"{}, {} {}, {}".format(
        self.city, self.state_or_province.postal_abbrev, self.postal_code, self.address_1.decode('utf-8'))
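Note that .decode('utf-8') only helps when address_1 actually holds a byte string. Django returns text for CharField values, and calling decode() on text either fails (Python 2 first encodes it as ASCII) or is not available at all (Python 3 str has no decode()). A hedged Python 3 sketch of a guard that decodes only bytes; the helper to_text and the sample values are hypothetical, and Django's django.utils.encoding.force_text played a similar role on Python 2:

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical field values: address_1 arrives as UTF-8 bytes, the rest as text
city, abbrev, postal_code = "Springfield", "IL", "62701"
address_1 = b"57562 V\xc3\xa5n Ness Hwy"

label = u"{}, {} {}, {}".format(
    city, abbrev, postal_code, to_text(address_1))
print(label)  # Springfield, IL 62701, 57562 Vån Ness Hwy
```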

Answer 3:

This is more of a hack than a pretty answer, but I'll still throw my two cents into the pile. Just subclass the logging.Handler you are using and override its emit method (if it is the one raising the exceptions).

Pros

Very easy to set up. After setup, no action is required for any model or data.

Cons

The result is that there will be no UnicodeErrors, but wherever there was a non-ASCII character the log file will contain strange-looking strings starting with a backslash. For example, 🦄 will turn into '\xf0\x9f\xa6\x84'. Perhaps you could use a script to translate the '\xf0\x9f\xa6\x84' sequences back to Unicode inside the log file when needed.
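Such a script could look like the following minimal Python 3 sketch. The function names escape_like_handler and unescape_log_line are hypothetical; the first repeats the transformation used in the handler's fallback branch, and the second assumes every logged line went through that branch:

```python
def escape_like_handler(text):
    # Same as the handler's fallback: str() of the UTF-8 bytes gives "b'...'"
    # with non-ASCII bytes escaped; [2:-1] strips the b prefix and the quotes
    return str(text.encode("utf-8"))[2:-1]


def unescape_log_line(line):
    # Turn escapes like \xf0 back into single code points 0-255, map those
    # code points 1:1 back to bytes via latin-1, then decode the bytes as UTF-8
    return (line.encode("ascii")
                .decode("unicode_escape")
                .encode("latin-1")
                .decode("utf-8"))


original = "57562 V\u00e5n Ness Hwy \U0001f984"
escaped = escape_like_handler(original)
print(escaped)  # 57562 V\xc3\xa5n Ness Hwy \xf0\x9f\xa6\x84
print(unescape_log_line(escaped) == original)  # True
```

Because repr() also escapes quotes, backslashes and newlines, unicode_escape restores those as well, so the round trip holds for arbitrary message text.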

The steps are:

1) Create a module "custom_logging.py" that your settings.py can refer to:

from logging import FileHandler


class Utf8FileHandler(FileHandler):
    r"""
    This is a hack-around version of logging.FileHandler.

    Prevents errors of the type
    UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f984' in position 150: character maps to <undefined>
    """

    def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing newline.  If
        exception information is present, it is formatted using
        traceback.print_exception and appended to the stream.  If the stream
        has an 'encoding' attribute, it is used to determine how to do the
        output to the stream.
        """
        try:
            # Normal path, as in logging.StreamHandler.emit.
            msg = self.format(record)
            stream = self.stream
            stream.write(msg)
            stream.write(self.terminator)
            self.flush()
        except Exception:
            # The hack: str() of the UTF-8 bytes gives "b'...'" with every
            # non-ASCII byte escaped (e.g. \xf0\x9f\xa6\x84); [2:-1] strips
            # the leading b' and the trailing quote.
            try:
                stream.write(str(msg.encode('utf-8'))[2:-1])
                stream.write(self.terminator)
                self.flush()
            # End of the hack.
            except Exception:
                self.handleError(record)

2) In your settings.py, use your custom file handler by setting LOGGING['handlers']['file']['class'] to the dotted path of the class in the custom_logging module, like this:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
        },
    },
    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'config.custom_logging.Utf8FileHandler',
            'filename': secrets['DJANGO_LOG_FILE'],
            'formatter': 'verbose',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['file'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}
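A different route, swapped in here rather than taken from the answer above and assuming Python 3: logging.FileHandler accepts an encoding argument, and dictConfig passes extra handler keys such as 'encoding' to the handler's constructor, so writing the log file as UTF-8 avoids the 'charmap' error without escaping anything. The logger name "demo" and the temporary log path are hypothetical stand-ins for the 'django' logger and secrets['DJANGO_LOG_FILE']:

```python
import logging
import logging.config
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "django.log")  # hypothetical path

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {"plain": {"format": "%(message)s"}},
    "handlers": {
        "file": {
            "level": "DEBUG",
            "class": "logging.FileHandler",  # stock handler, no subclass
            "filename": log_path,
            "encoding": "utf-8",  # passed through to FileHandler(encoding=...)
            "formatter": "plain",
        },
    },
    "loggers": {
        "demo": {"handlers": ["file"], "level": "DEBUG", "propagate": False},
    },
})

log = logging.getLogger("demo")
log.debug("Generated customer [%s]", "57562 V\u00e5n Ness Hwy \U0001f984")
for handler in log.handlers:
    handler.close()  # flush and close the log file

with open(log_path, encoding="utf-8") as f:
    content = f.read()
print(content)  # Generated customer [57562 Vån Ness Hwy 🦄]
```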

Source: https://stackoverflow.com/questions/32193797/django-model-unicode-raising-exception-when-logging

Tags

django