OverflowError: Unsupported UTF-8 sequence length when encoding string

Question

Inside a Twisted Resource, I am returning a JSON-encoded dict (the response variable below) as the response. The data is a list of 5 people with name, guid, and a couple of other fields, each less than 32 characters long, so not a ton of data.

I get this OverflowError exception pretty often, but I don't quite understand what the unsupported utf-8 sequence length refers to.

self.request.write(ujson.dumps(response))

exceptions.OverflowError: Unsupported UTF-8 sequence length when encoding string

Answer 1:

When in doubt, check the source: http://code.google.com/p/rapidjson/source/browse/trunk/thirdparty/ultrajson/ultrajsonenc.c

This error happens when the UTF-8 length is 5 or 6 bytes. This JSON implementation doesn't implement that. Those characters won't work if you're using the data in a browser anyway, since they're outside the range of UTF-16.

I'd be surprised if this actually happened often; it would only happen with code points above U+1FFFFF, which lie beyond the Unicode maximum of U+10FFFF and can't even be stored in Unicode strings by most builds of Python. You should find out why these characters are showing up in your data.
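To make the byte-length argument concrete, here is a small sketch in Python 3 of how the original UTF-8 scheme maps code point ranges to sequence lengths. The function name utf8_sequence_length is made up for this illustration and is not part of ujson; it only mirrors the range boundaries discussed above.

```python
def utf8_sequence_length(code_point):
    """Bytes needed for a code point under the original 1- to 6-byte UTF-8 scheme.

    Hypothetical helper for illustration only; not part of ujson.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Upper bound of each range, paired with the sequence length it needs.
    limits = [
        (0x7F, 1),
        (0x7FF, 2),
        (0xFFFF, 3),
        (0x1FFFFF, 4),
        (0x3FFFFFF, 5),
        (0x7FFFFFFF, 6),
    ]
    for upper_bound, length in limits:
        if code_point <= upper_bound:
            return length
    raise ValueError("value does not fit in a 6-byte UTF-8 sequence")


# The largest valid Unicode code point still fits in 4 bytes,
# and Python's own encoder agrees.
highest = 0x10FFFF
assert utf8_sequence_length(highest) == 4
assert len(chr(highest).encode("utf-8")) == 4

# Only values above U+1FFFFF would need the 5- or 6-byte forms.
assert utf8_sequence_length(0x200000) == 5
```

Since no valid Unicode character needs more than 4 bytes, hitting the 5- or 6-byte case usually suggests the encoder is reading something that is not really a string.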

Answer 2:

Just a note that I recently encountered this same error, and can give a little background.

If you see this, it's possible you're trying to JSON-encode a Mongo object with ujson in Python.

Using Python's built-in json library, we get a more helpful error message:

TypeError: ObjectId('510652d322fc956ca9e41342') is not JSON serializable

ujson is somehow trying to parse an ObjectId Python object and getting lost. There are a few options, the most direct being removing the '_id' field from the Mongo document before encoding it. You could also wrap the ujson call to convert the ObjectIds into simple character strings first.
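One way to do that conversion, sketched under a few assumptions: the helper to_jsonable and the FakeObjectId class are invented for this example (FakeObjectId stands in for the real ObjectId so the snippet runs without pymongo installed), the "name" value is made-up sample data, and the standard json module is used for the final call. The cleaned structure contains only plain types, so it should be equally safe to hand to ujson.dumps.

```python
import json


class FakeObjectId:
    """Hypothetical stand-in for Mongo's ObjectId; str() yields the hex id."""

    def __init__(self, hex_string):
        self._hex = hex_string

    def __str__(self):
        return self._hex


# Types that JSON encoders handle without any help.
JSON_NATIVE = (str, int, float, bool, type(None))


def to_jsonable(value):
    """Recursively replace anything that is not JSON-native with str(value)."""
    if isinstance(value, JSON_NATIVE):
        return value
    if isinstance(value, dict):
        return {str(key): to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    # Fallback for ObjectId-like values: use their string form.
    return str(value)


# Sample document with made-up data.
doc = {"_id": FakeObjectId("510652d322fc956ca9e41342"), "name": "Alice"}
encoded = json.dumps(to_jsonable(doc), sort_keys=True)
print(encoded)
```

If the '_id' value is not needed by the client at all, dropping the key before encoding is simpler than converting it.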

Source: https://stackoverflow.com/questions/8422243/overflowerror-unsupported-utf-8-sequence-length-when-encoding-string

Tags

python

twisted