Why does Python's json.dumps complain about ASCII decoding?

Submitted by 流过昼夜 on 2019-12-23 15:11:29

Question


I have the following lines in my code:

import codecs
import json
import sys

outs = codecs.getwriter('utf-8')(sys.stdout)
# dJSON contains JSON message with non-ASCII chars
outs.write(json.dumps(dJSON,encoding='utf-8', ensure_ascii=False, indent=indent_val))

I am getting the following exception:

    outs.write(json.dumps(dJSON,encoding='utf-8', ensure_ascii=False, indent=indent_val))
    File "/usr/lib/python2.7/json/__init__.py", line 238, in dumps
         **kw).encode(obj)
    File "/usr/lib/python2.7/json/encoder.py", line 204, in encode
         return ''.join(chunks)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 27: ordinal not in range(128)

I thought that by specifying encoding='utf-8' in the json.dumps call, I would avoid this type of problem. Why am I still getting the error?


Answer 1:


My guess is that the dJSON object is not pure unicode: it mixes unicode strings with byte strings that are already encoded as UTF-8. For example, this fails:

>>> d = {u'name':u'पाइथन'.encode('utf-8')}
>>> json.dumps(d, encoding='utf-8', ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 204, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 1: ordinal not in range(128)

But this works (everything is unicode):

>>> d = {u'name':u'पाइथन'}
>>> json.dumps(d, encoding='utf-8', ensure_ascii=False)
u'{"name": "\u092a\u093e\u0907\u0925\u0928"}'

This also works (everything is a byte string):

>>> d = {'name':u'पाइथन'.encode('utf-8')}
>>> json.dumps(d, encoding='utf-8', ensure_ascii=False)
'{"name": "\xe0\xa4\xaa\xe0\xa4\xbe\xe0\xa4\x87\xe0\xa4\xa5\xe0\xa4\xa8"}'
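The mixed case fails at the `''.join(chunks)` line shown in the traceback: to join unicode chunks with byte-string chunks, Python 2 first decodes the bytes with its default ASCII codec. The sketch below performs that decode explicitly so the failure can be seen in isolation. The variable names are illustrative only, and the text is written with escapes so the snippet does not depend on the source file encoding.

```python
# The same text as in the examples above, written with \u escapes.
text = u'\u092a\u093e\u0907\u0925\u0928'
utf8_bytes = text.encode('utf-8')

# Decoding UTF-8 bytes as ASCII is what Python 2 attempts implicitly
# when unicode and byte strings are joined together.
try:
    utf8_bytes.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the codec the bytes were actually written in succeeds.
round_trip = utf8_bytes.decode('utf-8')
```

When every chunk is unicode, or every chunk is a byte string, no implicit decode is needed, which is why the two uniform examples work.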



Answer 2:


There is a workaround: pass encoding='utf8' (not 'utf-8'!) to json.dumps. This forces every byte string to be decoded to unicode first, so you can mix unicode strings with strings already encoded as UTF-8. Why does it work? Because of this code in the source of JSONEncoder:

if self.encoding != 'utf-8':
    def _encoder(o, _orig_encoder=_encoder, _encoding=self.encoding):
        if isinstance(o, str):
            o = o.decode(_encoding)
        return _orig_encoder(o)

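That is exactly what we need, but it does not happen out of the box: the default encoding is the literal string 'utf-8', so this wrapper is skipped. When we change the encoding to 'utf8' (an alias for exactly the same UTF-8 codec), the string comparison no longer matches, this _encoder gets defined, and everything works just fine :)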
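The trick depends on the gap between comparing encoding names as strings and resolving them as codecs. A small sketch of that difference, using only `codecs.lookup` from the standard library (the variable names are illustrative):

```python
import codecs

# Both spellings resolve to the same codec...
same_codec = codecs.lookup('utf8').name == codecs.lookup('utf-8').name

# ...but a plain string comparison, like the one in the quoted
# JSONEncoder snippet, treats them as different.
same_string = ('utf8' == 'utf-8')
```

Here `same_codec` is true while `same_string` is false, which is why the spelling 'utf8' takes the decoding branch even though the codec is unchanged.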




Answer 3:


As the previous answer says, you can work around this by using utf8 instead of utf-8, but that answer does not include a copy-paste fix.

Here is the copy-paste fix ;P

your_unicode_result = json.dumps(your_dict, encoding="utf8", ensure_ascii=False)
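An alternative that does not rely on how the encoding name is spelled is to decode every byte string yourself before calling json.dumps, so the structure is all unicode (the case that works in the first answer). This is only a minimal sketch: `to_text` is a hypothetical helper, not part of the json module, and it assumes all byte strings in the data are UTF-8 and that the data contains only dicts, lists, tuples and scalars.

```python
import json


def to_text(obj, encoding='utf-8'):
    """Recursively decode byte strings so the structure holds only text."""
    if isinstance(obj, bytes):  # bytes is an alias of str on Python 2
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {to_text(k, encoding): to_text(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_text(item, encoding) for item in obj]
    return obj


# A unicode key with a UTF-8 encoded byte-string value: the mix that fails.
mixed = {u'name': u'\u092a\u093e\u0907\u0925\u0928'.encode('utf-8')}
result = json.dumps(to_text(mixed), ensure_ascii=False)
```

Because the `encoding` argument is not passed to json.dumps at all, the normalization step does not depend on the encoder's internals.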



Source: https://stackoverflow.com/questions/18990021/why-python-json-dumps-complains-about-ascii-decoding
