I have a JSON file which happens to have a multitude of Chinese and Japanese (and other language) characters. I'm loading it into my Python 2.7 script using io.open
The JSON module handles encoding and decoding for you, so you can simply open the input and output files in binary mode. The module assumes UTF-8, but this can be changed via the encoding argument of the load() and dump() methods.
import json

with open('multiIdName.json', 'rb') as json_data:
    cards = json.load(json_data)
then:
with open("testJson.json", 'wb') as outfile:
json.dump(cards, outfile, ensure_ascii=False)
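If the file is not UTF-8, a minimal sketch of that encoding argument, assuming a hypothetical Latin-1-encoded file named latin1Cards.json:

import json

# Hypothetical file encoded as Latin-1 rather than the default UTF-8.
with open('latin1Cards.json', 'rb') as json_data:
    cards = json.load(json_data, encoding='latin-1')

# On dumps()/dump(), encoding names the charset of byte strings inside
# the object being serialised (here a str literal with a Latin-1 byte).
payload = json.dumps({'name': 'caf\xe9'}, encoding='latin-1')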
Thanks to @Antti Haapala: the Python 2.x json module returns either unicode or str depending on the contents of the object. You will have to add a sanity check to ensure the result is unicode before writing it through io:
with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
my_json_str = json.dumps(my_obj, ensure_ascii=False)
if isinstance(my_json_str, str):
my_json_str = my_json_str.decode("utf-8")
outfile.write(my_json_str)
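If you write files like this in more than one place, the same check can live in a small helper; a sketch, with the name write_json_utf8 made up for illustration:

import io
import json

def write_json_utf8(obj, path):
    # Normalise the str/unicode return type of Python 2's json.dumps()
    # before handing the result to a text-mode stream.
    json_str = json.dumps(obj, ensure_ascii=False)
    if isinstance(json_str, str):
        json_str = json_str.decode('utf-8')  # byte string: decode first
    with io.open(path, 'w', encoding='utf-8') as outfile:
        outfile.write(json_str)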
The reason for this error is the completely stupid behaviour of json.dumps in Python 2:
>>> json.dumps({'a': 'a'}, ensure_ascii=False)
'{"a": "a"}'
>>> json.dumps({'a': u'a'}, ensure_ascii=False)
u'{"a": "a"}'
>>> json.dumps({'a': 'ä'}, ensure_ascii=False)
'{"a": "\xc3\xa4"}'
>>> json.dumps({u'a': 'ä'}, ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 210, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
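The failure happens inside ''.join(chunks): the encoder emits a mix of unicode and byte-string chunks, and joining them forces an implicit ASCII decode of the UTF-8 bytes '\xc3\xa4'.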
This, coupled with the fact that io.open with encoding set only accepts unicode objects (which by itself is right), leads to problems.
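A quick sketch of that constraint (the exact TypeError message may vary across interpreter versions):

import io

with io.open('demo.txt', 'w', encoding='utf-8') as f:
    f.write(u'ok')         # unicode is accepted
    try:
        f.write('bytes')   # a byte string is rejected by the text layer
    except TypeError as exc:
        print(exc)         # e.g. "can't write str to text stream"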
With ensure_ascii=False, the return type depends entirely on the types of the keys and values in the dictionary, whereas str is always returned with ensure_ascii=True. If 8-bit strings can end up in your dictionaries by accident, you cannot blindly convert the return value to unicode; you need to specify the encoding, presumably UTF-8:
>>> x = json.dumps(obj, ensure_ascii=False)
>>> if isinstance(x, str):
...     x = unicode(x, 'UTF-8')
In this case I believe you can use json.dump to write to an open binary file; however, if you need to do something more complicated with the resulting object, you probably need the above code.
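For the binary-file route, the variant that is always safe is the default ensure_ascii=True, because every chunk json.dump() then writes is plain ASCII; a sketch (non-ASCII characters come out as \uXXXX escapes rather than raw UTF-8):

import json

with open('testJson.json', 'wb') as outfile:
    json.dump(cards, outfile)  # ensure_ascii=True by default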
One solution is to end all this encoding/decoding madness by switching to Python 3.
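For comparison, the Python 3 version collapses to a few lines, because json.dumps() always returns str (text) and open() accepts an encoding directly:

import json

with open('testJson.json', 'w', encoding='utf-8') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)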
Can you try the following?
with io.open("testJson.json",'w',encoding="utf-8") as outfile:
outfile.write(unicode(json.dumps(cards, ensure_ascii=False)))
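Note that unicode() without an explicit encoding falls back to ASCII, so this line raises a UnicodeDecodeError whenever json.dumps returns a byte string containing non-ASCII characters; the isinstance check from the answer above avoids that.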