I have a dictionary of dictionaries whose keys are UTF-8 encoded. I am dumping this dictionary to a file using the json module.
In the file, the keys are printed as escaped code points instead of the actual characters.
You are writing JSON; the JSON standard allows \uxxxx escape sequences to encode non-ASCII characters, and the Python json module uses them by default.
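A minimal check of that claim, assuming Python 3 (U+0982 is just an example character): the default output spells the character as a \uXXXX escape, contains only ASCII, and still decodes back to the original character, so the escapes are valid JSON rather than corruption.

```python
import json

text = u'\u0982'  # example non-ASCII code point (BENGALI SIGN ANUSVARA)

escaped = json.dumps(text)  # default behaviour: ensure_ascii=True
print(escaped)  # "\u0982"  (the character is spelled as an escape sequence)

# The default output contains only ASCII characters...
assert all(ord(ch) < 128 for ch in escaped)
# ...and it is ordinary JSON: decoding it restores the original character.
assert json.loads(escaped) == text
```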
Switch this off by passing ensure_ascii=False when dumping the data:
json.dump(obj, yourfileobject, ensure_ascii=False)
This does mean that the output is no longer encoded to UTF-8 bytes for you; use a file opened with codecs.open() so the text is encoded on the way to disk:
import json
import codecs
with codecs.open('/path/to/file', 'w', encoding='utf8') as output:
    json.dump(obj, output, ensure_ascii=False)
Now your Unicode characters are written to the file as UTF-8 encoded bytes instead of escape sequences. Any program that decodes the file as UTF-8 will display your code points as the original characters again.
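On Python 3 the built-in open() accepts an encoding argument, so codecs.open() is not strictly needed. The sketch below assumes Python 3; the sample dictionary and the temporary file path are hypothetical stand-ins for your own data and /path/to/file. It writes the file, reads the raw bytes back to confirm they are UTF-8 rather than \uxxxx escapes, and then decodes the file again to show the data survives the round trip.

```python
import json
import os
import tempfile

# Hypothetical dictionary of dictionaries with Bengali keys (U+0982, U+0995).
obj = {u'\u0982': {u'\u0995': 1}}

# Throwaway location standing in for '/path/to/file'.
path = os.path.join(tempfile.mkdtemp(), 'data.json')

# Python 3's open() encodes the text to UTF-8 as it is written.
with open(path, 'w', encoding='utf-8') as output:
    json.dump(obj, output, ensure_ascii=False)

# On disk the characters are stored as UTF-8 bytes, not as \uXXXX escapes.
with open(path, 'rb') as f:
    raw = f.read()
assert u'\u0982'.encode('utf-8') in raw
assert b'\\u0982' not in raw

# A reader that decodes the file as UTF-8 gets the original data back.
with open(path, encoding='utf-8') as f:
    assert json.load(f) == obj
```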
Use the ensure_ascii parameter:
>>> import json
>>> print(json.dumps(u'\u0982'))
"\u0982"
>>> print(json.dumps(u'\u0982', ensure_ascii=False))
"ং"
From http://docs.python.org/2/library/json.html#json.dump:
If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the result is a str instance consisting of ASCII characters only. If ensure_ascii is False, some chunks written to fp may be unicode instances. ...