sample code:
>>> import json
>>> json_string = json.dumps(\"ברי צקלה\")
>>> print json_string
\"\\u05d1\\u05e8\\u05d9 \\u05e6\\u05
Using ensure_ascii=False in json.dumps is the right direction to solve this problem, as pointed out by Martijn. However, this may raise an exception:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1: ordinal not in range(128)
You need extra settings in either site.py or sitecustomize.py to set your sys.getdefaultencoding() correct. site.py is under lib/python2.7/ and sitecustomize.py is under lib/python2.7/site-packages.
If you want to use site.py, under def setencoding(): change the first if 0: to if 1: so that python will use your operation system's locale.
If you prefer to use sitecustomize.py, which may not exist if you haven't created it. simply put these lines:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Then you can do some Chinese json output in utf-8 format, such as:
name = {"last_name": u"王"}
json.dumps(name, ensure_ascii=False)
You will get an utf-8 encoded string, rather than \u escaped json string.
To verify your default encoding:
print sys.getdefaultencoding()
You should get "utf-8" or "UTF-8" to verify your site.py or sitecustomize.py settings.
Please note that you could not do sys.setdefaultencoding("utf-8") at interactive python console.
Peters' python 2 workaround fails on an edge case:
d = {u'keyword': u'bad credit \xe7redit cards'}
with io.open('filename', 'w', encoding='utf8') as json_file:
data = json.dumps(d, ensure_ascii=False).decode('utf8')
try:
json_file.write(data)
except TypeError:
# Decode data to Unicode first
json_file.write(data.decode('utf8'))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 25: ordinal not in range(128)
It was crashing on the .decode('utf8') part of line 3. I fixed the problem by making the program much simpler by avoiding that step as well as the special casing of ascii:
with io.open('filename', 'w', encoding='utf8') as json_file:
data = json.dumps(d, ensure_ascii=False, encoding='utf8')
json_file.write(unicode(data))
cat filename
{"keyword": "bad credit çredit cards"}
As of Python 3.7 the following code works fine:
from json import dumps
result = {"symbol": "ƒ"}
json_string = dumps(result, sort_keys=True, indent=2, ensure_ascii=False)
print(json_string)
Output:
{"symbol": "ƒ"}
To write to a file
import codecs
import json
with codecs.open('your_file.txt', 'w', encoding='utf-8') as f:
json.dump({"message":"xin chào việt nam"}, f, ensure_ascii=False)
To print to stdout
import json
print(json.dumps({"message":"xin chào việt nam"}, ensure_ascii=False))
Here's my solution using json.dump():
def jsonWrite(p, pyobj, ensure_ascii=False, encoding=SYSTEM_ENCODING, **kwargs):
with codecs.open(p, 'wb', 'utf_8') as fileobj:
json.dump(pyobj, fileobj, ensure_ascii=ensure_ascii,encoding=encoding, **kwargs)
where SYSTEM_ENCODING is set to:
locale.setlocale(locale.LC_ALL, '')
SYSTEM_ENCODING = locale.getlocale()[1]
If you are loading JSON string from a file & file contents arabic texts. Then this will work.
{
"key1" : "لمستخدمين",
"key2" : "إضافة مستخدم"
}
with open(arabic.json, encoding='utf-8') as f:
# deserialises it
json_data = json.load(f)
f.close()
# json formatted string
json_data2 = json.dumps(json_data, ensure_ascii = False)
# If have to get the JSON index in Django Template file, then simply decode the encoded string.
json.JSONDecoder().decode(json_data2)
done! Now we can get the results as JSON index with arabic value.