sample code:
>>> import json
>>> json_string = json.dumps(\"ברי צקלה\")
>>> print json_string
\"\\u05d1\\u05e8\\u05d9 \\u05e6\\u05
UPDATE: This is wrong answer, but it's still useful to understand why it's wrong. See comments.
How about unicode-escape
?
>>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
>>> json_str = json.dumps(d).decode('unicode-escape').encode('utf8')
>>> print json_str
{"1": "ברי צקלה", "2": "ברי צקלה"}
Use codecs if possible,
with codecs.open('file_path', 'a+', 'utf-8') as fp:
fp.write(json.dumps(res, ensure_ascii=False))
The following is my understanding var reading answer above and google.
# coding:utf-8
r"""
@update: 2017-01-09 14:44:39
@explain: str, unicode, bytes in python2to3
#python2 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
#1.reload
#importlib,sys
#importlib.reload(sys)
#sys.setdefaultencoding('utf-8') #python3 don't have this attribute.
#not suggest even in python2 #see:http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
#2.overwrite /usr/lib/python2.7/sitecustomize.py or (sitecustomize.py and PYTHONPATH=".:$PYTHONPATH" python)
#too complex
#3.control by your own (best)
#==> all string must be unicode like python3 (u'xx'|b'xx'.encode('utf-8')) (unicode 's disappeared in python3)
#see: http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes
#how to Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence
#http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
"""
from __future__ import print_function
import json
a = {"b": u"中文"} # add u for python2 compatibility
print('%r' % a)
print('%r' % json.dumps(a))
print('%r' % (json.dumps(a).encode('utf8')))
a = {"b": u"中文"}
print('%r' % json.dumps(a, ensure_ascii=False))
print('%r' % (json.dumps(a, ensure_ascii=False).encode('utf8')))
# print(a.encode('utf8')) #AttributeError: 'dict' object has no attribute 'encode'
print('')
# python2:bytes=str; python3:bytes
b = a['b'].encode('utf-8')
print('%r' % b)
print('%r' % b.decode("utf-8"))
print('')
# python2:unicode; python3:str=unicode
c = b.decode('utf-8')
print('%r' % c)
print('%r' % c.encode('utf-8'))
"""
#python2
{'b': u'\u4e2d\u6587'}
'{"b": "\\u4e2d\\u6587"}'
'{"b": "\\u4e2d\\u6587"}'
u'{"b": "\u4e2d\u6587"}'
'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
'\xe4\xb8\xad\xe6\x96\x87'
u'\u4e2d\u6587'
u'\u4e2d\u6587'
'\xe4\xb8\xad\xe6\x96\x87'
#python3
{'b': '中文'}
'{"b": "\\u4e2d\\u6587"}'
b'{"b": "\\u4e2d\\u6587"}'
'{"b": "中文"}'
b'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
b'\xe4\xb8\xad\xe6\x96\x87'
'中文'
'中文'
b'\xe4\xb8\xad\xe6\x96\x87'
"""
Thanks for the original answer here. With python 3 the following line of code:
print(json.dumps(result_dict,ensure_ascii=False))
was ok. Consider trying not writing too much text in the code if it's not imperative.
This might be good enough for the python console. However, to satisfy a server you might need to set the locale as explained here (if it is on apache2) http://blog.dscpl.com.au/2014/09/setting-lang-and-lcall-when-using.html
basically install he_IL or whatever language locale on ubuntu check it is not installed
locale -a
install it where XX is your language
sudo apt-get install language-pack-XX
For example:
sudo apt-get install language-pack-he
add the following text to /etc/apache2/envvrs
export LANG='he_IL.UTF-8'
export LC_ALL='he_IL.UTF-8'
Than you would hopefully not get python errors on from apache like:
print (js) UnicodeEncodeError: 'ascii' codec can't encode characters in position 41-45: ordinal not in range(128)
Also in apache try to make utf the default encoding as explained here:
How to change the default encoding to UTF-8 for Apache?
Do it early because apache errors can be pain to debug and you can mistakenly think it's from python which possibly isn't the case in that situation
Use the ensure_ascii=False
switch to json.dumps()
, then encode the value to UTF-8 manually:
>>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
>>> print(json_string.decode())
"ברי צקלה"
If you are writing to a file, just use json.dump()
and leave it to the file object to encode:
with open('filename', 'w', encoding='utf8') as json_file:
json.dump("ברי צקלה", json_file, ensure_ascii=False)
Caveats for Python 2
For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open()
to produce a file object that encodes Unicode values for you as you write, then use json.dump()
instead to write to that file:
with io.open('filename', 'w', encoding='utf8') as json_file:
json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
Do note that there is a bug in the json module where the ensure_ascii=False
flag can produce a mix of unicode
and str
objects. The workaround for Python 2 then is:
with io.open('filename', 'w', encoding='utf8') as json_file:
data = json.dumps(u"ברי צקלה", ensure_ascii=False)
# unicode(data) auto-decodes data to unicode if str
json_file.write(unicode(data))
In Python 2, when using byte strings (type str
), encoded to UTF-8, make sure to also set the encoding
keyword:
>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}
>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
>>> json.loads(s)['1']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> json.loads(s)['2']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה
>>>import json
>>>json_string = json.dumps("ברי צקלה")
>>>json_string.encode('ascii').decode('unicode-escape')
'"ברי צקלה"'
>>>s = '漢 χαν хан'
>>>print('unicode: ' + s.encode('unicode-escape').decode('utf-8'))
unicode: \u6f22 \u03c7\u03b1\u03bd \u0445\u0430\u043d
>>>u = s.encode('unicode-escape').decode('utf-8')
>>>print('original: ' + u.encode("utf-8").decode('unicode-escape'))
original: 漢 χαν хан
original resource:https://blog.csdn.net/chuatony/article/details/72628868