Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence

说谎 2020-11-21 23:25

sample code:

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"


        
12 answers
  • 2020-11-21 23:59

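    UPDATE: This answer is wrong, but it is still useful to understand why. Decoding with unicode-escape also unescapes JSON's own escapes such as \" and \n, so the result may no longer be valid JSON. Use ensure_ascii=False instead.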

    How about unicode-escape?

    >>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
    >>> json_str = json.dumps(d).decode('unicode-escape').encode('utf8')
    >>> print json_str
    {"1": "ברי צקלה", "2": "ברי צקלה"}
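
    To see why this approach is unsafe, here is a minimal Python 3 sketch. The sample value is made up for illustration; it contains a quote and a newline, which is enough to break the unicode-escape round trip while ensure_ascii=False keeps the output valid:

```python
import json

# Hypothetical sample value containing a double quote and a newline.
value = {"text": 'say "שלום"\n'}

escaped = json.dumps(value)  # ASCII-only JSON with \uXXXX escapes

# The approach from this answer, adapted to Python 3.
broken = escaped.encode("ascii").decode("unicode-escape")

try:
    json.loads(broken)
    broken_is_valid = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    broken_is_valid = False

# The safe alternative: let json.dumps emit the characters directly.
readable = json.dumps(value, ensure_ascii=False)
round_trip_ok = json.loads(readable) == value

print(broken_is_valid, round_trip_ok)
```

    The unicode-escape codec treats \" and \n as Python string escapes, so the quotes inside the value end up unescaped and the JSON structure is lost.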
    
  • 2020-11-22 00:08

    Use codecs if possible:

    import codecs
    import json

    with codecs.open('file_path', 'a+', 'utf-8') as fp:
        fp.write(json.dumps(res, ensure_ascii=False))
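
    Because the file is opened in append mode, repeated calls write JSON documents back to back, which a parser cannot read as one document. A minimal Python 3 sketch, assuming one record per line is acceptable (the records and the temporary path are hypothetical):

```python
import json
import os
import tempfile

records = [{"name": "ברי צקלה"}, {"name": "中文"}]  # hypothetical data
path = os.path.join(tempfile.mkdtemp(), "records.jsonl")  # hypothetical path

for res in records:
    # In Python 3 the built-in open() accepts encoding=, like codecs.open().
    with open(path, "a", encoding="utf-8") as fp:
        # A trailing newline keeps each appended document on its own line.
        fp.write(json.dumps(res, ensure_ascii=False) + "\n")

# Read the file back, parsing one JSON document per line.
with open(path, encoding="utf-8") as fp:
    loaded = [json.loads(line) for line in fp]
```

    Each line is a complete JSON document, so the file stays readable by eye and parseable line by line.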
    
  • 2020-11-22 00:11

    The following is my understanding, from reading the answers above and searching Google.

    # coding:utf-8
    r"""
    @update: 2017-01-09 14:44:39
    @explain: str, unicode, and bytes in Python 2 vs. Python 3
        # Python 2 error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
        # Option 1: reload sys and change the default encoding
        #   import sys
        #   reload(sys)  # built-in in Python 2
        #   sys.setdefaultencoding('utf-8')  # Python 3 does not have this attribute.
        #   Not recommended even in Python 2; see: http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
        # Option 2: override /usr/lib/python2.7/sitecustomize.py (or use a local sitecustomize.py with PYTHONPATH=".:$PYTHONPATH" python)
        #   Too complex.
        # Option 3: manage encodings yourself (best)
        #   ==> keep every string as unicode, as in Python 3 (u'xx' or b'xx'.decode('utf-8')); the separate unicode type no longer exists in Python 3
        #   see: http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes
    
        # How to save UTF-8 text with json.dumps as UTF-8, not as \u escape sequences
        #http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
    """
    
    from __future__ import print_function
    import json
    
    a = {"b": u"中文"}  # add u for python2 compatibility
    print('%r' % a)
    print('%r' % json.dumps(a))
    print('%r' % (json.dumps(a).encode('utf8')))
    a = {"b": u"中文"}
    print('%r' % json.dumps(a, ensure_ascii=False))
    print('%r' % (json.dumps(a, ensure_ascii=False).encode('utf8')))
    # print(a.encode('utf8')) #AttributeError: 'dict' object has no attribute 'encode'
    print('')
    
    # python2:bytes=str; python3:bytes
    b = a['b'].encode('utf-8')
    print('%r' % b)
    print('%r' % b.decode("utf-8"))
    print('')
    
    # python2:unicode; python3:str=unicode
    c = b.decode('utf-8')
    print('%r' % c)
    print('%r' % c.encode('utf-8'))
    """
    #python2
    {'b': u'\u4e2d\u6587'}
    '{"b": "\\u4e2d\\u6587"}'
    '{"b": "\\u4e2d\\u6587"}'
    u'{"b": "\u4e2d\u6587"}'
    '{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
    
    '\xe4\xb8\xad\xe6\x96\x87'
    u'\u4e2d\u6587'
    
    u'\u4e2d\u6587'
    '\xe4\xb8\xad\xe6\x96\x87'
    
    #python3
    {'b': '中文'}
    '{"b": "\\u4e2d\\u6587"}'
    b'{"b": "\\u4e2d\\u6587"}'
    '{"b": "中文"}'
    b'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
    
    b'\xe4\xb8\xad\xe6\x96\x87'
    '中文'
    
    '中文'
    b'\xe4\xb8\xad\xe6\x96\x87'
    """
    
  • 2020-11-22 00:12

    Thanks for the original answer here. With Python 3, the following line of code:

    print(json.dumps(result_dict,ensure_ascii=False))
    

    worked fine. Consider keeping literal text in the code to a minimum unless it is necessary.
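
    As a minimal sketch of what the flag changes (the sample dictionary is hypothetical), the two calls below produce different text but decode to the same object:

```python
import json

result_dict = {"city": "תל אביב"}  # hypothetical sample data

escaped = json.dumps(result_dict)                       # default: ensure_ascii=True
readable = json.dumps(result_dict, ensure_ascii=False)  # keeps the characters as-is

# The default output contains only ASCII characters (\uXXXX escapes).
escaped_is_ascii = all(ord(ch) < 128 for ch in escaped)

# Both strings decode to the same object; only the text representation differs.
same_data = json.loads(escaped) == json.loads(readable) == result_dict
```

    Either form is valid JSON, so the choice only affects how readable the serialized text is.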

    This might be good enough for the Python console. On a server, however, you might also need to set the locale, as explained here (if it runs on apache2): http://blog.dscpl.com.au/2014/09/setting-lang-and-lcall-when-using.html

    Basically, install he_IL or whichever language locale you need on Ubuntu. First check whether it is already installed:

    locale -a 
    

    If it is not, install it, where XX is your language code:

    sudo apt-get install language-pack-XX
    

    For example:

    sudo apt-get install language-pack-he
    

    Add the following text to /etc/apache2/envvars:

    export LANG='he_IL.UTF-8'
    export LC_ALL='he_IL.UTF-8'
    

    Then you should hopefully no longer get Python errors from Apache such as:

    print (js) UnicodeEncodeError: 'ascii' codec can't encode characters in position 41-45: ordinal not in range(128)
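
    The error comes from the encoding of the output stream, not from json itself. A minimal Python 3 sketch that simulates an ASCII-only stream with io.TextIOWrapper (the payload is hypothetical, and the in-memory streams stand in for the server's stdout):

```python
import io
import json

js = json.dumps({"msg": "שלום"}, ensure_ascii=False)  # hypothetical payload

# Simulate a stream whose encoding is ASCII, as can happen when the server
# process starts without a UTF-8 locale.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_stream.write(js)
    ascii_stream.flush()
    failed = False
except UnicodeEncodeError:
    failed = True

# The same text goes through once the stream encoding is UTF-8.
buffer = io.BytesIO()
utf8_stream = io.TextIOWrapper(buffer, encoding="utf-8")
utf8_stream.write(js)
utf8_stream.flush()
written = buffer.getvalue()
```

    Setting LANG and LC_ALL as above changes the encoding Python picks for its standard streams, which is why the locale fix removes the error.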

    Also, in Apache, try to make UTF-8 the default encoding as explained here:
    How to change the default encoding to UTF-8 for Apache?

    Do it early, because Apache errors can be a pain to debug, and you can mistakenly attribute them to Python when Python is not the cause.

  • 2020-11-22 00:16

    Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

    >>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
    >>> json_string
    b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
    >>> print(json_string.decode())
    "ברי צקלה"
    

    If you are writing to a file, just use json.dump() and leave it to the file object to encode:

    with open('filename', 'w', encoding='utf8') as json_file:
        json.dump("ברי צקלה", json_file, ensure_ascii=False)
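
    To check what actually lands on disk, here is a minimal Python 3 sketch that writes the value, reads the raw bytes back, and reloads it (the temporary path is hypothetical):

```python
import json
import os
import tempfile

text = "ברי צקלה"
path = os.path.join(tempfile.mkdtemp(), "sample.json")  # hypothetical path

with open(path, "w", encoding="utf8") as json_file:
    json.dump(text, json_file, ensure_ascii=False)

with open(path, "rb") as raw_file:
    raw = raw_file.read()

# The bytes on disk are the UTF-8 encoding of the JSON text, not \uXXXX escapes.
is_utf8 = raw == ('"' + text + '"').encode("utf8")

# Reading with the same encoding restores the original value.
with open(path, encoding="utf8") as json_file:
    restored = json.load(json_file)
```

    Passing encoding explicitly on both the write and the read avoids depending on the platform's default encoding.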
    

    Caveats for Python 2

    For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file:

    with io.open('filename', 'w', encoding='utf8') as json_file:
        json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
    

    Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

    with io.open('filename', 'w', encoding='utf8') as json_file:
        data = json.dumps(u"ברי צקלה", ensure_ascii=False)
        # unicode(data) auto-decodes data to unicode if str
        json_file.write(unicode(data))
    

    In Python 2, when using byte strings (type str), encoded to UTF-8, make sure to also set the encoding keyword:

    >>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
    >>> d
    {1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}
    
    >>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
    >>> s
    u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
    >>> json.loads(s)['1']
    u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
    >>> json.loads(s)['2']
    u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
    >>> print json.loads(s)['1']
    ברי צקלה
    >>> print json.loads(s)['2']
    ברי צקלה
    
  • 2020-11-22 00:17

    Use unicode-escape to solve the problem:

    >>> import json
    >>> json_string = json.dumps("ברי צקלה")
    >>> json_string.encode('ascii').decode('unicode-escape')
    '"ברי צקלה"'
    

    Explanation:

    >>> s = '漢  χαν  хан'
    >>> print('unicode: ' + s.encode('unicode-escape').decode('utf-8'))
    unicode: \u6f22  \u03c7\u03b1\u03bd  \u0445\u0430\u043d
    
    >>> u = s.encode('unicode-escape').decode('utf-8')
    >>> print('original: ' + u.encode("utf-8").decode('unicode-escape'))
    original: 漢  χαν  хан
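
    If the goal is to turn an already escaped JSON string into readable JSON, a safer route than unicode-escape is to parse it and serialize it again. A minimal Python 3 sketch (the input string is made up for illustration):

```python
import json

# Hypothetical ASCII-only JSON text, as produced by the default json.dumps.
escaped = json.dumps({"word": "漢", "quote": 'a "b"'})

# Parse, then serialize again without forcing ASCII.
readable = json.dumps(json.loads(escaped), ensure_ascii=False)

# JSON escapes such as \" survive, so the result still parses to the same data.
still_valid = json.loads(readable) == json.loads(escaped)
```

    Unlike unicode-escape, this only converts the \uXXXX sequences and leaves the escapes that JSON needs in place.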
    

    Original source: https://blog.csdn.net/chuatony/article/details/72628868
