Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence

前端 未结 12 960
说谎
说谎 2020-11-21 23:25

sample code:

>>> import json
>>> json_string = json.dumps(\"ברי צקלה\")
>>> print json_string
\"\\u05d1\\u05e8\\u05d9 \\u05e6\\u05         


        
12条回答
  •  梦谈多话
    2020-11-22 00:16

    Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

    >>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
    >>> json_string
    b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
    >>> print(json_string.decode())
    "ברי צקלה"
    

    If you are writing to a file, just use json.dump() and leave it to the file object to encode:

    with open('filename', 'w', encoding='utf8') as json_file:
        json.dump("ברי צקלה", json_file, ensure_ascii=False)
    

    Caveats for Python 2

    For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() instead to write to that file:

    with io.open('filename', 'w', encoding='utf8') as json_file:
        json.dump(u"ברי צקלה", json_file, ensure_ascii=False)
    

    Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

    with io.open('filename', 'w', encoding='utf8') as json_file:
        data = json.dumps(u"ברי צקלה", ensure_ascii=False)
        # unicode(data) auto-decodes data to unicode if str
        json_file.write(unicode(data))
    

    In Python 2, when using byte strings (type str), encoded to UTF-8, make sure to also set the encoding keyword:

    >>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
    >>> d
    {1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}
    
    >>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
    >>> s
    u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
    >>> json.loads(s)['1']
    u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
    >>> json.loads(s)['2']
    u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
    >>> print json.loads(s)['1']
    ברי צקלה
    >>> print json.loads(s)['2']
    ברי צקלה
    

提交回复
热议问题