Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence

说谎 2020-11-21 23:25

sample code:

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05


        
12 Answers
  •  悲&欢浪女
    2020-11-22 00:11

    The following is my understanding, based on reading the answers above and searching Google.

    # coding:utf-8
    r"""
    @update: 2017-01-09 14:44:39
    @explain: str, unicode, and bytes in Python 2 vs. Python 3
        # Python 2 error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
        # Option 1: reload sys and change the default encoding
        #   import sys
        #   reload(sys)  # built-in in Python 2; importlib.reload(sys) in Python 3
        #   sys.setdefaultencoding('utf-8')  # Python 3 does not have this attribute
        #   Not recommended even in Python 2; see: http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
        # Option 2: overwrite /usr/lib/python2.7/sitecustomize.py (or write your own sitecustomize.py and run with PYTHONPATH=".:$PYTHONPATH" python)
        #   Too complex.
        # Option 3: handle encoding yourself (best)
        #   ==> every string must be unicode, as in Python 3 (u'xx' | b'xx'.decode('utf-8')); the unicode type no longer exists in Python 3
        #   see: http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes
    
        # How to save UTF-8 text in json.dumps as UTF-8, not as \u escape sequences:
        # http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
    """
    
    from __future__ import print_function
    import json
    
    a = {"b": u"中文"}  # u prefix for Python 2 compatibility
    print('%r' % a)
    # Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes
    print('%r' % json.dumps(a))
    print('%r' % (json.dumps(a).encode('utf8')))
    # ensure_ascii=False: characters are kept as-is; encode to get UTF-8 bytes
    print('%r' % json.dumps(a, ensure_ascii=False))
    print('%r' % (json.dumps(a, ensure_ascii=False).encode('utf8')))
    # print(a.encode('utf8'))  # AttributeError: 'dict' object has no attribute 'encode'
    print('')
    
    # Python 2: bytes is an alias of str; Python 3: bytes is a separate type
    b = a['b'].encode('utf-8')
    print('%r' % b)
    print('%r' % b.decode("utf-8"))
    print('')
    
    # Python 2: unicode; Python 3: str (which is unicode)
    c = b.decode('utf-8')
    print('%r' % c)
    print('%r' % c.encode('utf-8'))
    """
    #python2
    {'b': u'\u4e2d\u6587'}
    '{"b": "\\u4e2d\\u6587"}'
    '{"b": "\\u4e2d\\u6587"}'
    u'{"b": "\u4e2d\u6587"}'
    '{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
    
    '\xe4\xb8\xad\xe6\x96\x87'
    u'\u4e2d\u6587'
    
    u'\u4e2d\u6587'
    '\xe4\xb8\xad\xe6\x96\x87'
    
    #python3
    {'b': '中文'}
    '{"b": "\\u4e2d\\u6587"}'
    b'{"b": "\\u4e2d\\u6587"}'
    '{"b": "中文"}'
    b'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'
    
    b'\xe4\xb8\xad\xe6\x96\x87'
    '中文'
    
    '中文'
    b'\xe4\xb8\xad\xe6\x96\x87'
    """
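
As a follow-up to the output above, here is a minimal sketch, assuming Python 3, of how the `ensure_ascii=False` result might be written to a file as real UTF-8 bytes and read back. The file name `data.json` and the temporary directory are hypothetical and used only for illustration.

```python
import json
import os
import tempfile

data = {"b": "中文"}

# ensure_ascii=False keeps non-ASCII characters instead of \uXXXX escapes.
text = json.dumps(data, ensure_ascii=False)

# Hypothetical output location, chosen only for this example.
path = os.path.join(tempfile.mkdtemp(), "data.json")

# Encode at the I/O boundary: the file object converts str to UTF-8 bytes.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the raw bytes to confirm that the file holds UTF-8, not escapes.
with open(path, "rb") as f:
    raw = f.read()

# Read it back as JSON to confirm the round trip.
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(repr(text))
print(repr(raw))
print(loaded == data)
```

Passing `encoding="utf-8"` to `open` explicitly avoids depending on the platform's default encoding, which is not always UTF-8.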
    
