How can I convert a dict to a unicode JSON string?

情话喂你 2021-02-09 02:05

This doesn't appear to be possible using the standard library json module. When using json.dumps, it automatically escapes all non-ASCII characters.

2 answers
  •  伪装坚强ぢ
    2021-02-09 02:56

    Requirements

    • Make sure your Python source files are saved as UTF-8. Otherwise the non-ASCII characters in your string literals may turn into question marks (?). Notepad++ has excellent encoding options for this.

    • Make sure the appropriate fonts are installed. For example, to display Japanese characters you need a Japanese font.

    • Make sure your IDE or console can output Unicode characters. Otherwise printing may raise a UnicodeEncodeError.

    Example:

    UnicodeEncodeError: 'charmap' codec can't encode characters in position 22-23: character maps to <undefined>
    

    PyScripter works for me. It's included with "Portable Python" at http://portablepython.com/wiki/PortablePython3.2.1.1

    • Make sure you're using Python 3 or later, which has better Unicode support than Python 2.
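
    The console requirement above can be checked before printing anything. The following is a minimal sketch, not part of the original answer: the helper names can_encode and stdout_can_print are made up for illustration, and it assumes the encoding reported by sys.stdout.encoding is the one actually used for output.

```python
import sys

def can_encode(text, encoding):
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        # LookupError covers an unknown encoding name.
        return False

def stdout_can_print(text):
    """Check text against the encoding Python reports for standard output."""
    # Fall back to ASCII when the stream does not report an encoding.
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return can_encode(text, encoding)

if __name__ == "__main__":
    # Prints the reported encoding and whether two Japanese characters fit in it.
    print(sys.stdout.encoding, stdout_can_print("\u65e5\u672c"))
```

    A 'charmap' error like the one shown above typically means the console uses a legacy code page such as cp1252. On Python 3.7 or later, calling sys.stdout.reconfigure(encoding="utf-8") is one way to switch the stream, assuming the terminal itself can render UTF-8.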

    Problem

    By default, json.dumps() escapes every non-ASCII character as a \uXXXX sequence.

    Solution

    Read the update at the bottom. Or...

    Replace each escaped character with the Unicode character it encodes.

    I created a simple lambda function called getStringWithDecodedUnicode that does just that.

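    import re
    # Replace each \uXXXX escape with the character it encodes (raw string avoids invalid-escape warnings).
    getStringWithDecodedUnicode = lambda text : re.sub( r'\\u([\da-f]{4})', (lambda x : chr( int( x.group(1), 16 ) )), text )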
    

    Here's getStringWithDecodedUnicode as a regular function.

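    def getStringWithDecodedUnicode( value ):
        # Matches one \uXXXX escape as emitted by json.dumps (lowercase hex digits).
        findUnicodeRE = re.compile( r'\\u([\da-f]{4})' )
        def getParsedUnicode(x):
            # Convert the four hex digits to the corresponding character.
            return chr( int( x.group(1), 16 ) )

        return findUnicodeRE.sub(getParsedUnicode, str( value ) )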
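
    One caveat with replacing escapes one at a time: characters outside the Basic Multilingual Plane (most emoji, for instance) are written by json.dumps as two \uXXXX escapes forming a surrogate pair, and a literal backslash followed by "u" in the data can be mistaken for an escape. The sketch below is one possible way to handle both cases, not the original author's method. The name decode_unicode_escapes is hypothetical, and it assumes the input is valid JSON text.

```python
import json
import re

# Either an escaped backslash (left alone) or a run of \uXXXX escapes (decoded).
_ESCAPE_RUN = re.compile(r'\\\\|((?:\\u[0-9a-fA-F]{4})+)')

def decode_unicode_escapes(json_text):
    """Replace \\uXXXX escapes in JSON text with the characters they encode."""
    def replace(match):
        run = match.group(1)
        if run is None:
            # Matched an escaped backslash: keep it so a following "u" is not misread.
            return match.group(0)
        # json.loads decodes the whole run and joins surrogate pairs.
        decoded = json.loads('"' + run + '"')
        # Re-serialize so characters that must stay escaped (quotes, control
        # characters) remain valid JSON; strip the surrounding quotes.
        return json.dumps(decoded, ensure_ascii=False)[1:-1]
    return _ESCAPE_RUN.sub(replace, json_text)
```

    For example, decode_unicode_escapes(json.dumps({"Japan": "日本"})) gives the same text as the functions above, and the result should still parse with json.loads.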
    

    Example

    testJSONWithUnicode.py (Using PyScripter as the IDE)

    import re
    import json
    getStringWithDecodedUnicode = lambda text : re.sub( r'\\u([\da-f]{4})', (lambda x : chr( int( x.group(1), 16 ) )), text )
    
    data = {"Japan":"日本"}
    jsonString = json.dumps( data )
    print( "json.dumps({0}) = {1}".format( data, jsonString ) )
    jsonString = getStringWithDecodedUnicode( jsonString )
    print( "Decoded Unicode: %s" % jsonString )
    

    Output

    json.dumps({'Japan': '日本'}) = {"Japan": "\u65e5\u672c"}
    Decoded Unicode: {"Japan": "日本"}
    

    Update

    Or... just pass ensure_ascii=False as an option for json.dumps.

    Note: You need to meet the requirements that I outlined at the beginning or else this isn't going to work.

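    import json
    data = {'navn': 'Åge', 'stilling': 'Lærling'}
    # ensure_ascii=False keeps non-ASCII characters as-is instead of escaping them.
    result = json.dumps(data, ensure_ascii=False)
    print( result ) # prints {"navn": "Åge", "stilling": "Lærling"} (insertion order on Python 3.7+)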
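
    To show that the escaped and unescaped forms carry the same data, here is a small sketch. The helper names dumps_unicode and to_utf8_bytes are invented for this example. It assumes the result will be stored or sent as UTF-8, which is a common choice but not something the question specifies.

```python
import json

def dumps_unicode(obj):
    """Serialize obj to a JSON string, keeping non-ASCII characters unescaped."""
    return json.dumps(obj, ensure_ascii=False)

def to_utf8_bytes(obj):
    """Serialize obj to JSON and encode the text as UTF-8 bytes."""
    return dumps_unicode(obj).encode("utf-8")

data = {"navn": "Åge", "stilling": "Lærling"}

escaped = json.dumps(data)       # default: non-ASCII characters become \uXXXX
unescaped = dumps_unicode(data)  # non-ASCII characters kept as-is

# Both strings parse back to the same dictionary.
same_data = json.loads(escaped) == json.loads(unescaped) == data
```

    When writing to a file, opening it with open(path, "w", encoding="utf-8") and calling json.dump(obj, handle, ensure_ascii=False) should avoid depending on the platform's default encoding.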
    
