How to get string objects instead of Unicode from JSON?

伪装坚强ぢ 2020-11-22 14:43

I'm using Python 2 to parse JSON from ASCII encoded text files.

When loading these files with either json or simplejson, all my string values are cast to Unicode objects instead of string objects.

21 answers
  • 2020-11-22 15:18

    There exists an easy work-around.

    TL;DR - Use ast.literal_eval() instead of json.loads(). Both ast and json are in the standard library.

    While not a 'perfect' answer, it gets you pretty far if your plan is to ignore Unicode altogether. In Python 2.7:

    import json, ast
    d = { 'field' : 'value' }
    print "JSON Fail: ", json.loads(json.dumps(d))
    print "AST Win:", ast.literal_eval(json.dumps(d))
    

    gives:

    JSON Fail:  {u'field': u'value'}
    AST Win: {'field': 'value'}
    

    This gets hairier when some values really are Unicode strings, or when the JSON contains the literals true, false, or null, which ast.literal_eval cannot parse. A complete solution gets complicated quickly.
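    To make that caveat concrete, here is a small sketch (runs unchanged on Python 2 or 3) showing where ast.literal_eval overlaps with JSON and where it breaks:

    ```python
    import ast

    # Python literal syntax overlaps with simple JSON objects,
    # so quoted keys and string values parse fine:
    d = ast.literal_eval('{"field": "value"}')
    assert d == {"field": "value"}

    # ...but JSON's true/false/null are not Python literals,
    # so literal_eval raises ValueError on them:
    try:
        ast.literal_eval('{"flag": true}')
        raised = False
    except ValueError:
        raised = True
    assert raised
    ```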

  • 2020-11-22 15:18

    Just use pickle instead of json for dump and load, like so:

        import json
        import pickle
    
        d = { 'field1': 'value1', 'field2': 2, }
    
        json.dump(d,open("testjson.txt","w"))
    
        print json.load(open("testjson.txt","r"))
    
        pickle.dump(d,open("testpickle.txt","w"))
    
        print pickle.load(open("testpickle.txt","r"))
    

    The output it produces (the json round trip yields unicode objects, while pickle restores plain str; integers are handled correctly in both):

        {u'field2': 2, u'field1': u'value1'}
        {'field2': 2, 'field1': 'value1'}
    
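    The same round trip can be done in memory. Two caveats worth noting: pickle's output is a Python-specific binary format, not JSON, so this only works when you control both writer and reader, and on Python 3 pickle requires a binary stream. A minimal file-free sketch:

    ```python
    import pickle
    from io import BytesIO

    d = {'field1': 'value1', 'field2': 2}

    buf = BytesIO()              # pickle needs a binary stream on Python 3
    pickle.dump(d, buf)          # writes a Python-only binary format, not JSON
    buf.seek(0)
    restored = pickle.load(buf)  # keys and values come back as plain str
    assert restored == d
    ```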
  • 2020-11-22 15:21

    I ran into this problem too, and having to deal with JSON, I came up with a small loop that converts the unicode keys to strings. (simplejson on GAE does not return string keys.)

    obj is the object decoded from JSON:

    if cls in NAME_CLASS_MAP:
        kwargs = {}
        for i in obj.keys():
            kwargs[str(i)] = obj[i]
        o = NAME_CLASS_MAP[cls](**kwargs)
        o.save()
    

    kwargs is what I pass to the constructor of the GAE application (which does not like unicode keys in **kwargs).

    Not as robust as the solution from Wells, but much smaller.
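    On Python 2.7+ the same key conversion can be written as a one-line dict comprehension; make_model below is a hypothetical stand-in for the NAME_CLASS_MAP[cls] constructor:

    ```python
    # u'' literals are valid on Python 2 and on Python 3.3+
    obj = {u'name': u'widget', u'count': 3}

    # convert every key with str() before splatting into **kwargs
    kwargs = {str(k): v for k, v in obj.items()}

    def make_model(name, count):  # stand-in for NAME_CLASS_MAP[cls]
        return (name, count)

    assert make_model(**kwargs) == (u'widget', 3)
    ```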

  • 2020-11-22 15:24

    I rewrote Wells's _parse_json() to handle cases where the JSON object itself is an array (my use case).

    def _parseJSON(self, obj):
        if isinstance(obj, dict):
            newobj = {}
            for key, value in obj.iteritems():
                key = str(key)
                newobj[key] = self._parseJSON(value)
        elif isinstance(obj, list):
            newobj = []
            for value in obj:
                newobj.append(self._parseJSON(value))
        elif isinstance(obj, unicode):
            newobj = str(obj)
        else:
            newobj = obj
        return newobj
    
  • 2020-11-22 15:25

    Mike Brennan's answer is close, but there is no reason to re-traverse the entire structure. If you use the object_pairs_hook parameter (Python 2.7+):

    object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.
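    The quoted behaviour is easy to verify as an aside: the hook receives the (key, value) pairs of each object in document order, so passing collections.OrderedDict preserves key order on any Python version:

    ```python
    import json
    from collections import OrderedDict

    # the hook is called with the pairs in the order they appear in the document
    data = json.loads('{"b": 1, "a": 2}', object_pairs_hook=OrderedDict)
    assert list(data.keys()) == ['b', 'a']
    ```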

    With it, you get each JSON object handed to you, so you can do the decoding with no need for recursion:

    def deunicodify_hook(pairs):
        new_pairs = []
        for key, value in pairs:
            if isinstance(value, unicode):
                value = value.encode('utf-8')
            if isinstance(key, unicode):
                key = key.encode('utf-8')
            new_pairs.append((key, value))
        return dict(new_pairs)
    
    In [52]: open('test.json').read()
    Out[52]: '{"1": "hello", "abc": [1, 2, 3], "def": {"hi": "mom"}, "boo": [1, "hi", "moo", {"5": "some"}]}'                                        
    
    In [53]: json.load(open('test.json'))
    Out[53]: 
    {u'1': u'hello',
     u'abc': [1, 2, 3],
     u'boo': [1, u'hi', u'moo', {u'5': u'some'}],
     u'def': {u'hi': u'mom'}}
    
    In [54]: json.load(open('test.json'), object_pairs_hook=deunicodify_hook)
    Out[54]: 
    {'1': 'hello',
     'abc': [1, 2, 3],
     'boo': [1, 'hi', 'moo', {'5': 'some'}],
     'def': {'hi': 'mom'}}
    

    Notice that I never have to call the hook recursively since every object will get handed to the hook when you use the object_pairs_hook. You do have to care about lists, but as you can see, an object within a list will be properly converted, and you don't have to recurse to make it happen.

    EDIT: A coworker pointed out that Python 2.6 doesn't have object_pairs_hook. You can still use this with Python 2.6 by making a very small change. In the hook above, change:

    for key, value in pairs:
    

    to

    for key, value in pairs.iteritems():
    

    Then use object_hook instead of object_pairs_hook:

    In [66]: json.load(open('test.json'), object_hook=deunicodify_hook)
    Out[66]: 
    {'1': 'hello',
     'abc': [1, 2, 3],
     'boo': [1, 'hi', 'moo', {'5': 'some'}],
     'def': {'hi': 'mom'}}
    

    Using object_pairs_hook means one less dictionary is instantiated for each object in the JSON document, which might be worthwhile if you are parsing a huge document.

  • 2020-11-22 15:28

    While there are some good answers here, I ended up using PyYAML to parse my JSON files, since it gives the keys and values as str instead of unicode. Because JSON is a subset of YAML, it works nicely:

    >>> import json
    >>> import yaml
    >>> list_org = ['a', 'b']
    >>> list_dump = json.dumps(list_org)
    >>> list_dump
    '["a", "b"]'
    >>> json.loads(list_dump)
    [u'a', u'b']
    >>> yaml.safe_load(list_dump)
    ['a', 'b']
    

    Notes

    Some things to note though:

    • I get string objects because all my entries are ASCII encoded. If I used Unicode-encoded entries, I would get them back as unicode objects; there is no conversion!

    • You should (probably always) use PyYAML's safe_load function; if you use it to load JSON files, you don't need the "additional power" of the load function anyway.

    • If you want a YAML parser with more support for the 1.2 version of the spec (and one that correctly parses exponent-notation numbers such as 1e-5, which PyYAML reads back as a string), try ruamel.yaml: pip install ruamel.yaml and import ruamel.yaml as yaml was all I needed in my tests.

    Conversion

    As stated, there is no conversion! If you can't be sure you are only dealing with ASCII values (and most of the time you can't), you're better off using a conversion function:

    I have used the one from Mark Amery a couple of times now; it works great and is very easy to use. You can also use a similar function as an object_hook instead, as it might gain you a performance boost on big files. See the slightly more involved answer from Mirec Miskuf for that.
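    Mark Amery's function is not reproduced in this thread, but a minimal sketch of such a recursive converter looks like the following. The name byteify is my own; the try/except is only there so the same snippet imports on Python 3, where it turns str into bytes instead of unicode into str:

    ```python
    try:
        text_type = unicode        # Python 2: convert unicode -> byte str
    except NameError:
        text_type = str            # Python 3 fallback, for illustration only

    def byteify(data):
        """Recursively encode all text in a decoded JSON structure as UTF-8."""
        if isinstance(data, dict):
            return {byteify(k): byteify(v) for k, v in data.items()}
        if isinstance(data, list):
            return [byteify(item) for item in data]
        if isinstance(data, text_type):
            return data.encode('utf-8')
        return data
    ```

    Call it on the result of json.loads(); the dict comprehension requires Python 2.7+.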
