How to change json encoding behaviour for serializable python object?

无人及你 2020-12-02 09:47

It is easy to change the format of an object which is not JSON serializable, e.g. datetime.datetime.

My requirement, for debugging purposes, is to alter the way some cu

13 answers
  • 2020-12-02 10:15

    As others have pointed out already, the default handler only gets called for values that aren't one of the recognised types. My suggested solution to this problem is to preprocess the object you want to serialize, recursing over lists, tuples and dictionaries, but wrapping every other value in a custom class.

    Something like this:

    def debug(obj):
        class Debug:
            def __init__(self,obj):
                self.originalObject = obj
        if obj.__class__ == list:
            return [debug(item) for item in obj]
        elif obj.__class__ == tuple:
            # tuple(...) is required: a bare (... for ...) would be a generator
            return tuple(debug(item) for item in obj)
        elif obj.__class__ == dict:
            return dict((key,debug(obj[key])) for key in obj)
        else:
            return Debug(obj)
    

    You would call this function, before passing your object to json.dumps, like this:

    test_json = debug(test_json)
    print(json.dumps(test_json,default=json_debug_handler))
    

    Note that this code is checking for objects whose class exactly matches a list, tuple or dictionary, so any custom objects that are extended from those types will be wrapped rather than parsed. As a result, the regular lists, tuples, and dictionaries will be serialized as usual, but all other values will be passed on to the default handler.
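
    To illustrate the difference between an exact class comparison and isinstance, here is a minimal sketch. The mList class is only a stand-in for the custom type from the question.

```python
import json


class mList(list):
    """A list subclass, standing in for the question's custom type."""


plain = [1, 2]
custom = mList([1, 2])

# Exact class comparison tells the subclass apart from the built-in type...
print(plain.__class__ == list)    # True
print(custom.__class__ == list)   # False

# ...whereas a subclass-aware check treats both as lists, which is why
# json serializes the subclass natively without calling default.
print(isinstance(custom, list))   # True
print(json.dumps(custom))         # [1, 2]
```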

    The end result of all this is that every value that reaches the default handler is guaranteed to be wrapped in one of these Debug classes. So the first thing you are going to want to do is extract the original object, like this:

    obj = obj.originalObject
    

    You can then check the original object's type and handle whichever types need special processing. For everything else, you should just return the original object (so the last return from the handler should be return obj not return None).

    def json_debug_handler(obj):
        obj = obj.originalObject      # Add this line
        print("object received:")
        print(type(obj))
        print("\n\n")
        if  isinstance(obj, datetime.datetime):
            return obj.isoformat()
        elif isinstance(obj,mDict):
            return {'orig':obj, 'attrs': vars(obj)}
        elif isinstance(obj,mList):
            return {'orig':obj, 'attrs': vars(obj)}
        else:
            return obj                # Change this line
    

    Note that this code doesn't check for values that aren't serializable. These will fall through to the final return obj, then be rejected by the serializer and passed back to the default handler again, only this time without the Debug wrapper.

    If you need to deal with that scenario, you could add a check at the top of the handler like this:

    if not hasattr(obj, 'originalObject'):
        return None
    

    Ideone demo: http://ideone.com/tOloNq
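
    For reference, the pieces above can be assembled into one self-contained Python 3 sketch. Two things here are my assumptions rather than part of the answer: the Debug class is moved to module level so the handler can test for it with isinstance, and a fixed datetime is used so the output is reproducible.

```python
import datetime
import json


class mDict(dict):
    pass


class mList(list):
    pass


class Debug:
    """Opaque wrapper: json cannot serialize it, so default gets called."""

    def __init__(self, obj):
        self.originalObject = obj


def debug(obj):
    # Recurse only into exact built-in containers; wrap everything else.
    if type(obj) is list:
        return [debug(item) for item in obj]
    if type(obj) is tuple:
        return tuple(debug(item) for item in obj)
    if type(obj) is dict:
        return {key: debug(value) for key, value in obj.items()}
    return Debug(obj)


def json_debug_handler(obj):
    if not isinstance(obj, Debug):
        return None  # unwrapped and unserializable: emit null
    obj = obj.originalObject
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, (mDict, mList)):
        return {"orig": obj, "attrs": vars(obj)}
    return obj


games = mList(["mario", "contra", "tetris"])
games.src = "console"
scores = mDict({"dp": 10, "pk": 45})
scores.processed = "unprocessed"
test_json = {"games": games, "scores": scores,
             "date": datetime.datetime(2020, 12, 2, 10, 15)}

result = json.dumps(debug(test_json), default=json_debug_handler, sort_keys=True)
print(result)
```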

  • 2020-12-02 10:17

    It seems that to achieve the behavior you want, with the given restrictions, you'll have to delve into the JSONEncoder class a little. Below I've written out a custom JSONEncoder that overrides the iterencode method to pass a custom isinstance method to _make_iterencode. It isn't the cleanest thing in the world, but seems to be the best given the options and it keeps customization to a minimum.

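    # customencoder.py
    # Written for Python 2: self.encoding and str.decode do not exist in
    # Python 3, and the underscore-prefixed json.encoder names are private.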
    from json.encoder import (_make_iterencode, JSONEncoder,
                              encode_basestring_ascii, FLOAT_REPR, INFINITY,
                              c_make_encoder, encode_basestring)
    
    
    class CustomObjectEncoder(JSONEncoder):
    
        def iterencode(self, o, _one_shot=False):
            """
            Most of the original method has been left untouched.
    
            _one_shot is forced to False to prevent c_make_encoder from
            being used. c_make_encoder is a function defined in C, so it's easier
            to avoid using it than overriding/redefining it.
    
            The keyword argument isinstance for _make_iterencode has been set
            to self.isinstance. This allows for a custom isinstance function
            to be defined, which can be used to defer the serialization of custom
            objects to the default method.
            """
            # Force the use of _make_iterencode instead of c_make_encoder
            _one_shot = False
    
            if self.check_circular:
                markers = {}
            else:
                markers = None
            if self.ensure_ascii:
                _encoder = encode_basestring_ascii
            else:
                _encoder = encode_basestring
            if self.encoding != 'utf-8':
                def _encoder(o, _orig_encoder=_encoder, _encoding=self.encoding):
                    if isinstance(o, str):
                        o = o.decode(_encoding)
                    return _orig_encoder(o)
    
            def floatstr(o, allow_nan=self.allow_nan,
                         _repr=FLOAT_REPR, _inf=INFINITY, _neginf=-INFINITY):
                if o != o:
                    text = 'NaN'
                elif o == _inf:
                    text = 'Infinity'
                elif o == _neginf:
                    text = '-Infinity'
                else:
                    return _repr(o)
    
                if not allow_nan:
                    raise ValueError(
                        "Out of range float values are not JSON compliant: " +
                        repr(o))
    
                return text
    
            # Instead of forcing _one_shot to False, you can also just
            # remove the first part of this conditional statement and only
            # call _make_iterencode
            if (_one_shot and c_make_encoder is not None
                    and self.indent is None and not self.sort_keys):
                _iterencode = c_make_encoder(
                    markers, self.default, _encoder, self.indent,
                    self.key_separator, self.item_separator, self.sort_keys,
                    self.skipkeys, self.allow_nan)
            else:
                _iterencode = _make_iterencode(
                    markers, self.default, _encoder, self.indent, floatstr,
                    self.key_separator, self.item_separator, self.sort_keys,
                    self.skipkeys, _one_shot, isinstance=self.isinstance)
            return _iterencode(o, 0)
    

    You can now subclass the CustomObjectEncoder so it correctly serializes your custom objects. The CustomObjectEncoder can also do cool stuff like handle nested objects.

    # test.py
    import json
    import datetime
    from customencoder import CustomObjectEncoder
    
    
    class MyEncoder(CustomObjectEncoder):
    
        def isinstance(self, obj, cls):
            if isinstance(obj, (mList, mDict)):
                return False
            return isinstance(obj, cls)
    
        def default(self, obj):
            """
            Defines custom serialization.
    
            To avoid circular references, any object that will always fail
            self.isinstance must be converted to something that is
            serializable here.
            """
            if isinstance(obj, datetime.datetime):
                return obj.isoformat()
            elif isinstance(obj, mDict):
                return {"orig": dict(obj), "attrs": vars(obj)}
            elif isinstance(obj, mList):
                return {"orig": list(obj), "attrs": vars(obj)}
            else:
                return None
    
    
    class mList(list):
        pass
    
    
    class mDict(dict):
        pass
    
    
    def main():
        zelda = mList(['zelda'])
        zelda.src = "oldschool"
        games = mList(['mario', 'contra', 'tetris', zelda])
        games.src = 'console'
        scores = mDict({'dp': 10, 'pk': 45})
        scores.processed = "unprocessed"
        test_json = {'games': games, 'scores': scores,
                     'date': datetime.datetime.now()}
        print(json.dumps(test_json, cls=MyEncoder))
    
    if __name__ == '__main__':
        main()
    
  • 2020-12-02 10:18

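    Try the below. It looks relatively simple. The only real difference from your encoder class is that it overrides both the default and encode methods (since the latter is still called for types the encoder knows how to handle). Note that json.dumps calls encode once, on the top-level object only, so the mDict and mList branches apply when the object passed to json.dumps is itself an mDict or mList; nested instances are still serialized as ordinary dicts and lists.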

    import json
    import datetime
    
    class JSONDebugEncoder(json.JSONEncoder):
        # transform objects known to JSONEncoder here
        def encode(self, o, *args, **kw):
            for_json = o
            if isinstance(o, mDict):
                for_json = { 'orig' : o, 'attrs' : vars(o) }
            elif isinstance(o, mList):
                for_json = { 'orig' : o, 'attrs' : vars(o) }
            return super(JSONDebugEncoder, self).encode(for_json, *args, **kw)
    
        # handle objects not known to JSONEncoder here
        def default(self, o, *args, **kw):
            if isinstance(o, datetime.datetime):
                return o.isoformat()
            else:
                return super(JSONDebugEncoder, self).default(o, *args, **kw)
    
    
    class mDict(dict):
        pass
    
    class mList(list):
        pass
    
    def test_debug_json():
        games = mList(['mario','contra','tetris'])
        games.src = 'console'
        scores = mDict({'dp':10,'pk':45})
        scores.processed = "unprocessed"
        test_json = { 'games' : games , 'scores' : scores , 'date': datetime.datetime.now() }
        print(json.dumps(test_json,cls=JSONDebugEncoder))
    
    if __name__ == '__main__':
        test_debug_json()
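
    To check how often each method is actually invoked, here is a small Python 3 sketch. CountingEncoder and its two call lists are hypothetical names made up for this illustration; it records the type of every object that reaches encode and default.

```python
import json


class CountingEncoder(json.JSONEncoder):
    """Records which objects reach encode and which reach default."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.encode_calls = []
        self.default_calls = []

    def encode(self, o):
        self.encode_calls.append(type(o).__name__)
        return super().encode(o)

    def default(self, o):
        self.default_calls.append(type(o).__name__)
        if isinstance(o, set):
            return sorted(o)  # sets are not natively serializable
        return super().default(o)


encoder = CountingEncoder()
text = encoder.encode({"nested": {"tags": {"b", "a"}}})

print(text)
print(encoder.encode_calls)   # only the top-level dict
print(encoder.default_calls)  # only the set
```

    The nested dict never passes through encode, which is why a transformation placed there cannot reach nested values.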
    
  • 2020-12-02 10:20

    If you are able to change the way json.dumps is called, you can do all the processing required before the JSON encoder gets its hands on the object. This version does not use any kind of copying and will edit the structures in-place. You can add copy() if required.

    import datetime
    import json
    from collections.abc import Mapping, MutableSequence
    
    
    def json_debug_handler(obj):
        print("object received:")
        print(type(obj))
        print("\n\n")
        if isinstance(obj, Mapping):
            # list(...) snapshots the items so values can be reassigned safely
            for key, value in list(obj.items()):
                if isinstance(value, (Mapping, MutableSequence)):
                    value = json_debug_handler(value)
    
                obj[key] = convert(value)
        elif isinstance(obj, MutableSequence):
            for index, value in enumerate(obj):
                if isinstance(value, (Mapping, MutableSequence)):
                    value = json_debug_handler(value)
    
                obj[index] = convert(value)
        return obj
    
    def convert(obj):
        if  isinstance(obj, datetime.datetime):
            return obj.isoformat()
        elif isinstance(obj,mDict):
            return {'orig':obj , 'attrs': vars(obj)}
        elif isinstance(obj,mList):
            return {'orig':obj, 'attrs': vars(obj)}
        else:
            return obj
    
    
    class mDict(dict):
        pass
    
    
    class mList(list):
        pass
    
    
    def test_debug_json():
        games = mList(['mario','contra','tetris'])
        games.src = 'console'
        scores = mDict({'dp':10,'pk':45})
        scores.processed = "unprocessed"
        test_json = { 'games' : games , 'scores' : scores , 'date': datetime.datetime.now() }
        print(json.dumps(json_debug_handler(test_json)))
    
    if __name__ == '__main__':
        test_debug_json()
    

    You call json_debug_handler on the object you are serializing before passing it to json.dumps. With this pattern you could also easily reverse the changes and/or add extra conversion rules.

    edit:

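    If you can't change how json.dumps is called, you can always monkeypatch it to do what you want. Keep a reference to the original function so that the replacement does not end up calling itself, for example:

    _original_dumps = json.dumps
    json.dumps = lambda obj, *args, **kwargs: _original_dumps(json_debug_handler(obj), *args, **kwargs)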
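
    If the patch should only be active for a while, one option is to wrap it in a context manager that restores the original function afterwards. This is a sketch under assumptions: patched_dumps is a hypothetical helper, and preprocess is a simplified stand-in for json_debug_handler that only converts datetimes.

```python
import contextlib
import datetime
import json


def preprocess(obj):
    """Hypothetical stand-in for json_debug_handler: ISO-format datetimes."""
    if isinstance(obj, dict):
        return {key: preprocess(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [preprocess(value) for value in obj]
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    return obj


@contextlib.contextmanager
def patched_dumps(hook):
    original_dumps = json.dumps  # keep a reference to the real function

    def dumps(obj, *args, **kwargs):
        return original_dumps(hook(obj), *args, **kwargs)

    json.dumps = dumps
    try:
        yield
    finally:
        json.dumps = original_dumps  # always undo the patch


with patched_dumps(preprocess):
    patched_output = json.dumps({"date": datetime.datetime(2020, 12, 2, 10, 20)})

print(patched_output)
```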
    
  • 2020-12-02 10:26

    Can we just preprocess test_json to make it suitable for your requirement? It's easier to manipulate a Python dict than to write a custom encoder.

    import datetime
    import json
    class mDict(dict):
        pass
    
    class mList(list):
        pass
    
    def prepare(obj):
        if  isinstance(obj, datetime.datetime):
            return obj.isoformat()
        elif isinstance(obj, mDict):
            return {'orig':obj , 'attrs': vars(obj)}
        elif isinstance(obj, mList):
            return {'orig':obj, 'attrs': vars(obj)}
        else:
            return obj
    def preprocessor(toJson):
        ret ={}
        for key, value in toJson.items():
            ret[key] = prepare(value)
        return ret
    if __name__ == '__main__':
        def test_debug_json():
            games = mList(['mario','contra','tetris'])
            games.src = 'console'
            scores = mDict({'dp':10,'pk':45})
            scores.processed = "unprocessed"
            test_json = { 'games' : games, 'scores' : scores , 'date': datetime.datetime.now() }
            print(json.dumps(preprocessor(test_json)))
        test_debug_json()
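
    The preprocessor above converts only the top-level values of the dict. If mDict or mList instances can appear deeper in the structure, a recursive variant might look like the sketch below. prepare_deep is a hypothetical name, and unlike prepare it returns plain copies of the containers rather than the original objects.

```python
import datetime
import json


class mDict(dict):
    pass


class mList(list):
    pass


def prepare_deep(obj):
    """Return a JSON-ready copy of obj, annotating mDict/mList at any depth."""
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    if isinstance(obj, dict):
        converted = {key: prepare_deep(value) for key, value in obj.items()}
    elif isinstance(obj, (list, tuple)):
        converted = [prepare_deep(value) for value in obj]
    else:
        return obj
    if isinstance(obj, (mDict, mList)):
        # Keep the container contents and expose the instance attributes.
        return {"orig": converted, "attrs": prepare_deep(vars(obj))}
    return converted


zelda = mList(["zelda"])
zelda.src = "oldschool"
games = mList(["mario", zelda])
games.src = "console"

nested_output = json.dumps(prepare_deep({"shelf": {"games": games}}))
print(nested_output)
```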
    
  • 2020-12-02 10:32

    The default function is only called when the node being dumped isn't natively serializable, and your mDict classes serialize as-is. Here's a little demo that shows when default is called and when not:

    import json
    
    def serializer(obj):
        print('serializer called')
        return str(obj)
    
    class mDict(dict):
        pass
    
    class mSet(set):
        pass
    
    d = mDict(dict(a=1))
    print(json.dumps(d, default=serializer))
    
    s = mSet({1, 2, 3,})
    print(json.dumps(s, default=serializer))
    

    And the output (under Python 3):

    {"a": 1}
    serializer called
    "mSet({1, 2, 3})"
    

    Note that sets are not natively serializable, but dicts are.

    Since your m___ classes are serializable, your handler is never called.

    Update #1 -----

    You could change the JSON encoder code. The details of how to do this depend on which JSON implementation you're using. For example in simplejson, the relevant code is this, in encoder.py:

    def _iterencode(o, _current_indent_level):
        ...
            for_json = _for_json and getattr(o, 'for_json', None)
            if for_json and callable(for_json):
                ...
            elif isinstance(o, list):
                ...
            else:
                _asdict = _namedtuple_as_object and getattr(o, '_asdict', None)
                if _asdict and callable(_asdict):
                    for chunk in _iterencode_dict(_asdict(),
                            _current_indent_level):
                        yield chunk
                elif (_tuple_as_array and isinstance(o, tuple)):
                    ...
                elif isinstance(o, dict):
                    ...
                elif _use_decimal and isinstance(o, Decimal):
                    ...
                else:
                    ...
                    o = _default(o)
                    for chunk in _iterencode(o, _current_indent_level):
                        yield chunk
                    ...
    

    In other words, there is a hard-wired behavior that calls default only when the node being encoded isn't one of the recognized base types. You could override this in one of several ways:

    1 -- subclass JSONEncoder as you've done above, but add a parameter to its initializer that specifies the function to be used in place of the standard _make_iterencode, in which you add a test that would call default for classes that meet your criteria. This is a clean approach since you aren't changing the JSON module, but you would be reiterating a lot of code from the original _make_iterencode. (Other variations on this approach include monkeypatching _make_iterencode or its sub-function _iterencode_dict).

    2 -- alter the JSON module source, and use the __debug__ constant to change behavior:

    def _iterencode(o, _current_indent_level):
        ...
            for_json = _for_json and getattr(o, 'for_json', None)
            if for_json and callable(for_json):
                ...
            elif isinstance(o, list):
                ...
            ## added code below
            elif __debug__:
                o = _default(o)
                for chunk in _iterencode(o, _current_indent_level):
                    yield chunk
            ## added code above
            else:
                ...
    

    Ideally the JSONEncoder class would provide a parameter to specify "use default for all types", but it doesn't. The above is a simple one-time change that does what you're looking for.
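
    If editing the module source is not an option, the same __debug__ switch can be applied in a thin wrapper around json.dumps instead. This is a rough sketch of that alternative, not the change described above: annotate and the wrapper dumps are hypothetical helpers, and the annotation is applied to any dict or list subclass.

```python
import json


def annotate(obj):
    """Hypothetical helper: expose attributes of dict/list subclasses."""
    if isinstance(obj, dict):
        converted = {key: annotate(value) for key, value in obj.items()}
    elif isinstance(obj, list):
        converted = [annotate(value) for value in obj]
    else:
        return obj
    if type(obj) not in (dict, list):
        # Some subclasses have no instance __dict__, hence the fallback.
        return {"orig": converted, "attrs": annotate(getattr(obj, "__dict__", {}))}
    return converted


def dumps(obj, **kwargs):
    # __debug__ is True unless Python is started with -O or -OO.
    if __debug__:
        obj = annotate(obj)
    return json.dumps(obj, **kwargs)


class mDict(dict):
    pass


scores = mDict({"dp": 10})
scores.processed = "unprocessed"
debug_output = dumps({"scores": scores})
print(debug_output)
```

    Running the interpreter with -O turns the annotation off, so the wrapper then behaves like plain json.dumps.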
