Python decode nested JSON in JSON

长发绾君心 2021-02-06 18:15

I'm dealing with an API that unfortunately is returning malformed (or "weirdly formed," rather -- thanks @fjarri) JSON, but on the positive side I think it may be an opportun

2 Answers
  • 2021-02-06 18:41

    Your main issue is that your object_hook function should not recurse. json.loads() handles the recursion itself and calls your function every time it finishes decoding a dictionary (i.e. obj will always be a dictionary). So you only need to modify the problematic keys and return the dict. This should do what you are looking for:

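    import json
    from pprint import pprint

    def flatten_hook(obj):
        # json.loads() calls this once per decoded dict, innermost first.
        # Iterate over a copy so the dict can be updated safely.
        for key, value in list(obj.items()):
            if isinstance(value, str):  # basestring on Python 2
                try:
                    # Decode values that are themselves JSON documents.
                    obj[key] = json.loads(value, object_hook=flatten_hook)
                except ValueError:
                    pass  # not JSON; keep the original string
        return obj

    # my_input is the raw JSON text from the question
    pprint(json.loads(my_input, object_hook=flatten_hook))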
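To try the hook above without the API, here is a minimal sketch that builds a double-encoded payload by hand. The payload, its keys, and the name sample_input are made up for illustration (the question's real input is not shown), and the hook is written for Python 3.

```python
import json

def flatten_hook(obj):
    # Called by json.loads() for every decoded dict; try each string value.
    for key, value in list(obj.items()):
        if isinstance(value, str):
            try:
                obj[key] = json.loads(value, object_hook=flatten_hook)
            except ValueError:
                pass  # not JSON; keep the original string
    return obj

# Hypothetical payload: the inner object was serialized twice,
# so it arrives as a string inside the outer JSON document.
inner = json.dumps({"name": "America/New_York", "offset": -5})
sample_input = json.dumps({"user": "alice", "timezone_id": inner})

decoded = json.loads(sample_input, object_hook=flatten_hook)
print(decoded["timezone_id"]["name"])  # America/New_York
```

Plain strings such as "alice" fail to parse as JSON, raise ValueError, and are left untouched.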
    

    However, if you know that the problematic (double-encoded) entries always take a specific form (e.g. key == 'timezone_id'), it is probably safer to call json.loads() on those keys only, as Matt Anderson suggests in his answer.

  • 2021-02-06 18:54

    So, the object_hook in the json loader is called each time the loader finishes constructing a dictionary. That is, it is called first on the inner-most dictionary and then works outwards.

    The dictionary that the object_hook callback is given is replaced by what that function returns.

    So, you don't need to recurse yourself. The loader is giving you access to the inner-most things first by its nature.
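A small sketch can make the call order and the replacement behaviour visible. The hook names and the sample documents here are invented for the demonstration.

```python
import json

calls = []

def recording_hook(obj):
    # Record the keys of each dict in the order the decoder hands them over.
    calls.append(sorted(obj))
    return obj

json.loads('{"outer": {"middle": {"inner": 1}}}', object_hook=recording_hook)
print(calls)  # [['inner'], ['middle'], ['outer']]

def replace_hook(obj):
    # Whatever the hook returns takes the place of the dict in the result.
    return sorted(obj)

result = json.loads('{"a": {"b": 1}}', object_hook=replace_hook)
print(result)  # ['a']
```

In the second call the inner dict is replaced by ['b'] first, and then the outer dict {"a": ['b']} is itself replaced by ['a'].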

    I think this will work for you:

    import json

    def hook(obj):
        value = obj.get("timezone_id")
        # This is Python 3 specific; check isinstance against
        # basestring in Python 2.
        if value and isinstance(value, str):
            # The value is itself a JSON document, so decode it in place.
            obj["timezone_id"] = json.loads(value, object_hook=hook)
        return obj

    # my_input is the raw JSON text from the question
    data = json.loads(my_input, object_hook=hook)
    

    It seems to have the effect I think you're looking for when I test it.

    I probably wouldn't try to decode every string value. I would call json.loads() only where you expect a double-encoded JSON object. If you try to decode every string, you might accidentally decode something that is supposed to stay a string: json.loads("12345") returns the integer 12345, even when the API intended "12345" to be a string.
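The difference can be sketched by running both strategies on the same input. The payload and the names decode_all_strings, decode_known_keys, and DOUBLE_ENCODED_KEYS are hypothetical, chosen only to illustrate the pitfall.

```python
import json

# Assumption: the set of keys known to hold double-encoded JSON.
DOUBLE_ENCODED_KEYS = ("timezone_id",)

def decode_all_strings(obj):
    # Greedy: try to decode every string value.
    for key, value in list(obj.items()):
        if isinstance(value, str):
            try:
                obj[key] = json.loads(value, object_hook=decode_all_strings)
            except ValueError:
                pass  # not JSON; keep the original string
    return obj

def decode_known_keys(obj):
    # Targeted: decode only the keys expected to be double-encoded.
    for key in DOUBLE_ENCODED_KEYS:
        value = obj.get(key)
        if isinstance(value, str):
            obj[key] = json.loads(value, object_hook=decode_known_keys)
    return obj

payload = json.dumps({
    "zip": "12345",                              # meant to stay a string
    "active": "true",                            # meant to stay a string
    "timezone_id": json.dumps({"name": "UTC"}),  # genuinely double-encoded
})

greedy = json.loads(payload, object_hook=decode_all_strings)
targeted = json.loads(payload, object_hook=decode_known_keys)

print(greedy["zip"], type(greedy["zip"]).__name__)      # 12345 int
print(targeted["zip"], type(targeted["zip"]).__name__)  # 12345 str
```

Both versions decode timezone_id into a dict, but the greedy hook also turns "12345" into an int and "true" into a bool.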

    Also, your existing function is more complicated than it needs to be. It might work as-is if you always returned obj (whether you update its contents or not).
