I'm dealing with an API that unfortunately is returning malformed (or "weirdly formed," rather -- thanks @fjarri) JSON, but on the positive side I think it may be an opportun
Your main issue is that your object_hook function should not be recursing. json.loads() takes care of the recursing itself and calls your function every time it finds a dictionary (aka obj will always be a dictionary). So instead you just want to modify the problematic keys and return the dict -- this should do what you are looking for:
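import json
from pprint import pprint

# Python 2 code; on Python 3 use obj.items() and str instead of
# obj.iteritems() and basestring.
def flatten_hook(obj):
    for key, value in obj.iteritems():
        if isinstance(value, basestring):
            try:
                # Decode string values that themselves contain JSON.
                obj[key] = json.loads(value, object_hook=flatten_hook)
            except ValueError:
                # Not valid JSON: leave the value as an ordinary string.
                pass
    return obj

# my_input is the raw JSON string returned by the API.
pprint(json.loads(my_input, object_hook=flatten_hook))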
However, if you know the problematic (double-encoded) entries always take on a specific form (e.g. key == 'timezone_id') it is probably safer to just call json.loads() on those keys only, as Matt Anderson suggests in his answer.
So, the object_hook in the json loader is going to be called each time the json loader is finished constructing a dictionary. That is, the first thing it is called on is the inner-most dictionary, working outwards.
The dictionary that the object_hook callback is given is replaced by what that function returns.
So, you don't need to recurse yourself. The loader is giving you access to the inner-most things first by its nature.
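To see that ordering for yourself, here is a small Python 3 sketch. The nested document and the two hook names (recording_hook, replacing_hook) are made up for illustration; only json.loads() and its object_hook parameter come from the standard library.

```python
import json

calls = []

def recording_hook(obj):
    # Record the keys of each dict in the order the decoder hands them over.
    calls.append(sorted(obj))
    return obj

# Hypothetical nested document, invented for this demonstration.
doc = '{"outer": {"middle": {"inner": 1}}}'
json.loads(doc, object_hook=recording_hook)
print(calls)  # [['inner'], ['middle'], ['outer']]

def replacing_hook(obj):
    # Whatever the hook returns takes the place of the dict in the result.
    if "inner" in obj:
        return "replaced"
    return obj

replaced = json.loads(doc, object_hook=replacing_hook)
print(replaced)  # {'outer': {'middle': 'replaced'}}
```

The inner-most dict shows up first, and by the time the hook sees an outer dict, its nested values have already been through the hook.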
I think this will work for you:
import json

def hook(obj):
    value = obj.get("timezone_id")
    # this is python 3 specific; I would check isinstance against
    # basestring in python 2
    if value and isinstance(value, str):
        obj["timezone_id"] = json.loads(value, object_hook=hook)
    return obj

data = json.loads(my_input, object_hook=hook)
It seems to have the effect I think you're looking for when I test it.
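If you want to reproduce that check, something like the following Python 3 sketch should do. The payload is an assumption on my part (I am guessing at the shape of a double-encoded "timezone_id" value, with invented field names); substitute the real API response for my_input.

```python
import json

def hook(obj):
    value = obj.get("timezone_id")
    if value and isinstance(value, str):
        obj["timezone_id"] = json.loads(value, object_hook=hook)
    return obj

# Hypothetical payload: the inner object is serialized to a string first,
# then embedded as a value, which is what "double-encoded" means here.
inner = json.dumps({"name": "UTC", "offset": 0})
my_input = json.dumps({"user": {"id": "12345", "timezone_id": inner}})

data = json.loads(my_input, object_hook=hook)
print(data)
# {'user': {'id': '12345', 'timezone_id': {'name': 'UTC', 'offset': 0}}}
```

Note that "id" stays the string "12345" because the hook only touches "timezone_id". If that key can sometimes hold a plain non-JSON string, json.loads() will raise ValueError, so you may want a try/except around the inner call.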
I probably wouldn't try to decode every string value -- I would call json.loads() only on the keys where you expect a double-encoded JSON object. If you try to decode every string, you might accidentally decode something that is supposed to stay a string (like "12345", which would come back as the integer 12345 even though the API intended it as a string).
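Here is a minimal Python 3 sketch of that pitfall. The record and the hook name decode_every_string are invented for illustration; the hook is just a Python 3 variant of the "try every string" approach.

```python
import json

def decode_every_string(obj):
    # Try to decode every string value; keep the original if it is not JSON.
    for key, value in obj.items():
        if isinstance(value, str):
            try:
                obj[key] = json.loads(value)
            except ValueError:
                pass
    return obj

# Hypothetical record in which every value is meant to remain a string.
record = '{"account": "12345", "active": "true", "name": "Ann"}'
decoded = json.loads(record, object_hook=decode_every_string)
print(decoded)  # {'account': 12345, 'active': True, 'name': 'Ann'}
```

"12345" and "true" are both valid JSON documents on their own, so they silently change type, while "Ann" survives only because it fails to parse.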
Also, your existing function is more complicated than it needs to be; it might work as-is if you always returned obj (whether you update its contents or not).