I have a very large JSON-like file, but it is not using proper JSON syntax: the object keys are not quoted. I'd like to write a script to fix the file, so that I can load
I came across this old question while looking for ways to parse sloppy JSON shorthand into Python.
My input looks like this:
'{lat: 8.5, lon: -80.0}'
And, as I said, the parsing has to tolerate sloppy spacing, so the input could just as well be:
'{lat:8.5,lon:-80.0}'
I like the YAML hint, but it doesn't cope well with sloppy spacing, and I don't want to add one more dependency to my already longish list. So I tried the regex solution, but it wasn't good enough for my case.
My solution looks like this:
re.sub(r'(\w+)[ ]*(?=:)', r'"\g<1>"', input_string)
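It defines one group holding word characters (letters, digits and underscore), allows any number of spaces to follow, and uses a lookahead to require a colon right after. The matched substring, including any trailing spaces, is replaced with group one enclosed in double quotes; everything else is left alone. The pattern does not match a key that is already quoted, because the closing quote then sits between the key and the colon.
In particular: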
>>> import re
>>> re.sub(r'(\w+)[ ]*(?=:)', r'"\g<1>"',
... '{abc : "xyz", cde : {a:"b", c: 0}, fgh : ["hfz"], 123: 123}')
'{"abc": "xyz", "cde": {"a":"b", "c": 0}, "fgh": ["hfz"], "123": 123}'
>>> re.sub(r'(\w+)[ ]*(?=:)', r'"\g<1>"', _)
'{"abc": "xyz", "cde": {"a":"b", "c": 0}, "fgh": ["hfz"], "123": 123}'
>>>
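To get from the substitution to an actual Python object, the result can be handed to the standard `json` module. The following is only a minimal sketch: the helper name `quote_keys` and the sample strings are my own, not part of the original answer, and it assumes that no string value in the input contains a word directly followed by a colon.

```python
import json
import re

# Same pattern as above, compiled once. The names _UNQUOTED_KEY and
# quote_keys are hypothetical, chosen only for this illustration.
_UNQUOTED_KEY = re.compile(r'(\w+)[ ]*(?=:)')


def quote_keys(text):
    """Wrap bare keys in double quotes so that json.loads can parse the text."""
    return _UNQUOTED_KEY.sub(r'"\g<1>"', text)


loose = '{lat:8.5,lon:-80.0}'
fixed = quote_keys(loose)   # '{"lat":8.5,"lon":-80.0}'
data = json.loads(fixed)    # {'lat': 8.5, 'lon': -80.0}

# Assumption made visible: the regex cannot tell keys from values, so a
# colon inside a string value (here a URL) is also "quoted" and the
# result is no longer valid JSON.
broken = quote_keys('{url: "http://example.com"}')
try:
    json.loads(broken)
except json.JSONDecodeError as exc:
    print('not valid JSON after substitution:', exc)
```

If values like URLs or times such as `12:30` can occur in the data, this approach is probably too simple and a real tokenizer or a tolerant parser would be the safer choice.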