Fix unquoted keys in JSON-like file so that it uses correct JSON syntax

野趣味 2021-01-17 02:41

I have a very large JSON-like file, but it is not using proper JSON syntax: the object keys are not quoted. I'd like to write a script to fix the file, so that I can load

3 Answers
  • 2021-01-17 02:59

    Rather than a potentially fragile regex solution, you can take advantage of the fact that while your log file isn't valid JSON, it is valid YAML, as long as each colon after a key is followed by a space. Using the PyYAML library, you can load it into a Python data structure and then write it back out as valid JSON:

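    import json
    import yaml  # PyYAML
    
    # Parse the file as YAML; safe_load builds only plain dicts, lists, strings and numbers
    with open("original.log") as f:
        data = yaml.safe_load(f)
    
    # Write the same data back out as valid JSON
    with open("jsonified.log", "w") as f:
        json.dump(data, f)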
    
  • 2021-01-17 03:09

    I came across this old question while looking for ways to parse sloppy JSON shorthand in Python.

    My input looks like this:

    '{lat: 8.5, lon: -80.0}'
    

    and, as said, the parsing has to tolerate sloppy spacing; the input could just as well be:

    '{lat:8.5,lon:-80.0}'
    

    I like the YAML hint, but it doesn't cope with sloppy spacing, and I don't want to add one more dependency to my already long list. So I tried the regex solution, but it wasn't good enough for my case.

    My solution looks like this:

    import re
    re.sub(r'(\w+)[ ]*(?=:)', r'"\g<1>"', input_string)  # quote each bare key
    

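    It defines one group holding word characters (letters, digits and underscore), allows spaces to follow, and anchors to a colon with a lookahead, so the colon itself is not consumed. The matched substring (the key plus any trailing spaces) is replaced with group one enclosed in double quotes, and everything else is left alone. The pattern does not match a key that is already quoted, because the closing quote sits between the word and the colon, so running it a second time changes nothing.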

    For example:

    >>> import re
    >>> re.sub(r'(\w+)[ ]*(?=:)', r'"\g<1>"', 
    ... '{abc : "xyz", cde : {a:"b", c: 0}, fgh : ["hfz"], 123: 123}')
    '{"abc": "xyz", "cde": {"a":"b", "c": 0}, "fgh": ["hfz"], "123": 123}'
    >>> re.sub(r'(\w+)[ ]*(?=:)', r'"\g<1>"', _)
    '{"abc": "xyz", "cde": {"a":"b", "c": 0}, "fgh": ["hfz"], "123": 123}'
    >>> 
    
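A minimal sketch of the same approach wrapped in a helper (`quote_keys` is a hypothetical name, not from the answer). It checks that both spacing styles from the answer parse with `json.loads`, and shows one limitation: the pattern knows nothing about strings, so a `word:` inside a quoted value, such as a time like `"12:30"`, gets quoted too. The `quote_keys_outside_strings` variant is an illustrative assumption, not the answer's method: it assumes strings are double-quoted with JSON-style backslash escapes, matches each whole string first and returns it unchanged, so only bare keys outside strings are quoted.

```python
import json
import re

# The answer's pattern: word characters, optional spaces, then a colon (lookahead only).
KEY_RE = re.compile(r'(\w+)[ ]*(?=:)')


def quote_keys(text: str) -> str:
    """Wrap bare keys in double quotes, exactly like the answer's re.sub call."""
    return KEY_RE.sub(r'"\g<1>"', text)


# Both spacing styles from the answer parse to the same data.
for sloppy in ('{lat: 8.5, lon: -80.0}', '{lat:8.5,lon:-80.0}'):
    print(json.loads(quote_keys(sloppy)))  # {'lat': 8.5, 'lon': -80.0}

# Limitation: "12" inside the string value is followed by ':' and gets quoted as well.
broken = quote_keys('{t: "12:30"}')
print(broken)  # {"t": ""12":30"}  (no longer valid JSON)

# Hypothetical string-aware variant: alternative 1 matches a whole double-quoted
# string (with backslash escapes); alternative 2 is the answer's key pattern.
TOKEN_RE = re.compile(r'("(?:[^"\\]|\\.)*")|(\w+)[ ]*(?=:)')


def quote_keys_outside_strings(text: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:  # a quoted string: keep it unchanged
            return m.group(1)
        return f'"{m.group(2)}"'  # a bare key: wrap it in quotes
    return TOKEN_RE.sub(repl, text)


print(json.loads(quote_keys_outside_strings('{t: "12:30", n: 1}')))  # {'t': '12:30', 'n': 1}
```

The variant relies on `re.sub` scanning left to right: once a string literal is matched from its opening quote, the scan resumes after the closing quote, so the key branch never sees text inside a string. It still assumes the quotes in the input are balanced.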
  • 2021-01-17 03:14

    I suggest matching whole words that are not enclosed in double quotation marks and adding quotation marks around them:

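    import re
    # Match whole words that are not directly preceded or followed by a double quote
    p = re.compile(r'(?<!")\b\w+\b(?!")')
    test_str = "{abc : \"xyz\", cde : {}, fgh : [\"hfz\"]}"
    print(p.sub(r'"\g<0>"', test_str))  # wrap each match in double quotes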
    

    See the IDEONE demo; output:

    {"abc" : "xyz", "cde" : {}, "fgh" : ["hfz"]}
    
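A quick check of where this pattern can fall short (the `quote_words` helper name is hypothetical): it quotes every bare word that does not touch a double quote, not only keys. The question's sample works because all of its values are already quoted strings, but a bare number turns into a string, and a decimal is split at the dot into something that no longer parses as JSON.

```python
import json
import re

# Pattern from the answer: a whole word not immediately preceded or followed by '"'.
p = re.compile(r'(?<!")\b\w+\b(?!")')


def quote_words(text: str) -> str:
    """Wrap every matching word in double quotes, as the answer does."""
    return p.sub(r'"\g<0>"', text)


# The question's sample: every value is already a quoted string, so only keys change.
print(quote_words('{abc : "xyz", cde : {}, fgh : ["hfz"]}'))
# {"abc" : "xyz", "cde" : {}, "fgh" : ["hfz"]}

# A bare number counts as a word too, so it becomes a string.
print(json.loads(quote_words('{a: 1}')))  # {'a': '1'}

# A decimal is split at the '.', producing invalid JSON.
print(quote_words('{lat: -80.0}'))  # {"lat": -"80"."0"}
```

Restricting the match to words followed by a colon, as in the previous answer, leaves numeric values untouched.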