Writing to a JSON file, then reading the same file and getting “JSONDecodeError: Extra data”

暗喜 2021-01-29 02:16

I have a very large JSON file (9 GB). I'm reading one object at a time from it, and then deleting the key-value pairs in that object whose key is not in the list fields.
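
A minimal reproduction of the error in the title. It assumes, since the question does not show the writing code, that the file is written with one json.dump() call per object followed by a newline. The sample records and the file name demo.json are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample records; the real file's objects are not shown.
records = [
    {"skills": ["python"], "name": "A"},
    {"skills": ["sql"], "name": "B"},
]

error_message = None

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.json")

    # Assumption: one object per line is written, which produces JSON Lines,
    # not a single JSON document.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            json.dump(record, f)
            f.write("\n")

    # json.load() expects exactly one JSON document: it parses the first
    # object, then rejects the text that follows it.
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as exc:
        error_message = exc.msg

    # Parsing each line as its own document succeeds.
    with open(path, encoding="utf-8") as f:
        parsed = [json.loads(line) for line in f]

print(error_message)  # Extra data
print(parsed == records)  # True
```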

2 Answers
  •  情歌与酒
    2021-01-29 03:01

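    Here's code that seems to work with your sample input. As I said in a comment, the file you are dealing with is in JSON Lines format rather than standard JSON format: each line holds one complete JSON object. That is most likely why parsing the whole file as a single JSON document (for example with json.load()) fails with "Extra data": the parser finishes the first object and then finds more text after it.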
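
    Because every line is a complete JSON document, a JSON Lines file can be parsed one line at a time with json.loads(), so only one line is held in memory at once. A minimal sketch of such a reader; iter_json_lines is a hypothetical helper name, and skipping blank lines and reporting the failing line number are design choices of this sketch, not requirements of the format.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_json_lines(stream: TextIO) -> Iterator[Any]:
    """Yield one parsed JSON value per non-blank line of a JSON Lines stream."""
    for line_number, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. an extra newline at EOF
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line is malformed instead of failing anonymously.
            raise ValueError(f"line {line_number}: {exc.msg}") from exc


# Usage: an in-memory stream stands in for open(path_to_file, encoding='UTF8').
sample = io.StringIO('{"industry": "IT"}\n{"industry": "Finance"}\n\n')
profiles = list(iter_json_lines(sample))
print(profiles)  # [{'industry': 'IT'}, {'industry': 'Finance'}]
```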

    Since you appear to want the cleaned version in that same format (in other words, not converted to standard JSON format, as I thought at one point), here's how to do that:

    import json
    
    path_to_file = "sample_input.json"
    cleaned_file = "cleaned.json"
    
    # Fields to keep.
    fields = ["skills", "industry", "summary", "education", "experience"]
    
    # Clean profiles in JSON Lines format file.
    with open(path_to_file, encoding='UTF8') as inf, \
         open(cleaned_file, 'w', encoding='UTF8') as outf:
    
        for line in inf:
            if not line.strip():  # Skip blank lines, e.g. a trailing one.
                continue
            profile = json.loads(line)  # Read a profile object.
            for key in list(profile.keys()):  # Remove unwanted fields from it.
                if key not in fields:
                    del profile[key]
            outf.write(json.dumps(profile) + '\n')  # Write cleaned profile to new file.
    
    # Test whether it worked.
    with open(cleaned_file, encoding='UTF8') as cleaned:
        for line in cleaned:
            profile = json.loads(line)
            print(json.dumps(profile, indent=4))
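
    If you do want standard JSON instead (the conversion mentioned above), one option is to stream the objects into a single JSON array rather than building a Python list first, so the 9 GB input never has to fit in memory. A minimal sketch; jsonl_to_json_array is a hypothetical name. Note that reading the resulting array back with a single json.load() would pull everything into memory again.

```python
import io
import json
from typing import TextIO


def jsonl_to_json_array(inf: TextIO, outf: TextIO) -> int:
    """Copy a JSON Lines stream into one JSON array, one object at a time.

    Returns the number of objects written. Memory use is bounded by the
    longest line because the array is never built in memory.
    """
    count = 0
    outf.write("[\n")
    for line in inf:
        if not line.strip():
            continue  # skip blank lines
        obj = json.loads(line)
        if count:
            outf.write(",\n")  # separator before every object except the first
        outf.write(json.dumps(obj))
        count += 1
    outf.write("\n]\n")
    return count


# Usage with in-memory streams standing in for real files.
src = io.StringIO('{"skills": ["python"]}\n{"skills": ["sql"]}\n')
dst = io.StringIO()
n = jsonl_to_json_array(src, dst)
result = json.loads(dst.getvalue())
print(n, result)  # 2 [{'skills': ['python']}, {'skills': ['sql']}]
```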
    
