问题
I have the following data in my JSON file:
{
"first": {
"name": "James",
"age": 30
},
"second": {
"name": "Max",
"age": 30
},
"third": {
"name": "Norah",
"age": 30
},
"fourth": {
"name": "Sam",
"age": 30
}
}
I want to print the top-level key and object as follows:
import json
import ijson
fname = "data.json"
with open(fname) as f:
raw_data = f.read()
data = json.loads(raw_data)
for k in data.keys():
print k, data[k]
OUTPUT:
second {u'age': 30, u'name': u'Max'}
fourth {u'age': 30, u'name': u'Sam'}
third {u'age': 30, u'name': u'Norah'}
first {u'age': 30, u'name': u'James'}
So, far so good. However if I want to this same thing for a huge file, I would have to read it all in-memory. This very slow and requires lots of memory.
I want use an incremental JSON parser ( ijson
in this case ) to achieve what I described earlier:
The above code was taken from: No access to top level elements with ijson?
with open(fname) as f:
json_obj = ijson.items(f,'').next() # '' loads everything as only one object.
for (key, value) in json_obj.items():
print key + " -> " + str(value)
This is not suitable either, because it also reads the whole file in memory. This not truly incremental.
How can I do incremental parsing of top-level keys and corresponding objects, of a JSON file in Python?
回答1:
Since essentially json files are text files, consider stripping the top level as string. Basically, use a read file iterable approach where you concatenate a string with each line and then break out of the loop once the string contains the double braces }}
signaling the end of the top level. Of course the double brace condition must strip out spaces and line breaks.
toplevelstring = ''
with open('data.json') as f:
for line in f:
if not '}}' in toplevelstring.replace('\n', '').replace('\s+',''):
toplevelstring = toplevelstring + line
else:
break
data = json.loads(toplevelstring)
Now if your larger json is wrapped in square brackets or other braces, still run above routine but add the below line to slice out first character, [
, and last two characters for comma and line break after top level's final brace:
[{
"first": {
"name": "James",
"age": 30
},
"second": {
"name": "Max",
"age": 30
},
"third": {
"name": "Norah",
"age": 30
},
"fourth": {
"name": "Sam",
"age": 30
}
},
{
"data1": {
"id": "AAA",
"type": 55
},
"data2": {
"id": "BBB",
"type": 1601
},
"data3": {
"id": "CCC",
"type": 817
}
}]
...
toplevelstring = toplevelstring[1:-2]
data = json.loads(toplevelstring)
回答2:
Answer from github issue [file name changed]
import ijson from ijson.common import ObjectBuilder def objects(file): key = '-' for prefix, event, value in ijson.parse(file): if prefix == '' and event == 'map_key': # found new object at the root key = value # mark the key value builder = ObjectBuilder() elif prefix.startswith(key): # while at this key, build the object builder.event(event, value) if event == 'end_map': # found the end of an object at the current key, yield yield key, builder.value for key, value in objects(open('data.json', 'rb')): print(key, value)
来源:https://stackoverflow.com/questions/36766532/read-top-level-json-dictionary-incrementally-using-python-ijson