Question
I have a large json file (2.4 GB). I want to parse it in python. The data looks like the following:
[
    {
        "host": "a.com",
        "ip": "1.2.2.3",
        "port": 8
    },
    {
        "host": "b.com",
        "ip": "2.5.0.4",
        "port": 3
    },
    {
        "host": "c.com",
        "ip": "9.17.6.7",
        "port": 4
    }
]
I run this Python script parser.py to load the data for parsing:
import json
from pprint import pprint

with open('mydata.json') as f:
    data = json.load(f)
It fails with the following traceback:
Traceback (most recent call last):
  File "parser.py", line xx, in <module>
    data = json.load(f)
  File "/usr/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
MemoryError
1) Can you please advise me on how to load large files for parsing without such an error?
2) Any alternative methods?
Answer 1:
The problem is that the file is too large to load into memory at once, so it must be processed in sections.
I would recommend using ijson or json-streamer, which can parse the JSON file iteratively instead of trying to load the whole file into memory at once.
Here's an example of using ijson:
import ijson

entry = {}  # Keeps track of values for each json item
with open('mydata.json', 'rb') as f:  # binary mode; the file is also closed properly
    parser = ijson.parse(f)
    for prefix, event, value in parser:
        # Start of item map
        if (prefix, event) == ('item', 'start_map'):
            entry = {}  # Start of a new json item
        elif prefix.endswith('.host'):
            entry['host'] = value  # Add value to entry
        elif prefix.endswith('.ip'):
            entry['ip'] = value
        elif prefix.endswith('.port'):
            entry['port'] = value
        elif (prefix, event) == ('item', 'end_map'):
            print(entry)  # Do something with the complete entry object
The prefix stores the path of the current element being iterated in the JSON, the event is used to detect the start/end of maps or arrays, and the value holds the value of the current element being iterated.
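For this shape of data, ijson also provides a higher-level items() helper that assembles each element of the top-level array into a dict for you, so no manual event tracking is needed. A minimal sketch, assuming the same mydata.json layout as above:
import ijson

with open('mydata.json', 'rb') as f:
    # The 'item' prefix selects each element of the top-level array
    # and yields it as a fully built dict, one at a time.
    for entry in ijson.items(f, 'item'):
        print(entry)  # e.g. {'host': 'a.com', 'ip': '1.2.2.3', 'port': 8}
Only one entry is held in memory at a time, so memory use stays roughly constant even for a 2.4 GB file.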
Source: https://stackoverflow.com/questions/52007722/usr-lib-python3-6-json-init-py-line-296-in-load-return-loadsfp-read-mem