Question
I have a large json file (2.4 GB). I want to parse it in python. The data looks like the following:
[
    {
        "host": "a.com",
        "ip": "1.2.2.3",
        "port": 8
    },
    {
        "host": "b.com",
        "ip": "2.5.0.4",
        "port": 3
    },
    {
        "host": "c.com",
        "ip": "9.17.6.7",
        "port": 4
    }
]
I run this Python script parser.py to load the data for parsing:
import json
from pprint import pprint

with open('mydata.json') as f:
    data = json.load(f)
It fails with the following traceback:
Traceback (most recent call last):
  File "parser.py", line xx, in <module>
    data = json.load(f)
  File "/usr/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
MemoryError
1) Can you please advise me on how to load large files for parsing without such an error?
2) Any alternative methods?
Answer 1:
The problem is that the file is too large to load into memory at once, so it must be processed in sections.
I would recommend using ijson or json-streamer, which can parse the JSON file iteratively instead of trying to load the whole file into memory at once.
Here's an example of using ijson:
import ijson

entry = {}  # Keeps track of values for each json item
with open('mydata.json', 'rb') as f:  # binary mode; the file is also closed properly
    parser = ijson.parse(f)
    for prefix, event, value in parser:
        # Start of item map
        if (prefix, event) == ('item', 'start_map'):
            entry = {}  # Start of a new json item
        elif prefix.endswith('.host'):
            entry['host'] = value  # Add value to entry
        elif prefix.endswith('.ip'):
            entry['ip'] = value
        elif prefix.endswith('.port'):
            entry['port'] = value
        elif (prefix, event) == ('item', 'end_map'):
            print(entry)  # Do something with the complete entry object
The prefix stores the path of the current element being iterated in the JSON, the event is used to detect the start/end of maps or arrays, and the value holds the value of the current element being iterated.
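For this shape of data, ijson also provides a higher-level items() helper that assembles each element of the top-level array into a dict for you, so no manual event tracking is needed. A minimal sketch, assuming the same mydata.json layout as above:
import ijson

with open('mydata.json', 'rb') as f:
    # The 'item' prefix selects each element of the top-level array
    # and yields it as a fully built dict, one at a time.
    for entry in ijson.items(f, 'item'):
        print(entry)  # e.g. {'host': 'a.com', 'ip': '1.2.2.3', 'port': 8}
Only one entry is held in memory at a time, so memory use stays roughly constant even for a 2.4 GB file.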
Source: https://stackoverflow.com/questions/52007722/usr-lib-python3-6-json-init-py-line-296-in-load-return-loadsfp-read-mem