I exported some data from my database in the form of JSON, which is essentially just one [list] with a bunch (900K) of {objects} inside it.
Trying to import it on my
I know this question is from a while back, but I think this new solution is hassle-free.
You can use pandas 0.21.0+, which supports a chunksize parameter in read_json. You can load one chunk at a time and save each chunk as its own JSON file:
import pandas as pd

chunks = pd.read_json('file.json', lines=True, chunksize=20)
for i, c in enumerate(chunks):
    c.to_json('chunk_{}.json'.format(i))
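One caveat not spelled out above: lines=True expects newline-delimited JSON (one object per line), while the export in the question is a single top-level array. A minimal conversion sketch, assuming the array fits in memory and treating the output name file.jsonl as a placeholder:

import json

# Convert a single top-level JSON array into newline-delimited JSON
# so that pd.read_json(..., lines=True, chunksize=...) can iterate over it.
with open('file.json') as infile:
    records = json.load(infile)  # the full list of objects

with open('file.jsonl', 'w') as outfile:
    for record in records:
        outfile.write(json.dumps(record) + '\n')

After that, point read_json at file.jsonl instead of file.json.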
Assuming you have the option to go back and export the data again...:
pg_dump - extract a PostgreSQL database into a script file or other archive file.
pg_restore - restore a PostgreSQL database from an archive file created by pg_dump.
If that's not an option, it would help to know what you're going to do with the output so that another suggestion can hit the mark.
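For illustration only, a minimal sketch of driving those two tools from Python with subprocess, assuming local PostgreSQL access; the database names and archive file name (mydb, mydb_copy, mydb.dump) are placeholders, not anything from the question:

import subprocess

# Dump the source database in pg_dump's custom archive format...
subprocess.run(['pg_dump', '--format=custom', '--file=mydb.dump', 'mydb'], check=True)

# ...and restore that archive into the target database with pg_restore.
subprocess.run(['pg_restore', '--dbname=mydb_copy', 'mydb.dump'], check=True)

The custom-format archive is something pg_restore can consume directly, so there is no oversized JSON file to split in the first place.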
In Python:
import json

with open('file.json') as infile:
    o = json.load(infile)

# Write each slice of chunkSize objects to its own numbered file.
chunkSize = 1000
for i in range(0, len(o), chunkSize):
    with open('file_' + str(i // chunkSize) + '.json', 'w') as outfile:
        json.dump(o[i:i + chunkSize], outfile)
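As a quick sanity check (not part of the original answer), you can read the chunk files back and confirm that no objects were lost; the file_*.json pattern assumes the naming scheme used above:

import glob
import json

# Sum the lengths of all chunk files; the total should equal len(o)
# from the original file.
total = 0
for path in sorted(glob.glob('file_*.json')):
    with open(path) as f:
        total += len(json.load(f))
print(total)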
I turned phihag's and mark's work into a tiny script (gist), also copied below:
#!/usr/bin/env python
# based on http://stackoverflow.com/questions/7052947/split-95mb-json-array-into-smaller-chunks
# usage: python json-split filename.json
# produces multiple chunk files (filename.json_0.json, filename.json_1.json, ...) of about 1.49 MB each
import json
import sys
with open(sys.argv[1], 'r') as infile:
    o = json.load(infile)

# Write each slice of chunkSize objects to <inputname>_<n>.json.
chunkSize = 4550
for i in range(0, len(o), chunkSize):
    with open(sys.argv[1] + '_' + str(i // chunkSize) + '.json', 'w') as outfile:
        json.dump(o[i:i + chunkSize], outfile)