Split huge (95Mb) JSON array into smaller chunks?

不知归路 2021-01-18 06:04

I exported some data from my database in the form of JSON, which is essentially just one [list] with a bunch (900K) of {objects} inside it.

Trying to import it on my

4 Answers
  • 2021-01-18 06:21

    I know this question is from a while back, but I think this new solution is hassle-free.

    You can use pandas (0.21.0 or later), which supports a chunksize parameter in read_json. You can load one chunk at a time and save each chunk out as its own JSON file:

    import pandas as pd

    # chunksize returns an iterator of DataFrames instead of loading everything at once
    chunks = pd.read_json('file.json', lines=True, chunksize=20)
    for i, c in enumerate(chunks):
        c.to_json('chunk_{}.json'.format(i))
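
    Note that read_json only accepts chunksize together with lines=True, i.e. the input must be line-delimited JSON (one object per line), not a single JSON array. As a minimal sketch, assuming the exported array still fits in memory and using a hypothetical file.jsonl output name, you could convert the array to JSON Lines first:

    import json

    # Convert a single JSON array file into JSON Lines so pandas can stream it.
    # 'file.jsonl' is an illustrative output name, not from the original post.
    with open('file.json') as src, open('file.jsonl', 'w') as dst:
        for record in json.load(src):  # loads the whole array once
            dst.write(json.dumps(record) + '\n')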
    
  • 2021-01-18 06:29

    Assuming you have the option to go back and export the data again...:

    pg_dump - extract a PostgreSQL database into a script file or other archive file.

    pg_restore - restore a PostgreSQL database from an archive file created by pg_dump.

    If that's no use, it would help to know what you plan to do with the output, so another suggestion can hit the mark.
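
    If re-exporting is an option, here is a minimal sketch of driving pg_dump and pg_restore from Python; the database, table, and file names are placeholders, not from the original post:

    import subprocess

    # Dump one table in pg_dump's custom format, suitable for pg_restore.
    subprocess.run(
        ['pg_dump', '-Fc', '-t', 'my_table', '-f', 'my_table.dump', 'my_database'],
        check=True,
    )

    # Later, restore it into another database.
    subprocess.run(['pg_restore', '-d', 'other_database', 'my_table.dump'], check=True)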

  • 2021-01-18 06:32

    In Python:

    import json

    with open('file.json') as infile:
        o = json.load(infile)                # load the whole array into memory
        chunkSize = 1000                     # objects per output file
        for i in range(0, len(o), chunkSize):
            with open('file_' + str(i // chunkSize) + '.json', 'w') as outfile:
                json.dump(o[i:i + chunkSize], outfile)
    
  • 2021-01-18 06:40

    I turned phihag's and mark's work into a tiny script (gist), also copied below:

    #!/usr/bin/env python
    # based on http://stackoverflow.com/questions/7052947/split-95mb-json-array-into-smaller-chunks
    # usage: python json-split filename.json
    # produces multiple chunk files named filename.json_0.json, filename.json_1.json, ...
    # (about 1.49 MB each with this chunkSize)

    import json
    import sys

    with open(sys.argv[1], 'r') as infile:
        o = json.load(infile)                # load the whole array into memory
        chunkSize = 4550                     # objects per output file
        for i in range(0, len(o), chunkSize):
            with open(sys.argv[1] + '_' + str(i // chunkSize) + '.json', 'w') as outfile:
                json.dump(o[i:i + chunkSize], outfile)
    