Split huge (95Mb) JSON array into smaller chunks?

不知归路 2021-01-18 06:04

I exported some data from my database in the form of JSON, which is essentially just one [list] with a bunch (900K) of {objects} inside it.

Trying to import it on my

4 Answers
  • 2021-01-18 06:21

    I know this question is from a while back, but I think this new solution is hassle-free.

    You can use pandas (0.21.0 or later), which supports a chunksize parameter in read_json. You can load one chunk at a time and save each chunk out as its own JSON file:

    import pandas as pd

    # chunksize returns an iterator of DataFrames instead of loading everything at once
    chunks = pd.read_json('file.json', lines=True, chunksize=20)
    for i, c in enumerate(chunks):
        c.to_json('chunk_{}.json'.format(i))
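
    Note that read_json only accepts chunksize together with lines=True, i.e. the input must be line-delimited JSON (one object per line), not a single JSON array. As a minimal sketch, assuming the exported array still fits in memory and using a hypothetical file.jsonl output name, you could convert the array to JSON Lines first:

    import json

    # Convert a single JSON array file into JSON Lines so pandas can stream it.
    # 'file.jsonl' is an illustrative output name, not from the original post.
    with open('file.json') as src, open('file.jsonl', 'w') as dst:
        for record in json.load(src):  # loads the whole array once
            dst.write(json.dumps(record) + '\n')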
    
  • 2021-01-18 06:29

    Assuming you have the option to go back and export the data again...:

    pg_dump - extract a PostgreSQL database into a script file or other archive file.

    pg_restore - restore a PostgreSQL database from an archive file created by pg_dump.

    If that's no use, it would help to know what you plan to do with the output, so another suggestion can hit the mark.
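
    If re-exporting is an option, here is a minimal sketch of driving pg_dump and pg_restore from Python; the database, table, and file names are placeholders, not from the original post:

    import subprocess

    # Dump one table in pg_dump's custom format, suitable for pg_restore.
    subprocess.run(
        ['pg_dump', '-Fc', '-t', 'my_table', '-f', 'my_table.dump', 'my_database'],
        check=True,
    )

    # Later, restore it into another database.
    subprocess.run(['pg_restore', '-d', 'other_database', 'my_table.dump'], check=True)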

  • 2021-01-18 06:32

    In Python:

    import json

    with open('file.json') as infile:
        o = json.load(infile)                # load the whole array into memory
        chunkSize = 1000                     # objects per output file
        for i in range(0, len(o), chunkSize):
            with open('file_' + str(i // chunkSize) + '.json', 'w') as outfile:
                json.dump(o[i:i + chunkSize], outfile)
    
  • 2021-01-18 06:40

    I turned phihag's and mark's work into a tiny script (gist), also copied below:

    #!/usr/bin/env python
    # based on http://stackoverflow.com/questions/7052947/split-95mb-json-array-into-smaller-chunks
    # usage: python json-split filename.json
    # produces multiple chunk files named filename.json_0.json, filename.json_1.json, ...
    # (about 1.49 MB each with this chunkSize)

    import json
    import sys

    with open(sys.argv[1], 'r') as infile:
        o = json.load(infile)                # load the whole array into memory
        chunkSize = 4550                     # objects per output file
        for i in range(0, len(o), chunkSize):
            with open(sys.argv[1] + '_' + str(i // chunkSize) + '.json', 'w') as outfile:
                json.dump(o[i:i + chunkSize], outfile)
    