Opening a large JSON file with no newlines for CSV conversion in Python 2.6.6

孤城傲影 · 2020-12-21 14:15

I am attempting to convert a very large JSON file to CSV. I have been able to convert a small file of this type to a 10-record (for example) CSV file. However, when trying t…

1 Answer
  • 2020-12-21 14:57

    I wound up using a chunk size of 8388608 (0x800000 in hex) to process the files. I then processed the lines that had been read in as part of the loop, keeping counts of rows processed and rows discarded. Each call to the processing function added its counts to the running totals so that I could keep track of the total number of records processed.
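
    The question and answer never show json_parse itself, so here is a minimal sketch of one way such a generator could look, assuming the input is a single stream of JSON objects with no newlines; it uses json.JSONDecoder.raw_decode with the 8 MB read size mentioned above (the method name and signature are taken from the loop below, everything else is a guess):

    import json

    def json_parse(self, inf, chunk_size=0x800000):
      # Read the file in large chunks and yield each complete JSON object
      # as soon as it can be decoded. raw_decode raises ValueError when the
      # buffer ends in the middle of an object, so we read more and retry.
      decoder = json.JSONDecoder()
      buf = ''
      while True:
        chunk = inf.read(chunk_size)
        if not chunk:
          break
        buf += chunk
        while buf:
          try:
            obj, idx = decoder.raw_decode(buf)
          except ValueError:
            break              # incomplete object; need another chunk
          yield obj
          buf = buf[idx:].lstrip()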

    This appears to be the way that it needs to go.

    Next time a question like this is asked, please emphasize that a large chunk size must be specified, not the 2048 shown in the original answer.

    The loop goes

    first = True
    for data in self.json_parse(inf):  # one decoded object from the chunked reader
      records = len(data['MainKey'])
      columns = len(data['MainKey'][0]['Fields'])
      if first:
        # Initialize output as DictWriter on the first object only
        ofile, outf, fields = self.init_csv(csvname, data, records, columns)
        first = False
      reccount, errcount = self.parse_records(outf, data, fields, records)
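
    init_csv is not shown either; a minimal sketch, assuming the CSV field names come from the 'name' entries of the first record's Fields list and that the return values match the tuple unpacked above (the header row is written by hand because DictWriter.writeheader() only arrived in Python 2.7):

    import csv

    def init_csv(self, csvname, data, records, columns):
      # Derive the column names from the first record and open a DictWriter.
      fields = [fld['name'] for fld in data['MainKey'][0]['Fields']]
      ofile = open(csvname, 'wb')               # binary mode for csv on Python 2
      outf = csv.DictWriter(ofile, fieldnames=fields)
      outf.writerow(dict(zip(fields, fields)))  # header row
      return ofile, outf, fields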
    

    Within the parsing routine

    for rec in range(records):
      currec = data['MainKey'][rec]
      # If each column count can be different
      columns = len(currec['Fields'])
      retval, valrec = self.build_csv_row(currec, columns, fields)
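
    Filling in the pieces that snippet leaves out, the whole routine might look roughly like the sketch below; it is assumed (not shown in the answer) that parse_records writes each good row through the DictWriter and returns the processed/discarded counts that the outer loop keeps track of:

    def parse_records(self, outf, data, fields, records):
      reccount = errcount = 0
      for rec in range(records):
        currec = data['MainKey'][rec]
        # If each column count can be different
        columns = len(currec['Fields'])
        retval, valrec = self.build_csv_row(currec, columns, fields)
        if retval:
          outf.writerow(valrec)   # assumed: one CSV row per good record
          reccount += 1
        else:
          errcount += 1
      return reccount, errcount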
    

    To parse the columns, use

    for col in range(columns):  # columns is a count, so iterate over range(columns)
      dataname = currec['Fields'][col]['name']
      dataval = currec['Fields'][col]['Values']['value']
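
    Combining the two snippets above, build_csv_row might look something like this sketch: it collects the name/value pairs of one record into a dict keyed by the CSV field names and flags anything unexpected so the caller can count the row as discarded (the exact return convention is an assumption):

    def build_csv_row(self, currec, columns, fields):
      valrec = {}
      for col in range(columns):
        dataname = currec['Fields'][col]['name']
        dataval = currec['Fields'][col]['Values']['value']
        if dataname not in fields:
          return False, {}        # unknown column; caller counts this row as an error
        valrec[dataname] = dataval
      return True, valrec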
    

    Thus the references now work and the processing is handled correctly. The large chunk size apparently lets the reads keep up with the data while remaining small enough not to overload the system.
