Opening a large JSON file with no newlines for CSV conversion in Python 2.6.6

孤城傲影 · 2020-12-21 14:15

I am attempting to convert a very large JSON file to CSV. I have been able to convert a small file of this type to a 10-record (for example) CSV file. However, when trying t…

1 Answer
  • 2020-12-21 14:57

    I wound up using a chunk size of 8388608 (0x800000 in hex) to process the files. I then processed the lines that had been read in as part of the loop, keeping counts of rows processed and rows discarded. Each call to the processing function added its counts to the running totals so that I could keep track of the total number of records processed.
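
    The question and answer never show json_parse itself, so here is a minimal sketch of one way such a generator could look, assuming the input is a single stream of JSON objects with no newlines; it uses json.JSONDecoder.raw_decode with the 8 MB read size mentioned above (the method name and signature are taken from the loop below, everything else is a guess):

    import json

    def json_parse(self, inf, chunk_size=0x800000):
      # Read the file in large chunks and yield each complete JSON object
      # as soon as it can be decoded. raw_decode raises ValueError when the
      # buffer ends in the middle of an object, so we read more and retry.
      decoder = json.JSONDecoder()
      buf = ''
      while True:
        chunk = inf.read(chunk_size)
        if not chunk:
          break
        buf += chunk
        while buf:
          try:
            obj, idx = decoder.raw_decode(buf)
          except ValueError:
            break              # incomplete object; need another chunk
          yield obj
          buf = buf[idx:].lstrip()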

    This appears to be the way that it needs to go.

    Next time a question like this is asked, please emphasize that a large chunk size must be specified, not the 2048 shown in the original answer.

    The loop goes

    first = True
    for data in self.json_parse(inf):  # one decoded object from the chunked reader
      records = len(data['MainKey'])
      columns = len(data['MainKey'][0]['Fields'])
      if first:
        # Initialize output as DictWriter on the first object only
        ofile, outf, fields = self.init_csv(csvname, data, records, columns)
        first = False
      reccount, errcount = self.parse_records(outf, data, fields, records)
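
    init_csv is not shown either; a minimal sketch, assuming the CSV field names come from the 'name' entries of the first record's Fields list and that the return values match the tuple unpacked above (the header row is written by hand because DictWriter.writeheader() only arrived in Python 2.7):

    import csv

    def init_csv(self, csvname, data, records, columns):
      # Derive the column names from the first record and open a DictWriter.
      fields = [fld['name'] for fld in data['MainKey'][0]['Fields']]
      ofile = open(csvname, 'wb')               # binary mode for csv on Python 2
      outf = csv.DictWriter(ofile, fieldnames=fields)
      outf.writerow(dict(zip(fields, fields)))  # header row
      return ofile, outf, fields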
    

    Within the parsing routine

    for rec in range(records):
      currec = data['MainKey'][rec]
      # If each column count can be different
      columns = len(currec['Fields'])
      retval, valrec = self.build_csv_row(currec, columns, fields)
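
    Filling in the pieces that snippet leaves out, the whole routine might look roughly like the sketch below; it is assumed (not shown in the answer) that parse_records writes each good row through the DictWriter and returns the processed/discarded counts that the outer loop keeps track of:

    def parse_records(self, outf, data, fields, records):
      reccount = errcount = 0
      for rec in range(records):
        currec = data['MainKey'][rec]
        # If each column count can be different
        columns = len(currec['Fields'])
        retval, valrec = self.build_csv_row(currec, columns, fields)
        if retval:
          outf.writerow(valrec)   # assumed: one CSV row per good record
          reccount += 1
        else:
          errcount += 1
      return reccount, errcount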
    

    To parse the columns, use

    for col in range(columns):  # columns is a count, so iterate over range(columns)
      dataname = currec['Fields'][col]['name']
      dataval = currec['Fields'][col]['Values']['value']
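
    Combining the two snippets above, build_csv_row might look something like this sketch: it collects the name/value pairs of one record into a dict keyed by the CSV field names and flags anything unexpected so the caller can count the row as discarded (the exact return convention is an assumption):

    def build_csv_row(self, currec, columns, fields):
      valrec = {}
      for col in range(columns):
        dataname = currec['Fields'][col]['name']
        dataval = currec['Fields'][col]['Values']['value']
        if dataname not in fields:
          return False, {}        # unknown column; caller counts this row as an error
        valrec[dataname] = dataval
      return True, valrec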
    

    Thus the references now work and the processing is handled correctly. The large chunk size apparently lets the reads keep up with the data while remaining small enough not to overload the system.
