Looking for a more efficient way to reorganize a massive CSV in Python

刺人心 2021-01-20 16:20

I've been working on a problem where I have data from a large output .txt file, and now have to parse and reorganize certain values into the form of a .csv.

I've

2 answers
  • 2021-01-20 16:37

    If the CSV file fits into your RAM (e.g. it is smaller than about 2 GB), you can just read the whole thing and sort it:

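    import csv
    with open(fn, newline="") as src, open(outfn, "w", newline="") as dst:  # fn, outfn: file paths
        data = list(csv.reader(src))
        data.sort(key=lambda line: line[0])  # stable sort on the first column (string comparison)
        csv.writer(dst).writerows(data)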
    

    That shouldn't take nearly as long, provided the data stays in memory and the machine doesn't start swapping (thrashing). Note that list.sort is a stable sort, so rows with equal keys keep the time order they had in your file.

    If it won't fit into RAM, you will probably want to do something a bit clever. For example, you can store the file offsets of each line, along with the necessary information from the line (timestamp and flight ID), then sort on those, and write the output file using the line offset information.
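
    The offset idea above could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name sort_csv_by_offsets and the key_col parameter are made up here, the sort key is taken to be the first column (as in the in-memory example), the file is assumed to be UTF-8, and no quoted field is assumed to contain an embedded newline (otherwise one record would span several physical lines). Only the key, byte offset, and length of each line are held in memory, not the rows themselves.

```python
import csv


def sort_csv_by_offsets(in_path, out_path, key_col=0):
    """Hypothetical helper: sort a CSV by one column using a line-offset index."""
    index = []  # one (key, byte offset, byte length) tuple per non-blank line
    offset = 0
    with open(in_path, "rb") as src:
        for raw in src:
            # Parse just this line to pull out the sort key.
            row = next(csv.reader([raw.decode("utf-8")]), [])
            if row:
                index.append((row[key_col], offset, len(raw)))
            offset += len(raw)

    # list.sort is stable, so lines with equal keys keep their file order.
    index.sort(key=lambda entry: entry[0])

    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for _, line_offset, length in index:
            src.seek(line_offset)
            line = src.read(length)
            # The last line of the input may lack a trailing newline.
            if not line.endswith(b"\n"):
                line += b"\n"
            dst.write(line)
```

    The second pass does one seek per line, so it trades random reads on disk for a much smaller memory footprint.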

  • 2021-01-20 17:00

    You can try the UNIX sort utility:

    sort -n -s -t, -k1,1 infile.csv > outfile.csv
    

    -t, sets the field delimiter to a comma, and -k1,1 restricts the sort key to the first field. -s makes the sort stable, and -n compares the key numerically.
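
    For comparison, a rough Python counterpart of a numeric, stable sort on the first field might look like the sketch below. It assumes every key parses as a plain decimal number (float() raises on anything else, which is not how sort -n treats such input), and the function name sort_rows_numeric is made up for illustration.

```python
import csv
import io


def sort_rows_numeric(rows, key_col=0):
    """Hypothetical helper: stable sort of CSV rows by a numeric column."""
    # sorted() is stable (like -s); float() gives numeric comparison (like -n).
    return sorted(rows, key=lambda row: float(row[key_col]))


# Small in-memory sample; a plain string sort would place "10" before "9".
sample = io.StringIO("10,b\n9,a\n10,a\n")
rows = sort_rows_numeric(csv.reader(sample))
```

    Here the row starting with 9 comes first, and the two rows with key 10 keep their original relative order.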
