Looking for a more efficient way to reorganize a massive CSV in Python

Backend · unresolved · 2 answers · 689 views
刺人心 · 2021-01-20 16:20

I've been working on a problem where I have data from a large output .txt file, and now have to parse and reorganize certain values into the form of a .csv.

I've

2 Answers
  •  -上瘾入骨i
    2021-01-20 16:37

    If the CSV file fits into your RAM (e.g. it's under 2 GB), then you can just read the whole thing into a list and sort it:

    import csv
    with open(fn, newline="") as f:          # fn: path to the input CSV
        data = list(csv.reader(f))           # read every row into memory
    data.sort(key=lambda row: row[0])        # stable sort on the first column
    with open(outfn, "w", newline="") as f:  # outfn: path for the sorted output
        csv.writer(f).writerows(data)
    

    That shouldn't take nearly as long, provided you don't start thrashing (swapping to disk). Note that list.sort is a stable sort, so it will preserve the time order of your file when the keys are equal.
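    For instance, with some made-up rows keyed on the first column, the two rows that share a timestamp keep their original relative order:

    rows = [["10:00", "AA1"], ["09:00", "BA2"], ["10:00", "AA2"]]
    rows.sort(key=lambda r: r[0])
    # rows is now [["09:00", "BA2"], ["10:00", "AA1"], ["10:00", "AA2"]];
    # the two "10:00" rows stay in their original order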

    If it won't fit into RAM, you will probably want to do something a bit cleverer. For example, you can store the file offset of each line along with the information you need to sort on (timestamp and flight ID), sort that index, and then write the output file by seeking back to each offset in order.
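    A minimal sketch of that offset-index idea, assuming the input has a header row, that the timestamp and flight ID are the first two columns (adjust the indices to the real layout), that no field contains embedded newlines, and that fn/outfn are the same placeholder names as above:

    import csv

    index = []                        # (timestamp, flight_id, byte offset) per data line
    with open(fn, "rb") as f:
        header = f.readline()         # keep the header out of the sort
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            row = next(csv.reader([line.decode("utf-8")]))
            index.append((row[0], row[1], offset))

    index.sort()                      # sorts by timestamp, then flight ID

    with open(fn, "rb") as src, open(outfn, "wb") as out:
        out.write(header)
        for _, _, offset in index:
            src.seek(offset)          # jump back to the original line
            out.write(src.readline()) # and copy it out unchanged

    Only the small index tuples are held in memory here, not the row data itself, so this scales to files well beyond RAM at the cost of a second pass over the input.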
