Writing large Pandas DataFrames to a CSV file in chunks

无人共我 2020-12-01 08:39

How do I write out a large data file to a CSV file in chunks?

I have a set of large data files (1M rows x 20 cols), but only 5 or so columns of that data are of interest to me. I want to write out smaller copies of the files containing just those columns.

3 Answers
  • 2020-12-01 08:49

    Solution:

    import os
    import pandas as pd

    # Read the large file lazily; chunksize yields the iterator of DataFrames
    # consumed below (the chunk size itself is illustrative).
    chunks = pd.read_csv(os.path.join(folder, filename), chunksize=100000)

    header = True
    for chunk in chunks:
        # mode='a' appends each chunk; the header is written only once.
        chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename),
                     columns=['TIME', 'STUFF'], header=header, mode='a')
        header = False


    Notes:

    • The mode='a' tells pandas to append rather than overwrite (but see the rerun caveat below).
    • We only write a column header on the first chunk.
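
    One caveat: because the write uses mode='a', rerunning the script appends to whatever is already in the output file. A small guard (the path is taken from the snippet above) avoids duplicated rows:

    out_path = os.path.join(folder, new_folder, "new_file_" + filename)
    if os.path.exists(out_path):
        os.remove(out_path)  # start fresh so reruns don't duplicate rows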
  • 2020-12-01 09:06

    Check out the chunksize argument of the to_csv method (see the pandas documentation for details).

    Writing to a file would look like:

    df.to_csv("path/to/save/file.csv", chunksize=1000, columns=['TIME', 'STUFF'])
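
    With chunksize=1000, pandas writes the frame 1,000 rows at a time instead of rendering the whole output in one go, which keeps peak memory flat; it changes how the file is written, not what ends up in it.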
    
  • 2020-12-01 09:10

    Why not read only the columns of interest and then save them?

    import os
    import pandas as pd

    file_in = os.path.join(folder, filename)
    file_out = os.path.join(folder, new_folder, 'new_file' + filename)

    # usecols parses only the named columns (this assumes the header row
    # labels them TIME and STUFF); index=False keeps the copy clean.
    df = pd.read_csv(file_in, sep='\t', skiprows=(0, 1, 2), header=0, usecols=['TIME', 'STUFF'])
    df.to_csv(file_out, index=False)
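
    If even the filtered columns are too large to hold in memory at once, the same usecols filter composes with chunked reading, as in the first answer (the chunk size is illustrative):

    header = True
    for chunk in pd.read_csv(file_in, sep='\t', skiprows=(0, 1, 2), header=0,
                             usecols=['TIME', 'STUFF'], chunksize=100000):
        chunk.to_csv(file_out, header=header, mode='a', index=False)
        header = False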
    