How do I write out a large data file to a CSV file in chunks?
I have a set of large data files (1M rows x 20 cols), but only 5 or so columns of that data are of interest to me. I want to write just those columns out to a new CSV file in chunks.
Solution:
import os
import pandas as pd

# The chunksize value is an assumption; tune it to your memory budget
chunks = pd.read_csv(os.path.join(folder, filename), chunksize=100000)

header = True
for chunk in chunks:
    chunk.to_csv(os.path.join(folder, new_folder, "new_file_" + filename),
                 header=header, columns=['TIME', 'STUFF'], mode='a')
    header = False  # write the header only for the first chunk
Notes:
mode='a' tells pandas to append to the file rather than overwrite it, and the header flag ensures the column names are written only once. Note that the cols argument used in older pandas versions has been renamed to columns. Also check out the chunksize argument of the to_csv method; see the docs at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html.
Writing to file would look like:
df.to_csv("path/to/save/file.csv", chunksize=1000, columns=['TIME', 'STUFF'])
Why not read in only the columns of interest and then save them? Note that usecols, rather than names, is what restricts parsing to a subset of columns:

import os
import pandas as pd

file_in = os.path.join(folder, filename)
file_out = os.path.join(folder, new_folder, 'new_file_' + filename)

# Assumes 'TIME' and 'STUFF' appear in the file's header row (the row after the three skipped rows)
df = pd.read_csv(file_in, sep='\t', skiprows=(0, 1, 2), header=0, usecols=['TIME', 'STUFF'])
df.to_csv(file_out, index=False)
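If the files are too large to read comfortably in one pass even with only the columns of interest, usecols combines naturally with the chunked approach from the first answer. A minimal sketch, assuming the same folder, new_folder, and filename variables and the same column names as above:

import os
import pandas as pd

file_in = os.path.join(folder, filename)
file_out = os.path.join(folder, new_folder, 'new_file_' + filename)

header = True
# Parse only the two columns, 100000 rows at a time (the chunk size is an assumption)
for chunk in pd.read_csv(file_in, sep='\t', skiprows=(0, 1, 2), header=0,
                         usecols=['TIME', 'STUFF'], chunksize=100000):
    chunk.to_csv(file_out, header=header, index=False, mode='a')
    header = False  # header only on the first chunk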