How do I read a large csv file with pandas?

隐瞒了意图╮ 2020-11-21 07:12

I am trying to read a large CSV file (approx. 6 GB) in pandas and I am getting a memory error:

MemoryError                               Traceback (most recen         

15 Answers
你的背包 2020-11-21 07:19

    Chunking shouldn't always be the first port of call for this problem.

    1. Is the file large due to repeated non-numeric data or unwanted columns?

      If so, you can sometimes see massive memory savings by reading columns in as categories and selecting only the required columns via the usecols parameter of pd.read_csv (see the first sketch after this list).

    2. Does your workflow require slicing, manipulating, and exporting?

      If so, you can use dask.dataframe to slice, perform your calculations, and export iteratively. Chunking is performed silently by dask, which also supports a subset of the pandas API (see the second sketch below).

    3. If all else fails, read the file in chunks.

      Chunk via pandas or via the csv library as a last resort (see the third sketch below).
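
      A minimal sketch of option 1, assuming a file named large_file.csv with columns user_id, country, and value (all hypothetical names). Storing repeated strings as category dtype plus a usecols whitelist can shrink memory use dramatically:

        import pandas as pd

        # Hypothetical file and column names, for illustration only.
        df = pd.read_csv(
            "large_file.csv",
            usecols=["user_id", "country", "value"],  # read only the columns you need
            dtype={"country": "category"},            # store repeated strings as categories
        )
        print(df.memory_usage(deep=True))             # check the per-column savings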
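
      For option 2, a minimal dask.dataframe sketch using the same hypothetical file and columns. dask reads the file in partitions and evaluates lazily, so the slice, the aggregation, and the export all run chunk by chunk:

        import dask.dataframe as dd

        ddf = dd.read_csv("large_file.csv")   # lazy: nothing is loaded yet
        subset = ddf[ddf["value"] > 0]        # slicing is lazy too

        # compute() triggers the chunked execution.
        mean_by_country = subset.groupby("country")["value"].mean().compute()
        print(mean_by_country)

        # Exporting is also iterative: one output file per partition.
        subset.to_csv("filtered-*.csv", index=False)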
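
      For option 3, a chunked-read sketch in plain pandas. chunksize is the number of rows per chunk; 10**6 is an arbitrary illustrative value, so tune it to your RAM. The filter mirrors the hypothetical one above:

        import pandas as pd

        pieces = []
        for chunk in pd.read_csv("large_file.csv", chunksize=10**6):
            # Only one chunk is held in memory at a time; keep just what you need.
            pieces.append(chunk[chunk["value"] > 0])

        df = pd.concat(pieces, ignore_index=True)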
