I am trying to read a large csv file (aprox. 6 GB) in pandas and i am getting a memory error:
MemoryError Traceback (most recen
Chunking shouldn't always be the first port of call for this problem.
Is the file large due to repeated non-numeric data or unwanted columns?
If so, you can sometimes see massive memory savings by reading in columns as categories and selecting required columns via pd.read_csv usecols
parameter.
Does your workflow require slicing, manipulating, exporting?
If so, you can use dask.dataframe to slice, perform your calculations and export iteratively. Chunking is performed silently by dask, which also supports a subset of pandas API.
If all else fails, read line by line via chunks.
Chunk via pandas or via csv library as a last resort.