I have recently started using Jupyter Lab, and my problem is that I work with quite large datasets (usually the dataset itself is approx. 1/4 of my computer's RAM). After a few trans
I think you should read the data in chunks, like this:
import pandas as pd

# read the CSV lazily in chunks of 1,000,000 rows instead of loading it all at once
df_chunk = pd.read_csv(r'../input/data.csv', chunksize=1000000)

chunk_list = []  # append each filtered chunk here

# each chunk is a regular DataFrame
for chunk in df_chunk:
    # perform data filtering (chunk_preprocessing is your own filter function)
    chunk_filter = chunk_preprocessing(chunk)

    # once the filtering is done, append the filtered chunk to the list
    chunk_list.append(chunk_filter)

# concatenate the filtered chunks into a single DataFrame
df_concat = pd.concat(chunk_list)
For more information, check out this article: https://towardsdatascience.com/why-and-how-to-use-pandas-with-large-data-9594dda2ea4c
I also suggest not collecting the filtered chunks in a list if memory is still tight (the RAM will probably fill up again). Instead, finish your work inside that for loop and keep only small results, as in the sketch below.
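For example, here is a minimal sketch of that idea, assuming your per-chunk work can be reduced to something small like counts per category. The file path, the 'category' column, and the process_chunk helper are just placeholders for your own data and logic:

import pandas as pd

def process_chunk(chunk):
    # placeholder: return only a small summary per chunk
    # (e.g. row counts per category) instead of the full rows
    return chunk['category'].value_counts()

partial_results = []  # small summaries only, not full chunks

for chunk in pd.read_csv(r'../input/data.csv', chunksize=1000000):
    # the full chunk is discarded after each iteration,
    # so memory usage stays at roughly one chunk at a time
    partial_results.append(process_chunk(chunk))

# combine the small per-chunk summaries into the final result
total_counts = pd.concat(partial_results).groupby(level=0).sum()
print(total_counts)

This way only the tiny summaries accumulate in memory, not the chunks themselves, so the peak memory use is bounded by a single chunk.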