Error tokenizing data. C error: out of memory (pandas, Python, large CSV file)


I have a large CSV file of 3.5 GB and I want to read it using pandas.

This is my code:

import pandas as pd
tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000000)

4 Answers
  • 2021-02-02 01:36

    You may try adding the parameter engine='python'. It loads the data more slowly, but it helped in my situation.
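
    A minimal sketch of that call, reusing the file name and the sep=';' separator that appear elsewhere in this thread (adjust both to your data):

    import pandas as pd

    # The Python engine parses more slowly than the default C engine,
    # but it can work around the C tokenizer's out-of-memory error.
    tp = pd.read_csv('train_2011_2012_2013.csv', sep=';', engine='python')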

  • 2021-02-02 01:42

    This error can also be caused by the chunksize=20000000. Decreasing it fixed the issue in my case. In ℕʘʘḆḽḘ's solution the chunksize is also decreased, which might have done the trick.
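
    For example, with a chunk size a couple of orders of magnitude smaller (the 200000 here is only an illustrative value):

    import pandas as pd

    # Smaller chunks keep each individual read well within memory.
    reader = pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=200000)
    df = pd.concat(reader, ignore_index=True)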

  • 2021-02-02 01:49

    Try reading the file in smaller chunks and concatenating them:

    import pandas as pd

    chunks = []

    # Read the file in chunks of 20,000 rows so no single read exhausts memory
    for chunk in pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000):
        chunks.append(chunk)

    # Stitch the chunks back together into one DataFrame
    big_data = pd.concat(chunks, axis=0)
    del chunks
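
    Note that pd.concat still has to materialize the whole DataFrame in memory, so this only helps when the final result fits in RAM. If it does not, a common alternative (a sketch, not part of the original answer) is to process each chunk as it streams in:

    import pandas as pd

    # Aggregate per chunk instead of holding every chunk at once;
    # counting rows stands in for whatever per-chunk work is needed.
    total_rows = 0
    for chunk in pd.read_csv('train_2011_2012_2013.csv', sep=';', chunksize=20000):
        total_rows += len(chunk)
    print(total_rows)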
    
  • 2021-02-02 01:55

    You may try setting error_bad_lines=False when reading the CSV file, e.g.:

    import pandas as pd
    df = pd.read_csv('my_big_file.csv', error_bad_lines=False)
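
    Note that error_bad_lines is deprecated in pandas 1.3 and later in favour of on_bad_lines, so on a recent version the equivalent call would be:

    import pandas as pd

    # Skip malformed rows instead of raising an error (pandas >= 1.3)
    df = pd.read_csv('my_big_file.csv', on_bad_lines='skip')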
    