How to load a large data file into Python pandas using looping or parallel computing?

谎友^ 2021-02-13 21:46

I have an 8 GB CSV file and I cannot run the code below because it raises a memory error.

file = \"./data.csv\"
df = pd.read_csv(file, sep=\"/\", header=0, dtype=str         


        
5 Answers
  •  臣服心动
    2021-02-13 22:49

    Use the chunksize parameter to read one chunk at a time and save each chunk to disk. This splits the original file into equal parts of 100,000 rows each:

    file = "./data.csv"
    chunks = pd.read_csv(file, sep="/", header=0, dtype=str, chunksize = 100000)
    
    for it, chunk in enumerate(chunks):
        chunk.to_csv('chunk_{}.csv'.format(it), sep="/") 
    

    If you know the number of rows in the original file, you can calculate the exact chunksize needed to split it into 8 equal parts (nrows / 8), as in the sketch below.
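
    A minimal sketch of that calculation, assuming the row count is obtained by streaming through the file line by line first (the 'part_{}.csv' output names are just illustrative):

    import math
    import pandas as pd

    file = "./data.csv"

    # Count data rows without loading the file into memory (subtract 1 for the header).
    with open(file) as f:
        nrows = sum(1 for _ in f) - 1

    # Chunk size that splits the file into 8 roughly equal parts.
    chunksize = math.ceil(nrows / 8)

    chunks = pd.read_csv(file, sep="/", header=0, dtype=str, chunksize=chunksize)
    for it, chunk in enumerate(chunks):
        chunk.to_csv('part_{}.csv'.format(it), sep="/")

    Counting the lines first costs one extra pass over the file, but it keeps memory use flat, which is the point when the file is larger than RAM.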
