How to load large data into Python pandas using looping or parallel computing?

谎友^ 2021-02-13 21:46

I have an 8 GB CSV file, and I cannot run the code below because it raises a memory error.

import pandas as pd

file = "./data.csv"
df = pd.read_csv(file, sep="/", header=0, dtype=str)


        
5 Answers
  •  广开言路
    2021-02-13 22:28

    import numpy as np
    import pandas as pd
    from multiprocessing import Pool
    
    def processor(df):
        # Some work on each piece, e.g. sort it by the 'id' column
        df.sort_values('id', inplace=True)
        return df
    
    if __name__ == '__main__':
        df = pd.read_csv('./data.csv', sep='/', header=0, dtype=str)
    
        # Split the frame into one piece per worker process
        size = 8
        df_split = np.array_split(df, size)
    
        cores = 8
        pool = Pool(cores)
        # imap yields the processed pieces in order; write each to its own file
        for n, frame in enumerate(pool.imap(processor, df_split), start=1):
            frame.to_csv('{}'.format(n))
        pool.close()
        pool.join()
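
    Note that np.array_split needs the whole DataFrame in memory first, so this only helps once the file has already loaded. If the 8 GB file itself will not load, the pieces can instead come straight from the reader via chunksize, so no single object ever holds the full file. A minimal sketch, assuming the same processor function and 'id' column as above; the chunk size of 1,000,000 rows and the output file names are arbitrary examples:
    
        import pandas as pd
        from multiprocessing import Pool
        
        def processor(df):
            # Some work on each chunk, e.g. sort it by the 'id' column
            df.sort_values('id', inplace=True)
            return df
        
        if __name__ == '__main__':
            # chunksize makes read_csv return an iterator of DataFrames,
            # one piece of the file at a time instead of all 8 GB at once
            reader = pd.read_csv('./data.csv', sep='/', header=0,
                                 dtype=str, chunksize=1_000_000)
            with Pool(8) as pool:
                for n, frame in enumerate(pool.imap(processor, reader), start=1):
                    frame.to_csv('part_{}.csv'.format(n))
    
    Each worker sorts only its own chunk; a global sort across the whole file would need an extra merge step over the output files afterwards.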
    
