pandas multiprocessing apply

借酒劲吻你 2020-11-28 06:02

I'm trying to use multiprocessing with a pandas DataFrame: split the DataFrame into 8 parts, then apply some function to each part using apply (with each part processed in d

8 Answers
  • 2020-11-28 06:37

    Install Pyxtension, which simplifies parallel map, and use it like this:

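    import multiprocessing
    import numpy as np
    import pandas as pd
    from pyxtension.streams import stream
    # df is the input DataFrame; process is the function applied to each chunk
    big_df = pd.concat(stream(np.array_split(df, multiprocessing.cpu_count())).mpmap(process))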
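
For comparison, the same split / map / concat pattern can be sketched with only the standard library's multiprocessing.Pool. This is a minimal sketch, not the Pyxtension implementation: parallel_apply and the body of process are hypothetical names made up for illustration, and the split is done on row positions so each worker receives an ordinary DataFrame slice.

```python
import multiprocessing

import numpy as np
import pandas as pd


def process(part: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical stand-in for the per-chunk work: add a column of row sums.
    part = part.copy()
    part["row_sum"] = part.sum(axis=1)
    return part


def parallel_apply(df, func, n_parts=None):
    """Split df into row chunks, run func on each chunk in a worker, concat."""
    if df.empty:
        return func(df)
    n_parts = n_parts or multiprocessing.cpu_count()
    # Split the row positions rather than the DataFrame itself.
    positions = np.array_split(np.arange(len(df)), n_parts)
    parts = [df.iloc[idx] for idx in positions if len(idx)]
    with multiprocessing.Pool(processes=n_parts) as pool:
        # pool.map returns results in the same order as the input chunks.
        results = pool.map(func, parts)
    return pd.concat(results)


if __name__ == "__main__":
    df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
    print(parallel_apply(df, process, n_parts=4))
```

The function passed to the pool has to be defined at module top level so it can be pickled, and the `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module.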
    
  • 2020-11-28 06:38

    This worked well for me:

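    import multiprocessing

    # Yield one Series per row; each row is sent to a worker process
    rows_iter = (row for _, row in df.iterrows())

    with multiprocessing.Pool() as pool:
        # pool.map preserves input order, so results line up with the rows
        df['new_column'] = pool.map(process_apply, rows_iter)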
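
Sending rows one at a time means every row is pickled and shipped to a worker separately, which can dominate the runtime when the per-row work is cheap. One possible mitigation is the chunksize argument of pool.map, which batches several rows into each task. The sketch below assumes a hypothetical process_apply that multiplies two columns; add_column_parallel and the column names are likewise made up for illustration.

```python
import multiprocessing

import pandas as pd


def process_apply(row: pd.Series):
    # Hypothetical per-row function: multiply two columns.
    return row["a"] * row["b"]


def add_column_parallel(df, func, column, chunksize=1000):
    """Return a copy of df with a new column computed row by row in a pool."""
    rows_iter = (row for _, row in df.iterrows())
    with multiprocessing.Pool() as pool:
        # chunksize groups rows into batches, so fewer, larger tasks are sent.
        values = pool.map(func, rows_iter, chunksize=chunksize)
    out = df.copy()
    out[column] = values
    return out


if __name__ == "__main__":
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    print(add_column_parallel(df, process_apply, "new_column", chunksize=2))
```

A suitable chunksize depends on the data and the cost of the function, so it is worth timing a few values rather than relying on the default used here.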
    