How to share pandas DataFrame object between processes?

谎友^ 2021-01-02 15:31

This question makes the same point as the link I posted before:

(Is there a good way to avoid memory deep copy or to reduce time spent in multiprocessing?)

1 Answer
  • 2021-01-02 15:53

    You can use a Manager Namespace; the following code works as you'd expect.

    # -*- coding: UTF-8 -*-
    import numpy as np
    import pandas as pd
    from multiprocessing import Process, Manager
    
    def add_new_derived_column(ns):
        # Reading ns.df transfers a pickled copy of the DataFrame
        # from the manager process into this worker.
        dataframe2 = ns.df
        dataframe2['new_column'] = dataframe2['A'] + dataframe2['B'] / 2
        print(dataframe2.head())
        # Assigning back is required: mutating the local copy does not
        # propagate to the manager; only reassignment does.
        ns.df = dataframe2
    
    if __name__ == "__main__":
    
        mgr = Manager()
        ns = mgr.Namespace()
    
        dataframe = pd.DataFrame(np.random.randn(100000, 2), columns=['A', 'B'])
        ns.df = dataframe
        print(dataframe.head())
    
        # Pass the shared Namespace to the multiprocessing.Process object.
        process = Process(target=add_new_derived_column, args=(ns,))
        process.start()
        process.join()
    
        # The parent now sees the new column via the manager.
        print(ns.df.head())
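
    Note that a Manager Namespace does not avoid copying: every read of ns.df
    pickles the whole DataFrame and ships it between the manager process and
    the caller. If the goal is to avoid the deep copy itself, a minimal sketch
    along the following lines, using multiprocessing.shared_memory (Python
    3.8+), backs the DataFrame's numeric data with one shared buffer. The
    worker function and column names here are illustrative, not part of the
    original answer.

    import numpy as np
    import pandas as pd
    from multiprocessing import Process
    from multiprocessing.shared_memory import SharedMemory

    def worker(name, shape, dtype):
        shm = SharedMemory(name=name)  # attach to the existing shared block
        arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        # copy=False asks pandas to wrap the shared buffer instead of copying it
        df = pd.DataFrame(arr, columns=['A', 'B'], copy=False)
        print(df.head())
        shm.close()

    if __name__ == "__main__":
        src = np.random.randn(100000, 2)
        shm = SharedMemory(create=True, size=src.nbytes)  # allocate shared block
        arr = np.ndarray(src.shape, dtype=src.dtype, buffer=shm.buf)
        arr[:] = src  # one copy into shared memory; children then read in place
        p = Process(target=worker, args=(shm.name, src.shape, src.dtype))
        p.start()
        p.join()
        shm.close()
        shm.unlink()  # release the shared block when done

    The trade-off: shared memory gives in-place reads but no coordination, so
    concurrent writers need their own locking, whereas the Namespace proxy
    serializes access for you at the cost of a full copy per transfer.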
    