Large Pandas Dataframe parallel processing

Backend · Unresolved · 2 answers · 2046 views
感情败类 2021-02-07 20:38

I am accessing a very large pandas DataFrame as a global variable. This variable is accessed in parallel via joblib.

For example:

df = db.query("select id, a_lo
2 Answers
  •  无人共我
    2021-02-07 21:20

    Python multiprocessing is typically done using separate processes, as you noted, meaning that the processes don't share memory. There's a potential workaround if you can get things to work with np.memmap as mentioned a little farther down the joblib docs, though dumping to disk will obviously add some overhead of its own: https://pythonhosted.org/joblib/parallel.html#working-with-numerical-data-in-shared-memory-memmaping
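    A minimal sketch of the memmap workaround described above: the parent process dumps the large numeric data to a disk-backed `np.memmap` file once, and each joblib worker re-opens it read-only, so the OS shares the pages instead of every process copying the full array. The file path, array size, and chunked-sum task here are all illustrative placeholders, not anything from the original question.

    ```python
    import os
    import tempfile

    import numpy as np
    from joblib import Parallel, delayed

    N = 1_000_000  # illustrative size; stands in for the large DataFrame column

    # Parent process writes the data to a disk-backed memmap once.
    mmap_path = os.path.join(tempfile.mkdtemp(), "column.mmap")
    out = np.memmap(mmap_path, dtype=np.float64, mode="w+", shape=(N,))
    out[:] = np.arange(N, dtype=np.float64)
    out.flush()

    def chunk_sum(path, n, start, stop):
        # Each worker re-opens the memmap read-only; no per-process copy
        # of the full array is made.
        arr = np.memmap(path, dtype=np.float64, mode="r", shape=(n,))
        return float(arr[start:stop].sum())

    # Fan the chunks out across worker processes.
    results = Parallel(n_jobs=2)(
        delayed(chunk_sum)(mmap_path, N, s, min(s + 250_000, N))
        for s in range(0, N, 250_000)
    )
    total = sum(results)  # same value as summing the array in one process
    ```

    The same idea applies to a DataFrame: extract the numeric columns you need as NumPy arrays and memmap those, since `np.memmap` works on arrays, not on DataFrame objects directly. The dump-to-disk step does add I/O overhead, as the answer notes.
    
    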
