I am accessing a very large Pandas dataframe as a global variable. This variable is accessed in parallel via joblib.
Eg.
df = db.query(\"select id, a_lo
Python multiprocessing is typically done using separate processes, as you noted, meaning that the processes don't share memory. There's a potential workaround if you can get things to work with np.memmap
as mentioned a little farther down the joblib docs, though dumping to disk will obviously add some overhead of its own: https://pythonhosted.org/joblib/parallel.html#working-with-numerical-data-in-shared-memory-memmaping