Question
I need to run an embarrassingly parallel for loop. After a quick search, I found the joblib package for Python. I ran the simple test posted on the package's website:
from math import sqrt
from joblib import Parallel, delayed
import multiprocessing

%timeit [sqrt(i ** 2) for i in range(10)]
# 3.89 µs ± 38.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

num_cores = multiprocessing.cpu_count()
%timeit Parallel(n_jobs=num_cores)(delayed(sqrt)(i ** 2) for i in range(10))
# 600 ms ± 40 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
If I understand the results correctly, joblib not only fails to increase the speed, it actually makes the loop slower. Did I miss something here? Thank you.
Answer 1:
Joblib creates new processes to run the functions you want to execute in parallel. However, creating processes can take some time (around 500 ms), especially now that joblib uses spawn to create new processes (and not fork).

Because the function you want to run in parallel is very fast, the result of %timeit here mostly shows the overhead of process creation. If you choose a function whose runtime is not negligible compared to the time required to start new processes, you will see some improvement in performance.
Here is a sample you can run to test this:
import time
from joblib import Parallel, delayed

def f(x):
    time.sleep(1)
    return x

def bench_joblib(n_jobs):
    start_time = time.time()
    Parallel(n_jobs=n_jobs)(delayed(f)(x) for x in range(4))
    print('running 4 times f using n_jobs = {} : {:.2f}s'.format(
        n_jobs, time.time() - start_time))

if __name__ == "__main__":
    bench_joblib(1)
    bench_joblib(4)
Using Python 3.7 and joblib 0.12.5, I got:
running 4 times f using n_jobs = 1 : 4.01s
running 4 times f using n_jobs = 4 : 1.34s
Source: https://stackoverflow.com/questions/48349980/python-joblib-performance