I have set up a simple experiment to check the importance of a multi-core CPU while running sklearn `GridSearchCV` with `KNeighborsClassifier`. The results
Here are some reasons which might be the cause of this behaviour:
- `n_jobs=1` and `n_jobs=2`: the time per thread (time per model evaluation by `GridSearchCV` to fully train the model and test it) was 2.9s (overall time ~2 mins)
- `n_jobs=3`: time was 3.4s (overall time 1.4 mins)
- `n_jobs=4`: time was 3.8s (overall time 58 secs)
- `n_jobs=5`: time was 4.2s (overall time 51 secs)
- `n_jobs=6`: time was 4.2s (overall time ~49 secs)
- `n_jobs=7`: time was 4.2s (overall time ~49 secs)
- `n_jobs=8`: time was 4.2s (overall time ~49 secs)

As you can see, the time per thread increased while the overall time decreased (although beyond `n_jobs=4` the difference was not exactly linear) and remained constant for `n_jobs>=6`. This is due to the fact that there is a cost incurred in initializing and releasing threads. See this github issue and this issue.
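To make the diminishing returns easier to see, here is a small sketch that turns the overall times reported above into speedup and efficiency figures. The rounding of "~2 mins" to 120 s and "1.4 mins" to 84 s is my own approximation, and `speedup_table` is just an illustrative helper name.

```python
# Overall wall-clock times in seconds, read off the timings reported above.
# Approximations: "~2 mins" -> 120 s, "1.4 mins" -> 84 s.
# n_jobs=2 is left out because it is reported together with n_jobs=1.
overall_seconds = {1: 120, 3: 84, 4: 58, 5: 51, 6: 49, 7: 49, 8: 49}


def speedup_table(times):
    """Return {n_jobs: (speedup, efficiency)} relative to the n_jobs=1 run."""
    baseline = times[1]
    table = {}
    for n_jobs, seconds in sorted(times.items()):
        speedup = baseline / seconds
        # Efficiency: how much of the ideal n_jobs-fold speedup was achieved.
        table[n_jobs] = (speedup, speedup / n_jobs)
    return table


table = speedup_table(overall_seconds)
for n_jobs, (speedup, efficiency) in table.items():
    print(f"n_jobs={n_jobs}: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these numbers the speedup stops growing from `n_jobs=6` onwards, while the efficiency per worker keeps dropping.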
Also, there might be other bottlenecks, such as the data being too large to be broadcast to all threads at the same time, thread pre-emption over RAM (or other resources), how data is pushed into each thread, etc.
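I suggest you read about Amdahl's Law, which states that there is a theoretical bound on the speedup that can be achieved through parallelization, given by the formula

$$S(s) = \frac{1}{(1 - p) + \frac{p}{s}}$$

where $p$ is the fraction of the execution time that benefits from parallelization and $s$ is the speedup of that part (for example, the number of threads).

Source: Amdahl's Law, Wikipedia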
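As a rough illustration of that bound, here is a minimal sketch. The function name `amdahl_speedup` and the value `p = 0.6` are hypothetical choices for the example; they are not fitted to the measurements above.

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup with n workers when a fraction p
    of the work can be parallelized: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


# p = 0.6 is an assumed parallel fraction, used only for illustration.
for n in (1, 2, 4, 8, 1000):
    print(f"n={n}: at most {amdahl_speedup(0.6, n):.2f}x")
```

However many workers you add, the bound never exceeds `1 / (1 - p)`, so the serial part of the job ends up dominating.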
Finally, it might be due to the data size and the complexity of the model you use for training as well.
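If you want to check how much data size matters on your own machine, something along these lines could work. This is only a sketch: the synthetic dataset, the parameter grid, the sample sizes and the helper name `time_grid_search` are all assumptions of mine, not the setup from the question.

```python
import time

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


def time_grid_search(n_jobs, n_samples=1000, n_features=20):
    """Return wall-clock seconds for one GridSearchCV fit with the given n_jobs."""
    # Synthetic data; the sizes are arbitrary placeholders.
    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, random_state=0
    )
    search = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={"n_neighbors": [3, 5, 7, 9]},  # hypothetical grid
        cv=3,
        n_jobs=n_jobs,
    )
    start = time.perf_counter()
    search.fit(X, y)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Compare a small and a larger dataset across a few n_jobs settings.
    for n_samples in (1000, 5000):
        for n_jobs in (1, 2, 4):
            elapsed = time_grid_search(n_jobs, n_samples=n_samples)
            print(f"n_samples={n_samples}, n_jobs={n_jobs}: {elapsed:.2f}s")
```

On small inputs the start-up overhead of the parallel workers can easily outweigh the work itself, so expect the gap between `n_jobs` settings to change as `n_samples` grows.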
Here is a blog post explaining the same issue regarding multithreading.