R: How to use parallelMap (with mlr, xgboost) on linux server? Unexpected performance compared to windows

Posted by 廉价感情 on 2019-12-24 16:58:17

Question


I am trying to parallelize at the tuning hyperparameter level an xgboost model that I am tuning in mlr and am trying to parallelize with parallelMap. I have code that works successfully on my windows machine (with only 8 cores) and would like to make use of a linux server (with 72 cores). I have not been able to successfully gain any computational advantage moving to the server, and I think this is a result of holes in my understanding of the parallelMap parameters.

I do not understand the differences between multicore, local, and socket as "modes" in parallelMap. Based on my reading, I think that multicore would work for my situation, but I am not sure. I used socket successfully on my windows machine and have tried both socket and multicore on my linux server, with unsuccessful results.

parallelStart(mode="socket", cpu=8, level="mlr.tuneParams")

but it is my understanding that socket might be unnecessary or perhaps slow for parallelizing over many cores that do not need to communicate with each other, as is the case with parallelizing hyperparameter tuning.
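For reference, a minimal sketch of the two backends relevant here (the `cpus` counts are illustrative; `parallelStart()` and its `mode`/`cpus`/`level` arguments are as documented in parallelMap):

```r
library(parallelMap)

# Fork-based parallelism: workers share memory with the master process,
# so start-up is cheap and no data is copied. UNIX-only (Linux/macOS);
# usually the natural choice on a Linux server.
parallelStart(mode = "multicore", cpus = 8, level = "mlr.tuneParams")
parallelStop()

# Socket cluster: spawns fresh R sessions and serializes data to each
# worker. The only multi-core option on Windows, but it pays per-job
# serialization overhead that multicore avoids.
parallelStart(mode = "socket", cpus = 8, level = "mlr.tuneParams")
parallelStop()
```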

To elaborate on my unsuccessful results on my linux server: I am not getting errors, but things that would take <24 hours in serial are taking > 2 weeks in parallel. Looking at the processes, I can see that I am indeed using several cores.

Each individual call to xgboost runs in a matter of minutes, and I am not trying to speed that up. I am only trying to tune hyperparameters over several cores.

I was concerned that perhaps my very slow results on my linux server were due to attempts by xgboost to make use of the available cores in model building, so I fed nthread = 1 to xgboost via mlr to ensure that does not happen. Nonetheless, my code seems to run much slower on my larger linux server than it does on my smaller windows computer -- any thoughts as to what might be happening?

Thanks so very much.

xgb_learner_tune <- makeLearner(
  "classif.xgboost",
  predict.type = "response",
  par.vals = list(
    objective = "binary:logistic",
    eval_metric = "map",
    nthread=1))

library(parallelMap)
parallelStart(mode="multicore", cpu=8, level="mlr.tuneParams")

tuned_params_trim <- tuneParams(
  learner = xgb_learner_tune,
  task = trainTask,
  resampling = resample_desc,
  par.set = xgb_params,
  control = control,
  measures = list(ppv, tpr, tnr, mmce)
)
parallelStop()

Edit

I am still surprised by my lack of performance improvement when attempting to parallelize at the tuning level. Are my expectations unfair? I am getting substantially slower performance with parallelMap than when tuning in serial for the process below:

numeric_ps = makeParamSet(
  makeNumericParam("C", lower = 0.5, upper = 2.0),
  makeNumericParam("sigma", lower = 0.5, upper = 2.0)
)
ctrl = makeTuneControlRandom(maxit=1024L)
rdesc = makeResampleDesc("CV", iters = 3L)

#In serial
start.time.serial <- Sys.time()
res.serial = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                 par.set = numeric_ps, control = ctrl)
stop.time.serial <- Sys.time()
stop.time.serial - start.time.serial

#In parallel with 2 CPUs
start.time.parallel.2 <- Sys.time()
parallelStart(mode="multicore", cpu=2, level="mlr.tuneParams")
res.parallel.2 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                 par.set = numeric_ps, control = ctrl)
parallelStop()
stop.time.parallel.2 <- Sys.time()
stop.time.parallel.2 - start.time.parallel.2

#In parallel with 16 CPUs
start.time.parallel.16 <- Sys.time()
parallelStart(mode="multicore", cpu=16, level="mlr.tuneParams")
res.parallel.16 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                          par.set = numeric_ps, control = ctrl)
parallelStop()
stop.time.parallel.16 <- Sys.time()
stop.time.parallel.16 - start.time.parallel.16 

My console output is (tuning details omitted):

> stop.time.serial - start.time.serial
Time difference of 33.0646 secs

> stop.time.parallel.2 - start.time.parallel.2
Time difference of 2.49616 mins

> stop.time.parallel.16 - start.time.parallel.16
Time difference of 2.533662 mins

I would have expected things to be faster in parallel. Is that unreasonable for this example? If so, when should I expect performance improvements in parallel?

Looking at the terminal, I do seem to be using 2 (and 16) threads/processes (apologies if my terminology is incorrect).

Thanks so much for any further input.


Answer 1:


This question is more about guessing what's wrong in your setup than actually providing a "real" answer. Maybe you could also change the title, as you did not get "unexpected results".

Some points:

  • nthread = 1 is already the default for xgboost in mlr
  • multicore is the preferred mode on UNIX systems
  • If your local machine is faster than your server, then either your calculations finish very quickly and the CPU frequencies of the two machines differ substantially, or you should think about parallelizing at a different level than mlr.tuneParams (see here for more information)
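As a sketch of that last point: if each tuning iteration is long because of its inner cross-validation, you can instead parallelize the resampling folds (level names as listed in mlr's parallelization documentation):

```r
library(parallelMap)

# Parallelize the CV folds inside each tuning iteration rather than
# the tuning iterations themselves. Useful when a single resample is
# the expensive unit of work.
parallelStart(mode = "multicore", cpus = 8, level = "mlr.resample")
# ... run tuneParams(...) as before ...
parallelStop()
```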

Edit

Everything's fine on my machine. This looks like a problem local to your setup.

library(mlr)
#> Loading required package: ParamHelpers
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
library(parallelMap)

numeric_ps = makeParamSet(
  makeNumericParam("C", lower = 0.5, upper = 2.0),
  makeNumericParam("sigma", lower = 0.5, upper = 2.0)
)
ctrl = makeTuneControlRandom(maxit=1024L)
rdesc = makeResampleDesc("CV", iters = 3L)

#In serial
start.time.serial <- Sys.time()
res.serial = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
  par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
stop.time.serial <- Sys.time()
stop.time.serial - start.time.serial
#> Time difference of 31.28781 secs


#In parallel with 2 CPUs
start.time.parallel.2 <- Sys.time()
parallelStart(mode="multicore", cpu=2, level="mlr.tuneParams")
#> Starting parallelization in mode=multicore with cpus=2.
res.parallel.2 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
  par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
#> Mapping in parallel: mode = multicore; level = mlr.tuneParams; cpus = 2; elements = 1024.
#> [Tune] Result: C=1.12; sigma=0.647 : mmce.test.mean=0.0466667
parallelStop()
#> Stopped parallelization. All cleaned up.
stop.time.parallel.2 <- Sys.time()
stop.time.parallel.2 - start.time.parallel.2
#> Time difference of 16.13145 secs


#In parallel with 4 CPUs
start.time.parallel.16 <- Sys.time()
parallelStart(mode="multicore", cpu=4, level="mlr.tuneParams")
#> Starting parallelization in mode=multicore with cpus=4.
res.parallel.16 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
  par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
#> Mapping in parallel: mode = multicore; level = mlr.tuneParams; cpus = 4; elements = 1024.
#> [Tune] Result: C=0.564; sigma=0.5 : mmce.test.mean=0.0333333
parallelStop()
#> Stopped parallelization. All cleaned up.
stop.time.parallel.16 <- Sys.time()
stop.time.parallel.16 - start.time.parallel.16 
#> Time difference of 10.14408 secs

Created on 2019-06-14 by the reprex package (v0.3.0)



Source: https://stackoverflow.com/questions/55978153/r-how-to-use-parallelmap-with-mlr-xgboost-on-linux-server-unexpected-perfor
