should I pre-install cran r packages on worker nodes when using sparkr

甜味超标 2021-01-14 15:42

I want to use R packages from CRAN, such as forecast, with SparkR, and I have run into the following two problems.

  1. Should I pre-install all those packages on worker nodes?

3 Answers
  •  滥情空心
    2021-01-14 16:10

    A better choice is to pass your local R packages with the spark-submit --archives option. That way you do not have to install the R packages on every worker, and you avoid the time-consuming install/compile of R packages while SparkR::dapply is running. For example:

    Sys.setenv("SPARKR_SUBMIT_ARGS"="--master yarn-client --num-executors 40 --executor-cores 10 --executor-memory 8G --driver-memory 512M --jars /usr/lib/hadoop/lib/hadoop-lzo-0.4.15-cdh5.11.1.jar --files /etc/hive/conf/hive-site.xml,xgboost.model --archives /your_R_packages/3.5.zip sparkr-shell")
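    For reference, the 3.5.zip archive is just your local package library zipped so that the package directories sit under a top-level 3.5/ folder. A minimal sketch of building it, assuming your packages live under ~/R/x86_64-pc-linux-gnu-library (this path is an assumption) and a zip utility is on the PATH:

        # Bundle the local R 3.5 package library into 3.5.zip for --archives.
        # The library location below is an assumption; adjust it to your setup.
        lib_parent <- path.expand("~/R/x86_64-pc-linux-gnu-library")
        old_wd <- setwd(lib_parent)
        utils::zip(zipfile = "/your_R_packages/3.5.zip", files = "3.5")
        setwd(old_wd)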

    When calling SparkR::dapply, have the applied function call .libPaths("./3.5.zip/3.5") first. Also note that the R version on the workers must match the R version your zipped packages were built with.
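    A minimal sketch of that pattern, assuming the session picks up the SPARKR_SUBMIT_ARGS set above; the toy data frame, column names, and the use of forecast::auto.arima are only illustrative:

        library(SparkR)
        sparkR.session()   # picks up the SPARKR_SUBMIT_ARGS set above

        df <- createDataFrame(data.frame(id = 1:100, value = rnorm(100)))

        result <- dapply(
          df,
          function(part) {
            # Point R at the archive YARN unpacked into the container work dir
            .libPaths("./3.5.zip/3.5")
            library(forecast)            # loaded from the shipped library
            fit <- auto.arima(part$value)
            part$pred <- as.numeric(forecast(fit, h = 1)$mean)
            part
          },
          schema = structType(
            structField("id", "integer"),
            structField("value", "double"),
            structField("pred", "double")
          )
        )
        head(collect(result))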
