I want to use R packages from CRAN (such as forecast) with SparkR, and I have run into the following two problems.
Should I pre-install all those packages on every worker node?
A better choice is to ship your local R packages with the spark-submit `--archives` option. That way you do not need to install the packages on each worker, and SparkR::dapply will not spend time installing and compiling them at run time. For example:
Sys.setenv("SPARKR_SUBMIT_ARGS"="--master yarn-client --num-executors 40 --executor-cores 10 --executor-memory 8G --driver-memory 512M --jars /usr/lib/hadoop/lib/hadoop-lzo-0.4.15-cdh5.11.1.jar --files /etc/hive/conf/hive-site.xml --archives /your_R_packages/3.5.zip --files xgboost.model sparkr-shell")
Then, in the function you pass to SparkR::dapply, call .libPaths("./3.5.zip/3.5") first so R can find the shipped packages. Also note that the R version on the worker nodes must match the R version the packages in your zip file were built for.
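To make the two steps concrete, here is a minimal sketch of the dapply side. It assumes the session was started with SPARKR_SUBMIT_ARGS set as above (so `3.5.zip` was shipped via `--archives` and gets extracted into each executor's working directory), and that `df` is an existing SparkDataFrame; it needs a running Spark cluster, so treat it as an outline rather than a drop-in script.

```r
library(SparkR)

# Session must be created AFTER setting SPARKR_SUBMIT_ARGS, otherwise
# the --archives option is not picked up by the launched JVM.
sparkR.session()

result <- dapply(
  df,                                     # assumed existing SparkDataFrame
  function(part) {
    # Point R at the packages shipped via --archives. The zip is extracted
    # on each executor under its archive name, so prepend that path.
    .libPaths(c("./3.5.zip/3.5", .libPaths()))
    library(forecast)                     # now loads from the shipped library

    # part is a local data.frame holding one partition; apply your
    # per-partition logic here and return a data.frame.
    part
  },
  schema(df)                              # output schema; same as input here
)
```

Prepending to `.libPaths()` (rather than replacing it) keeps the workers' base R library visible, which the shipped packages may still depend on.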