sparkling-water

xgboost in pysparkling water throws an error: XGBoost is not available on all nodes

那年仲夏 submitted on 2020-04-18 05:46:08
Question: I am trying to run XGBoost from the H2O package on a Spark cluster. I am using H2O on an on-prem cluster running Red Hat Enterprise Linux Server, version '3.10.0-1062.9.1.el7.x86_64'. I start the H2O cluster inside the Spark environment .appName('APP1')\ .config('spark.executor.memory', '15g')\ .config('spark.executor.cores', '8')\ .config('spark.executor.instances','5')\ .config('spark.yarn.queue', "DS")\ .config('spark.yarn.executor.memoryOverhead', '1096')\ .enableHiveSupport()\ .getOrCreate() from
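To make the configuration above easier to read, here is a small arithmetic sketch of what those settings request from YARN. It only uses the values quoted in the question; the variable names are illustrative, it assumes the overhead value is in MiB (Spark's default unit for that setting), and it ignores any rounding YARN applies to container sizes. It is not a diagnosis of the reported error.

```python
# Values copied from the question's SparkSession configuration.
executor_memory_mib = 15 * 1024   # spark.executor.memory = '15g'
memory_overhead_mib = 1096        # spark.yarn.executor.memoryOverhead = '1096' (assumed MiB)
executor_instances = 5            # spark.executor.instances
executor_cores = 8                # spark.executor.cores

# Each executor container needs heap plus off-heap overhead.
per_container_mib = executor_memory_mib + memory_overhead_mib
total_mib = per_container_mib * executor_instances
total_cores = executor_cores * executor_instances

print(per_container_mib, total_mib, total_cores)
```

Under these assumptions each executor asks for roughly 16 GiB, of which only about 1 GiB sits outside the JVM heap.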

How to map over DataFrame in spark to extract RowData and make predictions using h2o mojo model

ぐ巨炮叔叔 submitted on 2020-01-23 12:36:13
Question: I have a saved H2O model in MOJO format, and now I am trying to load it and use it to make predictions on a new dataset ( df ) as part of a Spark app written in Scala. Ideally, I wish to append a new column to the existing DataFrame containing the class probability based on this model. I can see how to apply a MOJO to an individual row already in RowData format (as per answer here), but I am not sure how to map over an existing DataFrame so that it is in the right format to make predictions
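One way to picture the mapping the question asks about is the minimal sketch below. It is not the H2O or Sparkling Water API, and it is in Python rather than the question's Scala. Plain dicts stand in for Spark rows and for `hex.genmodel.easy.RowData`, and `stub_predict` is a hypothetical stand-in for a loaded MOJO wrapper. The column names and the `p1` output field are invented for illustration. The sketch assumes that a key left out of the row data is treated as a missing value.

```python
from typing import Callable, Dict, List

def to_row_data(row: Dict[str, object], feature_cols: List[str]) -> Dict[str, object]:
    """Copy feature columns into a RowData-like mapping, skipping nulls.
    Numbers become floats; every other value becomes a string."""
    row_data: Dict[str, object] = {}
    for col in feature_cols:
        value = row.get(col)
        if value is None:
            continue  # leave the key out so it counts as missing
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            row_data[col] = float(value)
        else:
            row_data[col] = str(value)
    return row_data

def score_rows(rows: List[Dict[str, object]],
               feature_cols: List[str],
               predict: Callable[[Dict[str, object]], float]) -> List[Dict[str, object]]:
    """Return copies of the rows with one extra field holding the probability."""
    return [{**row, "p1": predict(to_row_data(row, feature_cols))} for row in rows]

def stub_predict(row_data: Dict[str, object]) -> float:
    # Hypothetical stand-in for a binomial model; NOT a real MOJO prediction.
    return 0.9 if row_data.get("age", 0.0) > 40.0 else 0.1

rows = [{"id": 1, "age": 52, "sex": "F"}, {"id": 2, "age": None, "sex": "M"}]
scored = score_rows(rows, ["age", "sex"], stub_predict)
```

In a real Spark job the same per-row conversion would typically live inside a UDF or a `mapPartitions` call, so that the model is loaded once per partition rather than once per row.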

h2o deeplearning error when specifying nfolds for cross validation

回眸只為那壹抹淺笑 submitted on 2019-12-13 05:01:10
Question: Has this issue been resolved by now? I encounter the same error message. Use case: I am doing binary classification using h2o's deeplearning() function. Below, I provide randomly generated data the same size as my actual use case. System specs:
# R version 3.3.2 (2016-10-31)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)
# h2o version h2o_3.20.0.2
I am currently learning how to use h2o, so I have played with that function quite a bit. Everything runs

Error with H2OContext running PySparkling with Spark 2.1

岁酱吖の submitted on 2019-12-11 14:51:50
Question: I'm getting this error when trying to run a PySparkling script on an AWS EMR cluster. I can get everything to work when downloading Sparkling Water 2.1.8 and running it from a pysparkling shell. However, spark-submit does not seem to work. Error: NameError: name 'H2OContext' is not defined. My spark-submit: spark-submit --packages ai.h2o:sparkling-water-core_2.11:2.1.7,ai.h2o:sparkling-water-examples_2.11:2.1.7 --conf spark.dynamicAllocation.enabled=false spark.py Python file: from pysparkling

How to map over DataFrame in spark to extract RowData and make predictions using h2o mojo model

五迷三道 submitted on 2019-12-06 05:37:38
I have a saved H2O model in MOJO format, and now I am trying to load it and use it to make predictions on a new dataset ( df ) as part of a Spark app written in Scala. Ideally, I wish to append a new column to the existing DataFrame containing the class probability based on this model. I can see how to apply a MOJO to an individual row already in RowData format (as per answer here), but I am not sure how to map over an existing DataFrame so that it is in the right format to make predictions using the MOJO model. I have worked with DataFrames a fair bit, but never with the underlying RDDs. Also