sparkling-water | 易学教程

How to Setup SPARK_HOME variable?

Source: https://stackoverflow.com/questions/46613651/how-to-setup-spark-home-variable
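
Since only the question title is shown here, the following is a minimal sketch rather than the linked answer. It assumes SPARK_HOME should point at the root of a Spark installation, recognised by the presence of bin/spark-submit. The function names and the candidate paths in the usage note are hypothetical.

```python
import os
from pathlib import Path


def resolve_spark_home(candidates, environ=None):
    """Return the first directory that looks like a Spark install, or None.

    A directory "looks like" Spark here if it contains bin/spark-submit.
    An existing SPARK_HOME value is checked before the candidates.
    """
    environ = os.environ if environ is None else environ
    paths = []
    if environ.get("SPARK_HOME"):
        paths.append(environ["SPARK_HOME"])
    paths.extend(candidates)
    for path in paths:
        if (Path(path) / "bin" / "spark-submit").is_file():
            return str(Path(path))
    return None


def export_spark_home(candidates, environ=None):
    """Set SPARK_HOME in the given environment mapping and return the value."""
    environ = os.environ if environ is None else environ
    home = resolve_spark_home(candidates, environ)
    if home is None:
        raise FileNotFoundError("no Spark installation found in: %r" % (list(candidates),))
    environ["SPARK_HOME"] = home
    return home
```

A script could call export_spark_home(["/opt/spark", "/usr/lib/spark"]) before importing pyspark; both paths are placeholders. This only affects the current process and its children. For a login shell, the variable would normally be exported from the shell profile instead.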

xgboost in pysparkling water throws an error: XGBoost is not available on all nodes

Question: I am trying to run XGBoost from the H2O package on a Spark cluster. I am using H2O on an on-prem cluster running Red Hat Enterprise Linux Server, version '3.10.0-1062.9.1.el7.x86_64'. I start the H2O cluster inside the Spark environment: .appName('APP1')\ .config('spark.executor.memory', '15g')\ .config('spark.executor.cores', '8')\ .config('spark.executor.instances', '5')\ .config('spark.yarn.queue', "DS")\ .config('spark.yarn.executor.memoryOverhead', '1096')\ .enableHiveSupport()\ .getOrCreate() from
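
The builder settings in this excerpt determine how much memory each executor asks YARN for. The sketch below only works through that arithmetic and is not a diagnosis of the error in the title. It assumes the request is roughly executor memory plus memory overhead, that a bare overhead number is in MiB, and it ignores YARN's rounding to its allocation increment. The helper names are hypothetical.

```python
def to_mib(size):
    """Convert a size such as '15g', '512m' or '1096' to MiB.

    Assumption: a bare number is already in MiB.
    """
    text = str(size).strip().lower()
    factors = {"m": 1, "g": 1024}
    if text and text[-1] in factors:
        return int(text[:-1]) * factors[text[-1]]
    return int(text)


def yarn_request_mib(conf):
    """Approximate per-executor and total memory requested from YARN, in MiB."""
    per_executor = (to_mib(conf["spark.executor.memory"])
                    + to_mib(conf["spark.yarn.executor.memoryOverhead"]))
    total = per_executor * int(conf["spark.executor.instances"])
    return per_executor, total


# Values copied from the builder chain in the question.
conf = {
    "spark.executor.memory": "15g",
    "spark.executor.instances": "5",
    "spark.yarn.executor.memoryOverhead": "1096",
}
per_executor, total = yarn_request_mib(conf)
print(per_executor, total)
```

For these values the sketch gives 16456 MiB per executor and 82280 MiB across the five executors. The explicit overhead of 1096 MiB is about 7% of the 15 GiB heap, which is below the 1536 MiB that Spark's default of 10% would give. Note also that spark.yarn.executor.memoryOverhead is the older name of this setting; since Spark 2.3 it is spark.executor.memoryOverhead.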

h2o deeplearning error when specifying nfolds for cross validation

Question: Has this issue been resolved by now? I encounter the same error message. Use case: I am doing binary classification using h2o's deeplearning() function. Below, I provide randomly generated data the same size as my actual use case. System specs: R version 3.3.2 (2016-10-31); Platform: x86_64-w64-mingw32/x64 (64-bit); Running under: Windows >= 8 x64 (build 9200); h2o version h2o_3.20.0.2. I am currently learning how to use h2o, so I have played with that function quite a bit. Everything runs
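
For readers unfamiliar with the nfolds argument named in the title: it requests k-fold cross-validation, where the rows are split into nfolds groups and each group serves once as the holdout set. The sketch below is plain Python (the question itself uses R) and uses modulo assignment, row index mod nfolds, which is one of the fold-assignment schemes H2O documents. The function names are illustrative and are not H2O API, and the sketch does not reproduce the error being asked about.

```python
def modulo_folds(n_rows, nfolds):
    """Assign row i to fold i % nfolds and return one list of row indices per fold."""
    if nfolds < 2:
        raise ValueError("nfolds must be at least 2")
    if nfolds > n_rows:
        raise ValueError("nfolds cannot exceed the number of rows")
    folds = [[] for _ in range(nfolds)]
    for i in range(n_rows):
        folds[i % nfolds].append(i)
    return folds


def train_holdout_splits(n_rows, nfolds):
    """Yield (train_indices, holdout_indices) once per fold."""
    folds = modulo_folds(n_rows, nfolds)
    for k, holdout in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, holdout
```

Each row lands in exactly one holdout set, so every cross-validation model is scored on rows it did not train on.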

Error with H2OContext running PySparkling with Spark 2.1

Question: I'm getting this error when trying to run a PySparkling script on an AWS EMR cluster. I can get everything to work when downloading Sparkling Water 2.1.8 and running it from a pysparkling shell. However, spark-submit does not seem to work. Error: NameError: name 'H2OContext' is not defined. My spark-submit: spark-submit --packages ai.h2o:sparkling-water-core_2.11:2.1.7,ai.h2o:sparkling-water-examples_2.11:2.1.7 --conf spark.dynamicAllocation.enabled=false spark.py Python file: from pysparkling
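
A NameError of this kind means the name H2OContext was never bound in the script. PySparkling scripts usually obtain it with `from pysparkling import *` or `from pysparkling import H2OContext`, and a star import hides the problem when the imported module does not define the name. The sketch below is one way to fail earlier with a clearer message. The helper name `require` is hypothetical, and the idea that the Python package was not shipped with the job is an assumption about a possible cause, not a confirmed diagnosis.

```python
import importlib


def require(module_name, attr):
    """Import module_name and return its attribute attr, failing loudly if either is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            "cannot import %r on the driver; is the package shipped with the job?"
            % module_name
        ) from exc
    if not hasattr(module, attr):
        # Reports where the module was loaded from, which helps spot a wrong package.
        raise RuntimeError(
            "%r was imported from %r but does not define %r"
            % (module_name, getattr(module, "__file__", None), attr)
        )
    return getattr(module, attr)
```

In the submitted script this would be used as `H2OContext = require("pysparkling", "H2OContext")` in place of the star import, so a missing or incomplete package surfaces at the import rather than at first use.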

How to map over DataFrame in spark to extract RowData and make predictions using h2o mojo model

Question: I have a saved H2O model in MOJO format, and now I am trying to load it and use it to make predictions on a new dataset (df) as part of a Spark app written in Scala. Ideally, I wish to append a new column to the existing DataFrame containing the class probability based on this model. I can see how to apply a MOJO to an individual row already in a RowData format (as per answer here), but I am not sure how to map over an existing DataFrame so that it is in the right format to make predictions using the MOJO model. I have worked with DataFrames a fair bit, but never with the underlying RDDs. Also
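
The shape of this task can be sketched without Spark or H2O. The example below is plain Python (the question itself uses Scala): each row becomes a string-keyed record, similar in spirit to RowData, which is a string-keyed map in h2o-genmodel. The record is passed to a scorer and the result is appended as a new column. `score_row` stands in for the MOJO prediction call, `toy_scorer` is not an H2O function, and every name here is hypothetical. In a Spark job the same per-row function would be applied through a map over the DataFrame's rows.

```python
def to_record(columns, row):
    """Pair column names with row values, leaving out missing (None) values."""
    return {name: value for name, value in zip(columns, row) if value is not None}


def append_probability(columns, rows, score_row, new_column="p1"):
    """Return (new_columns, new_rows) with one extra column holding score_row's output."""
    if new_column in columns:
        raise ValueError("column %r already exists" % new_column)
    new_rows = [tuple(row) + (score_row(to_record(columns, row)),) for row in rows]
    return list(columns) + [new_column], new_rows


# Toy stand-in for a model: NOT an H2O call, just a deterministic function.
def toy_scorer(record):
    return 1.0 if record.get("age", 0) >= 40 else 0.0


columns = ["age", "city"]
rows = [(35, "Oslo"), (52, None)]
new_columns, new_rows = append_probability(columns, rows, toy_scorer)
print(new_columns)
print(new_rows)
```

Keeping the row-to-record conversion separate from the scoring call means the conversion can be tested on its own, and the scorer can later be swapped for a real model wrapper without touching the mapping logic.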