Question
I am trying to run XGBoost from the H2O package on a Spark cluster. I am using H2O on an on-prem cluster running Red Hat Enterprise Linux Server, version '3.10.0-1062.9.1.el7.x86_64'.
I start the H2O cluster inside the Spark environment:
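from pyspark.sql import SparkSession
# Start of the builder chain (assumed; the snippet as posted begins at .appName)
spark = SparkSession.builder\
.appName('APP1')\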
.config('spark.executor.memory', '15g')\
.config('spark.executor.cores', '8')\
.config('spark.executor.instances','5')\
.config('spark.yarn.queue', "DS")\
.config('spark.yarn.executor.memoryOverhead', '1096')\
.enableHiveSupport()\
.getOrCreate()
from pysparkling import *
import h2o
h2oConf = H2OConf()
hc = H2OContext.getOrCreate()
Connecting to H2O server at ... successful.
H2O cluster uptime: 13 secs
H2O cluster timezone: UTC
H2O data parsing timezone: UTC
H2O cluster version: 3.28.1.2
H2O cluster version age: 23 days
H2O cluster name: sparkling-water-app
H2O cluster total nodes: 5
H2O cluster free memory: 111.1 Gb
H2O cluster total cores: 160
H2O cluster allowed cores: 40
H2O cluster status: locked, healthy
H2O connection url: http
H2O connection proxy: None
H2O internal security: False
H2O API Extensions: XGBoost, Algos, Amazon S3, Sparkling Water REST API Extensions, AutoML, Core V3, TargetEncoder, Core V4
Python version: 2.7.13 final
Sparkling Water Context:
* Sparkling Water Version: 3.28.1.2-1-2.2
* H2O name: sparkling-water-app
* cluster size: 5
* list of used nodes:
(executorId, host, port)
I see XGBoost listed in the H2O API Extensions. However, when I try to run it, I get an error saying that XGBoost is not available on all nodes, or that it is not registered. I am importing H2OXGBoost from pysparkling.ml, and here is the output:
from pysparkling.ml import H2OXGBoost
estimator = H2OXGBoost(labelCol = "label")
model = estimator.fit(dt_train)
Py4JJavaErrorTraceback (most recent call last)
<ipython-input-44-a579ec322aa9> in <module>()
3
4
----> 5 model = estimator.fit(dt_train)
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/base.py in fit(self, dataset, params)
62 return self.copy(params)._fit(dataset)
63 else:
---> 64 return self._fit(dataset)
65 else:
66 raise ValueError("Params must be either a param map or a list/tuple of param maps, "
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/wrapper.py in _fit(self, dataset)
263
264 def _fit(self, dataset):
--> 265 java_model = self._fit_java(dataset)
266 return self._create_model(java_model)
267
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/ml/wrapper.py in _fit_java(self, dataset)
260 """
261 self._transfer_params_to_java()
--> 262 return self._java_obj.fit(dataset._jdf)
263
264 def _fit(self, dataset):
/opt/continuum/anaconda3/envs/python27/lib/python2.7/site-packages/py4j/java_gateway.pyc in __call__(self, *args)
1158 answer = self.gateway_client.send_command(command)
1159 return_value = get_return_value(
-> 1160 answer, self.gateway_client, self.target_id, self.name)
1161
1162 for temp_arg in temp_args:
/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, **kw)
64 except py4j.protocol.Py4JJavaError as e:
65 s = e.java_exception.toString()
/opt/continuum/anaconda3/envs/python27/lib/python2.7/site-packages/py4j/protocol.pyc in get_return_value(answer, gateway_client, target_id, name)
318 raise Py4JJavaError(
319 "An error occurred while calling {0}{1}{2}.\n".
--> 320 format(target_id, ".", name), value)
321 else:
322 raise Py4JError(
Py4JJavaError: An error occurred while calling o6061.fit.
: water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for XGBoost model: XGBoost_model_1586296043974_6. Details: ERRR on field: XGBoost: XGBoost is not available on all nodes!
at water.exceptions.H2OModelBuilderIllegalArgumentException.makeFromBuilder(H2OModelBuilderIllegalArgumentException.java:19)
at hex.tree.xgboost.XGBoost.init(XGBoost.java:159)
at hex.tree.xgboost.XGBoost$XGBoostDriver.computeImpl(XGBoost.java:315)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:242)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1470)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
When I use it within a grid search, this is the error I get:
xgboost Grid Build progress: |████████████████████████████████████████████| 100%
Errors/Warnings building gridsearch model
Hyper-parameter: col_sample_rate, 0.6
Hyper-parameter: learn_rate, 0.01
Hyper-parameter: max_depth, 3
Hyper-parameter: sample_rate, 0.6
Hyper-parameter: tweedie_power, 1.75
failure_details: Algorithm 'XGBoost' is not registered. Available algos: [targetencoder,deeplearning,glm,glrm,kmeans,naivebayes,pca,svd,drf,gbm,isolationforest,aggregator,deepwater,word2vec,stackedensemble,coxph,generic,psvm]
failure_stack_traces: java.lang.IllegalStateException: Algorithm 'XGBoost' is not registered. Available algos: [targetencoder,deeplearning,glm,glrm,kmeans,naivebayes,pca,svd,drf,gbm,isolationforest,aggregator,deepwater,word2vec,stackedensemble,coxph,generic,psvm]
at hex.ModelBuilder.make(ModelBuilder.java:173)
at hex.ModelBuilder$TrainModelNestedRunnable.run(ModelBuilder.java:426)
at water.H2O$RunnableWrapperTask.compute2(H2O.java:1380)
at water.H2O$H2OCountedCompleter.compute1(H2O.java:1473)
at water.H2O$RunnableWrapperTask$Icer.compute1(H2O$RunnableWrapperTask$Icer.java)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1469)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
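The failure_details line above already lists which algorithms the cluster has registered. As a rough sketch (plain Python, no H2O calls; the helper name registered_algos is hypothetical and not part of any H2O API), the list can be pulled out of the message to confirm that xgboost is missing from it:

```python
import re


def registered_algos(failure_details):
    """Return the names listed after 'Available algos:' in an H2O failure message."""
    match = re.search(r"Available algos: \[([^\]]*)\]", failure_details)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# The failure_details text copied from the grid search output above.
message = (
    "Algorithm 'XGBoost' is not registered. Available algos: "
    "[targetencoder,deeplearning,glm,glrm,kmeans,naivebayes,pca,svd,drf,gbm,"
    "isolationforest,aggregator,deepwater,word2vec,stackedensemble,coxph,generic,psvm]"
)

algos = registered_algos(message)
print("xgboost" in algos)  # is XGBoost among the registered algorithms?
print(len(algos))          # how many algorithms the cluster reports
```

This only inspects the error text. It does not explain why the extension appears under H2O API Extensions but is absent from the registered algorithms.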
Source: https://stackoverflow.com/questions/61104483/xgboost-in-pysparkling-water-throws-an-error-xgboost-is-not-available-on-all-no