h2o

H2OGeneralizedLinearEstimator() - Prediction Error

旧巷老猫 submitted on 2019-12-10 18:23:21
Question: I am trying to predict test times in a Kaggle competition using H2OGeneralizedLinearEstimator. The model trains normally in line 3 and the metrics are all reasonable. However, when I reach the predict step I get an error, even though the test data frame matches the train data frame. Has anyone seen this error before?
    h2o_glm = H2OGeneralizedLinearEstimator()
    h2o_glm.train(training_frame=train_h2o, y='y')
    h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()
    test_pred =

Converting R dataframe to H2O Frame without writing to disk

不羁岁月 submitted on 2019-12-10 18:17:11
Question: I know the as.h2o function from the h2o library converts an R data.frame to an H2O frame. Two questions: Does as.h2o() write data to disk during conversion, and how long is this data stored? Are there other options that avoid the temporary step of writing to disk? Answer 1: The exact path of running as.h2o on a data.frame, df, is:
    path <- write.csv(df)
    h2o.upload(path)
    remove.file(path)
We temporarily write the data.frame to disk and then upload, rather than import, the file into H2O and as soon as
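
The write, upload, remove sequence described in that answer can be sketched in plain Python. This is only an illustration of the pattern, not the code inside as.h2o: the function name `convert_via_temp_csv` and the `upload` callback are hypothetical, and `fake_upload` is a stand-in so the example runs without an H2O cluster.

```python
import csv
import os
import tempfile


def convert_via_temp_csv(rows, header, upload):
    """Write rows to a temporary CSV, pass its path to `upload`, then delete it.

    `upload` is a hypothetical callback standing in for the upload step.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        # The file is closed before upload so the reader sees complete data.
        return upload(path)
    finally:
        # The temporary file is removed whether or not the upload succeeded.
        os.remove(path)


seen_paths = []


def fake_upload(path):
    """Stand-in uploader: records the path and counts the data rows."""
    seen_paths.append(path)
    with open(path, newline="") as handle:
        return sum(1 for _ in csv.reader(handle)) - 1  # minus the header row


row_count = convert_via_temp_csv([[1, "a"], [2, "b"]], ["id", "label"], fake_upload)
print(row_count, os.path.exists(seen_paths[0]))
```

In this sketch the temporary file exists only for the duration of the call, which matches the answer's description of a temporary write followed by removal.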

Saving H2O data frame

南笙酒味 submitted on 2019-12-10 15:24:56
Question: I am working with a 10 GB training data frame. I use the H2O library for faster computation. Each time I load the dataset, I have to convert the data frame into an H2O object, which takes a long time. Is there a way to store the converted H2O object (so that I can skip the as.h2o(trainingset) step each time I run trials on building models)? Answer 1: After the first transformation with as.h2o(trainingset) you can export / save the file to disk and later import it again. my_h2o_training_file <- as.h2o

h2o model not fit in driver node's memory error

余生颓废 submitted on 2019-12-10 15:18:08
Question: I ran a GBM model through R code in H2O and got the error below. The same code was running fine a couple of weeks ago. Is this an error on the H2O side or a configuration problem on the user's system? water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model: gbm-2017-04-18-15-29-53. Details: ERRR on field: _ntrees: The tree model will not fit in the driver node's memory (23.2 MB per tree x 1000 > 3.32 GB) - try decreasing ntrees and/or max_depth or increasing min_rows! Answer 1:
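
The arithmetic inside that error message can be checked by hand. The sketch below is only a back-of-the-envelope calculation using the numbers from the message. It assumes 1 GB = 1024 MB, and the helper name `estimate_tree_budget` is made up for illustration; H2O's own check may round or convert differently.

```python
def estimate_tree_budget(mb_per_tree, ntrees, available_gb, mb_per_gb=1024):
    """Return (memory needed in GB, largest ntrees that fits).

    Assumes 1 GB = 1024 MB; this is an illustrative estimate only.
    """
    required_gb = mb_per_tree * ntrees / mb_per_gb
    max_trees = int((available_gb * mb_per_gb) // mb_per_tree)
    return required_gb, max_trees


# Numbers taken from the error message: 23.2 MB per tree x 1000 > 3.32 GB
required_gb, max_trees = estimate_tree_budget(23.2, 1000, 3.32)
print(f"needed: {required_gb:.2f} GB, available: 3.32 GB")
print(f"trees that would fit at this size per tree: {max_trees}")
```

Under these assumptions the requested model needs several times the available memory, which is why the message suggests lowering ntrees or max_depth, or raising min_rows, so that there are fewer or smaller trees.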

conversion of pandas dataframe to h2o frame efficiently

时光毁灭记忆、已成空白 submitted on 2019-12-10 13:02:04
Question: I have a Pandas dataframe which is encoded as latin-1 and delimited by ; . The dataframe is very large, roughly 350000 x 3800. I wanted to use sklearn initially, but my dataframe has missing values (NaN values), so I could not use sklearn's random forests or GBM. I therefore had to use H2O's Distributed Random Forest to train on the dataset. The main problem is that the dataframe is not converted efficiently when I do h2o.H2OFrame(data) . I checked for the possibility for providing

Difference between random forest implementation

ぃ、小莉子 submitted on 2019-12-10 12:21:20
Question: Is there a performance difference between the implementation of Random Forest in H2O and the standard Random Forest library? Has anybody done an analysis of these two implementations? Answer 1: Here is an open benchmark you can start with: https://github.com/szilard/benchm-ml Answer 2: I suppose you are looking for this: http://www.wise.io/tech/benchmarking-random-forest-part-1 Source: https://stackoverflow.com/questions/45190787/difference-between-random-forest-implementation

H2O - balance classes - cross validation

岁酱吖の submitted on 2019-12-10 11:34:18
Question: I would like to build a GBM model with H2O. My data set is imbalanced, so I am using the balance_classes parameter. For grid search (parameter tuning) I would like to use 5-fold cross validation. I am wondering how H2O deals with class balancing in that case. Will only the training folds be rebalanced? I want to be sure the test fold is not rebalanced. Thank you. Answer 1: In class imbalance settings, artificially balancing the test/validation set does not make any sense: these sets must remain
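
The principle in that answer, rebalancing only the training folds and leaving each held-out fold as it is, can be sketched in plain Python. This is not H2O's implementation and says nothing about what balance_classes does internally; the function names and the simple random oversampling are assumptions made for illustration.

```python
import random
from collections import Counter


def oversample_minority(rows, label_index, rng):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_index], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


def cv_splits_with_train_rebalancing(rows, label_index, nfolds, seed=0):
    """Yield (rebalanced_train, untouched_holdout) pairs for k-fold cross validation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::nfolds] for i in range(nfolds)]
    for i, holdout in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        # Only the training portion is oversampled; the holdout keeps its
        # original class distribution.
        yield oversample_minority(train, label_index, rng), holdout


# Toy data: 10 positives and 90 negatives, as (id, label) tuples.
data = [(i, 1 if i < 10 else 0) for i in range(100)]
splits = list(cv_splits_with_train_rebalancing(data, label_index=1, nfolds=5))
for train, holdout in splits:
    print(Counter(label for _, label in train), Counter(label for _, label in holdout))
```

Each held-out fold contains only original rows, so metrics computed on it reflect the real class ratio.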

Is H2O target mean encoding available in Python?

 ̄綄美尐妖づ submitted on 2019-12-10 10:23:55
Question: I noticed H2O has released target mean encoding: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/target-encoding.html It only comes with an R code example. Does anyone have a Python example? Answer 1: Like this:
    from h2o.targetencoder import TargetEncoder
    # Fit target encoding on training data
    targetEncoder = TargetEncoder(x=["addr_state", "purpose"], y="bad_loan", fold_column="cv_fold_te")
    targetEncoder.fit(ext_train)
This requires H2O version 3.22 or later. Here is a link to an
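
For readers unfamiliar with the idea, target mean encoding replaces each category with the mean of the target for that category. The sketch below shows that core computation in plain Python. It is not H2O's TargetEncoder and leaves out refinements such as the fold column used in the answer. The function names are hypothetical, and falling back to the overall mean for unseen categories is an assumption of this sketch.

```python
from collections import defaultdict


def fit_target_means(categories, targets):
    """Return (per-category mean of the target, overall mean of the target)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    overall_mean = sum(targets) / len(targets)
    means = {category: sums[category] / counts[category] for category in sums}
    return means, overall_mean


def apply_target_means(categories, means, overall_mean):
    """Replace each category by its fitted mean; unseen categories get the overall mean."""
    return [means.get(category, overall_mean) for category in categories]


# Toy data reusing the column names from the answer above.
addr_state = ["CA", "CA", "NY", "TX"]
bad_loan = [1, 0, 1, 0]

means, overall_mean = fit_target_means(addr_state, bad_loan)
encoded = apply_target_means(["CA", "NY", "FL"], means, overall_mean)
print(means, encoded)
```

Fitting on the training data only and then applying the fitted means to other data mirrors the "fit on training data" comment in the answer.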

How to save/load a trained model in H2O?

无人久伴 submitted on 2019-12-10 02:58:13
Question: The user tutorial says:
1. Navigate to Data > View All
2. Choose to filter by the model key
3. Hit Save Model
4. Input for path: /data/h2o-training/...
5. Hit Submit
The problem is that I do not have this menu (H2O 3.0.0.26, web interface).
Answer 1: I am, unfortunately, not familiar with the web interface, but I can offer a workaround involving H2O in R. The functions h2o.saveModel(object, dir = "", name = "", filename = "", force = FALSE) and h2o.loadModel(path, conn = h2o.getConnection()) Should offer what you

Why connection is terminating

做~自己de王妃 submitted on 2019-12-09 18:13:39
Question: I'm trying to build a random forest classification model using the H2O library in R, on a training set with 70 million rows and 25 numeric features. The total file size is 5.6 GB. The validation file is 1 GB. My system has 16 GB of RAM and an 8-core CPU. The system is able to read both files into H2O objects successfully. I then run the command below to build the model: model <- h2o.randomForest(x = c(1:18,20:25), y = 19, training_frame = traindata, validation_frame = testdata, ntrees =