h2o | 易学教程

H2OGeneralizedLinearEstimator() - Prediction Error

Question: I am trying to predict test times in a Kaggle competition using the H2OGeneralizedLinearEstimator function. The model trains normally in line 3 and the metrics are all reasonable. However, when I come to the predict step I get an error, even though the test data frame matches the train data frame. Has anyone seen this error before?

    from h2o.estimators.glm import H2OGeneralizedLinearEstimator
    h2o_glm = H2OGeneralizedLinearEstimator()
    h2o_glm.train(training_frame=train_h2o, y='y')
    h2o_glm_predictions = h2o_glm.predict(test_data=test_h2o).as_data_frame()
    test_pred =

Converting R dataframe to H2O Frame without writing to disk

Question: I know that the as.h2o function from the h2o library converts an R data.frame to an H2O frame. Two questions: (1) Does as.h2o() write data to disk during conversion, and if so, how long is this data stored? (2) Are there other options that avoid the temporary step of writing to disk?

Answer 1: The exact path of running as.h2o on a data.frame, df, is, in pseudocode:

    path <- write.csv(df)
    h2o.upload(path)
    remove.file(path)

We temporarily write the data.frame to disk and then upload, rather than import, the file into H2O, and as soon as
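The write, upload, remove sequence from the answer can be sketched in Python. This is only an illustration of the pattern, not H2O's actual implementation: `convert_via_temp_csv` and its `upload` parameter are hypothetical names introduced here. With the h2o Python package, `h2o.upload_file` would be a natural callable to pass as `upload`, but that pairing is an assumption and is not exercised below.

```python
import csv
import os
import tempfile


def convert_via_temp_csv(rows, header, upload):
    """Write rows to a temporary CSV, pass its path to `upload`, then delete it.

    `upload` is any callable that accepts a file path (hypothetical hook).
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows)
        # The file exists on disk only while the upload call is running.
        return upload(path)
    finally:
        # Remove the temporary file whether or not the upload succeeded.
        os.remove(path)
```

The `try`/`finally` keeps the temporary file from outliving the call, which mirrors the short-lived disk write the answer describes.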

Saving H2o data frame

Question: I am working with a 10 GB training data frame. I use the H2O library for faster computation. Each time I load the dataset, I have to convert the data frame into an H2O object, which takes a long time. Is there a way to store the converted H2O object, so that I can skip the as.h2o(trainingset) step each time I run trials on building models?

Answer 1: After the first transformation with as.h2o(trainingset) you can export/save the file to disk and later import it again.

    my_h2o_training_file <- as.h2o

h2o model not fit in driver node's memory error

Question: I ran a GBM model through R code in H2O and got the error below. The same code was running fine a couple of weeks ago. Is this an error on the H2O side, or a configuration problem on the user's system?

    water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for GBM model: gbm-2017-04-18-15-29-53. Details: ERRR on field: _ntrees: The tree model will not fit in the driver node's memory (23.2 MB per tree x 1000 > 3.32 GB) - try decreasing ntrees and/or max_depth or increasing min_rows!

Answer 1:
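As a side note that is not taken from the truncated answer, the inequality in the error message can be checked with simple arithmetic. The sketch below uses only the three numbers printed in the message. The conversion factor of 1024 MB per GB is an assumption, since the message does not state which units H2O uses, and `max_trees_that_fit` is a hypothetical helper name.

```python
import math

MB_PER_GB = 1024  # assumption: binary units; the error message does not say


def max_trees_that_fit(mb_per_tree, memory_gb):
    """Largest whole number of trees whose total size stays within memory_gb."""
    return math.floor(memory_gb * MB_PER_GB / mb_per_tree)


# Values copied from the error message.
mb_per_tree = 23.2
ntrees = 1000
memory_gb = 3.32

required_gb = mb_per_tree * ntrees / MB_PER_GB
print(f"required: {required_gb:.2f} GB, available: {memory_gb} GB")
print("largest ntrees that fits:", max_trees_that_fit(mb_per_tree, memory_gb))
```

Under that unit assumption, the requested 1000 trees need several times the available memory, and only about 146 trees of this size would fit. This is why the message suggests lowering ntrees or making each tree smaller through max_depth or min_rows.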

conversion of pandas dataframe to h2o frame efficiently

Question: I have a Pandas dataframe that uses latin-1 encoding and is delimited by ;. The dataframe is very large, roughly 350000 x 3800. I wanted to use sklearn initially, but my dataframe has missing values (NaN), so I could not use sklearn's random forests or GBM. I therefore had to use H2O's Distributed Random Forest to train on the dataset. The main problem is that the dataframe is not converted efficiently when I call h2o.H2OFrame(data). I checked for the possibility for providing

Difference between random forest implementation

Question: Is there a performance difference between the implementation of Random Forest in H2O and the standard Random Forest library? Has anybody done some analysis of these two implementations?

Answer 1: Here is an open benchmark you can start with: https://github.com/szilard/benchm-ml

Answer 2: I suppose you are looking for this: http://www.wise.io/tech/benchmarking-random-forest-part-1

Source: https://stackoverflow.com/questions/45190787/difference-between-random-forest-implementation

H2O - balance classes - cross validation

Question: I would like to build a GBM model with H2O. My data set is imbalanced, so I am using the balance_classes parameter. For grid search (parameter tuning) I would like to use 5-fold cross-validation. I am wondering how H2O deals with class balancing in that case. Will only the training folds be rebalanced? I want to be sure the test fold is not rebalanced. Thank you.

Answer 1: In class imbalance settings, artificially balancing the test/validation set does not make any sense: these sets must remain
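The principle the question asks about, rebalancing only the training folds while leaving the held-out fold untouched, can be illustrated in plain Python. This is a generic sketch and makes no claim about how H2O implements balance_classes internally. All function names are hypothetical, and simple oversampling with replacement stands in for whatever balancing scheme a library actually uses.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def oversample_to_majority(indices, labels, seed=0):
    """Resample each class with replacement up to the majority-class count."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


def balanced_cv_splits(labels, k=5, seed=0):
    """Yield (train, test) index lists; only the training part is rebalanced."""
    folds = kfold_indices(len(labels), k, seed)
    for held_out in range(k):
        test_idx = folds[held_out]  # kept at its natural class ratio
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield oversample_to_majority(train_idx, labels, seed), test_idx
```

Because resampling draws only from the training indices, no held-out row can leak into training, and the metric computed on each test fold reflects the original class distribution.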

Is H2O target mean encoding available in Python?

Question: I noticed H2O has released target mean encoding: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/target-encoding.html It only comes with an R code example. Does anyone have a Python example?

Answer 1: Like this:

    from h2o.targetencoder import TargetEncoder

    # Fit target encoding on training data
    targetEncoder = TargetEncoder(x=["addr_state", "purpose"], y="bad_loan", fold_column="cv_fold_te")
    targetEncoder.fit(ext_train)

But this requires at least version 3.22. Here is a link to an

how to save/load a trained model in H2o?

Question: The user tutorial says:

Navigate to Data > View All
Choose to filter by the model key
Hit Save Model
Input for path: /data/h2o-training/...
Hit Submit

The problem is that I do not have this menu (H2O 3.0.0.26, web interface).

Answer 1: I am, unfortunately, not familiar with the web interface, but I can offer a workaround involving H2O in R. The functions

    h2o.saveModel(object, dir = "", name = "", filename = "", force = FALSE)

and

    h2o.loadModel(path, conn = h2o.getConnection())

should offer what you

Why connection is terminating

Question: I'm trying a random forest classification model using the H2O library inside R on a training set with 70 million rows and 25 numeric features. The total file size is 5.6 GB. The validation file's size is 1 GB. I have 16 GB of RAM and an 8-core CPU on my system. The system is able to read both files into H2O objects successfully. Then I give the command below to build the model:

    model <- h2o.randomForest(x = c(1:18,20:25), y = 19, training_frame = traindata, validation_frame = testdata, ntrees =