h2o

Attribute selection in h2o

孤街醉人 submitted on 2019-12-07 22:28:51
Question: I am a complete beginner with h2o and I want to know whether there are any attribute selection capabilities in the h2o framework that can be applied to H2OFrames. Answer 1: No, there are currently no dedicated feature selection functions in H2O. My advice would be to use Lasso regression (in H2O this means GLM with alpha = 1.0) to do the feature selection, or simply let whatever machine learning algorithm you are planning to use (e.g. GBM) see all the features (they'll tend to ignore the bad ones, but it could still …
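For reference, here is a minimal sketch of the Lasso route suggested above, using h2o.glm with alpha = 1.0 and lambda_search on a stand-in dataset (iris, with Sepal.Length as an illustrative regression target); any coefficient shrunk exactly to zero marks a feature that can be dropped.

library(h2o)
h2o.init()

df <- as.h2o(iris)  # stand-in data; substitute your own H2OFrame
lasso <- h2o.glm(x = c("Sepal.Width", "Petal.Length", "Petal.Width"),
                 y = "Sepal.Length",
                 training_frame = df,
                 alpha = 1.0,           # pure L1 penalty, i.e. Lasso
                 lambda_search = TRUE)  # search over regularization strengths
h2o.coef(lasso)  # coefficients equal to 0 flag features to discard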

How can I tell h2o deep learning grid to have AUC instead of residual deviance

纵然是瞬间 submitted on 2019-12-07 17:15:41
Question: I would like to measure model performance by AUC or accuracy. In the grid search I get results ranked by residual deviance; how can I tell the h2o deep learning grid to use AUC instead of residual deviance and present the results as a table like the one attached below?

train <- read.table(text = "target birds wolfs snakes
0 9 7 a
0 8 4 b
1 2 8 c
1 2 3 a
1 8 3 a
0 1 2 a
0 7 1 b
0 1 5 c
1 9 7 c
1 8 7 c
0 2 7 b
1 2 3 b
1 6 3 c
0 1 1 a
0 3 9 a
1 1 1 b
", header = TRUE)
trainHex <- as.h2o …
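A grid does not have to be retrained to change the ranking metric; the already-trained grid can be re-fetched sorted by AUC. A minimal sketch, assuming the grid was built with grid_id = "dl_grid" (a placeholder) on a binomial response:

grid_auc <- h2o.getGrid(grid_id = "dl_grid", sort_by = "auc", decreasing = TRUE)
grid_auc@summary_table  # one row per model, ranked by AUC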

something similar to permutation accuracy importance in h2o package

坚强是说给别人听的谎言 submitted on 2019-12-07 13:35:34
Question: I fitted a random forest for my multinomial target with the randomForest package in R. While looking at variable importance I found the permutation accuracy importance, which is what I wanted for my analysis. I fitted a random forest with the h2o package too, but the only measures it shows me are relative_importance, scaled_importance, and percentage. My question is: can I extract a measure that shows me, for the variable I want to examine, which level of the target it classifies best?
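For reference, the importances h2o does expose can be pulled into R as a table or plotted; a minimal sketch assuming a fitted H2O random forest called rf. Note these are relative (split-based) importances aggregated over all target levels, not the per-class permutation importance asked about above.

h2o.varimp(rf)       # table: variable, relative_importance, scaled_importance, percentage
h2o.varimp_plot(rf)  # bar chart of the same table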

XGBoost - H2O crashed due to an illegal memory access

此生再无相见时 submitted on 2019-12-07 13:31:43
Question: The H2O process crashed when doing a grid search with XGBoost:

terminate called after throwing an instance of 'thrust::system::system_error'
  what(): /tmp/xgboost/plugin/updater_gpu/src/device_helpers.cuh(387): an illegal memory access was encountered

after giving the INFO messages below:

08-17 06:44:46.672 10.0.1.89:54321 14426 FJ-1-3 INFO: Checking convergence with logloss metric: 0.04519170911104479 --> 0.02811784326194906 (still improving).
08-17 06:44:46.672 10.0.1.89:54321 14426 FJ-1-3 INFO: …
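The stack trace points at the GPU updater (device_helpers.cuh), so one way to isolate the problem is to force the CPU backend for the affected models. A minimal sketch, assuming a frame train with response column "y" (both placeholders) and that this h2o build exposes the backend parameter of h2o.xgboost:

xgb <- h2o.xgboost(x = setdiff(names(train), "y"),
                   y = "y",
                   training_frame = train,
                   backend = "cpu")  # bypass the GPU updater that crashed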

How to tune hidden_dropout_ratios in h2o.grid in R

三世轮回 submitted on 2019-12-07 11:37:08
Question: I want to tune a neural network with dropout using h2o in R. Here I provide a reproducible example for the iris dataset. I am deliberately not tuning eta and epsilon (i.e. the ADADELTA hyper-parameters), purely to make the computations faster.

require(h2o)
h2o.init()
data(iris)
iris = iris[sample(1:nrow(iris)), ]
irisTrain = as.h2o(iris[1:90, ])
irisValid = as.h2o(iris[91:120, ])
irisTest = as.h2o(iris[121:150, ])
hyper_params <- list(
  input_dropout_ratio = list(0, 0.15, 0.3),
  hidden_dropout …
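One detail that often trips this up: hidden_dropout_ratios must supply one ratio per hidden layer and only takes effect with a *WithDropout activation. A minimal sketch continuing from the frames above, with a fixed two-layer architecture (the grid_id and the ratio values are illustrative):

hyper_params <- list(
  hidden = list(c(32, 32)),
  hidden_dropout_ratios = list(c(0, 0), c(0.2, 0.2), c(0.5, 0.5))
)
grid <- h2o.grid("deeplearning",
                 grid_id = "dl_dropout_grid",
                 x = 1:4, y = 5,
                 training_frame = irisTrain,
                 validation_frame = irisValid,
                 activation = "RectifierWithDropout",  # dropout needs a *WithDropout activation
                 hyper_params = hyper_params)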

How do I know how many deep learning epochs were done, from R?

丶灬走出姿态 submitted on 2019-12-07 07:30:52
Question: Early stopping is turned on by default for h2o.deeplearning(). But, from R, how do I find out whether it did stop early, and how many epochs it ran? I've tried this: model = h2o.deeplearning(...); print(model), which tells me about the layers, the MSE, R2, etc., but nothing about how many epochs were run. Over in Flow I can see the information (e.g. where the x-axis stops in the "Scoring History - Deviance" chart, or in the Scoring History table). Answer 1: If your model is called m, then to …
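A minimal sketch of one way to read this from R, assuming a fitted deep learning model m: the scoring history that Flow plots is also available as a data frame, and its last row carries the epoch count reached before (early) stopping.

sh <- h2o.scoreHistory(m)  # one row per scoring event, mirrors the Flow table
tail(sh$epochs, 1)         # epochs completed at the final scoring point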

Is there a supported way to get list of features used by a H2O model during its training?

半城伤御伤魂 submitted on 2019-12-06 18:16:25
This is my situation: I have over 400 features, many of which are probably useless and often zero. I would like to be able to:

- train a model with a subset of those features
- query that model for the features actually used to build it
- build an H2OFrame containing just those features (I get a sparse list of non-zero values for each row I want to predict)
- pass this newly constructed frame to H2OModel.predict() to get a prediction

I am pretty sure what I found is unsupported, but it works for now (v 3.13.0.341). Is there a more robust/supported way of doing this? model._model_json['output'][ …
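The snippet above reads the raw model JSON from Python; from R a similar (equally unofficial) route is the model's @model slot, which holds the parsed JSON output, including the column names the model was built with. A sketch, assuming a fitted model m, a response column named "target", and a data frame full_data (all three are placeholders):

used  <- m@model$names            # columns the model saw, response included
preds <- setdiff(used, "target")  # keep only the predictors
newdata <- as.h2o(full_data[, preds])
p <- h2o.predict(m, newdata)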

H2O - balance classes - cross validation

醉酒当歌 submitted on 2019-12-06 13:47:25
I would like to build a GBM model with H2O. My data set is imbalanced, so I am using the balance_classes parameter. For grid search (parameter tuning) I would like to use 5-fold cross-validation. I am wondering how H2O deals with class balancing in that case. Will only the training folds be rebalanced? I want to be sure the test fold is not rebalanced. Thank you. Answer 1: In class imbalance settings, artificially balancing the test/validation set does not make any sense: these sets must remain realistic, i.e. you want to test your classifier's performance in a real-world setting, where, say, the …
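For reference, here is how the two options combine in a call; a minimal sketch assuming a frame train with a factor response "y" (both placeholders). balance_classes resamples the data used for training, while the cross-validation metrics are computed on the held-out rows.

gbm <- h2o.gbm(x = setdiff(names(train), "y"),
               y = "y",
               training_frame = train,
               nfolds = 5,
               balance_classes = TRUE,  # rebalance the classes seen in training
               seed = 1234)
h2o.performance(gbm, xval = TRUE)  # cross-validated performance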

Unable to init h2o. Can somebody help me with it?

心已入冬 submitted on 2019-12-06 13:14:29
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
Starting server from C:\Users\Ramakanth\Anaconda2\lib\site-packages\h2o\backend\bin\h2o.jar
Ice root: c:\users\ramaka~1\appdata\local\temp\tmpeaff8n
JVM stdout: c:\users\ramaka~1\appdata\local\temp\tmpeaff8n\h2o_Ramakanth_started_from_python.out
JVM stderr: c:\users\ramaka~1\appdata\local\temp\tmpeaff8n\h2o_Ramakanth_started_from_python.err
Traceback (most recent call last):
  File "", line 1, in
  File …
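The log shows the server being launched with a Java 9 JVM (build 9.0.1+11); h2o builds of that era supported only Java 7/8, which is a common cause of this startup failure. A hedged sketch of one workaround, assuming a Java 8 install at the (hypothetical) path below and that the launcher honors JAVA_HOME:

Sys.setenv(JAVA_HOME = "C:/Program Files/Java/jdk1.8.0_181")  # hypothetical path to a Java 8 JDK
library(h2o)
h2o.init()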

Does or will H2O provide any pretrained vectors for use with h2o word2vec?

夙愿已清 submitted on 2019-12-06 08:24:07
Question: H2O recently added word2vec to its API. It is great to be able to easily train your own word vectors on a corpus you provide yourself. However, even greater possibilities come from using big data and big computers of the kind that software vendors like Google or H2O.ai have access to, but that most end users of H2O do not, due to network bandwidth and compute power limitations. Word embeddings can be seen as a type of unsupervised learning. As such, great value can be had in a data science …
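While H2O does not ship pretrained vectors, externally trained embeddings can be wrapped into an h2o word2vec model via the pre_trained argument. A minimal sketch, assuming a file "vectors.txt" (hypothetical) whose first column is the word and whose remaining 100 columns are the vector components, and an H2OFrame words holding a string column of tokens (also a placeholder):

vecs <- h2o.importFile("vectors.txt")  # e.g. GloVe or word2vec text format
w2v  <- h2o.word2vec(pre_trained = vecs, vec_size = 100)
h2o.transform(w2v, words, aggregate_method = "AVERAGE")  # average word vectors per document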