h2o

How to parametrize a class and implement a method that depends on the type in Scala

泄露秘密 posted on 2019-12-01 14:47:16
This is what I tried. Depending on what the user passes into the function, I want to add a String or a Double to the new Chunk.

    package org.apache.spark.h2o.utils

    import water.fvec.{NewChunk, Frame, Chunk}
    import water._
    import water.parser.ValueString

    class ReplaceNa[T >: Any](a: T) extends MRTask{
      override def map(c: Chunk, nc: NewChunk): Unit = {
        for (row <- 0 until c.len()) {
          a match{
            case s: ValueString if(c.isNA(row)) => nc.addStr(s)
            case d: Double if(c.isNA(row)) => nc.addNum(d)
          }
        }
      }
    }

But I got this error:

    error: value outputFrame is not a member of Nothing
    pred.add(new ReplaceNa(3).doAll(1, pred.vec(4)

Improve h2o DRF runtime on a multi-node cluster

白昼怎懂夜的黑 posted on 2019-12-01 12:48:11
I am currently running h2o's DRF algorithm on a 3-node EC2 cluster (the h2o server spans all 3 nodes). My data set has 1 million rows and 41 columns (40 predictors and 1 response). I use the R bindings to control the cluster, and the RF call is as follows:

    model=h2o.randomForest(x=x, y=y, ignore_const_cols=TRUE, training_frame=train_data, seed=1234, mtries=7, ntrees=2000, max_depth=15, min_rows=50, stopping_rounds=3, stopping_metric="MSE", stopping_tolerance=2e-5)

For the 3-node cluster (c4.8xlarge, enhanced networking turned on), this takes about 240 sec; the CPU utilization is between 10% and 20%;

h2o.glm lambda search not appearing to iterate over all lambdas

我的梦境 posted on 2019-12-01 06:22:57
Please consider the following basic reproducible example:

    library(h2o)
    h2o.init()
    data("iris")
    iris.hex = as.h2o(iris, "iris.hex")
    mod = h2o.glm(y = "Sepal.Length", x = setdiff(colnames(iris), "Sepal.Length"), training_frame = iris.hex, nfolds = 2, seed = 100, lambda_search = T, early_stopping = F, family = "gamma", nlambdas = 100)

When I run the above, I expect that h2o will iterate over 100 different values of lambda. However, running length(mod@allparameters$lambda) will show that only 79 values of lambda were actually tested. These 79 values are the first 79 values in the sequence:


Implementing custom stopping metrics to optimize during training in H2O model directly from R

房东的猫 posted on 2019-11-30 15:07:47
I'm trying to implement the FBeta_Score() of the MLmetrics R package:

    FBeta_Score <- function(y_true, y_pred, positive = NULL, beta = 1) {
      Confusion_DF <- ConfusionDF(y_pred, y_true)
      if (is.null(positive) == TRUE) positive <- as.character(Confusion_DF[1,1])
      Precision <- Precision(y_true, y_pred, positive)
      Recall <- Recall(y_true, y_pred, positive)
      Fbeta_Score <- (1 + beta^2) * (Precision * Recall) / (beta^2 * Precision + Recall)
      return(Fbeta_Score)
    }

in the H2O distributed random forest model and I want to optimize it during the training phase using the custom_metric_func option. The help
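For reference, the F-beta formula in the function above depends only on precision and recall, so it can be checked by hand from confusion-matrix counts. The sketch below is plain Python rather than R or H2O code. The function name `fbeta_from_counts` and its arguments are hypothetical, and the sketch does not show how `custom_metric_func` is wired up.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from true-positive, false-positive and false-negative counts.

    Hypothetical helper for checking the formula by hand; not an H2O or
    MLmetrics function.
    """
    if tp == 0:
        # With no true positives, precision and recall are zero or undefined;
        # this sketch reports 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    # Same expression as in the R function: (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    print(fbeta_from_counts(tp=8, fp=2, fn=4, beta=1.0))
    print(fbeta_from_counts(tp=8, fp=2, fn=4, beta=2.0))
```

With beta above 1 the score leans toward recall, and with beta below 1 it leans toward precision, which is the usual reason for tuning beta.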

MOTRONA FU252

蹲街弑〆低调 posted on 2019-11-30 12:52:58
AUTONICS E40S-6-3000-3-T-24 Autonics -E40S6-3000-3-T-24 VEM B21R 100LX 4MLEN TPM40/5083/3KW VEM B21R 180LA/22KW BUCHI V-700 pump jokab safty EDEN 20-046-00 Eva jokab safty EDEN 20-046-06 Adam M12 Jokab Safety 2TLA020046R0600 old: 20-046-06 Dr. Dietrich Muller GmbH Thermigrease TG 20032 HONEYWELL STG74L-E1G000-1-0-AHS-11S-A-10A0 Honeywell GmbH STG74L GAUGE IL LRL -14.7 URL 500 PSI STG74L-E1G000-1-0-AHD-11S-A-10A0-00-0000 HONEYWELL STD720-E1AC4AS-1-0-AHS-11S-A-10A0 Honeywell GmbHSTD720 DP LRL -400 URL 400 H2O STD720-E1AC4AS-1-0-AHD-11S-A-10A0-00-0000 HONEYWELL STG77L-E1G000-1-0-AHS-11S-A-10A0

R: Plot trees from h2o.randomForest() and h2o.gbm()

最后都变了- posted on 2019-11-30 05:05:49
Looking for an efficient way to plot trees in RStudio, H2O's Flow or in a local HTML page from h2o's RF and GBM models similar to the one in the image in link below. Specifically, how do you plot trees for the objects, (fitted models) rf1 and gbm2 produced by code below perhaps by parsing h2o.download_pojo(rf1) or h2o.download_pojo(gbm1)?

    # # The following two commands remove any previously installed H2O packages for R.
    # if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
    # if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }

    # # Next, we download

R H2O - Memory management

懵懂的女人 posted on 2019-11-30 04:56:38
Question: I'm trying to use H2O via R to build multiple models using subsets of one large-ish data set (~10 GB). The data is one year's worth of data and I'm trying to build 51 models (i.e., train on week 1, predict on week 2, etc.), with each week being about 1.5-2.5 million rows with 8 variables. I've done this inside a loop, which I know is not always the best way in R. One other issue I found was that the H2O entity would accumulate prior objects, so I created a function to remove all of them except
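The rolling scheme described above (train on week k, predict on week k+1) can be written down as index pairs before any model is built. This is a minimal sketch in plain Python, assuming 52 weekly partitions numbered 1 to 52, an assumption that happens to yield the 51 models mentioned. `rolling_pairs` is a hypothetical helper, not an H2O function.

```python
from typing import List, Tuple


def rolling_pairs(n_weeks: int) -> List[Tuple[int, int]]:
    """Return (train_week, predict_week) pairs for a one-step rolling scheme.

    Hypothetical helper: week numbers start at 1, and each model trains on
    one week and predicts on the next.
    """
    return [(week, week + 1) for week in range(1, n_weeks)]


if __name__ == "__main__":
    pairs = rolling_pairs(52)  # 52 weeks is an assumption, not from the post
    print(len(pairs))
    print(pairs[0], pairs[-1])
```

Listing the pairs up front makes it easier to see which frames each iteration still needs and which can be dropped once that iteration is done.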

How to get sparse matrices into H2O?

元气小坏坏 posted on 2019-11-29 13:59:49
I am trying to get a sparse matrix into H2O and I was wondering whether that is possible. Suppose we have the following:

    test <- Matrix(c(1,0,0,1,1,1,1,0,1), nrow = 3, sparse = TRUE)

Assuming my local H2O is localH2O, I can't seem to do the following:

    as.h2o(test)

It gives the error: cannot coerce class "structure("dgCMatrix", package = "Matrix")" to a data.frame. That seems pretty logical. However, assuming that test is so big that I can't transform it into a data frame, how am I supposed to load this into H2O? Using a sparse matrix representation it is only 500MB or so. How can I
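One commonly suggested route is to write the sparse matrix to a text file in SVMLight format, which H2O's file parser can read, instead of densifying it into a data frame. The sketch below is plain Python and only shows the format conversion. It assumes the matrix is available as (row, column, value) triplets with 0-based indices. The helper name `to_svmlight_lines` and the placeholder label `0` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def to_svmlight_lines(
    triplets: Iterable[Tuple[int, int, float]], n_rows: int, label: int = 0
) -> List[str]:
    """Convert (row, col, value) triplets into SVMLight text lines.

    Hypothetical helper: `label` is a placeholder target, because the
    SVMLight format expects a leading label on every line.
    """
    rows = defaultdict(list)
    for r, c, v in triplets:
        if v != 0:
            rows[r].append((c + 1, v))  # SVMLight feature indices are 1-based
    lines = []
    for r in range(n_rows):
        # Features must appear in ascending index order within a line.
        feats = " ".join(f"{c}:{v:g}" for c, v in sorted(rows[r]))
        lines.append(f"{label} {feats}".rstrip())
    return lines


if __name__ == "__main__":
    # Non-zero entries of the 3x3 example matrix (R fills column by column).
    triplets = [(0, 0, 1), (0, 1, 1), (1, 1, 1), (2, 1, 1), (0, 2, 1), (2, 2, 1)]
    for line in to_svmlight_lines(triplets, n_rows=3):
        print(line)
```

Only non-zero entries are written, so the file size tracks the number of stored values rather than rows times columns.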
