h2o

alternative to `as.h2o()` for small data?

Submitted by 这一生的挚爱 on 2019-12-11 14:53:31
Question: I have the opposite issue to most people with as.h2o(), though the resulting problem is the same. I have to convert and feed a series of single-row vectors, just 19 columns wide, to an H2O autoencoder. Each vector takes approximately 0.29 seconds to convert using as.h2o(), which is causing a major bottleneck. Can anyone suggest an alternative approach that might be faster? (For various reasons I have no alternative to sending single row vectors one by one, so aggregating the data in matrices before
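One hedged workaround (not from the original thread): the cost of as.h2o() is mostly per-call overhead, not data volume, so batching rows into a single CSV and ingesting the whole batch once (write it to a temp file and call h2o.importFile()/h2o.upload_file()) usually amortizes it. A minimal Python sketch of the batching step; rows_to_csv is a hypothetical helper:

```python
import csv
import io

def rows_to_csv(rows, header):
    """Serialize a batch of small row vectors into one CSV string so the
    whole batch can reach H2O in a single upload, instead of paying the
    per-row as.h2o()/H2OFrame round-trip for each 19-column vector."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

If the rows truly must be scored one at a time, another option worth testing is exporting the trained autoencoder as a POJO/MOJO and scoring in-process, which avoids the conversion entirely.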

Error with H2OContext running PySparkling with Spark 2.1

Submitted by 岁酱吖の on 2019-12-11 14:51:50
Question: I'm getting this error when trying to run a PySparkling script on an AWS EMR cluster. I can get everything to work when downloading Sparkling Water 2.1.8 and running it from a pysparkling shell; however, spark-submit does not seem to work. Error: NameError: name 'H2OContext' is not defined My spark-submit: spark-submit --packages ai.h2o:sparkling-water-core_2.11:2.1.7,ai.h2o:sparkling-water-examples_2.11:2.1.7 --conf spark.dynamicAllocation.enabled=false spark.py Python file: from pysparkling
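A likely cause (hedged, based on how the pysparkling shell differs from spark-submit): the shell pre-imports H2OContext into the session, while a script run via spark-submit must do `from pysparkling import H2OContext` itself and then call H2OContext.getOrCreate(spark). A small diagnostic sketch; h2o_context_available is a hypothetical helper:

```python
def h2o_context_available(globals_dict):
    """Return True if H2OContext is already defined (as the pysparkling
    shell arranges) or can be imported (what a spark-submit script must
    do explicitly with `from pysparkling import H2OContext`)."""
    if "H2OContext" in globals_dict:
        return True
    try:
        from pysparkling import H2OContext  # noqa: F401
        return True
    except ImportError:
        return False
```

If the import itself fails under spark-submit, the pysparkling package is probably not on the executors' Python path; shipping the Sparkling Water Python zip via --py-files is the usual remedy.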

h2o deep learning weights and normalization

Submitted by 蓝咒 on 2019-12-11 14:28:12
Question: I'm exploring h2o via the R interface and I'm getting a weird weight matrix. My task is as simple as they get: given x and y, compute x+y. I have 214 rows with 3 columns. The first column (x) was drawn uniformly from (-1000, 1000) and the second one (y) from (-100, 100). I just want to combine them, so I have a single hidden layer with a single neuron. This is my code: library(h2o) localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE) train <- h2o.importFile(path = "/home/martin
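A likely explanation for the "weird" weights (hedged sketch, not from the thread): H2O's deep learning standardizes numeric inputs and the target before training, so the neuron's weights live in standardized space. This NumPy demo shows that a linear neuron fit to x+y on standardized data learns weights proportional to each input's standard deviation, roughly a 10:1 ratio here rather than the 1:1 one might expect:

```python
import numpy as np

# Mimic the question's data: x ~ U(-1000, 1000), y ~ U(-100, 100), target = x + y.
rng = np.random.default_rng(0)
x = rng.uniform(-1000, 1000, size=500)
y = rng.uniform(-100, 100, size=500)
target = x + y

# Standardize columns and target, as H2O deep learning does internally.
X = np.column_stack([(x - x.mean()) / x.std(), (y - y.mean()) / y.std()])
t = (target - target.mean()) / target.std()

# Least squares = the optimum a single linear neuron would converge to.
w, *_ = np.linalg.lstsq(X, t, rcond=None)

# In original units both coefficients are 1, so in standardized space the
# weights end up proportional to each input's standard deviation:
# w[0] / w[1] ≈ sd(x) / sd(y) ≈ 10.
```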

`h2o.cbind` accepts only of H2OFrame objects - R

Submitted by 左心房为你撑大大i on 2019-12-11 13:15:52
Question: I'm trying to ensemble a random forest with a logistic regression using H2O in R. However, an error message appears in the following code: > localH2O = h2o.init() Successfully connected to http://137.0.0.1:43329/ R is connected to the H2O cluster: H2O cluster uptime: 3 hours 11 minutes H2O cluster version: 3.2.0.3 H2O cluster name: H2O_started_from_R_toshiba_jvd559 H2O cluster total nodes: 1 H2O cluster total memory: 0.97 GB H2O cluster total cores: 4 H2O cluster allowed cores: 2 H2O cluster
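The error in the title usually means an argument reached h2o.cbind without first being converted to an H2OFrame (via as.h2o() in R or h2o.H2OFrame in Python). A hypothetical helper illustrating that precondition; frame_cls stands in for the real H2OFrame class so the check runs without a cluster:

```python
def require_h2o_frames(frames, frame_cls):
    """Raise the same kind of complaint h2o.cbind makes when any
    argument is still a raw data.frame/list/array rather than an
    H2OFrame already living on the cluster."""
    bad = [type(f).__name__ for f in frames if not isinstance(f, frame_cls)]
    if bad:
        raise TypeError(
            "h2o.cbind accepts only H2OFrame objects; got " + ", ".join(bad)
        )
    return True
```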

Avoiding overfitting with H2OGradientBoostingEstimator

Submitted by 旧巷老猫 on 2019-12-11 07:52:57
Question: It appears that the difference between cross-validation and training AUC ROC with H2OGradientBoostingEstimator remains high despite my best attempts using min_split_improvement. Using the same data with GradientBoostingClassifier(min_samples_split=10) results in no overfitting, but I can find no analogue of min_samples_split. Prepare the data: from sklearn.datasets import make_classification X, y = make_classification(n_samples=10000, n_features=40, n_clusters_per_class=10, n_informative=25,
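A hedged parameter sketch (values are illustrative, not tuned for this data): H2O's closest analogue of sklearn's min_samples_split is min_rows, the minimum number of observations required in a leaf; combined with shallower trees, subsampling, and early stopping it usually narrows the train/CV AUC gap far more than min_split_improvement alone:

```python
# Candidate regularization settings for H2OGradientBoostingEstimator.
h2o_gbm_params = {
    "ntrees": 500,
    "max_depth": 4,            # shallower trees are the strongest brake
    "min_rows": 10,            # plays the role of min_samples_split/leaf
    "learn_rate": 0.05,
    "sample_rate": 0.8,        # row subsampling per tree
    "col_sample_rate": 0.8,    # column subsampling
    "stopping_rounds": 5,      # early stopping on the CV metric
    "stopping_metric": "AUC",
    "nfolds": 5,
    "seed": 1,
}

# Usage (requires a running H2O cluster):
# model = H2OGradientBoostingEstimator(**h2o_gbm_params)
# model.train(x=predictors, y="target", training_frame=train)
```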

R - H2O- How can I get my trained model predictions/probabilities?

Submitted by ぃ、小莉子 on 2019-12-11 06:36:29
Question: I am running a classification model in H2O in R. I would like to extract the fitted model's predictions for my training dataset. Code: train <- as.h2o(train) test <- as.h2o(test) y <- "class" x <- setdiff(names(train), y) family <- "multinomial" nfolds <- 5 gbm1 <- h2o.gbm(x = x, y = y, distribution = family, training_frame = train, seed = 1, nfolds = nfolds, fold_assignment = "Modulo", keep_cross_validation_predictions = TRUE) h2o.getFrame(gbm1@model$cross_validation_predictions[[gbm1@allparameters
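Two hedged routes (not verified against this exact model): plain fitted values come from h2o.predict(gbm1, train), while h2o.cross_validation_holdout_predictions(gbm1) returns the combined, less optimistic holdout predictions directly, sparing you from assembling the per-fold cross_validation_predictions frames yourself. This stdlib sketch shows what that assembly amounts to for fold_assignment = "Modulo", assuming each per-fold frame is full-length with zeros for rows that were in that fold's training split:

```python
def combine_modulo_cv_preds(fold_preds):
    """With fold_assignment="Modulo", row i is held out in fold i % nfolds,
    so its holdout prediction lives at fold_preds[i % nfolds][i]; picking
    that entry for every row reconstructs the full holdout vector."""
    nfolds = len(fold_preds)
    n_rows = len(fold_preds[0])
    return [fold_preds[i % nfolds][i] for i in range(n_rows)]
```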

Do I need to normalize (or scale) the data for Random Forest (DRF) or Gradient Boosting Machine (GBM) in H2O, or in general? [closed]

Submitted by 蓝咒 on 2019-12-11 05:39:30
Question: I am creating classification and regression models using Random Forest (DRF) and GBM in H2O.ai. I believe that I don't need to normalize (or scale) the data, as it is unnecessary and possibly even harmful: it might smooth out the nonlinear nature of the model. Could you please confirm if my
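The questioner's belief is generally sound for tree-based models: splits depend only on the ordering of feature values, so any monotone scaling leaves the tree's decisions unchanged. A tiny stdlib demonstration with a single decision stump (hypothetical example values):

```python
def stump_predict(values, threshold):
    """A single tree split: the outcome depends only on value ordering."""
    return [1 if v > threshold else 0 for v in values]

raw = [1.0, 5.0, 200.0, -3.0]
scaled = [v / 100.0 for v in raw]

# A split at t on raw data is identical to a split at t/100 on scaled
# data, which is why DRF/GBM accuracy is unaffected by feature scaling.
```

(Scaling can still matter for distance- or gradient-based learners like k-NN, SVMs, or neural networks, just not for DRF/GBM.)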

How can I tell if H2O 3.11.0.266 is running with GPUs?

Submitted by 半世苍凉 on 2019-12-11 05:37:59
Question: I've installed H2O 3.11.0.266 on an Ubuntu 16.04 machine with CUDA 8.0 and libcudnn.so.5.1.10, so I believe H2O should be able to find my GPUs. However, when I run h2o.init() in Python, I do not see evidence that it is actually using my GPUs. I see: H2O cluster total cores: 8 H2O cluster allowed cores: 8 which is the same as I had in the previous (pre-GPU) version. Also, http://127.0.0.1:54321/flow/index.html shows only 8 cores as well. I wonder if I don't have something properly installed or

Training a model with multiple features whose values are conceptually the same

Submitted by 落花浮王杯 on 2019-12-11 05:29:39
Question: For example, say I am trying to train a binary classifier that takes sample inputs of the form x = {d=(type of desk), p1=(type of pen on desk), p2=(type of *another* pen on desk)}. Say I then train a model on the samples x1 = {wood, ballpoint, gel}, y1 = {0} and x2 = {wood, ballpoint, ink-well}, y2 = {1}, and try to predict on the new sample x3 = {wood, gel, ballpoint}. The response that I am hoping for in this case is y3 = {0}, since conceptually it should not matter (i.e. I don't want it to
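The standard remedy (a hedged sketch, not from the thread) is to make the representation itself permutation-invariant: canonicalize the interchangeable features, for instance by sorting them, so both orderings of the same pens map to one training row. An alternative is to augment the training set with every ordering of each sample.

```python
def canonicalize(sample):
    """Sort the two interchangeable pen features so {gel, ballpoint}
    and {ballpoint, gel} produce the same representation; the desk
    feature keeps its position because it is not interchangeable."""
    desk, pen_a, pen_b = sample
    return (desk,) + tuple(sorted((pen_a, pen_b)))
```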

How to deploy a distributed H2O Flow cluster with Docker?

Submitted by 南楼画角 on 2019-12-11 05:08:46
Question: I'm able to deploy an H2O cluster on EC2 instances by putting the private IPs in the flatfile. Doing the same with Docker works, but I can't figure out what to enter into the flatfile so the containers can form the cluster. The private IP the container is running on is not working. Answer 1: Can the containers ping each other's IPs? When launching H2O, are you forcing the interface to use the container IP? java -jar h2o.jar -flatfile flatfile -ip -port Are these Docker containers, when run, exposing the port 54321
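Following the answer's questions, one configuration sketch that addresses all three at once (image name, subnet, and addresses are placeholders, not from the thread): give the containers fixed addresses on a user-defined Docker network so they can reach each other, list exactly those addresses in a flatfile mounted into every container, and pass the container's own address to H2O's -ip flag so it binds the interface the other nodes will dial:

```shell
# Create a network with an explicit subnet so --ip can pin addresses.
docker network create --subnet 172.20.0.0/24 h2onet

# One node per line, container address + H2O port.
cat > flatfile.txt <<'EOF'
172.20.0.2:54321
172.20.0.3:54321
EOF

# First node; repeat with --name h2o-node-2 --ip 172.20.0.3 for the second.
docker run -d --name h2o-node-1 --network h2onet --ip 172.20.0.2 \
  -v "$PWD/flatfile.txt:/flatfile.txt" my-h2o-image \
  java -jar /opt/h2o.jar -flatfile /flatfile.txt -ip 172.20.0.2 -port 54321
```

The flatfile must contain addresses the containers can reach from inside the network, not the host's private IP, which is usually why the EC2 recipe fails verbatim under Docker.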