h2o

alternative to `as.h2o()` for small data?

Submitted by 这一生的挚爱 on 2019-12-11 14:53:31
Question: I have the opposite issue to most people with as.h2o(), though the resulting problem is the same. I have to convert and feed a series of single-row vectors, just 19 columns wide, to an H2O autoencoder. Each vector takes approximately 0.29 seconds to convert using as.h2o(), which is causing a major bottleneck. Can anyone suggest an alternative approach that might be faster? (For various reasons I have no alternative to sending single row vectors one by one, so aggregating the data in matrices before
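One hedged workaround (not from the original thread): the cost of as.h2o() is mostly per-call overhead, not data volume, so batching rows into a single CSV and ingesting the whole batch once (write it to a temp file and call h2o.importFile()/h2o.upload_file()) usually amortizes it. A minimal Python sketch of the batching step; rows_to_csv is a hypothetical helper:

```python
import csv
import io

def rows_to_csv(rows, header):
    """Serialize a batch of small row vectors into one CSV string so the
    whole batch can reach H2O in a single upload, instead of paying the
    per-row as.h2o()/H2OFrame round-trip for each 19-column vector."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()
```

If the rows truly must be scored one at a time, another option worth testing is exporting the trained autoencoder as a POJO/MOJO and scoring in-process, which avoids the conversion entirely.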

Error with H2OContext running PySparkling with Spark 2.1

Submitted by 岁酱吖の on 2019-12-11 14:51:50
Question: I'm getting this error when trying to run a PySparkling script on an AWS EMR cluster. I can get everything to work when downloading Sparkling Water 2.1.8 and running it from a pysparkling shell; however, spark-submit does not seem to work. Error: NameError: name 'H2OContext' is not defined My spark-submit: spark-submit --packages ai.h2o:sparkling-water-core_2.11:2.1.7,ai.h2o:sparkling-water-examples_2.11:2.1.7 --conf spark.dynamicAllocation.enabled=false spark.py Python file: from pysparkling
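A likely cause (hedged, based on how the pysparkling shell differs from spark-submit): the shell pre-imports H2OContext into the session, while a script run via spark-submit must do `from pysparkling import H2OContext` itself and then call H2OContext.getOrCreate(spark). A small diagnostic sketch; h2o_context_available is a hypothetical helper:

```python
def h2o_context_available(globals_dict):
    """Return True if H2OContext is already defined (as the pysparkling
    shell arranges) or can be imported (what a spark-submit script must
    do explicitly with `from pysparkling import H2OContext`)."""
    if "H2OContext" in globals_dict:
        return True
    try:
        from pysparkling import H2OContext  # noqa: F401
        return True
    except ImportError:
        return False
```

If the import itself fails under spark-submit, the pysparkling package is probably not on the executors' Python path; shipping the Sparkling Water Python zip via --py-files is the usual remedy.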

h2o deep learning weights and normalization

Submitted by 蓝咒 on 2019-12-11 14:28:12
Question: I'm exploring h2o via the R interface and I'm getting a weird weight matrix. My task is as simple as they get: given x and y, compute x+y. I have 214 rows with 3 columns. The first column (x) was drawn uniformly from (-1000, 1000) and the second one (y) from (-100, 100). I just want to combine them, so I have a single hidden layer with a single neuron. This is my code: library(h2o) localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE) train <- h2o.importFile(path = "/home/martin
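A likely explanation for the "weird" weights (hedged sketch, not from the thread): H2O's deep learning standardizes numeric inputs and the target before training, so the neuron's weights live in standardized space. This NumPy demo shows that a linear neuron fit to x+y on standardized data learns weights proportional to each input's standard deviation, roughly a 10:1 ratio here rather than the 1:1 one might expect:

```python
import numpy as np

# Mimic the question's data: x ~ U(-1000, 1000), y ~ U(-100, 100), target = x + y.
rng = np.random.default_rng(0)
x = rng.uniform(-1000, 1000, size=500)
y = rng.uniform(-100, 100, size=500)
target = x + y

# Standardize columns and target, as H2O deep learning does internally.
X = np.column_stack([(x - x.mean()) / x.std(), (y - y.mean()) / y.std()])
t = (target - target.mean()) / target.std()

# Least squares = the optimum a single linear neuron would converge to.
w, *_ = np.linalg.lstsq(X, t, rcond=None)

# In original units both coefficients are 1, so in standardized space the
# weights end up proportional to each input's standard deviation:
# w[0] / w[1] ≈ sd(x) / sd(y) ≈ 10.
```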

`h2o.cbind` accepts only of H2OFrame objects - R

Submitted by 左心房为你撑大大i on 2019-12-11 13:15:52
Question: I'm trying to ensemble a random forest with a logistic regression using H2O in R. However, an error message appears in the following code: > localH2O = h2o.init() Successfully connected to http://137.0.0.1:43329/ R is connected to the H2O cluster: H2O cluster uptime: 3 hours 11 minutes H2O cluster version: 3.2.0.3 H2O cluster name: H2O_started_from_R_toshiba_jvd559 H2O cluster total nodes: 1 H2O cluster total memory: 0.97 GB H2O cluster total cores: 4 H2O cluster allowed cores: 2 H2O cluster
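The error in the title usually means an argument reached h2o.cbind without first being converted to an H2OFrame (via as.h2o() in R or h2o.H2OFrame in Python). A hypothetical helper illustrating that precondition; frame_cls stands in for the real H2OFrame class so the check runs without a cluster:

```python
def require_h2o_frames(frames, frame_cls):
    """Raise the same kind of complaint h2o.cbind makes when any
    argument is still a raw data.frame/list/array rather than an
    H2OFrame already living on the cluster."""
    bad = [type(f).__name__ for f in frames if not isinstance(f, frame_cls)]
    if bad:
        raise TypeError(
            "h2o.cbind accepts only H2OFrame objects; got " + ", ".join(bad)
        )
    return True
```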

Avoiding overfitting with H2OGradientBoostingEstimator

Submitted by 旧巷老猫 on 2019-12-11 07:52:57
Question: It appears that the difference between cross-validation and training AUC ROC with H2OGradientBoostingEstimator remains high despite my best attempts using min_split_improvement. Using the same data with GradientBoostingClassifier(min_samples_split=10) results in no overfitting, but I can find no analogue of min_samples_split. Prepare the data: from sklearn.datasets import make_classification X, y = make_classification(n_samples=10000, n_features=40, n_clusters_per_class=10, n_informative=25,
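A hedged parameter sketch (values are illustrative, not tuned for this data): H2O's closest analogue of sklearn's min_samples_split is min_rows, the minimum number of observations required in a leaf; combined with shallower trees, subsampling, and early stopping it usually narrows the train/CV AUC gap far more than min_split_improvement alone:

```python
# Candidate regularization settings for H2OGradientBoostingEstimator.
h2o_gbm_params = {
    "ntrees": 500,
    "max_depth": 4,            # shallower trees are the strongest brake
    "min_rows": 10,            # plays the role of min_samples_split/leaf
    "learn_rate": 0.05,
    "sample_rate": 0.8,        # row subsampling per tree
    "col_sample_rate": 0.8,    # column subsampling
    "stopping_rounds": 5,      # early stopping on the CV metric
    "stopping_metric": "AUC",
    "nfolds": 5,
    "seed": 1,
}

# Usage (requires a running H2O cluster):
# model = H2OGradientBoostingEstimator(**h2o_gbm_params)
# model.train(x=predictors, y="target", training_frame=train)
```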

R - H2O- How can I get my trained model predictions/probabilities?

Submitted by ぃ、小莉子 on 2019-12-11 06:36:29
Question: I am running a classification model in H2O in R. I would like to extract the fitted model's predictions for my training dataset. Code: train <- as.h2o(train) test <- as.h2o(test) y <- "class" x <- setdiff(names(train), y) family <- "multinomial" nfolds <- 5 gbm1 <- h2o.gbm(x = x, y = y, distribution = family, training_frame = train, seed = 1, nfolds = nfolds, fold_assignment = "Modulo", keep_cross_validation_predictions = TRUE) h2o.getFrame(gbm1@model$cross_validation_predictions[[gbm1@allparameters
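Two hedged routes (not verified against this exact model): plain fitted values come from h2o.predict(gbm1, train), while h2o.cross_validation_holdout_predictions(gbm1) returns the combined, less optimistic holdout predictions directly, sparing you from assembling the per-fold cross_validation_predictions frames yourself. This stdlib sketch shows what that assembly amounts to for fold_assignment = "Modulo", assuming each per-fold frame is full-length with zeros for rows that were in that fold's training split:

```python
def combine_modulo_cv_preds(fold_preds):
    """With fold_assignment="Modulo", row i is held out in fold i % nfolds,
    so its holdout prediction lives at fold_preds[i % nfolds][i]; picking
    that entry for every row reconstructs the full holdout vector."""
    nfolds = len(fold_preds)
    n_rows = len(fold_preds[0])
    return [fold_preds[i % nfolds][i] for i in range(n_rows)]
```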

Do I need to normalize (or scale) the data for Random Forest (DRF) or Gradient Boosting Machine (GBM) in H2O, or in general? [closed]

Submitted by 蓝咒 on 2019-12-11 05:39:30
Question: I am creating classification and regression models using Random Forest (DRF) and GBM in H2O.ai. I believe that I don't need to normalize (or scale) the data, as it is unnecessary and possibly even harmful: it might smooth out the nonlinear nature of the model. Could you please confirm if my
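The questioner's belief is generally sound for tree-based models: splits depend only on the ordering of feature values, so any monotone scaling leaves the tree's decisions unchanged. A tiny stdlib demonstration with a single decision stump (hypothetical example values):

```python
def stump_predict(values, threshold):
    """A single tree split: the outcome depends only on value ordering."""
    return [1 if v > threshold else 0 for v in values]

raw = [1.0, 5.0, 200.0, -3.0]
scaled = [v / 100.0 for v in raw]

# A split at t on raw data is identical to a split at t/100 on scaled
# data, which is why DRF/GBM accuracy is unaffected by feature scaling.
```

(Scaling can still matter for distance- or gradient-based learners like k-NN, SVMs, or neural networks, just not for DRF/GBM.)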

How can I tell if H2O 3.11.0.266 is running with GPUs?

Submitted by 半世苍凉 on 2019-12-11 05:37:59
Question: I've installed H2O 3.11.0.266 on an Ubuntu 16.04 machine with CUDA 8.0 and libcudnn.so.5.1.10, so I believe H2O should be able to find my GPUs. However, when I run h2o.init() in Python, I do not see evidence that it is actually using my GPUs. I see: H2O cluster total cores: 8 H2O cluster allowed cores: 8 which is the same as I had in the previous (pre-GPU) version. Also, http://127.0.0.1:54321/flow/index.html shows only 8 cores as well. I wonder if I don't have something properly installed or

Training a model with multiple features whose values are conceptually the same

Submitted by 落花浮王杯 on 2019-12-11 05:29:39
Question: For example, say I am trying to train a binary classifier that takes sample inputs of the form x = {d=(type of desk), p1=(type of pen on desk), p2=(type of *another* pen on desk)}. Say I then train a model on the samples x1 = {wood, ballpoint, gel}, y1 = {0} and x2 = {wood, ballpoint, ink-well}, y2 = {1}, and try to predict on the new sample x3 = {wood, gel, ballpoint}. The response that I am hoping for in this case is y3 = {0}, since conceptually it should not matter (i.e. I don't want it to
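The standard remedy (a hedged sketch, not from the thread) is to make the representation itself permutation-invariant: canonicalize the interchangeable features, for instance by sorting them, so both orderings of the same pens map to one training row. An alternative is to augment the training set with every ordering of each sample.

```python
def canonicalize(sample):
    """Sort the two interchangeable pen features so {gel, ballpoint}
    and {ballpoint, gel} produce the same representation; the desk
    feature keeps its position because it is not interchangeable."""
    desk, pen_a, pen_b = sample
    return (desk,) + tuple(sorted((pen_a, pen_b)))
```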

How to deploy a distributed H2O Flow cluster with Docker?

Submitted by 南楼画角 on 2019-12-11 05:08:46
Question: I'm able to deploy an H2O cluster on EC2 instances by putting the private IPs in the flatfile. Doing the same with Docker works, but I can't figure out what to enter into the flatfile so the containers can form the cluster. The private IP the container is running on is not working. Answer 1: Can the containers ping each other's IPs? When launching H2O, are you forcing the interface to use the container IP? java -jar h2o.jar -flatfile flatfile -ip -port Are these Docker containers, when run, exposing the port 54321
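Following the answer's questions, one configuration sketch that addresses all three at once (image name, subnet, and addresses are placeholders, not from the thread): give the containers fixed addresses on a user-defined Docker network so they can reach each other, list exactly those addresses in a flatfile mounted into every container, and pass the container's own address to H2O's -ip flag so it binds the interface the other nodes will dial:

```shell
# Create a network with an explicit subnet so --ip can pin addresses.
docker network create --subnet 172.20.0.0/24 h2onet

# One node per line, container address + H2O port.
cat > flatfile.txt <<'EOF'
172.20.0.2:54321
172.20.0.3:54321
EOF

# First node; repeat with --name h2o-node-2 --ip 172.20.0.3 for the second.
docker run -d --name h2o-node-1 --network h2onet --ip 172.20.0.2 \
  -v "$PWD/flatfile.txt:/flatfile.txt" my-h2o-image \
  java -jar /opt/h2o.jar -flatfile /flatfile.txt -ip 172.20.0.2 -port 54321
```

The flatfile must contain addresses the containers can reach from inside the network, not the host's private IP, which is usually why the EC2 recipe fails verbatim under Docker.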