h2o

How to get data into h2o fast

自闭症网瘾萝莉.ら Submitted on 2019-12-04 05:11:58
What my question isn't:
- Efficient way to maintain a h2o data frame
- H2O running slower than data.table R
- Loading data bigger than the memory size in h2o

Hardware/space: 32 Xeon threads with ~256 GB RAM; ~65 GB of data to upload (about 5.6 billion cells).

Problem: It is taking hours to upload my data into h2o. This isn't any special processing, only "as.h2o(...)". It takes less than a minute using "fread" to get the text into the workspace, and then I make a few row/column transformations (diffs, lags) and try to import. The total R memory is ~56 GB before trying any sort of "as.h2o", so the 128 allocated
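A common workaround is to avoid as.h2o() entirely: write the prepared data.table to disk and let H2O parse the file itself, which it does in parallel. The sketch below is not the asker's code; the file path and the 32-thread/128 GB settings are taken from the hardware described above and should be adjusted.

```r
library(data.table)
library(h2o)

# Start H2O with the memory/threads the question describes (adjust as needed).
h2o.init(nthreads = 32, max_mem_size = "128g")

dt <- fread("big_input.csv")              # fast multi-threaded read into R
# ... row/column transformations (diffs, lags) on dt ...

# Round-trip through disk instead of serializing through as.h2o():
tmp <- file.path(tempdir(), "dt_for_h2o.csv")
fwrite(dt, tmp)                           # multi-threaded CSV writer
hf <- h2o.importFile(tmp)                 # H2O parses the file in parallel
```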

Error with h2o.predict in R

会有一股神秘感。 Submitted on 2019-12-04 04:03:45
Question: I am getting an error when trying to create deep learning predictions with h2o in R. The error occurs for about one third of the predictions made with h2o.predict. Here is the model setup:

    localH2O = h2o.init(ip = "localhost", port = 54321, startH2O = TRUE,
                        max_mem_size = '20g', nthreads = 6)
    model <- h2o.deeplearning(x = 2:100, y = 1, training_frame = x,
                              l1 = 1e-5, l2 = 1e-5, epochs = 500,
                              hidden = c(800, 800, 100))
    prediction <- h2o.predict(model, x[, 2:100])

Here is the error that occurs on and
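Not the original poster's fix, only a hedged diagnostic sketch: h2o.predict() matches predictor columns by name and ignores extra columns such as the response, so the full frame can be scored directly, and h2o.nacnt() shows how many predictions came back missing.

```r
# Score the full frame: unused columns are ignored by name-matching,
# so subsetting to x[, 2:100] isn't required.
prediction <- h2o.predict(model, x)

# Count missing values per column of the prediction frame; a non-zero
# count narrows the problem down to specific rows rather than the whole call.
h2o.nacnt(prediction)
```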

Wrong Euclidean distance H2O calculations R

老子叫甜甜 Submitted on 2019-12-04 03:37:11
Question: I am using H2O with R to calculate the Euclidean distance between 2 data.frames:

    set.seed(121)
    # create the data
    df1 <- data.frame(matrix(rnorm(1000), ncol = 10))
    df2 <- data.frame(matrix(rnorm(300), ncol = 10))
    # init h2o
    h2o.init()
    # transform to h2o
    df1.h <- as.h2o(df1)
    df2.h <- as.h2o(df2)

If I use normal calculations, i.e. for the first row:

    distance1 <- sqrt(sum((df1[1, ] - df2[1, ])^2))

And if I use the H2O library:

    distance.h2o <- h2o.distance(df1.h[1, ], df2.h[1, ], "l2")
    print(distance1)
    print(distance.h2o)

The
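As a cross-check (not from the original thread), the same formula can be evaluated directly on the H2OFrames, since arithmetic between frames is element-wise; comparing this value with the two results above shows which one matches the base-R definition of Euclidean distance.

```r
# Element-wise difference of the two single-row frames, then sum of squares,
# evaluated entirely inside H2O.
diff.h     <- df1.h[1, ] - df2.h[1, ]
manual.h2o <- sqrt(h2o.sum(diff.h * diff.h))
print(manual.h2o)
```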

How to parametrize a class and implement a method depending on type in Scala

一世执手 Submitted on 2019-12-04 02:29:52
Question: This is what I tried. Depending on what the user passes into the function, I want to add a String or a Double to the new Chunk.

    package org.apache.spark.h2o.utils

    import water.fvec.{NewChunk, Frame, Chunk}
    import water._
    import water.parser.ValueString

    class ReplaceNa[T >: Any](a: T) extends MRTask {
      override def map(c: Chunk, nc: NewChunk): Unit = {
        for (row <- 0 until c.len()) {
          a match {
            case s: ValueString if (c.isNA(row)) => nc.addStr(s)
            case d: Double if (c.isNA(row)) => nc.addNum(d)
          }
        }
      }
    }

But I got

Parallel processing in R with H2O

泄露秘密 Submitted on 2019-12-03 16:16:42
I am setting up a piece of code to process some computations in parallel for N groups in my data using foreach. I have a computation that involves a call to h2o.gbm. In my current, sequential set-up, I use up to about 70% of my RAM. How do I correctly set up my h2o.init() within the parallel piece of code? I am afraid that I might run out of RAM when I use multiple cores. My Windows 10 machine has 12 cores and 128 GB of RAM. Would something like this pseudo-code work?

    library(foreach)
    library(doParallel)
    # set up parallel backend to use 12 processors
    cl <- makeCluster(12)
    registerDoParallel(cl)
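One hedged way to arrange this (a sketch under assumptions, not a tested answer: group_ids, the per-group CSV files, and the "target" column are placeholders) is to start a single H2O cluster up front with a hard memory cap and have each foreach worker attach to that same cluster instead of launching its own JVM.

```r
library(h2o)
library(foreach)
library(doParallel)

# One H2O cluster for the whole job: cap its memory so the R workers keep
# headroom, and let H2O use the cores internally.
h2o.init(nthreads = 12, max_mem_size = "64g", port = 54321)

cl <- makeCluster(4)   # fewer R workers than cores, since H2O is already multi-threaded
registerDoParallel(cl)

results <- foreach(g = group_ids, .packages = "h2o") %dopar% {
  h2o.connect(ip = "localhost", port = 54321)             # attach, don't start a new JVM
  train_g <- h2o.importFile(paste0("group_", g, ".csv"))  # hypothetical per-group files
  h2o.gbm(y = "target", training_frame = train_g)         # "target" is a placeholder
}

stopCluster(cl)
```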

How to understand the metrics of H2OModelMetrics Object through h2o.performance

杀马特。学长 韩版系。学妹 Submitted on 2019-12-03 14:31:41
Question: After creating a model with h2o.randomForest, I use:

    perf <- h2o.performance(model, test)
    print(perf)

and I get the following information (an H2OModelMetrics object):

    H2OBinomialMetrics: drf
    MSE:  0.1353948
    RMSE: 0.3679604
    LogLoss: 0.4639761
    Mean Per-Class Error: 0.3733908
    AUC:  0.6681437
    Gini: 0.3362873

    Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
               0     1    Error        Rate
    0       2109  1008  0.323388  =1008/3117
    1        257   350  0.423394   =257/607
    Totals  2366  1358  0.339689
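Each headline number in that printout also has an accessor function that works on the H2OModelMetrics object, which is convenient for logging or comparing models without parsing the printed output.

```r
h2o.mse(perf)               # 0.1353948 above
h2o.logloss(perf)           # 0.4639761
h2o.auc(perf)               # 0.6681437
h2o.giniCoef(perf)          # 0.3362873
h2o.confusionMatrix(perf)   # confusion matrix at the F1-optimal threshold
```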

Can I use autoencoder for clustering?

谁都会走 Submitted on 2019-12-03 08:22:37
In the code below, they use an autoencoder for supervised clustering or classification because they have data labels: http://amunategui.github.io/anomaly-detection-h2o/ But can I use an autoencoder to cluster data if I do not have labels? Regards

The deep-learning autoencoder is always unsupervised learning. The "supervised" part of the article you link to is only there to evaluate how well it did. The following example (taken from ch. 7 of my book, Practical Machine Learning with H2O, where I try all the H2O unsupervised algorithms on the same data set - please excuse the plug) takes 563 features, and
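For reference, a minimal label-free sketch (this is not the book's example; `data_h` is a placeholder for any numeric H2OFrame with no response column): train the autoencoder unsupervised, pull out the bottleneck layer with h2o.deepfeatures, and cluster that compressed representation.

```r
# Unsupervised autoencoder: no y argument, autoencoder = TRUE.
ae <- h2o.deeplearning(x = names(data_h), training_frame = data_h,
                       autoencoder = TRUE, hidden = c(50, 2, 50), epochs = 20)

# Extract the 2-unit bottleneck (hidden layer 2) as a new frame of features...
codes <- h2o.deepfeatures(ae, data_h, layer = 2)

# ...and cluster the compressed representation with an ordinary algorithm.
km <- h2o.kmeans(training_frame = codes, k = 5)
```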

ROC on multiple test sets in h2o (python)

拈花ヽ惹草 Submitted on 2019-12-02 13:21:49
Question: I had a use-case that I thought was really simple but couldn't find a way to do it with h2o. I thought you might know. I want to train my model once, and then evaluate its ROC on a few different test sets (e.g. a validation set and a test set, though in reality I have more than 2) without having to retrain the model. The way I know to do it now requires retraining the model each time:

    train, valid, test = fr.split_frame([0.2, 0.25], seed=1234)
    rf_v1 = H2ORandomForestEstimator( ... )
    rf_v1
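The general H2O pattern is to train once and then compute performance against each held-out frame separately; a sketch in R is shown below (most of this page uses R), and the Python client exposes the analogous model_performance(test_data=...) method on a trained estimator.

```r
# Train once, then score the same fitted model against any number of frames.
perf_valid <- h2o.performance(model, newdata = valid)
perf_test  <- h2o.performance(model, newdata = test)

h2o.auc(perf_valid)
h2o.auc(perf_test)
```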

OSError: Version mismatch while installing h2o?

偶尔善良 Submitted on 2019-12-02 08:59:54
I am new to H2O. Based on the documentation I installed H2O for Python:

    $ pip install h2o

Then:

    In:  import h2o
         h2o.init()

    Out: OSError                    Traceback (most recent call last)
         <ipython-input-1-07f8bb8f27db> in <module>()
               1 import h2o
         ----> 2 h2o.init()
         /usr/local/lib/python3.5/site-packages/h2o/h2o.py in init(ip, port, start_h2o, enable_assertions, license, nthreads, max_mem_size, min_mem_size, ice_root, strict_version_check, proxy, https, insecure, username, password, max_mem_size_GB, min_mem_size_GB, proxies, size)
             849 nthreads=nthreads,max_mem_size=max_mem_size,min_mem_size=min_mem_size,ice_root