h2o

Error with H2O in R - can't connect to local host

随声附和 · submitted on 2019-11-28 12:11:06
I can't get h2o to work in R. It shows the following error and I have no clue what it means. Previously it gave me an error because I didn't have the 64-bit version of Java; I downloaded the 64-bit version, restarted my PC, and started the process again, and now it gives me this error. Any suggestions?

library(h2o)

----------------------------------------------------------------------
Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai
------------
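One way to narrow down "can't connect to localhost" errors like this is to check whether anything is actually listening on H2O's default port (54321). A minimal sketch using only the Python standard library; a refused connection usually means the JVM never started (often a Java-version problem, as in this question):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# H2O's default REST port is 54321; False here means no server is up.
print(port_open("127.0.0.1", 54321))
```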

How to get sparse matrices into H2O?

我的未来我决定 · submitted on 2019-11-28 07:39:34
Question: I am trying to get a sparse matrix into H2O and I was wondering whether that is possible. Suppose we have the following:

test <- Matrix(c(1,0,0,1,1,1,1,0,1), nrow = 3, sparse = TRUE)

Assuming my local H2O instance is localH2O, I can't seem to do the following: as.h2o(test). It gives the error: cannot coerce class "structure("dgCMatrix", package = "Matrix")" to a data.frame. That seems logical enough; however, assuming that test is so big that I can't transform it into a data frame, how am I
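One common workaround is to write the sparse matrix to disk in SVMLight format, which H2O can import directly. A minimal sketch in Python (the R question's matrix, rebuilt here dense for brevity; a real sparse workload would iterate the sparse structure instead of materializing it). The h2o import call at the end is shown commented out, since it needs a running cluster:

```python
import numpy as np

# Same 3x3 matrix as the R example (R's Matrix() fills column-major).
dense = np.array([1, 0, 0, 1, 1, 1, 1, 0, 1]).reshape(3, 3, order="F")

# Write SVMLight lines by hand: "<label> <col>:<value> ...", with 1-based
# column indices and zeros omitted. The leading 0 is a dummy label column.
lines = []
for row in dense:
    nz = np.flatnonzero(row)
    feats = " ".join(f"{j + 1}:{row[j]:g}" for j in nz)
    lines.append(f"0 {feats}")

with open("test.svmlight", "w") as f:
    f.write("\n".join(lines) + "\n")

# In an H2O session the file could then be imported, e.g.:
# import h2o; h2o.init()
# frame = h2o.import_file("test.svmlight")
print(lines)
```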

H2o model performance metric and gains chart customization

吃可爱长大的小学妹 · submitted on 2019-11-28 05:41:01
Question: I see that h2o model performance metrics include AUC, logloss, etc. There is one metric called lift_top_group; is it the lift on the top decile? Also, can the user specify the bands for h2o to output in the gains chart, such as top 5%, 5%-10%, 10%-15%, ...? The function I can find is h2o.gainsLift.

Answer 1: You can see all quantile groups in the output (Flow has a nice display). The top group is the top 1%, and lift_top_group refers to that. It can be used for early stopping. All other information from
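To make the metric concrete: lift for a group is the group's response rate divided by the overall response rate. A small sketch computing the top-1% group's lift by hand on synthetic data (not h2o's implementation; the scores and labels here are made up to illustrate the formula):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)               # true 0/1 labels
p = y * 0.3 + rng.random(10_000) * 0.7            # scores correlated with y

order = np.argsort(-p)                            # rows by descending score
top = order[: len(p) // 100]                      # top 1% group
lift_top_group = y[top].mean() / y.mean()         # group rate / base rate
print(lift_top_group)
```

A lift above 1 means the top-scored group contains positives at a higher rate than the population; a useless model gives lift ≈ 1 in every group.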

Loading data bigger than the memory size in h2o

本小妞迷上赌 · submitted on 2019-11-28 00:39:45
Question: I am experimenting with loading data bigger than the memory size in h2o. The H2O blog mentions:

A note on Bigger Data and GC: We do a user-mode swap-to-disk when the Java heap gets too full, i.e., you're using more Big Data than physical DRAM. We won't die with a GC death-spiral, but we will degrade to out-of-core speeds. We'll go as fast as the disk will allow. I've personally tested loading a 12Gb dataset into a 2Gb (32bit) JVM; it took about 5 minutes to load the data, and another 5 minutes to
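The swap-to-disk behavior described above kicks in relative to whatever Java heap the cluster was started with, so the heap cap is the knob to set. A minimal sketch of launching a local H2O node from the command line with an explicit 2 GB heap (matching the blog's test), assuming h2o.jar is in the current directory:

```shell
# Cap the JVM heap at 2 GB; H2O degrades to out-of-core speeds rather
# than dying when the heap fills.
java -Xmx2g -jar h2o.jar
```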

Fastest way to read in 100,000 .dat.gz files

≡放荡痞女 · submitted on 2019-11-27 20:45:38
I have a few hundred thousand very small .dat.gz files that I want to read into R as efficiently as possible. I read in each file and then immediately aggregate and discard the data, so I am not worried about managing memory as I get near the end of the process. I just want to speed up the bottleneck, which happens to be unzipping and reading in the data. Each dataset consists of 366 rows and 17 columns. Here is a reproducible example of what I am doing so far. Building reproducible data:

require(data.table)

# Make dir
system("mkdir practice")

# Function to create data
create
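A Python analogue of the read-then-aggregate loop, for comparison: build a few small gzipped files with the question's 366x17 shape, then stream each through gzip, aggregate immediately (column means here, as an illustrative stand-in for the question's aggregation), and discard the rows. Standard library only:

```python
import csv
import gzip
import os
import statistics
import tempfile

tmp = tempfile.mkdtemp()

# Build three small .dat.gz files of 366 rows x 17 columns.
for i in range(3):
    path = os.path.join(tmp, f"file{i}.dat.gz")
    with gzip.open(path, "wt", newline="") as f:
        w = csv.writer(f)
        for r in range(366):
            w.writerow([r + c for c in range(17)])

# Read each file, aggregate immediately, and let the rows be freed.
means = []
for name in sorted(os.listdir(tmp)):
    with gzip.open(os.path.join(tmp, name), "rt") as f:
        rows = [[float(x) for x in row] for row in csv.reader(f)]
    means.append([statistics.fmean(col) for col in zip(*rows)])

print(len(means), len(means[0]))  # 3 files, 17 column means each
```

The per-file work is independent, so with this many files the loop body is a natural candidate for a process pool.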

Predict classes or class probabilities?

天大地大妈咪最大 · submitted on 2019-11-27 16:04:09
I am currently using H2O for a classification dataset. I am testing it out with H2ORandomForestEstimator in a Python 3.6 environment. I noticed that the results of the predict method were values between 0 and 1 (I am assuming this is the probability). In my dataset, the target attribute is numeric, i.e. True values are 1 and False values are 0. I made sure I converted the type to category for the target attribute, but I was still getting the same result. Then I modified the code to convert the target column to a factor using the asfactor() method on the H2OFrame; still, there wasn't any change
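The underlying relationship between the two outputs is simple: class labels are obtained from class probabilities by thresholding. A minimal numpy sketch (the 0.5 cutoff is the usual binary default, not anything h2o-specific; the probabilities are made up):

```python
import numpy as np

proba = np.array([0.12, 0.48, 0.51, 0.97])  # P(class == 1) for each row
classes = (proba >= 0.5).astype(int)        # threshold into hard 0/1 labels
print(classes)  # [0 0 1 1]
```

Once the target really is categorical, a classifier's predict output typically carries both: a predicted label column plus one probability column per class.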

Print “pretty” tables for h2o models in R

五迷三道 · submitted on 2019-11-27 04:34:25
There are multiple packages for R that help print "pretty" tables (LaTeX/HTML/TEXT) from statistical model output AND easily compare the results of alternative model specifications. Some of these packages are apsrtable, xtable, memisc, texreg, outreg, and stargazer (for examples see here: https://www.r-statistics.com/2013/01/stargazer-package-for-beautiful-latex-tables-from-r-statistical-models-output/ ). Is there any comparable R package that supports the models of the h2o package? Here is an example of two simple GLM models with h2o which I would like to print beside each other as
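Even without package support, once the coefficients are pulled out of each model (as name-to-value mappings), a side-by-side text table can be assembled by hand. A sketch with hypothetical coefficient values; terms missing from a model get a blank cell, as the stargazer-style tables do:

```python
# Hypothetical coefficients extracted from two fitted models.
coefs_m1 = {"Intercept": 0.42, "x1": 1.10, "x2": -0.35}
coefs_m2 = {"Intercept": 0.40, "x1": 1.05}

# Union of term names, one row per term, one column per model.
names = sorted(set(coefs_m1) | set(coefs_m2))
rows = [f"{'term':<10}{'model1':>10}{'model2':>10}"]
for n in names:
    c1 = f"{coefs_m1[n]:.2f}" if n in coefs_m1 else ""
    c2 = f"{coefs_m2[n]:.2f}" if n in coefs_m2 else ""
    rows.append(f"{n:<10}{c1:>10}{c2:>10}")

table = "\n".join(rows)
print(table)
```

The same rows could just as easily be emitted as LaTeX tabular lines or an HTML table body.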
