h2o

Error with H2O in R - can't connect to local host

随声附和 · submitted on 2019-11-28 12:11:06
I can't get h2o to work in R. It shows the following error and I have no clue what it means. Previously it gave me an error because I didn't have the 64-bit version of Java; I downloaded the 64-bit version, restarted my PC, and started the process again, and now it gives me this error. Any suggestions?

library(h2o)

----------------------------------------------------------------------
Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai
------------
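One way to narrow down "can't connect to localhost" errors like this is to check whether anything is actually listening on H2O's default port (54321). A minimal sketch using only the Python standard library; a refused connection usually means the JVM never started (often a Java-version problem, as in this question):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# H2O's default REST port is 54321; False here means no server is up.
print(port_open("127.0.0.1", 54321))
```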

How to get sparse matrices into H2O?

我的未来我决定 · submitted on 2019-11-28 07:39:34
Question: I am trying to get a sparse matrix into H2O and I was wondering whether that is possible. Suppose we have the following:

test <- Matrix(c(1,0,0,1,1,1,1,0,1), nrow = 3, sparse = TRUE)

Assuming my local H2O instance is localH2O, I can't seem to do the following: as.h2o(test). It gives the error: cannot coerce class "structure("dgCMatrix", package = "Matrix")" to a data.frame. That seems logical enough; however, assuming that test is so big that I can't transform it into a data frame, how am I
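One common workaround is to write the sparse matrix to disk in SVMLight format, which H2O can import directly. A minimal sketch in Python (the R question's matrix, rebuilt here dense for brevity; a real sparse workload would iterate the sparse structure instead of materializing it). The h2o import call at the end is shown commented out, since it needs a running cluster:

```python
import numpy as np

# Same 3x3 matrix as the R example (R's Matrix() fills column-major).
dense = np.array([1, 0, 0, 1, 1, 1, 1, 0, 1]).reshape(3, 3, order="F")

# Write SVMLight lines by hand: "<label> <col>:<value> ...", with 1-based
# column indices and zeros omitted. The leading 0 is a dummy label column.
lines = []
for row in dense:
    nz = np.flatnonzero(row)
    feats = " ".join(f"{j + 1}:{row[j]:g}" for j in nz)
    lines.append(f"0 {feats}")

with open("test.svmlight", "w") as f:
    f.write("\n".join(lines) + "\n")

# In an H2O session the file could then be imported, e.g.:
# import h2o; h2o.init()
# frame = h2o.import_file("test.svmlight")
print(lines)
```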

H2o model performance metric and gains chart customization

吃可爱长大的小学妹 · submitted on 2019-11-28 05:41:01
Question: I see that h2o model performance metrics include AUC, logloss, etc. There is one metric called lift_top_group; is it the lift on the top decile? Also, can the user specify the bands for h2o to output in the gains chart, such as top 5%, 5%-10%, 10%-15%, ...? The function I can find is h2o.gainsLift.

Answer 1: You can see all quantile groups in the output (Flow has a nice display). The top group is the top 1%, and lift_top_group refers to that. It can be used for early stopping. All other information from
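To make the metric concrete: lift for a group is the group's response rate divided by the overall response rate. A small sketch computing the top-1% group's lift by hand on synthetic data (not h2o's implementation; the scores and labels here are made up to illustrate the formula):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)               # true 0/1 labels
p = y * 0.3 + rng.random(10_000) * 0.7            # scores correlated with y

order = np.argsort(-p)                            # rows by descending score
top = order[: len(p) // 100]                      # top 1% group
lift_top_group = y[top].mean() / y.mean()         # group rate / base rate
print(lift_top_group)
```

A lift above 1 means the top-scored group contains positives at a higher rate than the population; a useless model gives lift ≈ 1 in every group.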

Loading data bigger than the memory size in h2o

本小妞迷上赌 · submitted on 2019-11-28 00:39:45
Question: I am experimenting with loading data bigger than the memory size in h2o. The H2O blog mentions:

A note on Bigger Data and GC: We do a user-mode swap-to-disk when the Java heap gets too full, i.e., you're using more Big Data than physical DRAM. We won't die with a GC death-spiral, but we will degrade to out-of-core speeds. We'll go as fast as the disk will allow. I've personally tested loading a 12Gb dataset into a 2Gb (32bit) JVM; it took about 5 minutes to load the data, and another 5 minutes to
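The swap-to-disk behavior described above kicks in relative to whatever Java heap the cluster was started with, so the heap cap is the knob to set. A minimal sketch of launching a local H2O node from the command line with an explicit 2 GB heap (matching the blog's test), assuming h2o.jar is in the current directory:

```shell
# Cap the JVM heap at 2 GB; H2O degrades to out-of-core speeds rather
# than dying when the heap fills.
java -Xmx2g -jar h2o.jar
```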

Fastest way to read in 100,000 .dat.gz files

≡放荡痞女 · submitted on 2019-11-27 20:45:38
I have a few hundred thousand very small .dat.gz files that I want to read into R as efficiently as possible. I read in each file and then immediately aggregate and discard the data, so I am not worried about managing memory as I get near the end of the process. I just want to speed up the bottleneck, which happens to be unzipping and reading in the data. Each dataset consists of 366 rows and 17 columns. Here is a reproducible example of what I am doing so far. Building reproducible data:

require(data.table)

# Make dir
system("mkdir practice")

# Function to create data
create
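A Python analogue of the read-then-aggregate loop, for comparison: build a few small gzipped files with the question's 366x17 shape, then stream each through gzip, aggregate immediately (column means here, as an illustrative stand-in for the question's aggregation), and discard the rows. Standard library only:

```python
import csv
import gzip
import os
import statistics
import tempfile

tmp = tempfile.mkdtemp()

# Build three small .dat.gz files of 366 rows x 17 columns.
for i in range(3):
    path = os.path.join(tmp, f"file{i}.dat.gz")
    with gzip.open(path, "wt", newline="") as f:
        w = csv.writer(f)
        for r in range(366):
            w.writerow([r + c for c in range(17)])

# Read each file, aggregate immediately, and let the rows be freed.
means = []
for name in sorted(os.listdir(tmp)):
    with gzip.open(os.path.join(tmp, name), "rt") as f:
        rows = [[float(x) for x in row] for row in csv.reader(f)]
    means.append([statistics.fmean(col) for col in zip(*rows)])

print(len(means), len(means[0]))  # 3 files, 17 column means each
```

The per-file work is independent, so with this many files the loop body is a natural candidate for a process pool.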

Predict classes or class probabilities?

天大地大妈咪最大 · submitted on 2019-11-27 16:04:09
I am currently using H2O for a classification dataset. I am testing it out with H2ORandomForestEstimator in a Python 3.6 environment. I noticed that the results of the predict method were values between 0 and 1 (I am assuming this is the probability). In my dataset, the target attribute is numeric, i.e. True values are 1 and False values are 0. I made sure I converted the type to category for the target attribute, but I was still getting the same result. Then I modified the code to convert the target column to a factor using the asfactor() method on the H2OFrame; still, there wasn't any change
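The underlying relationship between the two outputs is simple: class labels are obtained from class probabilities by thresholding. A minimal numpy sketch (the 0.5 cutoff is the usual binary default, not anything h2o-specific; the probabilities are made up):

```python
import numpy as np

proba = np.array([0.12, 0.48, 0.51, 0.97])  # P(class == 1) for each row
classes = (proba >= 0.5).astype(int)        # threshold into hard 0/1 labels
print(classes)  # [0 0 1 1]
```

Once the target really is categorical, a classifier's predict output typically carries both: a predicted label column plus one probability column per class.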

Print “pretty” tables for h2o models in R

五迷三道 · submitted on 2019-11-27 04:34:25
There are multiple packages for R that help print "pretty" tables (LaTeX/HTML/TEXT) from statistical model output AND easily compare the results of alternative model specifications. Some of these packages are apsrtable, xtable, memisc, texreg, outreg, and stargazer (for examples see here: https://www.r-statistics.com/2013/01/stargazer-package-for-beautiful-latex-tables-from-r-statistical-models-output/ ). Is there any comparable R package that supports the models of the h2o package? Here is an example of two simple GLM models with h2o which I would like to print beside each other as
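Even without package support, once the coefficients are pulled out of each model (as name-to-value mappings), a side-by-side text table can be assembled by hand. A sketch with hypothetical coefficient values; terms missing from a model get a blank cell, as the stargazer-style tables do:

```python
# Hypothetical coefficients extracted from two fitted models.
coefs_m1 = {"Intercept": 0.42, "x1": 1.10, "x2": -0.35}
coefs_m2 = {"Intercept": 0.40, "x1": 1.05}

# Union of term names, one row per term, one column per model.
names = sorted(set(coefs_m1) | set(coefs_m2))
rows = [f"{'term':<10}{'model1':>10}{'model2':>10}"]
for n in names:
    c1 = f"{coefs_m1[n]:.2f}" if n in coefs_m1 else ""
    c2 = f"{coefs_m2[n]:.2f}" if n in coefs_m2 else ""
    rows.append(f"{n:<10}{c1:>10}{c2:>10}")

table = "\n".join(rows)
print(table)
```

The same rows could just as easily be emitted as LaTeX tabular lines or an HTML table body.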
