h2o

Parallel processing in R with H2O

让人想犯罪 __ submitted on 2019-12-05 00:52:57
Question: I am setting up a piece of code to process some computations in parallel for N groups in my data using foreach. I have a computation that involves a call to h2o.gbm. In my current, sequential setup, I use up to about 70% of my RAM. How do I correctly set up my h2o.init() within the parallel piece of code? I am afraid that I might run out of RAM when I use multiple cores. My Windows 10 machine has 12 cores and 128GB of RAM. Would something like this pseudo-code work? library(foreach) library
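The memory concern in this question can be checked with simple arithmetic. The sketch below is only a rough budget, not a recommendation for configuring h2o.init(): it assumes every parallel worker would reach the same peak memory as the sequential run (which may not hold), and the 10% reserve is a hypothetical safety margin that does not come from the question.

```python
import math

# Figures stated in the question.
TOTAL_RAM_GB = 128               # machine RAM
SEQUENTIAL_PEAK_FRACTION = 0.70  # "up to about 70% of my RAM" when run sequentially
N_CORES = 12

# Hypothetical safety margin (an assumption, not from the question):
# keep 10% of RAM free for the operating system and the R session.
RESERVED_FRACTION = 0.10


def max_parallel_workers(total_gb, per_worker_gb, reserved_fraction, n_cores):
    """Largest worker count whose combined peak memory fits in usable RAM."""
    usable_gb = total_gb * (1.0 - reserved_fraction)
    by_memory = math.floor(usable_gb / per_worker_gb)
    return max(0, min(by_memory, n_cores))


# Assumption: each worker peaks at the same level as the sequential run.
per_worker_gb = TOTAL_RAM_GB * SEQUENTIAL_PEAK_FRACTION
workers = max_parallel_workers(TOTAL_RAM_GB, per_worker_gb, RESERVED_FRACTION, N_CORES)

# Memory each worker could use if all cores were to run at once.
budget_per_worker_gb = TOTAL_RAM_GB * (1.0 - RESERVED_FRACTION) / N_CORES

print(f"peak per worker: {per_worker_gb:.1f} GB, workers that fit: {workers}")
print(f"budget per worker with {N_CORES} workers: {budget_per_worker_gb:.1f} GB")
```

Under these assumptions only one worker fits, so running all 12 cores at once would require each group's computation to stay within roughly 9.6 GB.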

H2O R api: retrieving optimal model from grid search

£可爱£侵袭症+ submitted on 2019-12-04 22:44:05
Question: I'm using the h2o package (v 3.6.0) in R, and I've built a grid search model. Now, I'm trying to access the model which minimizes MSE on the validation set. In Python's sklearn, this is easily achievable when using RandomizedSearchCV: ## Pseudo code: grid = RandomizedSearchCV(model, params, n_iter = 5) grid.fit(X) best = grid.best_estimator_ This, unfortunately, does not prove as straightforward in h2o. Here's an example you can recreate: library(h2o) ## assume you got h2o initialized... X

ValueError: Invalid header value 'H2O Python client/2.7.9 (default, Apr 2 2015, 15:33:21) \\n[GCC 4.9.2]'

為{幸葍}努か submitted on 2019-12-04 21:28:51
I am trying to initialize H2O from Python. I am using Python 2.7.9. I followed the steps below to get the h2o Python module: pip install requests pip install tabulate # Remove any preexisting H2O module. pip uninstall h2o # Next, use pip to install this version of the H2O Python module. pip install http://h2o-release.s3.amazonaws.com/h2o-dev/master/1109/Python/h2o-0.3.0.1109-py2.py3-none-any.whl I get this error when I call h2o.init(). No instance found at ip and port: localhost:54321. Trying to start local jar... No jar file found. Could not start local instance. Traceback (most recent call last

Any difference between H2O and Scikit-Learn metrics scoring?

本小妞迷上赌 submitted on 2019-12-04 19:21:05
I tried to use H2O to create some machine learning models for a binary classification problem, and the test results are pretty good. But then I checked and found something weird. Out of curiosity, I printed the model's predictions for the test set. I found that my model actually predicts 0 (negative) all the time, but the AUC is around 0.65, and precision is not 0.0. Then I used Scikit-learn to compare the metric scores, and (as expected) they're different. Scikit-learn yielded 0.0 precision and a 0.5 AUC score, which I think is correct. Here's the code that I
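One possible source of such a gap, which the excerpt does not confirm, is that AUC is a ranking metric: it gives different values when computed from predicted probabilities than when computed from hard 0/1 labels. The sketch below uses made-up labels and probabilities (hypothetical data, not the asker's) and plain-Python metric implementations to show that all-zero labels give an AUC of 0.5 and a precision of 0.0, while the underlying probabilities can still rank better than chance.

```python
def auc(y_true, scores):
    """Pairwise (Mann-Whitney) AUC: ties between a positive and a negative count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision(y_true, y_pred):
    """TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


# Hypothetical data for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
probs = [0.10, 0.20, 0.15, 0.30, 0.25, 0.40, 0.35, 0.05]

# With a 0.5 threshold every probability falls below the cut, so all labels are 0.
labels = [1 if p >= 0.5 else 0 for p in probs]

auc_from_probs = auc(y_true, probs)    # ranks the raw probabilities
auc_from_labels = auc(y_true, labels)  # every positive/negative pair is a tie
prec_from_labels = precision(y_true, labels)

print(f"AUC from probabilities: {auc_from_probs:.3f}")
print(f"AUC from hard labels:   {auc_from_labels:.3f}")
print(f"Precision from labels:  {prec_from_labels:.1f}")
```

If the two libraries were fed different inputs (probabilities in one case, thresholded labels in the other), this alone would make the scores disagree; checking which column is passed to each metric is a reasonable first step.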

How to load table from SQL server using H2o in R?

孤街醉人 submitted on 2019-12-04 15:15:17
I tried to load a table into R using h2o but got the following error: my_data <- h2o.import_sql_table(my_sql_conn, table, username, password) ERROR: Unexpected HTTP Status code: 500 Server Error (url = http://localhost:54321/99/ImportSQLTable ) java.lang.RuntimeException [1] "java.lang.RuntimeException: SQLException: No suitable driver found for jdbc:mysql://10.140.20.29/MySQL?&useSSL=false\nFailed to connect and read from SQL database with connection_url: jdbc:mysql://10.140.20.29/MySQL?&useSSL=false" Can someone help me with this? Thank you so much! You need a supported JDBC (Build on JDBC 42

Can I use autoencoder for clustering?

本秂侑毒 submitted on 2019-12-04 14:01:24
Question: In the code below, they use an autoencoder for supervised clustering or classification because they have data labels. http://amunategui.github.io/anomaly-detection-h2o/ But can I use an autoencoder to cluster data if I do not have its labels? Regards Answer 1: The deep-learning autoencoder is always unsupervised learning. The "supervised" part of the article you link to is to evaluate how well it did. The following example (taken from ch.7 of my book, Practical Machine Learning with H2O, where I try

What is the difference between h2o.ensemble and h2o.stack in package h2oEnsemble

邮差的信 submitted on 2019-12-04 07:41:55
According to the function descriptions: h2o.stack: This function creates a "Super Learner" (stacking) ensemble using a list of existing H2O base models specified by the user. h2o.ensemble: This function creates a "Super Learner" (stacking) ensemble using the H2O base learning algorithms specified by the user. They are two different ways to construct an ensemble. They have different interfaces, but they produce the exact same type of object in the end. The h2o.stack() function takes as input a list of already trained (and cross-validated) H2O models, so all it needs to do is the metalearning

What do you need to watch out for when using cross-validation with GLM lambda search?

情到浓时终转凉″ submitted on 2019-12-04 06:59:01
Question: Regarding h2o.glm lambda search not appearing to iterate over all lambdas, I read the question as complaining that lambda was too high; they tried setting early_stopping=F in the hope that might fix that "bug". Isn't it the case that the original behaviour was a feature, not a bug? And if that is correct, then you should always use early_stopping=T when using cross-validation with GLM, otherwise the error estimate from cross-validation is useless; you also risk over-fitting. (My main question

h2o.saveModel throwing exception with directory on Windows 8

二次信任 submitted on 2019-12-04 06:08:48
Question: I'm using h2o version 3.0.0.22 in R and I'm trying to save my model, but I can't figure out what format is expected. I've tried all sorts of variations and keep getting different exceptions. h2o.saveModel(model, dir="c:/temp", name= "my.model") ERROR: Unexpected HTTP Status code: 400 Bad Request (url = http://127.0.0.1:54321/3/Models.bin/DeepLearningModel__8412f3abf1699b5593a55c6861c8468d?dir=c%3A%2Ftemp%2Fmy.model&force=0) java.lang.IllegalArgumentException [1] "water.persist
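The URL in the error message is percent-encoded, which makes it hard to see what path was actually sent to the server. The sketch below only decodes the query string quoted in the error, using Python's standard library; it does not diagnose the exception itself, whose message is cut off in the excerpt.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied verbatim from the error message in the question.
url = (
    "http://127.0.0.1:54321/3/Models.bin/"
    "DeepLearningModel__8412f3abf1699b5593a55c6861c8468d"
    "?dir=c%3A%2Ftemp%2Fmy.model&force=0"
)

# parse_qs splits the query string and undoes the percent-encoding
# (%3A is ":" and %2F is "/").
query = parse_qs(urlsplit(url).query)
sent_dir = query["dir"][0]
force = query["force"][0]

print(f"dir sent to the server: {sent_dir}")
print(f"force flag: {force}")
```

The decoded value shows that the dir and name arguments from the R call were joined into a single path, c:/temp/my.model, before the request was made.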

Why connection is terminating

自古美人都是妖i submitted on 2019-12-04 06:06:17
I'm trying a random forest classification model using the H2O library inside R on a training set with 70 million rows and 25 numeric features. The total file size is 5.6 GB. The validation file's size is 1 GB. I have 16 GB of RAM and an 8-core CPU on my system. The system was able to read both files into H2O objects. Then I run the command below to build the model: model <- h2o.randomForest(x = c(1:18,20:25), y = 19, training_frame = traindata, validation_frame = testdata, ntrees = 150, mtries = 6) But after a few minutes (without generating any tree), I get the following error: "Error
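The figures in this question allow a quick estimate of how the training data compares with the available RAM. The sketch below is back-of-the-envelope arithmetic only: it assumes every numeric cell is held as an 8-byte double, which is a labeled assumption. H2O may store columns in a compressed form, so the real footprint can be smaller, and model building needs working memory on top of the data.

```python
# Figures stated in the question.
ROWS = 70_000_000
COLS = 25
RAM_GB = 16

# Assumption (not from the question): one 64-bit double per numeric cell.
BYTES_PER_VALUE = 8

# Uncompressed size of the training frame in decimal gigabytes.
raw_gb = ROWS * COLS * BYTES_PER_VALUE / 1e9

# Fraction of total RAM the uncompressed training frame would occupy.
share_of_ram = raw_gb / RAM_GB

print(f"uncompressed training data: {raw_gb:.1f} GB")
print(f"share of {RAM_GB} GB RAM: {share_of_ram:.1%}")
```

Under this assumption the training frame alone would take about 14 GB of the 16 GB available, before the validation frame and the trees themselves, so memory pressure is a plausible thing to rule out first.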