h2o

How to get data into h2o fast

こ雲淡風輕ζ submitted on 2019-12-09 16:47:12
Question: What my question isn't:

Efficient way to maintain a h2o data frame
H2O running slower than data.table R
Loading data bigger than the memory size in h2o

Hardware/Space:

32 Xeon threads w/ ~256 GB RAM
~65 GB of data to upload (about 5.6 billion cells)

Problem: It is taking hours to upload my data into h2o. This isn't any special processing, only "as.h2o(...)". It takes less than a minute using "fread" to get the text into the space and then I make a few row/col transformations (diffs, lags)

Transforming h2o model into non-h2o one

倖福魔咒の submitted on 2019-12-08 16:49:07
Question: I know that it is possible to export/import a previously trained h2o model. My question is: is there a way to transform an h2o model into a non-h2o one (that just works in plain R)? I don't want to launch the h2o environment (JVM), since I know that predicting with a trained model is simply multiplying matrices, applying an activation function, etc. Of course it would be possible to extract the weights manually, but I want to know if there is a better way to do it. I do not

How to generate and save POJO from H2O using Python

妖精的绣舞 submitted on 2019-12-08 13:20:39
Question: I have a model created in H2O using Python. I want to generate a POJO of that model and save it. Say my model is called model_rf. I have tried:

    h2o.save_model(model_rf, path='./pojo_test', force=False)

This creates a directory called "pojo_test", which contains a whole bunch of binary files. I want a Java file though, something like model_rf.java, that is the POJO itself. I tried:

    h2o.download_pojo(model_rf, path='./pojo_test_2', get_jar = True)

Which gave the error message:

    IOError: [Errno 2

How do I specify the positive class in an H2O random forest or other binary classifier?

心已入冬 submitted on 2019-12-08 13:16:47
Question: I am building a binary classification model in H2O with Python. My 'y' values are 'ok' and 'bad'. I need the metrics to be computed with ok = negative class = 0 and bad = positive class = 1. However, I do not see any way to set this in H2O. For example, here is the output of the predictions and confusion matrix:

    confusion matrix
             bad     ok      Error     Rate
    bad      3859    631     0.1405    (631.0/4490.0)
    ok       477     1069    0.3085    (477.0/1546.0)
    Total    4336    1700    0.1836    (1108.0/6036.0)

    >>> predictions.head(10)
    predict bad ok 0
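As a worked check on the confusion matrix quoted in the question, the short sketch below recomputes the error rates from the four counts and, treating `bad` as the positive class, derives precision and recall by hand. It assumes rows are actual classes and columns are predicted classes; the variable names are illustrative only, and the sketch does not change how H2O itself chooses the positive class.

```python
# Counts copied from the confusion matrix in the question.
# Assumption: rows are actual classes, columns are predicted classes.
tp = 3859  # actual bad, predicted bad
fn = 631   # actual bad, predicted ok
fp = 477   # actual ok,  predicted bad
tn = 1069  # actual ok,  predicted ok

# Per-class and overall error rates, matching the "Error" column.
bad_error = fn / (tp + fn)
ok_error = fp / (fp + tn)
total_error = (fn + fp) / (tp + fn + fp + tn)

# Metrics with "bad" treated as the positive class.
precision_bad = tp / (tp + fp)
recall_bad = tp / (tp + fn)

print("error rates:", round(bad_error, 4), round(ok_error, 4), round(total_error, 4))
print("precision (bad):", round(precision_bad, 4))
print("recall (bad):", round(recall_bad, 4))
```

Computing the metrics from the raw counts like this is one way to report them for whichever class is of interest, regardless of which label the library treats as positive.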

Reconstruction MSE calculation using h2o.anomaly function from H2O R package

若如初见. submitted on 2019-12-08 11:06:06
Question: I was trying to use an autoencoder for anomaly detection. I used the H2O R package to generate the reconstruction MSE for sample data using the h2o.anomaly function. However, I have also tried to calculate it manually according to the MSE formula from the documentation link below: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/performance-and-prediction.html#mse-mean-squared-error The training data, consisting of three features and 5 rows, that I used to build the model is as below: head(train
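A minimal sketch of the manual calculation the question describes: the per-row reconstruction MSE is the mean, over features, of the squared difference between an input row and its reconstruction. The numbers below are hypothetical placeholders, not the question's data, and the function name is illustrative. The sketch assumes both rows are on the same scale; if the model standardizes or normalizes inputs internally, a calculation on raw values may not match what `h2o.anomaly` reports.

```python
def reconstruction_mse(row, reconstructed):
    """Mean squared error between one input row and its reconstruction."""
    if len(row) != len(reconstructed):
        raise ValueError("row and reconstruction must have the same length")
    squared_errors = [(x - r) ** 2 for x, r in zip(row, reconstructed)]
    return sum(squared_errors) / len(row)


# Hypothetical data: two rows with three features each.
original = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
rebuilt = [[1.0, 2.5, 2.0], [4.0, 5.0, 6.0]]

per_row_mse = [reconstruction_mse(x, r) for x, r in zip(original, rebuilt)]
print(per_row_mse)
```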

h2o: iterate through rows

喜你入骨 submitted on 2019-12-08 06:14:53
Question: I know h2o's internal data model is column oriented (namely, an H2OFrame is a collection of H2OVec). However, the library I'd like to use requires iterating through the rows of an H2OFrame. Is there a clean way to get an iterator over the rows, or do I need to resort to indexing like

    iris = h2o.import_file(path=".../iris_wheader.csv")
    for i in xrange(iris.nrow):
        foo( iris[i,:].as_data_frame(use_pandas=False)[1] )

I know it's going to be slow; I'm using h2o.h2o.export_file when possible.

Answer 1: You
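One alternative to indexing the frame row by row is to pull the data to the client once and iterate locally. `as_data_frame(use_pandas=False)` returns a list of lists with the column names as the first element, so the sketch below works on that shape. The helper name `iter_rows` and the sample values are hypothetical, and the H2O call appears only in a comment so the block runs without a cluster. This approach only makes sense when the frame fits in client memory.

```python
def iter_rows(table):
    """Yield each data row as a dict keyed by column name.

    `table` is a list of lists whose first element is the header row.
    """
    if not table:
        return
    header, *rows = table
    for row in rows:
        yield dict(zip(header, row))


# With a live cluster this would be (one transfer instead of one per row):
#   table = iris.as_data_frame(use_pandas=False)
# Hypothetical stand-in with the same shape:
table = [
    ["sepal_len", "class"],
    ["5.1", "Iris-setosa"],
    ["4.9", "Iris-setosa"],
]

for record in iter_rows(table):
    print(record)
```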

Is there efficient way to convert Pandas DataFrame to H2O Frame?

你。 submitted on 2019-12-08 04:52:59
Question: I have a Pandas data frame and I need to convert it to an H2O frame. I use the following code.

Code:

    # Convert pandas dataframe to H2O frame
    start_time = time.time()
    input_data_matrix = h2o.H2OFrame(input_df)
    logger.debug("3. Time taken to convert H2O Frame- " + str(time.time() - start_time))

Output:

    2019-02-05 04:38:55,238 logger DEBUG 3. Time taken to convert H2O Frame- 9320.119945764542

The data frame (i.e. input_df) is 183K x 435 in size, with no null or NaN values. It is taking around 2 hours. Is

What do the two numbers for accuracy, precision, F1, etc. mean?

自作多情 submitted on 2019-12-08 04:39:15
Question: My Random Forest model code concludes with:

    print('\nModel performance:')
    performance = best_nn.model_performance(test_data = test)
    accuracy = performance.accuracy()
    precision = performance.precision()
    F1 = performance.F1()
    auc = performance.auc()
    print(' accuracy.................', accuracy)
    print(' precision................', precision)
    print(' F1.......................', F1)
    print(' auc......................', auc)

and this code produces the following output:

    Model performance:
    accuracy...
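For binomial models, threshold-based metric methods such as `accuracy()`, `precision()`, and `F1()` return a list of `[threshold, value]` pairs rather than a bare number, while `auc()` returns a single float. The sketch below unpacks such a result. The numbers are made-up placeholders standing in for what `performance.accuracy()` might return, not the question's output.

```python
# Hypothetical stand-in for the return value of performance.accuracy():
# a list holding one [threshold, metric value] pair.
accuracy_result = [[0.41, 0.93]]

# First number: the classification threshold the metric was computed at.
# Second number: the metric value at that threshold.
threshold, value = accuracy_result[0]

print("threshold:", threshold)
print("accuracy :", value)
```

Printing only `accuracy_result[0][1]` is one way to get a single number per metric in a summary like the one in the question.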

Increase h2o.init timeout

给你一囗甜甜゛ submitted on 2019-12-08 04:13:22
Question: How can I increase the h2o startup timeout when starting an h2o server via R? I have a multinode AWS EC2 cluster, where I start a separate h2o server on each node. After startup, some EC2 nodes can be a bit slow, and I'd rather increase the timeout than re-run the h2o initialization code on these nodes. What I am currently doing is along the lines of

    library(doParallel)
    library(foreach)
    workers=parallel::makePSOCKcluster(workerIPs,master=masterIP)
    registerDoParallel(workers)
    foreach(i=seq_along(workers),.inorder=FALSE,.multicombine=TRUE) %dopar% {
      library(h2o)
      h2o.init(nthreads=-1)
      paste0

Increase h2o.init timeout

谁都会走 submitted on 2019-12-08 03:26:10
Question: How can I increase the h2o startup timeout when starting an h2o server via R? I have a multinode AWS EC2 cluster, where I start a separate h2o server on each node. After startup, some EC2 nodes can be a bit slow, and I'd rather increase the timeout than re-run the h2o initialization code on these nodes. What I am currently doing is along the lines of

    library(doParallel)
    library(foreach)
    workers=parallel::makePSOCKcluster(workerIPs,master=masterIP)
    registerDoParallel(workers)
    foreach(i=seq