h2o | 易学教程

How to get data into h2o fast

Question: What my question isn't: "Efficient way to maintain a h2o data frame", "H2O running slower than data.table R", "Loading data bigger than the memory size in h2o".

Hardware/space: 32 Xeon threads with ~256 GB RAM; ~65 GB of data to upload (about 5.6 billion cells).

Problem: It is taking hours to upload my data into h2o. There is no special processing involved, only "as.h2o(...)". It takes less than a minute using "fread" to get the text into the space, and then I make a few row/column transformations (diffs, lags)

Transforming h2o model into non-h2o one

Question: I know that it is possible to export/import a previously trained h2o model. My question is: is there a way to transform an h2o model into a non-h2o one (one that just works in plain R)? I mean that I don't want to launch the h2o environment (JVM), since I know that predicting with a trained model is simply multiplying matrices, applying activation functions, etc. Of course it would be possible to extract the weights manually, but I want to know if there is a better way to do it. I do not

How to generate and save POJO from H2O using Python

Question: I have a model created in H2O using Python. I want to generate a POJO of that model and save it. Say my model is called model_rf. I have tried:

    h2o.save_model(model_rf, path='./pojo_test', force=False)

This creates a directory called "pojo_test", which contains a whole bunch of binary files. I want a Java file though, something like model_rf.java, that is the POJO itself. I tried:

    h2o.download_pojo(model_rf, path='./pojo_test_2', get_jar=True)

which gave the error message: IOError: [Errno 2

How do I specify the positive class in an H2O random forest or other binary classifier?

Question: I am building a binary classification model in H2O with Python. My 'y' values are 'ok' and 'bad'. I need the metrics to be computed with ok = negative class = 0 and bad = positive class = 1. However, I do not see any way to set this in H2O. For example, here is the output of the predictions and confusion matrix:

    confusion matrix
           bad    ok     Error    Rate
    bad    3859   631    0.1405   (631.0/4490.0)
    ok     477    1069   0.3085   (477.0/1546.0)
    Total  4336   1700   0.1836   (1108.0/6036.0)

    >>> predictions.head(10)
       predict  bad  ok
    0
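To illustrate what the choice of positive class changes, the counts from the confusion matrix in the question can be turned into metrics by hand. This is only a sketch: it assumes rows are actual classes and columns are predicted classes (which is consistent with the per-row error rates shown), and the function name `binary_metrics` is hypothetical, not part of H2O.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute basic metrics from confusion-matrix counts for one chosen positive class."""
    total = tp + fn + fp + tn
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
    }


# Counts copied from the question, treating "bad" as the positive class
# (assumption: rows = actual, columns = predicted).
metrics_bad_positive = binary_metrics(tp=3859, fn=631, fp=477, tn=1069)

# The same counts with "ok" as the positive class: TP/TN and FP/FN swap roles.
metrics_ok_positive = binary_metrics(tp=1069, fn=477, fp=631, tn=3859)

print(metrics_bad_positive)
print(metrics_ok_positive)
```

Accuracy and overall error are identical under either choice; precision and recall are the metrics that depend on which label is treated as positive.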

Reconstruction MSE calculation using h2o.anomaly function from H2O R package

Question: I was trying to use an autoencoder for anomaly detection. I used the H2O R package to generate the reconstruction MSE for sample data with the h2o.anomaly function. However, I have also tried to calculate it manually according to the MSE formula from the documentation link below: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/performance-and-prediction.html#mse-mean-squared-error The training data that I used to build the model, consisting of three features and 5 rows, is as below: head(train
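For reference, a per-row reconstruction MSE can be computed by hand as the mean of the squared differences between each input feature and its reconstruction. The sketch below uses made-up numbers rather than the data from the question, and the function name `row_mse` is hypothetical. One caveat, stated as an assumption rather than something this excerpt confirms: H2O may compute the error on standardized features, in which case a calculation on the raw scale would not match the h2o.anomaly output.

```python
def row_mse(original, reconstructed):
    """Mean squared error between one input row and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("rows must have the same number of features")
    squared_errors = [(o - r) ** 2 for o, r in zip(original, reconstructed)]
    return sum(squared_errors) / len(original)


# Illustrative values only: three features, one row.
example_input = [1.0, 2.0, 3.0]
example_reconstruction = [1.0, 2.0, 5.0]

# Squared errors are 0, 0 and 4, averaged over 3 features.
example_mse = row_mse(example_input, example_reconstruction)
print(example_mse)
```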

h2o: iterate through rows

Question: I know h2o's internal data model is column-oriented (namely, an H2OFrame is a collection of H2OVec). However, the library I'd like to use requires iterating through the rows of an H2OFrame. Is there a clean way to get an iterator over the rows, or do I need to resort to indexing like

    iris = h2o.import_file(path=".../iris_wheader.csv")
    for i in range(iris.nrow):
        foo(iris[i, :].as_data_frame(use_pandas=False)[1])

I know it's going to be slow; I'm using h2o.h2o.export_file when possible.

Answer 1: You

Is there efficient way to convert Pandas DataFrame to H2O Frame?

Question: I have a Pandas data frame and I need to convert it to an H2O frame. I use the following code.

Code:

    # Convert pandas dataframe to H2O frame
    start_time = time.time()
    input_data_matrix = h2o.H2OFrame(input_df)
    logger.debug("3. Time taken to convert H2O Frame- " + str(time.time() - start_time))

Output:

    2019-02-05 04:38:55,238 logger DEBUG 3. Time taken to convert H2O Frame- 9320.119945764542

The data frame (i.e. input_df) is 183K x 435 in size, with no null or NaN values. It is taking around 2 hours. Is

What do the two numbers for accuracy, precision, F1, etc. mean?

Question: My Random Forest model code concludes with:

    print('\nModel performance:')
    performance = best_nn.model_performance(test_data=test)
    accuracy = performance.accuracy()
    precision = performance.precision()
    F1 = performance.F1()
    auc = performance.auc()
    print(' accuracy.................', accuracy)
    print(' precision................', precision)
    print(' F1.......................', F1)
    print(' auc......................', auc)

and this code produces the following output:

    Model performance:
    accuracy...
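A likely reading of the "two numbers", offered as an assumption because the output is cut off here: threshold-based metrics in H2O's Python API are typically returned as a list of [threshold, value] pairs, so the first number would be the classification threshold and the second the metric at that threshold. The sketch below uses invented values and a hypothetical helper, `unpack_metric`, to show how such a result could be unpacked.

```python
def unpack_metric(metric):
    """Split a [[threshold, value], ...] result into (threshold, value) tuples."""
    return [(pair[0], pair[1]) for pair in metric]


# Invented numbers in the assumed [[threshold, value]] shape.
sample_accuracy = [[0.42, 0.91]]

threshold, value = unpack_metric(sample_accuracy)[0]
print("threshold:", threshold, "accuracy at that threshold:", value)
```

Under this reading, a metric such as AUC would be a single number, since it is not tied to one threshold.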

Increase h2o.init timeout

Question: How can I increase the h2o startup timeout when starting an h2o server via R? I have a multinode AWS EC2 cluster, where I start a separate h2o server on each node. After startup, some EC2 nodes can be a bit slow, and I'd rather increase the timeout than re-run the h2o initialization code on these nodes. What I am currently doing is along the lines of

    library(doParallel)
    library(foreach)
    workers=parallel::makePSOCKcluster(workerIPs,master=masterIP)
    registerDoParallel(workers)
    foreach(i=seq_along(workers),.inorder=FALSE,.multicombine=TRUE) %dopar% {
      library(h2o)
      h2o.init(nthreads=-1)
      paste0
