How to convert my H2O prediction to a data.frame in a fast way

Question

I am using H2O on a large dataset: 8 million rows and 10 columns. I trained a random forest with h2o.randomForest. Training went fine and prediction also worked correctly. Now I would like to convert my predictions to a data.frame. I did this:

A2=h2o.predict(m1,Tr15_h2o)
pred2=as.data.frame(A2)

but it is too slow and takes forever. Is there a faster way to convert from an H2O frame to a data.frame or data.table?

Answer 1:

Here is some code that demonstrates how to use the data.table package as the backend for the conversion, along with some benchmarks from my MacBook:

library(h2o)
h2o.init(nthreads = -1, max_mem_size = "16G")
hf <- h2o.createFrame(rows = 10000000)

options("h2o.use.data.table" = FALSE)  # convert without data.table
system.time(df <- as.data.frame(hf))
#    user  system elapsed
# 224.387  13.274 272.252

options("datatable.verbose" = TRUE)    # print detailed data.table diagnostics
options("h2o.use.data.table" = TRUE)   # convert with data.table
system.time(df2 <- as.data.frame(hf))
#    user  system elapsed
#  50.686   4.020  82.946

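In this run, the data.table path cut elapsed time from about 272 seconds to about 83 seconds. You can get more detailed information about what data.table is doing if you turn on this option: options("datatable.verbose"=TRUE).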
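Applied to the prediction frame from the question, the same option can be wrapped in a small helper. The following is only a sketch: h2o_to_dt is a hypothetical name (it is not part of h2o), and the small random frame created with h2o.createFrame stands in for the real predictions. It assumes the data.table package is installed and that a local H2O cluster can be started.

```r
library(h2o)

# Hypothetical helper (not part of h2o): convert an H2OFrame to a data.table,
# enabling h2o's data.table backend only for the duration of the call.
h2o_to_dt <- function(h2o_frame) {
  if (!requireNamespace("data.table", quietly = TRUE)) {
    stop("install data.table to use the faster conversion path")
  }
  old <- options("h2o.use.data.table" = TRUE)  # remember the previous setting
  on.exit(options(old), add = TRUE)            # restore it when the function exits
  df <- as.data.frame(h2o_frame)
  data.table::setDT(df)                        # turn the data.frame into a data.table by reference
  df
}

h2o.init(nthreads = -1)

# Stand-in data: a small random frame instead of the 8-million-row predictions.
hf <- h2o.createFrame(rows = 10000, cols = 10, seed = 1)
dt <- h2o_to_dt(hf)
dim(dt)
```

With the objects from the question, the call would be pred2 <- h2o_to_dt(A2). If a plain data.frame is preferred, the setDT line can be dropped.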

Answer 2:

We have seen this issue with large prediction datasets: exporting predictions to a data frame, or converting them to other types, takes a long time. I have opened the following JIRA ticket to track it:

https://0xdata.atlassian.net/browse/PUBDEV-4166

Answer 3:

Yes, there are some new options that turn on data.table::fread to speed this up. Type h2o:::as.data.frame.H2OFrame to see the small amount of R source code containing the options, or check the H2O release notes. Please also try the latest fread from the data.table development version, which became parallel as of yesterday.

Once users have reported success, we can turn the option on by default.
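To see what an installed version actually does, the conversion method and the relevant settings can be inspected from the R console without starting a cluster. This is a minimal sketch, assuming h2o is installed; conv_fn is just an illustrative variable name, and the data.table checks only run if that package is available.

```r
library(h2o)

# Look up the S3 method that as.data.frame() dispatches to for H2OFrame objects.
# This is the same function you get by typing h2o:::as.data.frame.H2OFrame.
conv_fn <- getS3method("as.data.frame", "H2OFrame")
print(conv_fn)

# Current value of the option; NULL means it has not been set in this session.
print(getOption("h2o.use.data.table"))

# If data.table is installed, report its version and how many threads it may use.
if (requireNamespace("data.table", quietly = TRUE)) {
  print(packageVersion("data.table"))
  print(data.table::getDTthreads())
}
```

Reading the printed source shows which options the installed h2o release checks before it hands the download to fread, which is worth confirming because the defaults may differ between releases.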

Source: https://stackoverflow.com/questions/42865609/how-to-convert-my-h2o-prediction-to-a-data-frame-in-a-fast-way

Tags

performance

dataframe

h2o