How to convert my H2O prediction to a data.frame in a fast way

Submitted by 烂漫一生 on 2019-12-24 02:33:43

Question


I am using H2O on a large dataset (8 million rows, 10 columns). I trained a random forest model with h2o.randomForest. Training went fine, and prediction also worked correctly. Now I would like to convert my predictions to a data.frame. I did this:

A2 <- h2o.predict(m1, Tr15_h2o)   # predictions come back as an H2OFrame
pred2 <- as.data.frame(A2)        # pull them into an R data.frame (the slow step)

but it is very slow and takes forever. Is there a faster way to convert from an H2OFrame to a data.frame or data.table?


Answer 1:


Here is some code that shows how to make H2O use the data.table package on the backend, along with some benchmarks from my MacBook:

library(h2o)
h2o.init(nthreads = -1, max_mem_size = "16G")
hf <- h2o.createFrame(rows = 10000000)

options("h2o.use.data.table"=FALSE)  # do not use data.table
system.time(df <- as.data.frame(hf))
# user  system elapsed 
# 224.387  13.274 272.252

options("datatable.verbose"=TRUE)
options("h2o.use.data.table"=TRUE)  # use data.table
system.time(df2 <- as.data.frame(hf))
# user  system elapsed 
# 50.686   4.020  82.946

When using data.table, you can get more detailed output by turning on this option: options("datatable.verbose"=TRUE).
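If you would rather not change the option globally, here is a minimal sketch that turns on the data.table path only for a single conversion and restores the previous setting afterwards. The helper name `h2o_to_df_fast` is hypothetical and not part of the h2o package; it relies only on the `h2o.use.data.table` option shown above and on base R's `options()` / `on.exit()` restore idiom.

```r
# Hypothetical helper (not part of the h2o package): enable the data.table
# download path for one conversion, then restore the caller's setting.
h2o_to_df_fast <- function(frame) {
  # options() returns the previous values so they can be restored later
  old <- options(h2o.use.data.table = TRUE)
  # Restore the previous option even if the conversion throws an error
  on.exit(options(old), add = TRUE)
  as.data.frame(frame)
}
```

With the objects from the question, usage would be `pred2 <- h2o_to_df_fast(h2o.predict(m1, Tr15_h2o))`. Scoping the option this way keeps other code that calls `as.data.frame()` on H2OFrames unaffected.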




Answer 2:


We have seen this issue with large prediction datasets: exporting the predictions to a data frame, or converting them to other types, takes a long time. I have opened the following JIRA ticket to track it:

https://0xdata.atlassian.net/browse/PUBDEV-4166




Answer 3:


Yes, there are some new options that switch the conversion to data.table::fread to speed it up. Type h2o:::as.data.frame.H2OFrame to see the short piece of R source code that contains these options, or check the H2O release notes. Please also try the latest fread from the data.table development version, which became parallel as of yesterday.

Once users have reported success, we can turn this option on by default.
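To get a rough feel for why parsing with fread helps, here is a minimal sketch that needs no H2O cluster, only the data.table package. It writes a synthetic two-column table (the column names and row count are arbitrary assumptions, not H2O's download format) to a temporary CSV and times base R's read.csv against data.table::fread on the same text. read.csv is used here only as a stand-in for a slower parser; actual timings depend on your machine and data.table version.

```r
library(data.table)

# Synthetic stand-in for a downloaded prediction frame (arbitrary size/columns)
set.seed(42)
n <- 200000L
df <- data.frame(row_id = seq_len(n), predict = runif(n))

# Write it to a temporary CSV so both parsers read identical text
path <- tempfile(fileext = ".csv")
fwrite(df, path)

# Base R parser
t_base <- system.time(via_read_csv <- read.csv(path))
# data.table parser (multi-threaded in recent data.table releases)
t_fread <- system.time(via_fread <- fread(path))

# Elapsed wall-clock seconds for each parser
print(c(read.csv = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

# fread returns a data.table; convert if a plain data.frame is required
via_fread <- as.data.frame(via_fread)
unlink(path)
```

Both parsers should return the same values; the difference is only in how long they take.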



Source: https://stackoverflow.com/questions/42865609/how-to-convert-my-h2o-prediction-to-a-data-frame-in-a-fast-way
