Saving an H2O data frame

Submitted by 南笙酒味 on 2019-12-10 15:24:56

Question


I am working with a 10 GB training data frame and use the H2O library for faster computation. Each time I load the dataset, I have to convert the data frame into an H2O object, which takes a long time. Is there a way to store the converted H2O object, so that I can skip the as.h2o(trainingset) step each time I run trials on building models?


Answer 1:


After the first transformation with as.h2o(trainingset) you can export / save the file to disk and later import it again.

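# Convert once, then write the H2O frame to disk so it can be re-imported later
my_h2o_training_file <- as.h2o(trainingset)
path <- "whatever/my/path/is"  # placeholder: the file to write to
h2o.exportFile(my_h2o_training_file, path = path)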

To load it again, use either h2o.importFile (a single file) or h2o.importFolder (a directory of files). See each function's help page for correct usage.
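
As a rough illustration of the full round trip, here is a minimal sketch. The path and the small stand-in data frame are hypothetical, and it assumes a local H2O cluster that can read and write that path.

```r
library(h2o)
h2o.init()  # assumes a local H2O cluster can be started or is already running

# Hypothetical location; it must be visible to the H2O server, not only to R
path <- "/tmp/trainingset.csv"

# Small stand-in for the real 10 GB training data
trainingset <- data.frame(x = c(1, 2, 3), y = c("a", "b", "a"))

# One-time cost: convert the R data frame and export the H2O frame to disk
hf <- as.h2o(trainingset)
h2o.exportFile(hf, path = path, force = TRUE)  # force = TRUE overwrites an existing file

# Later sessions: skip as.h2o() and import the exported file directly
hf_again <- h2o.importFile(path)
dim(hf_again)
```

If the export is split across several part files in one directory, h2o.importFolder on that directory should be the matching way to read them back; check the help pages for the exact arguments in your H2O version.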

Alternatively, save the data as a CSV or text file before converting it with as.h2o, and load that file directly into H2O with one of the functions above.




Answer 2:


as.h2o(d) works like this (even when client and server are the same machine):

  1. In R, exports d to a CSV file in a temp location.
  2. Calls h2o.uploadFile(), which does an HTTP POST to the server, followed by a single-threaded import.
  3. Returns the handle from that import.
  4. Deletes the temp CSV file it made.
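
The steps above can be approximated by hand. This is only a sketch of the described behaviour, not the actual source of as.h2o (which may write the CSV differently). The data frame d here is a small hypothetical stand-in.

```r
library(h2o)
h2o.init()  # assumes a local H2O cluster can be started or is already running

# Hypothetical stand-in for the real data frame
d <- data.frame(x = 1:3, y = c(0.5, 1.5, 2.5))

# Step 1: export d to a CSV file in a temp location
tmp <- tempfile(fileext = ".csv")
write.csv(d, tmp, row.names = FALSE)

# Steps 2 and 3: push the file from the R client to the server over HTTP
# and keep the handle of the resulting H2O frame
hf <- h2o.uploadFile(tmp)

# Step 4: delete the temp CSV file
unlink(tmp)
```

Seen this way, every call to as.h2o repeats the CSV write and the upload, which is the cost that importing a prepared file avoids.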

Instead, prepare your data in advance somewhere(*), then use h2o.importFile() (See http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.importFile.html). This saves messing around with the local file, and it can also do a parallelized read and import.

*: For speediest results, the "somewhere" should be as close to the server as possible. For it to work at all, the "somewhere" has to be somewhere the server can see. If client and server are the same machine, then that is automatic. At the other extreme, if your server is a cluster of machines in an AWS data centre on another continent, then putting the data into S3 works well. You can also put it on HDFS, or on a web server.
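
A sketch of what the import call might look like for each kind of location. Every path, bucket, and host name below is a hypothetical placeholder, the call only succeeds if the chosen location really exists and is reachable from the server, and S3 or HDFS access additionally assumes the cluster has been configured with the necessary credentials.

```r
library(h2o)
h2o.init()  # or connect to a remote cluster with h2o.init(ip = ..., port = ...)

# Hypothetical locations; pick the one the H2O server can actually see
locations <- c(
  local = "/data/trainingset.csv",                 # file system of the server
  s3    = "s3://my-bucket/trainingset.csv",        # S3 bucket
  hdfs  = "hdfs://namenode/data/trainingset.csv",  # HDFS
  web   = "https://example.com/trainingset.csv"    # web server
)

# The server reads the data itself, so nothing is routed through the R client
hf <- h2o.importFile(locations[["local"]])
```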

See http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/importing-data.html for some examples in both R and Python.



Source: https://stackoverflow.com/questions/54417507/saving-h2o-data-frame
