Is there an efficient way to convert a Pandas DataFrame to an H2O Frame?

你。 submitted on 2019-12-08 04:52:59

Question


I have a Pandas data frame and I need to convert it to an H2O frame. I use the following code:

Code:

import time
import h2o

# input_df (a pandas DataFrame) and logger are defined elsewhere
# Convert pandas dataframe to H2O frame
start_time = time.time()
input_data_matrix = h2o.H2OFrame(input_df)
logger.debug("3. Time taken to convert H2O Frame- " + str(time.time() - start_time))

Output:

2019-02-05 04:38:55,238 logger DEBUG 3. Time taken to convert H2O Frame- 9320.119945764542

The data frame (i.e. input_df) has size 183K x 435, with no null or NaN values.

The conversion takes about 9,320 seconds, i.e. more than 2.5 hours. Is there a better way to perform this operation?


Answer 1:


  1. Save the pandas data frame to a csv file. (Skip this step if you loaded it from a csv file in the first place, and haven't done any data munging on it, of course.)

  2. Put that csv file somewhere the h2o server can see it. (If you are running client and server on the same machine, this is already the case.)

  3. Use h2o.import_file() (in preference to h2o.upload_file() or h2o.H2OFrame()).
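
The three steps above might look roughly like this in Python. This is only a minimal sketch, assuming an H2O cluster that h2o.init() can start or connect to on the local machine, and a directory that both the client and the server can read. The function names export_for_h2o and import_into_h2o and the file name input_df.csv are made up for illustration and are not part of pandas or h2o.

```python
import os

import pandas as pd


def export_for_h2o(df: pd.DataFrame, directory: str) -> str:
    """Step 1: write the data frame to a CSV file and return its absolute path.

    `directory` is assumed to be a location the H2O server can read (step 2).
    """
    path = os.path.join(directory, "input_df.csv")  # hypothetical file name
    # index=False keeps the pandas index from becoming an extra column in H2O
    df.to_csv(path, index=False)
    return os.path.abspath(path)


def import_into_h2o(path: str):
    """Step 3: ask the H2O server to read the file itself."""
    import h2o  # imported here so the export step works without h2o installed

    h2o.init()  # assumption: a local cluster; a remote one needs its own connection setup
    return h2o.import_file(path)
```

With input_df from the question, the call would be something like import_into_h2o(export_for_h2o(input_df, shared_dir)), where shared_dir stands for whatever directory the server can see.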

h2o.import_file() is the quickest way to get data into H2O, but the file must be visible to the server. When dealing with a remote cluster, this might mean uploading it to that server's file system, or putting it on a web server, an HDFS cluster, AWS S3, etc.

(The reason h2o.upload_file() is slower is that it will do an HTTP POST of the data, from client to server, and h2o.H2OFrame() is slower because it exports your pandas data to a temp csv file, then uses h2o.upload_file(), then deletes the temp file afterwards.)
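
To make the extra work concrete, here is a rough sketch of the export, upload, delete sequence that the answer attributes to h2o.H2OFrame(). It is not H2O's actual source code; the function name frame_via_temp_csv and the upload parameter are hypothetical, and in practice upload would be something like h2o.upload_file.

```python
import os
import tempfile

import pandas as pd


def frame_via_temp_csv(df: pd.DataFrame, upload):
    """Mimic the described sequence: export to a temp CSV, upload it, delete it.

    `upload` is any callable that takes a file path, e.g. h2o.upload_file.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)  # pandas reopens the file by path
    try:
        df.to_csv(path, index=False)  # full serialization on the client
        return upload(path)           # client-to-server transfer happens here
    finally:
        os.remove(path)               # temp file is removed afterwards
```

Both the client-side serialization and the transfer scale with the size of the data, which is why skipping them with a server-side import can help for a frame of this size.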



Source: https://stackoverflow.com/questions/54541358/is-there-efficient-way-to-convert-pandas-dataframe-to-h2o-frame
