h2o: iterate through rows

Question

I know h2o's internal data model is column-oriented (namely, an H2OFrame is a collection of H2OVec). However, the library I'd like to use requires iterating over the rows of an H2OFrame.

Is there a clean way to get an iterator over the rows, or do I need to resort to indexing like this?

iris = h2o.import_file(path=".../iris_wheader.csv")
for i in range(iris.nrow):  # use xrange on Python 2
    # Element 0 of the returned list is the header, so [1] is the data row.
    foo( iris[i,:].as_data_frame(use_pandas=False)[1] )

I know it's going to be slow; I'm using h2o.h2o.export_file when possible.

Answer 1:

You can do a row-wise apply:

iris.apply(foo, axis=1)

Here foo is a lambda that h2o understands (there are some limits on what can go in there, but all basic math ops should work fine).

Cliff

Answer 2:

In addition to what Cliff said (which is the faster way), you can also pull the entire data frame into the Python space and then iterate on it.

pd_frame = h2o_frame.as_data_frame(use_pandas=True)

If you don't want Pandas in the end:

np_array = h2o_frame.as_data_frame(use_pandas=True).to_numpy()  # as_matrix() was removed in pandas 1.0
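If you would rather avoid Pandas and NumPy entirely, here is a minimal sketch of iterating over the header-first list of lists that as_data_frame(use_pandas=False) returns, so the frame is pulled once instead of once per row. The helper name iter_rows and the sample values are made up for illustration; the literal table stands in for the real call so the snippet runs without a cluster. Values fetched this way may arrive as strings, so convert them as needed.

```python
def iter_rows(table):
    """Yield each data row of a header-first list of lists as a dict."""
    rows = iter(table)
    header = next(rows, None)  # first element holds the column names
    if header is None:
        return  # empty table: nothing to yield
    for row in rows:
        yield dict(zip(header, row))


# Stand-in for: table = h2o_frame.as_data_frame(use_pandas=False)
# (hypothetical sample data; column names are illustrative)
table = [
    ["sepal_len", "sepal_wid", "class"],
    [5.1, 3.5, "Iris-setosa"],
    [4.9, 3.0, "Iris-setosa"],
]

for row in iter_rows(table):
    print(row["class"], row["sepal_len"])
```

This only makes sense when the frame fits in the memory of the Python process; for larger frames the row-wise apply stays on the cluster.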

A little more about your library might help answer the question better.

Source: https://stackoverflow.com/questions/33876256/h2o-iterate-through-rows

Tags

python

h2o
