Question
I know h2o's internal data model is column-oriented (namely, an H2OFrame is a collection of H2OVec). However, the library I'd like to use requires iterating over the rows of an H2OFrame.
Is there a clean way to get an iterator over the rows, or do I need to resort to indexing like this?
iris = h2o.import_file(path=".../iris_wheader.csv")
for i in range(iris.nrow):
    # Entry 0 of the returned list is the header, so [1] is the data row.
    foo(iris[i, :].as_data_frame(use_pandas=False)[1])
I know it's going to be slow; I'm using h2o.h2o.export_file when possible.
Answer 1:
You can do a row-wise apply:
iris.apply(foo, 1)
where foo is some lambda that h2o understands (there are some limits on what can go in there, but all basic math ops should work fine).
Cliff
Answer 2:
In addition to what Cliff said (which is the faster way), you can also pull the entire data frame into the Python space and then iterate on it.
pd_frame = h2o_frame.as_data_frame(use_pandas=True)
If you don't want a Pandas object in the end (as_matrix() was removed in pandas 1.0, so use .values instead):
np_array = h2o_frame.as_data_frame(use_pandas=True).values
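Once the data is on the Python side, iterating over rows is ordinary Python. Here is a minimal sketch, assuming the list-of-lists shape that as_data_frame(use_pandas=False) returns in the question (header first, then one list per data row). The helper iter_rows and the sample values are hypothetical stand-ins, not part of h2o:

```python
def iter_rows(table):
    """Yield each data row as a dict keyed by column name.

    `table` is assumed to be a list of lists whose first entry is the
    header row, followed by one list per data row.
    """
    rows = iter(table)
    header = next(rows)  # consume the header row
    for row in rows:
        yield dict(zip(header, row))


# Stand-in for h2o_frame.as_data_frame(use_pandas=False); values are made up.
sample = [
    ["sepal_len", "class"],
    ["5.1", "Iris-setosa"],
    ["7.0", "Iris-versicolor"],
]

rows = list(iter_rows(sample))
for row in rows:
    print(row)
```

This pays for one transfer of the whole frame instead of one round trip per row, but it only works if the frame fits in the memory of the Python process.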
A little more about your library might help answer the question better.
Source: https://stackoverflow.com/questions/33876256/h2o-iterate-through-rows